@p4u from vocdoni.io reported that the mempool might behave incorrectly under a
high load. The consequences can range from pauses between blocks to the peers
disconnecting from this node.
My current theory is that the flowrate lib we're using to control flow
(multiplex over a single TCP connection) was not designed w/ large blobs
(1MB batch of txs) in mind.
I've tried decreasing the Mempool reactor priority, but that did not
have any visible effect. What actually worked is adding a time.Sleep
into mempool.Reactor#broadcastTxRoutine after an each successful send ==
manual control flow of sort.
As a temporary remedy (until the mempool package
is refactored), the max-batch-bytes was disabled. Transactions will be sent
one by one without batching
Closes#5796
When set to true, an invalid transaction will be kept in the cache (this may help some applications to protect against spam).
NOTE: this is a temporary config option. The more correct solution would be to add a TTL to each transaction (i.e. CheckTx may return a TTL in ResponseCheckTx).
Closes: #5751
Before: scheduler receives psBlockProcessed event, but does not mark block as processed because peer timed out (or was removed for other reasons) and all associated blocks were rescheduled.
After: scheduler receives psBlockProcessed event and marks block as processed in any case (even if peer who provided this block errors).
Closes#5387
When a peer is stopped due to some network issue, the Reactor calls scheduler#handleRemovePeer, which removes the peer from the scheduler. BUT the peer stays in the processor, which sometimes could lead to "duplicate block enqueued by processor" panic WHEN the same block is requested by the scheduler again from a different peer. The solution is to return scPeerError, which will be propagated to the processor. The processor will clean up the blocks associated with the peer in purgePeer.
Closes#5513, #5517
Fixes#5439. This is really a workaround for #5519 (unless we require async implementations to return ordered responses, but that kind of defeats the purpose of having an async API).
## Description
Check block protocol version in header validate basic.
I tried searching for where we check the P2P protocol version but was unable to find it. When we check compatibility with a node we check we both have the same block protocol and are on the same network, but we do not check if we are on the same P2P protocol. It makes sense if there is a handshake change because we would not be able to establish a secure connection, but a p2p protocol version bump may be because of a p2p message change, which would go unnoticed until that message is sent over the wire. Is this purposeful?
Closes: #4790
On startup, the peer-to-peer stack may have peers connected before the state sync process begins, causing these to not trigger `AddPeer` events and thus not be used for snapshot discovery. Broadcasting a snapshot request to these explicitly makes sure we discover snapshots from existing peers as well.
* config: rename prof_laddr to pprof_laddr and move it to rpc
also, remove `/unsafe_start_cpu_profiler`, `/unsafe_stop_cpu_profiler`
and `/unsafe_write_heap_profile` in favor of pprof server functionality.
Closes#5303
* update changelog
* log start
State sync broke in #5231 since the genesis state is not propagated explicitly from `NewNode()` to `Node.OnStart()` and further into the state sync initialization. This is a hack until we can clean up the node startup process.