package v1

import (
	"sort"

	"github.com/tendermint/tendermint/libs/log"
	"github.com/tendermint/tendermint/p2p"
	"github.com/tendermint/tendermint/types"
)

// BlockPool keeps track of the fast sync peers, block requests and block responses.
type BlockPool struct {
	logger log.Logger
	// Set of peers that have sent status responses, with height of at least pool.Height.
	peers map[p2p.ID]*BpPeer
	// Set of block heights and the corresponding peers from where a block response is expected or has been received.
	blocks map[int64]p2p.ID

	plannedRequests   map[int64]struct{} // set of heights that still need a peer assigned for a block request
	nextRequestHeight int64              // next height to be added to plannedRequests

	Height        int64 // height of next block to execute
	MaxPeerHeight int64 // maximum height of all peers
	toBcR         bcReactor
}

// NewBlockPool creates a new BlockPool.
func NewBlockPool(height int64, toBcR bcReactor) *BlockPool {
	return &BlockPool{
		Height:            height,
		MaxPeerHeight:     0,
		peers:             make(map[p2p.ID]*BpPeer),
		blocks:            make(map[int64]p2p.ID),
		plannedRequests:   make(map[int64]struct{}),
		nextRequestHeight: height,
		toBcR:             toBcR,
	}
}

// SetLogger sets the logger of the pool.
func (pool *BlockPool) SetLogger(l log.Logger) {
	pool.logger = l
}

// ReachedMaxHeight checks if the pool has reached the maximum peer height.
func (pool *BlockPool) ReachedMaxHeight() bool {
	return pool.Height >= pool.MaxPeerHeight
}

// rescheduleRequest puts height back into plannedRequests and removes the
// block entry for that height from both the pool and the peer.
func (pool *BlockPool) rescheduleRequest(peerID p2p.ID, height int64) {
	pool.logger.Info("reschedule requests made to peer for height", "peerID", peerID, "height", height)
	pool.plannedRequests[height] = struct{}{}
	delete(pool.blocks, height)
	pool.peers[peerID].RemoveBlock(height)
}

// updateMaxPeerHeight recomputes the pool's MaxPeerHeight from the current peers.
// If no peers are left, MaxPeerHeight is set to 0.
func (pool *BlockPool) updateMaxPeerHeight() {
	var newMax int64
	for _, peer := range pool.peers {
		peerHeight := peer.Height
		if peerHeight > newMax {
			newMax = peerHeight
		}
	}
	pool.MaxPeerHeight = newMax
}

// UpdatePeer adds a new peer or updates an existing peer with a new base and height.
// A new peer whose height is below the pool's height is not added and errPeerTooShort is returned.
// An existing peer that reports a lower height than before is removed.
func (pool *BlockPool) UpdatePeer(peerID p2p.ID, base int64, height int64) error {
	peer := pool.peers[peerID]

	if peer == nil {
		if height < pool.Height {
			pool.logger.Info("Peer height too small",
				"peer", peerID, "height", height, "fsm_height", pool.Height)
			return errPeerTooShort
		}
		// Add new peer.
		peer = NewBpPeer(peerID, base, height, pool.toBcR.sendPeerError, nil)
		peer.SetLogger(pool.logger.With("peer", peerID))
		pool.peers[peerID] = peer
		pool.logger.Info("added peer", "peerID", peerID, "base", base, "height", height, "num_peers", len(pool.peers))
	} else {
		// Check if peer is lowering its height. This is not allowed.
		if height < peer.Height {
			pool.RemovePeer(peerID, errPeerLowersItsHeight)
			return errPeerLowersItsHeight
		}
		// Update existing peer.
		peer.Base = base
		peer.Height = height
	}

	// Update the pool's MaxPeerHeight if needed.
	pool.updateMaxPeerHeight()

	return nil
}

// SetNoBlock records that the peer does not have a block for height and
// schedules a new request for that height from another peer.
func (pool *BlockPool) SetNoBlock(peerID p2p.ID, height int64) {
	peer := pool.peers[peerID]
	if peer == nil {
		return
	}
	peer.SetNoBlock(height)
	pool.rescheduleRequest(peerID, height)
}

// deletePeer cleans up and deletes the peer. It recomputes the max peer height
// if the deleted peer was one of the tallest.
func (pool *BlockPool) deletePeer(peer *BpPeer) {
	if peer == nil {
		return
	}
	peer.Cleanup()
	delete(pool.peers, peer.ID)

	if peer.Height == pool.MaxPeerHeight {
		pool.updateMaxPeerHeight()
	}
}

// RemovePeer removes the blocks and requests from the peer, reschedules them and deletes the peer.
func (pool *BlockPool) RemovePeer(peerID p2p.ID, err error) {
	peer := pool.peers[peerID]
	if peer == nil {
		return
	}
	pool.logger.Info("removing peer", "peerID", peerID, "error", err)

	// Reschedule the block requests made to the peer, or received and not processed yet.
	// Note that some of the requests may be removed further down.
	for h := range pool.peers[peerID].blocks {
		pool.rescheduleRequest(peerID, h)
	}

	oldMaxPeerHeight := pool.MaxPeerHeight
	// Delete the peer. This operation may result in the pool's MaxPeerHeight being lowered.
	pool.deletePeer(peer)

	// Check if the pool's MaxPeerHeight has been lowered.
	// This may happen if the tallest peer has been removed.
	if oldMaxPeerHeight > pool.MaxPeerHeight {
		// Remove any planned requests for heights over the new MaxPeerHeight.
		for h := range pool.plannedRequests {
			if h > pool.MaxPeerHeight {
				delete(pool.plannedRequests, h)
			}
		}
		// Adjust the nextRequestHeight to the new max plus one.
		if pool.nextRequestHeight > pool.MaxPeerHeight {
			pool.nextRequestHeight = pool.MaxPeerHeight + 1
		}
	}
}

// removeShortPeers removes the peers whose height is below the pool's height.
func (pool *BlockPool) removeShortPeers() {
	for _, peer := range pool.peers {
		if peer.Height < pool.Height {
			pool.RemovePeer(peer.ID, nil)
		}
	}
}

// removeBadPeers removes the short peers and the peers that fail CheckRate.
// Peers that fail CheckRate are also reported to the reactor.
func (pool *BlockPool) removeBadPeers() {
	pool.removeShortPeers()
	for _, peer := range pool.peers {
		if err := peer.CheckRate(); err != nil {
			pool.RemovePeer(peer.ID, err)
			pool.toBcR.sendPeerError(err, peer.ID)
		}
	}
}

// MakeNextRequests creates more requests if the block pool is running low.
func (pool *BlockPool) MakeNextRequests(maxNumRequests int) {
	heights := pool.makeRequestBatch(maxNumRequests)
	if len(heights) != 0 {
		pool.logger.Info("makeNextRequests will make following requests",
			"number", len(heights), "heights", heights)
	}

	for _, height := range heights {
		h := int64(height)
		if !pool.sendRequest(h) {
			// If a good peer was not found for sending the request at height h then return,
			// as it shouldn't be possible to find a peer for h+1.
			return
		}
		delete(pool.plannedRequests, h)
	}
}

// makeRequestBatch returns the planned request heights, sorted in ascending order.
// New heights are planned so that the block pool can hold up to maxNumRequests entries.
func (pool *BlockPool) makeRequestBatch(maxNumRequests int) []int {
	pool.removeBadPeers()
	// At this point pool.plannedRequests may include heights for requests to be redone due to removal of peers:
	// - peers timed out or were removed by switch
	// - FSM timed out on waiting to advance the block execution due to missing blocks at h or h+1
	// Determine the number of requests needed by subtracting the number of requests already made
	// from the maximum allowed.
	numNeeded := maxNumRequests - len(pool.blocks)
	for len(pool.plannedRequests) < numNeeded {
		if pool.nextRequestHeight > pool.MaxPeerHeight {
			break
		}
		pool.plannedRequests[pool.nextRequestHeight] = struct{}{}
		pool.nextRequestHeight++
	}

	heights := make([]int, 0, len(pool.plannedRequests))
	for k := range pool.plannedRequests {
		heights = append(heights, int(k))
	}
	sort.Ints(heights)
	return heights
}

// sendRequest tries to assign the block request for height to a peer that is below
// its pending request limit, covers the height and has not reported it as missing.
// It returns false if the request could not be assigned to any peer.
func (pool *BlockPool) sendRequest(height int64) bool {
	for _, peer := range pool.peers {
		if peer.NumPendingBlockRequests >= maxRequestsPerPeer {
			continue
		}
		if peer.Base > height || peer.Height < height || peer.NoBlock(height) {
			continue
		}

		err := pool.toBcR.sendBlockRequest(peer.ID, height)
		if err == errNilPeerForBlockRequest {
			// Switch does not have this peer, remove it and continue to look for another peer.
			pool.logger.Error("switch does not have peer..removing peer selected for height", "peer",
				peer.ID, "height", height)
			pool.RemovePeer(peer.ID, err)
			continue
		}

		if err == errSendQueueFull {
			pool.logger.Error("peer queue is full", "peer", peer.ID, "height", height)
			continue
		}

		pool.logger.Info("assigned request to peer", "peer", peer.ID, "height", height)

		pool.blocks[height] = peer.ID
		peer.RequestSent(height)

		return true
	}
	pool.logger.Error("could not find peer to send request for block at height", "height", height)
	return false
}

// AddBlock validates that the block comes from the peer it was expected from and stores it in the 'blocks' map.
func (pool *BlockPool) AddBlock(peerID p2p.ID, block *types.Block, blockSize int) error {
	peer, ok := pool.peers[peerID]
	if !ok {
		pool.logger.Error("block from unknown peer", "height", block.Height, "peer", peerID)
		return errBadDataFromPeer
	}
	if wantPeerID, ok := pool.blocks[block.Height]; ok && wantPeerID != peerID {
		pool.logger.Error("block received from wrong peer", "height", block.Height,
			"peer", peerID, "expected_peer", wantPeerID)
		return errBadDataFromPeer
	}

	return peer.AddBlock(block, blockSize)
}

// BlockData stores the peer responsible for delivering a block and the actual block if delivered.
type BlockData struct {
	block *types.Block
	peer  *BpPeer
}

// BlockAndPeerAtHeight retrieves the block and delivery peer at the specified height.
// It returns errMissingBlock if no peer is known for the height.
func (pool *BlockPool) BlockAndPeerAtHeight(height int64) (bData *BlockData, err error) {
	peerID := pool.blocks[height]
	peer := pool.peers[peerID]
	if peer == nil {
		return nil, errMissingBlock
	}

	block, err := peer.BlockAtHeight(height)
	if err != nil {
		return nil, err
	}

	return &BlockData{peer: peer, block: block}, nil
}

// FirstTwoBlocksAndPeers returns the blocks and the delivery peers at pool's height H and H+1.
func (pool *BlockPool) FirstTwoBlocksAndPeers() (first, second *BlockData, err error) {
	first, err = pool.BlockAndPeerAtHeight(pool.Height)
	second, err2 := pool.BlockAndPeerAtHeight(pool.Height + 1)
	if err == nil {
		err = err2
	}
	return
}

// InvalidateFirstTwoBlocks removes the peers that sent us the first two blocks, blocks are removed by RemovePeer().
func (pool *BlockPool) InvalidateFirstTwoBlocks(err error) {
	first, err1 := pool.BlockAndPeerAtHeight(pool.Height)
	second, err2 := pool.BlockAndPeerAtHeight(pool.Height + 1)

	if err1 == nil {
		pool.RemovePeer(first.peer.ID, err)
	}
	if err2 == nil {
		pool.RemovePeer(second.peer.ID, err)
	}
}

// ProcessedCurrentHeightBlock performs cleanup after a block is processed. It removes block at pool height and
// the peers that are now short.
func (pool *BlockPool) ProcessedCurrentHeightBlock() {
	peerID, peerOk := pool.blocks[pool.Height]
	if peerOk {
		pool.peers[peerID].RemoveBlock(pool.Height)
	}
	delete(pool.blocks, pool.Height)
	pool.logger.Debug("removed block at height", "height", pool.Height)
	pool.Height++
	pool.removeShortPeers()
}

// RemovePeerAtCurrentHeights checks if a block at pool's height H exists and if not, it removes the
// delivery peer and returns. If a block at height H exists then the check and peer removal is done for H+1.
// This function is called when the FSM is not able to make progress for some time.
// This happens if either the block H or H+1 have not been delivered.
func (pool *BlockPool) RemovePeerAtCurrentHeights(err error) {
	peerID := pool.blocks[pool.Height]
	peer, ok := pool.peers[peerID]
	if ok {
		// The lookup error gets its own name so that it does not shadow the caller's err,
		// which is the error passed on to RemovePeer.
		if _, blockErr := peer.BlockAtHeight(pool.Height); blockErr != nil {
			pool.logger.Info("remove peer that hasn't sent block at pool.Height",
				"peer", peerID, "height", pool.Height)
			pool.RemovePeer(peerID, err)
			return
		}
	}
	peerID = pool.blocks[pool.Height+1]
	peer, ok = pool.peers[peerID]
	if ok {
		if _, blockErr := peer.BlockAtHeight(pool.Height + 1); blockErr != nil {
			pool.logger.Info("remove peer that hasn't sent block at pool.Height+1",
				"peer", peerID, "height", pool.Height+1)
			pool.RemovePeer(peerID, err)
			return
		}
	}
}

// Cleanup performs pool and peer cleanup.
func (pool *BlockPool) Cleanup() {
	for id, peer := range pool.peers {
		peer.Cleanup()
		delete(pool.peers, id)
	}
	pool.plannedRequests = make(map[int64]struct{})
	pool.blocks = make(map[int64]p2p.ID)
	pool.nextRequestHeight = 0
	pool.Height = 0
	pool.MaxPeerHeight = 0
}

// NumPeers returns the number of peers in the pool.
func (pool *BlockPool) NumPeers() int {
	return len(pool.peers)
}

// NeedsBlocks returns true if more blocks are required.
func (pool *BlockPool) NeedsBlocks() bool {
	return len(pool.blocks) < maxNumRequests
}
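
The planning step in makeRequestBatch can be tried in isolation. The sketch below is a standalone illustration rather than part of the package: the planner type, its field names and the batch method are hypothetical stand-ins for the pool's plannedRequests, len(pool.blocks), nextRequestHeight and MaxPeerHeight, and peer handling is left out entirely. It shows how a rescheduled height is kept in the batch while new heights are planned only up to the tallest peer.

```go
package main

import (
	"fmt"
	"sort"
)

// planner is a hypothetical, stripped-down stand-in for the scheduling state of BlockPool.
type planner struct {
	planned       map[int64]struct{} // heights waiting to be assigned to a peer
	inFlight      int                // heights already assigned to peers (len(pool.blocks) in the pool)
	nextHeight    int64              // next height to add to planned
	maxPeerHeight int64              // tallest height reported by any peer
}

// batch adds new heights to planned until it holds at least
// maxRequests-inFlight entries or maxPeerHeight is passed, then returns
// all planned heights in ascending order.
func (p *planner) batch(maxRequests int) []int64 {
	needed := maxRequests - p.inFlight
	for len(p.planned) < needed && p.nextHeight <= p.maxPeerHeight {
		p.planned[p.nextHeight] = struct{}{}
		p.nextHeight++
	}

	heights := make([]int64, 0, len(p.planned))
	for h := range p.planned {
		heights = append(heights, h)
	}
	sort.Slice(heights, func(i, j int) bool { return heights[i] < heights[j] })
	return heights
}

func main() {
	p := &planner{
		planned:       map[int64]struct{}{3: {}}, // height 3 was rescheduled after a peer removal
		inFlight:      2,
		nextHeight:    10,
		maxPeerHeight: 12,
	}
	fmt.Println(p.batch(6)) // prints [3 10 11 12]
}
```

With two requests in flight and a limit of six, four planned heights are wanted: the rescheduled height 3 plus the three new heights 10 to 12. Sorting the result means the lowest height is tried first, which is what lets MakeNextRequests stop at the first height for which no peer is found.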