462 lines
14 KiB

blockchain: Reorg reactor (#3561)

* go routines in blockchain reactor
* Added reference to the go routine diagram
* Initial commit
* cleanup
* Undo testing_logger change, committed by mistake
* Fix the test loggers
* pulled some fsm code into pool.go
* added pool tests
* changes to the design added block requests under peer moved the request trigger in the reactor poolRoutine, triggered now by a ticker in general moved everything required for making block requests smarter in the poolRoutine added a simple map of heights to keep track of what will need to be requested next added a few more tests
* send errors to FSM in a different channel than blocks send errors (RemovePeer) from switch on a different channel than the one receiving blocks renamed channels added more pool tests
* more pool tests
* lint errors
* more tests
* more tests
* switch fast sync to new implementation
* fixed data race in tests
* cleanup
* finished fsm tests
* address golangci comments :)
* address golangci comments :)
* Added timeout on next block needed to advance
* updating docs and cleanup
* fix issue in test from previous cleanup
* cleanup
* Added termination scenarios, tests and more cleanup
* small fixes to adr, comments and cleanup
* Fix bug in sendRequest()
  If we tried to send a request to a peer not present in the switch, a missing continue statement caused the request to be blackholed in a peer that was removed and never retried. While this bug was manifesting, the reactor kept asking for other blocks that would be stored and never consumed. Added the number of unconsumed blocks in the math for requesting blocks ahead of current processing height so eventually there will be no more blocks requested until the already received ones are consumed.
* remove bpPeer's didTimeout field
* Use distinct err codes for peer timeout and FSM timeouts
* Don't allow peers to update with lower height
* review comments from Ethan and Zarko
* some cleanup, renaming, comments
* Move block execution in separate goroutine
* Remove pool's numPending
* review comments
* fix lint, remove old blockchain reactor and duplicates in fsm tests
* small reorg around peer after review comments
* add the reactor spec
* verify block only once
* review comments
* change to int for max number of pending requests
* cleanup and godoc
* Add configuration flag fast sync version
* golangci fixes
* fix config template
* move both reactor versions under blockchain
* cleanup, golint, renaming stuff
* updated documentation, fixed more golint warnings
* integrate with behavior package
* sync with master
* gofmt
* add changelog_pending entry
* move to improvments
* suggestion to changelog entry
5 years ago
add support for block pruning via ABCI Commit response (#4588)

* Added BlockStore.DeleteBlock()
* Added initial block pruner prototype
* wip
* Added BlockStore.PruneBlocks()
* Added consensus setting for block pruning
* Added BlockStore base
* Error on replay if base does not have blocks
* Handle missing blocks when sending VoteSetMaj23Message
* Error message tweak
* Properly update blockstore state
* Error message fix again
* blockchain: ignore peer missing blocks
* Added FIXME
* Added test for block replay with truncated history
* Handle peer base in blockchain reactor
* Improved replay error handling
* Added tests for Store.PruneBlocks()
* Fix non-RPC handling of truncated block history
* Panic on missing block meta in needProofBlock()
* Updated changelog
* Handle truncated block history in RPC layer
* Added info about earliest block in /status RPC
* Reorder height and base in blockchain reactor messages
* Updated changelog
* Fix tests
* Appease linter
* Minor review fixes
* Non-empty BlockStores should always have base > 0
* Update code to assume base > 0 invariant
* Added blockstore tests for pruning to 0
* Make sure we don't prune below the current base
* Added BlockStore.Size()
* config: added retain_blocks recommendations
* Update v1 blockchain reactor to handle blockstore base
* Added state database pruning
* Propagate errors on missing validator sets
* Comment tweaks
* Improved error message
  Co-Authored-By: Anton Kaliaev <anton.kalyaev@gmail.com>
* use ABCI field ResponseCommit.retain_height instead of retain-blocks config option
* remove State.RetainHeight, return value instead
* fix minor issues
* rename pruneHeights() to pruneBlocks()
* noop to fix GitHub borkage

Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
5 years ago
blockchain: Reorg reactor (#3561) * go routines in blockchain reactor * Added reference to the go routine diagram * Initial commit * cleanup * Undo testing_logger change, committed by mistake * Fix the test loggers * pulled some fsm code into pool.go * added pool tests * changes to the design added block requests under peer moved the request trigger in the reactor poolRoutine, triggered now by a ticker in general moved everything required for making block requests smarter in the poolRoutine added a simple map of heights to keep track of what will need to be requested next added a few more tests * send errors to FSM in a different channel than blocks send errors (RemovePeer) from switch on a different channel than the one receiving blocks renamed channels added more pool tests * more pool tests * lint errors * more tests * more tests * switch fast sync to new implementation * fixed data race in tests * cleanup * finished fsm tests * address golangci comments :) * address golangci comments :) * Added timeout on next block needed to advance * updating docs and cleanup * fix issue in test from previous cleanup * cleanup * Added termination scenarios, tests and more cleanup * small fixes to adr, comments and cleanup * Fix bug in sendRequest() If we tried to send a request to a peer not present in the switch, a missing continue statement caused the request to be blackholed in a peer that was removed and never retried. While this bug was manifesting, the reactor kept asking for other blocks that would be stored and never consumed. Added the number of unconsumed blocks in the math for requesting blocks ahead of current processing height so eventually there will be no more blocks requested until the already received ones are consumed. 
* remove bpPeer's didTimeout field * Use distinct err codes for peer timeout and FSM timeouts * Don't allow peers to update with lower height * review comments from Ethan and Zarko * some cleanup, renaming, comments * Move block execution in separate goroutine * Remove pool's numPending * review comments * fix lint, remove old blockchain reactor and duplicates in fsm tests * small reorg around peer after review comments * add the reactor spec * verify block only once * review comments * change to int for max number of pending requests * cleanup and godoc * Add configuration flag fast sync version * golangci fixes * fix config template * move both reactor versions under blockchain * cleanup, golint, renaming stuff * updated documentation, fixed more golint warnings * integrate with behavior package * sync with master * gofmt * add changelog_pending entry * move to improvments * suggestion to changelog entry
5 years ago
blockchain: Reorg reactor (#3561) * go routines in blockchain reactor * Added reference to the go routine diagram * Initial commit * cleanup * Undo testing_logger change, committed by mistake * Fix the test loggers * pulled some fsm code into pool.go * added pool tests * changes to the design added block requests under peer moved the request trigger in the reactor poolRoutine, triggered now by a ticker in general moved everything required for making block requests smarter in the poolRoutine added a simple map of heights to keep track of what will need to be requested next added a few more tests * send errors to FSM in a different channel than blocks send errors (RemovePeer) from switch on a different channel than the one receiving blocks renamed channels added more pool tests * more pool tests * lint errors * more tests * more tests * switch fast sync to new implementation * fixed data race in tests * cleanup * finished fsm tests * address golangci comments :) * address golangci comments :) * Added timeout on next block needed to advance * updating docs and cleanup * fix issue in test from previous cleanup * cleanup * Added termination scenarios, tests and more cleanup * small fixes to adr, comments and cleanup * Fix bug in sendRequest() If we tried to send a request to a peer not present in the switch, a missing continue statement caused the request to be blackholed in a peer that was removed and never retried. While this bug was manifesting, the reactor kept asking for other blocks that would be stored and never consumed. Added the number of unconsumed blocks in the math for requesting blocks ahead of current processing height so eventually there will be no more blocks requested until the already received ones are consumed. 
* remove bpPeer's didTimeout field * Use distinct err codes for peer timeout and FSM timeouts * Don't allow peers to update with lower height * review comments from Ethan and Zarko * some cleanup, renaming, comments * Move block execution in separate goroutine * Remove pool's numPending * review comments * fix lint, remove old blockchain reactor and duplicates in fsm tests * small reorg around peer after review comments * add the reactor spec * verify block only once * review comments * change to int for max number of pending requests * cleanup and godoc * Add configuration flag fast sync version * golangci fixes * fix config template * move both reactor versions under blockchain * cleanup, golint, renaming stuff * updated documentation, fixed more golint warnings * integrate with behavior package * sync with master * gofmt * add changelog_pending entry * move to improvments * suggestion to changelog entry
5 years ago
package v1

import (
	"errors"
	"fmt"
	"sync"
	"time"

	"github.com/tendermint/tendermint/libs/log"
	"github.com/tendermint/tendermint/p2p"
	"github.com/tendermint/tendermint/types"
)

// bcReactorFSMState is a state of the blockchain reactor state machine.
type bcReactorFSMState struct {
	name string

	// called for each event received while in this state; returns the next state
	handle func(*BcReactorFSM, bReactorEvent, bReactorEventData) (next *bcReactorFSMState, err error)
	// called when entering the state
	enter func(fsm *BcReactorFSM)

	// timeout to ensure FSM is not stuck in a state forever
	// the timer is owned and run by the fsm instance
	timeout time.Duration
}

func (s *bcReactorFSMState) String() string {
	return s.name
}

// BcReactorFSM is the data structure for the Blockchain Reactor State Machine
type BcReactorFSM struct {
	logger log.Logger
	mtx    sync.Mutex

	startTime time.Time

	state      *bcReactorFSMState
	stateTimer *time.Timer
	pool       *BlockPool

	// interface used to call the Blockchain reactor to send StatusRequest, BlockRequest, reporting errors, etc.
	toBcR bcReactor
}

// NewFSM creates a new reactor FSM.
func NewFSM(height int64, toBcR bcReactor) *BcReactorFSM {
	return &BcReactorFSM{
		state:     unknown,
		startTime: time.Now(),
		pool:      NewBlockPool(height, toBcR),
		toBcR:     toBcR,
	}
}

// bReactorEventData is part of the message sent by the reactor to the FSM and used by the state handlers.
type bReactorEventData struct {
	peerID         p2p.ID
	err            error        // for peer error: timeout, slow; for processed block event if error occurred
	base           int64        // for status response
	height         int64        // for status response; for processed block event
	block          *types.Block // for block response
	stateName      string       // for state timeout events
	length         int          // for block response event, length of received block, used to detect slow peers
	maxNumRequests int          // for request needed event, maximum number of pending requests
}

// Blockchain Reactor Events (the input to the state machine)
type bReactorEvent uint

const (
	// message type events
	startFSMEv = iota + 1 // 1
	statusResponseEv
	blockResponseEv
	noBlockResponseEv
	processedBlockEv
	makeRequestsEv
	stopFSMEv // 7

	// other events
	// iota keeps counting within the const block (it is 7 here),
	// so peerRemoveEv is 263 and stateTimeoutEv is 264.
	peerRemoveEv = iota + 256
	stateTimeoutEv
)

func (msg *bcReactorMessage) String() string {
	var dataStr string

	switch msg.event {
	case startFSMEv:
		dataStr = ""
	case statusResponseEv:
		dataStr = fmt.Sprintf("peer=%v base=%v height=%v", msg.data.peerID, msg.data.base, msg.data.height)
	case blockResponseEv:
		dataStr = fmt.Sprintf("peer=%v block.height=%v length=%v",
			msg.data.peerID, msg.data.block.Height, msg.data.length)
	case noBlockResponseEv:
		dataStr = fmt.Sprintf("peer=%v requested height=%v",
			msg.data.peerID, msg.data.height)
	case processedBlockEv:
		dataStr = fmt.Sprintf("error=%v", msg.data.err)
	case makeRequestsEv:
		dataStr = ""
	case stopFSMEv:
		dataStr = ""
	case peerRemoveEv:
		dataStr = fmt.Sprintf("peer: %v is being removed by the switch", msg.data.peerID)
	case stateTimeoutEv:
		dataStr = fmt.Sprintf("state=%v", msg.data.stateName)
	default:
		dataStr = "cannot interpret message data"
	}

	return fmt.Sprintf("%v: %v", msg.event, dataStr)
}

func (ev bReactorEvent) String() string {
	switch ev {
	case startFSMEv:
		return "startFSMEv"
	case statusResponseEv:
		return "statusResponseEv"
	case blockResponseEv:
		return "blockResponseEv"
	case noBlockResponseEv:
		return "noBlockResponseEv"
	case processedBlockEv:
		return "processedBlockEv"
	case makeRequestsEv:
		return "makeRequestsEv"
	case stopFSMEv:
		return "stopFSMEv"
	case peerRemoveEv:
		return "peerRemoveEv"
	case stateTimeoutEv:
		return "stateTimeoutEv"
	default:
		return "event unknown"
	}
}

// states
var (
	unknown      *bcReactorFSMState
	waitForPeer  *bcReactorFSMState
	waitForBlock *bcReactorFSMState
	finished     *bcReactorFSMState
)

// timeouts for state timers
const (
	waitForPeerTimeout                 = 3 * time.Second
	waitForBlockAtCurrentHeightTimeout = 10 * time.Second
)

// errors
var (
	// internal to the package
	errNoErrorFinished        = errors.New("fast sync is finished")
	errInvalidEvent           = errors.New("invalid event in current state")
	errMissingBlock           = errors.New("missing blocks")
	errNilPeerForBlockRequest = errors.New("peer for block request does not exist in the switch")
	errSendQueueFull          = errors.New("block request not made, send-queue is full")
	errPeerTooShort           = errors.New("peer height too low, old peer removed/ new peer not added")
	errSwitchRemovesPeer      = errors.New("switch is removing peer")
	errTimeoutEventWrongState = errors.New("timeout event for a state different than the current one")
	errNoTallerPeer           = errors.New("fast sync timed out on waiting for a peer taller than this node")

	// reported eventually to the switch
	// handle return
	errPeerLowersItsHeight = errors.New("fast sync peer reports a height lower than previous")
	// handle return
	errNoPeerResponseForCurrentHeights = errors.New("fast sync timed out on peer block response for current heights")
	errNoPeerResponse                  = errors.New("fast sync timed out on peer block response")               // xx
	errBadDataFromPeer                 = errors.New("fast sync received block from wrong peer or block is bad") // xx
	errDuplicateBlock                  = errors.New("fast sync received duplicate block from peer")
	errBlockVerificationFailure        = errors.New("fast sync block verification failure")              // xx
	errSlowPeer                        = errors.New("fast sync peer is not sending us data fast enough") // xx
)

func init() {
	unknown = &bcReactorFSMState{
		name: "unknown",
		handle: func(fsm *BcReactorFSM, ev bReactorEvent, data bReactorEventData) (*bcReactorFSMState, error) {
			switch ev {
			case startFSMEv:
				// Broadcast Status message. Currently doesn't return non-nil error.
				fsm.toBcR.sendStatusRequest()
				return waitForPeer, nil

			case stopFSMEv:
				return finished, errNoErrorFinished

			default:
				return unknown, errInvalidEvent
			}
		},
	}

	waitForPeer = &bcReactorFSMState{
		name:    "waitForPeer",
		timeout: waitForPeerTimeout,
		enter: func(fsm *BcReactorFSM) {
			// Start the state timer on entry. It is stopped when leaving the state.
			fsm.resetStateTimer()
		},
		handle: func(fsm *BcReactorFSM, ev bReactorEvent, data bReactorEventData) (*bcReactorFSMState, error) {
			switch ev {
			case stateTimeoutEv:
				if data.stateName != "waitForPeer" {
					fsm.logger.Error("received a state timeout event for different state",
						"state", data.stateName)
					return waitForPeer, errTimeoutEventWrongState
				}
				// There was no statusResponse received from any peer.
				// Should we send status request again?
				return finished, errNoTallerPeer

			case statusResponseEv:
				if err := fsm.pool.UpdatePeer(data.peerID, data.base, data.height); err != nil {
					// Stay in this state only if the pool still has no peers.
					if fsm.pool.NumPeers() == 0 {
						return waitForPeer, err
					}
				}
				if fsm.stateTimer != nil {
					fsm.stateTimer.Stop()
				}
				return waitForBlock, nil

			case stopFSMEv:
				if fsm.stateTimer != nil {
					fsm.stateTimer.Stop()
				}
				return finished, errNoErrorFinished

			default:
				return waitForPeer, errInvalidEvent
			}
		},
	}

	waitForBlock = &bcReactorFSMState{
		name:    "waitForBlock",
		timeout: waitForBlockAtCurrentHeightTimeout,
		enter: func(fsm *BcReactorFSM) {
			// Start the state timer on entry. It is stopped when the FSM is stopped.
			fsm.resetStateTimer()
		},
		handle: func(fsm *BcReactorFSM, ev bReactorEvent, data bReactorEventData) (*bcReactorFSMState, error) {
			switch ev {

			case statusResponseEv:
				err := fsm.pool.UpdatePeer(data.peerID, data.base, data.height)
				if fsm.pool.NumPeers() == 0 {
					return waitForPeer, err
				}
				if fsm.pool.ReachedMaxHeight() {
					return finished, err
				}
				return waitForBlock, err

			case blockResponseEv:
				fsm.logger.Debug("blockResponseEv", "H", data.block.Height)
				err := fsm.pool.AddBlock(data.peerID, data.block, data.length)
				if err != nil {
					// The block was unsolicited, came from an unexpected peer, or is one we already have.
					// Ignore block, remove peer and send error to switch.
					fsm.pool.RemovePeer(data.peerID, err)
					fsm.toBcR.sendPeerError(err, data.peerID)
				}
				if fsm.pool.NumPeers() == 0 {
					return waitForPeer, err
				}
				return waitForBlock, err

			case noBlockResponseEv:
				fsm.logger.Error("peer does not have requested block", "peer", data.peerID)
				return waitForBlock, nil

			case processedBlockEv:
				if data.err != nil {
					first, second, _ := fsm.pool.FirstTwoBlocksAndPeers()
					fsm.logger.Error("error processing block", "err", data.err,
						"first", first.block.Height, "second", second.block.Height)
					fsm.logger.Error("send peer error for", "peer", first.peer.ID)
					fsm.toBcR.sendPeerError(data.err, first.peer.ID)
					fsm.logger.Error("send peer error for", "peer", second.peer.ID)
					fsm.toBcR.sendPeerError(data.err, second.peer.ID)
					// Remove the first two blocks. This will also remove the peers.
					fsm.pool.InvalidateFirstTwoBlocks(data.err)
				} else {
					fsm.pool.ProcessedCurrentHeightBlock()
					// Since we advanced one block, reset the state timer.
					fsm.resetStateTimer()
				}

				// Both cases above may result in achieving maximum height.
				if fsm.pool.ReachedMaxHeight() {
					return finished, nil
				}

				return waitForBlock, data.err

			case peerRemoveEv:
				// This event is sent by the switch to remove disconnected and errored peers.
				fsm.pool.RemovePeer(data.peerID, data.err)
				if fsm.pool.NumPeers() == 0 {
					return waitForPeer, nil
				}
				if fsm.pool.ReachedMaxHeight() {
					return finished, nil
				}
				return waitForBlock, nil

			case makeRequestsEv:
				fsm.makeNextRequests(data.maxNumRequests)
				return waitForBlock, nil

			case stateTimeoutEv:
				if data.stateName != "waitForBlock" {
					fsm.logger.Error("received a state timeout event for different state",
						"state", data.stateName)
					return waitForBlock, errTimeoutEventWrongState
				}
				// We haven't received the block at current height or height+1. Remove peer.
				fsm.pool.RemovePeerAtCurrentHeights(errNoPeerResponseForCurrentHeights)
				fsm.resetStateTimer()
				if fsm.pool.NumPeers() == 0 {
					return waitForPeer, errNoPeerResponseForCurrentHeights
				}
				if fsm.pool.ReachedMaxHeight() {
					return finished, nil
				}
				return waitForBlock, errNoPeerResponseForCurrentHeights

			case stopFSMEv:
				if fsm.stateTimer != nil {
					fsm.stateTimer.Stop()
				}
				return finished, errNoErrorFinished

			default:
				return waitForBlock, errInvalidEvent
			}
		},
	}

	finished = &bcReactorFSMState{
		name: "finished",
		enter: func(fsm *BcReactorFSM) {
			fsm.logger.Info("Time to switch to consensus reactor!", "height", fsm.pool.Height)
			fsm.toBcR.switchToConsensus()
			fsm.cleanup()
		},
		handle: func(fsm *BcReactorFSM, ev bReactorEvent, data bReactorEventData) (*bcReactorFSMState, error) {
			return finished, nil
		},
	}
}

// bcReactor is the interface used by the FSM for sending Block and Status requests,
// and for informing of peer errors and state timeouts.
// It is implemented by BlockchainReactor and by tests.
type bcReactor interface {
	sendStatusRequest()
	sendBlockRequest(peerID p2p.ID, height int64) error
	sendPeerError(err error, peerID p2p.ID)
	resetStateTimer(name string, timer **time.Timer, timeout time.Duration)
	switchToConsensus()
}

// SetLogger sets the FSM logger.
func (fsm *BcReactorFSM) SetLogger(l log.Logger) {
	fsm.logger = l
	fsm.pool.SetLogger(l)
}

// Start starts the FSM.
func (fsm *BcReactorFSM) Start() {
	_ = fsm.Handle(&bcReactorMessage{event: startFSMEv})
}

// Handle processes messages and events sent to the FSM.
func (fsm *BcReactorFSM) Handle(msg *bcReactorMessage) error {
	fsm.mtx.Lock()
	defer fsm.mtx.Unlock()
	fsm.logger.Debug("FSM received", "event", msg, "state", fsm.state)

	if fsm.state == nil {
		fsm.state = unknown
	}
	next, err := fsm.state.handle(fsm, msg.event, msg.data)
	if err != nil {
		fsm.logger.Error("FSM event handler returned", "err", err,
			"state", fsm.state, "event", msg.event)
	}

	oldState := fsm.state.name
	fsm.transition(next)
	if oldState != fsm.state.name {
		fsm.logger.Info("FSM changed state", "new_state", fsm.state)
	}
	return err
}

// transition moves the FSM to the next state and runs that state's enter hook.
// It does nothing if next is nil or is the current state.
func (fsm *BcReactorFSM) transition(next *bcReactorFSMState) {
	if next == nil {
		return
	}
	if fsm.state != next {
		fsm.state = next
		if next.enter != nil {
			next.enter(fsm)
		}
	}
}

// Called when entering an FSM state in order to detect lack of progress in the state machine.
// Note the use of the bcReactor interface (toBcR) to facilitate testing without the timer expiring.
func (fsm *BcReactorFSM) resetStateTimer() {
	fsm.toBcR.resetStateTimer(fsm.state.name, &fsm.stateTimer, fsm.state.timeout)
}

func (fsm *BcReactorFSM) isCaughtUp() bool {
	return fsm.state == finished
}

func (fsm *BcReactorFSM) makeNextRequests(maxNumRequests int) {
	fsm.pool.MakeNextRequests(maxNumRequests)
}

func (fsm *BcReactorFSM) cleanup() {
	fsm.pool.Cleanup()
}

// NeedsBlocks checks if more block requests are required.
func (fsm *BcReactorFSM) NeedsBlocks() bool {
	fsm.mtx.Lock()
	defer fsm.mtx.Unlock()
	return fsm.state.name == "waitForBlock" && fsm.pool.NeedsBlocks()
}

// FirstTwoBlocks returns the two blocks at pool height and height+1
func (fsm *BcReactorFSM) FirstTwoBlocks() (first, second *types.Block, err error) {
	fsm.mtx.Lock()
	defer fsm.mtx.Unlock()
	firstBP, secondBP, err := fsm.pool.FirstTwoBlocksAndPeers()
	if err == nil {
		first = firstBP.block
		second = secondBP.block
	}
	return
}

// Status returns the pool's height and the maximum peer height.
func (fsm *BcReactorFSM) Status() (height, maxPeerHeight int64) {
	fsm.mtx.Lock()
	defer fsm.mtx.Unlock()
	return fsm.pool.Height, fsm.pool.MaxPeerHeight
}
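
The file above builds its state machine from state values that each carry a `handle` function returning the next state, plus an optional `enter` hook run on a state change. The following standalone sketch shows that pattern in isolation. It is a toy, not the reactor's code: the names `machine`, `state`, `event`, `step`, `idle`, `running`, and `stopped` are all hypothetical. The states are assigned in `init()` as in the file above; one plausible reason for that choice is that package-level initializers whose handlers refer to each other would be rejected by the Go compiler as an initialization cycle.

```go
package main

import (
	"errors"
	"fmt"
)

// event is a hypothetical input to the toy state machine.
type event int

const (
	startEv event = iota + 1
	doneEv
)

// state mirrors the shape used above: a name, a handler that
// returns the next state, and an optional entry hook.
type state struct {
	name   string
	handle func(m *machine, ev event) (*state, error)
	enter  func(m *machine)
}

// machine holds the current state and records which states were entered.
type machine struct {
	current *state
	entered []string
}

var errInvalid = errors.New("invalid event in current state")

var idle, running, stopped *state

// The states refer to each other inside their handlers, so they are
// assigned in init() rather than in package-level initializers.
func init() {
	record := func(name string) func(*machine) {
		return func(m *machine) { m.entered = append(m.entered, name) }
	}
	idle = &state{
		name: "idle",
		handle: func(m *machine, ev event) (*state, error) {
			if ev == startEv {
				return running, nil
			}
			return idle, errInvalid
		},
	}
	running = &state{
		name:  "running",
		enter: record("running"),
		handle: func(m *machine, ev event) (*state, error) {
			if ev == doneEv {
				return stopped, nil
			}
			return running, errInvalid
		},
	}
	stopped = &state{
		name:  "stopped",
		enter: record("stopped"),
		handle: func(m *machine, ev event) (*state, error) {
			return stopped, nil // terminal state: absorb every event
		},
	}
}

// step dispatches one event to the current state's handler and runs the
// enter hook only when the state actually changes.
func (m *machine) step(ev event) error {
	next, err := m.current.handle(m, ev)
	if next != nil && next != m.current {
		m.current = next
		if next.enter != nil {
			next.enter(m)
		}
	}
	return err
}

func main() {
	m := &machine{current: idle}
	fmt.Println(m.step(doneEv), m.current.name)  // invalid event in current state idle
	fmt.Println(m.step(startEv), m.current.name) // <nil> running
	fmt.Println(m.step(doneEv), m.current.name)  // <nil> stopped
	fmt.Println(m.entered)                       // [running stopped]
}
```

As in `Handle` above, the handler's error is returned to the caller even when a transition takes place, so a caller can observe both the new state and the reason for it.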