package v1

import (
	"errors"
	"fmt"
	"sync"
	"time"

	"github.com/tendermint/tendermint/libs/log"
	"github.com/tendermint/tendermint/p2p"
	"github.com/tendermint/tendermint/types"
)

// Blockchain Reactor State
type bcReactorFSMState struct {
	name string
	// called when transitioning out of current state
	handle func(*BcReactorFSM, bReactorEvent, bReactorEventData) (next *bcReactorFSMState, err error)
	// called when entering the state
	enter func(fsm *BcReactorFSM)
	// timeout to ensure FSM is not stuck in a state forever
	// the timer is owned and run by the fsm instance
	timeout time.Duration
}

func (s *bcReactorFSMState) String() string {
	return s.name
}

// BcReactorFSM is the datastructure for the Blockchain Reactor State Machine
type BcReactorFSM struct {
	logger     log.Logger
	mtx        sync.Mutex
	startTime  time.Time
	state      *bcReactorFSMState
	stateTimer *time.Timer
	pool       *BlockPool
	// interface used to call the Blockchain reactor to send StatusRequest, BlockRequest, reporting errors, etc.
	toBcR bcReactor
}

// NewFSM creates a new reactor FSM.
func NewFSM(height int64, toBcR bcReactor) *BcReactorFSM {
	return &BcReactorFSM{
		state:     unknown,
		startTime: time.Now(),
		pool:      NewBlockPool(height, toBcR),
		toBcR:     toBcR,
	}
}

// bReactorEventData is part of the message sent by the reactor to the FSM and used by the state handlers.
type bReactorEventData struct {
	peerID         p2p.ID
	err            error        // for peer error: timeout, slow; for processed block event if error occurred
	base           int64        // for status response
	height         int64        // for status response; for processed block event
	block          *types.Block // for block response
	stateName      string       // for state timeout events
	length         int          // for block response event, length of received block, used to detect slow peers
	maxNumRequests int          // for request needed event, maximum number of pending requests
}

// Blockchain Reactor Events (the input to the state machine)
type bReactorEvent uint

const (
	// message type events
	startFSMEv = iota + 1
	statusResponseEv
	blockResponseEv
	processedBlockEv
	makeRequestsEv
	stopFSMEv

	// other events
	peerRemoveEv = iota + 256
	stateTimeoutEv
)

func (msg *bcReactorMessage) String() string {
	var dataStr string

	switch msg.event {
	case startFSMEv:
		dataStr = ""
	case statusResponseEv:
		dataStr = fmt.Sprintf("peer=%v base=%v height=%v", msg.data.peerID, msg.data.base, msg.data.height)
	case blockResponseEv:
		dataStr = fmt.Sprintf("peer=%v block.height=%v length=%v",
			msg.data.peerID, msg.data.block.Height, msg.data.length)
	case processedBlockEv:
		dataStr = fmt.Sprintf("error=%v", msg.data.err)
	case makeRequestsEv:
		dataStr = ""
	case stopFSMEv:
		dataStr = ""
	case peerRemoveEv:
		dataStr = fmt.Sprintf("peer: %v is being removed by the switch", msg.data.peerID)
	case stateTimeoutEv:
		dataStr = fmt.Sprintf("state=%v", msg.data.stateName)
	default:
		dataStr = "cannot interpret message data"
	}

	return fmt.Sprintf("%v: %v", msg.event, dataStr)
}

func (ev bReactorEvent) String() string {
	switch ev {
	case startFSMEv:
		return "startFSMEv"
	case statusResponseEv:
		return "statusResponseEv"
	case blockResponseEv:
		return "blockResponseEv"
	case processedBlockEv:
		return "processedBlockEv"
	case makeRequestsEv:
		return "makeRequestsEv"
	case stopFSMEv:
		return "stopFSMEv"
	case peerRemoveEv:
		return "peerRemoveEv"
	case stateTimeoutEv:
		return "stateTimeoutEv"
	default:
		return "event unknown"
	}
}

// states
var (
	unknown      *bcReactorFSMState
	waitForPeer  *bcReactorFSMState
	waitForBlock *bcReactorFSMState
	finished     *bcReactorFSMState
)

// timeouts for state timers
const (
	waitForPeerTimeout                 = 3 * time.Second
	waitForBlockAtCurrentHeightTimeout = 10 * time.Second
)

// errors
var (
	// internal to the package
	errNoErrorFinished        = errors.New("fast sync is finished")
	errInvalidEvent           = errors.New("invalid event in current state")
	errMissingBlock           = errors.New("missing blocks")
	errNilPeerForBlockRequest = errors.New("peer for block request does not exist in the switch")
	errSendQueueFull          = errors.New("block request not made, send-queue is full")
	errPeerTooShort           = errors.New("peer height too low, old peer removed/ new peer not added")
	errSwitchRemovesPeer      = errors.New("switch is removing peer")
	errTimeoutEventWrongState = errors.New("timeout event for a state different than the current one")
	errNoTallerPeer           = errors.New("fast sync timed out on waiting for a peer taller than this node")

	// reported eventually to the switch
	// handle return
	errPeerLowersItsHeight = errors.New("fast sync peer reports a height lower than previous")
	// handle return
	errNoPeerResponseForCurrentHeights = errors.New("fast sync timed out on peer block response for current heights")
	errNoPeerResponse                  = errors.New("fast sync timed out on peer block response")               // xx
	errBadDataFromPeer                 = errors.New("fast sync received block from wrong peer or block is bad") // xx
	errDuplicateBlock                  = errors.New("fast sync received duplicate block from peer")
	errBlockVerificationFailure        = errors.New("fast sync block verification failure")              // xx
	errSlowPeer                        = errors.New("fast sync peer is not sending us data fast enough") // xx
)

func init() {
	unknown = &bcReactorFSMState{
		name: "unknown",
		handle: func(fsm *BcReactorFSM, ev bReactorEvent, data bReactorEventData) (*bcReactorFSMState, error) {
			switch ev {
			case startFSMEv:
				// Broadcast Status message. Currently doesn't return non-nil error.
				fsm.toBcR.sendStatusRequest()
				return waitForPeer, nil

			case stopFSMEv:
				return finished, errNoErrorFinished

			default:
				return unknown, errInvalidEvent
			}
		},
	}
	waitForPeer = &bcReactorFSMState{
		name:    "waitForPeer",
		timeout: waitForPeerTimeout,
		enter: func(fsm *BcReactorFSM) {
			// Start the state timer. It must be stopped when leaving the state.
			fsm.resetStateTimer()
		},
		handle: func(fsm *BcReactorFSM, ev bReactorEvent, data bReactorEventData) (*bcReactorFSMState, error) {
			switch ev {
			case stateTimeoutEv:
				if data.stateName != "waitForPeer" {
					fsm.logger.Error("received a state timeout event for different state",
						"state", data.stateName)
					return waitForPeer, errTimeoutEventWrongState
				}
				// There was no statusResponse received from any peer.
				// Should we send status request again?
				return finished, errNoTallerPeer

			case statusResponseEv:
				if err := fsm.pool.UpdatePeer(data.peerID, data.base, data.height); err != nil {
					if fsm.pool.NumPeers() == 0 {
						return waitForPeer, err
					}
				}
				if fsm.stateTimer != nil {
					fsm.stateTimer.Stop()
				}
				return waitForBlock, nil

			case stopFSMEv:
				if fsm.stateTimer != nil {
					fsm.stateTimer.Stop()
				}
				return finished, errNoErrorFinished

			default:
				return waitForPeer, errInvalidEvent
			}
		},
	}
	waitForBlock = &bcReactorFSMState{
		name:    "waitForBlock",
		timeout: waitForBlockAtCurrentHeightTimeout,
		enter: func(fsm *BcReactorFSM) {
			// Start the state timer. It must be stopped when leaving the state.
			fsm.resetStateTimer()
		},
		handle: func(fsm *BcReactorFSM, ev bReactorEvent, data bReactorEventData) (*bcReactorFSMState, error) {
			switch ev {
			case statusResponseEv:
				err := fsm.pool.UpdatePeer(data.peerID, data.base, data.height)
				if fsm.pool.NumPeers() == 0 {
					return waitForPeer, err
				}
				if fsm.pool.ReachedMaxHeight() {
					return finished, err
				}
				return waitForBlock, err

			case blockResponseEv:
				fsm.logger.Debug("blockResponseEv", "H", data.block.Height)
				err := fsm.pool.AddBlock(data.peerID, data.block, data.length)
				if err != nil {
					// The block was unsolicited, came from an unexpected peer, or is one we already have.
					// Ignore the block, remove the peer and send the error to the switch.
					fsm.pool.RemovePeer(data.peerID, err)
					fsm.toBcR.sendPeerError(err, data.peerID)
				}
				if fsm.pool.NumPeers() == 0 {
					return waitForPeer, err
				}
				return waitForBlock, err

			case processedBlockEv:
				if data.err != nil {
					first, second, _ := fsm.pool.FirstTwoBlocksAndPeers()
					fsm.logger.Error("error processing block", "err", data.err,
						"first", first.block.Height, "second", second.block.Height)
					fsm.logger.Error("send peer error for", "peer", first.peer.ID)
					fsm.toBcR.sendPeerError(data.err, first.peer.ID)
					fsm.logger.Error("send peer error for", "peer", second.peer.ID)
					fsm.toBcR.sendPeerError(data.err, second.peer.ID)
					// Remove the first two blocks. This will also remove the peers.
					fsm.pool.InvalidateFirstTwoBlocks(data.err)
				} else {
					fsm.pool.ProcessedCurrentHeightBlock()
					// Since we advanced one block, reset the state timer.
					fsm.resetStateTimer()
				}
				// Both cases above may result in achieving maximum height.
				if fsm.pool.ReachedMaxHeight() {
					return finished, nil
				}
				return waitForBlock, data.err

			case peerRemoveEv:
				// This event is sent by the switch to remove disconnected and errored peers.
				fsm.pool.RemovePeer(data.peerID, data.err)
				if fsm.pool.NumPeers() == 0 {
					return waitForPeer, nil
				}
				if fsm.pool.ReachedMaxHeight() {
					return finished, nil
				}
				return waitForBlock, nil

			case makeRequestsEv:
				fsm.makeNextRequests(data.maxNumRequests)
				return waitForBlock, nil

			case stateTimeoutEv:
				if data.stateName != "waitForBlock" {
					fsm.logger.Error("received a state timeout event for different state",
						"state", data.stateName)
					return waitForBlock, errTimeoutEventWrongState
				}
				// We haven't received the block at current height or height+1. Remove peer.
				fsm.pool.RemovePeerAtCurrentHeights(errNoPeerResponseForCurrentHeights)
				fsm.resetStateTimer()
				if fsm.pool.NumPeers() == 0 {
					return waitForPeer, errNoPeerResponseForCurrentHeights
				}
				if fsm.pool.ReachedMaxHeight() {
					return finished, nil
				}
				return waitForBlock, errNoPeerResponseForCurrentHeights

			case stopFSMEv:
				if fsm.stateTimer != nil {
					fsm.stateTimer.Stop()
				}
				return finished, errNoErrorFinished

			default:
				return waitForBlock, errInvalidEvent
			}
		},
	}
	finished = &bcReactorFSMState{
		name: "finished",
		enter: func(fsm *BcReactorFSM) {
			fsm.logger.Info("Time to switch to consensus reactor!", "height", fsm.pool.Height)
			fsm.toBcR.switchToConsensus()
			fsm.cleanup()
		},
		handle: func(fsm *BcReactorFSM, ev bReactorEvent, data bReactorEventData) (*bcReactorFSMState, error) {
			return finished, nil
		},
	}
}
// Interface used by the FSM for sending Block and Status requests,
// and for informing of peer errors and state timeouts.
// Implemented by BlockchainReactor and by tests.
type bcReactor interface {
	sendStatusRequest()
	sendBlockRequest(peerID p2p.ID, height int64) error
	sendPeerError(err error, peerID p2p.ID)
	resetStateTimer(name string, timer **time.Timer, timeout time.Duration)
	switchToConsensus()
}
// SetLogger sets the FSM logger.
func (fsm *BcReactorFSM) SetLogger(l log.Logger) {
	fsm.logger = l
	fsm.pool.SetLogger(l)
}

// Start starts the FSM.
func (fsm *BcReactorFSM) Start() {
	_ = fsm.Handle(&bcReactorMessage{event: startFSMEv})
}
// Handle processes messages and events sent to the FSM.
func (fsm *BcReactorFSM) Handle(msg *bcReactorMessage) error {
	fsm.mtx.Lock()
	defer fsm.mtx.Unlock()
	fsm.logger.Debug("FSM received", "event", msg, "state", fsm.state)

	if fsm.state == nil {
		fsm.state = unknown
	}
	next, err := fsm.state.handle(fsm, msg.event, msg.data)
	if err != nil {
		fsm.logger.Error("FSM event handler returned", "err", err,
			"state", fsm.state, "event", msg.event)
	}

	oldState := fsm.state.name
	fsm.transition(next)
	if oldState != fsm.state.name {
		fsm.logger.Info("FSM changed state", "new_state", fsm.state)
	}
	return err
}

func (fsm *BcReactorFSM) transition(next *bcReactorFSMState) {
	if next == nil {
		return
	}
	if fsm.state != next {
		fsm.state = next
		if next.enter != nil {
			next.enter(fsm)
		}
	}
}
// Called when entering an FSM state in order to detect lack of progress in the state machine.
// Note that the timer is reset through the bcReactor interface (fsm.toBcR), so that tests
// can exercise the FSM without waiting for a timer to expire.
func (fsm *BcReactorFSM) resetStateTimer() {
	fsm.toBcR.resetStateTimer(fsm.state.name, &fsm.stateTimer, fsm.state.timeout)
}

func (fsm *BcReactorFSM) isCaughtUp() bool {
	return fsm.state == finished
}

func (fsm *BcReactorFSM) makeNextRequests(maxNumRequests int) {
	fsm.pool.MakeNextRequests(maxNumRequests)
}

func (fsm *BcReactorFSM) cleanup() {
	fsm.pool.Cleanup()
}
// NeedsBlocks checks if more block requests are required.
func (fsm *BcReactorFSM) NeedsBlocks() bool {
	fsm.mtx.Lock()
	defer fsm.mtx.Unlock()
	return fsm.state.name == "waitForBlock" && fsm.pool.NeedsBlocks()
}

// FirstTwoBlocks returns the two blocks at pool height and height+1.
func (fsm *BcReactorFSM) FirstTwoBlocks() (first, second *types.Block, err error) {
	fsm.mtx.Lock()
	defer fsm.mtx.Unlock()
	firstBP, secondBP, err := fsm.pool.FirstTwoBlocksAndPeers()
	if err == nil {
		first = firstBP.block
		second = secondBP.block
	}
	return
}

// Status returns the pool's height and the maximum peer height.
func (fsm *BcReactorFSM) Status() (height, maxPeerHeight int64) {
	fsm.mtx.Lock()
	defer fsm.mtx.Unlock()
	return fsm.pool.Height, fsm.pool.MaxPeerHeight
}
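
The state-handler pattern used in this file (each state carries a handle function that returns the next state, and the transition step runs the next state's enter hook only when the state actually changes) can be tried in isolation. The following is a minimal sketch, not the reactor's code: the names event, state, machine, idle, waiting and done are hypothetical, and the block pool, timers, logging and locking are all left out.

```go
package main

import (
	"errors"
	"fmt"
)

// event is a hypothetical stand-in for bReactorEvent.
type event int

const (
	startEv event = iota
	peerEv
	stopEv
)

// state has the same shape as bcReactorFSMState, minus the timeout:
// a name, an optional enter hook, and a handler returning the next state.
type state struct {
	name   string
	enter  func(m *machine)
	handle func(m *machine, ev event) (*state, error)
}

// machine is a hypothetical stand-in for BcReactorFSM.
type machine struct {
	state   *state
	entered []string // names of states whose enter hook ran, for illustration
}

var errInvalidEvent = errors.New("invalid event in current state")

var idle, waiting, done *state

// The states are assigned in init rather than in the var declaration:
// the handlers refer to the state variables, which Go would otherwise
// reject as an initialization cycle.
func init() {
	record := func(m *machine) { m.entered = append(m.entered, m.state.name) }
	idle = &state{
		name: "idle",
		handle: func(m *machine, ev event) (*state, error) {
			if ev == startEv {
				return waiting, nil
			}
			return idle, errInvalidEvent
		},
	}
	waiting = &state{
		name:  "waiting",
		enter: record,
		handle: func(m *machine, ev event) (*state, error) {
			switch ev {
			case peerEv, stopEv:
				return done, nil
			default:
				return waiting, errInvalidEvent
			}
		},
	}
	done = &state{
		name:   "done",
		enter:  record,
		handle: func(m *machine, ev event) (*state, error) { return done, nil },
	}
}

// handleEvent dispatches to the current state and applies the transition.
func (m *machine) handleEvent(ev event) error {
	if m.state == nil {
		m.state = idle
	}
	next, err := m.state.handle(m, ev)
	m.transition(next)
	return err
}

// transition runs the enter hook only when the state changes.
func (m *machine) transition(next *state) {
	if next == nil || next == m.state {
		return
	}
	m.state = next
	if next.enter != nil {
		next.enter(m)
	}
}

func main() {
	m := &machine{}
	for _, ev := range []event{peerEv, startEv, peerEv} {
		err := m.handleEvent(ev)
		fmt.Println(m.state.name, err)
	}
}
```

As in Handle above, the handler's error is returned to the caller even when the state does not change, so an invalid event is reported without disturbing the machine.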