You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.

622 lines
18 KiB

blockchain: Reorg reactor (#3561) * go routines in blockchain reactor * Added reference to the go routine diagram * Initial commit * cleanup * Undo testing_logger change, committed by mistake * Fix the test loggers * pulled some fsm code into pool.go * added pool tests * changes to the design added block requests under peer moved the request trigger in the reactor poolRoutine, triggered now by a ticker in general moved everything required for making block requests smarter in the poolRoutine added a simple map of heights to keep track of what will need to be requested next added a few more tests * send errors to FSM in a different channel than blocks send errors (RemovePeer) from switch on a different channel than the one receiving blocks renamed channels added more pool tests * more pool tests * lint errors * more tests * more tests * switch fast sync to new implementation * fixed data race in tests * cleanup * finished fsm tests * address golangci comments :) * address golangci comments :) * Added timeout on next block needed to advance * updating docs and cleanup * fix issue in test from previous cleanup * cleanup * Added termination scenarios, tests and more cleanup * small fixes to adr, comments and cleanup * Fix bug in sendRequest() If we tried to send a request to a peer not present in the switch, a missing continue statement caused the request to be blackholed in a peer that was removed and never retried. While this bug was manifesting, the reactor kept asking for other blocks that would be stored and never consumed. Added the number of unconsumed blocks in the math for requesting blocks ahead of current processing height so eventually there will be no more blocks requested until the already received ones are consumed. * remove bpPeer's didTimeout field * Use distinct err codes for peer timeout and FSM timeouts * Don't allow peers to update with lower height * review comments from Ethan and Zarko * some cleanup, renaming, comments * Move block execution in separate goroutine * Remove pool's numPending * review comments * fix lint, remove old blockchain reactor and duplicates in fsm tests * small reorg around peer after review comments * add the reactor spec * verify block only once * review comments * change to int for max number of pending requests * cleanup and godoc * Add configuration flag fast sync version * golangci fixes * fix config template * move both reactor versions under blockchain * cleanup, golint, renaming stuff * updated documentation, fixed more golint warnings * integrate with behavior package * sync with master * gofmt * add changelog_pending entry * move to improvments * suggestion to changelog entry
6 years ago
blockchain: Reorg reactor (#3561) * go routines in blockchain reactor * Added reference to the go routine diagram * Initial commit * cleanup * Undo testing_logger change, committed by mistake * Fix the test loggers * pulled some fsm code into pool.go * added pool tests * changes to the design added block requests under peer moved the request trigger in the reactor poolRoutine, triggered now by a ticker in general moved everything required for making block requests smarter in the poolRoutine added a simple map of heights to keep track of what will need to be requested next added a few more tests * send errors to FSM in a different channel than blocks send errors (RemovePeer) from switch on a different channel than the one receiving blocks renamed channels added more pool tests * more pool tests * lint errors * more tests * more tests * switch fast sync to new implementation * fixed data race in tests * cleanup * finished fsm tests * address golangci comments :) * address golangci comments :) * Added timeout on next block needed to advance * updating docs and cleanup * fix issue in test from previous cleanup * cleanup * Added termination scenarios, tests and more cleanup * small fixes to adr, comments and cleanup * Fix bug in sendRequest() If we tried to send a request to a peer not present in the switch, a missing continue statement caused the request to be blackholed in a peer that was removed and never retried. While this bug was manifesting, the reactor kept asking for other blocks that would be stored and never consumed. Added the number of unconsumed blocks in the math for requesting blocks ahead of current processing height so eventually there will be no more blocks requested until the already received ones are consumed. * remove bpPeer's didTimeout field * Use distinct err codes for peer timeout and FSM timeouts * Don't allow peers to update with lower height * review comments from Ethan and Zarko * some cleanup, renaming, comments * Move block execution in separate goroutine * Remove pool's numPending * review comments * fix lint, remove old blockchain reactor and duplicates in fsm tests * small reorg around peer after review comments * add the reactor spec * verify block only once * review comments * change to int for max number of pending requests * cleanup and godoc * Add configuration flag fast sync version * golangci fixes * fix config template * move both reactor versions under blockchain * cleanup, golint, renaming stuff * updated documentation, fixed more golint warnings * integrate with behavior package * sync with master * gofmt * add changelog_pending entry * move to improvments * suggestion to changelog entry
6 years ago
blockchain: Reorg reactor (#3561) * go routines in blockchain reactor * Added reference to the go routine diagram * Initial commit * cleanup * Undo testing_logger change, committed by mistake * Fix the test loggers * pulled some fsm code into pool.go * added pool tests * changes to the design added block requests under peer moved the request trigger in the reactor poolRoutine, triggered now by a ticker in general moved everything required for making block requests smarter in the poolRoutine added a simple map of heights to keep track of what will need to be requested next added a few more tests * send errors to FSM in a different channel than blocks send errors (RemovePeer) from switch on a different channel than the one receiving blocks renamed channels added more pool tests * more pool tests * lint errors * more tests * more tests * switch fast sync to new implementation * fixed data race in tests * cleanup * finished fsm tests * address golangci comments :) * address golangci comments :) * Added timeout on next block needed to advance * updating docs and cleanup * fix issue in test from previous cleanup * cleanup * Added termination scenarios, tests and more cleanup * small fixes to adr, comments and cleanup * Fix bug in sendRequest() If we tried to send a request to a peer not present in the switch, a missing continue statement caused the request to be blackholed in a peer that was removed and never retried. While this bug was manifesting, the reactor kept asking for other blocks that would be stored and never consumed. Added the number of unconsumed blocks in the math for requesting blocks ahead of current processing height so eventually there will be no more blocks requested until the already received ones are consumed. * remove bpPeer's didTimeout field * Use distinct err codes for peer timeout and FSM timeouts * Don't allow peers to update with lower height * review comments from Ethan and Zarko * some cleanup, renaming, comments * Move block execution in separate goroutine * Remove pool's numPending * review comments * fix lint, remove old blockchain reactor and duplicates in fsm tests * small reorg around peer after review comments * add the reactor spec * verify block only once * review comments * change to int for max number of pending requests * cleanup and godoc * Add configuration flag fast sync version * golangci fixes * fix config template * move both reactor versions under blockchain * cleanup, golint, renaming stuff * updated documentation, fixed more golint warnings * integrate with behavior package * sync with master * gofmt * add changelog_pending entry * move to improvments * suggestion to changelog entry
6 years ago
blockchain: Reorg reactor (#3561) * go routines in blockchain reactor * Added reference to the go routine diagram * Initial commit * cleanup * Undo testing_logger change, committed by mistake * Fix the test loggers * pulled some fsm code into pool.go * added pool tests * changes to the design added block requests under peer moved the request trigger in the reactor poolRoutine, triggered now by a ticker in general moved everything required for making block requests smarter in the poolRoutine added a simple map of heights to keep track of what will need to be requested next added a few more tests * send errors to FSM in a different channel than blocks send errors (RemovePeer) from switch on a different channel than the one receiving blocks renamed channels added more pool tests * more pool tests * lint errors * more tests * more tests * switch fast sync to new implementation * fixed data race in tests * cleanup * finished fsm tests * address golangci comments :) * address golangci comments :) * Added timeout on next block needed to advance * updating docs and cleanup * fix issue in test from previous cleanup * cleanup * Added termination scenarios, tests and more cleanup * small fixes to adr, comments and cleanup * Fix bug in sendRequest() If we tried to send a request to a peer not present in the switch, a missing continue statement caused the request to be blackholed in a peer that was removed and never retried. While this bug was manifesting, the reactor kept asking for other blocks that would be stored and never consumed. Added the number of unconsumed blocks in the math for requesting blocks ahead of current processing height so eventually there will be no more blocks requested until the already received ones are consumed. * remove bpPeer's didTimeout field * Use distinct err codes for peer timeout and FSM timeouts * Don't allow peers to update with lower height * review comments from Ethan and Zarko * some cleanup, renaming, comments * Move block execution in separate goroutine * Remove pool's numPending * review comments * fix lint, remove old blockchain reactor and duplicates in fsm tests * small reorg around peer after review comments * add the reactor spec * verify block only once * review comments * change to int for max number of pending requests * cleanup and godoc * Add configuration flag fast sync version * golangci fixes * fix config template * move both reactor versions under blockchain * cleanup, golint, renaming stuff * updated documentation, fixed more golint warnings * integrate with behavior package * sync with master * gofmt * add changelog_pending entry * move to improvments * suggestion to changelog entry
6 years ago
blockchain: Reorg reactor (#3561) * go routines in blockchain reactor * Added reference to the go routine diagram * Initial commit * cleanup * Undo testing_logger change, committed by mistake * Fix the test loggers * pulled some fsm code into pool.go * added pool tests * changes to the design added block requests under peer moved the request trigger in the reactor poolRoutine, triggered now by a ticker in general moved everything required for making block requests smarter in the poolRoutine added a simple map of heights to keep track of what will need to be requested next added a few more tests * send errors to FSM in a different channel than blocks send errors (RemovePeer) from switch on a different channel than the one receiving blocks renamed channels added more pool tests * more pool tests * lint errors * more tests * more tests * switch fast sync to new implementation * fixed data race in tests * cleanup * finished fsm tests * address golangci comments :) * address golangci comments :) * Added timeout on next block needed to advance * updating docs and cleanup * fix issue in test from previous cleanup * cleanup * Added termination scenarios, tests and more cleanup * small fixes to adr, comments and cleanup * Fix bug in sendRequest() If we tried to send a request to a peer not present in the switch, a missing continue statement caused the request to be blackholed in a peer that was removed and never retried. While this bug was manifesting, the reactor kept asking for other blocks that would be stored and never consumed. Added the number of unconsumed blocks in the math for requesting blocks ahead of current processing height so eventually there will be no more blocks requested until the already received ones are consumed. * remove bpPeer's didTimeout field * Use distinct err codes for peer timeout and FSM timeouts * Don't allow peers to update with lower height * review comments from Ethan and Zarko * some cleanup, renaming, comments * Move block execution in separate goroutine * Remove pool's numPending * review comments * fix lint, remove old blockchain reactor and duplicates in fsm tests * small reorg around peer after review comments * add the reactor spec * verify block only once * review comments * change to int for max number of pending requests * cleanup and godoc * Add configuration flag fast sync version * golangci fixes * fix config template * move both reactor versions under blockchain * cleanup, golint, renaming stuff * updated documentation, fixed more golint warnings * integrate with behavior package * sync with master * gofmt * add changelog_pending entry * move to improvments * suggestion to changelog entry
6 years ago
blockchain: Reorg reactor (#3561) * go routines in blockchain reactor * Added reference to the go routine diagram * Initial commit * cleanup * Undo testing_logger change, committed by mistake * Fix the test loggers * pulled some fsm code into pool.go * added pool tests * changes to the design added block requests under peer moved the request trigger in the reactor poolRoutine, triggered now by a ticker in general moved everything required for making block requests smarter in the poolRoutine added a simple map of heights to keep track of what will need to be requested next added a few more tests * send errors to FSM in a different channel than blocks send errors (RemovePeer) from switch on a different channel than the one receiving blocks renamed channels added more pool tests * more pool tests * lint errors * more tests * more tests * switch fast sync to new implementation * fixed data race in tests * cleanup * finished fsm tests * address golangci comments :) * address golangci comments :) * Added timeout on next block needed to advance * updating docs and cleanup * fix issue in test from previous cleanup * cleanup * Added termination scenarios, tests and more cleanup * small fixes to adr, comments and cleanup * Fix bug in sendRequest() If we tried to send a request to a peer not present in the switch, a missing continue statement caused the request to be blackholed in a peer that was removed and never retried. While this bug was manifesting, the reactor kept asking for other blocks that would be stored and never consumed. Added the number of unconsumed blocks in the math for requesting blocks ahead of current processing height so eventually there will be no more blocks requested until the already received ones are consumed. * remove bpPeer's didTimeout field * Use distinct err codes for peer timeout and FSM timeouts * Don't allow peers to update with lower height * review comments from Ethan and Zarko * some cleanup, renaming, comments * Move block execution in separate goroutine * Remove pool's numPending * review comments * fix lint, remove old blockchain reactor and duplicates in fsm tests * small reorg around peer after review comments * add the reactor spec * verify block only once * review comments * change to int for max number of pending requests * cleanup and godoc * Add configuration flag fast sync version * golangci fixes * fix config template * move both reactor versions under blockchain * cleanup, golint, renaming stuff * updated documentation, fixed more golint warnings * integrate with behavior package * sync with master * gofmt * add changelog_pending entry * move to improvments * suggestion to changelog entry
6 years ago
blockchain: Reorg reactor (#3561) * go routines in blockchain reactor * Added reference to the go routine diagram * Initial commit * cleanup * Undo testing_logger change, committed by mistake * Fix the test loggers * pulled some fsm code into pool.go * added pool tests * changes to the design added block requests under peer moved the request trigger in the reactor poolRoutine, triggered now by a ticker in general moved everything required for making block requests smarter in the poolRoutine added a simple map of heights to keep track of what will need to be requested next added a few more tests * send errors to FSM in a different channel than blocks send errors (RemovePeer) from switch on a different channel than the one receiving blocks renamed channels added more pool tests * more pool tests * lint errors * more tests * more tests * switch fast sync to new implementation * fixed data race in tests * cleanup * finished fsm tests * address golangci comments :) * address golangci comments :) * Added timeout on next block needed to advance * updating docs and cleanup * fix issue in test from previous cleanup * cleanup * Added termination scenarios, tests and more cleanup * small fixes to adr, comments and cleanup * Fix bug in sendRequest() If we tried to send a request to a peer not present in the switch, a missing continue statement caused the request to be blackholed in a peer that was removed and never retried. While this bug was manifesting, the reactor kept asking for other blocks that would be stored and never consumed. Added the number of unconsumed blocks in the math for requesting blocks ahead of current processing height so eventually there will be no more blocks requested until the already received ones are consumed. * remove bpPeer's didTimeout field * Use distinct err codes for peer timeout and FSM timeouts * Don't allow peers to update with lower height * review comments from Ethan and Zarko * some cleanup, renaming, comments * Move block execution in separate goroutine * Remove pool's numPending * review comments * fix lint, remove old blockchain reactor and duplicates in fsm tests * small reorg around peer after review comments * add the reactor spec * verify block only once * review comments * change to int for max number of pending requests * cleanup and godoc * Add configuration flag fast sync version * golangci fixes * fix config template * move both reactor versions under blockchain * cleanup, golint, renaming stuff * updated documentation, fixed more golint warnings * integrate with behavior package * sync with master * gofmt * add changelog_pending entry * move to improvments * suggestion to changelog entry
6 years ago
  1. package v1
  2. import (
  3. "errors"
  4. "fmt"
  5. "reflect"
  6. "time"
  7. amino "github.com/tendermint/go-amino"
  8. "github.com/tendermint/tendermint/behaviour"
  9. "github.com/tendermint/tendermint/libs/log"
  10. "github.com/tendermint/tendermint/p2p"
  11. sm "github.com/tendermint/tendermint/state"
  12. "github.com/tendermint/tendermint/store"
  13. "github.com/tendermint/tendermint/types"
  14. )
  15. const (
  16. // BlockchainChannel is a channel for blocks and status updates (`BlockStore` height)
  17. BlockchainChannel = byte(0x40)
  18. trySyncIntervalMS = 10
  19. trySendIntervalMS = 10
  20. // ask for best height every 10s
  21. statusUpdateIntervalSeconds = 10
  22. // NOTE: keep up to date with bcBlockResponseMessage
  23. bcBlockResponseMessagePrefixSize = 4
  24. bcBlockResponseMessageFieldKeySize = 1
  25. maxMsgSize = types.MaxBlockSizeBytes +
  26. bcBlockResponseMessagePrefixSize +
  27. bcBlockResponseMessageFieldKeySize
  28. )
  29. var (
  30. // Maximum number of requests that can be pending per peer, i.e. for which requests have been sent but blocks
  31. // have not been received.
  32. maxRequestsPerPeer = 20
  33. // Maximum number of block requests for the reactor, pending or for which blocks have been received.
  34. maxNumRequests = 64
  35. )
  36. type consensusReactor interface {
  37. // for when we switch from blockchain reactor and fast sync to
  38. // the consensus machine
  39. SwitchToConsensus(sm.State, int)
  40. }
  41. // BlockchainReactor handles long-term catchup syncing.
  42. type BlockchainReactor struct {
  43. p2p.BaseReactor
  44. initialState sm.State // immutable
  45. state sm.State
  46. blockExec *sm.BlockExecutor
  47. store *store.BlockStore
  48. fastSync bool
  49. fsm *BcReactorFSM
  50. blocksSynced int
  51. // Receive goroutine forwards messages to this channel to be processed in the context of the poolRoutine.
  52. messagesForFSMCh chan bcReactorMessage
  53. // Switch goroutine may send RemovePeer to the blockchain reactor. This is an error message that is relayed
  54. // to this channel to be processed in the context of the poolRoutine.
  55. errorsForFSMCh chan bcReactorMessage
  56. // This channel is used by the FSM and indirectly the block pool to report errors to the blockchain reactor and
  57. // the switch.
  58. eventsFromFSMCh chan bcFsmMessage
  59. swReporter *behaviour.SwitchReporter
  60. }
  61. // NewBlockchainReactor returns new reactor instance.
  62. func NewBlockchainReactor(state sm.State, blockExec *sm.BlockExecutor, store *store.BlockStore,
  63. fastSync bool) *BlockchainReactor {
  64. if state.LastBlockHeight != store.Height() {
  65. panic(fmt.Sprintf("state (%v) and store (%v) height mismatch", state.LastBlockHeight,
  66. store.Height()))
  67. }
  68. const capacity = 1000
  69. eventsFromFSMCh := make(chan bcFsmMessage, capacity)
  70. messagesForFSMCh := make(chan bcReactorMessage, capacity)
  71. errorsForFSMCh := make(chan bcReactorMessage, capacity)
  72. startHeight := store.Height() + 1
  73. bcR := &BlockchainReactor{
  74. initialState: state,
  75. state: state,
  76. blockExec: blockExec,
  77. fastSync: fastSync,
  78. store: store,
  79. messagesForFSMCh: messagesForFSMCh,
  80. eventsFromFSMCh: eventsFromFSMCh,
  81. errorsForFSMCh: errorsForFSMCh,
  82. }
  83. fsm := NewFSM(startHeight, bcR)
  84. bcR.fsm = fsm
  85. bcR.BaseReactor = *p2p.NewBaseReactor("BlockchainReactor", bcR)
  86. //bcR.swReporter = behaviour.NewSwitcReporter(bcR.BaseReactor.Switch)
  87. return bcR
  88. }
  89. // bcReactorMessage is used by the reactor to send messages to the FSM.
  90. type bcReactorMessage struct {
  91. event bReactorEvent
  92. data bReactorEventData
  93. }
  94. type bFsmEvent uint
  95. const (
  96. // message type events
  97. peerErrorEv = iota + 1
  98. syncFinishedEv
  99. )
  100. type bFsmEventData struct {
  101. peerID p2p.ID
  102. err error
  103. }
  104. // bcFsmMessage is used by the FSM to send messages to the reactor
  105. type bcFsmMessage struct {
  106. event bFsmEvent
  107. data bFsmEventData
  108. }
  109. // SetLogger implements service.Service by setting the logger on reactor and pool.
  110. func (bcR *BlockchainReactor) SetLogger(l log.Logger) {
  111. bcR.BaseService.Logger = l
  112. bcR.fsm.SetLogger(l)
  113. }
  114. // OnStart implements service.Service.
  115. func (bcR *BlockchainReactor) OnStart() error {
  116. bcR.swReporter = behaviour.NewSwitcReporter(bcR.BaseReactor.Switch)
  117. if bcR.fastSync {
  118. go bcR.poolRoutine()
  119. }
  120. return nil
  121. }
  122. // OnStop implements service.Service.
  123. func (bcR *BlockchainReactor) OnStop() {
  124. _ = bcR.Stop()
  125. }
  126. // GetChannels implements Reactor
  127. func (bcR *BlockchainReactor) GetChannels() []*p2p.ChannelDescriptor {
  128. return []*p2p.ChannelDescriptor{
  129. {
  130. ID: BlockchainChannel,
  131. Priority: 10,
  132. SendQueueCapacity: 2000,
  133. RecvBufferCapacity: 50 * 4096,
  134. RecvMessageCapacity: maxMsgSize,
  135. },
  136. }
  137. }
  138. // AddPeer implements Reactor by sending our state to peer.
  139. func (bcR *BlockchainReactor) AddPeer(peer p2p.Peer) {
  140. msgBytes := cdc.MustMarshalBinaryBare(&bcStatusResponseMessage{bcR.store.Height()})
  141. peer.Send(BlockchainChannel, msgBytes)
  142. // it's OK if send fails. will try later in poolRoutine
  143. // peer is added to the pool once we receive the first
  144. // bcStatusResponseMessage from the peer and call pool.updatePeer()
  145. }
  146. // sendBlockToPeer loads a block and sends it to the requesting peer.
  147. // If the block doesn't exist a bcNoBlockResponseMessage is sent.
  148. // If all nodes are honest, no node should be requesting for a block that doesn't exist.
  149. func (bcR *BlockchainReactor) sendBlockToPeer(msg *bcBlockRequestMessage,
  150. src p2p.Peer) (queued bool) {
  151. block := bcR.store.LoadBlock(msg.Height)
  152. if block != nil {
  153. msgBytes := cdc.MustMarshalBinaryBare(&bcBlockResponseMessage{Block: block})
  154. return src.TrySend(BlockchainChannel, msgBytes)
  155. }
  156. bcR.Logger.Info("peer asking for a block we don't have", "src", src, "height", msg.Height)
  157. msgBytes := cdc.MustMarshalBinaryBare(&bcNoBlockResponseMessage{Height: msg.Height})
  158. return src.TrySend(BlockchainChannel, msgBytes)
  159. }
  160. func (bcR *BlockchainReactor) sendStatusResponseToPeer(msg *bcStatusRequestMessage, src p2p.Peer) (queued bool) {
  161. msgBytes := cdc.MustMarshalBinaryBare(&bcStatusResponseMessage{bcR.store.Height()})
  162. return src.TrySend(BlockchainChannel, msgBytes)
  163. }
  164. // RemovePeer implements Reactor by removing peer from the pool.
  165. func (bcR *BlockchainReactor) RemovePeer(peer p2p.Peer, reason interface{}) {
  166. msgData := bcReactorMessage{
  167. event: peerRemoveEv,
  168. data: bReactorEventData{
  169. peerID: peer.ID(),
  170. err: errSwitchRemovesPeer,
  171. },
  172. }
  173. bcR.errorsForFSMCh <- msgData
  174. }
  175. // Receive implements Reactor by handling 4 types of messages (look below).
  176. func (bcR *BlockchainReactor) Receive(chID byte, src p2p.Peer, msgBytes []byte) {
  177. msg, err := decodeMsg(msgBytes)
  178. if err != nil {
  179. bcR.Logger.Error("error decoding message",
  180. "src", src, "chId", chID, "msg", msg, "err", err, "bytes", msgBytes)
  181. _ = bcR.swReporter.Report(behaviour.BadMessage(src.ID(), err.Error()))
  182. return
  183. }
  184. if err = msg.ValidateBasic(); err != nil {
  185. bcR.Logger.Error("peer sent us invalid msg", "peer", src, "msg", msg, "err", err)
  186. _ = bcR.swReporter.Report(behaviour.BadMessage(src.ID(), err.Error()))
  187. return
  188. }
  189. bcR.Logger.Debug("Receive", "src", src, "chID", chID, "msg", msg)
  190. switch msg := msg.(type) {
  191. case *bcBlockRequestMessage:
  192. if queued := bcR.sendBlockToPeer(msg, src); !queued {
  193. // Unfortunately not queued since the queue is full.
  194. bcR.Logger.Error("Could not send block message to peer", "src", src, "height", msg.Height)
  195. }
  196. case *bcStatusRequestMessage:
  197. // Send peer our state.
  198. if queued := bcR.sendStatusResponseToPeer(msg, src); !queued {
  199. // Unfortunately not queued since the queue is full.
  200. bcR.Logger.Error("Could not send status message to peer", "src", src)
  201. }
  202. case *bcBlockResponseMessage:
  203. msgForFSM := bcReactorMessage{
  204. event: blockResponseEv,
  205. data: bReactorEventData{
  206. peerID: src.ID(),
  207. height: msg.Block.Height,
  208. block: msg.Block,
  209. length: len(msgBytes),
  210. },
  211. }
  212. bcR.Logger.Info("Received", "src", src, "height", msg.Block.Height)
  213. bcR.messagesForFSMCh <- msgForFSM
  214. case *bcStatusResponseMessage:
  215. // Got a peer status. Unverified.
  216. msgForFSM := bcReactorMessage{
  217. event: statusResponseEv,
  218. data: bReactorEventData{
  219. peerID: src.ID(),
  220. height: msg.Height,
  221. length: len(msgBytes),
  222. },
  223. }
  224. bcR.messagesForFSMCh <- msgForFSM
  225. default:
  226. bcR.Logger.Error(fmt.Sprintf("unknown message type %v", reflect.TypeOf(msg)))
  227. }
  228. }
  229. // processBlocksRoutine processes blocks until signlaed to stop over the stopProcessing channel
  230. func (bcR *BlockchainReactor) processBlocksRoutine(stopProcessing chan struct{}) {
  231. processReceivedBlockTicker := time.NewTicker(trySyncIntervalMS * time.Millisecond)
  232. doProcessBlockCh := make(chan struct{}, 1)
  233. lastHundred := time.Now()
  234. lastRate := 0.0
  235. ForLoop:
  236. for {
  237. select {
  238. case <-stopProcessing:
  239. bcR.Logger.Info("finishing block execution")
  240. break ForLoop
  241. case <-processReceivedBlockTicker.C: // try to execute blocks
  242. select {
  243. case doProcessBlockCh <- struct{}{}:
  244. default:
  245. }
  246. case <-doProcessBlockCh:
  247. for {
  248. err := bcR.processBlock()
  249. if err == errMissingBlock {
  250. break
  251. }
  252. // Notify FSM of block processing result.
  253. msgForFSM := bcReactorMessage{
  254. event: processedBlockEv,
  255. data: bReactorEventData{
  256. err: err,
  257. },
  258. }
  259. _ = bcR.fsm.Handle(&msgForFSM)
  260. if err != nil {
  261. break
  262. }
  263. bcR.blocksSynced++
  264. if bcR.blocksSynced%100 == 0 {
  265. lastRate = 0.9*lastRate + 0.1*(100/time.Since(lastHundred).Seconds())
  266. height, maxPeerHeight := bcR.fsm.Status()
  267. bcR.Logger.Info("Fast Sync Rate", "height", height,
  268. "max_peer_height", maxPeerHeight, "blocks/s", lastRate)
  269. lastHundred = time.Now()
  270. }
  271. }
  272. }
  273. }
  274. }
  275. // poolRoutine receives and handles messages from the Receive() routine and from the FSM.
  276. func (bcR *BlockchainReactor) poolRoutine() {
  277. bcR.fsm.Start()
  278. sendBlockRequestTicker := time.NewTicker(trySendIntervalMS * time.Millisecond)
  279. statusUpdateTicker := time.NewTicker(statusUpdateIntervalSeconds * time.Second)
  280. stopProcessing := make(chan struct{}, 1)
  281. go bcR.processBlocksRoutine(stopProcessing)
  282. ForLoop:
  283. for {
  284. select {
  285. case <-sendBlockRequestTicker.C:
  286. if !bcR.fsm.NeedsBlocks() {
  287. continue
  288. }
  289. _ = bcR.fsm.Handle(&bcReactorMessage{
  290. event: makeRequestsEv,
  291. data: bReactorEventData{
  292. maxNumRequests: maxNumRequests}})
  293. case <-statusUpdateTicker.C:
  294. // Ask for status updates.
  295. go bcR.sendStatusRequest()
  296. case msg := <-bcR.messagesForFSMCh:
  297. // Sent from the Receive() routine when status (statusResponseEv) and
  298. // block (blockResponseEv) response events are received
  299. _ = bcR.fsm.Handle(&msg)
  300. case msg := <-bcR.errorsForFSMCh:
  301. // Sent from the switch.RemovePeer() routine (RemovePeerEv) and
  302. // FSM state timer expiry routine (stateTimeoutEv).
  303. _ = bcR.fsm.Handle(&msg)
  304. case msg := <-bcR.eventsFromFSMCh:
  305. switch msg.event {
  306. case syncFinishedEv:
  307. stopProcessing <- struct{}{}
  308. // Sent from the FSM when it enters finished state.
  309. break ForLoop
  310. case peerErrorEv:
  311. // Sent from the FSM when it detects peer error
  312. bcR.reportPeerErrorToSwitch(msg.data.err, msg.data.peerID)
  313. if msg.data.err == errNoPeerResponse {
  314. // Sent from the peer timeout handler routine
  315. _ = bcR.fsm.Handle(&bcReactorMessage{
  316. event: peerRemoveEv,
  317. data: bReactorEventData{
  318. peerID: msg.data.peerID,
  319. err: msg.data.err,
  320. },
  321. })
  322. }
  323. // else {
  324. // For slow peers, or errors due to blocks received from wrong peer
  325. // the FSM had already removed the peers
  326. // }
  327. default:
  328. bcR.Logger.Error("Event from FSM not supported", "type", msg.event)
  329. }
  330. case <-bcR.Quit():
  331. break ForLoop
  332. }
  333. }
  334. }
  335. func (bcR *BlockchainReactor) reportPeerErrorToSwitch(err error, peerID p2p.ID) {
  336. peer := bcR.Switch.Peers().Get(peerID)
  337. if peer != nil {
  338. _ = bcR.swReporter.Report(behaviour.BadMessage(peerID, err.Error()))
  339. }
  340. }
  341. func (bcR *BlockchainReactor) processBlock() error {
  342. first, second, err := bcR.fsm.FirstTwoBlocks()
  343. if err != nil {
  344. // We need both to sync the first block.
  345. return err
  346. }
  347. chainID := bcR.initialState.ChainID
  348. firstParts := first.MakePartSet(types.BlockPartSizeBytes)
  349. firstPartsHeader := firstParts.Header()
  350. firstID := types.BlockID{Hash: first.Hash(), PartsHeader: firstPartsHeader}
  351. // Finally, verify the first block using the second's commit
  352. // NOTE: we can probably make this more efficient, but note that calling
  353. // first.Hash() doesn't verify the tx contents, so MakePartSet() is
  354. // currently necessary.
  355. err = bcR.state.Validators.VerifyCommit(chainID, firstID, first.Height, second.LastCommit)
  356. if err != nil {
  357. bcR.Logger.Error("error during commit verification", "err", err,
  358. "first", first.Height, "second", second.Height)
  359. return errBlockVerificationFailure
  360. }
  361. bcR.store.SaveBlock(first, firstParts, second.LastCommit)
  362. bcR.state, err = bcR.blockExec.ApplyBlock(bcR.state, firstID, first)
  363. if err != nil {
  364. panic(fmt.Sprintf("failed to process committed block (%d:%X): %v", first.Height, first.Hash(), err))
  365. }
  366. return nil
  367. }
  368. // Implements bcRNotifier
  369. // sendStatusRequest broadcasts `BlockStore` height.
  370. func (bcR *BlockchainReactor) sendStatusRequest() {
  371. msgBytes := cdc.MustMarshalBinaryBare(&bcStatusRequestMessage{bcR.store.Height()})
  372. bcR.Switch.Broadcast(BlockchainChannel, msgBytes)
  373. }
  374. // Implements bcRNotifier
  375. // BlockRequest sends `BlockRequest` height.
  376. func (bcR *BlockchainReactor) sendBlockRequest(peerID p2p.ID, height int64) error {
  377. peer := bcR.Switch.Peers().Get(peerID)
  378. if peer == nil {
  379. return errNilPeerForBlockRequest
  380. }
  381. msgBytes := cdc.MustMarshalBinaryBare(&bcBlockRequestMessage{height})
  382. queued := peer.TrySend(BlockchainChannel, msgBytes)
  383. if !queued {
  384. return errSendQueueFull
  385. }
  386. return nil
  387. }
  388. // Implements bcRNotifier
  389. func (bcR *BlockchainReactor) switchToConsensus() {
  390. conR, ok := bcR.Switch.Reactor("CONSENSUS").(consensusReactor)
  391. if ok {
  392. conR.SwitchToConsensus(bcR.state, bcR.blocksSynced)
  393. bcR.eventsFromFSMCh <- bcFsmMessage{event: syncFinishedEv}
  394. }
  395. // else {
  396. // Should only happen during testing.
  397. // }
  398. }
  399. // Implements bcRNotifier
  400. // Called by FSM and pool:
  401. // - pool calls when it detects slow peer or when peer times out
  402. // - FSM calls when:
  403. // - adding a block (addBlock) fails
  404. // - reactor processing of a block reports failure and FSM sends back the peers of first and second blocks
  405. func (bcR *BlockchainReactor) sendPeerError(err error, peerID p2p.ID) {
  406. bcR.Logger.Info("sendPeerError:", "peer", peerID, "error", err)
  407. msgData := bcFsmMessage{
  408. event: peerErrorEv,
  409. data: bFsmEventData{
  410. peerID: peerID,
  411. err: err,
  412. },
  413. }
  414. bcR.eventsFromFSMCh <- msgData
  415. }
  416. // Implements bcRNotifier
  417. func (bcR *BlockchainReactor) resetStateTimer(name string, timer **time.Timer, timeout time.Duration) {
  418. if timer == nil {
  419. panic("nil timer pointer parameter")
  420. }
  421. if *timer == nil {
  422. *timer = time.AfterFunc(timeout, func() {
  423. msg := bcReactorMessage{
  424. event: stateTimeoutEv,
  425. data: bReactorEventData{
  426. stateName: name,
  427. },
  428. }
  429. bcR.errorsForFSMCh <- msg
  430. })
  431. } else {
  432. (*timer).Reset(timeout)
  433. }
  434. }
  435. //-----------------------------------------------------------------------------
  436. // Messages
  437. // BlockchainMessage is a generic message for this reactor.
  438. type BlockchainMessage interface {
  439. ValidateBasic() error
  440. }
  441. // RegisterBlockchainMessages registers the fast sync messages for amino encoding.
  442. func RegisterBlockchainMessages(cdc *amino.Codec) {
  443. cdc.RegisterInterface((*BlockchainMessage)(nil), nil)
  444. cdc.RegisterConcrete(&bcBlockRequestMessage{}, "tendermint/blockchain/BlockRequest", nil)
  445. cdc.RegisterConcrete(&bcBlockResponseMessage{}, "tendermint/blockchain/BlockResponse", nil)
  446. cdc.RegisterConcrete(&bcNoBlockResponseMessage{}, "tendermint/blockchain/NoBlockResponse", nil)
  447. cdc.RegisterConcrete(&bcStatusResponseMessage{}, "tendermint/blockchain/StatusResponse", nil)
  448. cdc.RegisterConcrete(&bcStatusRequestMessage{}, "tendermint/blockchain/StatusRequest", nil)
  449. }
  450. func decodeMsg(bz []byte) (msg BlockchainMessage, err error) {
  451. if len(bz) > maxMsgSize {
  452. return msg, fmt.Errorf("msg exceeds max size (%d > %d)", len(bz), maxMsgSize)
  453. }
  454. err = cdc.UnmarshalBinaryBare(bz, &msg)
  455. return
  456. }
  457. //-------------------------------------
  458. type bcBlockRequestMessage struct {
  459. Height int64
  460. }
  461. // ValidateBasic performs basic validation.
  462. func (m *bcBlockRequestMessage) ValidateBasic() error {
  463. if m.Height < 0 {
  464. return errors.New("negative Height")
  465. }
  466. return nil
  467. }
  468. func (m *bcBlockRequestMessage) String() string {
  469. return fmt.Sprintf("[bcBlockRequestMessage %v]", m.Height)
  470. }
  471. type bcNoBlockResponseMessage struct {
  472. Height int64
  473. }
  474. // ValidateBasic performs basic validation.
  475. func (m *bcNoBlockResponseMessage) ValidateBasic() error {
  476. if m.Height < 0 {
  477. return errors.New("negative Height")
  478. }
  479. return nil
  480. }
  481. func (m *bcNoBlockResponseMessage) String() string {
  482. return fmt.Sprintf("[bcNoBlockResponseMessage %d]", m.Height)
  483. }
  484. //-------------------------------------
  485. type bcBlockResponseMessage struct {
  486. Block *types.Block
  487. }
  488. // ValidateBasic performs basic validation.
  489. func (m *bcBlockResponseMessage) ValidateBasic() error {
  490. return m.Block.ValidateBasic()
  491. }
  492. func (m *bcBlockResponseMessage) String() string {
  493. return fmt.Sprintf("[bcBlockResponseMessage %v]", m.Block.Height)
  494. }
  495. //-------------------------------------
  496. type bcStatusRequestMessage struct {
  497. Height int64
  498. }
  499. // ValidateBasic performs basic validation.
  500. func (m *bcStatusRequestMessage) ValidateBasic() error {
  501. if m.Height < 0 {
  502. return errors.New("negative Height")
  503. }
  504. return nil
  505. }
  506. func (m *bcStatusRequestMessage) String() string {
  507. return fmt.Sprintf("[bcStatusRequestMessage %v]", m.Height)
  508. }
  509. //-------------------------------------
  510. type bcStatusResponseMessage struct {
  511. Height int64
  512. }
  513. // ValidateBasic performs basic validation.
  514. func (m *bcStatusResponseMessage) ValidateBasic() error {
  515. if m.Height < 0 {
  516. return errors.New("negative Height")
  517. }
  518. return nil
  519. }
  520. func (m *bcStatusResponseMessage) String() string {
  521. return fmt.Sprintf("[bcStatusResponseMessage %v]", m.Height)
  522. }