statesync: remove deadlock on init fail (#7029)

When statesync is stopped during shutdown, it can deadlock. A dump of goroutines reveals that this is related to the peerUpdates channel not returning anything on its `Done()` channel when `OnStop` is called. While this is occurring, `processPeerUpdate` is attempting to acquire the reactor lock, and it appears that this lock can never be acquired. I looked for the places where the lock may accidentally remain held, and cleaned them up in hopes of eradicating the issue. Dumps of the relevant goroutines may be found below. Note that the line numbers below are relative to the code in the `v0.35.0-rc1` tag.

```
goroutine 36 [chan receive]:
github.com/tendermint/tendermint/internal/statesync.(*Reactor).OnStop(0xc00058f200)
    github.com/tendermint/tendermint/internal/statesync/reactor.go:243 +0x117
github.com/tendermint/tendermint/libs/service.(*BaseService).Stop(0xc00058f200, 0x0, 0x0)
    github.com/tendermint/tendermint/libs/service/service.go:171 +0x323
github.com/tendermint/tendermint/node.(*nodeImpl).OnStop(0xc0001ea240)
    github.com/tendermint/tendermint/node/node.go:769 +0x132
github.com/tendermint/tendermint/libs/service.(*BaseService).Stop(0xc0001ea240, 0x0, 0x0)
    github.com/tendermint/tendermint/libs/service/service.go:171 +0x323
github.com/tendermint/tendermint/cmd/tendermint/commands.NewRunNodeCmd.func1.1()
    github.com/tendermint/tendermint/cmd/tendermint/commands/run_node.go:143 +0x62
github.com/tendermint/tendermint/libs/os.TrapSignal.func1(0xc000629500, 0x7fdb52f96358, 0xc0002b5030, 0xc00000daa0)
    github.com/tendermint/tendermint/libs/os/os.go:26 +0x102
created by github.com/tendermint/tendermint/libs/os.TrapSignal
    github.com/tendermint/tendermint/libs/os/os.go:22 +0xe6

goroutine 188 [semacquire]:
sync.runtime_SemacquireMutex(0xc00026b1cc, 0x0, 0x1)
    runtime/sema.go:71 +0x47
sync.(*Mutex).lockSlow(0xc00026b1c8)
    sync/mutex.go:138 +0x105
sync.(*Mutex).Lock(...)
    sync/mutex.go:81
sync.(*RWMutex).Lock(0xc00026b1c8)
    sync/rwmutex.go:111 +0x90
github.com/tendermint/tendermint/internal/statesync.(*Reactor).processPeerUpdate(0xc00026b080, 0xc000650008, 0x28, 0x124de90, 0x4)
    github.com/tendermint/tendermint/internal/statesync/reactor.go:849 +0x1a5
github.com/tendermint/tendermint/internal/statesync.(*Reactor).processPeerUpdates(0xc00026b080)
    github.com/tendermint/tendermint/internal/statesync/reactor.go:883 +0xab
created by github.com/tendermint/tendermint/internal/statesync.(*Reactor).OnStart
    github.com/tendermint/tendermint/internal/statesync/reactor.go:219 +0xcd
```
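To make the failure mode easier to see, here is a minimal, self-contained sketch of the same deadlock shape. None of the names below are Tendermint code; it only mirrors the structure in the dump: `OnStop` blocks waiting for the update loop to signal that it has finished, while the update loop is blocked trying to take a mutex that another path acquired and never released.

```go
package main

import "sync"

// Illustrative only: the shape of the reported deadlock, not Tendermint code.
type fakeReactor struct {
    mtx     sync.RWMutex
    updates chan struct{}
    done    chan struct{}
}

// processPeerUpdates mirrors the update loop: it needs the mutex for every
// update and closes done only when its loop ends.
func (r *fakeReactor) processPeerUpdates() {
    defer close(r.done)
    for range r.updates {
        r.mtx.Lock() // blocks forever once another path holds mtx and never unlocks
        r.mtx.Unlock()
    }
}

// OnStop mirrors the shutdown path: it waits for the update loop to finish.
func (r *fakeReactor) OnStop() {
    <-r.done
}

func main() {
    r := &fakeReactor{updates: make(chan struct{}, 1), done: make(chan struct{})}
    go r.processPeerUpdates()

    r.mtx.Lock()            // a code path that forgot to release the lock (the kind of bug being cleaned up)
    r.updates <- struct{}{} // the update loop now blocks on mtx

    r.OnStop() // blocks on done; the runtime reports "all goroutines are asleep - deadlock!"
}
```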
package statesync

import (
    "bytes"
    "context"
    "errors"
    "fmt"
    "reflect"
    "runtime/debug"
    "sort"
    "sync"
    "time"

    abci "github.com/tendermint/tendermint/abci/types"
    "github.com/tendermint/tendermint/config"
    "github.com/tendermint/tendermint/internal/eventbus"
    "github.com/tendermint/tendermint/internal/p2p"
    "github.com/tendermint/tendermint/internal/proxy"
    sm "github.com/tendermint/tendermint/internal/state"
    "github.com/tendermint/tendermint/internal/store"
    "github.com/tendermint/tendermint/libs/log"
    "github.com/tendermint/tendermint/libs/service"
    "github.com/tendermint/tendermint/light"
    "github.com/tendermint/tendermint/light/provider"
    ssproto "github.com/tendermint/tendermint/proto/tendermint/statesync"
    "github.com/tendermint/tendermint/types"
)

var (
    _ service.Service = (*Reactor)(nil)
    _ p2p.Wrapper     = (*ssproto.Message)(nil)
)

const (
    // SnapshotChannel exchanges snapshot metadata
    SnapshotChannel = p2p.ChannelID(0x60)

    // ChunkChannel exchanges chunk contents
    ChunkChannel = p2p.ChannelID(0x61)

    // LightBlockChannel exchanges light blocks
    LightBlockChannel = p2p.ChannelID(0x62)

    // ParamsChannel exchanges consensus params
    ParamsChannel = p2p.ChannelID(0x63)

    // recentSnapshots is the number of recent snapshots to send and receive per peer.
    recentSnapshots = 10

    // snapshotMsgSize is the maximum size of a snapshotResponseMessage
    snapshotMsgSize = int(4e6) // ~4MB

    // chunkMsgSize is the maximum size of a chunkResponseMessage
    chunkMsgSize = int(16e6) // ~16MB

    // lightBlockMsgSize is the maximum size of a lightBlockResponseMessage
    lightBlockMsgSize = int(1e7) // ~10MB

    // paramMsgSize is the maximum size of a paramsResponseMessage
    paramMsgSize = int(1e5) // ~100KB

    // lightBlockResponseTimeout is how long the dispatcher waits for a peer to
    // return a light block
    lightBlockResponseTimeout = 10 * time.Second

    // consensusParamsResponseTimeout is the time the p2p state provider waits
    // before performing a secondary call
    consensusParamsResponseTimeout = 5 * time.Second

    // maxLightBlockRequestRetries is the number of retries acceptable before
    // the backfill process aborts
    maxLightBlockRequestRetries = 20
)
func getChannelDescriptors() map[p2p.ChannelID]*p2p.ChannelDescriptor {
    return map[p2p.ChannelID]*p2p.ChannelDescriptor{
        SnapshotChannel: {
            ID:                  SnapshotChannel,
            MessageType:         new(ssproto.Message),
            Priority:            6,
            SendQueueCapacity:   10,
            RecvMessageCapacity: snapshotMsgSize,
            RecvBufferCapacity:  128,
        },
        ChunkChannel: {
            ID:                  ChunkChannel,
            Priority:            3,
            MessageType:         new(ssproto.Message),
            SendQueueCapacity:   4,
            RecvMessageCapacity: chunkMsgSize,
            RecvBufferCapacity:  128,
        },
        LightBlockChannel: {
            ID:                  LightBlockChannel,
            MessageType:         new(ssproto.Message),
            Priority:            5,
            SendQueueCapacity:   10,
            RecvMessageCapacity: lightBlockMsgSize,
            RecvBufferCapacity:  128,
        },
        ParamsChannel: {
            ID:                  ParamsChannel,
            MessageType:         new(ssproto.Message),
            Priority:            2,
            SendQueueCapacity:   10,
            RecvMessageCapacity: paramMsgSize,
            RecvBufferCapacity:  128,
        },
    }
}
// Metricer defines an interface used for the RPC sync info query; see
// statesync.metrics for the details.
type Metricer interface {
    TotalSnapshots() int64
    ChunkProcessAvgTime() time.Duration
    SnapshotHeight() int64
    SnapshotChunksCount() int64
    SnapshotChunksTotal() int64
    BackFilledBlocks() int64
    BackFillBlocksTotal() int64
}
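The Reactor satisfies this interface via the getters at the end of the file. As a hedged illustration of how a status endpoint might consume it, here is a sketch; the `syncInfo` struct and `collectSyncInfo` helper are hypothetical names, not part of this package.

```go
// Illustrative only: snapshot the Metricer values into a plain struct, e.g. for
// an RPC sync-info response.
type syncInfo struct {
    TotalSnapshots      int64
    ChunkProcessAvgTime time.Duration
    SnapshotHeight      int64
    ChunksReceived      int64
    ChunksTotal         int64
    BackFilledBlocks    int64
    BackFillBlocksTotal int64
}

func collectSyncInfo(m Metricer) syncInfo {
    return syncInfo{
        TotalSnapshots:      m.TotalSnapshots(),
        ChunkProcessAvgTime: m.ChunkProcessAvgTime(),
        SnapshotHeight:      m.SnapshotHeight(),
        ChunksReceived:      m.SnapshotChunksCount(),
        ChunksTotal:         m.SnapshotChunksTotal(),
        BackFilledBlocks:    m.BackFilledBlocks(),
        BackFillBlocksTotal: m.BackFillBlocksTotal(),
    }
}
```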
// Reactor handles state sync, both restoring snapshots for the local node and
// serving snapshots for other nodes.
type Reactor struct {
    service.BaseService
    logger log.Logger

    chainID       string
    initialHeight int64
    cfg           config.StateSyncConfig
    stateStore    sm.Store
    blockStore    *store.BlockStore

    conn        proxy.AppConnSnapshot
    connQuery   proxy.AppConnQuery
    tempDir     string
    snapshotCh  *p2p.Channel
    chunkCh     *p2p.Channel
    blockCh     *p2p.Channel
    paramsCh    *p2p.Channel
    peerUpdates *p2p.PeerUpdates

    // Dispatcher is used to multiplex light block requests and responses over multiple
    // peers used by the p2p state provider and in reverse sync.
    dispatcher *Dispatcher
    peers      *peerList

    // These fields are only set when a state sync is in progress. They are used to feed
    // received snapshots and chunks into the syncer and manage incoming and outgoing
    // providers.
    mtx           sync.RWMutex
    syncer        *syncer
    providers     map[types.NodeID]*BlockProvider
    stateProvider StateProvider

    eventBus           *eventbus.EventBus
    metrics            *Metrics
    backfillBlockTotal int64
    backfilledBlocks   int64
}
// NewReactor returns a reference to a new state sync reactor, which implements
// the service.Service interface. It accepts a logger, connections for snapshots
// and querying, references to p2p Channels and a channel to listen for peer
// updates on. Note, the reactor will close all p2p Channels when stopping.
func NewReactor(
    ctx context.Context,
    chainID string,
    initialHeight int64,
    cfg config.StateSyncConfig,
    logger log.Logger,
    conn proxy.AppConnSnapshot,
    connQuery proxy.AppConnQuery,
    channelCreator p2p.ChannelCreator,
    peerUpdates *p2p.PeerUpdates,
    stateStore sm.Store,
    blockStore *store.BlockStore,
    tempDir string,
    ssMetrics *Metrics,
    eventBus *eventbus.EventBus,
) (*Reactor, error) {
    chDesc := getChannelDescriptors()

    snapshotCh, err := channelCreator(ctx, chDesc[SnapshotChannel])
    if err != nil {
        return nil, err
    }

    chunkCh, err := channelCreator(ctx, chDesc[ChunkChannel])
    if err != nil {
        return nil, err
    }

    blockCh, err := channelCreator(ctx, chDesc[LightBlockChannel])
    if err != nil {
        return nil, err
    }

    paramsCh, err := channelCreator(ctx, chDesc[ParamsChannel])
    if err != nil {
        return nil, err
    }

    r := &Reactor{
        logger:        logger,
        chainID:       chainID,
        initialHeight: initialHeight,
        cfg:           cfg,
        conn:          conn,
        connQuery:     connQuery,
        snapshotCh:    snapshotCh,
        chunkCh:       chunkCh,
        blockCh:       blockCh,
        paramsCh:      paramsCh,
        peerUpdates:   peerUpdates,
        tempDir:       tempDir,
        stateStore:    stateStore,
        blockStore:    blockStore,
        peers:         newPeerList(),
        dispatcher:    NewDispatcher(blockCh),
        providers:     make(map[types.NodeID]*BlockProvider),
        metrics:       ssMetrics,
        eventBus:      eventBus,
    }

    r.BaseService = *service.NewBaseService(logger, "StateSync", r)

    return r, nil
}
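A hedged sketch of how a node might wire this constructor: everything reached through `deps` (genesis doc, proxy app, p2p router, peer manager, stores, metrics, event bus) is a hypothetical stand-in for the node's real dependencies, not code from this repository.

```go
// nodeDeps is a hypothetical container for already-constructed node dependencies.
func newStateSyncReactor(ctx context.Context, deps nodeDeps) (*Reactor, error) {
    return NewReactor(
        ctx,
        deps.genesis.ChainID,
        deps.genesis.InitialHeight,
        *deps.config.StateSync,                  // config.StateSyncConfig
        deps.logger.With("module", "statesync"), // log.Logger
        deps.proxyApp.Snapshot(),                // proxy.AppConnSnapshot
        deps.proxyApp.Query(),                   // proxy.AppConnQuery
        deps.router.OpenChannel,                 // assumed to satisfy p2p.ChannelCreator
        deps.peerManager.Subscribe(ctx),         // *p2p.PeerUpdates
        deps.stateStore,
        deps.blockStore,
        deps.config.StateSync.TempDir,
        deps.stateSyncMetrics,
        deps.eventBus,
    )
}
```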
// OnStart starts separate goroutines for each p2p Channel and listens for
// envelopes on each. In addition, it also listens for peer updates and handles
// messages on that p2p channel accordingly. Note, we do not launch a goroutine to
// handle individual envelopes so as not to have to deal with bounding workers or pools.
// The caller must be sure to execute OnStop to ensure the outbound p2p Channels are
// closed. No error is returned.
func (r *Reactor) OnStart(ctx context.Context) error {
    go r.processCh(ctx, r.snapshotCh, "snapshot")
    go r.processCh(ctx, r.chunkCh, "chunk")
    go r.processCh(ctx, r.blockCh, "light block")
    go r.processCh(ctx, r.paramsCh, "consensus params")
    go r.processPeerUpdates(ctx)

    return nil
}

// OnStop stops the reactor by signaling to all spawned goroutines to exit and
// blocking until they all exit.
func (r *Reactor) OnStop() {
    // tell the dispatcher to stop sending any more requests
    r.dispatcher.Close()
}

func (r *Reactor) PublishStatus(ctx context.Context, event types.EventDataStateSyncStatus) error {
    if r.eventBus == nil {
        return errors.New("event system is not configured")
    }

    return r.eventBus.PublishEventStateSyncStatus(ctx, event)
}
// Sync runs a state sync, fetching snapshots and providing chunks to the
// application. At the close of the operation, Sync will bootstrap the state
// store and persist the commit at that height so that either consensus or
// blocksync can commence. It will then proceed to backfill the necessary number
// of historical blocks before participating in consensus.
func (r *Reactor) Sync(ctx context.Context) (sm.State, error) {
    // We need at least two peers (for cross-referencing of light blocks) before we can
    // begin state sync
    if err := r.waitForEnoughPeers(ctx, 2); err != nil {
        return sm.State{}, err
    }

    r.mtx.Lock()
    if r.syncer != nil {
        r.mtx.Unlock()
        return sm.State{}, errors.New("a state sync is already in progress")
    }

    if err := r.initStateProvider(ctx, r.chainID, r.initialHeight); err != nil {
        r.mtx.Unlock()
        return sm.State{}, err
    }

    r.syncer = newSyncer(
        r.cfg,
        r.logger,
        r.conn,
        r.connQuery,
        r.stateProvider,
        r.snapshotCh,
        r.chunkCh,
        r.tempDir,
        r.metrics,
    )
    r.mtx.Unlock()

    defer func() {
        r.mtx.Lock()
        // reset syncing objects at the close of Sync
        r.syncer = nil
        r.stateProvider = nil
        r.mtx.Unlock()
    }()

    requestSnapshotsHook := func() error {
        // request snapshots from all currently connected peers
        return r.snapshotCh.Send(ctx, p2p.Envelope{
            Broadcast: true,
            Message:   &ssproto.SnapshotsRequest{},
        })
    }

    state, commit, err := r.syncer.SyncAny(ctx, r.cfg.DiscoveryTime, requestSnapshotsHook)
    if err != nil {
        return sm.State{}, err
    }

    err = r.stateStore.Bootstrap(state)
    if err != nil {
        return sm.State{}, fmt.Errorf("failed to bootstrap node with new state: %w", err)
    }

    err = r.blockStore.SaveSeenCommit(state.LastBlockHeight, commit)
    if err != nil {
        return sm.State{}, fmt.Errorf("failed to store last seen commit: %w", err)
    }

    err = r.Backfill(ctx, state)
    if err != nil {
        r.logger.Error("backfill failed. Proceeding optimistically...", "err", err)
    }

    return state, nil
}
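As a rough usage sketch, a node's startup path might drive the reactor like this before handing the resulting state to block sync or consensus. `runStateSync` is a hypothetical helper written as if it lived next to the reactor; it is not part of this file.

```go
// Hypothetical helper, not part of this package's API.
func runStateSync(ctx context.Context, logger log.Logger, r *Reactor) (sm.State, error) {
    state, err := r.Sync(ctx)
    if err != nil {
        return sm.State{}, fmt.Errorf("state sync failed: %w", err)
    }

    // let subscribers (e.g. the RPC layer) know the node has finished state sync
    if err := r.PublishStatus(ctx, types.EventDataStateSyncStatus{
        Complete: true,
        Height:   state.LastBlockHeight,
    }); err != nil {
        logger.Error("failed to publish state sync status", "err", err)
    }

    // the caller would typically start block sync or consensus from `state`
    return state, nil
}
```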
// Backfill sequentially fetches, verifies and stores light blocks in reverse
// order. It does not stop verifying blocks until reaching a block with a height
// and time that is less than or equal to the stopHeight and stopTime. The
// trustedBlockID should be of the header at startHeight.
func (r *Reactor) Backfill(ctx context.Context, state sm.State) error {
    params := state.ConsensusParams.Evidence
    stopHeight := state.LastBlockHeight - params.MaxAgeNumBlocks
    stopTime := state.LastBlockTime.Add(-params.MaxAgeDuration)
    // ensure that stop height doesn't go below the initial height
    if stopHeight < state.InitialHeight {
        stopHeight = state.InitialHeight
        // this essentially makes stop time a void criterion for termination
        stopTime = state.LastBlockTime
    }
    return r.backfill(
        ctx,
        state.ChainID,
        state.LastBlockHeight,
        stopHeight,
        state.InitialHeight,
        state.LastBlockID,
        stopTime,
    )
}
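To make the stop-height arithmetic concrete, here is a small, self-contained example with made-up numbers; none of these values come from the file.

```go
package main

import (
    "fmt"
    "time"
)

// Illustrative only: mirrors the stop-height/stop-time computation in Backfill.
// MaxAgeNumBlocks and MaxAgeDuration are the evidence consensus parameters.
func main() {
    lastBlockHeight := int64(500_000)
    initialHeight := int64(1)
    lastBlockTime := time.Date(2021, 10, 1, 0, 0, 0, 0, time.UTC)

    maxAgeNumBlocks := int64(100_000)
    maxAgeDuration := 48 * time.Hour

    stopHeight := lastBlockHeight - maxAgeNumBlocks // 400_000
    stopTime := lastBlockTime.Add(-maxAgeDuration)  // keep two days of history

    if stopHeight < initialHeight {
        stopHeight = initialHeight
        stopTime = lastBlockTime // stop time no longer constrains termination
    }

    fmt.Println("backfill from", lastBlockHeight, "down to", stopHeight, "or until", stopTime)
}
```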
func (r *Reactor) backfill(
    ctx context.Context,
    chainID string,
    startHeight, stopHeight, initialHeight int64,
    trustedBlockID types.BlockID,
    stopTime time.Time,
) error {
    r.logger.Info("starting backfill process...", "startHeight", startHeight,
        "stopHeight", stopHeight, "stopTime", stopTime, "trustedBlockID", trustedBlockID)

    r.backfillBlockTotal = startHeight - stopHeight + 1
    r.metrics.BackFillBlocksTotal.Set(float64(r.backfillBlockTotal))

    const sleepTime = 1 * time.Second
    var (
        lastValidatorSet *types.ValidatorSet
        lastChangeHeight = startHeight
    )

    queue := newBlockQueue(startHeight, stopHeight, initialHeight, stopTime, maxLightBlockRequestRetries)

    // fetch light blocks concurrently across r.cfg.Fetchers workers. The aim with
    // deploying concurrent workers is to equate the network messaging time with
    // the verification time. Ideally we want the verification process to never
    // have to be waiting on blocks. If it takes 4s to retrieve a block and 1s to
    // verify it, then steady state involves four workers.
    for i := 0; i < int(r.cfg.Fetchers); i++ {
        ctxWithCancel, cancel := context.WithCancel(ctx)
        defer cancel()
        go func() {
            for {
                select {
                case <-ctx.Done():
                    return
                case height := <-queue.nextHeight():
                    // pop the next peer off the list to send a request to
                    peer := r.peers.Pop(ctx)
                    r.logger.Debug("fetching next block", "height", height, "peer", peer)

                    subCtx, cancel := context.WithTimeout(ctxWithCancel, lightBlockResponseTimeout)
                    defer cancel()
                    lb, err := func() (*types.LightBlock, error) {
                        defer cancel()
                        // request the light block with a timeout
                        return r.dispatcher.LightBlock(subCtx, height, peer)
                    }()
                    // once the peer has returned a value, add it back to the peer list to be used again
                    r.peers.Append(peer)
                    if errors.Is(err, context.Canceled) {
                        return
                    }
                    if err != nil {
                        queue.retry(height)
                        if errors.Is(err, errNoConnectedPeers) {
                            r.logger.Info("backfill: no connected peers to fetch light blocks from; sleeping...",
                                "sleepTime", sleepTime)
                            time.Sleep(sleepTime)
                        } else {
                            // we don't punish the peer as it might just have not responded in time
                            r.logger.Info("backfill: error with fetching light block",
                                "height", height, "err", err)
                        }
                        continue
                    }
                    if lb == nil {
                        r.logger.Info("backfill: peer didn't have block, fetching from another peer", "height", height)
                        queue.retry(height)
                        // As we are fetching blocks backwards, if this node doesn't have the block it likely doesn't
                        // have any prior ones, thus we remove it from the peer list.
                        r.peers.Remove(peer)
                        continue
                    }

                    // run a validate basic. This checks that the validator set and commit
                    // hashes line up
                    err = lb.ValidateBasic(chainID)
                    if err != nil || lb.Height != height {
                        r.logger.Info("backfill: fetched light block failed validate basic, removing peer...",
                            "err", err, "height", height)
                        queue.retry(height)
                        if serr := r.blockCh.SendError(ctx, p2p.PeerError{
                            NodeID: peer,
                            Err:    fmt.Errorf("received invalid light block: %w", err),
                        }); serr != nil {
                            return
                        }
                        continue
                    }

                    // add block to queue to be verified
                    queue.add(lightBlockResponse{
                        block: lb,
                        peer:  peer,
                    })
                    r.logger.Debug("backfill: added light block to processing queue", "height", height)

                case <-queue.done():
                    return
                }
            }
        }()
    }

    // verify all light blocks
    for {
        select {
        case <-ctx.Done():
            queue.close()
            return nil

        case resp := <-queue.verifyNext():
            // validate the header hash. We take the last block id of the
            // previous header (i.e. one height above) as the trusted hash which
            // we equate to. ValidatorsHash and CommitHash have already been
            // checked in the `ValidateBasic`
            if w, g := trustedBlockID.Hash, resp.block.Hash(); !bytes.Equal(w, g) {
                r.logger.Info("received invalid light block. header hash doesn't match trusted LastBlockID",
                    "trustedHash", w, "receivedHash", g, "height", resp.block.Height)
                if err := r.blockCh.SendError(ctx, p2p.PeerError{
                    NodeID: resp.peer,
                    Err:    fmt.Errorf("received invalid light block. Expected hash %v, got: %v", w, g),
                }); err != nil {
                    return nil
                }
                queue.retry(resp.block.Height)
                continue
            }

            // save the signed headers
            if err := r.blockStore.SaveSignedHeader(resp.block.SignedHeader, trustedBlockID); err != nil {
                return err
            }

            // check if there has been a change in the validator set
            if lastValidatorSet != nil && !bytes.Equal(resp.block.Header.ValidatorsHash, resp.block.Header.NextValidatorsHash) {
                // save all the heights that the last validator set was the same
                if err := r.stateStore.SaveValidatorSets(resp.block.Height+1, lastChangeHeight, lastValidatorSet); err != nil {
                    return err
                }
                // update the lastChangeHeight
                lastChangeHeight = resp.block.Height
            }

            trustedBlockID = resp.block.LastBlockID
            queue.success()
            r.logger.Info("backfill: verified and stored light block", "height", resp.block.Height)

            lastValidatorSet = resp.block.ValidatorSet
            r.backfilledBlocks++
            r.metrics.BackFilledBlocks.Add(1)

            // The block height might be less than the stopHeight because the stopTime
            // condition hasn't been fulfilled yet.
            if resp.block.Height < stopHeight {
                r.backfillBlockTotal++
                r.metrics.BackFillBlocksTotal.Set(float64(r.backfillBlockTotal))
            }

        case <-queue.done():
            if err := queue.error(); err != nil {
                return err
            }

            // save the final batch of validators
            if err := r.stateStore.SaveValidatorSets(queue.terminal.Height, lastChangeHeight, lastValidatorSet); err != nil {
                return err
            }

            r.logger.Info("successfully completed backfill process", "endHeight", queue.terminal.Height)
            return nil
        }
    }
}
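The verification loop above threads `trustedBlockID` backwards: each fetched header must hash to the `LastBlockID` recorded by the already-verified header one height above it. A reduced, illustrative sketch of just that invariant follows; `verifyBackwards` is not part of this package and assumes the file's existing `bytes`, `fmt`, and `types` imports.

```go
// Illustrative only: the core invariant of reverse verification in backfill.
// Walking from the newest header down, each header must hash to the
// LastBlockID recorded by the header one height above it.
func verifyBackwards(trustedLastBlockID types.BlockID, headersNewestFirst []*types.Header) error {
    trustedHash := trustedLastBlockID.Hash
    for _, h := range headersNewestFirst {
        if !bytes.Equal(trustedHash, h.Hash()) {
            return fmt.Errorf("header at height %d does not match trusted hash", h.Height)
        }
        trustedHash = h.LastBlockID.Hash // step the trust anchor one height down
    }
    return nil
}
```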
// handleSnapshotMessage handles envelopes sent from peers on the
// SnapshotChannel. It returns an error only if the Envelope.Message is unknown
// for this channel. This should never be called outside of handleMessage.
func (r *Reactor) handleSnapshotMessage(ctx context.Context, envelope *p2p.Envelope) error {
    logger := r.logger.With("peer", envelope.From)

    switch msg := envelope.Message.(type) {
    case *ssproto.SnapshotsRequest:
        snapshots, err := r.recentSnapshots(ctx, recentSnapshots)
        if err != nil {
            logger.Error("failed to fetch snapshots", "err", err)
            return nil
        }

        for _, snapshot := range snapshots {
            logger.Info(
                "advertising snapshot",
                "height", snapshot.Height,
                "format", snapshot.Format,
                "peer", envelope.From,
            )
            if err := r.snapshotCh.Send(ctx, p2p.Envelope{
                To: envelope.From,
                Message: &ssproto.SnapshotsResponse{
                    Height:   snapshot.Height,
                    Format:   snapshot.Format,
                    Chunks:   snapshot.Chunks,
                    Hash:     snapshot.Hash,
                    Metadata: snapshot.Metadata,
                },
            }); err != nil {
                return err
            }
        }

    case *ssproto.SnapshotsResponse:
        r.mtx.RLock()
        defer r.mtx.RUnlock()

        if r.syncer == nil {
            logger.Debug("received unexpected snapshot; no state sync in progress")
            return nil
        }

        logger.Info("received snapshot", "height", msg.Height, "format", msg.Format)
        _, err := r.syncer.AddSnapshot(envelope.From, &snapshot{
            Height:   msg.Height,
            Format:   msg.Format,
            Chunks:   msg.Chunks,
            Hash:     msg.Hash,
            Metadata: msg.Metadata,
        })
        if err != nil {
            logger.Error(
                "failed to add snapshot",
                "height", msg.Height,
                "format", msg.Format,
                "err", err,
                "channel", r.snapshotCh.ID,
            )
            return nil
        }
        logger.Info("added snapshot", "height", msg.Height, "format", msg.Format)

    default:
        return fmt.Errorf("received unknown message: %T", msg)
    }

    return nil
}
// handleChunkMessage handles envelopes sent from peers on the ChunkChannel.
// It returns an error only if the Envelope.Message is unknown for this channel.
// This should never be called outside of handleMessage.
func (r *Reactor) handleChunkMessage(ctx context.Context, envelope *p2p.Envelope) error {
    switch msg := envelope.Message.(type) {
    case *ssproto.ChunkRequest:
        r.logger.Debug(
            "received chunk request",
            "height", msg.Height,
            "format", msg.Format,
            "chunk", msg.Index,
            "peer", envelope.From,
        )
        resp, err := r.conn.LoadSnapshotChunk(ctx, abci.RequestLoadSnapshotChunk{
            Height: msg.Height,
            Format: msg.Format,
            Chunk:  msg.Index,
        })
        if err != nil {
            r.logger.Error(
                "failed to load chunk",
                "height", msg.Height,
                "format", msg.Format,
                "chunk", msg.Index,
                "err", err,
                "peer", envelope.From,
            )
            return nil
        }

        r.logger.Debug(
            "sending chunk",
            "height", msg.Height,
            "format", msg.Format,
            "chunk", msg.Index,
            "peer", envelope.From,
        )
        if err := r.chunkCh.Send(ctx, p2p.Envelope{
            To: envelope.From,
            Message: &ssproto.ChunkResponse{
                Height:  msg.Height,
                Format:  msg.Format,
                Index:   msg.Index,
                Chunk:   resp.Chunk,
                Missing: resp.Chunk == nil,
            },
        }); err != nil {
            return err
        }

    case *ssproto.ChunkResponse:
        r.mtx.RLock()
        defer r.mtx.RUnlock()

        if r.syncer == nil {
            r.logger.Debug("received unexpected chunk; no state sync in progress", "peer", envelope.From)
            return nil
        }

        r.logger.Debug(
            "received chunk; adding to sync",
            "height", msg.Height,
            "format", msg.Format,
            "chunk", msg.Index,
            "peer", envelope.From,
        )
        _, err := r.syncer.AddChunk(&chunk{
            Height: msg.Height,
            Format: msg.Format,
            Index:  msg.Index,
            Chunk:  msg.Chunk,
            Sender: envelope.From,
        })
        if err != nil {
            r.logger.Error(
                "failed to add chunk",
                "height", msg.Height,
                "format", msg.Format,
                "chunk", msg.Index,
                "err", err,
                "peer", envelope.From,
            )
            return nil
        }

    default:
        return fmt.Errorf("received unknown message: %T", msg)
    }

    return nil
}
func (r *Reactor) handleLightBlockMessage(ctx context.Context, envelope *p2p.Envelope) error {
    switch msg := envelope.Message.(type) {
    case *ssproto.LightBlockRequest:
        r.logger.Info("received light block request", "height", msg.Height)
        lb, err := r.fetchLightBlock(msg.Height)
        if err != nil {
            r.logger.Error("failed to retrieve light block", "err", err, "height", msg.Height)
            return err
        }
        if lb == nil {
            // NOTE: If we don't have the light block we send a nil light block
            // back to the requesting node, indicating that we don't have it.
            if err := r.blockCh.Send(ctx, p2p.Envelope{
                To: envelope.From,
                Message: &ssproto.LightBlockResponse{
                    LightBlock: nil,
                },
            }); err != nil {
                return err
            }
            return nil
        }

        lbproto, err := lb.ToProto()
        if err != nil {
            r.logger.Error("marshaling light block to proto", "err", err)
            return nil
        }

        if err := r.blockCh.Send(ctx, p2p.Envelope{
            To: envelope.From,
            Message: &ssproto.LightBlockResponse{
                LightBlock: lbproto,
            },
        }); err != nil {
            return err
        }

    case *ssproto.LightBlockResponse:
        var height int64
        if msg.LightBlock != nil {
            height = msg.LightBlock.SignedHeader.Header.Height
        }
        r.logger.Info("received light block response", "peer", envelope.From, "height", height)
        if err := r.dispatcher.Respond(ctx, msg.LightBlock, envelope.From); err != nil {
            if errors.Is(err, context.Canceled) {
                return err
            }
            r.logger.Error("error processing light block response", "err", err, "height", height)
        }

    default:
        return fmt.Errorf("received unknown message: %T", msg)
    }

    return nil
}
func (r *Reactor) handleParamsMessage(ctx context.Context, envelope *p2p.Envelope) error {
    switch msg := envelope.Message.(type) {
    case *ssproto.ParamsRequest:
        r.logger.Debug("received consensus params request", "height", msg.Height)
        cp, err := r.stateStore.LoadConsensusParams(int64(msg.Height))
        if err != nil {
            r.logger.Error("failed to fetch requested consensus params", "err", err, "height", msg.Height)
            return nil
        }

        cpproto := cp.ToProto()
        if err := r.paramsCh.Send(ctx, p2p.Envelope{
            To: envelope.From,
            Message: &ssproto.ParamsResponse{
                Height:          msg.Height,
                ConsensusParams: cpproto,
            },
        }); err != nil {
            return err
        }

    case *ssproto.ParamsResponse:
        r.mtx.RLock()
        defer r.mtx.RUnlock()
        r.logger.Debug("received consensus params response", "height", msg.Height)

        cp := types.ConsensusParamsFromProto(msg.ConsensusParams)

        if sp, ok := r.stateProvider.(*stateProviderP2P); ok {
            select {
            case sp.paramsRecvCh <- cp:
            case <-time.After(time.Second):
                return errors.New("failed to send consensus params, stateprovider not ready for response")
            }
        } else {
            r.logger.Debug("received unexpected params response; using RPC state provider", "peer", envelope.From)
        }

    default:
        return fmt.Errorf("received unknown message: %T", msg)
    }

    return nil
}
// handleMessage handles an Envelope sent from a peer on a specific p2p Channel.
// It will handle errors and any possible panics gracefully. A caller can handle
// any error returned by sending a PeerError on the respective channel.
func (r *Reactor) handleMessage(ctx context.Context, chID p2p.ChannelID, envelope *p2p.Envelope) (err error) {
    defer func() {
        if e := recover(); e != nil {
            err = fmt.Errorf("panic in processing message: %v", e)
            r.logger.Error(
                "recovering from processing message panic",
                "err", err,
                "stack", string(debug.Stack()),
            )
        }
    }()

    r.logger.Debug("received message", "message", reflect.TypeOf(envelope.Message), "peer", envelope.From)

    switch chID {
    case SnapshotChannel:
        err = r.handleSnapshotMessage(ctx, envelope)
    case ChunkChannel:
        err = r.handleChunkMessage(ctx, envelope)
    case LightBlockChannel:
        err = r.handleLightBlockMessage(ctx, envelope)
    case ParamsChannel:
        err = r.handleParamsMessage(ctx, envelope)
    default:
        err = fmt.Errorf("unknown channel ID (%d) for envelope (%v)", chID, envelope)
    }

    return err
}
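The recover-into-named-return pattern used above can be isolated into a tiny helper. A hedged sketch: `safeCall` is a hypothetical name, and the snippet relies on the `fmt` and `runtime/debug` imports already present in this file.

```go
// Illustrative only: convert a panic into an ordinary error via a named return,
// keeping the stack trace for debugging.
func safeCall(fn func() error) (err error) {
    defer func() {
        if e := recover(); e != nil {
            err = fmt.Errorf("panic recovered: %v\n%s", e, debug.Stack())
        }
    }()
    return fn()
}
```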
// processCh routes state sync messages to their respective handlers. Any error
// encountered during message execution will result in a PeerError being sent on
// the respective channel. When the reactor is stopped, we will catch the signal
// and close the p2p Channel gracefully.
func (r *Reactor) processCh(ctx context.Context, ch *p2p.Channel, chName string) {
    iter := ch.Receive(ctx)
    for iter.Next(ctx) {
        envelope := iter.Envelope()
        if err := r.handleMessage(ctx, ch.ID, envelope); err != nil {
            r.logger.Error("failed to process message",
                "err", err,
                "channel", chName,
                "ch_id", ch.ID,
                "envelope", envelope)
            if serr := ch.SendError(ctx, p2p.PeerError{
                NodeID: envelope.From,
                Err:    err,
            }); serr != nil {
                return
            }
        }
    }
}
// processPeerUpdate processes a single PeerUpdate. It adds or removes the peer
// from the reactor's peer list and, if a state sync is in progress, updates the
// syncer and state provider accordingly.
func (r *Reactor) processPeerUpdate(ctx context.Context, peerUpdate p2p.PeerUpdate) {
    r.logger.Info("received peer update", "peer", peerUpdate.NodeID, "status", peerUpdate.Status)

    switch peerUpdate.Status {
    case p2p.PeerStatusUp:
        if peerUpdate.Channels.Contains(SnapshotChannel) &&
            peerUpdate.Channels.Contains(ChunkChannel) &&
            peerUpdate.Channels.Contains(LightBlockChannel) &&
            peerUpdate.Channels.Contains(ParamsChannel) {
            r.peers.Append(peerUpdate.NodeID)
        } else {
            r.logger.Error("could not use peer for statesync", "peer", peerUpdate.NodeID)
        }
    case p2p.PeerStatusDown:
        r.peers.Remove(peerUpdate.NodeID)
    }

    r.mtx.Lock()
    defer r.mtx.Unlock()
    if r.syncer == nil {
        return
    }

    switch peerUpdate.Status {
    case p2p.PeerStatusUp:
        newProvider := NewBlockProvider(peerUpdate.NodeID, r.chainID, r.dispatcher)
        r.providers[peerUpdate.NodeID] = newProvider
        err := r.syncer.AddPeer(ctx, peerUpdate.NodeID)
        if err != nil {
            r.logger.Error("error adding peer to syncer", "error", err)
            return
        }
        if sp, ok := r.stateProvider.(*stateProviderP2P); ok {
            // we do this in a separate routine to avoid blocking while the light client
            // finishes whatever call it is currently executing
            go sp.addProvider(newProvider)
        }

    case p2p.PeerStatusDown:
        delete(r.providers, peerUpdate.NodeID)
        r.syncer.RemovePeer(peerUpdate.NodeID)
    }

    r.logger.Info("processed peer update", "peer", peerUpdate.NodeID, "status", peerUpdate.Status)
}
// processPeerUpdates initiates a blocking process where we listen for and handle
// PeerUpdate messages. When the reactor is stopped, we will catch the signal and
// close the p2p PeerUpdatesCh gracefully.
func (r *Reactor) processPeerUpdates(ctx context.Context) {
    for {
        select {
        case <-ctx.Done():
            return
        case peerUpdate := <-r.peerUpdates.Updates():
            r.processPeerUpdate(ctx, peerUpdate)
        }
    }
}
// recentSnapshots fetches the n most recent snapshots from the app
func (r *Reactor) recentSnapshots(ctx context.Context, n uint32) ([]*snapshot, error) {
    resp, err := r.conn.ListSnapshots(ctx, abci.RequestListSnapshots{})
    if err != nil {
        return nil, err
    }

    sort.Slice(resp.Snapshots, func(i, j int) bool {
        a := resp.Snapshots[i]
        b := resp.Snapshots[j]
        switch {
        case a.Height > b.Height:
            return true
        case a.Height == b.Height && a.Format > b.Format:
            return true
        default:
            return false
        }
    })

    snapshots := make([]*snapshot, 0, n)
    for i, s := range resp.Snapshots {
        if i >= recentSnapshots {
            break
        }

        snapshots = append(snapshots, &snapshot{
            Height:   s.Height,
            Format:   s.Format,
            Chunks:   s.Chunks,
            Hash:     s.Hash,
            Metadata: s.Metadata,
        })
    }

    return snapshots, nil
}
// fetchLightBlock works out whether the node has a light block at a particular
// height and if so returns it so it can be gossiped to peers
func (r *Reactor) fetchLightBlock(height uint64) (*types.LightBlock, error) {
    h := int64(height)

    blockMeta := r.blockStore.LoadBlockMeta(h)
    if blockMeta == nil {
        return nil, nil
    }

    commit := r.blockStore.LoadBlockCommit(h)
    if commit == nil {
        return nil, nil
    }

    vals, err := r.stateStore.LoadValidators(h)
    if err != nil {
        return nil, err
    }
    if vals == nil {
        return nil, nil
    }

    return &types.LightBlock{
        SignedHeader: &types.SignedHeader{
            Header: &blockMeta.Header,
            Commit: commit,
        },
        ValidatorSet: vals,
    }, nil
}
func (r *Reactor) waitForEnoughPeers(ctx context.Context, numPeers int) error {
    startAt := time.Now()
    t := time.NewTicker(100 * time.Millisecond)
    defer t.Stop()
    logT := time.NewTicker(time.Minute)
    defer logT.Stop()
    var iter int
    for r.peers.Len() < numPeers {
        iter++
        select {
        case <-ctx.Done():
            return fmt.Errorf("operation canceled while waiting for peers after %.2fs [%d/%d]",
                time.Since(startAt).Seconds(), r.peers.Len(), numPeers)
        case <-t.C:
            continue
        case <-logT.C:
            r.logger.Info("waiting for sufficient peers to start statesync",
                "duration", time.Since(startAt).String(),
                "target", numPeers,
                "peers", r.peers.Len(),
                "iters", iter,
            )
            continue
        }
    }
    return nil
}
func (r *Reactor) initStateProvider(ctx context.Context, chainID string, initialHeight int64) error {
    var err error
    to := light.TrustOptions{
        Period: r.cfg.TrustPeriod,
        Height: r.cfg.TrustHeight,
        Hash:   r.cfg.TrustHashBytes(),
    }
    spLogger := r.logger.With("module", "stateprovider")
    spLogger.Info("initializing state provider", "trustPeriod", to.Period,
        "trustHeight", to.Height, "useP2P", r.cfg.UseP2P)

    if r.cfg.UseP2P {
        if err := r.waitForEnoughPeers(ctx, 2); err != nil {
            return err
        }

        peers := r.peers.All()
        providers := make([]provider.Provider, len(peers))
        for idx, p := range peers {
            providers[idx] = NewBlockProvider(p, chainID, r.dispatcher)
        }

        r.stateProvider, err = NewP2PStateProvider(ctx, chainID, initialHeight, providers, to, r.paramsCh, spLogger)
        if err != nil {
            return fmt.Errorf("failed to initialize P2P state provider: %w", err)
        }
    } else {
        r.stateProvider, err = NewRPCStateProvider(ctx, chainID, initialHeight, r.cfg.RPCServers, to, spLogger)
        if err != nil {
            return fmt.Errorf("failed to initialize RPC state provider: %w", err)
        }
    }
    return nil
}
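For orientation, here are illustrative values for the statesync configuration that this function and `Sync` read. The field names follow the usages visible in this file (TrustPeriod, TrustHeight, TrustHash, UseP2P, RPCServers, DiscoveryTime, Fetchers, TempDir); the exact field set of `config.StateSyncConfig` may differ between releases, so treat this as a sketch rather than a reference.

```go
// Illustrative values only; not taken from any real network.
cfg := config.StateSyncConfig{
    Enable:        true,
    UseP2P:        false, // false selects the RPC state provider
    RPCServers:    []string{"https://rpc-a.example.com:443", "https://rpc-b.example.com:443"},
    TrustHeight:   8_500_000,
    TrustHash:     "EA4E...", // header hash at TrustHeight (placeholder)
    TrustPeriod:   168 * time.Hour, // should comfortably cover the unbonding period
    DiscoveryTime: 15 * time.Second,
    Fetchers:      4,
    TempDir:       "/tmp/statesync",
}
_ = cfg
```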
func (r *Reactor) TotalSnapshots() int64 {
    r.mtx.RLock()
    defer r.mtx.RUnlock()

    if r.syncer != nil && r.syncer.snapshots != nil {
        return int64(len(r.syncer.snapshots.snapshots))
    }
    return 0
}

func (r *Reactor) ChunkProcessAvgTime() time.Duration {
    r.mtx.RLock()
    defer r.mtx.RUnlock()

    if r.syncer != nil {
        return time.Duration(r.syncer.avgChunkTime)
    }
    return time.Duration(0)
}

func (r *Reactor) SnapshotHeight() int64 {
    r.mtx.RLock()
    defer r.mtx.RUnlock()

    if r.syncer != nil {
        return r.syncer.lastSyncedSnapshotHeight
    }
    return 0
}

func (r *Reactor) SnapshotChunksCount() int64 {
    r.mtx.RLock()
    defer r.mtx.RUnlock()

    if r.syncer != nil && r.syncer.chunks != nil {
        return int64(r.syncer.chunks.numChunksReturned())
    }
    return 0
}

func (r *Reactor) SnapshotChunksTotal() int64 {
    r.mtx.RLock()
    defer r.mtx.RUnlock()

    if r.syncer != nil && r.syncer.processingSnapshot != nil {
        return int64(r.syncer.processingSnapshot.Chunks)
    }
    return 0
}

func (r *Reactor) BackFilledBlocks() int64 {
    r.mtx.RLock()
    defer r.mtx.RUnlock()

    return r.backfilledBlocks
}

func (r *Reactor) BackFillBlocksTotal() int64 {
    r.mtx.RLock()
    defer r.mtx.RUnlock()

    return r.backfillBlockTotal
}