  1. ---
  2. order: 2
  3. title: Applications
  4. ---
  5. # Applications
Please ensure you've first read the spec for [ABCI Methods and Types](abci.md).
  7. Here we cover the following components of ABCI applications:
  8. - [Connection State](#state) - the interplay between ABCI connections and application state
  9. and the differences between `CheckTx` and `DeliverTx`.
  10. - [Transaction Results](#transaction-results) - rules around transaction
  11. results and validity
  12. - [Validator Set Updates](#validator-updates) - how validator sets are
  13. changed during `InitChain` and `EndBlock`
  14. - [Query](#query) - standards for using the `Query` method and proofs about the
  15. application state
  16. - [Crash Recovery](#crash-recovery) - handshake protocol to synchronize
  17. Tendermint and the application on startup.
  18. - [State Sync](#state-sync) - rapid bootstrapping of new nodes by restoring state machine snapshots
  19. ## State
  20. Since Tendermint maintains four concurrent ABCI connections, it is typical
  21. for an application to maintain a distinct state for each, and for the states to
  22. be synchronized during `Commit`.
  23. ### Concurrency
In principle, each of the four ABCI connections operates concurrently with the
others. This means applications need to ensure access to state is
thread safe. In practice, both the
[default in-process ABCI client](https://github.com/tendermint/tendermint/blob/v0.34.4/abci/client/local_client.go#L18)
and the
[default Go ABCI server](https://github.com/tendermint/tendermint/blob/v0.34.4/abci/server/socket_server.go#L32)
use global locks across all connections, so they are not
concurrent at all. This means that if your app is written in Go, and is either compiled in-process with Tendermint
using the default `NewLocalClient` or run out-of-process using the default `SocketServer`,
ABCI messages from all connections will be linearizable (received one at a
time).
  36. The existence of this global mutex means Go application developers can get
  37. thread safety for application state by routing *all* reads and writes through the ABCI
  38. system. Thus it may be *unsafe* to expose application state directly to an RPC
  39. interface, and unless explicit measures are taken, all queries should be routed through the ABCI Query method.
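
If an application does expose state outside the ABCI connections, one of the "explicit measures" mentioned above could be a lock around the state itself. The following is a minimal sketch under that assumption; `SafeStore` and its methods are hypothetical names and not part of Tendermint or ABCI.

```go
package main

import (
	"fmt"
	"sync"
)

// SafeStore is a hypothetical application state guarded by a read-write
// mutex, so it can be read from goroutines outside the ABCI connections.
type SafeStore struct {
	mu   sync.RWMutex
	data map[string]string
}

// NewSafeStore returns an empty store.
func NewSafeStore() *SafeStore {
	return &SafeStore{data: make(map[string]string)}
}

// Set takes the write lock; it would be called from block execution.
func (s *SafeStore) Set(key, value string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.data[key] = value
}

// Get takes the read lock; it is safe to call from any goroutine.
func (s *SafeStore) Get(key string) (string, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	v, ok := s.data[key]
	return v, ok
}

func main() {
	store := NewSafeStore()
	var wg sync.WaitGroup
	// Four concurrent writers stand in for concurrent connections.
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			store.Set(fmt.Sprintf("k%d", i), "v")
		}(i)
	}
	wg.Wait()
	_, ok := store.Get("k3")
	fmt.Println(ok)
}
```

Note that a lock only protects individual reads and writes; it does not by itself give readers a view that is consistent with a committed block.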
  40. ### BeginBlock
  41. The BeginBlock request can be used to run some code at the beginning of
  42. every block. It also allows Tendermint to send the current block hash
  43. and header to the application, before it sends any of the transactions.
  44. The app should remember the latest height and header (ie. from which it
  45. has run a successful Commit) so that it can tell Tendermint where to
  46. pick up from when it restarts. See information on the Handshake, below.
  47. ### Commit
  48. Application state should only be persisted to disk during `Commit`.
  49. Before `Commit` is called, Tendermint locks and flushes the mempool so that no new messages will
  50. be received on the mempool connection. This provides an opportunity to safely update all four connection
  51. states to the latest committed state at once.
  52. When `Commit` completes, it unlocks the mempool.
WARNING: if the ABCI app logic processing the `Commit` message sends a
`/broadcast_tx_sync` or `/broadcast_tx_commit` request and waits for the response
before proceeding, it will deadlock. Executing those `broadcast_tx` calls
involves acquiring a lock that is held for the duration of the `Commit` call, so
they cannot complete until `Commit` returns. Making the calls to the `broadcast_tx`
endpoints concurrently is fine; they just can't be part of the sequential logic of the
`Commit` function.
  60. ### Consensus Connection
  61. The Consensus Connection should maintain a `DeliverTxState` -
  62. the working state for block execution. It should be updated by the calls to
  63. `BeginBlock`, `DeliverTx`, and `EndBlock` during block execution and committed to
  64. disk as the "latest committed state" during `Commit`.
  65. Updates made to the DeliverTxState by each method call must be readable by each subsequent method -
  66. ie. the updates are linearizable.
- [BeginBlock](#beginblock)
- [EndBlock](#endblock)
- [DeliverTx](#delivertx)
- [Commit](#commit)
  71. ### Mempool Connection
The Mempool Connection should maintain a `CheckTxState`
to sequentially process pending transactions in the mempool that have
not yet been committed. It should be initialized to the latest committed state
at the end of every `Commit`.
  76. The CheckTxState may be updated concurrently with the DeliverTxState, as
  77. messages may be sent concurrently on the Consensus and Mempool connections. However,
  78. before calling `Commit`, Tendermint will lock and flush the mempool connection,
  79. ensuring that all existing CheckTx are responded to and no new ones can
  80. begin.
  81. After `Commit`, CheckTx is run again on all transactions that remain in the
  82. node's local mempool after filtering those included in the block. To prevent the
  83. mempool from rechecking all transactions every time a block is committed, set
  84. the configuration option `mempool.recheck=false`. As of Tendermint v0.32.1,
  85. an additional `Type` parameter is made available to the CheckTx function that
  86. indicates whether an incoming transaction is new (`CheckTxType_New`), or a
  87. recheck (`CheckTxType_Recheck`).
  88. Finally, the mempool will unlock and new transactions can be processed through CheckTx again.
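
As an illustration of how the `Type` parameter might be used, the sketch below skips a costly stateless check on rechecks and repeats only the stateful one. The `CheckTxType` constants here are local stand-ins for the ABCI values, and `checker`, `verifySignature` and the return codes are hypothetical.

```go
package main

import "fmt"

// CheckTxType is a local stand-in for the ABCI type of the same name.
type CheckTxType int

const (
	CheckTxTypeNew     CheckTxType = iota // transaction seen for the first time
	CheckTxTypeRecheck                    // transaction re-validated after a Commit
)

// checker counts how often the (hypothetical) expensive check runs.
type checker struct {
	expensiveChecks int
}

// verifySignature stands in for a costly stateless check.
func (c *checker) verifySignature(tx []byte) bool {
	c.expensiveChecks++
	return len(tx) > 0
}

// CheckTx returns 0 for acceptable transactions and a non-zero code otherwise.
// The outcome of a stateless check cannot change between blocks, so it is
// skipped on recheck; the stateful balance check always runs.
func (c *checker) CheckTx(tx []byte, typ CheckTxType, balance int) uint32 {
	if typ == CheckTxTypeNew && !c.verifySignature(tx) {
		return 1
	}
	if balance <= 0 {
		return 2
	}
	return 0
}

func main() {
	c := &checker{}
	fmt.Println(c.CheckTx([]byte("tx"), CheckTxTypeNew, 10))    // 0
	fmt.Println(c.CheckTx([]byte("tx"), CheckTxTypeRecheck, 0)) // 2
	fmt.Println(c.expensiveChecks)                              // 1
}
```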
  89. Note that CheckTx doesn't have to check everything that affects transaction validity; the
  90. expensive things can be skipped. In fact, CheckTx doesn't have to check
  91. anything; it might say that any transaction is a valid transaction.
  92. Unlike DeliverTx, CheckTx is just there as
  93. a sort of weak filter to keep invalid transactions out of the blockchain. It's
  94. weak, because a Byzantine node doesn't care about CheckTx; it can propose a
  95. block full of invalid transactions if it wants.
  96. #### Replay Protection
  97. To prevent old transactions from being replayed, CheckTx must implement
  98. replay protection.
Tendermint provides the first defense layer by keeping a lightweight
in-memory cache of the last 100k transactions (configurable via `[mempool] cache_size`) in
the mempool. If Tendermint has just started, or clients have sent more than
100k transactions, old transactions may be sent to the application. So
it is important that CheckTx implements some logic to handle them.
  104. If there are cases in your application where a transaction may become invalid in some
  105. future state, you probably want to disable Tendermint's
  106. cache. You can do that by setting `[mempool] cache_size = 0` in the
  107. config.
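
One common way to implement replay protection is a per-sender sequence number (nonce). The sketch below assumes such a scheme; the `Tx` layout and `NonceTracker` are hypothetical, and a real application would keep this data in its `CheckTxState` and reset it on `Commit`.

```go
package main

import "fmt"

// Tx is a hypothetical transaction carrying a per-sender sequence number.
type Tx struct {
	Sender string
	Nonce  uint64
}

// NonceTracker remembers the next expected nonce for each sender.
type NonceTracker struct {
	next map[string]uint64
}

// NewNonceTracker returns a tracker in which every sender starts at nonce 0.
func NewNonceTracker() *NonceTracker {
	return &NonceTracker{next: make(map[string]uint64)}
}

// CheckTx accepts a transaction only if it carries the next expected nonce.
// It returns 0 on success and 1 for a replayed or out-of-order transaction.
func (n *NonceTracker) CheckTx(tx Tx) uint32 {
	if tx.Nonce != n.next[tx.Sender] {
		return 1
	}
	n.next[tx.Sender]++
	return 0
}

func main() {
	n := NewNonceTracker()
	fmt.Println(n.CheckTx(Tx{Sender: "alice", Nonce: 0})) // 0
	fmt.Println(n.CheckTx(Tx{Sender: "alice", Nonce: 0})) // 1 (replay)
	fmt.Println(n.CheckTx(Tx{Sender: "alice", Nonce: 1})) // 0
}
```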
  108. ### Query Connection
The Query Connection (referred to elsewhere in this document as the Info Connection)
should maintain a `QueryState` for answering queries from the user,
and for initialization when Tendermint first starts up (both described further
below).
  112. It should always contain the latest committed state associated with the
  113. latest committed block.
  114. QueryState should be set to the latest `DeliverTxState` at the end of every `Commit`,
  115. ie. after the full block has been processed and the state committed to disk.
  116. Otherwise it should never be modified.
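
The relationship between the three states can be sketched as follows. This is a simplified, in-memory illustration only: the `App` and `State` types are hypothetical, persistence to disk is omitted, and no locking is shown.

```go
package main

import "fmt"

// State is a hypothetical key-value application state.
type State map[string]string

// clone returns an independent copy so the states never share a map.
func (s State) clone() State {
	c := make(State, len(s))
	for k, v := range s {
		c[k] = v
	}
	return c
}

// App keeps one state per connection, as described above.
type App struct {
	deliverTxState State // working state for block execution
	checkTxState   State // working state for mempool checks
	queryState     State // latest committed state, untouched between commits
}

// NewApp returns an application with three empty states.
func NewApp() *App {
	return &App{deliverTxState: State{}, checkTxState: State{}, queryState: State{}}
}

// DeliverTx mutates only the DeliverTxState.
func (a *App) DeliverTx(key, value string) {
	a.deliverTxState[key] = value
}

// Commit resets CheckTxState and QueryState to the state just committed.
func (a *App) Commit() {
	a.checkTxState = a.deliverTxState.clone()
	a.queryState = a.deliverTxState.clone()
}

// Query reads only from the QueryState.
func (a *App) Query(key string) (string, bool) {
	v, ok := a.queryState[key]
	return v, ok
}

func main() {
	app := NewApp()
	app.DeliverTx("name", "alice")
	_, before := app.Query("name")
	app.Commit()
	_, after := app.Query("name")
	fmt.Println(before, after) // false true
}
```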
  117. Tendermint Core currently uses the Query connection to filter peers upon
  118. connecting, according to IP address or node ID. For instance,
  119. returning non-OK ABCI response to either of the following queries will
  120. cause Tendermint to not connect to the corresponding peer:
  121. - `p2p/filter/addr/<ip addr>`, where `<ip addr>` is an IP address.
- `p2p/filter/id/<id>`, where `<id>` is the hex-encoded node ID (the hash of
  the node's p2p pubkey).
  124. Note: these query formats are subject to change!
  125. ### Snapshot Connection
  126. The Snapshot Connection is optional, and is only used to serve state sync snapshots for other nodes
  127. and/or restore state sync snapshots to a local node being bootstrapped.
  128. ## Transaction Results
  129. `ResponseCheckTx` and `ResponseDeliverTx` contain the same fields.
  130. The `Info` and `Log` fields are non-deterministic values for debugging/convenience purposes
  131. that are otherwise ignored.
  132. The `Data` field must be strictly deterministic, but can be arbitrary data.
  133. ### Gas
  134. Ethereum introduced the notion of `gas` as an abstract representation of the
  135. cost of resources used by nodes when processing transactions. Every operation in the
  136. Ethereum Virtual Machine uses some amount of gas, and gas can be accepted at a market-variable price.
  137. Users propose a maximum amount of gas for their transaction; if the tx uses less, they get
  138. the difference credited back. Tendermint adopts a similar abstraction,
  139. though uses it only optionally and weakly, allowing applications to define
  140. their own sense of the cost of execution.
  141. In Tendermint, the `ConsensusParams.Block.MaxGas` limits the amount of `gas` that can be used in a block.
  142. The default value is `-1`, meaning no limit, or that the concept of gas is
  143. meaningless.
Responses contain a `GasWanted` and `GasUsed` field. The former is the maximum
amount of gas the sender of a tx is willing to use, and the latter is how much it actually
used. Applications should enforce that `GasUsed <= GasWanted` - ie. tx execution
should halt before it can use more resources than it requested.
  148. When `MaxGas > -1`, Tendermint enforces the following rules:
  149. - `GasWanted <= MaxGas` for all txs in the mempool
  150. - `(sum of GasWanted in a block) <= MaxGas` when proposing a block
  151. If `MaxGas == -1`, no rules about gas are enforced.
  152. Note that Tendermint does not currently enforce anything about Gas in the consensus, only the mempool.
  153. This means it does not guarantee that committed blocks satisfy these rules!
  154. It is the application's responsibility to return non-zero response codes when gas limits are exceeded.
  155. The `GasUsed` field is ignored completely by Tendermint. That said, applications should enforce:
  156. - `GasUsed <= GasWanted` for any given transaction
  157. - `(sum of GasUsed in a block) <= MaxGas` for every block
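
The two application-side rules above could be checked along these lines. This is a sketch only; `TxResult` and `CheckBlockGas` are hypothetical names, and treating `MaxGas == -1` as "no block limit" while still checking each transaction is a choice made for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// TxResult holds the two gas fields of a transaction response.
type TxResult struct {
	GasWanted int64
	GasUsed   int64
}

// CheckBlockGas verifies GasUsed <= GasWanted for every transaction and,
// when maxGas > -1, that the sum of GasUsed does not exceed maxGas.
func CheckBlockGas(results []TxResult, maxGas int64) error {
	var totalUsed int64
	for i, r := range results {
		if r.GasUsed > r.GasWanted {
			return fmt.Errorf("tx %d: GasUsed %d exceeds GasWanted %d", i, r.GasUsed, r.GasWanted)
		}
		totalUsed += r.GasUsed
	}
	if maxGas > -1 && totalUsed > maxGas {
		return errors.New("sum of GasUsed exceeds MaxGas")
	}
	return nil
}

func main() {
	block := []TxResult{{GasWanted: 10, GasUsed: 6}, {GasWanted: 10, GasUsed: 6}}
	fmt.Println(CheckBlockGas(block, 20)) // <nil>
	fmt.Println(CheckBlockGas(block, 10)) // sum of GasUsed exceeds MaxGas
	fmt.Println(CheckBlockGas(block, -1)) // <nil>
}
```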
  158. In the future, we intend to add a `Priority` field to the responses that can be
  159. used to explicitly prioritize txs in the mempool for inclusion in a block
  160. proposal. See [#1861](https://github.com/tendermint/tendermint/issues/1861).
  161. ### CheckTx
If `Code != 0`, the transaction will be rejected from the mempool and hence
not broadcast to other peers and not included in a proposal block.
  164. `Data` contains the result of the CheckTx transaction execution, if any. It is
  165. semantically meaningless to Tendermint.
  166. `Events` include any events for the execution, though since the transaction has not
  167. been committed yet, they are effectively ignored by Tendermint.
  168. ### DeliverTx
  169. DeliverTx is the workhorse of the blockchain. Tendermint sends the
  170. DeliverTx requests asynchronously but in order, and relies on the
  171. underlying socket protocol (ie. TCP) to ensure they are received by the
  172. app in order. They have already been ordered in the global consensus by
  173. the Tendermint protocol.
  174. If DeliverTx returns `Code != 0`, the transaction will be considered invalid,
  175. though it is still included in the block.
DeliverTx returns an `abci.Result`, which includes a `Code`, `Data`, and `Log`.
`Data` contains the result of the DeliverTx transaction execution, if any. It is
semantically meaningless to Tendermint.
  179. Both the `Code` and `Data` are included in a structure that is hashed into the
  180. `LastResultsHash` of the next block header.
  181. `Events` include any events for the execution, which Tendermint will use to index
  182. the transaction by. This allows transactions to be queried according to what
  183. events took place during their execution.
  184. ## Validator Updates
  185. The application may set the validator set during InitChain, and update it during
  186. EndBlock.
  187. Note that the maximum total power of the validator set is bounded by
  188. `MaxTotalVotingPower = MaxInt64 / 8`. Applications are responsible for ensuring
  189. they do not make changes to the validator set that cause it to exceed this
  190. limit.
  191. Additionally, applications must ensure that a single set of updates does not contain any duplicates -
  192. a given public key can only appear in an update once. If an update includes
  193. duplicates, the block execution will fail irrecoverably.
  194. ### InitChain
  195. ResponseInitChain can return a list of validators.
  196. If the list is empty, Tendermint will use the validators loaded in the genesis
  197. file.
  198. If the list is not empty, Tendermint will use it for the validator set.
  199. This way the application can determine the initial validator set for the
  200. blockchain.
  201. ### EndBlock
  202. Updates to the Tendermint validator set can be made by returning
  203. `ValidatorUpdate` objects in the `ResponseEndBlock`:
```protobuf
message ValidatorUpdate {
  tendermint.crypto.keys.PublicKey pub_key = 1;
  int64                            power   = 2;
}

message PublicKey {
  oneof sum {
    bytes ed25519 = 1;
  }
}
```
  214. The `pub_key` currently supports only one type:
  215. - `type = "ed25519"`
  216. The `power` is the new voting power for the validator, with the
  217. following rules:
- power must be non-negative
- if power is 0, the validator must already exist, and will be removed from the
  validator set
- if power is non-0:
  - if the validator does not already exist, it will be added to the validator
    set with the given power
  - if the validator does already exist, its power will be adjusted to the given power
- the total power of the new validator set must not exceed MaxTotalVotingPower
  226. Note the updates returned in block `H` will only take effect at block `H+2`.
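
An application may want to validate a set of updates against these rules before returning it, since a bad update is fatal. The sketch below is one possible check; `ValidatorUpdate` is simplified here to a string key, and `ApplyUpdates` is a hypothetical helper, not a Tendermint function.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// MaxTotalVotingPower mirrors the bound stated in this document.
const MaxTotalVotingPower int64 = math.MaxInt64 / 8

// ValidatorUpdate is a simplified update keyed by an encoded public key.
type ValidatorUpdate struct {
	PubKey string
	Power  int64
}

// ApplyUpdates returns the validator set that results from applying updates
// to current, or an error if any of the rules above is violated. The input
// map is not modified.
func ApplyUpdates(current map[string]int64, updates []ValidatorUpdate) (map[string]int64, error) {
	next := make(map[string]int64, len(current))
	for k, v := range current {
		next[k] = v
	}
	seen := make(map[string]bool, len(updates))
	for _, u := range updates {
		if seen[u.PubKey] {
			return nil, fmt.Errorf("duplicate update for %s", u.PubKey)
		}
		seen[u.PubKey] = true
		switch {
		case u.Power < 0:
			return nil, fmt.Errorf("negative power for %s", u.PubKey)
		case u.Power == 0:
			if _, ok := next[u.PubKey]; !ok {
				return nil, fmt.Errorf("cannot remove unknown validator %s", u.PubKey)
			}
			delete(next, u.PubKey)
		default:
			next[u.PubKey] = u.Power // add or adjust
		}
	}
	var total int64
	for _, p := range next {
		// Compare against the remaining headroom to avoid integer overflow.
		if p > MaxTotalVotingPower-total {
			return nil, errors.New("total voting power exceeds MaxTotalVotingPower")
		}
		total += p
	}
	return next, nil
}

func main() {
	current := map[string]int64{"a": 10}
	next, err := ApplyUpdates(current, []ValidatorUpdate{{PubKey: "a", Power: 0}, {PubKey: "b", Power: 5}})
	fmt.Println(next, err) // map[b:5] <nil>
}
```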
  227. ## Consensus Parameters
  228. ConsensusParams enforce certain limits in the blockchain, like the maximum size
  229. of blocks, amount of gas used in a block, and the maximum acceptable age of
  230. evidence. They can be set in InitChain and updated in EndBlock.
  231. ### BlockParams.MaxBytes
The maximum size of a complete Protobuf-encoded block.
This is enforced by Tendermint consensus.
This implies a maximum tx size of `MaxBytes` less the expected size of
the header, the validator set, and any included evidence in the block.
Must have `0 < MaxBytes < 100 MB`.
  237. ### BlockParams.MaxGas
  238. The maximum of the sum of `GasWanted` in a proposed block.
  239. This is *not* enforced by Tendermint consensus.
  240. It is left to the app to enforce (ie. if txs are included past the
  241. limit, they should return non-zero codes). It is used by Tendermint to limit the
  242. txs included in a proposed block.
  243. Must have `MaxGas >= -1`.
  244. If `MaxGas == -1`, no limit is enforced.
  245. ### BlockParams.TimeIotaMs
  246. The minimum time between consecutive blocks (in milliseconds).
  247. This is enforced by Tendermint consensus.
  248. Must have `TimeIotaMs > 0` to ensure time monotonicity.
  249. > *Note: This is not exposed to the application*
  250. ### EvidenceParams.MaxAgeDuration
  251. This is the maximum age of evidence in time units.
  252. This is enforced by Tendermint consensus.
  253. If a block includes evidence older than this (AND the evidence was created more
  254. than `MaxAgeNumBlocks` ago), the block will be rejected (validators won't vote
  255. for it).
  256. Must have `MaxAgeDuration > 0`.
  257. ### EvidenceParams.MaxAgeNumBlocks
  258. This is the maximum age of evidence in blocks.
  259. This is enforced by Tendermint consensus.
  260. If a block includes evidence older than this (AND the evidence was created more
  261. than `MaxAgeDuration` ago), the block will be rejected (validators won't vote
  262. for it).
  263. Must have `MaxAgeNumBlocks > 0`.
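
As described in the two sections above, evidence is only considered too old when both limits are exceeded. A minimal sketch of that condition, with hypothetical type and function names, might look like this:

```go
package main

import (
	"fmt"
	"time"
)

// EvidenceParams holds the two age limits described above.
type EvidenceParams struct {
	MaxAgeNumBlocks int64
	MaxAgeDuration  time.Duration
}

// IsExpired reports whether evidence is too old to be included in a block.
// Both the age in blocks AND the age in time must exceed their limits.
func IsExpired(p EvidenceParams, ageBlocks int64, ageDuration time.Duration) bool {
	return ageBlocks > p.MaxAgeNumBlocks && ageDuration > p.MaxAgeDuration
}

func main() {
	p := EvidenceParams{MaxAgeNumBlocks: 100, MaxAgeDuration: 48 * time.Hour}
	fmt.Println(IsExpired(p, 150, 24*time.Hour)) // false: only old in blocks
	fmt.Println(IsExpired(p, 50, 72*time.Hour))  // false: only old in time
	fmt.Println(IsExpired(p, 150, 72*time.Hour)) // true: old in both
}
```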
  264. ### EvidenceParams.MaxNum
This is the maximum number of pieces of evidence that can be committed to a single block.
The product of this and the `MaxEvidenceBytes` must not exceed the size of
a block minus its overhead ( ~ `MaxBytes`).
The amount must be a positive number.
  269. ### Updates
  270. The application may set the ConsensusParams during InitChain, and update them during
  271. EndBlock. If the ConsensusParams is empty, it will be ignored. Each field
  272. that is not empty will be applied in full. For instance, if updating the
  273. Block.MaxBytes, applications must also set the other Block fields (like
  274. Block.MaxGas), even if they are unchanged, as they will otherwise cause the
  275. value to be updated to 0.
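
The "applied in full" behaviour can be illustrated with a small sketch. The types below are simplified, hypothetical stand-ins for the real parameter messages; only the replacement semantics are meant to match the description above.

```go
package main

import "fmt"

// BlockParams and EvidenceParams are simplified parameter groups.
type BlockParams struct {
	MaxBytes int64
	MaxGas   int64
}

type EvidenceParams struct {
	MaxAgeNumBlocks int64
}

// Params is the current set of consensus parameters.
type Params struct {
	Block    BlockParams
	Evidence EvidenceParams
}

// ParamsUpdate is an update in which nil groups mean "leave unchanged".
type ParamsUpdate struct {
	Block    *BlockParams
	Evidence *EvidenceParams
}

// Apply returns current with every non-nil group of update applied in full.
// Fields left unset inside a non-nil group overwrite the old value with 0.
func Apply(current Params, update *ParamsUpdate) Params {
	res := current
	if update == nil {
		return res
	}
	if update.Block != nil {
		res.Block = *update.Block
	}
	if update.Evidence != nil {
		res.Evidence = *update.Evidence
	}
	return res
}

func main() {
	current := Params{
		Block:    BlockParams{MaxBytes: 1000, MaxGas: 50},
		Evidence: EvidenceParams{MaxAgeNumBlocks: 10},
	}
	// Only MaxBytes is set, so MaxGas is silently reset to 0.
	got := Apply(current, &ParamsUpdate{Block: &BlockParams{MaxBytes: 2000}})
	fmt.Println(got.Block.MaxBytes, got.Block.MaxGas, got.Evidence.MaxAgeNumBlocks) // 2000 0 10
}
```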
  276. #### InitChain
ResponseInitChain includes a ConsensusParams.
If it's nil, Tendermint will use the params loaded in the genesis
file. If it's not nil, Tendermint will use it.
This way the application can determine the initial consensus params for the
blockchain.
  282. #### EndBlock
ResponseEndBlock includes a ConsensusParams.
If it's nil, Tendermint will do nothing.
If it's not nil, Tendermint will use it.
This way the application can update the consensus params over time.
  287. Note the updates returned in block `H` will take effect right away for block
  288. `H+1`.
  289. ## Query
  290. Query is a generic method with lots of flexibility to enable diverse sets
  291. of queries on application state. Tendermint makes use of Query to filter new peers
  292. based on ID and IP, and exposes Query to the user over RPC.
  293. Note that calls to Query are not replicated across nodes, but rather query the
  294. local node's state - hence they may return stale reads. For reads that require
  295. consensus, use a transaction.
  296. The most important use of Query is to return Merkle proofs of the application state at some height
  297. that can be used for efficient application-specific light-clients.
Note that Tendermint technically has no requirements of the Query
message for normal operation - that is, the ABCI app developer need not implement
Query functionality if they do not wish to.
  301. ### Query Proofs
  302. The Tendermint block header includes a number of hashes, each providing an
  303. anchor for some type of proof about the blockchain. The `ValidatorsHash` enables
  304. quick verification of the validator set, the `DataHash` gives quick
  305. verification of the transactions included in the block, etc.
  306. The `AppHash` is unique in that it is application specific, and allows for
  307. application-specific Merkle proofs about the state of the application.
  308. While some applications keep all relevant state in the transactions themselves
  309. (like Bitcoin and its UTXOs), others maintain a separated state that is
  310. computed deterministically *from* transactions, but is not contained directly in
  311. the transactions themselves (like Ethereum contracts and accounts).
  312. For such applications, the `AppHash` provides a much more efficient way to verify light-client proofs.
  313. ABCI applications can take advantage of more efficient light-client proofs for
  314. their state as follows:
  315. - return the Merkle root of the deterministic application state in
  316. `ResponseCommit.Data`.
  317. - it will be included as the `AppHash` in the next block.
  318. - return efficient Merkle proofs about that application state in `ResponseQuery.Proof`
  319. that can be verified using the `AppHash` of the corresponding block.
  320. For instance, this allows an application's light-client to verify proofs of
  321. absence in the application state, something which is much less efficient to do using the block hash.
  322. Some applications (eg. Ethereum, Cosmos-SDK) have multiple "levels" of Merkle trees,
  323. where the leaves of one tree are the root hashes of others. To support this, and
  324. the general variability in Merkle proofs, the `ResponseQuery.Proof` has some minimal structure:
```protobuf
message ProofOps {
  repeated ProofOp ops = 1;
}

message ProofOp {
  string type = 1;
  bytes key = 2;
  bytes data = 3;
}
```
  335. Each `ProofOp` contains a proof for a single key in a single Merkle tree, of the specified `type`.
  336. This allows ABCI to support many different kinds of Merkle trees, encoding
  337. formats, and proofs (eg. of presence and absence) just by varying the `type`.
  338. The `data` contains the actual encoded proof, encoded according to the `type`.
  339. When verifying the full proof, the root hash for one ProofOp is the value being
  340. verified for the next ProofOp in the list. The root hash of the final ProofOp in
  341. the list should match the `AppHash` being verified against.
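
The chaining rule can be sketched as follows. The proof type used here (`toy:left-leaf`, a two-leaf tree where `data` is the sibling hash) is invented purely for illustration; real applications define their own `type` values and encodings, and `runOp` and `VerifyProof` are hypothetical helpers.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"errors"
	"fmt"
)

// ProofOp mirrors the fields of the message above.
type ProofOp struct {
	Type string
	Key  []byte
	Data []byte
}

// toyOpType is a made-up proof type for this example only.
const toyOpType = "toy:left-leaf"

// runOp computes the root implied by op for the given value:
// root = SHA-256(SHA-256(key || value) || data), with data as sibling hash.
func runOp(op ProofOp, value []byte) ([]byte, error) {
	if op.Type != toyOpType {
		return nil, fmt.Errorf("unknown proof op type %q", op.Type)
	}
	leaf := sha256.Sum256(append(append([]byte{}, op.Key...), value...))
	root := sha256.Sum256(append(leaf[:], op.Data...))
	return root[:], nil
}

// VerifyProof runs the ops in order, feeding each root hash into the next
// op as its value, and compares the final root with appHash.
func VerifyProof(ops []ProofOp, value, appHash []byte) error {
	if len(ops) == 0 {
		return errors.New("empty proof")
	}
	current := value
	for _, op := range ops {
		root, err := runOp(op, current)
		if err != nil {
			return err
		}
		current = root
	}
	if !bytes.Equal(current, appHash) {
		return errors.New("proof does not match AppHash")
	}
	return nil
}

func main() {
	value := []byte("42")
	inner := ProofOp{Type: toyOpType, Key: []byte("balance"), Data: []byte("sibling-1")}
	outer := ProofOp{Type: toyOpType, Key: []byte("bank"), Data: []byte("sibling-2")}
	// Build the expected AppHash for a two-level tree.
	storeRoot, _ := runOp(inner, value)
	appHash, _ := runOp(outer, storeRoot)
	fmt.Println(VerifyProof([]ProofOp{inner, outer}, value, appHash))        // <nil>
	fmt.Println(VerifyProof([]ProofOp{inner, outer}, []byte("43"), appHash)) // proof does not match AppHash
}
```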
  342. ### Peer Filtering
  343. When Tendermint connects to a peer, it sends two queries to the ABCI application
  344. using the following paths, with no additional data:
- `/p2p/filter/addr/<IP:PORT>`, where `<IP:PORT>` denotes the IP address and
  the port of the connection
- `/p2p/filter/id/<ID>`, where `<ID>` is the peer node ID (ie. the
  pubkey.Address() for the peer's PubKey)
  349. If either of these queries return a non-zero ABCI code, Tendermint will refuse
  350. to connect to the peer.
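
A Query handler that implements peer filtering might look roughly like the sketch below. `PeerFilter` and its deny lists are hypothetical; the handler tolerates paths with or without a leading slash, and returns `0` for any path it does not recognise.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// PeerFilter holds hypothetical deny lists for peer IPs and node IDs.
type PeerFilter struct {
	bannedIPs map[string]bool
	bannedIDs map[string]bool
}

// NewPeerFilter builds a filter from lists of banned IPs and node IDs.
func NewPeerFilter(ips, ids []string) *PeerFilter {
	f := &PeerFilter{bannedIPs: map[string]bool{}, bannedIDs: map[string]bool{}}
	for _, ip := range ips {
		f.bannedIPs[ip] = true
	}
	for _, id := range ids {
		f.bannedIDs[id] = true
	}
	return f
}

// Query returns a non-zero code if the peer named in the path is banned.
func (f *PeerFilter) Query(path string) uint32 {
	const addrPrefix = "/p2p/filter/addr/"
	const idPrefix = "/p2p/filter/id/"
	if !strings.HasPrefix(path, "/") {
		path = "/" + path
	}
	switch {
	case strings.HasPrefix(path, addrPrefix):
		hostport := strings.TrimPrefix(path, addrPrefix)
		host, _, err := net.SplitHostPort(hostport)
		if err != nil {
			host = hostport // no port present; treat the whole string as the host
		}
		if f.bannedIPs[host] {
			return 1
		}
	case strings.HasPrefix(path, idPrefix):
		if f.bannedIDs[strings.TrimPrefix(path, idPrefix)] {
			return 1
		}
	}
	return 0
}

func main() {
	f := NewPeerFilter([]string{"10.0.0.1"}, []string{"deadbeef"})
	fmt.Println(f.Query("/p2p/filter/addr/10.0.0.1:26656")) // 1
	fmt.Println(f.Query("/p2p/filter/addr/10.0.0.2:26656")) // 0
	fmt.Println(f.Query("/p2p/filter/id/deadbeef"))         // 1
}
```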
  351. ### Paths
  352. Queries are directed at paths, and may optionally include additional data.
  353. The expectation is for there to be some number of high level paths
  354. differentiating concerns, like `/p2p`, `/store`, and `/app`. Currently,
  355. Tendermint only uses `/p2p`, for filtering peers. For more advanced use, see the
  356. implementation of
  357. [Query in the Cosmos-SDK](https://github.com/cosmos/cosmos-sdk/blob/v0.23.1/baseapp/baseapp.go#L333).
  358. ## Crash Recovery
On startup, Tendermint calls the `Info` method on the Info Connection to get the latest
committed state of the app. The app MUST return information consistent with the
last block it successfully completed Commit for.
If the app successfully committed block H but not H+1, then `last_block_height = H` and `last_block_app_hash = <hash returned by Commit for block H>`. If the app
failed during the Commit of block H, then `last_block_height = H-1` and
`last_block_app_hash = <hash returned by Commit for block H-1, which is the hash in the header of block H>`.
  365. We now distinguish three heights, and describe how Tendermint syncs itself with
  366. the app.
```md
storeBlockHeight = height of the last block Tendermint saw a commit for
stateBlockHeight = height of the last block for which Tendermint completed all
    block processing and saved all ABCI results to disk
appBlockHeight = height of the last block for which ABCI app successfully
    completed Commit
```
Note we always have `storeBlockHeight >= stateBlockHeight` and `storeBlockHeight >= appBlockHeight`.
Note also we never call Commit on an ABCI app twice for the same height.
  376. The procedure is as follows.
First, some simple start conditions:

If `appBlockHeight == 0`, then call InitChain.

If `storeBlockHeight == 0`, we're done.

Now, some sanity checks:

- If `storeBlockHeight < appBlockHeight`, error.
- If `storeBlockHeight < stateBlockHeight`, panic.
- If `storeBlockHeight > stateBlockHeight+1`, panic.

Now, the meat:

If `storeBlockHeight == stateBlockHeight && appBlockHeight < storeBlockHeight`,
replay all blocks in full from `appBlockHeight` to `storeBlockHeight`.
This happens if we completed processing the block, but the app forgot its height.

If `storeBlockHeight == stateBlockHeight && appBlockHeight == storeBlockHeight`, we're done.
This happens if we crashed at an opportune spot.

If `storeBlockHeight == stateBlockHeight+1`, we started processing the block but didn't finish.
There are three cases:

- If `appBlockHeight < stateBlockHeight`,
  replay all blocks in full from `appBlockHeight` to `storeBlockHeight-1`,
  and replay the block at `storeBlockHeight` using the WAL.
  This happens if the app forgot the last block it committed.
- If `appBlockHeight == stateBlockHeight`,
  replay the last block (`storeBlockHeight`) in full.
  This happens if we crashed before the app finished Commit.
- If `appBlockHeight == storeBlockHeight`,
  update the state using the saved ABCI responses but don't run the block against the real app.
  This happens if we crashed after the app finished Commit but before Tendermint saved the state.
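
The decision procedure above can be summarised as a single function. This is a sketch for illustration, not Tendermint's actual handshake code: `PlanHandshake`, `Plan` and the `Action` values are hypothetical, and every failed sanity check is returned as an error rather than distinguishing errors from panics.

```go
package main

import (
	"errors"
	"fmt"
)

// Action names the replay step chosen by the handshake.
type Action string

const (
	ActionNone                Action = "nothing to replay"
	ActionReplayBlocks        Action = "replay blocks in full up to storeBlockHeight"
	ActionReplayBlocksThenWAL Action = "replay blocks in full up to storeBlockHeight-1, then the last block from the WAL"
	ActionReplayLastBlock     Action = "replay the last block in full"
	ActionApplySavedResponses Action = "update state from saved ABCI responses"
)

// Plan is the outcome of the handshake decision.
type Plan struct {
	InitChain bool
	Action    Action
}

// PlanHandshake maps the three heights to the step described above.
func PlanHandshake(store, state, app int64) (Plan, error) {
	plan := Plan{InitChain: app == 0, Action: ActionNone}
	if store == 0 {
		return plan, nil
	}
	switch {
	case store < app:
		return plan, errors.New("app is ahead of the block store")
	case store < state:
		return plan, errors.New("state is ahead of the block store")
	case store > state+1:
		return plan, errors.New("block store is more than one block ahead of state")
	}
	if store == state {
		if app < store {
			plan.Action = ActionReplayBlocks
		}
		return plan, nil
	}
	// From here on, store == state+1.
	switch {
	case app < state:
		plan.Action = ActionReplayBlocksThenWAL
	case app == state:
		plan.Action = ActionReplayLastBlock
	default: // app == store
		plan.Action = ActionApplySavedResponses
	}
	return plan, nil
}

func main() {
	plan, err := PlanHandshake(5, 4, 4)
	fmt.Println(plan.Action, err) // replay the last block in full <nil>
}
```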
  402. ## State Sync
  403. A new node joining the network can simply join consensus at the genesis height and replay all
  404. historical blocks until it is caught up. However, for large chains this can take a significant
  405. amount of time, often on the order of days or weeks.
  406. State sync is an alternative mechanism for bootstrapping a new node, where it fetches a snapshot
  407. of the state machine at a given height and restores it. Depending on the application, this can
  408. be several orders of magnitude faster than replaying blocks.
  409. Note that state sync does not currently backfill historical blocks, so the node will have a
  410. truncated block history - users are advised to consider the broader network implications of this in
  411. terms of block availability and auditability. This functionality may be added in the future.
  412. For details on the specific ABCI calls and types, see the [methods and types section](abci.md).
  413. ### Taking Snapshots
  414. Applications that want to support state syncing must take state snapshots at regular intervals. How
  415. this is accomplished is entirely up to the application. A snapshot consists of some metadata and
  416. a set of binary chunks in an arbitrary format:
  417. - `Height (uint64)`: The height at which the snapshot is taken. It must be taken after the given
  418. height has been committed, and must not contain data from any later heights.
  419. - `Format (uint32)`: An arbitrary snapshot format identifier. This can be used to version snapshot
  420. formats, e.g. to switch from Protobuf to MessagePack for serialization. The application can use
  421. this when restoring to choose whether to accept or reject a snapshot.
  422. - `Chunks (uint32)`: The number of chunks in the snapshot. Each chunk contains arbitrary binary
  423. data, and should be less than 16 MB; 10 MB is a good starting point.
  424. - `Hash ([]byte)`: An arbitrary hash of the snapshot. This is used to check whether a snapshot is
  425. the same across nodes when downloading chunks.
  426. - `Metadata ([]byte)`: Arbitrary snapshot metadata, e.g. chunk hashes for verification or any other
  427. necessary info.
  428. For a snapshot to be considered the same across nodes, all of these fields must be identical. When
  429. sent across the network, snapshot metadata messages are limited to 4 MB.
  430. When a new node is running state sync and discovering snapshots, Tendermint will query an existing
  431. application via the ABCI `ListSnapshots` method to discover available snapshots, and load binary
  432. snapshot chunks via `LoadSnapshotChunk`. The application is free to choose how to implement this
  433. and which formats to use, but should provide the following guarantees:
  434. - **Consistent:** A snapshot should be taken at a single isolated height, unaffected by
  435. concurrent writes. This can e.g. be accomplished by using a data store that supports ACID
  436. transactions with snapshot isolation.
  437. - **Asynchronous:** Taking a snapshot can be time-consuming, so it should not halt chain progress,
  438. for example by running in a separate thread.
  439. - **Deterministic:** A snapshot taken at the same height in the same format should be identical
  440. (at the byte level) across nodes, including all metadata. This ensures good availability of
  441. chunks, and that they fit together across nodes.
  442. A very basic approach might be to use a datastore with MVCC transactions (such as RocksDB),
  443. start a transaction immediately after block commit, and spawn a new thread which is passed the
  444. transaction handle. This thread can then export all data items, serialize them using e.g.
  445. Protobuf, hash the byte stream, split it into chunks, and store the chunks in the file system
  446. along with some metadata - all while the blockchain is applying new blocks in parallel.
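
The hashing and chunking steps of this basic approach could look roughly like the following. It assumes the state has already been exported and serialized into one byte slice; `MakeSnapshot` is a hypothetical helper, and storing the per-chunk SHA-256 hashes in `Metadata` is just one possible use of that field.

```go
package main

import (
	"crypto/sha256"
	"errors"
	"fmt"
)

// Snapshot mirrors the metadata fields listed above.
type Snapshot struct {
	Height   uint64
	Format   uint32
	Chunks   uint32
	Hash     []byte
	Metadata []byte
}

// MakeSnapshot splits the serialized state into chunks of at most chunkSize
// bytes. Hash is the SHA-256 of the whole byte stream, and Metadata is the
// concatenation of the SHA-256 hashes of the individual chunks.
// The returned chunks share memory with state.
func MakeSnapshot(height uint64, format uint32, state []byte, chunkSize int) (Snapshot, [][]byte, error) {
	if chunkSize <= 0 {
		return Snapshot{}, nil, errors.New("chunk size must be positive")
	}
	var chunks [][]byte
	var metadata []byte
	for start := 0; start < len(state); start += chunkSize {
		end := start + chunkSize
		if end > len(state) {
			end = len(state)
		}
		chunk := state[start:end]
		sum := sha256.Sum256(chunk)
		metadata = append(metadata, sum[:]...)
		chunks = append(chunks, chunk)
	}
	hash := sha256.Sum256(state)
	snapshot := Snapshot{
		Height:   height,
		Format:   format,
		Chunks:   uint32(len(chunks)),
		Hash:     hash[:],
		Metadata: metadata,
	}
	return snapshot, chunks, nil
}

func main() {
	state := []byte("0123456789012345678901234") // 25 bytes of serialized state
	snapshot, chunks, err := MakeSnapshot(7, 1, state, 10)
	fmt.Println(snapshot.Chunks, len(chunks[2]), err) // 3 5 <nil>
}
```

Because the output depends only on the input bytes, two nodes that serialize the same state identically will produce identical snapshots, which is the determinism guarantee described above.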
  447. A more advanced approach might include incremental verification of individual chunks against the
  448. chain app hash, parallel or batched exports, compression, and so on.
  449. Old snapshots should be removed after some time - generally only the last two snapshots are needed
  450. (to prevent the last one from being removed while a node is restoring it).
  451. ### Bootstrapping a Node
  452. An empty node can be state synced by setting the configuration option `statesync.enabled =
  453. true`. The node also needs the chain genesis file for basic chain info, and configuration for
  454. light client verification of the restored snapshot: a set of Tendermint RPC servers, and a
  455. trusted header hash and corresponding height from a trusted source, via the `statesync`
  456. configuration section.
  457. Once started, the node will connect to the P2P network and begin discovering snapshots. These
  458. will be offered to the local application, and once a snapshot is accepted Tendermint will fetch
  459. and apply the snapshot chunks. After all chunks have been successfully applied, Tendermint verifies
  460. the app's `AppHash` against the chain using the light client, then switches the node to normal
  461. consensus operation.
  462. #### Snapshot Discovery
When the empty node joins the P2P network, it asks all peers to report snapshots via the
`ListSnapshots` ABCI call (limited to 10 per node). After some time, the node picks the most
suitable snapshot (generally prioritized by height, format, and number of peers), and offers it
to the application via `OfferSnapshot`. The application can choose a number of responses,
including accepting or rejecting it, rejecting the offered format, rejecting the peer who sent
it, and so on. Tendermint will keep discovering and offering snapshots until one is accepted or
the application aborts.
  470. #### Snapshot Restoration
  471. Once a snapshot has been accepted via `OfferSnapshot`, Tendermint begins downloading chunks from
  472. any peers that have the same snapshot (i.e. that have identical metadata fields). Chunks are
  473. spooled in a temporary directory, and then given to the application in sequential order via
  474. `ApplySnapshotChunk` until all chunks have been accepted.
  475. As with taking snapshots, the method for restoring them is entirely up to the application, but will
  476. generally be the inverse of how they are taken.
  477. During restoration, the application can respond to `ApplySnapshotChunk` with instructions for how
  478. to continue. This will typically be to accept the chunk and await the next one, but it can also
  479. ask for chunks to be refetched (either the current one or any number of previous ones), P2P peers
  480. to be banned, snapshots to be rejected or retried, and a number of other responses - see the ABCI
  481. reference for details.
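
As an illustration of such responses, the sketch below accepts chunks in order and asks for a refetch when a chunk does not match its expected hash. It assumes, as an example design, that the snapshot metadata carries concatenated SHA-256 chunk hashes; `Restorer`, `HashChunks` and the result constants are hypothetical stand-ins, and the metadata itself is untrusted until the final app hash has been verified.

```go
package main

import (
	"crypto/sha256"
	"errors"
	"fmt"
)

// ApplyResult is a local stand-in for the ApplySnapshotChunk result codes.
type ApplyResult int

const (
	Accept ApplyResult = iota // chunk applied, send the next one
	Retry                     // chunk corrupt, refetch it
	RejectSnapshot            // snapshot unusable, try another
)

// Restorer applies chunks sequentially and checks them against chunk hashes.
type Restorer struct {
	chunkHashes [][sha256.Size]byte
	next        uint32
	restored    []byte
}

// HashChunks builds the metadata format assumed by this example.
func HashChunks(chunks [][]byte) []byte {
	var metadata []byte
	for _, c := range chunks {
		sum := sha256.Sum256(c)
		metadata = append(metadata, sum[:]...)
	}
	return metadata
}

// NewRestorer parses the concatenated chunk hashes from the metadata.
func NewRestorer(metadata []byte) (*Restorer, error) {
	if len(metadata)%sha256.Size != 0 {
		return nil, errors.New("metadata is not a list of SHA-256 hashes")
	}
	r := &Restorer{}
	for i := 0; i < len(metadata); i += sha256.Size {
		var h [sha256.Size]byte
		copy(h[:], metadata[i:i+sha256.Size])
		r.chunkHashes = append(r.chunkHashes, h)
	}
	return r, nil
}

// ApplyChunk verifies and applies one chunk, returning how to continue.
func (r *Restorer) ApplyChunk(index uint32, chunk []byte) ApplyResult {
	if index != r.next || int(index) >= len(r.chunkHashes) {
		return RejectSnapshot
	}
	if sha256.Sum256(chunk) != r.chunkHashes[index] {
		return Retry
	}
	r.restored = append(r.restored, chunk...)
	r.next++
	return Accept
}

// Done reports whether every expected chunk has been applied.
func (r *Restorer) Done() bool {
	return int(r.next) == len(r.chunkHashes)
}

func main() {
	chunks := [][]byte{[]byte("first"), []byte("second")}
	r, _ := NewRestorer(HashChunks(chunks))
	fmt.Println(r.ApplyChunk(0, []byte("corrupt")) == Retry) // true
	fmt.Println(r.ApplyChunk(0, chunks[0]) == Accept)        // true
	fmt.Println(r.ApplyChunk(1, chunks[1]) == Accept)        // true
	fmt.Println(r.Done())                                    // true
}
```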
  482. If Tendermint fails to fetch a chunk after some time, it will reject the snapshot and try a
  483. different one via `OfferSnapshot` - the application can choose whether it wants to support
  484. restarting restoration, or simply abort with an error.
  485. #### Snapshot Verification
  486. Once all chunks have been accepted, Tendermint issues an `Info` ABCI call to retrieve the
  487. `LastBlockAppHash`. This is compared with the trusted app hash from the chain, retrieved and
  488. verified using the light client. Tendermint also checks that `LastBlockHeight` corresponds to the
  489. height of the snapshot.
  490. This verification ensures that an application is valid before joining the network. However, the
  491. snapshot restoration may take a long time to complete, so applications may want to employ additional
  492. verification during the restore to detect failures early. This might e.g. include incremental
  493. verification of each chunk against the app hash (using bundled Merkle proofs), checksums to
  494. protect against data corruption by the disk or network, and so on. However, it is important to
  495. note that the only trusted information available is the app hash, and all other snapshot metadata
  496. can be spoofed by adversaries.
  497. Apps may also want to consider state sync denial-of-service vectors, where adversaries provide
  498. invalid or harmful snapshots to prevent nodes from joining the network. The application can
  499. counteract this by asking Tendermint to ban peers. As a last resort, node operators can use
  500. P2P configuration options to whitelist a set of trusted peers that can provide valid snapshots.
  501. #### Transition to Consensus
  502. Once the snapshot has been restored, Tendermint gathers additional information necessary for
  503. bootstrapping the node (e.g. chain ID, consensus parameters, validator sets, and block headers)
  504. from the genesis file and light client RPC servers. It also fetches and records the `AppVersion`
  505. from the ABCI application.
Once the node is bootstrapped with this information and the restored state machine, it transitions
to fast sync (if enabled) to fetch any remaining blocks up to the chain head, and then
transitions to regular consensus operation. At this point the node operates like any other node,
apart from having a truncated block history at the height of the restored snapshot.