  1. ---
  2. order: 1
  3. ---
  4. # Byzantine Consensus Algorithm
  5. ## Terms
  6. - The network is composed of optionally connected _nodes_. Nodes
  7. directly connected to a particular node are called _peers_.
  8. - The consensus process in deciding the next block (at some _height_
  9. `H`) is composed of one or many _rounds_.
  10. - `NewHeight`, `Propose`, `Prevote`, `Precommit`, and `Commit`
  11. represent state machine states of a round. (aka `RoundStep` or
  12. just "step").
  13. - A node is said to be _at_ a given height, round, and step, or at
  14. `(H,R,S)`, or at `(H,R)` in short to omit the step.
  15. - To _prevote_ or _precommit_ something means to broadcast a [prevote
  16. vote](https://godoc.org/github.com/tendermint/tendermint/types#Vote)
  17. or [first precommit
  18. vote](https://godoc.org/github.com/tendermint/tendermint/types#FirstPrecommit)
  19. for something.
  20. - A vote _at_ `(H,R)` is a vote signed with the bytes for `H` and `R`
  21. included in its [sign-bytes](../core/data_structures.md#vote).
  22. - _+2/3_ is short for "more than 2/3"
  23. - _1/3+_ is short for "1/3 or more"
  24. - A set of +2/3 of prevotes for a particular block or `<nil>` at
  25. `(H,R)` is called a _proof-of-lock-change_ or _PoLC_ for short.
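The two thresholds defined above can be checked with integer arithmetic, which avoids rounding errors. This is a minimal sketch, assuming voting power is a non-negative integer; the function names are hypothetical and are not taken from the Tendermint code base.

```python
def has_two_thirds_majority(power: int, total_power: int) -> bool:
    """True if `power` is strictly more than 2/3 of `total_power` (+2/3)."""
    return 3 * power > 2 * total_power


def has_one_third(power: int, total_power: int) -> bool:
    """True if `power` is 1/3 or more of `total_power` (1/3+)."""
    return 3 * power >= total_power
```

Under these definitions, a set of prevotes for one block (or for `<nil>`) at `(H,R)` would count as a PoLC exactly when `has_two_thirds_majority` holds for its combined voting power.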
  26. ## State Machine Overview
  27. At each height of the blockchain a round-based protocol is run to
  28. determine the next block. Each round is composed of three _steps_
  29. (`Propose`, `Prevote`, and `Precommit`), along with two special steps
  30. `Commit` and `NewHeight`.
  31. In the optimal scenario, the order of steps is:
  32. ```md
  33. NewHeight -> (Propose -> Prevote -> Precommit)+ -> Commit -> NewHeight ->...
  34. ```
  35. The sequence `(Propose -> Prevote -> Precommit)` is called a _round_.
  36. There may be more than one round required to commit a block at a given
  37. height. Examples of why more rounds may be required include:
  38. - The designated proposer was not online.
  39. - The block proposed by the designated proposer was not valid.
  40. - The block proposed by the designated proposer did not propagate
  41. in time.
  42. - The block proposed was valid, but +2/3 of prevotes for the proposed
  43. block were not received in time for enough validator nodes by the
  44. time they reached the `Precommit` step. Even though +2/3 of prevotes
  45. are necessary to progress to the next step, at least one validator
  46. may have voted `<nil>` or maliciously voted for something else.
  47. - The block proposed was valid, and +2/3 of prevotes were received for
  48. enough nodes, but +2/3 of precommits for the proposed block were not
  49. received for enough validator nodes.
  50. Some of these problems are resolved by moving on to the next round and
  51. proposer. Others are resolved by increasing certain round timeout
  52. parameters over each successive round.
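As an illustration of a timeout that grows with each round, the sketch below uses a linear schedule. The text above only says that timeouts increase over successive rounds; the linear form and the parameter names `base_ms` and `delta_ms` are assumptions made for this example.

```python
def round_timeout_ms(base_ms: int, delta_ms: int, round_number: int) -> int:
    """Timeout for a step in the given round (assumed linear growth).

    Round 0 uses `base_ms`; each later round waits `delta_ms` longer,
    giving slow proposals and votes more time to propagate.
    """
    if round_number < 0:
        raise ValueError("round_number must be non-negative")
    return base_ms + delta_ms * round_number
```

For example, with a hypothetical base of 3000 ms and a step of 500 ms, round 2 would wait 4000 ms.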
  53. ## State Machine Diagram
  54. ```md
  55.                          +-------------------------------------+
  56.                          v                                     |(Wait til `CommitTime+timeoutCommit`)
  57.                    +-----------+                         +-----+-----+
  58.       +----------> |  Propose  +--------------+          | NewHeight |
  59.       |            +-----------+              |          +-----------+
  60.       |                                       |                ^
  61.       |(Else, after timeoutPrecommit)         v                |
  62. +-----+-----+                           +-----------+          |
  63. | Precommit |  <------------------------+  Prevote  |          |
  64. +-----+-----+                           +-----------+          |
  65.       |(When +2/3 Precommits for block found)                  |
  66.       v                                                        |
  67. +--------------------------------------------------------------------+
  68. |  Commit                                                            |
  69. |                                                                    |
  70. |  * Set CommitTime = now;                                           |
  71. |  * Wait for block, then stage/save/commit block;                   |
  72. +--------------------------------------------------------------------+
  73. ```
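The edges drawn in the diagram can also be written down as a transition table. This is a minimal sketch covering only the arrows shown in the diagram; it does not model the round-skipping exits described later in the spec, and the names `Step`, `TRANSITIONS`, and `can_transition` are hypothetical.

```python
from enum import Enum


class Step(Enum):
    NEW_HEIGHT = "NewHeight"
    PROPOSE = "Propose"
    PREVOTE = "Prevote"
    PRECOMMIT = "Precommit"
    COMMIT = "Commit"


# Each key maps to the steps reachable by one arrow in the diagram.
TRANSITIONS = {
    Step.NEW_HEIGHT: {Step.PROPOSE},            # after CommitTime+timeoutCommit
    Step.PROPOSE: {Step.PREVOTE},
    Step.PREVOTE: {Step.PRECOMMIT},
    Step.PRECOMMIT: {Step.PROPOSE, Step.COMMIT},  # next round, or commit on +2/3
    Step.COMMIT: {Step.NEW_HEIGHT},
}


def can_transition(src: Step, dst: Step) -> bool:
    """True if the diagram has an arrow from `src` to `dst`."""
    return dst in TRANSITIONS[src]
```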
  74. ## Background Gossip
  75. A node may not have a corresponding validator private key, but it
  76. nevertheless plays an active role in the consensus process by relaying
  77. relevant meta-data, proposals, blocks, and votes to its peers. A node
  78. that has the private keys of an active validator and is engaged in
  79. signing votes is called a _validator-node_. All nodes (not just
  80. validator-nodes) have an associated state (the current height, round,
  81. and step) and work to make progress.
  82. Between two nodes there exists a `Connection`, and multiplexed on top of
  83. this connection are fairly throttled `Channel`s of information. An
  84. epidemic gossip protocol is implemented among some of these channels to
  85. bring peers up to speed on the most recent state of consensus. For
  86. example,
  87. - Nodes gossip `PartSet` parts of the current round's proposer's
  88. proposed block. A LibSwift-inspired algorithm is used to quickly
  89. broadcast blocks across the gossip network.
  90. - Nodes gossip prevote/precommit votes. A node `NODE_A` that is ahead
  91. of `NODE_B` can send `NODE_B` prevotes or precommits for `NODE_B`'s
  92. current (or future) round to enable it to progress forward.
  93. - Nodes gossip prevotes for the proposed PoLC (proof-of-lock-change)
  94. round if one is proposed.
  95. - Nodes gossip to nodes lagging in blockchain height with block
  96. [commits](https://godoc.org/github.com/tendermint/tendermint/types#Commit)
  97. for older blocks.
  98. - Nodes opportunistically gossip `HasVote` messages to hint peers what
  99. votes they already have.
  100. - Nodes broadcast their current state to all neighboring peers. (This
  101. state is not gossiped further.)
  102. There's more, but let's not get ahead of ourselves here.
  103. ## Proposals
  104. A proposal is signed and published by the designated proposer at each
  105. round. The proposer is chosen by a deterministic and non-choking round
  106. robin selection algorithm that selects proposers in proportion to their
  107. voting power (see
  108. [implementation](https://github.com/tendermint/tendermint/blob/master/types/validator_set.go)).
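To illustrate what selecting proposers in proportion to voting power can look like, here is a simplified weighted round-robin sketch. It is not the linked implementation: the priority-accumulation scheme, the tie-break by name, and the function name `proposer_sequence` are assumptions chosen to keep the example short and deterministic.

```python
from typing import Dict, List


def proposer_sequence(voting_power: Dict[str, int], rounds: int) -> List[str]:
    """Return the proposer chosen in each of `rounds` consecutive rounds.

    Every round, each validator's priority grows by its voting power.
    The validator with the highest priority proposes and then has the
    total voting power subtracted, so nobody is starved and each
    validator proposes in proportion to its power.
    """
    total = sum(voting_power.values())
    priority = {name: 0 for name in voting_power}
    sequence = []
    for _ in range(rounds):
        for name, power in voting_power.items():
            priority[name] += power
        # Highest priority wins; ties go to the smallest name (assumed rule).
        proposer = min(priority, key=lambda n: (-priority[n], n))
        priority[proposer] -= total
        sequence.append(proposer)
    return sequence
```

With voting powers `{"A": 1, "B": 2, "C": 3}`, six rounds of this sketch pick `C, B, A, C, B, C`, so each validator proposes as many times as it has units of power.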
  109. A proposal at `(H,R)` is composed of a block and an optional latest
  110. `PoLC-Round < R` which is included iff the proposer knows of one. This
  111. hints the network to allow nodes to unlock (when safe) to ensure the
  112. liveness property.
  113. ## State Machine Spec
  114. ### Propose Step (height:H,round:R)
  115. Upon entering `Propose`:
  116. - The designated proposer proposes a block at `(H,R)`.
  117. The `Propose` step ends:
  118. - After `timeoutProposeR` after entering `Propose`. --> goto
  119. `Prevote(H,R)`
  120. - After receiving proposal block and all prevotes at `PoLC-Round`. -->
  121. goto `Prevote(H,R)`
  122. - After [common exit conditions](#common-exit-conditions)
  123. ### Prevote Step (height:H,round:R)
  124. Upon entering `Prevote`, each validator broadcasts its prevote vote.
  125. - First, if the validator is locked on a block since `LastLockRound`
  126. but now has a PoLC for something else at round `PoLC-Round` where
  127. `LastLockRound < PoLC-Round < R`, then it unlocks.
  128. - If the validator is still locked on a block, it prevotes that.
  129. - Else, if the proposed block from `Propose(H,R)` is good, it
  130. prevotes that.
  131. - Else, if the proposal is invalid or wasn't received on time, it
  132. prevotes `<nil>`.
  133. The `Prevote` step ends:
  134. - After +2/3 prevotes for a particular block or `<nil>`. --> goto
  135. `Precommit(H,R)`
  136. - After `timeoutPrevote` after receiving any +2/3 prevotes. --> goto
  137. `Precommit(H,R)`
  138. - After [common exit conditions](#common-exit-conditions)
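The prevote rules above can be sketched as a single decision function. This is an illustrative sketch, not the Tendermint implementation: blocks are represented by plain strings, `None` stands for `<nil>`, and `polc_round` is assumed to be the round of a known PoLC for something other than the locked block (or `None` if there is none).

```python
from typing import Optional, Tuple

NIL = None  # stands for the `<nil>` vote


def choose_prevote(
    locked_block: Optional[str],
    last_lock_round: int,
    polc_round: Optional[int],
    current_round: int,
    proposal: Optional[str],
    proposal_is_valid: bool,
) -> Tuple[Optional[str], Optional[str]]:
    """Return (prevote, locked_block_after) for a validator entering Prevote."""
    # Unlock if a PoLC for something else exists at a round strictly
    # between the last lock round and the current round.
    if (
        locked_block is not None
        and polc_round is not None
        and last_lock_round < polc_round < current_round
    ):
        locked_block = None
    # Still locked: prevote the locked block.
    if locked_block is not None:
        return locked_block, locked_block
    # Not locked and the proposal is good: prevote it.
    if proposal is not None and proposal_is_valid:
        return proposal, locked_block
    # Proposal invalid or not received in time: prevote <nil>.
    return NIL, locked_block
```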
  139. ### Precommit Step (height:H,round:R)
  140. Upon entering `Precommit`, each validator broadcasts its precommit vote.
  141. - If the validator has a PoLC at `(H,R)` for a particular block `B`, it
  142. (re)locks (or changes lock to) and precommits `B` and sets
  143. `LastLockRound = R`.
  144. - Else, if the validator has a PoLC at `(H,R)` for `<nil>`, it unlocks
  145. and precommits `<nil>`.
  146. - Else, it keeps the lock unchanged and precommits `<nil>`.
  147. A precommit for `<nil>` means "I didn’t see a PoLC for this round, but I
  148. did get +2/3 prevotes and waited a bit".
  149. The Precommit step ends:
  150. - After +2/3 precommits for `<nil>`. --> goto `Propose(H,R+1)`
  151. - After `timeoutPrecommit` after receiving any +2/3 precommits. --> goto
  152. `Propose(H,R+1)`
  153. - After [common exit conditions](#common-exit-conditions)
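The precommit rules above can be sketched in the same style. Again this is only an illustration under stated assumptions: `None` stands for `<nil>`, `has_polc` says whether the validator holds a PoLC at `(H,R)`, and `polc_block` is the block that PoLC is for (`None` when it is for `<nil>`). The text does not say what happens to `LastLockRound` on unlock, so this sketch leaves it unchanged.

```python
from typing import Optional, Tuple


def choose_precommit(
    has_polc: bool,
    polc_block: Optional[str],
    locked_block: Optional[str],
    last_lock_round: int,
    current_round: int,
) -> Tuple[Optional[str], Optional[str], int]:
    """Return (precommit, locked_block_after, last_lock_round_after)."""
    # PoLC for a block B: (re)lock on B, precommit B, set LastLockRound = R.
    if has_polc and polc_block is not None:
        return polc_block, polc_block, current_round
    # PoLC for <nil>: unlock and precommit <nil>.
    if has_polc:
        return None, None, last_lock_round  # LastLockRound kept (assumption)
    # No PoLC: keep the lock unchanged and precommit <nil>.
    return None, locked_block, last_lock_round
```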
  154. ### Common exit conditions
  155. - After +2/3 precommits for a particular block. --> goto
  156. `Commit(H)`
  157. - After any +2/3 prevotes received at `(H,R+x)`. --> goto
  158. `Prevote(H,R+x)`
  159. - After any +2/3 precommits received at `(H,R+x)`. --> goto
  160. `Precommit(H,R+x)`
  161. ### Commit Step (height:H)
  162. - Set `CommitTime = now()`
  163. - Wait until block is received. --> goto `NewHeight(H+1)`
  164. ### NewHeight Step (height:H)
  165. - Move `Precommits` to `LastCommit` and increment height.
  166. - Set `StartTime = CommitTime+timeoutCommit`
  167. - Wait until `StartTime` to receive straggler commits. --> goto
  168. `Propose(H,0)`
  169. ## Proofs
  170. ### Proof of Safety
  171. Assume that less than 1/3 of the voting power of validators is Byzantine.
  172. If a validator commits block `B` at round `R`, it's because it saw +2/3
  173. of precommits at round `R`. This implies that 1/3+ of honest nodes are
  174. still locked at round `R' > R`. These locked validators will remain
  175. locked until they see a PoLC at `R' > R`, but this won't happen because
  176. 1/3+ are locked and honest, so less than 2/3 are available to vote for
  177. anything other than `B`.
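The counting step in this argument can be checked by brute force for small totals. The sketch below assumes integer voting power and the worst case in which every Byzantine validator is among the precommitters for `B`; the function name `safety_margin_holds` is hypothetical.

```python
def safety_margin_holds(total: int) -> bool:
    """Check the safety arithmetic for every split of `total` voting power.

    For any precommit power above 2/3 and any Byzantine power below 1/3,
    the honest validators locked on B must exceed 1/3 of the total, which
    leaves less than 2/3 for everything else, so no PoLC for another
    block can form.
    """
    for precommit_power in range(total + 1):
        if not 3 * precommit_power > 2 * total:
            continue  # not a +2/3 commit
        for byzantine in range(total + 1):
            if not 3 * byzantine < total:
                continue  # violates the less-than-1/3 assumption
            honest_locked = precommit_power - byzantine
            available_elsewhere = total - honest_locked
            if not (3 * honest_locked > total
                    and 3 * available_elsewhere < 2 * total):
                return False
    return True
```

The same conclusion follows directly from the inequalities: if `3p > 2N` and `3b < N`, then `3(p - b) > N`, hence `3(N - (p - b)) < 2N`.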
  178. ### Proof of Liveness
  179. If 1/3+ honest validators are locked on two different blocks from
  180. different rounds, a proposer's `PoLC-Round` will eventually cause nodes
  181. locked from the earlier round to unlock. Eventually, the designated
  182. proposer will be one that is aware of a PoLC at the later round. Also,
  183. `timeoutProposeR` increments with round `R`, while the size of a
  184. proposal is capped, so eventually the network is able to "fully gossip"
  185. the whole proposal (e.g. the block & PoLC).
  186. ### Proof of Fork Accountability
  187. Define the JSet (justification-vote-set) at height `H` of a validator
  188. `V1` to be all the votes signed by the validator at `H` along with
  189. justification PoLC prevotes for each lock change. For example, if `V1`
  190. signed the following precommits: `Precommit(B1 @ round 0)`,
  191. `Precommit(<nil> @ round 1)`, `Precommit(B2 @ round 4)` (note that no
  192. precommits were signed for rounds 2 and 3, and that's ok),
  193. `Precommit(B1 @ round 0)` must be justified by a PoLC at round 0, and
  194. `Precommit(B2 @ round 4)` must be justified by a PoLC at round 4; but
  195. the precommit for `<nil>` at round 1 is not a lock-change by definition
  196. so the JSet for `V1` need not include any prevotes at round 1, 2, or 3
  197. (unless `V1` happened to have prevoted for those rounds).
  198. Further, define the JSet at height `H` of a set of validators `VSet` to
  199. be the union of the JSets for each validator in `VSet`. For a given
  200. commit by honest validators at round `R` for block `B` we can construct
  201. a JSet to justify the commit for `B` at `R`. We say that a JSet
  202. _justifies_ a commit at `(H,R)` if all the committers (validators in the
  203. commit-set) are each justified in the JSet with no duplicitous vote
  204. signatures (by the committers).
  205. - **Lemma**: When a fork is detected by the existence of two
  206. conflicting [commits](../core/data_structures.md#commit), the
  207. union of the JSets for both commits (if they can be compiled) must
  208. include double-signing by at least 1/3+ of the validator set.
  209. **Proof**: The commits cannot be at the same round, because that
  210. would immediately imply double-signing by 1/3+. Take the union of
  211. the JSets of both commits. If there is no double-signing by at least
  212. 1/3+ of the validator set in the union, then no honest validator
  213. could have precommitted any different block after the first commit.
  214. Yet, +2/3 did. Reductio ad absurdum.
  215. As a corollary, when there is a fork, an external process can determine
  216. the blame by requiring each validator to justify all of its round votes.
  217. Either we will find 1/3+ who cannot justify at least one of their votes,
  218. and/or, we will find 1/3+ who had double-signed.
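The double-signing half of this check can be sketched as a scan over the union of votes. This is a minimal illustration: the `Vote` tuple here is a hypothetical stand-in with only the fields needed for the example, not the Tendermint `types.Vote` structure, and signature verification is omitted.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple, Optional, Set


class Vote(NamedTuple):
    validator: str
    height: int
    round_number: int
    vote_type: str        # "prevote" or "precommit"
    block: Optional[str]  # None stands for <nil>


def find_double_signers(votes: Iterable[Vote]) -> Set[str]:
    """Return validators that signed conflicting votes.

    Two votes conflict when they share validator, height, round, and
    type but name different blocks. Exact duplicates are not conflicts.
    """
    seen = defaultdict(set)
    for vote in votes:
        key = (vote.validator, vote.height, vote.round_number, vote.vote_type)
        seen[key].add(vote.block)
    return {key[0] for key, blocks in seen.items() if len(blocks) > 1}
```

An external process could then compare the combined voting power of the returned validators against the 1/3+ threshold.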
  219. ### Alternative algorithm
  220. Alternatively, we can take the JSet of a commit to be the "full commit".
  221. That is, if light clients and validators do not consider a block to be
  222. committed unless the JSet of the commit is also known, then we get the
  223. desirable property that if there ever is a fork (e.g. there are two
  224. conflicting "full commits"), then 1/3+ of the validators are immediately
  225. punishable for double-signing.
  226. There are many ways to ensure that the gossip network efficiently shares
  227. the JSet of a commit. One solution is to add a new message type that
  228. tells peers that this node has (or does not have) a +2/3 majority for `B`
  229. (or `<nil>`) at `(H,R)`, and a bitarray of which votes contributed towards that
  230. majority. Peers can react by responding with appropriate votes.
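One possible shape for such a message is sketched below. Nothing here exists in Tendermint: the `MajorityHint` type, its field names, and the helper `votes_peer_is_missing` are hypothetical, and the bitarray is modelled as a list of booleans indexed by validator position.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MajorityHint:
    height: int
    round_number: int
    block: Optional[str]   # None stands for <nil>
    has_majority: bool     # whether the sender already has +2/3
    votes: List[bool]      # votes[i] is True if validator i's vote is held


def votes_peer_is_missing(hint: MajorityHint, my_votes: List[bool]) -> List[int]:
    """Validator indices whose votes we hold but the hinting peer lacks."""
    return [
        index
        for index, (theirs, mine) in enumerate(zip(hint.votes, my_votes))
        if mine and not theirs
    ]
```

A peer receiving the hint could respond with the votes at the returned indices, which is one way to read "responding with appropriate votes".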
  231. We will implement such an algorithm for the next iteration of the
  232. Tendermint consensus protocol.
  233. Other potential improvements include adding more data in votes such as
  234. the last known PoLC round that caused a lock change, and the last voted
  235. round/step (or, we may require that validators not skip any votes). This
  236. may make JSet verification/gossip logic easier to implement.
  237. ### Censorship Attacks
  238. Due to the definition of a block
  239. [commit](https://github.com/tendermint/tendermint/blob/master/docs/tendermint-core/validators.md), any 1/3+ coalition of
  240. validators can halt the blockchain by not broadcasting their votes. Such
  241. a coalition can also censor particular transactions by rejecting blocks
  242. that include these transactions, though this would result in a
  243. significant proportion of block proposals to be rejected, which would
  244. slow down the rate of block commits of the blockchain, reducing its
  245. utility and value. The malicious coalition might also broadcast votes in
  246. a trickle so as to grind blockchain block commits to a near halt, or
  247. engage in any combination of these attacks.
  248. If a global active adversary were also involved, it could partition the
  249. network in such a way that it may appear that the wrong subset of
  250. validators were responsible for the slowdown. This is not just a
  251. limitation of Tendermint, but rather a limitation of all consensus
  252. protocols whose network is potentially controlled by an active
  253. adversary.
  254. ### Overcoming Forks and Censorship Attacks
  255. For these types of attacks, a subset of the validators through external
  256. means should coordinate to sign a reorg-proposal that chooses a fork
  257. (and any evidence thereof) and the initial subset of validators with
  258. their signatures. Validators who sign such a reorg-proposal forgo their
  259. collateral on all other forks. Clients should verify the signatures on
  260. the reorg-proposal, verify any evidence, and make a judgement or prompt
  261. the end-user for a decision. For example, a phone wallet app may prompt
  262. the user with a security warning, while a refrigerator may accept any
  263. reorg-proposal signed by +1/2 of the original validators.
  264. No non-synchronous Byzantine fault-tolerant algorithm can come to
  265. consensus when 1/3+ of validators are dishonest, yet a fork assumes that
  266. 1/3+ of validators have already been dishonest by double-signing or
  267. lock-changing without justification. So, signing the reorg-proposal is a
  268. coordination problem that cannot be solved by any non-synchronous
  269. protocol (i.e. automatically, and without making assumptions about the
  270. reliability of the underlying network). It must be provided by means
  271. external to the weakly-synchronous Tendermint consensus algorithm. For
  272. now, we leave the problem of reorg-proposal coordination to human
  273. coordination via internet media. Validators must take care to ensure
  274. that there are no significant network partitions, to avoid situations
  275. where two conflicting reorg-proposals are signed.
  276. Assuming that the external coordination medium and protocol are robust,
  277. it follows that forks are less of a concern than [censorship
  278. attacks](#censorship-attacks).
  279. ### Canonical vs subjective commit
  280. We distinguish between "canonical" and "subjective" commits. A subjective commit is what
  281. each validator sees locally when they decide to commit a block. The canonical commit is
  282. what is included by the proposer of the next block in the `LastCommit` field of
  283. the block. This is what makes it canonical and ensures every validator agrees on the canonical commit,
  284. even if it is different from the +2/3 votes a validator has seen, which caused the validator to
  285. commit the respective block. Each block contains a canonical +2/3 commit for the previous
  286. block.