* docs: deprecate specification dir, closes #1814
* update genesis
* old spec dir, deprecation complete
* rm a file

pull/1965/head

@ -1,9 +1,329 @@
We are working to finalize an updated Tendermint specification with formal
proofs of safety and liveness.

In the meantime, see the [description in the
docs](http://tendermint.readthedocs.io/en/master/specification/byzantine-consensus-algorithm.html).

There are also relevant but somewhat outdated descriptions in Jae Kwon's [original
whitepaper](https://tendermint.com/static/docs/tendermint.pdf) and Ethan Buchman's [master's
thesis](https://atrium.lib.uoguelph.ca/xmlui/handle/10214/9769).

# Byzantine Consensus Algorithm

## Terms

- The network is composed of optionally connected *nodes*. Nodes
  directly connected to a particular node are called *peers*.
- The consensus process for deciding the next block (at some *height*
  `H`) is composed of one or many *rounds*.
- `NewHeight`, `Propose`, `Prevote`, `Precommit`, and `Commit`
  represent state machine states of a round (a.k.a. `RoundStep`, or
  just "step").
- A node is said to be *at* a given height, round, and step, or at
  `(H,R,S)`, or at `(H,R)` for short when the step is omitted.
- To *prevote* or *precommit* something means to broadcast a [prevote
  vote](https://godoc.org/github.com/tendermint/tendermint/types#Vote)
  or [first precommit
  vote](https://godoc.org/github.com/tendermint/tendermint/types#FirstPrecommit)
  for something.
- A vote *at* `(H,R)` is a vote signed with the bytes for `H` and `R`
  included in its [sign-bytes](block-structure.html#vote-sign-bytes).
- *+2/3* is short for "more than 2/3".
- *1/3+* is short for "1/3 or more".
- A set of +2/3 of prevotes for a particular block or `<nil>` at
  `(H,R)` is called a *proof-of-lock-change* or *PoLC* for short.
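
The two threshold terms above can be checked exactly with integer arithmetic. The sketch below is illustrative only: `hasTwoThirdsMajority` and `hasOneThirdOrMore` are hypothetical helper names, and measuring both fractions against the total voting power is an assumption based on how the terms are used later in this document.

```go
package main

import "fmt"

// hasTwoThirdsMajority reports whether power is "+2/3" of total, i.e.
// strictly more than two thirds. Multiplying instead of dividing keeps the
// comparison exact for integer voting power.
func hasTwoThirdsMajority(power, total int64) bool {
	return 3*power > 2*total
}

// hasOneThirdOrMore reports whether power is "1/3+" of total, i.e. at
// least one third.
func hasOneThirdOrMore(power, total int64) bool {
	return 3*power >= total
}

func main() {
	// Total voting power 9: 6 is exactly 2/3 (not +2/3), 7 is +2/3.
	fmt.Println(hasTwoThirdsMajority(6, 9), hasTwoThirdsMajority(7, 9)) // false true
	// 3 is exactly 1/3, which already counts as 1/3+; 2 does not.
	fmt.Println(hasOneThirdOrMore(3, 9), hasOneThirdOrMore(2, 9)) // true false
}
```

Note the asymmetry: "+2/3" is strict while "1/3+" is inclusive, so exactly 1/3 of the power is enough to block a +2/3 majority.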

## State Machine Overview

At each height of the blockchain a round-based protocol is run to
determine the next block. Each round is composed of three *steps*
(`Propose`, `Prevote`, and `Precommit`), along with two special steps
`Commit` and `NewHeight`.

In the optimal scenario, the order of steps is:

```
NewHeight -> (Propose -> Prevote -> Precommit)+ -> Commit -> NewHeight ->...
```

The sequence `(Propose -> Prevote -> Precommit)` is called a *round*.
There may be more than one round required to commit a block at a given
height. Examples of why more rounds may be required include:

- The designated proposer was not online.
- The block proposed by the designated proposer was not valid.
- The block proposed by the designated proposer did not propagate
  in time.
- The block proposed was valid, but +2/3 of prevotes for the proposed
  block were not received in time for enough validator nodes by the
  time they reached the `Precommit` step. Even though +2/3 of prevotes
  are necessary to progress to the next step, at least one validator
  may have voted `<nil>` or maliciously voted for something else.
- The block proposed was valid, and +2/3 of prevotes were received for
  enough nodes, but +2/3 of precommits for the proposed block were not
  received for enough validator nodes.

Some of these problems are resolved by moving on to the next round and
proposer. Others are resolved by increasing certain round timeout
parameters over each successive round.

## State Machine Diagram

```
                         +-------------------------------------+
                         v                                     |(Wait until `CommitTime+timeoutCommit`)
                   +-----------+                         +-----+-----+
      +----------> |  Propose  +--------------+          | NewHeight |
      |            +-----------+              |          +-----------+
      |                                       |                ^
      |(Else, after timeoutPrecommit)         v                |
+-----+-----+                           +-----------+          |
| Precommit |  <------------------------+  Prevote  |          |
+-----+-----+                           +-----------+          |
      |(When +2/3 Precommits for block found)                  |
      v                                                        |
+--------------------------------------------------------------------+
|  Commit                                                            |
|                                                                    |
|  * Set CommitTime = now;                                           |
|  * Wait for block, then stage/save/commit block;                   |
+--------------------------------------------------------------------+
```
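
The optimal path through the diagram can be sketched as a small state type. `RoundStep` mirrors the step names used in this document, but it is not Tendermint's own definition, and it models only the optimal path (no timeouts and no move to round `R+1`).

```go
package main

import "fmt"

// RoundStep enumerates the states of the consensus state machine.
type RoundStep int

const (
	NewHeight RoundStep = iota
	Propose
	Prevote
	Precommit
	Commit
)

func (s RoundStep) String() string {
	return [...]string{"NewHeight", "Propose", "Prevote", "Precommit", "Commit"}[s]
}

// nextOptimal returns the next step when every round succeeds, so Precommit
// is followed by Commit rather than by Propose at round R+1.
func nextOptimal(s RoundStep) RoundStep {
	switch s {
	case NewHeight:
		return Propose
	case Propose:
		return Prevote
	case Prevote:
		return Precommit
	case Precommit:
		return Commit
	default: // Commit
		return NewHeight
	}
}

func main() {
	s := NewHeight
	for i := 0; i < 5; i++ {
		fmt.Print(s, " -> ")
		s = nextOptimal(s)
	}
	fmt.Println(s) // prints: NewHeight -> Propose -> Prevote -> Precommit -> Commit -> NewHeight
}
```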

## Background Gossip

A node may not have a corresponding validator private key, but it
nevertheless plays an active role in the consensus process by relaying
relevant meta-data, proposals, blocks, and votes to its peers. A node
that has the private keys of an active validator and is engaged in
signing votes is called a *validator-node*. All nodes (not just
validator-nodes) have an associated state (the current height, round,
and step) and work to make progress.

Between two nodes there exists a `Connection`, and multiplexed on top of
this connection are fairly throttled `Channel`s of information. An
epidemic gossip protocol is implemented among some of these channels to
bring peers up to speed on the most recent state of consensus. For
example:

- Nodes gossip `PartSet` parts of the current round's proposer's
  proposed block. A LibSwift-inspired algorithm is used to quickly
  broadcast blocks across the gossip network.
- Nodes gossip prevote/precommit votes. A node `NODE_A` that is ahead
  of `NODE_B` can send `NODE_B` prevotes or precommits for `NODE_B`'s
  current (or future) round to enable it to progress forward.
- Nodes gossip prevotes for the proposed PoLC (proof-of-lock-change)
  round if one is proposed.
- Nodes gossip to nodes lagging in blockchain height with block
  [commits](https://godoc.org/github.com/tendermint/tendermint/types#Commit)
  for older blocks.
- Nodes opportunistically gossip `HasVote` messages to hint to peers
  which votes they already have.
- Nodes broadcast their current state to all neighboring peers (this
  state is not gossiped further).

There's more, but let's not get ahead of ourselves here.
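
`HasVote` hints let a node avoid resending votes a peer already has. A minimal sketch, assuming votes are indexed by the validator's position in the validator set and using `[]bool` in place of a real bit array; `votesToSend` and `markHasVote` are hypothetical names.

```go
package main

import "fmt"

// votesToSend returns the validator indices whose votes we hold but the
// peer, as far as we know from its HasVote hints, does not.
func votesToSend(ours, peerHas []bool) []int {
	var missing []int
	for i := range ours {
		if ours[i] && (i >= len(peerHas) || !peerHas[i]) {
			missing = append(missing, i)
		}
	}
	return missing
}

// markHasVote records a HasVote hint received from the peer.
func markHasVote(peerHas []bool, index int) {
	if index >= 0 && index < len(peerHas) {
		peerHas[index] = true
	}
}

func main() {
	ours := []bool{true, true, false, true}
	peerHas := make([]bool, len(ours))
	markHasVote(peerHas, 1)                 // the peer already has validator 1's vote
	fmt.Println(votesToSend(ours, peerHas)) // [0 3]
}
```

Because the hints are opportunistic, a stale `peerHas` only causes a redundant send, never a missing vote.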

## Proposals

A proposal is signed and published by the designated proposer at each
round. The proposer is chosen by a deterministic and non-choking
round-robin selection algorithm that selects proposers in proportion to
their voting power (see the
[implementation](https://github.com/tendermint/tendermint/blob/develop/types/validator_set.go)).

A proposal at `(H,R)` is composed of a block and an optional latest
`PoLC-Round < R`, which is included iff the proposer knows of one. This
hints the network to allow nodes to unlock (when safe) to ensure the
liveness property.
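
Selection in proportion to voting power can be sketched with a priority accumulator: each round every validator's priority grows by its voting power, the highest priority proposes, and the winner's priority drops by the total voting power. This is a sketch in the spirit of the linked `validator_set.go`, not a copy of it; the tie-break by `Name` and all names below are assumptions.

```go
package main

import "fmt"

// Validator is a simplified validator for this sketch.
type Validator struct {
	Name        string
	VotingPower int64
	priority    int64 // accumulated priority, starts at zero
}

// nextProposer runs one round of weighted round-robin and returns the
// index of the chosen proposer in vals.
func nextProposer(vals []*Validator) int {
	var total int64
	for _, v := range vals {
		v.priority += v.VotingPower
		total += v.VotingPower
	}
	best := 0
	for i := 1; i < len(vals); i++ {
		v, b := vals[i], vals[best]
		// Highest priority wins; ties go to the smaller Name (an assumption).
		if v.priority > b.priority || (v.priority == b.priority && v.Name < b.Name) {
			best = i
		}
	}
	vals[best].priority -= total
	return best
}

func main() {
	vals := []*Validator{{Name: "A", VotingPower: 3}, {Name: "B", VotingPower: 1}}
	counts := map[string]int{}
	for r := 0; r < 8; r++ {
		counts[vals[nextProposer(vals)].Name]++
	}
	fmt.Println(counts) // map[A:6 B:2]
}
```

With voting powers 3 and 1, `A` proposes three of every four rounds, yet `B` is never starved, which is the "non-choking" property.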

## State Machine Spec

### Propose Step (height:H,round:R)

Upon entering `Propose`:

- The designated proposer proposes a block at `(H,R)`.

The `Propose` step ends:

- After `timeoutProposeR` after entering `Propose`. --> goto
  `Prevote(H,R)`
- After receiving the proposal block and all prevotes at `PoLC-Round`.
  --> goto `Prevote(H,R)`
- After [common exit conditions](#common-exit-conditions)

### Prevote Step (height:H,round:R)

Upon entering `Prevote`, each validator broadcasts its prevote vote.

- First, if the validator is locked on a block since `LastLockRound`
  but now has a PoLC for something else at round `PoLC-Round` where
  `LastLockRound < PoLC-Round < R`, then it unlocks.
- If the validator is still locked on a block, it prevotes that.
- Else, if the proposed block from `Propose(H,R)` is good, it
  prevotes that.
- Else, if the proposal is invalid or wasn't received on time, it
  prevotes `<nil>`.

The `Prevote` step ends:

- After +2/3 prevotes for a particular block or `<nil>`. --> goto
  `Precommit(H,R)`
- After `timeoutPrevote` after receiving any +2/3 prevotes. --> goto
  `Precommit(H,R)`
- After [common exit conditions](#common-exit-conditions)
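
The Prevote rules above can be read as a small decision function. The sketch below is hypothetical: block IDs are plain strings, `Nil` stands for `<nil>`, `polcRound` is -1 when no PoLC is known, and whether a proposal is "good" is passed in rather than computed.

```go
package main

import "fmt"

// Nil stands for a vote for `<nil>`; block IDs are plain strings here.
const Nil = ""

// LockState is a simplified, hypothetical view of a validator's lock.
type LockState struct {
	LockedBlock   string // Nil when not locked
	LastLockRound int
}

// decidePrevote applies the Prevote rules for round r. polcBlock is the
// value the PoLC at polcRound is for (Nil for a PoLC for <nil>). proposal is
// the block from Propose(H,R), or Nil if none arrived in time.
func decidePrevote(lock *LockState, r, polcRound int, polcBlock, proposal string, proposalGood bool) string {
	// Unlock on a PoLC for something else with LastLockRound < PoLC-Round < R.
	if lock.LockedBlock != Nil && lock.LastLockRound < polcRound && polcRound < r &&
		polcBlock != lock.LockedBlock {
		lock.LockedBlock = Nil
	}
	if lock.LockedBlock != Nil {
		return lock.LockedBlock // still locked: prevote the locked block
	}
	if proposal != Nil && proposalGood {
		return proposal // prevote the good proposal
	}
	return Nil // invalid or missing proposal
}

func main() {
	lock := &LockState{LockedBlock: "B1", LastLockRound: 0}
	// A PoLC for B2 at round 1 satisfies 0 < 1 < 2, so the validator unlocks
	// from B1 and prevotes the good proposal B2.
	fmt.Println(decidePrevote(lock, 2, 1, "B2", "B2", true)) // B2
}
```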

### Precommit Step (height:H,round:R)

Upon entering `Precommit`, each validator broadcasts its precommit vote.

- If the validator has a PoLC at `(H,R)` for a particular block `B`, it
  (re)locks (or changes lock to) and precommits `B` and sets
  `LastLockRound = R`.
- Else, if the validator has a PoLC at `(H,R)` for `<nil>`, it unlocks
  and precommits `<nil>`.
- Else, it keeps the lock unchanged and precommits `<nil>`.

A precommit for `<nil>` means "I didn’t see a PoLC for this round, but I
did get +2/3 prevotes and waited a bit".

The `Precommit` step ends:

- After +2/3 precommits for `<nil>`. --> goto `Propose(H,R+1)`
- After `timeoutPrecommit` after receiving any +2/3 precommits. --> goto
  `Propose(H,R+1)`
- After [common exit conditions](#common-exit-conditions)
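
The three Precommit cases above can likewise be sketched as a decision function. This hypothetical sketch represents block IDs as strings, uses `Nil` for `<nil>`, and takes the PoLC observation at `(H,R)` as input rather than tallying prevotes itself.

```go
package main

import "fmt"

// Nil stands for a vote for `<nil>`; block IDs are plain strings here.
const Nil = ""

// LockState is a simplified, hypothetical view of a validator's lock.
type LockState struct {
	LockedBlock   string // Nil when not locked
	LastLockRound int
}

// decidePrecommit applies the Precommit rules for round r. hasPoLC reports
// whether +2/3 prevotes at (H,R) were seen for a single value, and polcBlock
// is that value (Nil for a PoLC for <nil>).
func decidePrecommit(lock *LockState, r int, hasPoLC bool, polcBlock string) string {
	switch {
	case hasPoLC && polcBlock != Nil:
		// (Re)lock on the block and remember the round.
		lock.LockedBlock, lock.LastLockRound = polcBlock, r
		return polcBlock
	case hasPoLC:
		// PoLC for <nil>: unlock and precommit <nil>.
		lock.LockedBlock = Nil
		return Nil
	default:
		// No PoLC: keep the lock unchanged and precommit <nil>.
		return Nil
	}
}

func main() {
	lock := &LockState{}
	vote := decidePrecommit(lock, 2, true, "B")
	fmt.Printf("precommit=%q locked=%q lastLockRound=%d\n", vote, lock.LockedBlock, lock.LastLockRound)
}
```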

### Common exit conditions

- After +2/3 precommits for a particular block. --> goto `Commit(H)`
- After any +2/3 prevotes received at `(H,R+x)`. --> goto
  `Prevote(H,R+x)`
- After any +2/3 precommits received at `(H,R+x)`. --> goto
  `Precommit(H,R+x)`
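
A hypothetical sketch of the common exit conditions follows. It assumes `x > 0` (only later rounds trigger a jump) and, when both jumps apply, prefers the later round and then `Precommit` over `Prevote`; the text does not specify that priority, and the `Observation` fields are invented for illustration.

```go
package main

import "fmt"

// Observation summarises what a node has seen at height H.
type Observation struct {
	PrecommitBlockMaj bool // +2/3 precommits for a particular block
	PrevoteAnyRound   int  // highest round with any +2/3 prevotes, -1 if none
	PrecommitAnyRound int  // highest round with any +2/3 precommits, -1 if none
}

// commonExit returns the step and round to jump to from round r, or
// ok=false when no common exit condition applies.
func commonExit(o Observation, r int) (step string, round int, ok bool) {
	if o.PrecommitBlockMaj {
		return "Commit", r, true // Commit(H) is not tied to a round
	}
	// Assumption: prefer the later round; on a tie, Precommit wins.
	if o.PrecommitAnyRound > r && o.PrecommitAnyRound >= o.PrevoteAnyRound {
		return "Precommit", o.PrecommitAnyRound, true
	}
	if o.PrevoteAnyRound > r {
		return "Prevote", o.PrevoteAnyRound, true
	}
	return "", r, false
}

func main() {
	step, round, ok := commonExit(Observation{PrevoteAnyRound: 3, PrecommitAnyRound: 1}, 0)
	fmt.Println(step, round, ok) // Prevote 3 true
}
```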

### Commit Step (height:H)

- Set `CommitTime = now()`
- Wait until block is received. --> goto `NewHeight(H+1)`

### NewHeight Step (height:H)

- Move `Precommits` to `LastCommit` and increment height.
- Set `StartTime = CommitTime+timeoutCommit`
- Wait until `StartTime` to receive straggler commits. --> goto
  `Propose(H,0)`

## Proofs

### Proof of Safety

Assume that less than 1/3 of the voting power of validators is
byzantine. If a validator commits block `B` at round `R`, it's because
it saw +2/3 of precommits at round `R`. This implies that honest
validators holding 1/3+ of the voting power are still locked at round
`R' > R`. These locked validators will remain locked until they see a
PoLC at `R' > R`, but this won't happen because 1/3+ are locked and
honest, so less than 2/3 are available to vote for anything other than
`B`.
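
The counting behind "this implies" can be made explicit. Write $N$ for the total voting power, $F$ for the byzantine voting power, and $P$ for the voting power that precommitted `B` at round `R`; the honest part of $P$ is at least $P - F$:

```math
\begin{aligned}
P &> \tfrac{2}{3}N, \qquad F < \tfrac{1}{3}N \\
P_{\text{honest}} &\ge P - F > \tfrac{2}{3}N - \tfrac{1}{3}N = \tfrac{1}{3}N \\
N - P_{\text{honest}} &< N - \tfrac{1}{3}N = \tfrac{2}{3}N
\end{aligned}
```

So the honest validators locked on `B` hold more than 1/3 of the voting power, and everyone else together holds less than 2/3, which is not enough to form a PoLC for anything other than `B`.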

### Proof of Liveness

If 1/3+ honest validators are locked on two different blocks from
different rounds, a proposer's `PoLC-Round` will eventually cause nodes
locked from the earlier round to unlock. Eventually, the designated
proposer will be one that is aware of a PoLC at the later round. Also,
`timeoutProposeR` increments with round `R`, while the size of a
proposal is capped, so eventually the network is able to "fully gossip"
the whole proposal (i.e. the block and PoLC).

### Proof of Fork Accountability

Define the JSet (justification-vote-set) at height `H` of a validator
`V1` to be all the votes signed by the validator at `H` along with
justification PoLC prevotes for each lock change. For example, suppose
`V1` signed the following precommits: `Precommit(B1 @ round 0)`,
`Precommit(<nil> @ round 1)`, `Precommit(B2 @ round 4)` (note that no
precommits were signed for rounds 2 and 3, and that's ok). Then
`Precommit(B1 @ round 0)` must be justified by a PoLC at round 0, and
`Precommit(B2 @ round 4)` must be justified by a PoLC at round 4; but
the precommit for `<nil>` at round 1 is not a lock-change by definition,
so the JSet for `V1` need not include any prevotes at round 1, 2, or 3
(unless `V1` happened to have prevoted for those rounds).

Further, define the JSet at height `H` of a set of validators `VSet` to
be the union of the JSets for each validator in `VSet`. For a given
commit by honest validators at round `R` for block `B` we can construct
a JSet to justify the commit for `B` at `R`. We say that a JSet
*justifies* a commit at `(H,R)` if all the committers (validators in the
commit-set) are each justified in the JSet with no duplicitous vote
signatures (by the committers).

- **Lemma**: When a fork is detected by the existence of two
  conflicting [commits](./validators.html#commiting-a-block), the
  union of the JSets for both commits (if they can be compiled) must
  include double-signing by at least 1/3+ of the validator set.
  **Proof**: The two commits cannot be at the same round, because that
  would immediately imply double-signing by 1/3+. Take the union of
  the JSets of both commits. If there is no double-signing by at least
  1/3+ of the validator set in the union, then no honest validator
  could have precommitted any different block after the first commit.
  Yet, +2/3 did. Reductio ad absurdum.

As a corollary, when there is a fork, an external process can determine
the blame by requiring each validator to justify all of its round votes.
Either we will find 1/3+ who cannot justify at least one of their votes,
and/or we will find 1/3+ who had double-signed.
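
The double-signing half of the corollary can be sketched as a scan over the union of two JSets, looking for a validator that signed two different block IDs at the same height, round, and vote type. This sketch assumes signatures were already verified and does not check whether lock changes are justified by PoLCs; `Vote` and `findDoubleSigners` are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// Vote is a simplified signed vote; signatures are assumed already verified.
type Vote struct {
	Validator string
	Height    int
	Round     int
	Type      string // "prevote" or "precommit"
	BlockID   string // "" stands for <nil>
}

// findDoubleSigners returns, sorted, the validators that signed two
// different block IDs for the same (height, round, type).
func findDoubleSigners(votes []Vote) []string {
	type slot struct {
		validator, typ string
		height, round  int
	}
	seen := map[slot]string{}
	guilty := map[string]bool{}
	for _, v := range votes {
		k := slot{v.Validator, v.Type, v.Height, v.Round}
		if prev, ok := seen[k]; ok && prev != v.BlockID {
			guilty[v.Validator] = true
		} else if !ok {
			seen[k] = v.BlockID
		}
	}
	out := make([]string, 0, len(guilty))
	for name := range guilty {
		out = append(out, name)
	}
	sort.Strings(out)
	return out
}

func main() {
	votes := []Vote{
		{Validator: "V1", Height: 5, Round: 0, Type: "precommit", BlockID: "B1"},
		{Validator: "V1", Height: 5, Round: 0, Type: "precommit", BlockID: "B2"},
		{Validator: "V2", Height: 5, Round: 0, Type: "precommit", BlockID: "B1"},
		// V2 changed its vote across rounds; this check does not flag that.
		{Validator: "V2", Height: 5, Round: 1, Type: "precommit", BlockID: "B2"},
	}
	fmt.Println(findDoubleSigners(votes)) // [V1]
}
```

Unjustified lock changes like `V2`'s would be caught by the other half of the corollary, which checks each precommit against a PoLC in the JSet.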

### Alternative algorithm

Alternatively, we can take the JSet of a commit to be the "full commit".
That is, if light clients and validators do not consider a block to be
committed unless the JSet of the commit is also known, then we get the
desirable property that if there ever is a fork (e.g. there are two
conflicting "full commits"), then 1/3+ of the validators are immediately
punishable for double-signing.

There are many ways to ensure that the gossip network efficiently shares
the JSet of a commit. One solution is to add a new message type that
tells peers that this node has (or does not have) a +2/3 majority for `B`
(or `<nil>`) at `(H,R)`, and a bitarray of which votes contributed towards
that majority. Peers can react by responding with appropriate votes.

We will implement such an algorithm for the next iteration of the
Tendermint consensus protocol.

Other potential improvements include adding more data in votes such as
the last known PoLC round that caused a lock change, and the last voted
round/step (or, we may require that validators not skip any votes). This
may make JSet verification/gossip logic easier to implement.

### Censorship Attacks

Due to the definition of a block
[commit](../../tendermint-core/validator.md#commiting-a-block), any 1/3+ coalition of
validators can halt the blockchain by not broadcasting their votes. Such
a coalition can also censor particular transactions by rejecting blocks
that include these transactions, though this would result in a
significant proportion of block proposals being rejected, which would
slow down the rate of block commits of the blockchain, reducing its
utility and value. The malicious coalition might also broadcast votes in
a trickle so as to grind blockchain block commits to a near halt, or
engage in any combination of these attacks.

If a global active adversary were also involved, it could partition the
network in such a way that it may appear that the wrong subset of
validators were responsible for the slowdown. This is not just a
limitation of Tendermint, but rather a limitation of all consensus
protocols whose network is potentially controlled by an active
adversary.

### Overcoming Forks and Censorship Attacks

For these types of attacks, a subset of the validators through external
means should coordinate to sign a reorg-proposal that chooses a fork
(and any evidence thereof) and the initial subset of validators with
their signatures. Validators who sign such a reorg-proposal forgo their
collateral on all other forks. Clients should verify the signatures on
the reorg-proposal, verify any evidence, and make a judgement or prompt
the end-user for a decision. For example, a phone wallet app may prompt
the user with a security warning, while a refrigerator may accept any
reorg-proposal signed by +1/2 of the original validators.

No non-synchronous Byzantine fault-tolerant algorithm can come to
consensus when 1/3+ of validators are dishonest, yet a fork assumes that
1/3+ of validators have already been dishonest by double-signing or
lock-changing without justification. So, signing the reorg-proposal is a
coordination problem that cannot be solved by any non-synchronous
protocol (i.e. automatically, and without making assumptions about the
reliability of the underlying network). It must be provided by means
external to the weakly-synchronous Tendermint consensus algorithm. For
now, we leave the problem of reorg-proposal coordination to human
coordination via internet media. Validators must take care to ensure
that there are no significant network partitions, to avoid situations
where two conflicting reorg-proposals are signed.

Assuming that the external coordination medium and protocol are robust,
it follows that forks are less of a concern than [censorship
attacks](#censorship-attacks).

@ -1,218 +0,0 @@
Block Structure
===============

The Tendermint consensus engine records all agreements by a
supermajority of nodes into a blockchain, which is replicated among all
nodes. This blockchain is accessible via various RPC endpoints, mainly
``/block?height=`` to get the full block, as well as
``/blockchain?minHeight=_&maxHeight=_`` to get a list of headers. But
what exactly is stored in these blocks?

Block
~~~~~

A
`Block <https://godoc.org/github.com/tendermint/tendermint/types#Block>`__
contains:

- a `Header <#header>`__, which contains merkle hashes for various chain
  states
- the
  `Data <https://godoc.org/github.com/tendermint/tendermint/types#Data>`__,
  which is all transactions which are to be processed
- the `LastCommit <#commit>`__, which holds > 2/3 signatures for the
  previous block

The signatures returned along with block ``H`` are those validating
block ``H-1``. This can be a little confusing, but we must also consider
that the ``Header`` also contains the ``LastCommitHash``. It would be
impossible for a Header to include the commits that sign it, as it would
cause an infinite loop here. But when we get block ``H``, we find
``Header.LastCommitHash``, which must match the hash of ``LastCommit``.

Header
~~~~~~

The
`Header <https://godoc.org/github.com/tendermint/tendermint/types#Header>`__
contains lots of information (follow link for up-to-date info). Notably,
it maintains the ``Height``, the ``LastBlockID`` (to make it a chain),
and hashes of the data, the app state, and the validator set. This is
important as the only item that is signed by the validators is the
``Header``, and all other data must be validated against one of the
merkle hashes in the ``Header``.

The ``DataHash`` can provide a nice check on the
`Data <https://godoc.org/github.com/tendermint/tendermint/types#Data>`__
returned in this same block. If you are subscribed to new blocks via
Tendermint RPC, in order to display or process the new transactions you
should at least validate that the ``DataHash`` is valid. If it is
important to verify authenticity, you must wait for the ``LastCommit``
from the next block to make sure the block header (including
``DataHash``) was properly signed.

The ``ValidatorHash`` contains a hash of the current
`Validators <https://godoc.org/github.com/tendermint/tendermint/types#Validator>`__.
Tracking all changes in the validator set is complex, but a client can
quickly compare this hash with the `hash of the currently known
validators <https://godoc.org/github.com/tendermint/tendermint/types#ValidatorSet.Hash>`__
to see if there have been changes.

The ``AppHash`` serves as the basis for validating any merkle proofs
that come from the ABCI application. It represents the
state of the actual application, rather than the state of the blockchain
itself. This means it's necessary in order to perform any business
logic, such as verifying an account balance.

**Note** After the transactions are committed to a block, they still
need to be processed in a separate step, which happens between the
blocks. If you find a given transaction in the block at height ``H``,
the effects of running that transaction will be first visible in the
``AppHash`` from the block header at height ``H+1``.

Like the ``LastCommit`` issue, this is a requirement of the immutability
of the block chain, as the application only applies transactions *after*
they are committed to the chain.

Commit
~~~~~~

The
`Commit <https://godoc.org/github.com/tendermint/tendermint/types#Commit>`__
contains a set of
`Votes <https://godoc.org/github.com/tendermint/tendermint/types#Vote>`__
that were made by the validator set to reach consensus on this block.
This is the key to the security in any PoS system: no data that cannot
be traced back to a block header with a valid set of Votes can be
trusted. Thus, getting the Commit data and verifying the votes is
extremely important.

As mentioned above, in order to find the ``precommit votes`` for block
header ``H``, we need to query block ``H+1``. Then we need to check the
votes, and make sure they really are for that block and are properly
formatted. Much of this code is implemented in Go in the
`light-client <https://github.com/tendermint/light-client>`__ package.
If you look at the code, you will notice that we need to provide the
``chainID`` of the blockchain in order to properly calculate the votes.
This prevents anyone from swapping votes between chains to fake (or
frame) a validator. Also note that this ``chainID`` is in the
``genesis.json`` from *Tendermint*, not the ``genesis.json`` from the
basecoin app (`that is a different
chainID... <https://github.com/cosmos/cosmos-sdk/issues/32>`__).

Once we have those votes, and have calculated the proper `sign
bytes <https://godoc.org/github.com/tendermint/tendermint/types#Vote.WriteSignBytes>`__
using the chainID and a `nice helper
function <https://godoc.org/github.com/tendermint/tendermint/types#SignBytes>`__,
we can verify them. The light client is responsible for maintaining a
set of validators that we trust. Each vote only stores the validator's
``Address``, as well as the ``Signature``. Assuming we have a local copy
of the trusted validator set, we can look up the ``Public Key`` of the
validator given its ``Address``, then verify that the ``Signature``
matches the ``SignBytes`` and ``Public Key``. Then we sum up the total
voting power of all validators whose votes fulfilled all these
stringent requirements. If the total voting power for a single
block is greater than 2/3 of all voting power, then we can finally trust
the block header, the AppHash, and the proof we got from the ABCI
application.

Vote Sign Bytes
^^^^^^^^^^^^^^^

The ``sign-bytes`` of a vote is produced by taking a
`stable-json <https://github.com/substack/json-stable-stringify>`__-like
deterministic JSON `wire <./wire-protocol.html>`__ encoding of
the vote (excluding the ``Signature`` field), and wrapping it with
``{"chain_id":"my_chain","vote":...}``.

For example, a precommit vote might have the following ``sign-bytes``:

.. code:: json

    {"chain_id":"my_chain","vote":{"block_hash":"611801F57B4CE378DF1A3FFF1216656E89209A99","block_parts_header":{"hash":"B46697379DBE0774CC2C3B656083F07CA7E0F9CE","total":123},"height":1234,"round":1,"type":2}}

Block Hash
~~~~~~~~~~

The `block
hash <https://godoc.org/github.com/tendermint/tendermint/types#Block.Hash>`__
is the `Simple Tree hash <./merkle.html#simple-tree-with-dictionaries>`__
of the fields of the block ``Header`` encoded as a list of
``KVPair``\ s.

Transaction
~~~~~~~~~~~

A transaction is any sequence of bytes. It is up to your
ABCI application to accept or reject transactions.

BlockID
~~~~~~~

Many of these data structures refer to the
`BlockID <https://godoc.org/github.com/tendermint/tendermint/types#BlockID>`__,
which is the ``BlockHash`` (hash of the block header, also referred to
by the next block) along with the ``PartSetHeader``. The
``PartSetHeader`` is explained below and is used internally to
orchestrate the p2p propagation. For clients, it is basically opaque
bytes, but they must match for all votes.

PartSetHeader
~~~~~~~~~~~~~

The
`PartSetHeader <https://godoc.org/github.com/tendermint/tendermint/types#PartSetHeader>`__
contains the total number of pieces in a
`PartSet <https://godoc.org/github.com/tendermint/tendermint/types#PartSet>`__,
and the Merkle root hash of those pieces.

PartSet
~~~~~~~

PartSet is used to split a byteslice of data into parts (pieces) for
transmission. By splitting data into smaller parts and computing a
Merkle root hash on the list, you can verify that a part is legitimately
part of the complete data, and the part can be forwarded to other peers
before all the parts are known. In short, it's a fast way to securely
propagate a large chunk of data (like a block) over a gossip network.

PartSet was inspired by the LibSwift project.

Usage:

.. code:: go

    data := RandBytes(2 << 20) // Something large
    partSet := NewPartSetFromData(data)
    partSet.Total()    // Total number of 4KB parts
    partSet.Count()    // Equal to the Total, since we already have all the parts
    partSet.Hash()     // The Merkle root hash
    partSet.BitArray() // A BitArray of partSet.Total() 1's

    header := partSet.Header() // Send this to the peer
    header.Total               // Total number of parts
    header.Hash                // The merkle root hash

    // Now we'll reconstruct the data from the parts
    partSet2 := NewPartSetFromHeader(header)
    partSet2.Total()    // Same total as partSet.Total()
    partSet2.Count()    // Zero, since this PartSet doesn't have any parts yet.
    partSet2.Hash()     // Same hash as in partSet.Hash()
    partSet2.BitArray() // A BitArray of partSet.Total() 0's

    // In a gossip network the parts would arrive in arbitrary order, perhaps
    // in response to explicit requests for parts, or optimistically in response
    // to the receiving peer's partSet.BitArray().
    for !partSet2.IsComplete() {
        part := receivePartFromGossipNetwork()
        added, err := partSet2.AddPart(part)
        if err != nil {
            // A wrong part,
            // the merkle trail does not hash to partSet2.Hash()
        } else if !added {
            // A duplicate part already received
        }
    }

    data2, _ := ioutil.ReadAll(partSet2.GetReader())
    bytes.Equal(data, data2) // true

@ -1,349 +0,0 @@
Byzantine Consensus Algorithm
=============================

Terms
-----

- The network is composed of optionally connected *nodes*. Nodes
  directly connected to a particular node are called *peers*.
- The consensus process for deciding the next block (at some *height*
  ``H``) is composed of one or many *rounds*.
- ``NewHeight``, ``Propose``, ``Prevote``, ``Precommit``, and
  ``Commit`` represent state machine states of a round (a.k.a.
  ``RoundStep``, or just "step").
- A node is said to be *at* a given height, round, and step, or at
  ``(H,R,S)``, or at ``(H,R)`` for short when the step is omitted.
- To *prevote* or *precommit* something means to broadcast a `prevote
  vote <https://godoc.org/github.com/tendermint/tendermint/types#Vote>`__
  or `first precommit
  vote <https://godoc.org/github.com/tendermint/tendermint/types#FirstPrecommit>`__
  for something.
- A vote *at* ``(H,R)`` is a vote signed with the bytes for ``H`` and
  ``R`` included in its
  `sign-bytes <block-structure.html#vote-sign-bytes>`__.
- *+2/3* is short for "more than 2/3".
- *1/3+* is short for "1/3 or more".
- A set of +2/3 of prevotes for a particular block or ``<nil>`` at
  ``(H,R)`` is called a *proof-of-lock-change* or *PoLC* for short.

State Machine Overview
----------------------

At each height of the blockchain a round-based protocol is run to
determine the next block. Each round is composed of three *steps*
(``Propose``, ``Prevote``, and ``Precommit``), along with two special
steps ``Commit`` and ``NewHeight``.

In the optimal scenario, the order of steps is:

::

    NewHeight -> (Propose -> Prevote -> Precommit)+ -> Commit -> NewHeight ->...

The sequence ``(Propose -> Prevote -> Precommit)`` is called a *round*.
There may be more than one round required to commit a block at a given
height. Examples of why more rounds may be required include:

- The designated proposer was not online.
- The block proposed by the designated proposer was not valid.
- The block proposed by the designated proposer did not propagate in
  time.
- The block proposed was valid, but +2/3 of prevotes for the proposed
  block were not received in time for enough validator nodes by the
  time they reached the ``Precommit`` step. Even though +2/3 of
  prevotes are necessary to progress to the next step, at least one
  validator may have voted ``<nil>`` or maliciously voted for something
  else.
- The block proposed was valid, and +2/3 of prevotes were received for
  enough nodes, but +2/3 of precommits for the proposed block were not
  received for enough validator nodes.

Some of these problems are resolved by moving on to the next round and
proposer. Others are resolved by increasing certain round timeout
parameters over each successive round.
State Machine Diagram | |||||
--------------------- | |||||
::

                                 +-------------------------------------+
                                 v                                     |(Wait til `CommitTime+timeoutCommit`)
                           +-----------+                         +-----+-----+
              +----------> |  Propose  +--------------+          | NewHeight |
              |            +-----------+              |          +-----------+
              |                                       |                ^
              |(Else, after timeoutPrecommit)         v                |
        +-----+-----+                           +-----------+          |
        | Precommit |  <------------------------+  Prevote  |          |
        +-----+-----+                           +-----------+          |
              |(When +2/3 Precommits for block found)                  |
              v                                                        |
        +--------------------------------------------------------------------+
        | Commit                                                             |
        |                                                                    |
        | * Set CommitTime = now;                                            |
        | * Wait for block, then stage/save/commit block;                    |
        +--------------------------------------------------------------------+
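The transitions in the diagram can be summarized as a small state function. This is a simplified sketch, not Tendermint's implementation: the ``Step`` and ``State`` types and ``next`` are hypothetical, timeouts and the common exit conditions are ignored, and the outcome of ``Precommit`` is reduced to a single ``committed`` flag (true when +2/3 precommits for one block were seen).

```go
package main

import "fmt"

// Step is a hypothetical enum for the steps of a round.
type Step int

const (
	NewHeight Step = iota
	Propose
	Prevote
	Precommit
	Commit
)

func (s Step) String() string {
	return [...]string{"NewHeight", "Propose", "Prevote", "Precommit", "Commit"}[s]
}

// State is a node's position (H,R,S).
type State struct {
	H, R int
	S    Step
}

// next returns where a node goes when step st.S ends. committed reports
// whether +2/3 precommits for one block were seen during Precommit.
func next(st State, committed bool) State {
	switch st.S {
	case NewHeight:
		return State{st.H, 0, Propose}
	case Propose:
		return State{st.H, st.R, Prevote}
	case Prevote:
		return State{st.H, st.R, Precommit}
	case Precommit:
		if committed {
			return State{st.H, st.R, Commit}
		}
		return State{st.H, st.R + 1, Propose} // next round, next proposer
	case Commit:
		return State{st.H + 1, 0, NewHeight}
	}
	panic("unknown step")
}

func main() {
	st := State{H: 1, R: 0, S: NewHeight}
	// Round 0 ends without a commit; round 1 commits.
	for _, committed := range []bool{false, false, false, false, false, false, true, true} {
		st = next(st, committed)
		fmt.Printf("(%d,%d,%v)\n", st.H, st.R, st.S)
	}
}
```

Running it walks one failed round and one successful round, ending at ``(2,0,NewHeight)``.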
Background Gossip | |||||
----------------- | |||||
A node may not have a corresponding validator private key, but it | |||||
nevertheless plays an active role in the consensus process by relaying | |||||
relevant meta-data, proposals, blocks, and votes to its peers. A node | |||||
that has the private keys of an active validator and is engaged in | |||||
signing votes is called a *validator-node*. All nodes (not just | |||||
validator-nodes) have an associated state (the current height, round, | |||||
and step) and work to make progress. | |||||
Between two nodes there exists a ``Connection``, and multiplexed on top | |||||
of this connection are fairly throttled ``Channel``\ s of information. | |||||
An epidemic gossip protocol is implemented among some of these channels | |||||
to bring peers up to speed on the most recent state of consensus. For | |||||
example, | |||||
- Nodes gossip ``PartSet`` parts of the current round's proposer's proposed block. A LibSwift-inspired algorithm is used to quickly broadcast blocks across the gossip network.
- Nodes gossip prevote/precommit votes. A node NODE\_A that is ahead of NODE\_B can send NODE\_B prevotes or precommits for NODE\_B's current (or future) round to enable it to progress forward.
- Nodes gossip prevotes for the proposed PoLC (proof-of-lock-change) round if one is proposed.
- Nodes send block `commits <https://godoc.org/github.com/tendermint/tendermint/types#Commit>`__ for older blocks to peers that are lagging behind in blockchain height.
- Nodes opportunistically gossip ``HasVote`` messages to tell peers which votes they already have.
- Nodes broadcast their current state to all neighboring peers (but this state is not gossiped further).
There's more, but let's not get ahead of ourselves here. | |||||
Proposals | |||||
--------- | |||||
A proposal is signed and published by the designated proposer at each | |||||
round. The proposer is chosen by a deterministic and non-choking round | |||||
robin selection algorithm that selects proposers in proportion to their | |||||
voting power. (see | |||||
`implementation <https://github.com/tendermint/tendermint/blob/develop/types/validator_set.go>`__) | |||||
A proposal at ``(H,R)`` is composed of a block and an optional latest | |||||
``PoLC-Round < R`` which is included iff the proposer knows of one. This | |||||
hints the network to allow nodes to unlock (when safe) to ensure the | |||||
liveness property. | |||||
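A common way to get round-robin selection in proportion to voting power is an accumulator per validator: every round each accumulator grows by the validator's voting power, the validator with the largest accumulator proposes, and the total voting power is subtracted from the winner. The sketch below assumes this scheme; the ``Validator`` type, ``nextProposer``, and the tie-breaking rule (earlier list entry wins) are illustrative assumptions, so consult the linked ``validator_set.go`` for the actual rules.

```go
package main

import "fmt"

// Validator is a hypothetical, minimal validator record.
type Validator struct {
	Name  string
	Power int64
	Accum int64 // accumulated priority
}

// nextProposer advances every accumulator by its voting power, picks the
// validator with the highest accumulator (ties go to the earlier entry),
// and charges the winner the total voting power.
func nextProposer(vals []*Validator) *Validator {
	var total int64
	for _, v := range vals {
		v.Accum += v.Power
		total += v.Power
	}
	best := vals[0]
	for _, v := range vals[1:] {
		if v.Accum > best.Accum {
			best = v
		}
	}
	best.Accum -= total
	return best
}

func main() {
	vals := []*Validator{{Name: "A", Power: 3}, {Name: "B", Power: 1}}
	seq := ""
	for round := 0; round < 8; round++ {
		seq += nextProposer(vals).Name
	}
	fmt.Println(seq)
}
```

With voting powers 3 and 1, ``A`` proposes three of every four rounds (the program prints ``AABAAABA``); the accumulators return to zero every four rounds, so the schedule is deterministic and a low-power validator is never starved.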
State Machine Spec | |||||
------------------ | |||||
Propose Step (height:H,round:R) | |||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | |||||
Upon entering ``Propose``:

- The designated proposer proposes a block at ``(H,R)``.

The ``Propose`` step ends:

- After ``timeoutProposeR`` after entering ``Propose``. --> goto ``Prevote(H,R)``
- After receiving proposal block and all prevotes at ``PoLC-Round``. --> goto ``Prevote(H,R)``
- After `common exit conditions <#common-exit-conditions>`__
Prevote Step (height:H,round:R) | |||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | |||||
Upon entering ``Prevote``, each validator broadcasts its prevote vote. | |||||
- First, if the validator is locked on a block since ``LastLockRound`` | |||||
but now has a PoLC for something else at round ``PoLC-Round`` where | |||||
``LastLockRound < PoLC-Round < R``, then it unlocks. | |||||
- If the validator is still locked on a block, it prevotes that. | |||||
- Else, if the proposed block from ``Propose(H,R)`` is good, it | |||||
prevotes that. | |||||
- Else, if the proposal is invalid or wasn't received on time, it | |||||
prevotes ``<nil>``. | |||||
The ``Prevote`` step ends:

- After +2/3 prevotes for a particular block or ``<nil>``. --> goto ``Precommit(H,R)``
- After ``timeoutPrevote`` after receiving any +2/3 prevotes. --> goto ``Precommit(H,R)``
- After `common exit conditions <#common-exit-conditions>`__
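The prevote rules can be read as one decision function. A hedged sketch, not Tendermint's code: ``Lock``, ``PoLC`` and ``prevote`` are hypothetical names, an empty ``BlockID`` stands for ``<nil>``, and ``proposalOK`` folds "the proposal is valid and arrived on time" into one flag.

```go
package main

import "fmt"

// BlockID identifies a block; the empty string stands for <nil>.
type BlockID string

// Lock is a validator's lock state; Round is LastLockRound (-1 if unlocked).
type Lock struct {
	Block BlockID
	Round int
}

// PoLC is a proof-of-lock-change: +2/3 prevotes for Block at Round.
type PoLC struct {
	Block BlockID
	Round int
}

// prevote applies the Prevote rules at round r and returns the vote to
// broadcast. polc is the PoLC known for the proposal's PoLC-Round, or nil.
func prevote(lock *Lock, polc *PoLC, r int, proposal BlockID, proposalOK bool) BlockID {
	// Unlock on a PoLC for something else with LastLockRound < PoLC-Round < R.
	if lock.Block != "" && polc != nil && polc.Block != lock.Block &&
		lock.Round < polc.Round && polc.Round < r {
		*lock = Lock{Block: "", Round: -1}
	}
	switch {
	case lock.Block != "":
		return lock.Block // still locked: prevote the locked block
	case proposalOK:
		return proposal // good proposal: prevote it
	default:
		return "" // invalid or late proposal: prevote <nil>
	}
}

func main() {
	lock := &Lock{Block: "B1", Round: 0}
	fmt.Printf("%q\n", prevote(lock, nil, 2, "B2", true))                          // stays on B1
	fmt.Printf("%q\n", prevote(lock, &PoLC{Block: "B2", Round: 1}, 2, "B2", true)) // unlocks, prevotes B2
}
```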
Precommit Step (height:H,round:R) | |||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | |||||
Upon entering ``Precommit``, each validator broadcasts its precommit
vote.

- If the validator has a PoLC at ``(H,R)`` for a particular block ``B``, it (re)locks on ``B`` (changing its lock if needed), precommits ``B``, and sets ``LastLockRound = R``.
- Else, if the validator has a PoLC at ``(H,R)`` for ``<nil>``, it unlocks and precommits ``<nil>``.
- Else, it keeps the lock unchanged and precommits ``<nil>``.

A precommit for ``<nil>`` means "I didn’t see a PoLC for this round, but
I did get +2/3 prevotes and waited a bit".

The ``Precommit`` step ends:

- After +2/3 precommits for ``<nil>``. --> goto ``Propose(H,R+1)``
- After ``timeoutPrecommit`` after receiving any +2/3 precommits. --> goto ``Propose(H,R+1)``
- After `common exit conditions <#common-exit-conditions>`__
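The precommit rules can be sketched the same way. Names are hypothetical: an empty ``BlockID`` stands for ``<nil>``, ``Lock.Round`` holds ``LastLockRound`` (-1 when unlocked), and ``hasPoLC`` reports whether +2/3 prevotes for a single value were seen at ``(H,R)``. This is an illustration, not the actual state machine code.

```go
package main

import "fmt"

// BlockID identifies a block; the empty string stands for <nil>.
type BlockID string

// Lock is a validator's lock state; Round is LastLockRound (-1 if unlocked).
type Lock struct {
	Block BlockID
	Round int
}

// precommit applies the Precommit rules at round r and returns the vote.
func precommit(lock *Lock, r int, hasPoLC bool, polcBlock BlockID) BlockID {
	switch {
	case hasPoLC && polcBlock != "":
		*lock = Lock{Block: polcBlock, Round: r} // (re)lock on B, LastLockRound = R
		return polcBlock
	case hasPoLC:
		*lock = Lock{Block: "", Round: -1} // PoLC for <nil>: unlock
		return ""
	default:
		return "" // no PoLC: keep the lock, precommit <nil>
	}
}

func main() {
	lock := &Lock{Round: -1}
	steps := []struct {
		round   int
		hasPoLC bool
		block   BlockID
	}{{0, true, "B1"}, {1, false, ""}, {2, true, ""}}
	for _, s := range steps {
		v := precommit(lock, s.round, s.hasPoLC, s.block)
		fmt.Printf("round %d: precommit %q, lock %+v\n", s.round, v, *lock)
	}
}
```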
Common exit conditions
^^^^^^^^^^^^^^^^^^^^^^
- After +2/3 precommits for a particular block. --> goto ``Commit(H)`` | |||||
- After any +2/3 prevotes received at ``(H,R+x)``. --> goto | |||||
``Prevote(H,R+x)`` | |||||
- After any +2/3 precommits received at ``(H,R+x)``. --> goto | |||||
``Precommit(H,R+x)`` | |||||
Commit Step (height:H) | |||||
~~~~~~~~~~~~~~~~~~~~~~ | |||||
- Set ``CommitTime = now()`` | |||||
- Wait until block is received. --> goto ``NewHeight(H+1)`` | |||||
NewHeight Step (height:H) | |||||
~~~~~~~~~~~~~~~~~~~~~~~~~ | |||||
- Move ``Precommits`` to ``LastCommit`` and increment height. | |||||
- Set ``StartTime = CommitTime+timeoutCommit`` | |||||
- Wait until ``StartTime`` to receive straggler commits. --> goto | |||||
``Propose(H,0)`` | |||||
Proofs | |||||
------ | |||||
Proof of Safety | |||||
~~~~~~~~~~~~~~~ | |||||
Assume that less than 1/3 of the voting power of validators is byzantine.
If a validator commits block ``B`` at round ``R``, it's because it saw
+2/3 of precommits at round ``R``. Since less than 1/3 is byzantine,
honest validators holding 1/3+ of the voting power precommitted ``B``
at ``R`` and are therefore locked on ``B`` at every round ``R' > R``.
These locked validators will remain locked until they see a PoLC for
something else at some ``R' > R``, but this won't happen: because 1/3+
are locked and honest, less than 2/3 of the voting power is available
to prevote for anything other than ``B``.
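The argument is integer arithmetic over voting power. A minimal sketch with hypothetical helper names: "+2/3" is tested as ``3*p > 2*total`` and "1/3+" as ``3*p >= total`` to avoid fractions, and ``minOverlap`` is the smallest possible intersection of two vote sets. Any two +2/3 sets overlap in more than 1/3 of the voting power, which the less-than-1/3 byzantine power cannot cover, so the overlap always contains honest validators.

```go
package main

import "fmt"

// twoThirdsPlus reports whether power is "+2/3" (more than 2/3) of total.
func twoThirdsPlus(power, total int64) bool { return 3*power > 2*total }

// oneThirdPlus reports whether power is "1/3+" (1/3 or more) of total.
func oneThirdPlus(power, total int64) bool { return 3*power >= total }

// minOverlap is the smallest voting power two vote sets of power a and b,
// drawn from the same validator set of power total, must have in common.
func minOverlap(a, b, total int64) int64 {
	if o := a + b - total; o > 0 {
		return o
	}
	return 0
}

func main() {
	const total = 100
	const byzantine = 33         // the most that is still less than 1/3 of 100
	a, b := int64(67), int64(67) // two +2/3 vote sets
	o := minOverlap(a, b, total)
	// The overlap is larger than the byzantine power, so it includes honest validators.
	fmt.Println(twoThirdsPlus(a, total), o, o > byzantine)
}
```

With a total voting power of 100, two sets of 67 must share at least 34, more than the 33 that byzantine validators can hold (the program prints ``true 34 true``).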
Proof of Liveness | |||||
~~~~~~~~~~~~~~~~~ | |||||
If 1/3+ honest validators are locked on two different blocks from
different rounds, a proposer's ``PoLC-Round`` will eventually cause
nodes locked from the earlier round to unlock. Eventually, the
designated proposer will be one that is aware of a PoLC at the later
round. Also, ``timeoutProposeR`` increments with round ``R``, while the
size of a proposal is capped, so eventually the network is able to
"fully gossip" the whole proposal (i.e. the block and PoLC).
Proof of Fork Accountability | |||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | |||||
Define the JSet (justification-vote-set) at height ``H`` of a validator | |||||
``V1`` to be all the votes signed by the validator at ``H`` along with | |||||
justification PoLC prevotes for each lock change. For example, if ``V1`` | |||||
signed the following precommits: ``Precommit(B1 @ round 0)``, | |||||
``Precommit(<nil> @ round 1)``, ``Precommit(B2 @ round 4)`` (note that | |||||
no precommits were signed for rounds 2 and 3, and that's ok), | |||||
``Precommit(B1 @ round 0)`` must be justified by a PoLC at round 0, and | |||||
``Precommit(B2 @ round 4)`` must be justified by a PoLC at round 4; but | |||||
the precommit for ``<nil>`` at round 1 is not a lock-change by | |||||
definition so the JSet for ``V1`` need not include any prevotes at round | |||||
1, 2, or 3 (unless ``V1`` happened to have prevoted for those rounds). | |||||
Further, define the JSet at height ``H`` of a set of validators ``VSet`` | |||||
to be the union of the JSets for each validator in ``VSet``. For a given | |||||
commit by honest validators at round ``R`` for block ``B`` we can | |||||
construct a JSet to justify the commit for ``B`` at ``R``. We say that a | |||||
JSet *justifies* a commit at ``(H,R)`` if all the committers (validators | |||||
in the commit-set) are each justified in the JSet with no duplicitous | |||||
vote signatures (by the committers). | |||||
- **Lemma**: When a fork is detected by the existence of two | |||||
conflicting `commits <./validators.html#commiting-a-block>`__, | |||||
the union of the JSets for both commits (if they can be compiled) | |||||
must include double-signing by at least 1/3+ of the validator set. | |||||
**Proof**: The two commits cannot be at the same round, because that would
immediately imply double-signing by 1/3+. Take the union of the JSets | |||||
of both commits. If there is no double-signing by at least 1/3+ of | |||||
the validator set in the union, then no honest validator could have | |||||
precommitted any different block after the first commit. Yet, +2/3 | |||||
did. Reductio ad absurdum. | |||||
As a corollary, when there is a fork, an external process can determine | |||||
the blame by requiring each validator to justify all of its round votes. | |||||
Either we will find 1/3+ who cannot justify at least one of their votes, | |||||
and/or, we will find 1/3+ who had double-signed. | |||||
Alternative algorithm | |||||
~~~~~~~~~~~~~~~~~~~~~ | |||||
Alternatively, we can take the JSet of a commit to be the "full commit". | |||||
That is, if light clients and validators do not consider a block to be | |||||
committed unless the JSet of the commit is also known, then we get the | |||||
desirable property that if there ever is a fork (e.g. there are two | |||||
conflicting "full commits"), then 1/3+ of the validators are immediately | |||||
punishable for double-signing. | |||||
There are many ways to ensure that the gossip network efficiently shares
the JSet of a commit. One solution is to add a new message type that
tells peers that this node has (or does not have) a +2/3 majority for B
(or ``<nil>``) at (H,R), and a bitarray of which votes contributed towards that
majority. Peers can react by responding with appropriate votes. | |||||
We will implement such an algorithm for the next iteration of the | |||||
Tendermint consensus protocol. | |||||
Other potential improvements include adding more data in votes such as | |||||
the last known PoLC round that caused a lock change, and the last voted | |||||
round/step (or, we may require that validators not skip any votes). This | |||||
may make JSet verification/gossip logic easier to implement. | |||||
Censorship Attacks | |||||
~~~~~~~~~~~~~~~~~~ | |||||
Due to the definition of a block | |||||
`commit <validators.html#commiting-a-block>`__, any 1/3+ | |||||
coalition of validators can halt the blockchain by not broadcasting | |||||
their votes. Such a coalition can also censor particular transactions by | |||||
rejecting blocks that include these transactions, though this would | |||||
result in a significant proportion of block proposals being rejected,
which would slow down the rate of block commits of the blockchain, | |||||
reducing its utility and value. The malicious coalition might also | |||||
broadcast votes in a trickle so as to grind blockchain block commits to | |||||
a near halt, or engage in any combination of these attacks. | |||||
If a global active adversary were also involved, it could partition the
network in such a way that it may appear that the wrong subset of | |||||
validators were responsible for the slowdown. This is not just a | |||||
limitation of Tendermint, but rather a limitation of all consensus | |||||
protocols whose network is potentially controlled by an active | |||||
adversary. | |||||
Overcoming Forks and Censorship Attacks | |||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | |||||
For these types of attacks, a subset of the validators through external | |||||
means should coordinate to sign a reorg-proposal that chooses a fork | |||||
(and any evidence thereof) and the initial subset of validators with | |||||
their signatures. Validators who sign such a reorg-proposal forgo their
collateral on all other forks. Clients should verify the signatures on
the reorg-proposal, verify any evidence, and make a judgement or prompt | |||||
the end-user for a decision. For example, a phone wallet app may prompt | |||||
the user with a security warning, while a refrigerator may accept any | |||||
reorg-proposal signed by +1/2 of the original validators. | |||||
No non-synchronous Byzantine fault-tolerant algorithm can come to | |||||
consensus when 1/3+ of validators are dishonest, yet a fork assumes that | |||||
1/3+ of validators have already been dishonest by double-signing or | |||||
lock-changing without justification. So, signing the reorg-proposal is a | |||||
coordination problem that cannot be solved by any non-synchronous | |||||
protocol (i.e. automatically, and without making assumptions about the | |||||
reliability of the underlying network). It must be provided by means | |||||
external to the weakly-synchronous Tendermint consensus algorithm. For | |||||
now, we leave the problem of reorg-proposal coordination to human | |||||
coordination via internet media. Validators must take care to ensure | |||||
that there are no significant network partitions, to avoid situations | |||||
where two conflicting reorg-proposals are signed. | |||||
Assuming that the external coordination medium and protocol is robust, | |||||
it follows that forks are less of a concern than `censorship | |||||
attacks <#censorship-attacks>`__. |
@ -1,70 +0,0 @@ | |||||
Corruption | |||||
========== | |||||
Important step | |||||
-------------- | |||||
Make sure you have a backup of the Tendermint data directory. | |||||
Possible causes | |||||
--------------- | |||||
Remember that most corruption is caused by hardware issues: | |||||
- RAID controllers with faulty / worn out battery backup, and an unexpected power loss | |||||
- Hard disk drives with write-back cache enabled, and an unexpected power loss | |||||
- Cheap SSDs with insufficient power-loss protection, and an unexpected power loss
- Defective RAM | |||||
- Defective or overheating CPU(s) | |||||
Other causes can be: | |||||
- Database systems configured with fsync=off and an OS crash or power loss | |||||
- Filesystems configured to use write barriers plus a storage layer that ignores write barriers. LVM is a particular culprit. | |||||
- Tendermint bugs | |||||
- Operating system bugs | |||||
- Admin error | |||||
- directly modifying Tendermint data-directory contents | |||||
(Source: https://wiki.postgresql.org/wiki/Corruption) | |||||
WAL Corruption | |||||
-------------- | |||||
If the consensus WAL is corrupted at the latest height and you try to start
Tendermint, replay will fail with a panic.

Recovering from data corruption can be hard and time-consuming. Here are two approaches you can take:

1) Delete the WAL file and restart Tendermint. It will attempt to sync with other peers.
2) Try to repair the WAL file manually:

   1. Create a backup of the corrupted WAL file:

      .. code:: bash

         cp "$TMHOME/data/cs.wal/wal" /tmp/corrupted_wal_backup

   2. Use ``./scripts/wal2json`` to create a human-readable version:

      .. code:: bash

         ./scripts/wal2json/wal2json "$TMHOME/data/cs.wal/wal" > /tmp/corrupted_wal

   3. Search for a "CORRUPTED MESSAGE" line.
   4. Using the message before and the message after the corrupted one, and
      the node's logs, try to rebuild the message. If the subsequent messages
      are marked as corrupted too (this may happen if the length header got
      corrupted or some writes did not make it to the WAL, i.e. truncation),
      remove all the lines starting from the corrupted one.

      .. code:: bash

         $EDITOR /tmp/corrupted_wal

   5. After editing, convert this file back into binary form and then restart
      Tendermint:

      .. code:: bash

         ./scripts/json2wal/json2wal /tmp/corrupted_wal > "$TMHOME/data/cs.wal/wal"
@ -1,71 +0,0 @@ | |||||
Genesis | |||||
======= | |||||
The ``genesis.json`` file in ``$TMHOME/config`` defines the initial Tendermint Core
state upon genesis of the blockchain (`see
definition <https://github.com/tendermint/tendermint/blob/master/types/genesis.go>`__).
Fields | |||||
~~~~~~ | |||||
- ``genesis_time``: Official time of blockchain start. | |||||
- ``chain_id``: ID of the blockchain. This must be unique for every | |||||
blockchain. If your testnet blockchains do not have unique chain IDs, | |||||
you will have a bad time. | |||||
- ``validators``: | |||||
- ``pub_key``: The first element specifies the public key type (``1`` == Ed25519). The second element is the hex-encoded public key bytes.
- ``power``: The validator's voting power. | |||||
- ``name``: Name of the validator (optional). | |||||
- ``app_hash``: The expected application hash (as returned by the | |||||
``ResponseInfo`` ABCI message) upon genesis. If the app's hash does not | |||||
match, Tendermint will panic. | |||||
- ``app_state``: The application state (e.g. initial distribution of tokens). | |||||
Sample genesis.json | |||||
~~~~~~~~~~~~~~~~~~~ | |||||
.. code:: json | |||||
{ | |||||
"genesis_time": "2016-02-05T06:02:31.526Z", | |||||
"chain_id": "chain-tTH4mi", | |||||
"validators": [ | |||||
{ | |||||
"pub_key": [ | |||||
1, | |||||
"9BC5112CB9614D91CE423FA8744885126CD9D08D9FC9D1F42E552D662BAA411E" | |||||
], | |||||
"power": 1, | |||||
"name": "mach1" | |||||
}, | |||||
{ | |||||
"pub_key": [ | |||||
1, | |||||
"F46A5543D51F31660D9F59653B4F96061A740FF7433E0DC1ECBC30BE8494DE06" | |||||
], | |||||
"power": 1, | |||||
"name": "mach2" | |||||
}, | |||||
{ | |||||
"pub_key": [ | |||||
1, | |||||
"0E7B423C1635FD07C0FC3603B736D5D27953C1C6CA865BB9392CD79DE1A682BB" | |||||
], | |||||
"power": 1, | |||||
"name": "mach3" | |||||
}, | |||||
{ | |||||
"pub_key": [ | |||||
1, | |||||
"4F49237B9A32EB50682EDD83C48CE9CDB1D02A7CFDADCFF6EC8C1FAADB358879" | |||||
], | |||||
"power": 1, | |||||
"name": "mach4" | |||||
} | |||||
],
"app_hash": "15005165891224E721CB664D15CB972240F5703F",
"app_state": {"account": "Bob", "coins": 5000}
} |
@ -1,33 +0,0 @@ | |||||
Light Client Protocol | |||||
===================== | |||||
Light clients are an important part of the complete blockchain system | |||||
for most applications. Tendermint provides unique speed and security | |||||
properties for light client applications. | |||||
See our `lite package | |||||
<https://godoc.org/github.com/tendermint/tendermint/lite>`__. | |||||
Overview | |||||
-------- | |||||
The objective of the light client protocol is to get a
`commit <./validators.html#committing-a-block>`__ for a recent
`block hash <./block-structure.html#block-hash>`__ where the commit
includes signatures from validators holding more than 2/3 of the voting
power of the last known validator set.
From there, all the application state is verifiable with `merkle | |||||
proofs <./merkle.html#iavl-tree>`__. | |||||
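At its core, the light client checks that the signers of a commit hold more than 2/3 of the voting power of the last known validator set. A minimal sketch that deliberately omits signature verification (signatures are assumed to be checked already); ``Validator`` and ``hasTwoThirdsSigned`` are hypothetical and are not the API of the ``lite`` package.

```go
package main

import "fmt"

// Validator is an entry of the light client's last known validator set.
type Validator struct {
	Address string
	Power   int64
}

// hasTwoThirdsSigned reports whether the (already signature-checked) signers
// hold more than 2/3 of the voting power of vals. Signers that are not in
// vals, or that appear more than once, add no extra power.
func hasTwoThirdsSigned(vals []Validator, signers []string) bool {
	power := make(map[string]int64, len(vals))
	var total int64
	for _, v := range vals {
		power[v.Address] = v.Power
		total += v.Power
	}
	var signed int64
	counted := map[string]bool{}
	for _, s := range signers {
		if p, ok := power[s]; ok && !counted[s] {
			signed += p
			counted[s] = true
		}
	}
	return 3*signed > 2*total
}

func main() {
	vals := []Validator{{"A", 10}, {"B", 10}, {"C", 10}, {"D", 10}}
	fmt.Println(hasTwoThirdsSigned(vals, []string{"A", "B", "C"})) // 30 of 40
	fmt.Println(hasTwoThirdsSigned(vals, []string{"A", "B", "X"})) // X is unknown
}
```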
Properties | |||||
---------- | |||||
- You get the full collateralized security benefits of Tendermint; no need to wait for confirmations.
- You get the full speed benefits of Tendermint; transactions commit instantly.
- You can get the most recent version of the application state non-interactively (without committing anything to the blockchain). For example, this means that you can get the most recent value of a name from the name-registry without worrying about forks or censorship attacks, and without posting a commit and waiting for confirmations. It's fast, secure, and free!
@ -1,88 +0,0 @@ | |||||
Merkle | |||||
====== | |||||
For an overview of Merkle trees, see | |||||
`wikipedia <https://en.wikipedia.org/wiki/Merkle_tree>`__. | |||||
There are two types of Merkle trees used in Tendermint. | |||||
- **IAVL+ Tree**: An immutable self-balancing binary | |||||
tree for persistent application state | |||||
- **Simple Tree**: A simple compact binary tree for | |||||
a static list of items | |||||
IAVL+ Tree | |||||
---------- | |||||
The purpose of this data structure is to provide persistent storage for | |||||
key-value pairs (e.g. account state, name-registrar data, and | |||||
per-contract data) such that a deterministic merkle root hash can be | |||||
computed. The tree is balanced using a variant of the `AVL | |||||
algorithm <http://en.wikipedia.org/wiki/AVL_tree>`__ so all operations | |||||
are O(log(n)). | |||||
Nodes of this tree are immutable and indexed by their hash. Thus any node
serves as an immutable snapshot which lets us stage uncommitted | |||||
transactions from the mempool cheaply, and we can instantly roll back to | |||||
the last committed state to process transactions of a newly committed | |||||
block (which may not be the same set of transactions as those from the | |||||
mempool). | |||||
In an AVL tree, the heights of the two child subtrees of any node differ | |||||
by at most one. Whenever this condition is violated upon an update, the | |||||
tree is rebalanced by creating O(log(n)) new nodes that point to | |||||
unmodified nodes of the old tree. In the original AVL algorithm, inner | |||||
nodes can also hold key-value pairs. The AVL+ algorithm (note the plus) | |||||
modifies the AVL algorithm to keep all values on leaf nodes, while only | |||||
using branch-nodes to store keys. This simplifies the algorithm while | |||||
minimizing the size of merkle proofs.
In Ethereum, the analog is the `Patricia | |||||
trie <http://en.wikipedia.org/wiki/Radix_tree>`__. There are tradeoffs. | |||||
Keys do not need to be hashed prior to insertion in IAVL+ trees, so this | |||||
provides faster iteration in the key space which may benefit some | |||||
applications. The logic is simpler to implement, requiring only two | |||||
types of nodes -- inner nodes and leaf nodes. The IAVL+ tree is a binary | |||||
tree, so merkle proofs are much shorter than the base 16 Patricia trie. | |||||
On the other hand, while IAVL+ trees provide a deterministic merkle root | |||||
hash, it depends on the order of updates. In practice this shouldn't be | |||||
a problem, since you can efficiently encode the tree structure when | |||||
serializing the tree contents. | |||||
Simple Tree | |||||
----------- | |||||
For merkelizing smaller static lists, use the Simple Tree. The | |||||
transactions and validation signatures of a block are hashed using this | |||||
simple merkle tree logic. | |||||
If the number of items is not a power of two, the tree will not be full | |||||
and some leaf nodes will be at different levels. Simple Tree tries to | |||||
keep both sides of the tree the same size, but the left side may be one | |||||
greater. | |||||
::

    Simple Tree with 6 items

                       *
                    /     \
                  *         *
                 / \       / \
                *   h2    *   h5
               / \       / \
              h0  h1    h3  h4

    Simple Tree with 7 items

                            *
                         /     \
                      /           \
                    *               *
                  /   \            / \
                *       *         *   h6
               / \     / \       / \
              h0  h1  h2  h3    h4  h5
Simple Tree with Dictionaries | |||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | |||||
The Simple Tree is used to merkelize a list of items, so to merkelize a | |||||
(short) dictionary of key-value pairs, encode the dictionary as an | |||||
ordered list of ``KVPair`` structs. The block hash is such a hash | |||||
derived from all the fields of the block ``Header``. The state hash is | |||||
similarly derived. |
@ -1 +0,0 @@ | |||||
Spec moved to [docs/spec](https://github.com/tendermint/tendermint/tree/master/docs/spec). |
@ -1,172 +0,0 @@ | |||||
Wire Protocol | |||||
============= | |||||
The `Tendermint wire protocol <https://github.com/tendermint/go-wire>`__ | |||||
encodes data in `c-style binary <#binary>`__ and `JSON <#json>`__ form. | |||||
Supported types | |||||
--------------- | |||||
- Primitive types | |||||
- ``uint8`` (aka ``byte``), ``uint16``, ``uint32``, ``uint64`` | |||||
- ``int8``, ``int16``, ``int32``, ``int64`` | |||||
- ``uint``, ``int``: variable length (un)signed integers | |||||
- ``string``, ``[]byte`` | |||||
- ``time`` | |||||
- Derived types | |||||
- structs | |||||
- var-length arrays of a particular type | |||||
- fixed-length arrays of a particular type | |||||
- interfaces: registered union types preceded by a ``type byte`` | |||||
- pointers | |||||
Binary | |||||
------ | |||||
**Fixed-length primitive types** are encoded with 1, 2, 4, or 8 big-endian
bytes.

- ``uint8`` (aka ``byte``), ``uint16``, ``uint32``, ``uint64``: take 1, 2, 4, and 8 bytes respectively
- ``int8``, ``int16``, ``int32``, ``int64``: take 1, 2, 4, and 8 bytes respectively
- ``time``: ``int64`` representation of nanoseconds since epoch
**Variable-length integers** are encoded with a single leading byte | |||||
representing the length of the following big-endian bytes. For signed | |||||
negative integers, the most significant bit of the leading byte is a 1. | |||||
- ``uint``: 1-byte length prefixed variable-size (0 ~ 255 bytes) | |||||
unsigned integers | |||||
- ``int``: 1-byte length prefixed variable-size (0 ~ 127 bytes) signed | |||||
integers | |||||
NOTE: While the number 0 (zero) is encoded with a single byte ``x00``, | |||||
the number 1 (one) takes two bytes to represent: ``x0101``. This isn't | |||||
the most efficient representation, but the rules are easier to remember. | |||||
+-----------------+-----------------+----------------+
| number          | binary ``uint`` | binary ``int`` |
+=================+=================+================+
| 0               | ``x00``         | ``x00``        |
+-----------------+-----------------+----------------+
| 1               | ``x0101``       | ``x0101``      |
+-----------------+-----------------+----------------+
| 2               | ``x0102``       | ``x0102``      |
+-----------------+-----------------+----------------+
| 256             | ``x020100``     | ``x020100``    |
+-----------------+-----------------+----------------+
| ``2^(127*8)-1`` | ``x7FFFFF...``  | ``x7FFFFF...`` |
+-----------------+-----------------+----------------+
| ``2^(127*8)``   | ``x800100...``  | overflow       |
+-----------------+-----------------+----------------+
| ``2^(255*8)-1`` | ``xFFFFFF...``  | overflow       |
+-----------------+-----------------+----------------+
| -1              | n/a             | ``x8101``      |
+-----------------+-----------------+----------------+
| -2              | n/a             | ``x8102``      |
+-----------------+-----------------+----------------+
| -256            | n/a             | ``x820100``    |
+-----------------+-----------------+----------------+
**Structures** are encoded by encoding the field values in order of | |||||
declaration. | |||||
.. code:: go

    type Foo struct {
        MyString string
        MyUint32 uint32
    }

    var foo = Foo{"bar", math.MaxUint32} // needs: import "math"

    /* The binary representation of foo:
       0103626172FFFFFFFF

       0103:     `int` encoded length of string, here 3
       626172:   3 bytes of string "bar"
       FFFFFFFF: 4 bytes of uint32 MaxUint32
    */
**Variable-length arrays** are encoded with a leading ``int`` denoting | |||||
the length of the array followed by the binary representation of the | |||||
items. **Fixed-length arrays** are similar but aren't preceded by the | |||||
leading ``int``. | |||||
.. code:: go

    foos := []Foo{foo, foo} // variable-length array (slice)

    /* The binary representation of foos:
       01020103626172FFFFFFFF0103626172FFFFFFFF

       0102:               `int` encoded length of array, here 2
       0103626172FFFFFFFF: the first `foo`
       0103626172FFFFFFFF: the second `foo`
    */

    fixedFoos := [2]Foo{foo, foo} // fixed-length array

    /* The binary representation of fixedFoos:
       0103626172FFFFFFFF0103626172FFFFFFFF

       0103626172FFFFFFFF: the first `foo`
       0103626172FFFFFFFF: the second `foo`
    */
**Interfaces** can represent one of any number of concrete types. The | |||||
concrete types of an interface must first be declared with their | |||||
corresponding ``type byte``. An interface is then encoded with the | |||||
leading ``type byte``, then the binary encoding of the underlying | |||||
concrete type. | |||||
NOTE: The byte ``x00`` is reserved for the ``nil`` interface value and | |||||
``nil`` pointer values. | |||||
.. code:: go

    type Animal interface{}
    type Dog uint32
    type Cat string

    RegisterInterface(
        struct{ Animal }{},          // Convenience for referencing the 'Animal' interface
        ConcreteType{Dog(0), 0x01},  // Register the byte 0x01 to denote a Dog
        ConcreteType{Cat(""), 0x02}, // Register the byte 0x02 to denote a Cat
    )

    var animal Animal = Dog(02)

    /* The binary representation of animal:
       0100000002

       01:       the type byte for a `Dog`
       00000002: the 4 bytes of the uint32 Dog(02)
    */
**Pointers** are encoded with a single leading byte ``x00`` for ``nil`` | |||||
pointers, otherwise encoded with a leading byte ``x01`` followed by the | |||||
binary encoding of the value pointed to. | |||||
NOTE: It's easy to convert pointer types into interface types, since the | |||||
``type byte`` ``x00`` is always ``nil``. | |||||
JSON | |||||
---- | |||||
The JSON codec is compatible with the `binary <#binary>`__ codec,
and is fairly intuitive if you're already familiar with Go's JSON
encoding. Some quirks are noted below:
- variable-length and fixed-length bytes are encoded as uppercase | |||||
hexadecimal strings | |||||
- interface values are encoded as an array of two items: | |||||
``[type_byte, concrete_value]`` | |||||
- times are encoded as rfc2822 strings |
@ -0,0 +1,206 @@ | |||||
# Block Structure | |||||
The Tendermint consensus engine records all agreements by a
supermajority of nodes into a blockchain, which is replicated among all
nodes. This blockchain is accessible via various RPC endpoints, mainly
`/block?height=` to get the full block, as well as
`/blockchain?minHeight=_&maxHeight=_` to get a list of headers. But what
exactly is stored in these blocks?
## Block | |||||
A | |||||
[Block](https://godoc.org/github.com/tendermint/tendermint/types#Block) | |||||
contains: | |||||
- a [Header](#header), which contains merkle hashes for various chain states
- the
[Data](https://godoc.org/github.com/tendermint/tendermint/types#Data),
which holds all transactions to be processed
- the [LastCommit](#commit), the +2/3 signatures for the previous block
The signatures returned along with block `H` are those validating block | |||||
`H-1`. This can be a little confusing, but we must also consider that | |||||
the `Header` also contains the `LastCommitHash`. It would be impossible | |||||
for a Header to include the commits that sign it, since the signatures
would then have to cover themselves. But when we get block `H`, we find
`Header.LastCommitHash`, which must match the hash of `LastCommit`. | |||||
## Header

The
[Header](https://godoc.org/github.com/tendermint/tendermint/types#Header)
contains lots of information (follow the link for up-to-date details).
Notably, it maintains the `Height`, the `LastBlockID` (to make it a
chain), and hashes of the data, the app state, and the validator set.
This is important because the `Header` is the only item that is signed
by the validators; all other data must be validated against one of the
Merkle hashes in the `Header`.

The `DataHash` provides a check on the
[Data](https://godoc.org/github.com/tendermint/tendermint/types#Data)
returned in the same block. If you subscribe to new blocks via the
Tendermint RPC in order to display or process the new transactions, you
should at least validate that the `Data` matches the `DataHash`. If
authenticity matters, you must also wait for the `LastCommit` from the
next block to make sure the block header (including `DataHash`) was
properly signed.

The `ValidatorsHash` contains a hash of the current
[Validators](https://godoc.org/github.com/tendermint/tendermint/types#Validator).
Tracking all changes in the validator set is complex, but a client can
quickly compare this hash with the [hash of the currently known
validators](https://godoc.org/github.com/tendermint/tendermint/types#ValidatorSet.Hash)
to see whether there have been changes.

The `AppHash` serves as the basis for validating any Merkle proofs that
come from the ABCI application. It represents the state of the actual
application, rather than the state of the blockchain itself. This means
it is necessary for any business logic that depends on application
state, such as verifying an account balance.

**Note:** After the transactions are committed to a block, they still
need to be processed in a separate step, which happens between blocks.
If you find a given transaction in the block at height `H`, the effects
of running that transaction will first be visible in the `AppHash` from
the block header at height `H+1`.

Like the `LastCommit` issue, this is a consequence of the immutability
of the blockchain: the application only applies transactions *after*
they are committed to the chain.
## Commit

The
[Commit](https://godoc.org/github.com/tendermint/tendermint/types#Commit)
contains a set of
[Votes](https://godoc.org/github.com/tendermint/tendermint/types#Vote)
that were made by the validator set to reach consensus on this block.
This is the key to security in any PoS system: no data can be trusted
unless it can be traced back to a block header with a valid set of
Votes. Thus, getting the Commit data and verifying the votes is
extremely important.

As mentioned above, in order to find the precommit votes for block
header `H`, we need to query block `H+1`. Then we need to check the
votes and make sure they really are for that block and are properly
formatted. Much of this code is implemented in Go in the
[light-client](https://github.com/tendermint/light-client) package. If
you look at the code, you will notice that we need to provide the
`chainID` of the blockchain in order to properly compute the sign bytes
of the votes. This prevents anyone from replaying votes from one chain
on another to fake (or frame) a validator. Also note that this
`chainID` is in the `genesis.json` from *Tendermint*, not the
`genesis.json` from the basecoin app ([that is a different
chainID...](https://github.com/cosmos/cosmos-sdk/issues/32)).

Once we have those votes, and have calculated the proper [sign
bytes](https://godoc.org/github.com/tendermint/tendermint/types#Vote.WriteSignBytes)
using the chainID and a [nice helper
function](https://godoc.org/github.com/tendermint/tendermint/types#SignBytes),
we can verify them. The light client is responsible for maintaining a
set of validators that we trust. Each vote only stores the validator's
`Address`, as well as the `Signature`. Assuming we have a local copy of
the trusted validator set, we can look up the `Public Key` of the
validator given its `Address`, then verify that the `Signature` matches
the `SignBytes` and `Public Key`. Then we sum up the voting power of all
validators whose votes fulfilled all these stringent requirements. If
the total voting power for a single block is greater than 2/3 of all
voting power, then we can finally trust the block header, the
`AppHash`, and the proof we got from the ABCI application.
### Vote Sign Bytes

The `sign-bytes` of a vote are produced by taking a
[stable-json](https://github.com/substack/json-stable-stringify)-like
deterministic JSON [wire](./wire-protocol.html) encoding of the vote
(excluding the `Signature` field), and wrapping it with
`{"chain_id":"my_chain","vote":...}`.

For example, a precommit vote might have the following `sign-bytes`:

```json
{"chain_id":"my_chain","vote":{"block_hash":"611801F57B4CE378DF1A3FFF1216656E89209A99","block_parts_header":{"hash":"B46697379DBE0774CC2C3B656083F07CA7E0F9CE","total":123},"height":1234,"round":1,"type":2}}
```
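As a rough sketch of how such deterministic bytes can be produced, the code below reproduces the example above with Go's standard `encoding/json` rather than the go-wire encoder. Because `encoding/json` emits struct fields in declaration order, declaring the fields in sorted key order yields a stable output. The struct names are hypothetical, and the hashes are kept as pre-encoded uppercase hex strings for simplicity.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// partSetHeader and voteSignable are hypothetical mirrors of the fields in
// the example. Fields are declared in sorted JSON-key order so that
// encoding/json (which follows declaration order) produces stable output.
type partSetHeader struct {
	Hash  string `json:"hash"` // already uppercase hex
	Total int    `json:"total"`
}

type voteSignable struct {
	BlockHash        string        `json:"block_hash"` // already uppercase hex
	BlockPartsHeader partSetHeader `json:"block_parts_header"`
	Height           int64         `json:"height"`
	Round            int           `json:"round"`
	Type             uint8         `json:"type"`
}

// signBytes wraps the vote (which carries no Signature field) with the chain ID.
func signBytes(chainID string, vote voteSignable) ([]byte, error) {
	return json.Marshal(struct {
		ChainID string       `json:"chain_id"`
		Vote    voteSignable `json:"vote"`
	}{chainID, vote})
}

func main() {
	vote := voteSignable{
		BlockHash: "611801F57B4CE378DF1A3FFF1216656E89209A99",
		BlockPartsHeader: partSetHeader{
			Hash:  "B46697379DBE0774CC2C3B656083F07CA7E0F9CE",
			Total: 123,
		},
		Height: 1234,
		Round:  1,
		Type:   2, // precommit, as in the example above
	}
	bz, err := signBytes("my_chain", vote)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(bz))
}
```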
## Block Hash

The [block
hash](https://godoc.org/github.com/tendermint/tendermint/types#Block.Hash)
is the [Simple Tree hash](./merkle.html#simple-tree-with-dictionaries)
of the fields of the block `Header` encoded as a list of `KVPair`s.
## Transaction

A transaction is any sequence of bytes. It is up to your ABCI
application to accept or reject transactions.

## BlockID

Many of these data structures refer to the
[BlockID](https://godoc.org/github.com/tendermint/tendermint/types#BlockID),
which is the `BlockHash` (the hash of the block header, also referred to
by the next block) along with the `PartSetHeader`. The `PartSetHeader`
is explained below and is used internally to orchestrate p2p
propagation. For clients, it is basically opaque bytes, but it must
match across all votes.

## PartSetHeader

The
[PartSetHeader](https://godoc.org/github.com/tendermint/tendermint/types#PartSetHeader)
contains the total number of pieces in a
[PartSet](https://godoc.org/github.com/tendermint/tendermint/types#PartSet),
and the Merkle root hash of those pieces.

## PartSet

PartSet is used to split a byte slice of data into parts (pieces) for
transmission. By splitting data into smaller parts and computing a
Merkle root hash on the list, you can verify that a part is legitimately
part of the complete data, and the part can be forwarded to other peers
before all the parts are known. In short, it's a fast way to securely
propagate a large chunk of data (like a block) over a gossip network.

PartSet was inspired by the LibSwift project.

Usage:

```go
data := RandBytes(2 << 20) // Something large
partSet := NewPartSetFromData(data)
partSet.Total()    // Total number of 4KB parts
partSet.Count()    // Equal to the Total, since we already have all the parts
partSet.Hash()     // The Merkle root hash
partSet.BitArray() // A BitArray of partSet.Total() 1's

header := partSet.Header() // Send this to the peer
header.Total               // Total number of parts
header.Hash                // The Merkle root hash

// Now we'll reconstruct the data from the parts
partSet2 := NewPartSetFromHeader(header)
partSet2.Total()    // Same total as partSet.Total()
partSet2.Count()    // Zero, since this PartSet doesn't have any parts yet.
partSet2.Hash()     // Same hash as in partSet.Hash()
partSet2.BitArray() // A BitArray of partSet.Total() 0's

// In a gossip network the parts would arrive in arbitrary order, perhaps
// in response to explicit requests for parts, or optimistically in response
// to the receiving peer's partSet.BitArray().
for !partSet2.IsComplete() {
	// receivePartFromGossipNetwork is a placeholder for the p2p layer.
	part := receivePartFromGossipNetwork()
	added, err := partSet2.AddPart(part)
	if err != nil {
		// A wrong part: its Merkle trail does not hash to partSet2.Hash()
	} else if !added {
		// A duplicate of a part already received
	}
}

data2, _ := ioutil.ReadAll(partSet2.GetReader())
bytes.Equal(data, data2) // true
```
# Light Client Protocol

Light clients are an important part of the complete blockchain system
for most applications. Tendermint provides unique speed and security
properties for light client applications.

See our [lite
package](https://godoc.org/github.com/tendermint/tendermint/lite).

## Overview

The objective of the light client protocol is to get a
[commit](./validators.md#committing-a-block) for a recent [block
hash](../spec/consensus/consensus.md#block-hash) where the commit
includes signatures from more than 2/3 of the voting power of the last
known validator set. From there, all the application state is
verifiable with [Merkle proofs](./merkle.md#iavl-tree).

## Properties

- You get the full collateralized security benefits of Tendermint; no
  need to wait for confirmations.
- You get the full speed benefits of Tendermint; transactions commit
  instantly.
- You can get the most recent version of the application state
  non-interactively (without committing anything to the blockchain).
  For example, this means that you can get the most recent value of a
  name from the name-registry without worrying about fork censorship
  attacks, and without posting a commit and waiting for confirmations.
  It's fast, secure, and free!