@ -1,9 +1,329 @@ | |||
We are working to finalize an updated Tendermint specification with formal | |||
proofs of safety and liveness. | |||
# Byzantine Consensus Algorithm | |||
In the meantime, see the [description in the | |||
docs](http://tendermint.readthedocs.io/en/master/specification/byzantine-consensus-algorithm.html). | |||
## Terms | |||
There are also relevant but somewhat outdated descriptions in Jae Kwon's [original | |||
whitepaper](https://tendermint.com/static/docs/tendermint.pdf) and Ethan Buchman's [master's | |||
thesis](https://atrium.lib.uoguelph.ca/xmlui/handle/10214/9769). | |||
- The network is composed of optionally connected *nodes*. Nodes | |||
directly connected to a particular node are called *peers*. | |||
- The consensus process in deciding the next block (at some *height* | |||
`H`) is composed of one or many *rounds*. | |||
- `NewHeight`, `Propose`, `Prevote`, `Precommit`, and `Commit`
represent the state machine states of a round (also known as
`RoundStep`, or just "step").
- A node is said to be *at* a given height, round, and step, or at | |||
`(H,R,S)`, or at `(H,R)` in short to omit the step. | |||
- To *prevote* or *precommit* something means to broadcast a [prevote | |||
vote](https://godoc.org/github.com/tendermint/tendermint/types#Vote) | |||
or [first precommit | |||
vote](https://godoc.org/github.com/tendermint/tendermint/types#FirstPrecommit) | |||
for something. | |||
- A vote *at* `(H,R)` is a vote signed with the bytes for `H` and `R` | |||
included in its [sign-bytes](block-structure.html#vote-sign-bytes). | |||
- *+2/3* is short for "more than 2/3" | |||
- *1/3+* is short for "1/3 or more" | |||
- A set of +2/3 of prevotes for a particular block or `<nil>` at | |||
`(H,R)` is called a *proof-of-lock-change* or *PoLC* for short. | |||
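The two thresholds defined above can be checked with integer arithmetic. The following is a minimal sketch, not code from the Tendermint repository: the function names are hypothetical, and it assumes voting power fits comfortably in an `int64` so the multiplications do not overflow.

```go
package main

import "fmt"

// moreThanTwoThirds reports whether power is strictly more than 2/3 of total
// (the "+2/3" of this document). Cross-multiplying avoids floating point.
func moreThanTwoThirds(power, total int64) bool {
	return 3*power > 2*total
}

// oneThirdOrMore reports whether power is at least 1/3 of total ("1/3+").
func oneThirdOrMore(power, total int64) bool {
	return 3*power >= total
}

func main() {
	fmt.Println(moreThanTwoThirds(67, 100)) // 201 > 200
	fmt.Println(moreThanTwoThirds(66, 99))  // exactly 2/3 is not "+2/3"
	fmt.Println(oneThirdOrMore(33, 99))     // exactly 1/3 counts as "1/3+"
}
```

Note the asymmetry: +2/3 is a strict inequality, while 1/3+ is not.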
## State Machine Overview | |||
At each height of the blockchain a round-based protocol is run to | |||
determine the next block. Each round is composed of three *steps* | |||
(`Propose`, `Prevote`, and `Precommit`), along with two special steps | |||
`Commit` and `NewHeight`. | |||
In the optimal scenario, the order of steps is: | |||
``` | |||
NewHeight -> (Propose -> Prevote -> Precommit)+ -> Commit -> NewHeight ->... | |||
``` | |||
The sequence `(Propose -> Prevote -> Precommit)` is called a *round*. | |||
There may be more than one round required to commit a block at a given | |||
height. Examples for why more rounds may be required include: | |||
- The designated proposer was not online. | |||
- The block proposed by the designated proposer was not valid. | |||
- The block proposed by the designated proposer did not propagate | |||
in time. | |||
- The proposed block was valid, but not enough validator nodes had
received +2/3 of prevotes for it by the time they reached the
`Precommit` step. Even though +2/3 of prevotes are necessary to
progress to the next step, at least one validator may have voted
`<nil>` or maliciously voted for something else.
- The proposed block was valid, and enough nodes received +2/3 of
prevotes for it, but not enough validator nodes received +2/3 of
precommits for it.
Some of these problems are resolved by moving on to the next round and
proposer. Others are resolved by increasing certain round timeout
parameters over each successive round.
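As an illustration of timeouts that grow with the round number, here is a minimal sketch. The linear schedule, the function name `roundTimeout`, and the concrete durations are assumptions made for this example; the text above only says that certain timeouts increase over successive rounds.

```go
package main

import (
	"fmt"
	"time"
)

// roundTimeout returns base + round*delta, so every additional round
// waits a little longer than the one before it (assumed linear schedule).
func roundTimeout(base, delta time.Duration, round int) time.Duration {
	return base + time.Duration(round)*delta
}

func main() {
	base, delta := 3*time.Second, 500*time.Millisecond
	for r := 0; r < 3; r++ {
		fmt.Printf("round %d: %v\n", r, roundTimeout(base, delta, r))
	}
}
```

With a schedule like this, a proposal that was too slow for round 0 gets more time to propagate in later rounds.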
## State Machine Diagram | |||
``` | |||
+-------------------------------------+ | |||
v |(Wait til `CommitTime+timeoutCommit`)
+-----------+ +-----+-----+ | |||
+----------> | Propose +--------------+ | NewHeight | | |||
| +-----------+ | +-----------+ | |||
| | ^ | |||
|(Else, after timeoutPrecommit) v | | |||
+-----+-----+ +-----------+ | | |||
| Precommit | <------------------------+ Prevote | | | |||
+-----+-----+ +-----------+ | | |||
|(When +2/3 Precommits for block found) | | |||
v | | |||
+--------------------------------------------------------------------+ | |||
| Commit | | |||
| | | |||
| * Set CommitTime = now; | | |||
| * Wait for block, then stage/save/commit block; | | |||
+--------------------------------------------------------------------+ | |||
``` | |||
## Background Gossip
A node may not have a corresponding validator private key, but it | |||
nevertheless plays an active role in the consensus process by relaying | |||
relevant meta-data, proposals, blocks, and votes to its peers. A node | |||
that has the private keys of an active validator and is engaged in | |||
signing votes is called a *validator-node*. All nodes (not just | |||
validator-nodes) have an associated state (the current height, round, | |||
and step) and work to make progress. | |||
Between two nodes there exists a `Connection`, and multiplexed on top of | |||
this connection are fairly throttled `Channel`s of information. An | |||
epidemic gossip protocol is implemented among some of these channels to | |||
bring peers up to speed on the most recent state of consensus. For | |||
example, | |||
- Nodes gossip `PartSet` parts of the current round's proposer's | |||
proposed block. A LibSwift inspired algorithm is used to quickly | |||
broadcast blocks across the gossip network. | |||
- Nodes gossip prevote/precommit votes. A node `NODE_A` that is ahead | |||
of `NODE_B` can send `NODE_B` prevotes or precommits for `NODE_B`'s | |||
current (or future) round to enable it to progress forward. | |||
- Nodes gossip prevotes for the proposed PoLC (proof-of-lock-change) | |||
round if one is proposed. | |||
- Nodes gossip to nodes lagging in blockchain height with block | |||
[commits](https://godoc.org/github.com/tendermint/tendermint/types#Commit) | |||
for older blocks. | |||
- Nodes opportunistically gossip `HasVote` messages to hint to peers
which votes they already have.
- Nodes broadcast their current state to all neighboring peers (but
this state is not gossiped further).
There's more, but let's not get ahead of ourselves here. | |||
## Proposals | |||
A proposal is signed and published by the designated proposer at each | |||
round. The proposer is chosen by a deterministic and non-choking round | |||
robin selection algorithm that selects proposers in proportion to their | |||
voting power (see | |||
[implementation](https://github.com/tendermint/tendermint/blob/develop/types/validator_set.go)). | |||
A proposal at `(H,R)` is composed of a block and an optional latest | |||
`PoLC-Round < R` which is included iff the proposer knows of one. This | |||
hints the network to allow nodes to unlock (when safe) to ensure the | |||
liveness property. | |||
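One way to obtain a deterministic rotation in which proposers are chosen in proportion to voting power is an accumulator scheme. The sketch below is only an illustration under that assumption: the `validator` type, the `accum` field, and `nextProposer` are hypothetical names, and the code is not claimed to match the linked implementation.

```go
package main

import "fmt"

// validator is a simplified, hypothetical record for this sketch.
type validator struct {
	name  string
	power int64 // voting power
	accum int64 // running priority, starts at zero
}

// nextProposer advances the rotation by one round and returns the index of
// the chosen proposer, or -1 for an empty set. Every validator gains its
// voting power in priority; the highest priority wins (ties go to the lowest
// index) and the winner pays back the total voting power.
func nextProposer(vals []validator) int {
	if len(vals) == 0 {
		return -1
	}
	var total int64
	best := 0
	for i := range vals {
		vals[i].accum += vals[i].power
		total += vals[i].power
		if vals[i].accum > vals[best].accum {
			best = i
		}
	}
	vals[best].accum -= total
	return best
}

func main() {
	vals := []validator{{name: "A", power: 3}, {name: "B", power: 1}}
	for round := 0; round < 4; round++ {
		fmt.Print(vals[nextProposer(vals)].name, " ")
	}
	fmt.Println()
}
```

In this example `A` holds three quarters of the voting power and is selected in three of every four rounds, while `B` still gets its turn and is never starved.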
## State Machine Spec | |||
### Propose Step (height:H,round:R) | |||
Upon entering `Propose`:

- The designated proposer proposes a block at `(H,R)`.

The `Propose` step ends:

- After `timeoutProposeR` after entering `Propose`. --> goto
`Prevote(H,R)`
- After receiving proposal block and all prevotes at `PoLC-Round`. -->
goto `Prevote(H,R)`
- After [common exit conditions](#common-exit-conditions)
### Prevote Step (height:H,round:R) | |||
Upon entering `Prevote`, each validator broadcasts its prevote vote. | |||
- First, if the validator is locked on a block since `LastLockRound` | |||
but now has a PoLC for something else at round `PoLC-Round` where | |||
`LastLockRound < PoLC-Round < R`, then it unlocks. | |||
- If the validator is still locked on a block, it prevotes that. | |||
- Else, if the proposed block from `Propose(H,R)` is good, it | |||
prevotes that. | |||
- Else, if the proposal is invalid or wasn't received on time, it | |||
prevotes `<nil>`. | |||
The `Prevote` step ends:

- After +2/3 prevotes for a particular block or `<nil>`. --> goto
`Precommit(H,R)`
- After `timeoutPrevote` after receiving any +2/3 prevotes. --> goto
`Precommit(H,R)`
- After [common exit conditions](#common-exit-conditions)
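The prevote decision described in this step can be sketched as a pure function. This is a simplified illustration, not Tendermint code: the `prevoteInput` struct and its field names are hypothetical, blocks are represented by plain strings, and the empty string stands for `<nil>`.

```go
package main

import "fmt"

// prevoteInput is the (hypothetical) state a validator consults at Prevote(H,R).
type prevoteInput struct {
	round         int
	lockedBlock   string // "" means the validator is not locked
	lastLockRound int
	polcRound     int    // round of the latest known PoLC, -1 if none
	polcBlock     string // what that PoLC is for ("" stands for <nil>)
	proposal      string // proposed block, "" if none arrived in time
	proposalValid bool
}

// decidePrevote returns the block to prevote; "" stands for <nil>.
func decidePrevote(in prevoteInput) string {
	locked := in.lockedBlock
	// Unlock on a PoLC for something else with LastLockRound < PoLC-Round < R.
	if locked != "" && in.polcBlock != locked &&
		in.lastLockRound < in.polcRound && in.polcRound < in.round {
		locked = ""
	}
	switch {
	case locked != "":
		return locked // still locked: prevote the locked block
	case in.proposal != "" && in.proposalValid:
		return in.proposal // good proposal: prevote it
	default:
		return "" // invalid or missing proposal: prevote <nil>
	}
}

func main() {
	// Locked on B1 since round 0 and no newer PoLC: keeps prevoting B1.
	fmt.Println(decidePrevote(prevoteInput{round: 2, lockedBlock: "B1",
		polcRound: -1, proposal: "B2", proposalValid: true}))
	// A PoLC for B2 at round 1 unlocks the validator, which then prevotes B2.
	fmt.Println(decidePrevote(prevoteInput{round: 2, lockedBlock: "B1",
		polcRound: 1, polcBlock: "B2", proposal: "B2", proposalValid: true}))
}
```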
### Precommit Step (height:H,round:R) | |||
Upon entering `Precommit`, each validator broadcasts its precommit vote. | |||
- If the validator has a PoLC at `(H,R)` for a particular block `B`, it
(re)locks (or changes lock to) and precommits `B` and sets
`LastLockRound = R`.
- Else, if the validator has a PoLC at `(H,R)` for `<nil>`, it unlocks
and precommits `<nil>`.
- Else, it keeps the lock unchanged and precommits `<nil>`.
A precommit for `<nil>` means "I didn’t see a PoLC for this round, but I | |||
did get +2/3 prevotes and waited a bit". | |||
The `Precommit` step ends:

- After +2/3 precommits for `<nil>`. --> goto `Propose(H,R+1)`
- After `timeoutPrecommit` after receiving any +2/3 precommits. -->
goto `Propose(H,R+1)`
- After [common exit conditions](#common-exit-conditions)
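The locking rules of the precommit step can likewise be sketched as a small function. The `lockState` type and `decidePrecommit` are hypothetical names for this illustration; blocks are plain strings, `""` stands for `<nil>`, and leaving `lastLockRound` untouched when unlocking is an assumption, since the text does not say what happens to it.

```go
package main

import "fmt"

// lockState is the (hypothetical) locking state of one validator.
type lockState struct {
	lockedBlock   string // "" means not locked
	lastLockRound int
}

// decidePrecommit applies the rules of Precommit(H,R). hasPoLC says whether
// the validator saw +2/3 prevotes for one value at (H,R); polcBlock is that
// value ("" for <nil>). It returns the new lock state and the precommit.
func decidePrecommit(st lockState, round int, hasPoLC bool, polcBlock string) (lockState, string) {
	switch {
	case hasPoLC && polcBlock != "":
		// PoLC for block B: (re)lock on B and precommit B.
		return lockState{lockedBlock: polcBlock, lastLockRound: round}, polcBlock
	case hasPoLC:
		// PoLC for <nil>: unlock and precommit <nil>.
		return lockState{lastLockRound: st.lastLockRound}, ""
	default:
		// No PoLC: keep the lock unchanged and precommit <nil>.
		return st, ""
	}
}

func main() {
	st := lockState{}
	st, vote := decidePrecommit(st, 0, true, "B1")
	fmt.Println(st.lockedBlock, vote)
	st, vote = decidePrecommit(st, 1, false, "")
	fmt.Println(st.lockedBlock, vote == "")
}
```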
### Common exit conditions | |||
- After +2/3 precommits for a particular block. --> goto | |||
`Commit(H)` | |||
- After any +2/3 prevotes received at `(H,R+x)`. --> goto | |||
`Prevote(H,R+x)` | |||
- After any +2/3 precommits received at `(H,R+x)`. --> goto | |||
`Precommit(H,R+x)` | |||
### Commit Step (height:H) | |||
- Set `CommitTime = now()` | |||
- Wait until block is received. --> goto `NewHeight(H+1)` | |||
### NewHeight Step (height:H) | |||
- Move `Precommits` to `LastCommit` and increment height. | |||
- Set `StartTime = CommitTime+timeoutCommit` | |||
- Wait until `StartTime` to receive straggler commits. --> goto | |||
`Propose(H,0)` | |||
## Proofs | |||
### Proof of Safety | |||
Assume that less than 1/3 (written *-1/3*) of the validators' voting
power is byzantine. If a validator commits block `B` at round `R`, it's
because it saw +2/3 of precommits for `B` at round `R`. Since less than
1/3 of those precommits can come from byzantine validators, 1/3+ of the
voting power belongs to honest validators that are locked on `B` and are
still locked at any round `R' > R`. These locked validators will remain
locked until they see a PoLC at some `R' > R`, but this won't happen
because 1/3+ are locked and honest, so less than 2/3 (*-2/3*) are
available to vote for anything other than `B`.
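The counting step of this argument can be checked numerically. The sketch below is only an arithmetic illustration with a hypothetical function name; it takes the worst case in which every byzantine validator was among those that precommitted `B`.

```go
package main

import "fmt"

// conflictingPoLCPossible reports whether +2/3 of the voting power could
// still prevote for something other than B after B was committed.
func conflictingPoLCPossible(total, byzantine, precommitsForB int64) bool {
	// Worst case: all byzantine power is counted among the precommits for B.
	lockedHonest := precommitsForB - byzantine
	// Everyone except the locked honest validators might vote otherwise.
	available := total - lockedHonest
	return 3*available > 2*total
}

func main() {
	// 100 units of power, 33 byzantine, 67 precommits for B:
	// 34 honest units are locked, leaving 66, which is not +2/3.
	fmt.Println(conflictingPoLCPossible(100, 33, 67)) // false
	// With 34 byzantine units the assumption is violated and safety can fail.
	fmt.Println(conflictingPoLCPossible(100, 34, 67)) // true
}
```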
### Proof of Liveness | |||
If 1/3+ honest validators are locked on two different blocks from
different rounds, a proposer's `PoLC-Round` will eventually cause nodes
locked from the earlier round to unlock. Eventually, the designated
proposer will be one that is aware of a PoLC at the later round. Also,
`timeoutProposeR` increments with round `R`, while the size of a
proposal is capped, so eventually the network is able to "fully gossip"
the whole proposal (e.g. the block & PoLC).
### Proof of Fork Accountability | |||
Define the JSet (justification-vote-set) at height `H` of a validator | |||
`V1` to be all the votes signed by the validator at `H` along with | |||
justification PoLC prevotes for each lock change. For example, if `V1` | |||
signed the following precommits: `Precommit(B1 @ round 0)`, | |||
`Precommit(<nil> @ round 1)`, `Precommit(B2 @ round 4)` (note that no | |||
precommits were signed for rounds 2 and 3, and that's ok), | |||
`Precommit(B1 @ round 0)` must be justified by a PoLC at round 0, and | |||
`Precommit(B2 @ round 4)` must be justified by a PoLC at round 4; but | |||
the precommit for `<nil>` at round 1 is not a lock-change by definition | |||
so the JSet for `V1` need not include any prevotes at round 1, 2, or 3 | |||
(unless `V1` happened to have prevoted for those rounds). | |||
Further, define the JSet at height `H` of a set of validators `VSet` to | |||
be the union of the JSets for each validator in `VSet`. For a given | |||
commit by honest validators at round `R` for block `B` we can construct | |||
a JSet to justify the commit for `B` at `R`. We say that a JSet | |||
*justifies* a commit at `(H,R)` if all the committers (validators in the | |||
commit-set) are each justified in the JSet with no duplicitous vote | |||
signatures (by the committers). | |||
- **Lemma**: When a fork is detected by the existence of two | |||
conflicting [commits](./validators.html#commiting-a-block), the | |||
union of the JSets for both commits (if they can be compiled) must | |||
include double-signing by at least 1/3+ of the validator set. | |||
**Proof**: The commit cannot be at the same round, because that | |||
would immediately imply double-signing by 1/3+. Take the union of | |||
the JSets of both commits. If there is no double-signing by at least | |||
1/3+ of the validator set in the union, then no honest validator | |||
could have precommitted any different block after the first commit. | |||
Yet, +2/3 did. Reductio ad absurdum. | |||
As a corollary, when there is a fork, an external process can determine | |||
the blame by requiring each validator to justify all of its round votes. | |||
Either we will find 1/3+ who cannot justify at least one of their votes, | |||
and/or, we will find 1/3+ who had double-signed. | |||
### Alternative algorithm | |||
Alternatively, we can take the JSet of a commit to be the "full commit". | |||
That is, if light clients and validators do not consider a block to be | |||
committed unless the JSet of the commit is also known, then we get the | |||
desirable property that if there ever is a fork (e.g. there are two | |||
conflicting "full commits"), then 1/3+ of the validators are immediately | |||
punishable for double-signing. | |||
There are many ways to ensure that the gossip network efficiently shares
the JSet of a commit. One solution is to add a new message type that
tells peers that this node has (or does not have) a +2/3 majority for `B`
(or `<nil>`) at `(H,R)`, and a bitarray of which votes contributed towards
that majority. Peers can react by responding with appropriate votes.
We will implement such an algorithm for the next iteration of the | |||
Tendermint consensus protocol. | |||
Other potential improvements include adding more data in votes such as | |||
the last known PoLC round that caused a lock change, and the last voted | |||
round/step (or, we may require that validators not skip any votes). This | |||
may make JSet verification/gossip logic easier to implement. | |||
### Censorship Attacks | |||
Due to the definition of a block | |||
[commit](../../tendermint-core/validator.md#commiting-a-block), any 1/3+ coalition of | |||
validators can halt the blockchain by not broadcasting their votes. Such | |||
a coalition can also censor particular transactions by rejecting blocks | |||
that include these transactions, though this would result in a | |||
significant proportion of block proposals to be rejected, which would | |||
slow down the rate of block commits of the blockchain, reducing its | |||
utility and value. The malicious coalition might also broadcast votes in | |||
a trickle so as to grind blockchain block commits to a near halt, or | |||
engage in any combination of these attacks. | |||
If a global active adversary were also involved, it can partition the | |||
network in such a way that it may appear that the wrong subset of | |||
validators were responsible for the slowdown. This is not just a | |||
limitation of Tendermint, but rather a limitation of all consensus | |||
protocols whose network is potentially controlled by an active | |||
adversary. | |||
### Overcoming Forks and Censorship Attacks | |||
For these types of attacks, a subset of the validators should coordinate
through external means to sign a reorg-proposal that chooses a fork
(and any evidence thereof) and the initial subset of validators with
their signatures. Validators who sign such a reorg-proposal forgo their
collateral on all other forks. Clients should verify the signatures on
the reorg-proposal, verify any evidence, and make a judgement or prompt | |||
the end-user for a decision. For example, a phone wallet app may prompt | |||
the user with a security warning, while a refrigerator may accept any | |||
reorg-proposal signed by +1/2 of the original validators. | |||
No non-synchronous Byzantine fault-tolerant algorithm can come to | |||
consensus when 1/3+ of validators are dishonest, yet a fork assumes that | |||
1/3+ of validators have already been dishonest by double-signing or | |||
lock-changing without justification. So, signing the reorg-proposal is a | |||
coordination problem that cannot be solved by any non-synchronous | |||
protocol (i.e. automatically, and without making assumptions about the | |||
reliability of the underlying network). It must be provided by means | |||
external to the weakly-synchronous Tendermint consensus algorithm. For | |||
now, we leave the problem of reorg-proposal coordination to human | |||
coordination via internet media. Validators must take care to ensure | |||
that there are no significant network partitions, to avoid situations | |||
where two conflicting reorg-proposals are signed. | |||
Assuming that the external coordination medium and protocol are robust,
it follows that forks are less of a concern than [censorship | |||
attacks](#censorship-attacks). |
@ -1,218 +0,0 @@ | |||
Block Structure | |||
=============== | |||
The tendermint consensus engine records all agreements by a | |||
supermajority of nodes into a blockchain, which is replicated among all | |||
nodes. This blockchain is accessible via various rpc endpoints, mainly | |||
``/block?height=`` to get the full block, as well as | |||
``/blockchain?minHeight=_&maxHeight=_`` to get a list of headers. But | |||
what exactly is stored in these blocks? | |||
Block | |||
~~~~~ | |||
A | |||
`Block <https://godoc.org/github.com/tendermint/tendermint/types#Block>`__ | |||
contains: | |||
- the `Header <#header>`__, which contains merkle hashes for various
chain states
- the
`Data <https://godoc.org/github.com/tendermint/tendermint/types#Data>`__,
which is all transactions that are to be processed
- the `LastCommit <#commit>`__, which holds > 2/3 signatures for the
last block
The signatures returned along with block ``H`` are those validating | |||
block ``H-1``. This can be a little confusing, but we must also consider | |||
that the ``Header`` also contains the ``LastCommitHash``. It would be | |||
impossible for a Header to include the commits that sign it, as it would | |||
cause an infinite loop here. But when we get block ``H``, we find | |||
``Header.LastCommitHash``, which must match the hash of ``LastCommit``. | |||
Header | |||
~~~~~~ | |||
The | |||
`Header <https://godoc.org/github.com/tendermint/tendermint/types#Header>`__ | |||
contains lots of information (follow link for up-to-date info). Notably, | |||
it maintains the ``Height``, the ``LastBlockID`` (to make it a chain), | |||
and hashes of the data, the app state, and the validator set. This is | |||
important as the only item that is signed by the validators is the | |||
``Header``, and all other data must be validated against one of the | |||
merkle hashes in the ``Header``. | |||
The ``DataHash`` can provide a nice check on the | |||
`Data <https://godoc.org/github.com/tendermint/tendermint/types#Data>`__ | |||
returned in this same block. If you are subscribed to new blocks, via | |||
tendermint RPC, in order to display or process the new transactions you | |||
should at least validate that the ``DataHash`` is valid. If it is | |||
important to verify authenticity, you must wait for the ``LastCommit``
from the next block to make sure the block header (including | |||
``DataHash``) was properly signed. | |||
The ``ValidatorHash`` contains a hash of the current | |||
`Validators <https://godoc.org/github.com/tendermint/tendermint/types#Validator>`__. | |||
Tracking all changes in the validator set is complex, but a client can | |||
quickly compare this hash with the `hash of the currently known | |||
validators <https://godoc.org/github.com/tendermint/tendermint/types#ValidatorSet.Hash>`__ | |||
to see if there have been changes. | |||
The ``AppHash`` serves as the basis for validating any merkle proofs | |||
that come from the ABCI application. It represents the | |||
state of the actual application, rather than the state of the blockchain
itself. This means it's necessary in order to perform any business | |||
logic, such as verifying an account balance. | |||
**Note** After the transactions are committed to a block, they still | |||
need to be processed in a separate step, which happens between the | |||
blocks. If you find a given transaction in the block at height ``H``, | |||
the effects of running that transaction will be first visible in the | |||
``AppHash`` from the block header at height ``H+1``. | |||
Like the ``LastCommit`` issue, this is a requirement of the immutability | |||
of the block chain, as the application only applies transactions *after* | |||
they are committed to the chain.
Commit | |||
~~~~~~ | |||
The | |||
`Commit <https://godoc.org/github.com/tendermint/tendermint/types#Commit>`__ | |||
contains a set of | |||
`Votes <https://godoc.org/github.com/tendermint/tendermint/types#Vote>`__ | |||
that were made by the validator set to reach consensus on this block. | |||
This is the key to the security in any PoS system, and actually no data | |||
that cannot be traced back to a block header with a valid set of Votes | |||
can be trusted. Thus, getting the Commit data and verifying the votes is | |||
extremely important. | |||
As mentioned above, in order to find the ``precommit votes`` for block | |||
header ``H``, we need to query block ``H+1``. Then we need to check the | |||
votes, make sure they really are for that block, and properly formatted. | |||
Much of this code is implemented in Go in the | |||
`light-client <https://github.com/tendermint/light-client>`__ package. | |||
If you look at the code, you will notice that we need to provide the | |||
``chainID`` of the blockchain in order to properly calculate the votes. | |||
This is to protect anyone from swapping votes between chains to fake (or | |||
frame) a validator. Also note that this ``chainID`` is in the | |||
``genesis.json`` from *Tendermint*, not the ``genesis.json`` from the | |||
basecoin app (`that is a different | |||
chainID... <https://github.com/cosmos/cosmos-sdk/issues/32>`__). | |||
Once we have those votes, and we calculated the proper `sign | |||
bytes <https://godoc.org/github.com/tendermint/tendermint/types#Vote.WriteSignBytes>`__ | |||
using the chainID and a `nice helper | |||
function <https://godoc.org/github.com/tendermint/tendermint/types#SignBytes>`__, | |||
we can verify them. The light client is responsible for maintaining a | |||
set of validators that we trust. Each vote only stores the validator's
``Address``, as well as the ``Signature``. Assuming we have a local copy
of the trusted validator set, we can look up the ``Public Key`` of the
validator given its ``Address``, then verify that the ``Signature``
matches the ``SignBytes`` and ``Public Key``. Then we sum up the voting
power of all validators whose votes fulfilled all these stringent
requirements. If the total voting power for a single
block is greater than 2/3 of all voting power, then we can finally trust
the block header, the AppHash, and the proof we got from the ABCI | |||
application. | |||
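The verification steps described above can be sketched as follows. This is
a simplified illustration and not the light-client code: the types
``trustedValidator`` and ``vote`` and the helpers ``newValidator``,
``signVote`` and ``tallyCommit`` are hypothetical, validators are assumed to
use Ed25519 keys from Go's standard ``crypto/ed25519`` package, and the
sign-bytes are assumed to be identical for every vote on the same block.

.. code:: go

    package main

    import (
        "crypto/ed25519"
        "fmt"
    )

    // trustedValidator is a hypothetical entry of the locally trusted set.
    type trustedValidator struct {
        pubKey ed25519.PublicKey
        power  int64
    }

    // vote stores only the validator's address and its signature.
    type vote struct {
        address   string
        signature []byte
    }

    // newValidator creates a validator with a fresh key pair (demo helper).
    func newValidator(power int64) (trustedValidator, ed25519.PrivateKey, error) {
        pub, priv, err := ed25519.GenerateKey(nil) // nil selects crypto/rand
        return trustedValidator{pubKey: pub, power: power}, priv, err
    }

    // signVote signs the sign-bytes on behalf of the validator at address.
    func signVote(address string, priv ed25519.PrivateKey, signBytes []byte) vote {
        return vote{address: address, signature: ed25519.Sign(priv, signBytes)}
    }

    // tallyCommit sums the voting power of trusted validators whose signature
    // verifies against signBytes, counting each validator at most once, and
    // reports whether that sum is more than 2/3 of all trusted voting power.
    func tallyCommit(trusted map[string]trustedValidator, signBytes []byte, votes []vote) (int64, bool) {
        var total, signed int64
        for _, val := range trusted {
            total += val.power
        }
        counted := make(map[string]bool)
        for _, v := range votes {
            val, known := trusted[v.address]
            if !known || counted[v.address] {
                continue // unknown validator or duplicate vote
            }
            if !ed25519.Verify(val.pubKey, signBytes, v.signature) {
                continue // signature does not match these sign-bytes
            }
            counted[v.address] = true
            signed += val.power
        }
        return signed, 3*signed > 2*total
    }

    func main() {
        signBytes := []byte("sign-bytes of the precommit for block H")
        trusted := make(map[string]trustedValidator)
        var votes []vote
        for _, addr := range []string{"val-a", "val-b", "val-c"} {
            val, priv, err := newValidator(1)
            if err != nil {
                panic(err)
            }
            trusted[addr] = val
            votes = append(votes, signVote(addr, priv, signBytes))
        }
        signed, ok := tallyCommit(trusted, signBytes, votes)
        fmt.Println(signed, ok)
    }

Because the sign-bytes include the ``chainID``, a vote signed for another
chain fails the signature check and contributes no voting power.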
Vote Sign Bytes | |||
^^^^^^^^^^^^^^^ | |||
The ``sign-bytes`` of a vote is produced by taking a | |||
`stable-json <https://github.com/substack/json-stable-stringify>`__-like | |||
deterministic JSON `wire <./wire-protocol.html>`__ encoding of | |||
the vote (excluding the ``Signature`` field), and wrapping it with | |||
``{"chain_id":"my_chain","vote":...}``. | |||
For example, a precommit vote might have the following ``sign-bytes``: | |||
.. code:: json

    {"chain_id":"my_chain","vote":{"block_hash":"611801F57B4CE378DF1A3FFF1216656E89209A99","block_parts_header":{"hash":"B46697379DBE0774CC2C3B656083F07CA7E0F9CE","total":123},"height":1234,"round":1,"type":2}}
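A byte string of this shape can be reproduced with Go's standard
``encoding/json`` package, which emits struct fields in declaration order.
The sketch below is only an illustration: the type names and
``voteSignBytes`` are hypothetical, the fields are declared in alphabetical
key order to get a deterministic result, and the actual wire encoder may
differ in details.

.. code:: go

    package main

    import (
        "encoding/json"
        "fmt"
    )

    // partSetHeader mirrors the "block_parts_header" object of the example.
    type partSetHeader struct {
        Hash  string `json:"hash"`
        Total int    `json:"total"`
    }

    // canonicalVote lists the signed vote fields (no Signature field).
    type canonicalVote struct {
        BlockHash        string        `json:"block_hash"`
        BlockPartsHeader partSetHeader `json:"block_parts_header"`
        Height           int           `json:"height"`
        Round            int           `json:"round"`
        Type             int           `json:"type"`
    }

    // signDoc wraps the vote together with the chain ID.
    type signDoc struct {
        ChainID string        `json:"chain_id"`
        Vote    canonicalVote `json:"vote"`
    }

    // voteSignBytes returns the compact JSON encoding of the wrapped vote.
    func voteSignBytes(chainID string, v canonicalVote) ([]byte, error) {
        return json.Marshal(signDoc{ChainID: chainID, Vote: v})
    }

    func main() {
        b, err := voteSignBytes("my_chain", canonicalVote{
            BlockHash:        "611801F57B4CE378DF1A3FFF1216656E89209A99",
            BlockPartsHeader: partSetHeader{Hash: "B46697379DBE0774CC2C3B656083F07CA7E0F9CE", Total: 123},
            Height:           1234,
            Round:            1,
            Type:             2,
        })
        if err != nil {
            panic(err)
        }
        fmt.Println(string(b))
    }

Changing the chain ID changes the bytes that are signed, so a signature made
for one chain does not verify on another.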
Block Hash | |||
~~~~~~~~~~ | |||
The `block | |||
hash <https://godoc.org/github.com/tendermint/tendermint/types#Block.Hash>`__ | |||
is the `Simple Tree hash <./merkle.html#simple-tree-with-dictionaries>`__ | |||
of the fields of the block ``Header`` encoded as a list of | |||
``KVPair``\ s. | |||
Transaction | |||
~~~~~~~~~~~ | |||
A transaction is any sequence of bytes. It is up to your | |||
ABCI application to accept or reject transactions. | |||
BlockID | |||
~~~~~~~ | |||
Many of these data structures refer to the | |||
`BlockID <https://godoc.org/github.com/tendermint/tendermint/types#BlockID>`__, | |||
which is the ``BlockHash`` (hash of the block header, also referred to | |||
by the next block) along with the ``PartSetHeader``. The | |||
``PartSetHeader`` is explained below and is used internally to | |||
orchestrate the p2p propagation. For clients, it is basically opaque
bytes, but they must match for all votes. | |||
PartSetHeader | |||
~~~~~~~~~~~~~ | |||
The | |||
`PartSetHeader <https://godoc.org/github.com/tendermint/tendermint/types#PartSetHeader>`__ | |||
contains the total number of pieces in a | |||
`PartSet <https://godoc.org/github.com/tendermint/tendermint/types#PartSet>`__, | |||
and the Merkle root hash of those pieces. | |||
PartSet | |||
~~~~~~~ | |||
PartSet is used to split a byteslice of data into parts (pieces) for | |||
transmission. By splitting data into smaller parts and computing a | |||
Merkle root hash on the list, you can verify that a part is legitimately | |||
part of the complete data, and the part can be forwarded to other peers | |||
before all the parts are known. In short, it's a fast way to securely | |||
propagate a large chunk of data (like a block) over a gossip network. | |||
PartSet was inspired by the LibSwift project. | |||
Usage: | |||
.. code:: go

    data := RandBytes(2 << 20) // Something large

    partSet := NewPartSetFromData(data)
    partSet.Total()    // Total number of 4KB parts
    partSet.Count()    // Equal to the Total, since we already have all the parts
    partSet.Hash()     // The Merkle root hash
    partSet.BitArray() // A BitArray of partSet.Total() 1's

    header := partSet.Header() // Send this to the peer
    header.Total               // Total number of parts
    header.Hash                // The merkle root hash

    // Now we'll reconstruct the data from the parts
    partSet2 := NewPartSetFromHeader(header)
    partSet2.Total()    // Same total as partSet.Total()
    partSet2.Count()    // Zero, since this PartSet doesn't have any parts yet.
    partSet2.Hash()     // Same hash as in partSet.Hash()
    partSet2.BitArray() // A BitArray of partSet.Total() 0's

    // In a gossip network the parts would arrive in arbitrary order, perhaps
    // in response to explicit requests for parts, or optimistically in response
    // to the receiving peer's partSet.BitArray().
    for !partSet2.IsComplete() {
        part := receivePartFromGossipNetwork()
        added, err := partSet2.AddPart(part)
        if err != nil {
            // A wrong part,
            // the merkle trail does not hash to partSet2.Hash()
        } else if !added {
            // A duplicate part already received
        }
    }

    data2, _ := ioutil.ReadAll(partSet2.GetReader())
    bytes.Equal(data, data2) // true
@ -1,349 +0,0 @@ | |||
Byzantine Consensus Algorithm | |||
============================= | |||
Terms | |||
----- | |||
- The network is composed of optionally connected *nodes*. Nodes | |||
directly connected to a particular node are called *peers*. | |||
- The consensus process in deciding the next block (at some *height* | |||
``H``) is composed of one or many *rounds*. | |||
- ``NewHeight``, ``Propose``, ``Prevote``, ``Precommit``, and | |||
``Commit`` represent state machine states of a round. (aka | |||
``RoundStep`` or just "step"). | |||
- A node is said to be *at* a given height, round, and step, or at | |||
``(H,R,S)``, or at ``(H,R)`` in short to omit the step. | |||
- To *prevote* or *precommit* something means to broadcast a `prevote | |||
vote <https://godoc.org/github.com/tendermint/tendermint/types#Vote>`__ | |||
or `first precommit | |||
vote <https://godoc.org/github.com/tendermint/tendermint/types#FirstPrecommit>`__ | |||
for something. | |||
- A vote *at* ``(H,R)`` is a vote signed with the bytes for ``H`` and | |||
``R`` included in its | |||
`sign-bytes <block-structure.html#vote-sign-bytes>`__. | |||
- *+2/3* is short for "more than 2/3" | |||
- *1/3+* is short for "1/3 or more" | |||
- A set of +2/3 of prevotes for a particular block or ``<nil>`` at | |||
``(H,R)`` is called a *proof-of-lock-change* or *PoLC* for short. | |||
State Machine Overview | |||
---------------------- | |||
At each height of the blockchain a round-based protocol is run to | |||
determine the next block. Each round is composed of three *steps* | |||
(``Propose``, ``Prevote``, and ``Precommit``), along with two special | |||
steps ``Commit`` and ``NewHeight``. | |||
In the optimal scenario, the order of steps is: | |||
:: | |||
NewHeight -> (Propose -> Prevote -> Precommit)+ -> Commit -> NewHeight ->... | |||
The sequence ``(Propose -> Prevote -> Precommit)`` is called a *round*. | |||
There may be more than one round required to commit a block at a given | |||
height. Examples for why more rounds may be required include: | |||
- The designated proposer was not online. | |||
- The block proposed by the designated proposer was not valid. | |||
- The block proposed by the designated proposer did not propagate in | |||
time. | |||
- The block proposed was valid, but +2/3 of prevotes for the proposed | |||
block were not received in time for enough validator nodes by the | |||
time they reached the ``Precommit`` step. Even though +2/3 of | |||
prevotes are necessary to progress to the next step, at least one | |||
validator may have voted ``<nil>`` or maliciously voted for something | |||
else. | |||
- The block proposed was valid, and +2/3 of prevotes were received for | |||
enough nodes, but +2/3 of precommits for the proposed block were not | |||
received for enough validator nodes. | |||
Some of these problems are resolved by moving onto the next round & | |||
proposer. Others are resolved by increasing certain round timeout | |||
parameters over each successive round. | |||
State Machine Diagram | |||
--------------------- | |||
:: | |||
+-------------------------------------+ | |||
v |(Wait til `CommitTime+timeoutCommit`)
+-----------+ +-----+-----+ | |||
+----------> | Propose +--------------+ | NewHeight | | |||
| +-----------+ | +-----------+ | |||
| | ^ | |||
|(Else, after timeoutPrecommit) v | | |||
+-----+-----+ +-----------+ | | |||
| Precommit | <------------------------+ Prevote | | | |||
+-----+-----+ +-----------+ | | |||
|(When +2/3 Precommits for block found) | | |||
v | | |||
+--------------------------------------------------------------------+ | |||
| Commit | | |||
| | | |||
| * Set CommitTime = now; | | |||
| * Wait for block, then stage/save/commit block; | | |||
+--------------------------------------------------------------------+ | |||
Background Gossip | |||
----------------- | |||
A node may not have a corresponding validator private key, but it | |||
nevertheless plays an active role in the consensus process by relaying | |||
relevant meta-data, proposals, blocks, and votes to its peers. A node | |||
that has the private keys of an active validator and is engaged in | |||
signing votes is called a *validator-node*. All nodes (not just | |||
validator-nodes) have an associated state (the current height, round, | |||
and step) and work to make progress. | |||
Between two nodes there exists a ``Connection``, and multiplexed on top | |||
of this connection are fairly throttled ``Channel``\ s of information. | |||
An epidemic gossip protocol is implemented among some of these channels | |||
to bring peers up to speed on the most recent state of consensus. For | |||
example, | |||
- Nodes gossip ``PartSet`` parts of the current round's proposer's | |||
proposed block. A LibSwift inspired algorithm is used to quickly | |||
broadcast blocks across the gossip network. | |||
- Nodes gossip prevote/precommit votes. A node NODE\_A that is ahead of | |||
NODE\_B can send NODE\_B prevotes or precommits for NODE\_B's current | |||
(or future) round to enable it to progress forward. | |||
- Nodes gossip prevotes for the proposed PoLC (proof-of-lock-change) | |||
round if one is proposed. | |||
- Nodes gossip to nodes lagging in blockchain height with block | |||
`commits <https://godoc.org/github.com/tendermint/tendermint/types#Commit>`__ | |||
for older blocks. | |||
- Nodes opportunistically gossip ``HasVote`` messages to hint to peers
  which votes they already have.
- Nodes broadcast their current state to all neighboring peers (but this
  state is not gossiped further).
There's more, but let's not get ahead of ourselves here. | |||
Proposals | |||
--------- | |||
A proposal is signed and published by the designated proposer at each | |||
round. The proposer is chosen by a deterministic and non-choking round | |||
robin selection algorithm that selects proposers in proportion to their | |||
voting power. (see | |||
`implementation <https://github.com/tendermint/tendermint/blob/develop/types/validator_set.go>`__) | |||
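One way to obtain a deterministic rotation that is proportional to voting power is an accumulator-based weighted round robin. The sketch below is illustrative only and is not copied from the linked implementation: the ``Validator`` struct, its ``Accum`` field, and the tie-breaking rule (first in the slice wins) are assumptions made for this example.

```go
package main

import "fmt"

// Validator is a hypothetical, simplified validator record for this sketch.
type Validator struct {
	Name  string
	Power int64 // voting power
	Accum int64 // running priority, starts at 0
}

// nextProposer advances the rotation by one round and returns the proposer.
// Every validator's Accum grows by its Power; the validator with the highest
// Accum proposes and then pays back the total voting power.
func nextProposer(vals []*Validator) *Validator {
	var total int64
	var best *Validator
	for _, v := range vals {
		v.Accum += v.Power
		total += v.Power
		// Ties go to the validator that appears first in the slice.
		if best == nil || v.Accum > best.Accum {
			best = v
		}
	}
	best.Accum -= total
	return best
}

// proposerSequence returns the proposer names for n consecutive rounds.
func proposerSequence(vals []*Validator, n int) []string {
	names := make([]string, 0, n)
	for i := 0; i < n; i++ {
		names = append(names, nextProposer(vals).Name)
	}
	return names
}

func main() {
	vals := []*Validator{
		{Name: "a", Power: 3},
		{Name: "b", Power: 1},
		{Name: "c", Power: 1},
	}
	// With powers 3:1:1, "a" proposes 6 times out of 10, "b" and "c" twice each.
	fmt.Println(proposerSequence(vals, 10))
}
```

Because the highest accumulator pays back the total power each round, no validator with non-zero power is starved, and over a full cycle each validator proposes in proportion to its power.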
A proposal at ``(H,R)`` is composed of a block and an optional latest | |||
``PoLC-Round < R`` which is included iff the proposer knows of one. This | |||
hints the network to allow nodes to unlock (when safe) to ensure the | |||
liveness property. | |||
State Machine Spec | |||
------------------ | |||
Propose Step (height:H,round:R) | |||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | |||
Upon entering ``Propose``:

- The designated proposer proposes a block at ``(H,R)``.

The ``Propose`` step ends:

- After ``timeoutProposeR`` after entering ``Propose``. --> goto
  ``Prevote(H,R)``
- After receiving proposal block and all prevotes at ``PoLC-Round``. -->
  goto ``Prevote(H,R)``
- After `common exit conditions <#common-exit-conditions>`__

Prevote Step (height:H,round:R) | |||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | |||
Upon entering ``Prevote``, each validator broadcasts its prevote vote. | |||
- First, if the validator is locked on a block since ``LastLockRound`` | |||
but now has a PoLC for something else at round ``PoLC-Round`` where | |||
``LastLockRound < PoLC-Round < R``, then it unlocks. | |||
- If the validator is still locked on a block, it prevotes that. | |||
- Else, if the proposed block from ``Propose(H,R)`` is good, it | |||
prevotes that. | |||
- Else, if the proposal is invalid or wasn't received on time, it | |||
prevotes ``<nil>``. | |||
The ``Prevote`` step ends:

- After +2/3 prevotes for a particular block or ``<nil>``. --> goto
  ``Precommit(H,R)``
- After ``timeoutPrevote`` after receiving any +2/3 prevotes. --> goto
  ``Precommit(H,R)``
- After `common exit conditions <#common-exit-conditions>`__

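The prevote rules can be read as a small decision function. The following is a minimal sketch, assuming blocks are identified by plain strings; ``RoundState``, its fields, and ``decidePrevote`` are hypothetical names introduced for illustration and are not the types used by the real consensus code.

```go
package main

import "fmt"

// NilVote stands for a prevote for <nil>.
const NilVote = "<nil>"

// RoundState is a hypothetical, trimmed-down view of one validator's state
// at (H,R). Blocks are represented by plain string identifiers.
type RoundState struct {
	Round         int
	LockedBlock   string // "" when the validator is not locked
	LastLockRound int    // round in which LockedBlock was locked
	PoLCRound     int    // round of the latest known PoLC, -1 if none
	PoLCBlock     string // what that PoLC is for ("" stands for <nil>)
	ProposalBlock string // "" when no proposal arrived in time
	ProposalValid bool
}

// decidePrevote applies the prevote rules in order and returns the vote.
func decidePrevote(rs *RoundState) string {
	// Unlock on a PoLC for something else with
	// LastLockRound < PoLC-Round < R.
	if rs.LockedBlock != "" && rs.PoLCBlock != rs.LockedBlock &&
		rs.LastLockRound < rs.PoLCRound && rs.PoLCRound < rs.Round {
		rs.LockedBlock = ""
	}
	// Still locked: prevote the locked block.
	if rs.LockedBlock != "" {
		return rs.LockedBlock
	}
	// Not locked and the proposal is good: prevote it.
	if rs.ProposalBlock != "" && rs.ProposalValid {
		return rs.ProposalBlock
	}
	// Invalid or missing proposal.
	return NilVote
}

func main() {
	unlocks := &RoundState{
		Round: 2, LockedBlock: "B1", LastLockRound: 0,
		PoLCRound: 1, PoLCBlock: "B2",
		ProposalBlock: "B2", ProposalValid: true,
	}
	staysLocked := &RoundState{
		Round: 2, LockedBlock: "B1", LastLockRound: 0,
		PoLCRound: -1,
		ProposalBlock: "B2", ProposalValid: true,
	}
	fmt.Println(decidePrevote(unlocks))     // B2
	fmt.Println(decidePrevote(staysLocked)) // B1
}
```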
Precommit Step (height:H,round:R) | |||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | |||
Upon entering ``Precommit``, each validator broadcasts its precommit
vote.

- If the validator has a PoLC at ``(H,R)`` for a particular block
  ``B``, it (re)locks (or changes lock to) and precommits ``B`` and sets
  ``LastLockRound = R``.
- Else, if the validator has a PoLC at ``(H,R)`` for ``<nil>``, it
  unlocks and precommits ``<nil>``.
- Else, it keeps the lock unchanged and precommits ``<nil>``.

A precommit for ``<nil>`` means "I didn’t see a PoLC for this round, but | |||
I did get +2/3 prevotes and waited a bit". | |||
The ``Precommit`` step ends:

- After +2/3 precommits for ``<nil>``. --> goto ``Propose(H,R+1)``
- After ``timeoutPrecommit`` after receiving any +2/3 precommits. -->
  goto ``Propose(H,R+1)``
- After `common exit conditions <#common-exit-conditions>`__

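The three precommit cases map directly onto a switch. This is a rough sketch under the same simplification of string block identifiers; ``LockState`` and ``decidePrecommit`` are hypothetical names, and ``hasPoLC`` is assumed to be computed elsewhere from the prevotes received at ``(H,R)``.

```go
package main

import "fmt"

// NilVote stands for a precommit for <nil>.
const NilVote = "<nil>"

// LockState is a hypothetical record of what a validator is locked on.
type LockState struct {
	LockedBlock   string // "" when not locked
	LastLockRound int    // -1 when never locked
}

// decidePrecommit returns the precommit for the given round and updates the
// lock. hasPoLC reports whether +2/3 prevotes for a single value were seen at
// (H,R); polcBlock is that value, with "" standing for <nil>.
func decidePrecommit(ls *LockState, round int, hasPoLC bool, polcBlock string) string {
	switch {
	case hasPoLC && polcBlock != "":
		// PoLC for a block: (re)lock on it and precommit it.
		ls.LockedBlock = polcBlock
		ls.LastLockRound = round
		return polcBlock
	case hasPoLC:
		// PoLC for <nil>: unlock and precommit <nil>.
		ls.LockedBlock = ""
		return NilVote
	default:
		// No PoLC this round: keep the lock and precommit <nil>.
		return NilVote
	}
}

func main() {
	ls := &LockState{LastLockRound: -1}
	fmt.Println(decidePrecommit(ls, 0, true, "B1"), ls.LockedBlock) // B1 B1
	fmt.Println(decidePrecommit(ls, 1, false, ""), ls.LockedBlock)  // <nil> B1
	fmt.Println(decidePrecommit(ls, 2, true, ""), ls.LockedBlock)   // <nil> (lock released)
}
```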
common exit conditions | |||
^^^^^^^^^^^^^^^^^^^^^^ | |||
- After +2/3 precommits for a particular block. --> goto ``Commit(H)`` | |||
- After any +2/3 prevotes received at ``(H,R+x)``. --> goto | |||
``Prevote(H,R+x)`` | |||
- After any +2/3 precommits received at ``(H,R+x)``. --> goto | |||
``Precommit(H,R+x)`` | |||
Commit Step (height:H) | |||
~~~~~~~~~~~~~~~~~~~~~~ | |||
- Set ``CommitTime = now()`` | |||
- Wait until block is received. --> goto ``NewHeight(H+1)`` | |||
NewHeight Step (height:H) | |||
~~~~~~~~~~~~~~~~~~~~~~~~~ | |||
- Move ``Precommits`` to ``LastCommit`` and increment height. | |||
- Set ``StartTime = CommitTime+timeoutCommit`` | |||
- Wait until ``StartTime`` to receive straggler commits. --> goto | |||
``Propose(H,0)`` | |||
Proofs | |||
------ | |||
Proof of Safety | |||
~~~~~~~~~~~~~~~ | |||
Assume that less than 1/3 of the voting power of validators is Byzantine.
If a validator commits block ``B`` at round ``R``, it's because it saw
+2/3 of precommits for ``B`` at round ``R``. This implies that 1/3+ of
honest nodes are still locked on ``B`` at any round ``R' > R``. These
locked validators will remain locked until they see a PoLC at some
``R' > R``, but this won't happen because 1/3+ are locked and honest, so
less than 2/3 are available to vote for anything other than ``B``.
Proof of Liveness | |||
~~~~~~~~~~~~~~~~~ | |||
If 1/3+ honest validators are locked on two different blocks from
different rounds, a proposer's ``PoLC-Round`` will eventually cause
nodes locked from the earlier round to unlock. Eventually, the
designated proposer will be one that is aware of a PoLC at the later
round. Also, ``timeoutProposeR`` increments with round ``R``, while the
size of a proposal is capped, so eventually the network is able to
"fully gossip" the whole proposal (e.g. the block & PoLC).
Proof of Fork Accountability | |||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | |||
Define the JSet (justification-vote-set) at height ``H`` of a validator | |||
``V1`` to be all the votes signed by the validator at ``H`` along with | |||
justification PoLC prevotes for each lock change. For example, if ``V1`` | |||
signed the following precommits: ``Precommit(B1 @ round 0)``, | |||
``Precommit(<nil> @ round 1)``, ``Precommit(B2 @ round 4)`` (note that | |||
no precommits were signed for rounds 2 and 3, and that's ok), | |||
``Precommit(B1 @ round 0)`` must be justified by a PoLC at round 0, and | |||
``Precommit(B2 @ round 4)`` must be justified by a PoLC at round 4; but | |||
the precommit for ``<nil>`` at round 1 is not a lock-change by | |||
definition so the JSet for ``V1`` need not include any prevotes at round | |||
1, 2, or 3 (unless ``V1`` happened to have prevoted for those rounds). | |||
Further, define the JSet at height ``H`` of a set of validators ``VSet`` | |||
to be the union of the JSets for each validator in ``VSet``. For a given | |||
commit by honest validators at round ``R`` for block ``B`` we can | |||
construct a JSet to justify the commit for ``B`` at ``R``. We say that a | |||
JSet *justifies* a commit at ``(H,R)`` if all the committers (validators | |||
in the commit-set) are each justified in the JSet with no duplicitous | |||
vote signatures (by the committers). | |||
- **Lemma**: When a fork is detected by the existence of two | |||
conflicting `commits <./validators.html#commiting-a-block>`__, | |||
the union of the JSets for both commits (if they can be compiled) | |||
must include double-signing by at least 1/3+ of the validator set. | |||
**Proof**: The commits cannot be at the same round, because that would
immediately imply double-signing by 1/3+. Take the union of the JSets | |||
of both commits. If there is no double-signing by at least 1/3+ of | |||
the validator set in the union, then no honest validator could have | |||
precommitted any different block after the first commit. Yet, +2/3 | |||
did. Reductio ad absurdum. | |||
As a corollary, when there is a fork, an external process can determine | |||
the blame by requiring each validator to justify all of its round votes. | |||
Either we will find 1/3+ who cannot justify at least one of their votes, | |||
and/or, we will find 1/3+ who had double-signed. | |||
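As a rough illustration of the external blame process, the sketch below scans the union of two JSets for validators that signed conflicting votes in the same ``(H,R)`` slot and checks whether they make up 1/3+ of the voting power. It covers only plain double-signing, not unjustified lock changes. The ``Vote`` struct here is a hypothetical simplification (string identifiers, no signatures), not the real vote type.

```go
package main

import (
	"fmt"
	"sort"
)

// Vote is a hypothetical, simplified vote for this sketch.
type Vote struct {
	Validator string
	Height    int
	Round     int
	Type      string // "prevote" or "precommit"
	BlockID   string // "" stands for <nil>
}

// doubleSigners returns, in sorted order, every validator that signed two
// different values for the same height, round, and vote type.
func doubleSigners(votes []Vote) []string {
	type slot struct {
		validator string
		height    int
		round     int
		voteType  string
	}
	first := make(map[slot]string)
	guilty := make(map[string]bool)
	for _, v := range votes {
		s := slot{v.Validator, v.Height, v.Round, v.Type}
		prev, seen := first[s]
		if !seen {
			first[s] = v.BlockID
		} else if prev != v.BlockID {
			guilty[v.Validator] = true
		}
	}
	names := make([]string, 0, len(guilty))
	for name := range guilty {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

// atLeastOneThird reports whether part is "1/3+" (1/3 or more) of total.
func atLeastOneThird(part, total int64) bool {
	return 3*part >= total
}

func main() {
	// Union of the JSets of two conflicting commits at height 10, round 0,
	// with four validators of equal voting power.
	union := []Vote{
		{"v1", 10, 0, "precommit", "B1"}, {"v2", 10, 0, "precommit", "B1"},
		{"v3", 10, 0, "precommit", "B1"},
		{"v1", 10, 0, "precommit", "B2"}, {"v2", 10, 0, "precommit", "B2"},
		{"v4", 10, 0, "precommit", "B2"},
	}
	guilty := doubleSigners(union)
	fmt.Println(guilty, atLeastOneThird(int64(len(guilty)), 4))
}
```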
Alternative algorithm | |||
~~~~~~~~~~~~~~~~~~~~~ | |||
Alternatively, we can take the JSet of a commit to be the "full commit". | |||
That is, if light clients and validators do not consider a block to be | |||
committed unless the JSet of the commit is also known, then we get the | |||
desirable property that if there ever is a fork (e.g. there are two | |||
conflicting "full commits"), then 1/3+ of the validators are immediately | |||
punishable for double-signing. | |||
There are many ways to ensure that the gossip network efficiently shares
the JSet of a commit. One solution is to add a new message type that
tells peers that this node has (or does not have) a +2/3 majority for
``B`` (or ``<nil>``) at ``(H,R)``, and a bitarray of which votes
contributed towards that majority. Peers can react by responding with
appropriate votes.
We will implement such an algorithm for the next iteration of the | |||
Tendermint consensus protocol. | |||
Other potential improvements include adding more data in votes such as | |||
the last known PoLC round that caused a lock change, and the last voted | |||
round/step (or, we may require that validators not skip any votes). This | |||
may make JSet verification/gossip logic easier to implement. | |||
Censorship Attacks | |||
~~~~~~~~~~~~~~~~~~ | |||
Due to the definition of a block | |||
`commit <validators.html#commiting-a-block>`__, any 1/3+ | |||
coalition of validators can halt the blockchain by not broadcasting | |||
their votes. Such a coalition can also censor particular transactions by | |||
rejecting blocks that include these transactions, though this would | |||
result in a significant proportion of block proposals to be rejected, | |||
which would slow down the rate of block commits of the blockchain, | |||
reducing its utility and value. The malicious coalition might also | |||
broadcast votes in a trickle so as to grind blockchain block commits to | |||
a near halt, or engage in any combination of these attacks. | |||
If a global active adversary were also involved, it can partition the | |||
network in such a way that it may appear that the wrong subset of | |||
validators were responsible for the slowdown. This is not just a | |||
limitation of Tendermint, but rather a limitation of all consensus | |||
protocols whose network is potentially controlled by an active | |||
adversary. | |||
Overcoming Forks and Censorship Attacks | |||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | |||
For these types of attacks, a subset of the validators through external | |||
means should coordinate to sign a reorg-proposal that chooses a fork | |||
(and any evidence thereof) and the initial subset of validators with | |||
their signatures. Validators who sign such a reorg-proposal forgo their
collateral on all other forks. Clients should verify the signatures on | |||
the reorg-proposal, verify any evidence, and make a judgement or prompt | |||
the end-user for a decision. For example, a phone wallet app may prompt | |||
the user with a security warning, while a refrigerator may accept any | |||
reorg-proposal signed by +1/2 of the original validators. | |||
No non-synchronous Byzantine fault-tolerant algorithm can come to | |||
consensus when 1/3+ of validators are dishonest, yet a fork assumes that | |||
1/3+ of validators have already been dishonest by double-signing or | |||
lock-changing without justification. So, signing the reorg-proposal is a | |||
coordination problem that cannot be solved by any non-synchronous | |||
protocol (i.e. automatically, and without making assumptions about the | |||
reliability of the underlying network). It must be provided by means | |||
external to the weakly-synchronous Tendermint consensus algorithm. For | |||
now, we leave the problem of reorg-proposal coordination to human | |||
coordination via internet media. Validators must take care to ensure | |||
that there are no significant network partitions, to avoid situations | |||
where two conflicting reorg-proposals are signed. | |||
Assuming that the external coordination medium and protocol is robust, | |||
it follows that forks are less of a concern than `censorship | |||
attacks <#censorship-attacks>`__. |
@ -1,70 +0,0 @@ | |||
Corruption | |||
========== | |||
Important step | |||
-------------- | |||
Make sure you have a backup of the Tendermint data directory. | |||
Possible causes | |||
--------------- | |||
Remember that most corruption is caused by hardware issues: | |||
- RAID controllers with faulty / worn out battery backup, and an unexpected power loss | |||
- Hard disk drives with write-back cache enabled, and an unexpected power loss | |||
- Cheap SSDs with insufficient power-loss protection, and an unexpected power-loss | |||
- Defective RAM | |||
- Defective or overheating CPU(s) | |||
Other causes can be: | |||
- Database systems configured with fsync=off and an OS crash or power loss | |||
- Filesystems configured to use write barriers plus a storage layer that ignores write barriers. LVM is a particular culprit. | |||
- Tendermint bugs | |||
- Operating system bugs | |||
- Admin error | |||
- directly modifying Tendermint data-directory contents | |||
(Source: https://wiki.postgresql.org/wiki/Corruption) | |||
WAL Corruption | |||
-------------- | |||
If the consensus WAL is corrupted at the latest height and you are trying to
start Tendermint, replay will fail with a panic.
Recovering from data corruption can be hard and time-consuming. Here are two approaches you can take: | |||
1) Delete the WAL file and restart Tendermint. It will attempt to sync with other peers. | |||
2) Try to repair the WAL file manually: | |||
1. Create a backup of the corrupted WAL file:

.. code:: bash

    cp "$TMHOME/data/cs.wal/wal" /tmp/corrupted_wal_backup

2. Use ./scripts/wal2json to create a human-readable version | |||
.. code:: bash | |||
./scripts/wal2json/wal2json "$TMHOME/data/cs.wal/wal" > /tmp/corrupted_wal | |||
3. Search for a "CORRUPTED MESSAGE" line. | |||
4. By looking at the previous message and the message after the corrupted one | |||
and looking at the logs, try to rebuild the message. If the subsequent
messages are marked as corrupted too (this may happen if length header | |||
got corrupted or some writes did not make it to the WAL ~ truncation), | |||
then remove all the lines starting from the corrupted one and restart | |||
Tendermint. | |||
.. code:: bash | |||
$EDITOR /tmp/corrupted_wal | |||
5. After editing, convert this file back into binary form by running: | |||
.. code:: bash | |||
./scripts/json2wal/json2wal /tmp/corrupted_wal > "$TMHOME/data/cs.wal/wal" |
@ -1,71 +0,0 @@ | |||
Genesis | |||
======= | |||
The genesis.json file in ``$TMHOME/config`` defines the initial TendermintCore | |||
state upon genesis of the blockchain (`see | |||
definition <https://github.com/tendermint/tendermint/blob/master/types/genesis.go>`__). | |||
Fields | |||
~~~~~~ | |||
- ``genesis_time``: Official time of blockchain start. | |||
- ``chain_id``: ID of the blockchain. This must be unique for every | |||
blockchain. If your testnet blockchains do not have unique chain IDs, | |||
you will have a bad time. | |||
- ``validators``: | |||
- ``pub_key``: The first element specifies the pub\_key type. 1 ==
  Ed25519. The second element is the pubkey bytes.
- ``power``: The validator's voting power. | |||
- ``name``: Name of the validator (optional). | |||
- ``app_hash``: The expected application hash (as returned by the | |||
``ResponseInfo`` ABCI message) upon genesis. If the app's hash does not | |||
match, Tendermint will panic. | |||
- ``app_state``: The application state (e.g. initial distribution of tokens). | |||
Sample genesis.json | |||
~~~~~~~~~~~~~~~~~~~ | |||
.. code:: json | |||
{ | |||
"genesis_time": "2016-02-05T06:02:31.526Z", | |||
"chain_id": "chain-tTH4mi", | |||
"validators": [ | |||
{ | |||
"pub_key": [ | |||
1, | |||
"9BC5112CB9614D91CE423FA8744885126CD9D08D9FC9D1F42E552D662BAA411E" | |||
], | |||
"power": 1, | |||
"name": "mach1" | |||
}, | |||
{ | |||
"pub_key": [ | |||
1, | |||
"F46A5543D51F31660D9F59653B4F96061A740FF7433E0DC1ECBC30BE8494DE06" | |||
], | |||
"power": 1, | |||
"name": "mach2" | |||
}, | |||
{ | |||
"pub_key": [ | |||
1, | |||
"0E7B423C1635FD07C0FC3603B736D5D27953C1C6CA865BB9392CD79DE1A682BB" | |||
], | |||
"power": 1, | |||
"name": "mach3" | |||
}, | |||
{ | |||
"pub_key": [ | |||
1, | |||
"4F49237B9A32EB50682EDD83C48CE9CDB1D02A7CFDADCFF6EC8C1FAADB358879" | |||
], | |||
"power": 1, | |||
"name": "mach4" | |||
} | |||
], | |||
"app_hash": "15005165891224E721CB664D15CB972240F5703F", | |||
"app_state": { | |||
"account": "Bob",
"coins": 5000
} | |||
} |
@ -1,33 +0,0 @@ | |||
Light Client Protocol | |||
===================== | |||
Light clients are an important part of the complete blockchain system | |||
for most applications. Tendermint provides unique speed and security | |||
properties for light client applications. | |||
See our `lite package | |||
<https://godoc.org/github.com/tendermint/tendermint/lite>`__. | |||
Overview | |||
-------- | |||
The objective of the light client protocol is to get a | |||
`commit <./validators.html#committing-a-block>`__ for a recent | |||
`block hash <./block-structure.html#block-hash>`__ where the commit | |||
includes a majority of signatures from the last known validator set. | |||
From there, all the application state is verifiable with `merkle | |||
proofs <./merkle.html#iavl-tree>`__. | |||
Properties | |||
---------- | |||
- You get the full collateralized security benefits of Tendermint; No | |||
need to wait for confirmations. | |||
- You get the full speed benefits of Tendermint; transactions commit | |||
instantly. | |||
- You can get the most recent version of the application state | |||
non-interactively (without committing anything to the blockchain). | |||
For example, this means that you can get the most recent value of a | |||
name from the name-registry without worrying about fork censorship | |||
attacks, without posting a commit and waiting for confirmations. It's | |||
fast, secure, and free! |
@ -1,88 +0,0 @@ | |||
Merkle | |||
====== | |||
For an overview of Merkle trees, see | |||
`wikipedia <https://en.wikipedia.org/wiki/Merkle_tree>`__. | |||
There are two types of Merkle trees used in Tendermint. | |||
- **IAVL+ Tree**: An immutable self-balancing binary | |||
tree for persistent application state | |||
- **Simple Tree**: A simple compact binary tree for | |||
a static list of items | |||
IAVL+ Tree | |||
---------- | |||
The purpose of this data structure is to provide persistent storage for | |||
key-value pairs (e.g. account state, name-registrar data, and | |||
per-contract data) such that a deterministic merkle root hash can be | |||
computed. The tree is balanced using a variant of the `AVL | |||
algorithm <http://en.wikipedia.org/wiki/AVL_tree>`__ so all operations | |||
are O(log(n)). | |||
Nodes of this tree are immutable and indexed by their hashes. Thus any node
serves as an immutable snapshot which lets us stage uncommitted | |||
transactions from the mempool cheaply, and we can instantly roll back to | |||
the last committed state to process transactions of a newly committed | |||
block (which may not be the same set of transactions as those from the | |||
mempool). | |||
In an AVL tree, the heights of the two child subtrees of any node differ | |||
by at most one. Whenever this condition is violated upon an update, the | |||
tree is rebalanced by creating O(log(n)) new nodes that point to | |||
unmodified nodes of the old tree. In the original AVL algorithm, inner | |||
nodes can also hold key-value pairs. The AVL+ algorithm (note the plus) | |||
modifies the AVL algorithm to keep all values on leaf nodes, while only | |||
using branch-nodes to store keys. This simplifies the algorithm while | |||
minimizing the size of merkle proofs.
In Ethereum, the analog is the `Patricia | |||
trie <http://en.wikipedia.org/wiki/Radix_tree>`__. There are tradeoffs. | |||
Keys do not need to be hashed prior to insertion in IAVL+ trees, so this | |||
provides faster iteration in the key space which may benefit some | |||
applications. The logic is simpler to implement, requiring only two | |||
types of nodes -- inner nodes and leaf nodes. The IAVL+ tree is a binary | |||
tree, so merkle proofs are much shorter than the base 16 Patricia trie. | |||
On the other hand, while IAVL+ trees provide a deterministic merkle root | |||
hash, it depends on the order of updates. In practice this shouldn't be | |||
a problem, since you can efficiently encode the tree structure when | |||
serializing the tree contents. | |||
Simple Tree | |||
----------- | |||
For merkelizing smaller static lists, use the Simple Tree. The | |||
transactions and validation signatures of a block are hashed using this | |||
simple merkle tree logic. | |||
If the number of items is not a power of two, the tree will not be full | |||
and some leaf nodes will be at different levels. Simple Tree tries to | |||
keep both sides of the tree the same size, but the left side may be one | |||
greater. | |||
::

    Simple Tree with 6 items           Simple Tree with 7 items

                  *                                  *
                 / \                                / \
               /     \                            /     \
             /         \                        /         \
           /             \                    /             \
          *               *                  *               *
         / \             / \                / \             / \
        /   \           /   \              /   \           /   \
       /     \         /     \            /     \         /     \
      *       h2      *       h5         *       *       *       h6
     / \             / \                / \     / \     / \
    h0  h1          h3  h4             h0  h1  h2  h3  h4  h5

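The split rule ("the left side may be one greater") can be sketched recursively by cutting the list at ``(n+1)/2``. This is a minimal illustration, not the actual implementation: SHA-256 and plain concatenation of the two child hashes are assumptions chosen to keep the example self-contained, and the real Simple Tree may hash and encode nodes differently.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// treeShape renders how a list of labels is split into subtrees.
func treeShape(labels []string) string {
	switch len(labels) {
	case 0:
		return ""
	case 1:
		return labels[0]
	}
	mid := (len(labels) + 1) / 2 // left side gets the extra item
	return "(" + treeShape(labels[:mid]) + " " + treeShape(labels[mid:]) + ")"
}

// simpleRoot computes a merkle root over items using the same split rule.
// Assumption: leaves are SHA-256 hashes of the items and an inner node is
// the SHA-256 hash of its two child hashes concatenated.
func simpleRoot(items [][]byte) []byte {
	switch len(items) {
	case 0:
		return nil
	case 1:
		h := sha256.Sum256(items[0])
		return h[:]
	}
	mid := (len(items) + 1) / 2
	left := simpleRoot(items[:mid])
	right := simpleRoot(items[mid:])
	h := sha256.Sum256(append(append([]byte{}, left...), right...))
	return h[:]
}

func main() {
	fmt.Println(treeShape([]string{"h0", "h1", "h2", "h3", "h4", "h5"}))
	fmt.Println(treeShape([]string{"h0", "h1", "h2", "h3", "h4", "h5", "h6"}))
	items := [][]byte{[]byte("tx0"), []byte("tx1"), []byte("tx2")}
	fmt.Println(hex.EncodeToString(simpleRoot(items)))
}
```

The two shapes printed by ``main`` correspond to the 6-item and 7-item trees drawn above.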
Simple Tree with Dictionaries | |||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | |||
The Simple Tree is used to merkelize a list of items, so to merkelize a | |||
(short) dictionary of key-value pairs, encode the dictionary as an | |||
ordered list of ``KVPair`` structs. The block hash is such a hash | |||
derived from all the fields of the block ``Header``. The state hash is | |||
similarly derived. |
@ -1 +0,0 @@ | |||
Spec moved to [docs/spec](https://github.com/tendermint/tendermint/tree/master/docs/spec). |
@ -1,172 +0,0 @@ | |||
Wire Protocol | |||
============= | |||
The `Tendermint wire protocol <https://github.com/tendermint/go-wire>`__ | |||
encodes data in `c-style binary <#binary>`__ and `JSON <#json>`__ form. | |||
Supported types | |||
--------------- | |||
- Primitive types | |||
- ``uint8`` (aka ``byte``), ``uint16``, ``uint32``, ``uint64`` | |||
- ``int8``, ``int16``, ``int32``, ``int64`` | |||
- ``uint``, ``int``: variable length (un)signed integers | |||
- ``string``, ``[]byte`` | |||
- ``time`` | |||
- Derived types | |||
- structs | |||
- var-length arrays of a particular type | |||
- fixed-length arrays of a particular type | |||
- interfaces: registered union types preceded by a ``type byte`` | |||
- pointers | |||
Binary | |||
------ | |||
**Fixed-length primitive types** are encoded with 1, 2, 4, or 8
big-endian bytes.

- ``uint8`` (aka ``byte``), ``uint16``, ``uint32``, ``uint64``: take 1,
  2, 4, and 8 bytes respectively
- ``int8``, ``int16``, ``int32``, ``int64``: take 1, 2, 4, and 8 bytes
  respectively
- ``time``: ``int64`` representation of nanoseconds since epoch

**Variable-length integers** are encoded with a single leading byte | |||
representing the length of the following big-endian bytes. For signed | |||
negative integers, the most significant bit of the leading byte is a 1. | |||
- ``uint``: 1-byte length prefixed variable-size (0 ~ 255 bytes) | |||
unsigned integers | |||
- ``int``: 1-byte length prefixed variable-size (0 ~ 127 bytes) signed | |||
integers | |||
NOTE: While the number 0 (zero) is encoded with a single byte ``x00``, | |||
the number 1 (one) takes two bytes to represent: ``x0101``. This isn't | |||
the most efficient representation, but the rules are easier to remember. | |||
+---------------+-----------------+----------------+
| number        | binary ``uint`` | binary ``int`` |
+===============+=================+================+
| 0             | ``x00``         | ``x00``        |
+---------------+-----------------+----------------+
| 1             | ``x0101``       | ``x0101``      |
+---------------+-----------------+----------------+
| 2             | ``x0102``       | ``x0102``      |
+---------------+-----------------+----------------+
| 256           | ``x020100``     | ``x020100``    |
+---------------+-----------------+----------------+
| 2^(127\*8)-1  | ``x7FFFFF...``  | ``x7FFFFF...`` |
+---------------+-----------------+----------------+
| 2^(127\*8)    | ``x800100...``  | overflow       |
+---------------+-----------------+----------------+
| 2^(255\*8)-1  | ``xFFFFFF...``  | overflow       |
+---------------+-----------------+----------------+
| -1            | n/a             | ``x8101``      |
+---------------+-----------------+----------------+
| -2            | n/a             | ``x8102``      |
+---------------+-----------------+----------------+
| -256          | n/a             | ``x820100``    |
+---------------+-----------------+----------------+

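The length-prefixed scheme is easy to reproduce by hand. The sketch below re-implements the rules described above for values that fit in 64 bits; it does not use the go-wire package, and the function names ``encodeUvarint``, ``encodeVarint``, and ``hexOf`` are made up for this example.

```go
package main

import (
	"encoding/binary"
	"encoding/hex"
	"fmt"
)

// minimalBigEndian returns the big-endian bytes of n without leading zero
// bytes. Zero yields an empty slice.
func minimalBigEndian(n uint64) []byte {
	var buf [8]byte
	binary.BigEndian.PutUint64(buf[:], n)
	i := 0
	for i < len(buf) && buf[i] == 0 {
		i++
	}
	return buf[i:]
}

// encodeUvarint writes a one-byte length followed by the magnitude bytes.
func encodeUvarint(n uint64) []byte {
	body := minimalBigEndian(n)
	return append([]byte{byte(len(body))}, body...)
}

// encodeVarint is like encodeUvarint, but sets the most significant bit of
// the leading byte for negative numbers.
func encodeVarint(n int64) []byte {
	if n >= 0 {
		return encodeUvarint(uint64(n))
	}
	body := minimalBigEndian(uint64(-n))
	return append([]byte{0x80 | byte(len(body))}, body...)
}

// hexOf renders bytes as a hex string for comparison with the table.
func hexOf(b []byte) string {
	return hex.EncodeToString(b)
}

func main() {
	for _, n := range []uint64{0, 1, 2, 256} {
		fmt.Printf("uint %d -> x%s\n", n, hexOf(encodeUvarint(n)))
	}
	for _, n := range []int64{-1, -2, -256} {
		fmt.Printf("int %d -> x%s\n", n, hexOf(encodeVarint(n)))
	}
}
```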
**Structures** are encoded by encoding the field values in order of | |||
declaration. | |||
.. code:: go

    import "math"

    type Foo struct {
        MyString string
        MyUint32 uint32
    }

    // "bar" is the three bytes 0x62 0x61 0x72.
    var foo = Foo{"bar", math.MaxUint32}

    /* The binary representation of foo:
       0103626172FFFFFFFF
       0103:     `int` encoded length of string, here 3
       626172:   3 bytes of string "bar"
       FFFFFFFF: 4 bytes of uint32 MaxUint32
    */

**Variable-length arrays** are encoded with a leading ``int`` denoting | |||
the length of the array followed by the binary representation of the | |||
items. **Fixed-length arrays** are similar but aren't preceded by the | |||
leading ``int``. | |||
.. code:: go

    foos := []Foo{foo, foo} // variable-length array (slice)

    /* The binary representation of foos:
       01020103626172FFFFFFFF0103626172FFFFFFFF
       0102:               `int` encoded length of array, here 2
       0103626172FFFFFFFF: the first `foo`
       0103626172FFFFFFFF: the second `foo`
    */

    fixedFoos := [2]Foo{foo, foo} // fixed-length array

    /* The binary representation of fixedFoos:
       0103626172FFFFFFFF0103626172FFFFFFFF
       0103626172FFFFFFFF: the first `foo`
       0103626172FFFFFFFF: the second `foo`
    */

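To make the struct and array rules concrete, here is a hand-rolled encoder that follows the description above and produces the same byte strings. It is a sketch for illustration and does not call go-wire; ``encodeLen``, ``encodeFoo``, ``encodeFooSlice``, ``encodeFooArray``, and ``hexUpper`` are hypothetical helper names.

```go
package main

import (
	"encoding/binary"
	"encoding/hex"
	"fmt"
	"math"
	"strings"
)

type Foo struct {
	MyString string
	MyUint32 uint32
}

// encodeLen encodes a non-negative length as a variable-length `int`:
// one length byte followed by the minimal big-endian bytes.
func encodeLen(n int) []byte {
	var buf [8]byte
	binary.BigEndian.PutUint64(buf[:], uint64(n))
	i := 0
	for i < len(buf) && buf[i] == 0 {
		i++
	}
	return append([]byte{byte(len(buf) - i)}, buf[i:]...)
}

// encodeFoo encodes the fields in declaration order: a length-prefixed
// string, then four big-endian bytes for the uint32.
func encodeFoo(f Foo) []byte {
	out := append(encodeLen(len(f.MyString)), f.MyString...)
	var u [4]byte
	binary.BigEndian.PutUint32(u[:], f.MyUint32)
	return append(out, u[:]...)
}

// encodeFooSlice prefixes the items with their count.
func encodeFooSlice(foos []Foo) []byte {
	out := encodeLen(len(foos))
	for _, f := range foos {
		out = append(out, encodeFoo(f)...)
	}
	return out
}

// encodeFooArray encodes a fixed-length array: items only, no count.
func encodeFooArray(foos [2]Foo) []byte {
	var out []byte
	for _, f := range foos {
		out = append(out, encodeFoo(f)...)
	}
	return out
}

// hexUpper renders bytes as uppercase hex, matching the comments above.
func hexUpper(b []byte) string {
	return strings.ToUpper(hex.EncodeToString(b))
}

func main() {
	foo := Foo{"bar", math.MaxUint32}
	fmt.Println(hexUpper(encodeFoo(foo)))
	fmt.Println(hexUpper(encodeFooSlice([]Foo{foo, foo})))
	fmt.Println(hexUpper(encodeFooArray([2]Foo{foo, foo})))
}
```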
**Interfaces** can represent one of any number of concrete types. The | |||
concrete types of an interface must first be declared with their | |||
corresponding ``type byte``. An interface is then encoded with the | |||
leading ``type byte``, then the binary encoding of the underlying | |||
concrete type. | |||
NOTE: The byte ``x00`` is reserved for the ``nil`` interface value and | |||
``nil`` pointer values. | |||
.. code:: go

    type Animal interface{}
    type Dog uint32
    type Cat string

    RegisterInterface(
        struct{ Animal }{},          // Convenience for referencing the 'Animal' interface
        ConcreteType{Dog(0), 0x01},  // Register the byte 0x01 to denote a Dog
        ConcreteType{Cat(""), 0x02}, // Register the byte 0x02 to denote a Cat
    )

    var animal Animal = Dog(02)

    /* The binary representation of animal:
       0100000002
       01:       the type byte for a `Dog`
       00000002: the 4 fixed-length bytes of Dog(02), a uint32
    */

**Pointers** are encoded with a single leading byte ``x00`` for ``nil`` | |||
pointers, otherwise encoded with a leading byte ``x01`` followed by the | |||
binary encoding of the value pointed to. | |||
NOTE: It's easy to convert pointer types into interface types, since the | |||
``type byte`` ``x00`` is always ``nil``. | |||
JSON | |||
---- | |||
The JSON codec is compatible with the `binary <#binary>`__ codec,
and is fairly intuitive if you're already familiar with golang's JSON | |||
encoding. Some quirks are noted below: | |||
- variable-length and fixed-length bytes are encoded as uppercase | |||
hexadecimal strings | |||
- interface values are encoded as an array of two items: | |||
``[type_byte, concrete_value]`` | |||
- times are encoded as rfc2822 strings |
@ -0,0 +1,206 @@ | |||
# Block Structure | |||
The Tendermint consensus engine records all agreements by a
supermajority of nodes into a blockchain, which is replicated among all
nodes. This blockchain is accessible via various RPC endpoints, mainly
`/block?height=` to get the full block, as well as | |||
`/blockchain?minHeight=_&maxHeight=_` to get a list of headers. But what | |||
exactly is stored in these blocks? | |||
## Block | |||
A | |||
[Block](https://godoc.org/github.com/tendermint/tendermint/types#Block) | |||
contains: | |||
- a [Header](#header), which contains merkle hashes for various chain states
- the
  [Data](https://godoc.org/github.com/tendermint/tendermint/types#Data),
  which is all transactions which are to be processed
- the [LastCommit](#commit), the +2/3 signatures for the last block
The signatures returned along with block `H` are those validating block | |||
`H-1`. This can be a little confusing, but we must also consider that | |||
the `Header` also contains the `LastCommitHash`. It would be impossible | |||
for a Header to include the commits that sign it, as it would cause an | |||
infinite loop here. But when we get block `H`, we find | |||
`Header.LastCommitHash`, which must match the hash of `LastCommit`. | |||
## Header | |||
The | |||
[Header](https://godoc.org/github.com/tendermint/tendermint/types#Header) | |||
contains lots of information (follow link for up-to-date info). Notably, | |||
it maintains the `Height`, the `LastBlockID` (to make it a chain), and | |||
hashes of the data, the app state, and the validator set. This is | |||
important as the only item that is signed by the validators is the | |||
`Header`, and all other data must be validated against one of the merkle | |||
hashes in the `Header`. | |||
The `DataHash` can provide a nice check on the | |||
[Data](https://godoc.org/github.com/tendermint/tendermint/types#Data) | |||
returned in this same block. If you are subscribed to new blocks, via | |||
tendermint RPC, in order to display or process the new transactions you | |||
should at least validate that the `DataHash` is valid. If it is | |||
important to verify authenticity, you must wait for the `LastCommit`
from the next block to make sure the block header (including `DataHash`) | |||
was properly signed. | |||
The `ValidatorHash` contains a hash of the current | |||
[Validators](https://godoc.org/github.com/tendermint/tendermint/types#Validator). | |||
Tracking all changes in the validator set is complex, but a client can | |||
quickly compare this hash with the [hash of the currently known | |||
validators](https://godoc.org/github.com/tendermint/tendermint/types#ValidatorSet.Hash) | |||
to see if there have been changes. | |||
The `AppHash` serves as the basis for validating any merkle proofs that | |||
come from the ABCI application. It represents the state of the actual | |||
application, rather than the state of the blockchain itself. This means
it's necessary in order to perform any business logic, such as verifying | |||
an account balance. | |||
**Note** After the transactions are committed to a block, they still | |||
need to be processed in a separate step, which happens between the | |||
blocks. If you find a given transaction in the block at height `H`, the | |||
effects of running that transaction will be first visible in the | |||
`AppHash` from the block header at height `H+1`. | |||
Like the `LastCommit` issue, this is a requirement of the immutability | |||
of the block chain, as the application only applies transactions *after* | |||
they are committed to the chain.
## Commit | |||
The | |||
[Commit](https://godoc.org/github.com/tendermint/tendermint/types#Commit) | |||
contains a set of | |||
[Votes](https://godoc.org/github.com/tendermint/tendermint/types#Vote) | |||
that were made by the validator set to reach consensus on this block. | |||
This is the key to security in any PoS system: no data can be trusted
unless it can be traced back to a block header with a valid set of
Votes. Getting the Commit data and verifying the votes is therefore
extremely important.
As mentioned above, in order to find the precommit votes for block
header `H`, we need to query block `H+1`. Then we need to check the
votes and make sure they really are for that block and are properly
formatted.
Much of this code is implemented in Go in the | |||
[light-client](https://github.com/tendermint/light-client) package. If | |||
you look at the code, you will notice that we need to provide the | |||
`chainID` of the blockchain in order to properly calculate the votes. | |||
This is to prevent anyone from swapping votes between chains to fake (or
frame) a validator. Also note that this `chainID` is in the | |||
`genesis.json` from *Tendermint*, not the `genesis.json` from the | |||
basecoin app ([that is a different | |||
chainID...](https://github.com/cosmos/cosmos-sdk/issues/32)). | |||
Once we have those votes and have calculated the proper [sign
bytes](https://godoc.org/github.com/tendermint/tendermint/types#Vote.WriteSignBytes)
using the chainID and a [nice helper
function](https://godoc.org/github.com/tendermint/tendermint/types#SignBytes),
we can verify them. The light client is responsible for maintaining a
set of validators that we trust. Each vote only stores the validator's
`Address`, as well as the `Signature`. Assuming we have a local copy of
the trusted validator set, we can look up the `Public Key` of the
validator given its `Address`, then verify that the `Signature` matches
the `SignBytes` and `Public Key`. Then we sum up the voting power of
all validators whose votes fulfilled all these stringent requirements.
If the total voting power for a single block is greater than 2/3 of all
voting power, then we can finally trust the block header, the AppHash,
and the proof we got from the ABCI application.
### Vote Sign Bytes | |||
The `sign-bytes` of a vote is produced by taking a | |||
[stable-json](https://github.com/substack/json-stable-stringify)-like | |||
deterministic JSON [wire](./wire-protocol.html) encoding of the vote | |||
(excluding the `Signature` field), and wrapping it with | |||
`{"chain_id":"my_chain","vote":...}`. | |||
For example, a precommit vote might have the following `sign-bytes`: | |||
``` | |||
{"chain_id":"my_chain","vote":{"block_hash":"611801F57B4CE378DF1A3FFF1216656E89209A99","block_parts_header":{"hash":"B46697379DBE0774CC2C3B656083F07CA7E0F9CE","total":123},"height":1234,"round":1,"type":2}} | |||
``` | |||
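
As a rough illustration, the example above can be reproduced with Go's standard `encoding/json`, which emits struct fields in declaration order and no whitespace. This is a sketch, not the actual wire encoder: the struct names below are hypothetical, and the fields are simply declared in alphabetical order of their JSON keys to mimic a stable-JSON encoding.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical types; fields are declared in alphabetical order of their
// JSON keys so that json.Marshal emits them in a stable, sorted order.
type partSetHeader struct {
	Hash  string `json:"hash"`
	Total int    `json:"total"`
}

type canonicalVote struct {
	BlockHash        string        `json:"block_hash"`
	BlockPartsHeader partSetHeader `json:"block_parts_header"`
	Height           int64         `json:"height"`
	Round            int           `json:"round"`
	Type             int           `json:"type"`
}

type signedVote struct {
	ChainID string        `json:"chain_id"`
	Vote    canonicalVote `json:"vote"`
}

// signBytes wraps the vote (without its signature) together with the chain ID.
func signBytes(chainID string, v canonicalVote) ([]byte, error) {
	return json.Marshal(signedVote{ChainID: chainID, Vote: v})
}

func main() {
	v := canonicalVote{
		BlockHash: "611801F57B4CE378DF1A3FFF1216656E89209A99",
		BlockPartsHeader: partSetHeader{
			Hash:  "B46697379DBE0774CC2C3B656083F07CA7E0F9CE",
			Total: 123,
		},
		Height: 1234,
		Round:  1,
		Type:   2,
	}
	b, err := signBytes("my_chain", v)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
}
```
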
## Block Hash | |||
The [block | |||
hash](https://godoc.org/github.com/tendermint/tendermint/types#Block.Hash) | |||
is the [Simple Tree hash](./merkle.html#simple-tree-with-dictionaries) | |||
of the fields of the block `Header` encoded as a list of `KVPair`s. | |||
## Transaction | |||
A transaction is any sequence of bytes. It is up to your ABCI | |||
application to accept or reject transactions. | |||
## BlockID | |||
Many of these data structures refer to the | |||
[BlockID](https://godoc.org/github.com/tendermint/tendermint/types#BlockID), | |||
which is the `BlockHash` (hash of the block header, also referred to by | |||
the next block) along with the `PartSetHeader`. The `PartSetHeader` is | |||
explained below and is used internally to orchestrate the p2p | |||
propagation. For clients, it is basically opaque bytes, but they must
match for all votes. | |||
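
A client-side check that all votes refer to the same BlockID might look like the sketch below. The `blockID` struct is a simplified, hypothetical stand-in (hashes are kept as hex strings so the struct is comparable with `==`), and `allMatch` is not a function from the Tendermint codebase.

```go
package main

import "fmt"

// blockID is a simplified stand-in: the block hash plus the two
// PartSetHeader fields, all treated as opaque values by the client.
type blockID struct {
	BlockHash  string
	PartsTotal int
	PartsHash  string
}

// allMatch reports whether every vote's BlockID equals the expected one.
func allMatch(want blockID, voteIDs []blockID) bool {
	for _, id := range voteIDs {
		if id != want {
			return false
		}
	}
	return true
}

func main() {
	want := blockID{BlockHash: "6118", PartsTotal: 123, PartsHash: "B466"}
	other := want
	other.PartsHash = "FFFF" // same block hash, different PartSetHeader

	fmt.Println(allMatch(want, []blockID{want, want}))  // true
	fmt.Println(allMatch(want, []blockID{want, other})) // false
}
```
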
## PartSetHeader | |||
The | |||
[PartSetHeader](https://godoc.org/github.com/tendermint/tendermint/types#PartSetHeader) | |||
contains the total number of pieces in a | |||
[PartSet](https://godoc.org/github.com/tendermint/tendermint/types#PartSet), | |||
and the Merkle root hash of those pieces. | |||
## PartSet | |||
PartSet is used to split a byteslice of data into parts (pieces) for | |||
transmission. By splitting data into smaller parts and computing a | |||
Merkle root hash on the list, you can verify that a part is legitimately | |||
part of the complete data, and the part can be forwarded to other peers | |||
before all the parts are known. In short, it's a fast way to securely | |||
propagate a large chunk of data (like a block) over a gossip network. | |||
PartSet was inspired by the LibSwift project. | |||
Usage: | |||
``` | |||
data := RandBytes(2 << 20) // Something large | |||
partSet := NewPartSetFromData(data) | |||
partSet.Total() // Total number of 4KB parts | |||
partSet.Count() // Equal to the Total, since we already have all the parts | |||
partSet.Hash() // The Merkle root hash | |||
partSet.BitArray() // A BitArray of partSet.Total() 1's | |||
header := partSet.Header() // Send this to the peer | |||
header.Total // Total number of parts | |||
header.Hash // The merkle root hash | |||
// Now we'll reconstruct the data from the parts | |||
partSet2 := NewPartSetFromHeader(header) | |||
partSet2.Total() // Same total as partSet.Total() | |||
partSet2.Count() // Zero, since this PartSet doesn't have any parts yet. | |||
partSet2.Hash() // Same hash as in partSet.Hash() | |||
partSet2.BitArray() // A BitArray of partSet.Total() 0's | |||
// In a gossip network the parts would arrive in arbitrary order, perhaps | |||
// in response to explicit requests for parts, or optimistically in response | |||
// to the receiving peer's partSet.BitArray(). | |||
for !partSet2.IsComplete() {
    part := receivePartFromGossipNetwork()
    added, err := partSet2.AddPart(part)
    if err != nil {
        // A wrong part,
        // the merkle trail does not hash to partSet2.Hash()
    } else if !added {
        // A duplicate part already received
    }
}
data2, _ := ioutil.ReadAll(partSet2.GetReader()) | |||
bytes.Equal(data, data2) // true | |||
``` |
@ -0,0 +1,30 @@ | |||
# Light Client Protocol | |||
Light clients are an important part of the complete blockchain system | |||
for most applications. Tendermint provides unique speed and security | |||
properties for light client applications. | |||
See our [lite | |||
package](https://godoc.org/github.com/tendermint/tendermint/lite). | |||
## Overview | |||
The objective of the light client protocol is to get a | |||
[commit](./validators.md#committing-a-block) for a recent [block | |||
hash](../spec/consensus/consensus.md#block-hash) where the commit includes a
majority of signatures from the last known validator set. From there, | |||
all the application state is verifiable with [merkle | |||
proofs](./merkle.md#iavl-tree). | |||
## Properties | |||
- You get the full collateralized security benefits of Tendermint; no
need to wait for confirmations. | |||
- You get the full speed benefits of Tendermint; transactions | |||
commit instantly. | |||
- You can get the most recent version of the application state | |||
non-interactively (without committing anything to the blockchain). | |||
For example, this means that you can get the most recent value of a | |||
name from the name-registry without worrying about fork censorship | |||
attacks, without posting a commit and waiting for confirmations. | |||
It's fast, secure, and free! |
@ -0,0 +1,41 @@ | |||
#!/usr/bin/env bash | |||
# XXX: this script is intended to be run from
# a macOS machine
# as written, this script will install | |||
# tendermint core from master branch | |||
REPO=github.com/tendermint/tendermint | |||
# change this to a specific release or branch | |||
BRANCH=master | |||
if ! [ -x "$(command -v brew)" ]; then | |||
echo 'Error: brew is not installed, to install brew' >&2 | |||
echo 'follow the instructions here: https://docs.brew.sh/Installation' >&2 | |||
exit 1 | |||
fi | |||
if ! [ -x "$(command -v go)" ]; then | |||
echo 'Error: go is not installed, to install go follow' >&2 | |||
echo 'the instructions here: https://golang.org/doc/install#tarball' >&2 | |||
echo 'ALSO MAKE SURE TO SETUP YOUR $GOPATH and $GOBIN in your ~/.profile: https://github.com/golang/go/wiki/SettingGOPATH' >&2 | |||
exit 1 | |||
fi | |||
if ! [ -x "$(command -v make)" ]; then | |||
echo 'Make not installed, installing using brew...' | |||
brew install make | |||
fi | |||
# get the code and move into repo
go get "$REPO"
cd "$GOPATH/src/$REPO" || exit 1
# build & install
git checkout "$BRANCH"
# XXX: uncomment if branch isn't master | |||
# git fetch origin $BRANCH | |||
make get_tools | |||
make get_vendor_deps | |||
make install |