
Prepare to Nuke Develop (#47)

* state -> step

* vote -> v

* New version of the algorithm and the proof

* New version of the algorithm and the proofs

* Added algorithm description

* Add algorithm description

* Add introduction

* Add conclusion

* Add conclusion file

* fix warnings (caption was defined twice)

- only the latter is used anyways (centers captions)
- this makes it possible to autom. building the paper

* Update grammar

* s/state_p/step_p

* Address Ismail's comments

* intro: language fixes

* definitions: language fixes

* consensus: various fixes

* proof: some fixes

* try to improve reviewability

* \eq -> =

* textwrap to 79

* various minor fixes

* proof: fix itemization

* proof: more minor fixes

* proof: timeouts are functions

* proof: fixes to lemma6

* Intro changes and improve title page

* Add Marko and Ming to acks

* add readme

* Format algorithm correctly

Clarify condition semantic and timeouts

Improve descriptions

* patform -> platform

* Ensure that rules are mutually exclusive

- various clarifications and small improvements

* Release v0.6

* small nits for smoother readability
Marko 5 years ago
committed by GitHub
parent
commit
87abbf78e6
No known key found for this signature in database GPG Key ID: 4AEE18F83AFDEB23
21 changed files with 1097 additions and 353 deletions
  1. README.md (+5, -23)
  2. consensus.tex (+0, -249)
  3. consensus/IEEEtran.bst (+0, -0)
  4. consensus/IEEEtran.cls (+0, -0)
  5. consensus/README.md (+25, -0)
  6. consensus/algorithmicplus.sty (+0, -0)
  7. consensus/conclusion.tex (+16, -0)
  8. consensus/consensus.tex (+397, -0)
  9. consensus/definitions.tex (+117, -0)
  10. consensus/homodel.sty (+15, -2)
  11. consensus/intro.tex (+138, -0)
  12. consensus/latex8.bst (+0, -0)
  13. consensus/latex8.sty (+0, -0)
  14. consensus/lit.bib (+80, -1)
  15. consensus/paper.tex (+15, -14)
  16. consensus/proof.tex (+280, -0)
  17. consensus/rounddiag.sty (+9, -8)
  18. consensus/technote.sty (+0, -0)
  19. definitions.tex (+0, -54)
  20. paper.dvi (BIN)
  21. proof.tex (+0, -2)

README.md (+5, -23)

@@ -1,25 +1,7 @@
-# Tendermint-spec
+# Tendermint Spec
-The repository contains the specification (and the proofs) of the Tendermint
-consensus protocol.
-## How to install Latex on Mac OS
-MacTex is Latex distribution for Mac OS. You can download it [here](http://www.tug.org/mactex/mactex-download.html).
-Popular IDE for Latex-based projects is TexStudio. It can be downloaded
-[here](https://www.texstudio.org/).
-## How to build project
-In order to compile the latex files (and write bibliography), execute
-`$ pdflatex paper` <br/>
-`$ bibtex paper` <br/>
-`$ pdflatex paper` <br/>
-`$ pdflatex paper` <br/>
-The generated file is paper.pdf. You can open it with
-`$ open paper.pdf`
+This repository contains the paper describing the Tendermint consensus
+algorithm, including formal proofs of its safety and liveness.
+For the pdf, see the [latest
+release](https://github.com/tendermint/spec/releases).

consensus.tex (+0, -249)

@@ -1,249 +0,0 @@
\section{Tendermint}
\label{sec:tendermint}
\newcommand\Disseminate{\textbf{Disseminate}}
\newcommand\Proposal{\mathsf{PROPOSAL}}
\newcommand\ProposalPart{\mathsf{PROPOSAL\mbox{-}PART}}
\newcommand\PrePrepare{\mathsf{INIT}}
\newcommand\Prevote{\mathsf{PREVOTE}}
\newcommand\Precommit{\mathsf{PRECOMMIT}}
\newcommand\Decision{\mathsf{DECISION}}
\newcommand\ViewChange{\mathsf{VC}}
\newcommand\ViewChangeAck{\mathsf{VC\mbox{-}ACK}}
\newcommand\NewPrePrepare{\mathsf{VC\mbox{-}INIT}}
\newcommand\coord{\mathsf{proposer}}
\newcommand\newHeight{newHeight}
\newcommand\newRound{newRound}
\newcommand\nil{nil}
\newcommand\prevote{prevote}
\newcommand\precommit{precommit}
\newcommand\commit{commit}
\newcommand\timeoutPropose{timeoutPropose}
\newcommand\timeoutPrevote{timeoutPrevote}
\newcommand\timeoutPrecommit{timeoutPrecommit}
\newcommand\proofOfLocking{proof\mbox{-}of\mbox{-}locking}
\begin{algorithm}[htb!]
\def\baselinestretch{1}
\scriptsize\raggedright
\begin{algorithmic}[1]
\SHORTSPACE
\INIT{}
\STATE $height_p := 0$ \COMMENT{current height, or consensus instance we are currently executing}
\STATE $round_p := 0$ \COMMENT{current round number}
\STATE $step_p \in \set{\propose=0, \prevote=1, \precommit=2, \commit=3}$, initially $\propose$ \COMMENT{current round step}
\STATE $lockedValue := nil$
\STATE $lockedRound := -1$
\STATE $validValue := nil$
\STATE $validRound := -1$
\ENDINIT
\SPACE
\UPON{start}
\STATE $StartRound(0)$
\ENDUPON
\SPACE
\FUNCTION{$StartRound(round)$} \label{line:tab:startRound}
\STATE $round_p \assign round$
\STATE $step_p \assign \propose$
\IF{$\coord(height_p, round_p) = p$}
\IF{$lockedValue_p \neq \nil$}
\STATE $proposal \assign lockedValue$
\ELSIF{$validValue_p \neq \nil$}
\STATE $proposal \assign validValue$
\ELSE
\STATE $proposal \assign getValue()$
\ENDIF
\STATE \PBroadcast\ $\li{\Proposal,height_p, round_p, proposal}$ to all \label{line:tab:send-proposal}
\ENDIF
\STATE \textbf{after} $\timeoutPropose$ execute $OnTimeoutPropose(height_p, round_p)$
\ENDFUNCTION
\SPACE
\UPON{receiving $\li{\Proposal,height_p,round_p, proposal}$ \From\ $\coord(height_p,round_p)$ \With\ $step_p = \propose$} \label{line:tab:recvProposal}
\IF{$lockedValue_p \neq \nil$} \label{line:tab:hasLockedValue}
\STATE \PBroadcast \ $\li{\Prevote,height_p,round_p,lockedValue_p}$ to all \label{line:tab:send-locked-value}
\ELSIF{$valid(proposal)$} \label{line:tab:locked-value-nil}
\STATE \PBroadcast \ $\li{\Prevote,height_p,round_p,proposal}$ to all \label{line:tab:send-prevote}
\ELSE
\STATE \PBroadcast \ $\li{\Prevote,height_p,round_p,\nil}$ to all \label{line:tab:send-prevote-nil}
\ENDIF
\STATE $step_p \assign \prevote$ \label{line:tab:setStateToPrevote}
\ENDUPON
\SPACE
\FUNCTION{$OnTimeoutPropose(height,round)$} \label{line:tab:onTimeoutPropose}
\IF{$height = height_p \wedge round = round_p \wedge step_p = \propose$}
\STATE \PBroadcast \ $\li{\Prevote,height_p,round_p,\nil}$ to all
\STATE $step_p \assign \prevote$
\ENDIF
\ENDFUNCTION
\SPACE
\UPON{receiving $\li{\Prevote,height_p, round_p,*}$ \From\ at least $2f+1$ processes \With\
$step_p \le \prevote$} \label{line:tab:recvAny2/3Prevote}
\STATE \textbf{after} $\timeoutPrevote$ execute $OnTimeoutPrevote(height_p, round_p)$ \label{line:tab:timeoutPrevote}
\ENDUPON
\SPACE
\UPON{receiving $\li{\Prevote,height_p, round_p,v}$ \From\ at least $2f+1$ processes \With\
$step_p \le \prevote$} \label{line:tab:recvPrevote}
\IF{$v \neq \nil$}
\STATE $lockedValue_p \assign v$ \label{line:tab:setLockedValue}
\STATE $lockedRound_p \assign round_p$ \label{line:tab:setLockedRound}
\STATE \PBroadcast \ $\li{\Precommit,height_p,round_p,lockedValue_p}$ to all \label{line:tab:send-precommit}
\ELSE
\STATE \PBroadcast \ $\li{\Precommit,height_p,round_p,\nil}$ to all \label{line:tab:send-precommit-nil}
\ENDIF
\STATE $step_p \assign \precommit$ \label{line:tab:setStateToCommit}
\ENDUPON
\SPACE
\UPON{receiving $\li{\Prevote,height_p,round,v}$ \From\ at least $2f+1$ processes \With\
$round > lockedRound_p$} \label{line:tab:unlockRule}
\IF{$v \neq lockedValue_p$}
\STATE $lockedValue_p \assign \nil$
\STATE $lockedRound_p \assign -1$
\ENDIF
\ENDUPON
\SPACE
\UPON{receiving $\li{\Prevote,height_p,round,v}$ \From\ at least $2f+1$ processes \With\
$v \neq \nil \wedge round > validRound_p$} \label{line:tab:validValueRule}
\STATE $validValue_p \assign v$ \label{line:tab:setValidValue}
\STATE $validRound_p \assign round$ \label{line:tab:setValidRound}
\ENDUPON
\SPACE
\FUNCTION{$OnTimeoutPrevote(height,round)$} \label{line:tab:onTimeoutPrevote}
\IF{$height = height_p \wedge round = round_p \wedge step_p = \prevote$}
\STATE \PBroadcast \ $\li{\Precommit,height_p,round_p,\nil}$ to all \label{line:tab:precommit-nil-onTimeout}
\STATE $step_p \assign \precommit$
\ENDIF
\ENDFUNCTION
\SPACE
\UPON{receiving $\li{\Precommit,height_p,round,v}$ \From\ at least $2f+1$ processes \With\ $step_p < \commit$} \label{line:tab:onPrecommitRule}
\IF{$v \neq \nil$}
\STATE $decide(height_p, v)$ \label{line:tab:decide}
\STATE$height_p \assign height_p + 1$
\STATE reset $round_p$, $step_p$, $lockedRound_p$, $lockedValue_p$, $validValue_p$ and $validRound$ to init values
\STATE $StartRound(0)$
\ENDIF
\IF{$round_p = round$}
\STATE $StartRound(round_p+1)$
\ENDIF
\ENDUPON
\SPACE
\UPON{receiving $\li{\Precommit,height_p,round_p,*}$ \From\ at least $2f+1$ processes \With\ $step_p = \precommit$} \label{line:tab:startTimeoutPrecommit}
\STATE \textbf{after} $\timeoutPrecommit$ execute $OnTimeoutPrecommit(height_p, round_p)$
\ENDUPON
\SPACE
\FUNCTION{$OnTimeoutPrecommit(height,round)$} \label{line:tab:onTimeoutPrecommit}
\IF{$height = height_p \wedge round = round_p \wedge step_p = \precommit$}
\STATE $StartRound(round_p+1)$
\ENDIF
\ENDFUNCTION
\SPACE
\UPON{receiving message \textbf{with} $height = height_p$ and $round > round_p$ \From\ at least $f+1$ processes} \label{line:tab:skipRounds}
\STATE $StartRound(round)$
\ENDUPON
\end{algorithmic}
\caption{Tendermint consensus algorithm}
\label{alg:tendermint}
\end{algorithm}
In this section we present a new Byzantine consensus algorithm called Tendermint\footnote{https://github.com/tendermint/tendermint}
(see Algorithm~\ref{alg:tendermint}).
The algorithm requires $N > 3f$, i.e., the faulty processes together hold less than one third of the total voting power. For simplicity, we present the algorithm for the case $N = 3f + 1$. We use the notation ``upon receiving message $\li{TAG, h, r, v}$ from at least $X$ processes'' to denote that a message with the given field values ($*$ denotes any value) has been received from a set of processes whose aggregate voting power is at least $X$.
The upon rules of Algorithm~\ref{alg:tendermint} are executed atomically.
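As a concrete illustration of the voting-power notation (the voting powers below are hypothetical and chosen only for this example), consider four processes $p_1, \dots, p_4$ with voting powers $3$, $2$, $1$ and $1$, so that $N = 7 = 3f + 1$ with $f = 2$ and the threshold $2f + 1$ equals $5$:
\[
\underbrace{3 + 2}_{p_1,\, p_2} = 5 \ge 2f + 1,
\qquad
\underbrace{2 + 1 + 1}_{p_2,\, p_3,\, p_4} = 4 < 2f + 1 .
\]
Hence matching messages from $p_1$ and $p_2$ alone satisfy a rule that requires ``at least $2f+1$ processes'', whereas messages from $p_2$, $p_3$ and $p_4$ do not, although they come from a larger number of processes.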
The algorithm shares its basic mechanisms, the locking and the unlocking of a value, with the DLS algorithm for authenticated faults (Algorithm 2 from \cite{DLS88:jacm}), but the algorithm structure and the implementation of these mechanisms are different in Tendermint.
The Tendermint consensus algorithm is optimized for gossip-based communication, so the consensus layer creates a minimal amount of information that needs to be gossiped. Processes exchange only three message types: $\Proposal$, $\Prevote$ and $\Precommit$, and there is only a single mode of execution, i.e., there is no separation between the normal and the recovery mode, as is the case in PBFT-like protocols (e.g., \cite{CL02:tcs}, \cite{Ver09:spinning} or \cite{Cle09:aardvark}). We believe that this makes the protocol simpler to understand and implement correctly, as one of the goals of Tendermint is to propose an \emph{understandable} Byzantine consensus protocol, following the path of Raft~\cite{Ongaro14:raft}.
The algorithm proceeds in a sequence of rounds, where each round has a dedicated coordinator (called the proposer). The assignment of rounds to coordinators is known to all processes and is given as a function $\coord(height, round)$ returning the coordinator for round $round$ in the consensus instance $height$. In Algorithm~\ref{alg:tendermint}, processes agree (decide) on a sequence of values, i.e., they solve the multiple-instance problem, by sequentially executing one consensus instance after the other.
Tendermint decomposes the consensus problem into three relatively independent mechanisms (or subproblems), which are discussed in the following subsections:
\begin{itemize}
\item locking mechanism
\item unlocking mechanism
\item finding the right value to propose
\end{itemize}
\subsection{Locking mechanism}
\label{sec:locking}
A process in Tendermint initially does not have any preference about what the decided value should be ($lockedValue := nil$). The process learns about a possible decision value from the coordinator of the current round. As the proposer might be a faulty process, different processes
might receive different suggestions. Using the locking mechanism, correct processes try to converge to a single value within a single round\footnote{We will explain in the part related to safety how we build on this mechanism to ensure safety across round boundaries.}.
If, during the locking phase, a sufficient number of correct processes succeed in agreeing on a common value, then they will later decide on that value.
More formally, the locking mechanism ensures that correct processes can lock at most one value in any given round $r$ (they can, however, lock different values in distinct rounds). In Algorithm~\ref{alg:tendermint}, a correct process $p$ locks a value $v$ in round $r$ by setting the variable $lockedValue_p$ to $v$ and the variable $lockedRound_p$ to $r$ (see lines~\ref{line:tab:setLockedValue} and~\ref{line:tab:setLockedRound}). A correct process can have only a single value locked at any point of the algorithm execution.
However, before a value is decided, a process might change its mind and unlock its current value using the unlocking mechanism.
The locking phase starts with each process suggesting which value should be locked in the current round. This is achieved by sending a $\Prevote$ message to all processes. If a correct process has locked some value $v$ ($lockedValue = v \neq \nil$), it votes for $v$ and ignores the value suggested by the proposer (see lines~\ref{line:tab:hasLockedValue}-\ref{line:tab:send-locked-value}). Otherwise, the process accepts the value suggested by the proposer if the proposal is valid\footnote{The validity of a value is application specific and is defined by the function \emph{valid()}.}, and votes for that value (see lines~\ref{line:tab:locked-value-nil}-\ref{line:tab:setStateToPrevote}). If the suggested value is invalid, or the process has not received any proposal from the current coordinator within $\timeoutPropose$ from the start of the current round, it votes for the $\nil$ value (see lines~\ref{line:tab:send-prevote-nil} and~\ref{line:tab:onTimeoutPropose}). Note that $\nil$ is a special value that cannot be locked; it is used to inform others that an algorithm step has not completed successfully. In the case of the locking phase, it indicates that the process has not received a valid value from the proposer in time.
A process locks a value $v$ if it receives at least $2f+1$ $\Prevote$ messages for $v$ in the current round (see lines~\ref{line:tab:recvPrevote}-\ref{line:tab:setLockedRound}). A correct process sends only a single $\Prevote$ message in every round. The algorithm relies on the $step_p$ variable to prevent correct processes from sending more than one $\Prevote$ message in a single round. As a correct process sends a single $\Prevote$ message in a round, and $f$ is the total voting power of faulty processes, two correct processes cannot receive $2f+1$ $\Prevote$ messages for different values in a single round. So if two correct processes lock some value in a given round $r$, then it must be the same value. We will prove this formally in Section~\ref{sec:proof}.
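The counting argument behind this claim can be sketched as follows. This is an informal illustration for the case $N = 3f + 1$, not the formal proof; the symbols $w(\cdot)$, $Q_v$ and $Q_{v'}$ are introduced only for this sketch. Let $w(S)$ denote the aggregate voting power of a set of processes $S$, and suppose that in some round $r$ the senders $Q_v$ of $\Prevote$ messages for $v$ and the senders $Q_{v'}$ of $\Prevote$ messages for $v' \neq v$ both have voting power of at least $2f+1$. Then
\[
w(Q_v \cap Q_{v'}) \;\ge\; w(Q_v) + w(Q_{v'}) - N \;\ge\; (2f+1) + (2f+1) - (3f+1) \;=\; f + 1 .
\]
Since the faulty processes hold voting power of at most $f$, the intersection $Q_v \cap Q_{v'}$ must contain at least one correct process. That process would have sent $\Prevote$ messages for both $v$ and $v'$ in round $r$, which contradicts the fact that a correct process sends a single $\Prevote$ message per round.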
The locking phase ends with processes informing each other whether they have locked some value in the current round by sending a $\Precommit$ message. If a process has locked some value $v$ in the current round, then this value is sent in the $\Precommit$ message (see line~\ref{line:tab:send-precommit}); otherwise $\nil$ is sent (lines~\ref{line:tab:send-precommit-nil} and~\ref{line:tab:precommit-nil-onTimeout}). A correct process sends only a single $\Precommit$ message in every round. As with the $\Prevote$ message, the algorithm relies on the $step_p$ variable to prevent correct processes from sending more than one $\Precommit$ message in a single round.
\subsection{Unlocking mechanism}
\label{sec:unlocking}
Initially, a correct process has no value locked. During the algorithm execution, processes might end up locking different values in different rounds. As discussed in Section~\ref{sec:locking}, a correct process that has locked some value will vote
only for that value at the beginning of the locking phase. With correct processes voting for different values during the locking phase, it is not possible to reach agreement on which value should be locked and later decided. Without an unlocking mechanism in place, the algorithm could loop forever in that state, and no decision would be reached. The unlocking mechanism is therefore essential for the termination of the algorithm, as it allows processes to release locks and try to converge to a common value in the following rounds. In Algorithm~\ref{alg:tendermint}, a correct process $p$ unlocks a value by setting $lockedValue_p$ to $\nil$ and $lockedRound_p$ to $-1$.
A correct process $p$ releases its locked value if it observes that some correct process might\footnote{We say that another process \emph{might} have locked a different value because we do not get a direct message from that process claiming that it locked a value, with the corresponding proof of locking. We infer this information by observing $\Prevote$ messages.} have locked a different value in a round $r$ higher than the current $lockedRound_p$ ($r > lockedRound_p$). This is the case if process $p$ receives at least $2f+1$ $\Prevote$ messages with value $v' \neq lockedValue_p$ and $round > lockedRound_p$ (see the rule starting at line~\ref{line:tab:unlockRule}). The process also implicitly releases the lock on its current value by locking a different value at lines~\ref{line:tab:setLockedValue} and~\ref{line:tab:setLockedRound}.
The unlocking mechanism, together with the underlying gossip layer, ensures the following properties:
\begin{itemize}
\item (i) if a correct process $p$ locks a value $v$ in a round $r$ during a good period (period of synchrony), then at the end of round $r$ every correct process will either have locked $v$ or will not have any value locked;
\item (ii) if a correct process $p$ locks a value $v$ in a round $r$, then every correct process $q$ that has locked a different value in some previous round, i.e., $lockedValue_q \neq v$ and $lockedRound_q < r$, will eventually unlock its value.
\end{itemize}
From property (ii) it follows that after a sufficiently long good period in which no correct process manages to lock a value, all correct processes will either have locked some common value $v$ or will not have any value locked (where $v$ is the value locked in the highest round $r$ by some correct process). This follows from the fact that processes keep resending messages at the gossip layer, so during good periods, the $\Prevote$ messages that led to a correct process locking the value $v$ in round $r$ are received by all correct processes. Therefore, each of them either keeps $v$ as its locked value or unlocks its current value.
Together, both properties ensure that during a sufficiently long good period, there will eventually be a round such that, at the beginning of this round, all correct processes have either locked the same value or have not locked any value. As we will explain in the following section (Section~\ref{sec:findRighValue}), this is one of the essential mechanisms for ensuring termination of the Tendermint consensus algorithm.
\subsection{Finding the right value to propose}
\label{sec:findRighValue}
As discussed in Section~\ref{sec:locking}, a correct process in Tendermint always votes for its locked value (if it exists) or for the \emph{valid} value suggested by the coordinator of the current round. In order to reach a decision in some round $r$, it is necessary that all correct processes vote for the same value in that round (we do not depend on faulty processes for termination). Correct processes can then lock that value and later decide. Tendermint relies on the coordinator to ensure that processes that have not locked any value vote for the same value. As a correct process always votes for its locked value, the following conditions need to be fulfilled just before round $r$ starts in order to achieve this favorable scenario:
\begin{enumerate}
\item if a correct process $p$ has locked some value $v$ at the beginning of the round $r$, then all other correct processes have either locked $v$ or they have not locked any value. \label{enum:termination:v-is-locked}
\item the proposer is a correct process, and a value suggested by the proposer is $v$. \label{enum:termination:v-is-proposed}
\end{enumerate}
If these conditions hold, all correct processes will lock $v$ in round $r$ and later decide.
The existence of a round in which condition~\ref{enum:termination:v-is-locked} holds follows from the unlocking mechanism. The first part of condition~\ref{enum:termination:v-is-proposed} follows from the fact that the $\coord$ function eventually provides a correct proposer. The second part of condition~\ref{enum:termination:v-is-proposed} is trickier, as the proposer needs to suggest exactly the value that is locked by correct processes. If the proposer has locked $v$ at the beginning of round $r$, then it trivially follows from condition~\ref{enum:termination:v-is-locked}. Otherwise, we need an additional mechanism that ensures that the correct proposer suggests $v$.
To ensure this condition, Tendermint has an additional mechanism for determining a good value to suggest. It is implemented using the variables $validValue$ and $validRound$. Using these two variables, every process tracks the highest round ($validRound$) in which a correct process might have locked some value ($validValue$). Whenever a process observes at least $2f+1$ $\Prevote$ messages with $round > validRound_p$ (see the rule at line~\ref{line:tab:validValueRule}) for some value $v \neq \nil$,
it updates the $validValue$ and $validRound$ variables.
This mechanism, together with the underlying gossip layer, ensures the following properties:
\begin{itemize}
\item (i) if a correct process $p$ locks a value $v$ in a round $r$ during a good period (period of synchrony), then at the end of the round $r$ all correct processes will have $validValue$ equal to $v$ and $validRound$ equal to $r$.
\item (ii) if no correct process locks a value during a sufficiently long good period, then all correct processes will eventually have $validValue$ equal to the same value $v$.
\end{itemize}
If, during a good period, no correct process locks a value for a sufficiently long time (so property (i) does not apply), then
all correct processes will have $validValue$ equal to $v$ by the following reasoning. During this time period, all correct processes will receive delayed $\Prevote$ messages from previous rounds (because of the resending of messages at the gossip layer). Let $r$ and $v$ denote a round and a value such that there are at least $2f+1$ $\Prevote$ messages with $r$ and $v$, and $r$ is the highest among such $(*, r)$ pairs. Then all correct processes will receive those messages and set $validValue$ to $v$.
In both cases, during a sufficiently long good period, there will eventually be a round such that, at the beginning of this round, all correct processes have $validValue$ equal to the same value; and if some correct process has locked some value $v$, then $validValue$ is equal to $v$.
The unlocking mechanism and the mechanism for determining the right proposal value ensure that during a period of synchrony, there will be a round $r$ such that (i) all correct processes have either locked the same value $v$ or have not locked any value, and (ii) $validValue$ is equal to $v$ at all correct processes. If the proposer of such a round $r$ is correct, then it proposes $v$ and all correct processes vote for $v$ during the locking phase; therefore they lock $v$, send a $\Precommit$ message with $v$ and later decide. The formal proof of termination is provided in Section~\ref{sec:proof}.

IEEEtran.bst → consensus/IEEEtran.bst


IEEEtran.cls → consensus/IEEEtran.cls


consensus/README.md (+25, -0)

@@ -0,0 +1,25 @@
# Tendermint-spec
The repository contains the specification (and the proofs) of the Tendermint
consensus protocol.
## How to install LaTeX on macOS
MacTeX is a LaTeX distribution for macOS. You can download it [here](http://www.tug.org/mactex/mactex-download.html).
A popular IDE for LaTeX-based projects is TeXstudio. It can be downloaded
[here](https://www.texstudio.org/).
## How to build the project
To compile the LaTeX files (and generate the bibliography), execute
`$ pdflatex paper` <br/>
`$ bibtex paper` <br/>
`$ pdflatex paper` <br/>
`$ pdflatex paper` <br/>
The generated file is paper.pdf. You can open it with
`$ open paper.pdf`

algorithmicplus.sty → consensus/algorithmicplus.sty


consensus/conclusion.tex (+16, -0)

@@ -0,0 +1,16 @@
\section{Conclusion} \label{sec:conclusion}
We have proposed a new Byzantine-fault tolerant consensus algorithm that is the
core of the Tendermint BFT SMR platform. The algorithm is designed for wide
area networks with a high number of mutually distrusting nodes that communicate
over a gossip-based peer-to-peer network. It has only a single mode of
execution, and its communication pattern is very similar to the ``normal'' case
of the state-of-the-art PBFT algorithm. The algorithm ensures termination with
a novel mechanism that takes advantage of the gossip-based communication
between nodes. The proposed algorithm and the proofs are simple and elegant,
and we believe that this makes the algorithm easier to understand and implement
correctly.
\section*{Acknowledgment}
We would like to thank Anton Kaliaev, Ismail Khoffi and Dahlia Malkhi for comments on an earlier version of the paper. We also want to thank Marko Vukolic, Ming Chuan Lin, Maria Potop-Butucaru, Sara Tucci, Antonella Del Pozzo and Yackolley Amoussou-Guenou for pointing out the liveness issues
in the previous version of the algorithm. Finally, we want to thank the Tendermint team members and all project contributors for making Tendermint such a great platform.

consensus/consensus.tex (+397, -0)

@@ -0,0 +1,397 @@
\section{Tendermint consensus algorithm} \label{sec:tendermint}
\newcommand\Disseminate{\textbf{Disseminate}}
\newcommand\Proposal{\mathsf{PROPOSAL}}
\newcommand\ProposalPart{\mathsf{PROPOSAL\mbox{-}PART}}
\newcommand\PrePrepare{\mathsf{INIT}} \newcommand\Prevote{\mathsf{PREVOTE}}
\newcommand\Precommit{\mathsf{PRECOMMIT}}
\newcommand\Decision{\mathsf{DECISION}}
\newcommand\ViewChange{\mathsf{VC}}
\newcommand\ViewChangeAck{\mathsf{VC\mbox{-}ACK}}
\newcommand\NewPrePrepare{\mathsf{VC\mbox{-}INIT}}
\newcommand\coord{\mathsf{proposer}}
\newcommand\newHeight{newHeight} \newcommand\newRound{newRound}
\newcommand\nil{nil} \newcommand\id{id} \newcommand{\propose}{propose}
\newcommand\prevote{prevote} \newcommand\prevoteWait{prevoteWait}
\newcommand\precommit{precommit} \newcommand\precommitWait{precommitWait}
\newcommand\commit{commit}
\newcommand\timeoutPropose{timeoutPropose}
\newcommand\timeoutPrevote{timeoutPrevote}
\newcommand\timeoutPrecommit{timeoutPrecommit}
\newcommand\proofOfLocking{proof\mbox{-}of\mbox{-}locking}
\begin{algorithm}[htb!] \def\baselinestretch{1} \scriptsize\raggedright
\begin{algorithmic}[1]
\SHORTSPACE
\INIT{}
\STATE $h_p := 0$
\COMMENT{current height, or consensus instance we are currently executing}
\STATE $round_p := 0$ \COMMENT{current round number}
\STATE $step_p \in \set{\propose, \prevote, \precommit}$
\STATE $decision_p[] := nil$
\STATE $lockedValue_p := nil$
\STATE $lockedRound_p := -1$
\STATE $validValue_p := nil$
\STATE $validRound_p := -1$
\ENDINIT
\SHORTSPACE
\STATE \textbf{upon} start \textbf{do} $StartRound(0)$
\SHORTSPACE
\FUNCTION{$StartRound(round)$} \label{line:tab:startRound}
\STATE $round_p \assign round$
\STATE $step_p \assign \propose$
\IF{$\coord(h_p, round_p) = p$}
\IF{$validValue_p \neq \nil$} \label{line:tab:isThereLockedValue}
\STATE $proposal \assign validValue_p$
\ELSE
\STATE $proposal \assign getValue()$
\label{line:tab:getValidValue}
\ENDIF
\STATE \Broadcast\ $\li{\Proposal,h_p, round_p, proposal, validRound_p}$
\label{line:tab:send-proposal}
\ELSE
\STATE \textbf{schedule} $OnTimeoutPropose(h_p,
round_p)$ to be executed \textbf{after} $\timeoutPropose(round_p)$
\ENDIF
\ENDFUNCTION
\SPACE
\UPON{$\li{\Proposal,h_p,round_p, v, -1}$ \From\ $\coord(h_p,round_p)$
\With\ $step_p = \propose$} \label{line:tab:recvProposal}
\IF{$valid(v) \wedge (lockedRound_p = -1 \vee lockedValue_p = v)$}
\label{line:tab:accept-proposal-2}
\STATE \Broadcast \ $\li{\Prevote,h_p,round_p,id(v)}$
\label{line:tab:prevote-proposal}
\ELSE
\label{line:tab:acceptProposal1}
\STATE \Broadcast \ $\li{\Prevote,h_p,round_p,\nil}$
\label{line:tab:prevote-nil}
\ENDIF
\STATE $step_p \assign \prevote$ \label{line:tab:setStateToPrevote1}
\ENDUPON
\SPACE
\UPON{$\li{\Proposal,h_p,round_p, v, vr}$ \From\ $\coord(h_p,round_p)$
\textbf{AND} $2f+1$ $\li{\Prevote,h_p, vr,id(v)}$ \With\ $step_p = \propose \wedge (vr \ge 0 \wedge vr < round_p)$}
\label{line:tab:acceptProposal}
\IF{$valid(v) \wedge (lockedRound_p \le vr
\vee lockedValue_p = v)$} \label{line:tab:cond-prevote-higher-proposal}
\STATE \Broadcast \ $\li{\Prevote,h_p,round_p,id(v)}$
\label{line:tab:prevote-higher-proposal}
\ELSE
\label{line:tab:acceptProposal2}
\STATE \Broadcast \ $\li{\Prevote,h_p,round_p,\nil}$
\label{line:tab:prevote-nil2}
\ENDIF
\STATE $step_p \assign \prevote$ \label{line:tab:setStateToPrevote3}
\ENDUPON
\SPACE
\UPON{$2f+1$ $\li{\Prevote,h_p, round_p,*}$ \With\ $step_p = \prevote$ for the first time}
\label{line:tab:recvAny2/3Prevote}
\STATE \textbf{schedule} $OnTimeoutPrevote(h_p, round_p)$ to be executed \textbf{after} $\timeoutPrevote(round_p)$ \label{line:tab:timeoutPrevote}
\ENDUPON
\SPACE
\UPON{$\li{\Proposal,h_p,round_p, v, *}$ \From\ $\coord(h_p,round_p)$
\textbf{AND} $2f+1$ $\li{\Prevote,h_p, round_p,id(v)}$ \With\ $valid(v) \wedge step_p \ge \prevote$ for the first time}
\label{line:tab:recvPrevote}
\IF{$step_p = \prevote$}
\STATE $lockedValue_p \assign v$ \label{line:tab:setLockedValue}
\STATE $lockedRound_p \assign round_p$ \label{line:tab:setLockedRound}
\STATE \Broadcast \ $\li{\Precommit,h_p,round_p,id(v)}$
\label{line:tab:precommit-v}
\STATE $step_p \assign \precommit$ \label{line:tab:setStateToCommit}
\ENDIF
\STATE $validValue_p \assign v$ \label{line:tab:setValidRound}
\STATE $validRound_p \assign round_p$ \label{line:tab:setValidValue}
\ENDUPON
\SHORTSPACE
\UPON{$2f+1$ $\li{\Prevote,h_p,round_p, \nil}$
\With\ $step_p = \prevote$}
\STATE \Broadcast \ $\li{\Precommit,h_p,round_p, \nil}$
\label{line:tab:precommit-v-1}
\STATE $step_p \assign \precommit$
\ENDUPON
\SPACE
\UPON{$2f+1$ $\li{\Precommit,h_p,round_p,*}$ for the first time}
\label{line:tab:startTimeoutPrecommit}
\STATE \textbf{schedule} $OnTimeoutPrecommit(h_p, round_p)$ to be executed \textbf{after} $\timeoutPrecommit(round_p)$
\ENDUPON
\SPACE
\UPON{$\li{\Proposal,h_p,r, v, *}$ \From\ $\coord(h_p,r)$ \textbf{AND}
$2f+1$ $\li{\Precommit,h_p,r,id(v)}$ \With\ $decision_p[h_p] = \nil$}
\label{line:tab:onDecideRule}
\IF{$valid(v)$} \label{line:tab:validDecisionValue}
\STATE $decision_p[h_p] = v$ \label{line:tab:decide}
\STATE $h_p \assign h_p + 1$ \label{line:tab:increaseHeight}
\STATE reset $lockedRound_p$, $lockedValue_p$, $validRound_p$ and $validValue_p$ to initial values
and empty message log
\STATE $StartRound(0)$
\ENDIF
\ENDUPON
\SHORTSPACE
\UPON{$f+1$ $\li{*,h_p,round, *, *}$ \textbf{with} $round > round_p$}
\label{line:tab:skipRounds}
\STATE $StartRound(round)$ \label{line:tab:nextRound2}
\ENDUPON
\SHORTSPACE
\FUNCTION{$OnTimeoutPropose(height,round)$} \label{line:tab:onTimeoutPropose}
\IF{$height = h_p \wedge round = round_p \wedge step_p = \propose$}
\STATE \Broadcast \ $\li{\Prevote,h_p,round_p, \nil}$
\label{line:tab:prevote-nil-on-timeout}
\STATE $step_p \assign \prevote$
\ENDIF
\ENDFUNCTION
\SHORTSPACE
\FUNCTION{$OnTimeoutPrevote(height,round)$} \label{line:tab:onTimeoutPrevote}
\IF{$height = h_p \wedge round = round_p \wedge step_p = \prevote$}
\STATE \Broadcast \ $\li{\Precommit,h_p,round_p,\nil}$
\label{line:tab:precommit-nil-onTimeout}
\STATE $step_p \assign \precommit$
\ENDIF
\ENDFUNCTION
\SHORTSPACE
\FUNCTION{$OnTimeoutPrecommit(height,round)$} \label{line:tab:onTimeoutPrecommit}
\IF{$height = h_p \wedge round = round_p$}
\STATE $StartRound(round_p + 1)$ \label{line:tab:nextRound}
\ENDIF
\ENDFUNCTION
\end{algorithmic} \caption{Tendermint consensus algorithm}
\label{alg:tendermint}
\end{algorithm}
In this section we present the Tendermint Byzantine fault-tolerant consensus
algorithm. The algorithm is specified by the pseudo-code shown in
Algorithm~\ref{alg:tendermint}. We present the algorithm as a set of \emph{upon
rules} that are executed atomically\footnote{In case several rules are active
at the same time, the first rule to be executed is picked randomly. The
correctness of the algorithm does not depend on the order in which rules are
executed.}. We assume that processes exchange protocol messages using a gossip
protocol and that both sent and received messages are stored in a local message
log for every process. An upon rule is triggered once the message log contains
messages such that the corresponding condition evaluates to $\mathtt{true}$. A
condition that refers to the reception of $X$ messages of a particular type and
content denotes the reception of messages whose senders have aggregate voting
power at least equal to $X$. For example, the condition $2f+1$
$\li{\Precommit,h_p,r,id(v)}$ evaluates to $\mathtt{true}$ upon reception of
$\Precommit$ messages for height $h_p$, round $r$ and value id $id(v)$ whose
senders have aggregate voting power at least equal to $2f+1$. Some of the rules
end with a ``for the first time'' constraint, which denotes that the rule is
triggered only the first time the corresponding condition evaluates to
$\mathtt{true}$. This is because those rules do not always change the state of
the algorithm variables, so without this constraint the algorithm could keep
executing those rules forever. Variables with index $p$ are process-local state
variables, while variables without index $p$ are value placeholders. The sign
$*$ denotes any value.
We denote with $n$ the total voting power of processes in the system, and we
assume that the total voting power of faulty processes in the system is bounded
by a system parameter $f$. The algorithm assumes that $n > 3f$, i.e., it
requires that the total voting power of faulty processes is smaller than one
third of the total voting power. For simplicity we present the algorithm for
the case $n = 3f + 1$.
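
As a small illustration of these thresholds (the concrete numbers are chosen
here for the example and are not taken from the algorithm), consider $f = 1$:
\[
n = 3f + 1 = 4, \qquad 2f + 1 = 3, \qquad f + 1 = 2.
\]
If four processes have voting power $1$ each, a condition of the form $2f+1$
$\li{\Prevote,h_p,round_p,*}$ is satisfied by messages from any three distinct
senders. If instead three processes have voting powers $2$, $1$ and $1$, the
same condition is satisfied by messages from the first process and either of
the other two, since $2 + 1 = 3 \ge 2f + 1$. In this second setting, the
assumption on $f$ implies that at most one process can be faulty, and only one
with voting power $1$.
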
The algorithm proceeds in rounds, where each round has a dedicated
\emph{proposer}. The mapping of rounds to proposers is known to all processes
and is given as a function $\coord(h, round)$, returning the proposer for
the round $round$ in the consensus instance $h$. We
assume that the proposer selection function is weighted round-robin, where
processes are rotated in proportion to their voting power\footnote{A validator
with more voting power is selected more frequently, in proportion to its power.
More precisely, during a sequence of $n$ rounds, every process is the
proposer in a number of rounds equal to its voting power.}.
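
For example, assume (hypothetically) three processes $p_1$, $p_2$ and $p_3$
with voting powers $2$, $1$ and $1$, so that $n = 4$. One schedule that is
consistent with the description above is
\[
\begin{array}{l|cccc}
round & 0 & 1 & 2 & 3 \\ \hline
\coord(h, round) & p_1 & p_2 & p_1 & p_3
\end{array}
\]
Over these $n = 4$ rounds, $p_1$ is the proposer twice and $p_2$ and $p_3$ once
each, matching their voting powers. The particular order within the sequence
is an assumption made for this illustration; only the number of rounds per
process is fixed by the weighted round-robin requirement.
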
The internal protocol state transitions are triggered by message reception and
by expiration of timeouts. There are three timeouts in Algorithm~\ref{alg:tendermint}:
$\timeoutPropose$, $\timeoutPrevote$ and $\timeoutPrecommit$.
The timeouts prevent the algorithm from blocking and
waiting forever for some condition to be true, ensure that processes continuously
transition between rounds, and guarantee that eventually (after GST) communication
between correct processes is timely and reliable so they can decide.
The last role is achieved by increasing the timeouts with every new round $r$,
i.e., $timeoutX(r) = initTimeoutX + r \cdot timeoutDelta$;
the timeouts are reset for every new height (consensus
instance).
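
As a worked instance of the formula, assume the purely illustrative values
$initTimeoutPropose = 3\,\mathrm{s}$ and $timeoutDelta = 0.5\,\mathrm{s}$
(these numbers are not prescribed by the algorithm):
\[
\begin{array}{rcl}
\timeoutPropose(0) & = & 3 + 0 \cdot 0.5 = 3\,\mathrm{s}, \\
\timeoutPropose(1) & = & 3 + 1 \cdot 0.5 = 3.5\,\mathrm{s}, \\
\timeoutPropose(4) & = & 3 + 4 \cdot 0.5 = 5\,\mathrm{s}.
\end{array}
\]
More generally, for $timeoutDelta > 0$ and any duration $D$, we have
$timeoutX(r) \ge D$ for every round $r \ge (D - initTimeoutX)/timeoutDelta$, so
the timeouts eventually exceed any fixed bound on message delays.
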
Processes exchange the following messages in Tendermint: $\Proposal$,
$\Prevote$ and $\Precommit$. The $\Proposal$ message is used by the proposer of
the current round to suggest a potential decision value, while $\Prevote$ and
$\Precommit$ are votes for a proposed value. According to the classification of
consensus algorithms from \cite{RMS10:dsn}, Tendermint, like PBFT
\cite{CL02:tcs} and DLS \cite{DLS88:jacm}, belongs to class 3, so it requires
two voting steps (three communication exchanges in total) to decide a value.
The Tendermint consensus algorithm is designed for the blockchain context where
the value to decide is a block of transactions (i.e., it is potentially quite
large, consisting of many transactions). Therefore, in
Algorithm~\ref{alg:tendermint} (similarly to \cite{CL02:tcs}) we are explicit about
sending a value (block of transactions) and a small, constant-size value id (a
unique value identifier, normally a hash of the value, i.e., if $\id(v) =
\id(v')$, then $v=v'$). The $\Proposal$ message is the only one carrying the
value; $\Prevote$ and $\Precommit$ messages carry the value id. A correct
process decides on a value $v$ in Tendermint upon receiving the $\Proposal$ for
$v$ and $2f+1$ voting-power equivalent $\Precommit$ messages for $\id(v)$ in
some round $r$. In order to send a $\Precommit$ message for $v$ in a round $r$, a
correct process waits to receive the $\Proposal$ and $2f+1$ of the
corresponding $\Prevote$ messages in the round $r$. Otherwise,
it sends a $\Precommit$ message with a special $\nil$ value.
This ensures that correct processes can $\Precommit$ only a
single value (or $\nil$) in a round. As
proposers may be faulty, the proposed value is treated by correct processes as
a suggestion (it is not blindly accepted), and a correct process tells others
that it accepted the $\Proposal$ for value $v$ by sending a $\Prevote$ message for
$\id(v)$; otherwise it sends a $\Prevote$ message with the special $\nil$ value.
Every process maintains the following variables in
Algorithm~\ref{alg:tendermint}: $step$, $lockedValue$, $lockedRound$, $validValue$ and
$validRound$. The $step$ denotes the current state of the internal Tendermint
state machine, i.e., it reflects the stage of the algorithm execution in the
current round. The $lockedValue$ stores the most recent value (with respect to
a round number) for which a $\Precommit$ message has been sent. The
$lockedRound$ is the last round in which the process sent a $\Precommit$
message that is not $\nil$. We also say that a correct process locks a value
$v$ in a round $r$ by setting $lockedValue = v$ and $lockedRound = r$ before
sending a $\Precommit$ message for $\id(v)$. As a correct process can decide a
value $v$ only if $2f+1$ $\Precommit$ messages for $\id(v)$ are received, this
implies that a possible decision value is a value that is locked by at least
$f+1$ voting power equivalent of correct processes. Therefore, any value $v$
for which $\Proposal$ and $2f+1$ of the corresponding $\Prevote$ messages are
received in some round $r$ is a \emph{possible decision} value. The role of the
$validValue$ variable is to store the most recent possible decision value; the
$validRound$ is the last round in which $validValue$ is updated. Apart from
those variables, a process also stores the current consensus instance ($h_p$,
called \emph{height} in Tendermint), and the current round number ($round_p$)
and attaches them to every message. Finally, a process also stores an array of
decisions, $decision_p$ (Tendermint assumes a sequence of consensus instances,
one for each height).
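
The bound of $f+1$ used above follows from a short calculation. Let $Q$ be
the set of senders of the $2f+1$ $\Precommit$ messages for $\id(v)$, let
$\mathit{Correct}$ be the set of correct processes, and write $power(\cdot)$
for aggregate voting power (both notations are introduced only for this
remark). Since faulty processes hold at most $f$ voting power in total,
\[
power(Q \cap \mathit{Correct}) \;\ge\; power(Q) - f \;\ge\; (2f+1) - f \;=\; f+1,
\]
and every correct process in $Q$ locked $v$ before sending its $\Precommit$
message for $\id(v)$.
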
Every round starts with a proposer suggesting a value with the $\Proposal$
message (see line~\ref{line:tab:send-proposal}). In the initial round of each
height, the proposer is free to choose the value to suggest. In
Algorithm~\ref{alg:tendermint}, a correct process obtains a value to propose
using an external function $getValue()$ that returns a valid value to
propose. In the following rounds, a correct proposer will suggest a new value
only if $validValue = \nil$; otherwise $validValue$ is proposed (see
lines~\ref{line:tab:isThereLockedValue}--\ref{line:tab:getValidValue}).
In addition to the value proposed, the $\Proposal$ message also
contains the $validRound$ so other processes are informed about the last round
in which the proposer observed $validValue$ as a possible decision value.
Note that if a correct proposer $p$ sends $validValue$ with the $validRound$ in the
$\Proposal$, this implies that the process $p$ received $\Proposal$ and the
corresponding $2f+1$ $\Prevote$ messages for $validValue$ in the round
$validRound$.
If a correct process sends a $\Proposal$ message with $validValue$ ($validRound > -1$)
at time $t > GST$, by the \emph{Gossip communication} property, the
corresponding $\Proposal$ and the $\Prevote$ messages will be received by all
correct processes before time $t+\Delta$. Therefore, all correct processes will
be able to verify the correctness of the suggested value as it is supported by
the $\Proposal$ and the corresponding $2f+1$ voting power equivalent $\Prevote$
messages.
A correct process $p$ accepts the proposal for a value $v$ (i.e., sends a $\Prevote$
for $id(v)$) if an external \emph{valid} function returns $true$ for the value
$v$, and if $p$ has not locked any value ($lockedRound = -1$) or $p$ has locked
the value $v$ ($lockedValue = v$); see
line~\ref{line:tab:accept-proposal-2}. In case the proposed pair is $(v,vr \ge 0)$ and a
correct process $p$ has locked some value, it will accept
$v$ if it is a more recent possible decision value\footnote{As
explained above, the possible decision value in a round $r$ is the one for
which $\Proposal$ and the corresponding $2f+1$ $\Prevote$ messages are received
for the round $r$.}, $vr > lockedRound_p$, or if $lockedValue = v$
(see line~\ref{line:tab:cond-prevote-higher-proposal}). Otherwise, a correct
process will reject the proposal by sending a $\Prevote$ message with the $\nil$
value. A correct process will also send a $\Prevote$ message with the $\nil$ value
in case $\timeoutPropose$ expires (it is triggered when a correct process starts a
new round) and the process has not yet sent a $\Prevote$ message in the current round
(see line~\ref{line:tab:onTimeoutPropose}).
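
To illustrate these rules, consider a hypothetical correct process $p$ with
$lockedRound_p = 2$ and $lockedValue_p = v$ that receives, in round $5$, a
proposal carrying a value and a round $vr$. Assume in each case that the
proposed value is valid and that, for $vr \ge 0$, $p$ has also received $2f+1$
$\Prevote$ messages for the id of the proposed value from round $vr$. Following
the description above:
\[
\begin{array}{l|l|l}
\mbox{proposed pair} & \mbox{condition} & \mbox{$\Prevote$ sent by $p$} \\ \hline
(v', 3),\ v' \neq v & vr = 3 > 2 = lockedRound_p & \id(v') \\
(v', 1),\ v' \neq v & vr = 1 < 2,\ lockedValue_p \neq v' & \nil \\
(v, 1) & lockedValue_p = v & \id(v) \\
(v', -1),\ v' \neq v & lockedRound_p \neq -1,\ lockedValue_p \neq v' & \nil
\end{array}
\]
The round numbers are chosen only for this example;
Algorithm~\ref{alg:tendermint} remains the authoritative statement of the rules.
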
If a correct process receives a $\Proposal$ message for some value $v$ and $2f+1$
$\Prevote$ messages for $\id(v)$, then it sends a $\Precommit$ message with
$\id(v)$. Otherwise, it sends $\Precommit$ $\nil$. A correct process will also send
a $\Precommit$ message with the $\nil$ value in case $\timeoutPrevote$ expires
(it is started when a correct process has sent a $\Prevote$ message and received any
$2f+1$ $\Prevote$ messages) and the process has not yet sent a $\Precommit$ message in
the current round (see line~\ref{line:tab:onTimeoutPrevote}). A
correct process decides on some value $v$ if it receives in some round $r$
a $\Proposal$ message for $v$ and $2f+1$ $\Precommit$ messages with $\id(v)$ (see
line~\ref{line:tab:decide}). To prevent the algorithm from blocking and
waiting forever for this condition to be true, Algorithm~\ref{alg:tendermint}
relies on $\timeoutPrecommit$. It is triggered after a
process receives any set of $2f+1$ $\Precommit$ messages for the current round.
If the $\timeoutPrecommit$ expires and a process has not decided yet, the
process starts the next round (see line~\ref{line:tab:onTimeoutPrecommit}).
When a correct process $p$ decides, it starts the next consensus instance
(for the next height). The \emph{Gossip communication} property ensures
that the $\Proposal$ and $2f+1$ $\Precommit$ messages that led $p$ to decide
are eventually received by all correct processes, so they will also decide.
\subsection{Termination mechanism}
Tendermint ensures termination by a novel mechanism that benefits from the
gossip-based nature of communication (see the \emph{Gossip communication}
property). It requires managing two additional variables, $validValue$ and
$validRound$, which are then used by the proposer during the propose step as
explained above. The $validValue$ and $validRound$ are updated to $v$ and $r$
by a correct process in a round $r$ when the process receives a valid $\Proposal$
message for the value $v$ and the corresponding $2f+1$ $\Prevote$ messages for
$id(v)$ in the round $r$ (see the rule at line~\ref{line:tab:recvPrevote}).
We now briefly give the intuition for how managing and proposing $validValue$
and $validRound$ ensures termination. A formal treatment is left for
Section~\ref{sec:proof}.
The first thing to note is that during a good period, because of the
\emph{Gossip communication} property, if a correct process $p$ locks a value
$v$ in some round $r$, all correct processes will update $validValue$ to $v$
and $validRound$ to $r$ before the end of the round $r$ (we prove this formally
in Section~\ref{sec:proof}). The intuition is that the messages that led to $p$
locking a value $v$ in the round $r$ will be gossiped to all correct processes
before the end of the round $r$, so they will update $validValue$ and
$validRound$ (line~\ref{line:tab:recvPrevote}). Therefore, if a correct
process locks some value during a good period, $validValue$ and $validRound$ are
updated by all correct processes so that the value proposed in the following
rounds will be acceptable to all correct processes. Note
that it could happen that during a good period, no correct process locks a value,
but some correct process $q$ updates $validValue$ and $validRound$ during some
round. As no correct process locks a value in this case, $validValue_q$ and
$validRound_q$ will also be acceptable to all correct processes as
$validRound_q > lockedRound_c$ for every correct process $c$ and as the
\emph{Gossip communication} property ensures that the corresponding $\Prevote$
messages that $q$ received in the round $validRound_q$ are received by all
correct processes $\Delta$ time later.
Finally, it could happen that after GST, there is a long sequence of rounds in which
no correct process either locks a value or updates $validValue$ and $validRound$.
In this case, during this sequence of rounds, the value proposed by correct
processes was not accepted by all correct processes. Note that this sequence of rounds
is always finite as at the beginning of every
round there is at least a single correct process $c$ such that $validValue_c$
and $validRound_c$ are acceptable to every correct process. This is true as
there exists a correct process $c$ such that for every other correct process
$p$, $validRound_c > lockedRound_p$ or $validValue_c = lockedValue_p$. This holds
because $c$ is the process that has locked a value in the most recent round
among all correct processes (or no correct process locked any value). Therefore,
eventually $c$ will be the proposer in some round and the proposed value will be accepted
by all correct processes, thereby terminating this sequence of
rounds.
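
As an illustration (the concrete values are hypothetical), suppose that three
correct processes hold the following state at the beginning of some round:
\[
\begin{array}{l|ccc}
 & p_1 & p_2 & p_3 \\ \hline
lockedRound & 2 & -1 & 4 \\
lockedValue & v_1 & \nil & v_3 \\
validRound & 2 & -1 & 4 \\
validValue & v_1 & \nil & v_3
\end{array}
\]
Here $c = p_3$ is the process that locked a value in the most recent round.
Its pair $(validValue_c, validRound_c) = (v_3, 4)$ satisfies
$validRound_c = 4 > 2 = lockedRound_{p_1}$ and
$validRound_c = 4 > -1 = lockedRound_{p_2}$, while for $p_3$ itself
$validValue_c = lockedValue_{p_3}$. Hence a proposal of $(v_3, 4)$ by $p_3$
meets the condition stated above for every correct process.
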
Therefore, updating the $validValue$ and $validRound$ variables, and the
\emph{Gossip communication} property, together ensure that eventually, during
the good period, there exists a round with a correct proposer whose proposed
value will be accepted by all correct processes, and all correct processes will
terminate in that round. Note that this mechanism, contrary to the common
termination mechanism illustrated in
Figure~\ref{ch3:fig:coordinator-change}, does not require exchanging any
information in addition to the messages already sent as part of what is
normally called the ``normal'' case.

consensus/definitions.tex (+117, -0)

@ -0,0 +1,117 @@
\section{Definitions} \label{sec:definitions}
\subsection{Model}
We consider a system of processes that communicate by exchanging messages.
Processes can be correct or faulty, where a faulty process can behave in an
arbitrary way, i.e., Byzantine faults. We assume that each process
has some amount of voting power (voting power of a process can be $0$).
Processes in our model are not part of a single administrative domain;
therefore we cannot enforce direct network connectivity between all
processes. Instead, we assume that each process is connected to a subset of
processes called peers, such that there is an indirect communication channel
between all correct processes. Communication between processes is established
using a gossip protocol \cite{Dem1987:gossip}.
Formally, we model the network communication using the \emph{partially
synchronous system model}~\cite{DLS88:jacm}: in all executions of the system
there is a bound $\Delta$ and an instant GST (Global Stabilization Time) such
that all communication among correct processes after GST is reliable and
$\Delta$-timely, i.e., if a correct process $p$ sends message $m$ at time $t
\ge GST$ to correct process $q$, then $q$ will receive $m$ before $t +
\Delta$\footnote{Note that as we do not assume direct communication channels
among all correct processes, this implies that before the message $m$
reaches $q$, it might pass through a number of correct processes that will
forward the message $m$ using the gossip protocol towards $q$.}. Messages among
correct processes can be delayed, dropped or duplicated before GST.
Spoofing/impersonation attacks are assumed to be impossible at all times due to
the use of public-key cryptography. The bound $\Delta$ and GST are system
parameters whose values are not required to be known for the safety of our
algorithm. Termination of the algorithm is guaranteed within a bounded duration
after GST. In practice, the algorithm will work correctly in the slightly
weaker variant of the model where the system alternates between (long enough)
good periods (corresponding to the period \emph{after} GST, where the system is
reliable and $\Delta$-timely) and bad periods (corresponding to the period
\emph{before} GST, during which the system is asynchronous and messages can be
lost), but consideration of the GST model simplifies the discussion.
We assume that process steps (which might include sending and receiving
messages) take zero time. Processes are equipped with clocks so they can
measure local timeouts. All protocol messages are signed, i.e., when a correct
process $q$ receives a signed message $m$ from its peer, the process $q$ can
verify who was the original sender of the message $m$.
The details of the Tendermint gossip protocol will be discussed in a separate
technical report. For the sake of this paper it is sufficient to assume that
messages are being gossiped between processes and the following property holds
(in addition to the partial synchrony network assumptions):
\begin{itemize} \item \emph{Gossip communication:} If a correct process $p$
receives some message $m$ at time $t$, all correct processes will receive
$m$ before $\max\{t, GST\} + \Delta$. \end{itemize}
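
For instance, with the hypothetical values $GST = 100$ and $\Delta = 2$ (in
arbitrary time units), the property gives the following bounds:
\[
\begin{array}{l|l}
\mbox{$p$ receives $m$ at} & \mbox{all correct processes receive $m$ before} \\ \hline
t = 40 & \max\{40, 100\} + 2 = 102 \\
t = 150 & \max\{150, 100\} + 2 = 152
\end{array}
\]
A message received before GST is thus only guaranteed to be disseminated
within $\Delta$ after GST, whereas a message received after GST is
disseminated within $\Delta$ of its reception.
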
%Messages that are being gossiped are created by the consensus layer. We can
%think about consensus protocol as a content creator, which %defines what
%messages should be disseminated using the gossip protocol. A correct
%process creates the message for dissemination either i) %explicitly, by
%invoking \emph{send} function as part of the consensus protocol or ii)
%implicitly, by receiving a message from some other %process. Note that in
%the case ii) gossiping of messages is implicit, i.e., it happens without
%explicit send clause in the consensus algorithm %whenever a correct
%process receives some messages in the consensus algorithm\footnote{If a
%message is received by a correct process at %the consensus level then it
%is considered valid from the protocol point of view, i.e., it has a
%correct signature, a proper message structure %and a valid height and
%round number.}.
%\item Processes keep resending messages (in case of failures or message loss)
%until all its peers get them. This ensures that every message %sent or
%received by a correct process is eventually received by all correct
%processes.
\subsection{State Machine Replication}
State machine replication (SMR) is a general approach for replicating services
modeled as a deterministic state machine~\cite{Lam78:cacm,Sch90:survey}. The
key idea of this approach is to guarantee that all replicas start in the same
state and then apply requests from clients in the same order, thereby
guaranteeing that the replicas' states will not diverge. Following
Schneider~\cite{Sch90:survey}, we note that the following is key for
implementing a replicated state machine tolerant to (Byzantine) faults:
\begin{itemize} \item \emph{Replica Coordination.} All [non-faulty] replicas
receive and process the same sequence of requests. \end{itemize}
Moreover, as Schneider also notes, this property can be decomposed into two
parts, \emph{Agreement} and \emph{Order}: Agreement requires all (non-faulty)
replicas to receive all requests, and Order requires that the order of received
requests is the same at all replicas.
There is an additional requirement that needs to be ensured by Byzantine
tolerant state machine replication: only requests (called transactions in the
Tendermint terminology) proposed by clients are executed. In Tendermint,
transaction verification is the responsibility of the service that is being
replicated; upon receiving a transaction from the client, the Tendermint
process will ask the service if the request is valid, and only valid requests
will be processed.
\subsection{Consensus} \label{sec:consensus}
Tendermint solves state machine replication by sequentially executing consensus
instances to agree on each block of transactions that are
then executed by the service being replicated. We consider a variant of the
Byzantine consensus problem called Validity Predicate-based Byzantine consensus
that is motivated by blockchain systems~\cite{GLR17:red-belly-bc}. The problem
is defined by an agreement, a termination, and a validity property.
\begin{itemize} \item \emph{Agreement:} No two correct processes decide on
different values. \item \emph{Termination:} All correct processes
eventually decide on a value. \item \emph{Validity:} A decided value
is valid, i.e., it satisfies the predefined predicate denoted
\emph{valid()}. \end{itemize}
This variant of the Byzantine consensus problem has an application-specific
\emph{valid()} predicate to indicate whether a value is valid. In the context
of blockchain systems, for example, a value is not valid if it does not
contain an appropriate hash of the last value (block) added to the blockchain.

homodel.sty → consensus/homodel.sty

consensus/intro.tex (+138, -0)

@ -0,0 +1,138 @@
\section{Introduction} \label{sec:tendermint}
Consensus is a fundamental problem in distributed computing. It
is important because of its role in State Machine Replication (SMR), a generic
approach for replicating services that can be modeled as a deterministic state
machine~\cite{Lam78:cacm, Sch90:survey}. The key idea of this approach is that
service replicas start in the same initial state, and then execute requests
(also called transactions) in the same order; thereby guaranteeing that
replicas stay in sync with each other. The role of consensus in the SMR
approach is ensuring that all replicas receive transactions in the same order.
Traditionally, deployments of SMR based systems are in data-center settings
(local area network), have a small number of replicas (three to seven) and are
typically part of a single administration domain (e.g., Chubby
\cite{Bur:osdi06}); therefore they handle benign (crash) failures only, as more
general forms of failure (in particular, malicious or Byzantine faults) are
considered to occur with only negligible probability.
The success of cryptocurrencies and blockchain systems in recent years (e.g.,
\cite{Nak2012:bitcoin, But2014:ethereum}) poses a whole new set of challenges for
the design and deployment of SMR based systems: reaching agreement over a wide
area network, among a large number of nodes (hundreds or thousands) that are not
part of the same administrative domain, and where a subset of nodes can behave
maliciously (Byzantine faults). Furthermore, contrary to the previous
data-center deployments where nodes are fully connected to each other, in
blockchain systems, a node is only connected to a subset of other nodes, so
communication is achieved by gossip-based peer-to-peer protocols.
The new requirements demand designs and algorithms that are not necessarily
present in the classical academic literature on Byzantine fault tolerant
consensus (or SMR) systems (e.g., \cite{DLS88:jacm, CL02:tcs}) as the primary
focus there was a different setup.
In this paper we describe a novel Byzantine-fault tolerant consensus algorithm
that is the core of the BFT SMR platform called Tendermint\footnote{The
Tendermint platform is available open source at
https://github.com/tendermint/tendermint.}. The Tendermint platform consists of
a high-performance BFT SMR implementation written in Go, a flexible interface
for
building arbitrary deterministic applications above the consensus, and a suite
of tools for deployment and management.
The Tendermint consensus algorithm is inspired by the PBFT SMR
algorithm~\cite{CL99:osdi} and the DLS algorithm for authenticated faults
(Algorithm 2 from \cite{DLS88:jacm}). Similar to the DLS algorithm, Tendermint
proceeds in
rounds\footnote{Tendermint is not presented in the basic round model of
\cite{DLS88:jacm}. Furthermore, we use the term round differently than in
\cite{DLS88:jacm}; in Tendermint a round denotes a sequence of communication
steps instead of a single communication step in \cite{DLS88:jacm}.}, where each
round has a dedicated proposer (also called coordinator or
leader) and a process proceeds to a new round as part of normal
processing (not only in case the proposer is faulty or suspected as being faulty
by enough processes as in PBFT).
The communication pattern of each round is very similar to the ``normal'' case
of PBFT. Therefore, in favorable conditions (correct proposer, timely and
reliable communication between correct processes), Tendermint decides in three
communication steps (the same as PBFT).
The major novelty and contribution of the Tendermint consensus algorithm is a
new termination mechanism. As explained in \cite{MHS09:opodis, RMS10:dsn}, the
existing BFT consensus (and SMR) algorithms for the partially synchronous
system model (for example PBFT~\cite{CL99:osdi}, \cite{DLS88:jacm},
\cite{MA06:tdsc}) typically rely on the communication pattern illustrated in
Figure~\ref{ch3:fig:coordinator-change} for termination.
Figure~\ref{ch3:fig:coordinator-change} illustrates the messages exchanged during
the proposer change when processes start a new round\footnote{There is no
consistent terminology in the distributed computing literature for naming a
sequence of communication steps that corresponds to a logical unit. It is
sometimes called a round, a phase or a view.}. It guarantees that eventually (i.e.,
after some Global Stabilization Time, GST), there exists a round with a correct
proposer that will bring the system into a univalent configuration.
Intuitively, in a round in which the proposed value is accepted
by all correct processes, and communication between correct processes is
timely and reliable, all correct processes decide.
\begin{figure}[tbh!] \def\rdstretch{5} \def\ystretch{3} \centering
\begin{rounddiag}{4}{2} \round{1}{~} \rdmessage{1}{1}{$v_1$}
\rdmessage{2}{1}{$v_2$} \rdmessage{3}{1}{$v_3$} \rdmessage{4}{1}{$v_4$}
\round{2}{~} \rdmessage{1}{1}{$x, [v_{1..4}]$}
\rdmessage{1}{2}{$~~~~~~x, [v_{1..4}]$} \rdmessage{1}{3}{$~~~~~~~~x,
[v_{1..4}]$} \rdmessage{1}{4}{$~~~~~~~x, [v_{1..4}]$} \end{rounddiag}
\vspace{-5mm} \caption{\boldmath Proposer (coordinator) change: $p_1$ is the
new proposer.} \label{ch3:fig:coordinator-change} \end{figure}
To ensure that a proposed value is accepted by all correct
processes\footnote{The proposed value is not blindly accepted by correct
processes in BFT algorithms. A correct process always verifies if the proposed
value is safe to be accepted so that safety properties of consensus are not
violated.}
a proposer will 1) build the global state by receiving messages from other
processes, 2) select the safe value to propose and 3) send the selected value
together with the signed messages
received in the first step to support it. The
value $v_i$ that a correct process sends to the next proposer normally
corresponds to a value the process considers as acceptable for a decision:
\begin{itemize} \item in PBFT~\cite{CL99:osdi} and DLS~\cite{DLS88:jacm} it is
not the value itself but a set of $2f+1$ signed messages with the same
value id, \item in Fast Byzantine Paxos~\cite{MA06:tdsc} the value
itself is being sent. \end{itemize}
In both cases, using this mechanism in our system model (i.e., a high
number of nodes over a gossip-based network) would have high communication
complexity that increases with the number of processes: in the first case as
the message sent depends on the total number of processes, and in the second
case as the value (block of transactions) is sent by each process. The set of
messages received in the first step is normally piggybacked on the proposal
message (in Figure~\ref{ch3:fig:coordinator-change} denoted with
$[v_{1..4}]$) to justify the choice of the selected value $x$. Note that
sending this message also does not scale with the number of processes in the
system.
We designed a novel termination mechanism for Tendermint that better suits the
system model we consider. It does not require additional communication (neither
sending new messages nor piggybacking information on the existing messages) and
it is fully based on the communication pattern that is very similar to the
normal case in PBFT \cite{CL99:osdi}. Therefore, there is only a single mode of
execution in Tendermint, i.e., there is no separation between the normal and
the recovery mode, which is the case in other PBFT-like protocols (e.g.,
\cite{CL99:osdi}, \cite{Ver09:spinning} or \cite{Cle09:aardvark}). We believe
this makes Tendermint simpler to understand and implement correctly.
Note that an orthogonal approach for reducing message complexity in order to
improve
scalability and decentralization (number of processes) of BFT consensus
algorithms is using advanced cryptography (for example Boneh-Lynn-Shacham (BLS)
signatures \cite{BLS2001:crypto}) as done for example in SBFT
\cite{Gue2018:sbft}.
The remainder of the paper is organized as follows: Section~\ref{sec:definitions} defines
the system model and gives the problem definitions. The Tendermint
consensus algorithm is presented in Section~\ref{sec:tendermint} and the
proofs are given in Section~\ref{sec:proof}. We conclude in
Section~\ref{sec:conclusion}.

latex8.bst → consensus/latex8.bst

latex8.sty → consensus/latex8.sty

lit.bib → consensus/lit.bib

paper.tex → consensus/paper.tex

consensus/proof.tex (+280, -0)

@ -0,0 +1,280 @@
\section{Proof of Tendermint consensus algorithm} \label{sec:proof}
\begin{lemma} \label{lemma:majority-intersection} For all $f\geq 0$, any two
sets of processes with voting power at least equal to $2f+1$ have at least one
correct process in common. \end{lemma}
\begin{proof} As the total voting power is equal to $n=3f+1$, we have $2(2f+1)
= n+f+1$. This means that the intersection of two sets with the voting
power equal to $2f+1$ contains at least $f+1$ voting power in common, \ie,
at least one correct process (as the total voting power of faulty processes
is $f$). The result follows directly from this. \end{proof}
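
The counting step can be spelled out as follows, writing $power(\cdot)$ for
aggregate voting power (notation used only in this remark). For two sets $A$
and $B$ with $power(A) \ge 2f+1$ and $power(B) \ge 2f+1$,
\[
power(A \cap B) \;\ge\; power(A) + power(B) - n \;\ge\; 2(2f+1) - (3f+1) \;=\; f+1.
\]
For example, with $f = 1$ and $n = 4$, two sets of voting power $3$ share at
least $3 + 3 - 4 = 2 = f + 1$ voting power, of which at most $f = 1$ can belong
to faulty processes.
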
\begin{lemma} \label{lemma:locked-decision_value-prevote-v} If $f+1$ correct
processes lock value $v$ in round $r_0$ ($lockedValue = v$ and $lockedRound =
r_0$), then in all rounds $r > r_0$, they send $\Prevote$ for $id(v)$ or
$\nil$. \end{lemma}
\begin{proof} We prove the result by induction on $r$.
\emph{Base step $r = r_0 + 1:$} Let us denote with $C$ the set of correct
processes with voting power equal to $f+1$. By the rules at
line~\ref{line:tab:recvProposal} and line~\ref{line:tab:acceptProposal}, the
processes from the set $C$ cannot accept a $\Proposal$ for any value different
from $v$ in round $r$, and therefore cannot send a $\li{\Prevote,h_p,
r,id(v')}$ message, if $v' \neq v$. Therefore, the Lemma holds for the base
step.
\emph{Induction step from $r_1$ to $r_1+1$:} We assume that no process from the
set $C$ has sent $\Prevote$ for values different than $id(v)$ or $\nil$ until
round $r_1 + 1$. We now prove that the Lemma also holds for round $r_1 + 1$. As
processes from the set $C$ send $\Prevote$ for $id(v)$ or $\nil$ in rounds $r_0
\le r \le r_1$, by Lemma~\ref{lemma:majority-intersection} there is no value
$v' \neq v$ for which it is possible to receive $2f+1$ $\Prevote$ messages in
those rounds (i). Therefore, we have for all processes from the set $C$,
$lockedValue = v$ and $lockedRound \ge r_0$. Let's assume for contradiction
that a process $q$ from the set $C$ sends $\Prevote$ in round $r_1 + 1$ for
value $id(v')$, where $v' \neq v$. This is possible only by
line~\ref{line:tab:prevote-higher-proposal}. Note that this implies that $q$
received $2f+1$ $\li{\Prevote,h_q, r,id(v')}$ messages, where $r > r_0$ and $r
< r_1 +1$ (see line~\ref{line:tab:cond-prevote-higher-proposal}). This
contradicts (i) and Lemma~\ref{lemma:majority-intersection}.
\end{proof}
\begin{lemma} \label{lemma:agreement} Algorithm~\ref{alg:tendermint} satisfies
Agreement. \end{lemma}
\begin{proof} Let round $r_0$ be the first round of height $h$ such that some
correct process $p$ decides $v$. We now prove that if some correct process
$q$ decides $v'$ in some round $r \ge r_0$, then $v = v'$.
In case $r = r_0$, $q$ has received at least $2f+1$
$\li{\Precommit,h_p,r_0,id(v')}$ messages at line~\ref{line:tab:onDecideRule},
while $p$ has received at least $2f+1$ $\li{\Precommit,h_p,r_0,id(v)}$
messages. By Lemma~\ref{lemma:majority-intersection} two sets of messages of
voting power $2f+1$ intersect in at least one correct process. As a correct
process sends a single $\Precommit$ message in a round, we have $v=v'$.
We prove the case $r > r_0$ by contradiction. By the rule at
line~\ref{line:tab:onDecideRule}, $p$ has received at least $2f+1$
voting-power equivalent of $\li{\Precommit,h_p,r_0,id(v)}$ messages, i.e., at
least $f+1$ voting-power equivalent correct processes have locked value $v$ in
round $r_0$ and have sent those messages (i). Let's denote this set of
processes with $C$. On the other hand, $q$ has received at least $2f+1$
voting-power equivalent of $\li{\Precommit,h_q, r,id(v')}$ messages. As the
voting power of all faulty processes is at most $f$, some correct process $c$
has sent one of those messages. By the rule at
line~\ref{line:tab:recvPrevote}, $c$ has locked value $v'$ in round $r$ before
sending $\li{\Precommit,h_q, r,id(v')}$. Therefore $c$ has received $2f+1$
$\Prevote$ messages for $id(v')$ in round $r > r_0$ (see
line~\ref{line:tab:recvPrevote}). By Lemma~\ref{lemma:majority-intersection},
a process from the set $C$ has sent a $\Prevote$ message for $id(v')$ in round
$r$. This contradicts (i) and
Lemma~\ref{lemma:locked-decision_value-prevote-v}.
\end{proof}
\begin{lemma} \label{lemma:validity} Algorithm~\ref{alg:tendermint} satisfies
Validity. \end{lemma}
\begin{proof} This follows trivially from the rule at
line~\ref{line:tab:validDecisionValue}, which ensures that only valid values
can be decided. \end{proof}
\begin{lemma} \label{lemma:round-synchronisation} If we assume that:
\begin{enumerate}
\item a correct process $p$ is the first correct process to
enter a round $r>0$ at time $t > GST$ (for every correct process
$c$, $round_c \le r$ at time $t$)
\item the proposer of round $r$ is
a correct process $q$
\item for every correct process $c$,
$lockedRound_c \le validRound_q$ at time $t$
\item $\timeoutPropose(r)
> 2\Delta + \timeoutPrecommit(r-1)$, $\timeoutPrevote(r) > 2\Delta$ and
$\timeoutPrecommit(r) > 2\Delta$,
\end{enumerate}
then all correct processes decide in round $r$ before $t + 4\Delta +
\timeoutPrecommit(r-1)$.
\end{lemma}
\begin{proof} As $p$ is the first correct process to enter round $r$, it
executed line~\ref{line:tab:nextRound} after $\timeoutPrecommit(r-1)$
expired. Therefore, $p$ received $2f+1$ $\Precommit$ messages in round
$r-1$ before time $t$. By the \emph{Gossip communication} property, all
correct processes will receive those messages at the latest by time $t +
\Delta$. Correct processes that are in rounds $< r-1$ at time $t$ will
enter round $r-1$ (see the rule at line~\ref{line:tab:nextRound2}) and
trigger $\timeoutPrecommit(r-1)$ (see the rule at
line~\ref{line:tab:startTimeoutPrecommit}) by time $t+\Delta$. Therefore,
all correct processes will start round $r$ by time
$t+\Delta+\timeoutPrecommit(r-1)$ (i).
In the worst case, the process $q$ is the last correct process to enter round
$r$, so $q$ starts round $r$ and sends a $\Proposal$ message for some value $v$
at time $t + \Delta + \timeoutPrecommit(r-1)$. Therefore, all correct processes
receive the $\Proposal$ message from $q$ at the latest by time $t + 2\Delta +
\timeoutPrecommit(r-1)$. Therefore, if $\timeoutPropose(r) > 2\Delta +
\timeoutPrecommit(r-1)$, all correct processes will receive the $\Proposal$
message before $\timeoutPropose(r)$ expires.
By (3) and the rules at line~\ref{line:tab:recvProposal} and
\ref{line:tab:acceptProposal}, all correct processes will accept the
$\Proposal$ message for value $v$ and will send a $\Prevote$ message for
$id(v)$ by time $t + 2\Delta + \timeoutPrecommit(r-1)$. Note that by the
\emph{Gossip communication} property, the $\Prevote$ messages needed to trigger
the rule at line~\ref{line:tab:acceptProposal} are received before time $t +
\Delta$.
By time $t + 3\Delta + \timeoutPrecommit(r-1)$, all correct processes will receive
$\Proposal$ for $v$ and $2f+1$ corresponding $\Prevote$ messages for $id(v)$.
By the rule at line~\ref{line:tab:recvPrevote}, all correct processes will send
a $\Precommit$ message (see line~\ref{line:tab:precommit-v}) for $id(v)$ by
time $t + 3\Delta + \timeoutPrecommit(r-1)$. Therefore, by time $t + 4\Delta +
\timeoutPrecommit(r-1)$, all correct processes will have received the $\Proposal$
for $v$ and $2f+1$ $\Precommit$ messages for $id(v)$, so they decide at
line~\ref{line:tab:decide} on $v$.
This scenario holds if every correct process sends a $\Precommit$ message
before $\timeoutPrevote(r)$ expires, and if $\timeoutPrecommit(r)$ does not expire
before $t + 4\Delta + \timeoutPrecommit(r-1)$. Let's assume that a correct process
$c_1$ is the first correct process to trigger $\timeoutPrevote(r)$ (see the rule
at line~\ref{line:tab:recvAny2/3Prevote}) at time $t_1 > t$. This implies that
before time $t_1$, $c_1$ received a $\Proposal$ ($step_{c_1}$ must be
$\prevote$ by the rule at line~\ref{line:tab:recvAny2/3Prevote}) and a set of
$2f+1$ $\Prevote$ messages. By time $t_1 + \Delta$, all correct processes will
receive those messages. Note that even if some correct process was in a
smaller round before time $t_1$, at time $t_1 + \Delta$ it will start round $r$
after receiving those messages (see the rule at
line~\ref{line:tab:skipRounds}). Therefore, all correct processes will send
their $\Prevote$ message for $id(v)$ by time $t_1 + \Delta$, and all correct
processes will receive those messages by time $t_1 + 2\Delta$. Therefore,
as $\timeoutPrevote(r) > 2\Delta$, this ensures that all correct processes receive
$\Prevote$ messages from all correct processes before their respective local
$\timeoutPrevote(r)$ expires.
On the other hand, $\timeoutPrecommit(r)$ is triggered in a correct process
after it receives any set of $2f+1$ $\Precommit$ messages for the first time.
Let's denote with $t_2 > t$ the earliest point in time $\timeoutPrecommit(r)$ is
triggered in some correct process $c_2$. This implies that $c_2$ has received
at least $f+1$ $\Precommit$ messages for $id(v)$ from correct processes, i.e.,
those processes have received $\Proposal$ for $v$ and $2f+1$ $\Prevote$
messages for $id(v)$ before time $t_2$. By the \emph{Gossip communication}
property, all correct processes will receive those messages by time $t_2 +
\Delta$, and will send $\Precommit$ messages for $id(v)$. Note that even if
some correct processes were at time $t_2$ in a round smaller than $r$, by the
rule at line~\ref{line:tab:skipRounds} they will enter round $r$ by time $t_2 +
\Delta$. Therefore, by time $t_2 + 2\Delta$, all correct processes will
receive $\Proposal$ for $v$ and $2f+1$ $\Precommit$ messages for $id(v)$. So if
$\timeoutPrecommit(r) > 2\Delta$, all correct processes will decide before the
timeout expires. \end{proof}
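For reference, the worst-case schedule derived in the proof above can be
summarised as follows. The shorthand $T = \timeoutPrecommit(r-1)$ is
introduced here only for this summary; every bound is taken from the proof.
\begin{itemize}
\item By time $t + \Delta$: all correct processes have received the $2f+1$
  $\Precommit$ messages of round $r-1$ and have triggered
  $\timeoutPrecommit(r-1)$.
\item By time $t + \Delta + T$: all correct processes have started round $r$;
  in the worst case, the proposer $q$ sends its $\Proposal$ for $v$ only now.
\item By time $t + 2\Delta + T$: all correct processes have received the
  $\Proposal$ and have sent $\Prevote$ for $id(v)$.
\item By time $t + 3\Delta + T$: all correct processes have received $2f+1$
  $\Prevote$ messages for $id(v)$ and have sent $\Precommit$ for $id(v)$.
\item By time $t + 4\Delta + T$: all correct processes have received $2f+1$
  $\Precommit$ messages for $id(v)$ and decide $v$.
\end{itemize}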
\begin{lemma} \label{lemma:validValue} If a correct process $p$ locks a value
$v$ at time $t_0 > GST$ in some round $r$ ($lockedValue = v$ and
$lockedRound = r$) and $\timeoutPrecommit(r) > 2\Delta$, then all correct
processes set $validValue$ to $v$ and $validRound$ to $r$ before starting
round $r+1$. \end{lemma}
\begin{proof} In order to prove this Lemma, we need to prove that if the
process $p$ locks a value $v$ at time $t_0$, then no correct process will
leave round $r$ before time $t_0 + \Delta$ (unless it has already set
$validValue$ to $v$ and $validRound$ to $r$). It is sufficient to prove
this, since by the \emph{Gossip communication} property the messages that
$p$ received at time $t_0$ and that triggered the rule at
line~\ref{line:tab:recvPrevote} will be received by time $t_0 + \Delta$ by
all correct processes, so all correct processes that are still in round $r$
will set $validValue$ to $v$ and $validRound$ to $r$ (by the rule at
line~\ref{line:tab:recvPrevote}). To prove this, we need to compute the
earliest point in time a correct process could leave round $r$ without
updating $validValue$ to $v$ and $validRound$ to $r$ (we denote this time
with $t_1$). The Lemma is correct if $t_0 + \Delta < t_1$.
If the process $p$ locks a value $v$ at time $t_0$, this implies that $p$
received a valid $\Proposal$ message for $v$ and $2f+1$
$\li{\Prevote,h,r,id(v)}$ messages by time $t_0$. At least $f+1$ of those
messages are sent by correct processes. Let's denote this set of correct
processes as $C$. By
Lemma~\ref{lemma:majority-intersection} any set of $2f+1$ $\Prevote$ messages
in round $r$ contains at least a single message from a process in the set $C$.
Let's denote as time $t$ the earliest point in time a correct process, $c_1$, triggered
$\timeoutPrevote(r)$. This implies that $c_1$ received $2f+1$ $\Prevote$ messages
(see the rule at line \ref{line:tab:recvAny2/3Prevote}), where at least one of
those messages was sent by a process $c_2$ from the set $C$. Therefore, process
$c_2$ had received $\Proposal$ message before time $t$. By the \emph{Gossip
communication} property, all correct processes will receive $\Proposal$ and
$2f+1$ $\Prevote$ messages for round $r$ by time $t+\Delta$. The latest point
in time $p$ will trigger $\timeoutPrevote(r)$ is $t+\Delta$\footnote{Note that
even if $p$ was in a smaller round at time $t$, it will start round $r$ by time
$t+\Delta$.}. So the latest point in time $p$ can lock the value $v$ in
round $r$ is $t_0 = t+\Delta+\timeoutPrevote(r)$ (as at this point
$\timeoutPrevote(r)$ expires, so a process sends $\Precommit$ $\nil$ and updates
$step$ to $\precommit$, see line \ref{line:tab:onTimeoutPrevote}).
Note that according to Algorithm~\ref{alg:tendermint}, a correct process
cannot send a $\Precommit$ message before receiving $2f+1$ $\Prevote$
messages. Therefore, no correct process can send a $\Precommit$ message in
round $r$ before time $t$. If a correct process sends a $\Precommit$ message
for $\nil$, it implies that it has waited for the full duration of
$\timeoutPrevote(r)$ (see
line~\ref{line:tab:precommit-nil-onTimeout})\footnote{The other case in which a
correct process sends $\Precommit$ for $\nil$ is after receiving $2f+1$
$\Prevote$ messages for $\nil$, see line~\ref{line:tab:precommit-v-1}. By
Lemma~\ref{lemma:majority-intersection}, this is not possible in round $r$.}.
Therefore, no correct process can send $\Precommit$ for $\nil$ before time $t +
\timeoutPrevote(r)$ (*).
A correct process $q$ that enters round $r+1$ must either (i) wait for
$\timeoutPrecommit(r)$ to expire (see line~\ref{line:tab:nextRound}) or (ii)
receive $f+1$ messages from round $r+1$ (see
line~\ref{line:tab:skipRounds}). In the former case, $q$
receives $2f+1$ $\Precommit$ messages before starting $\timeoutPrecommit(r)$. If
at least a single $\Precommit$ message from a correct process (at least $f+1$
voting power equivalent of those messages is sent by correct processes) is for
$\nil$, then $q$ cannot start round $r+1$ before time $t_1 = t +
\timeoutPrevote(r) + \timeoutPrecommit(r)$ (see (*)). Therefore in this case we have:
$t_0 + \Delta < t_1$, i.e., $t+2\Delta+\timeoutPrevote(r) < t + \timeoutPrevote(r) +
\timeoutPrecommit(r)$, and this is true whenever $\timeoutPrecommit(r) > 2\Delta$, so
the Lemma holds in this case.
If in the set of $2f+1$ $\Precommit$ messages $q$ receives, there is at least a
single $\Precommit$ message for $id(v)$ from a correct process $c$, then $q$
can start round $r+1$ at the earliest at time $t_1 = t+\timeoutPrecommit(r)$. In
this case, by the \emph{Gossip communication} property, all correct processes
will receive $\Proposal$ and $2f+1$ $\Prevote$ messages (that $c$ received
before time $t$) at the latest by time $t+\Delta$. Therefore, $q$ will set
$validValue$ to $v$ and $validRound$ to $r$ at the latest by time $t+\Delta$. As
$t+\Delta < t+\timeoutPrecommit(r)$ whenever $\timeoutPrecommit(r) > \Delta$, the
Lemma holds also in this case.
In case (ii), $q$ received at least a single message from a correct process $c$
from round $r+1$. The earliest point in time $c$ could have started round
$r+1$ is $t+\timeoutPrecommit(r)$ in case it received a $\Precommit$ message for
$v$ from some correct process in the set of $2f+1$ $\Precommit$ messages it
received. The same reasoning as above holds also in this case, so $q$ sets
$validValue$ to $v$ and $validRound$ to $r$ at the latest by time $t+\Delta$. As
$t+\Delta < t+\timeoutPrecommit(r)$ whenever $\timeoutPrecommit(r) > \Delta$, the
Lemma holds also in this case. \end{proof}
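As a summary of the case analysis in the proof above (the list only collects
bounds already derived there and adds no new assumption), the earliest time
$t_1$ at which a correct process $q$ can start round $r+1$, and the resulting
condition on $\timeoutPrecommit(r)$, are:
\begin{itemize}
\item case (i), with a $\Precommit$ for $\nil$ from a correct process: $t_1 = t
  + \timeoutPrevote(r) + \timeoutPrecommit(r)$, and $t_0 + \Delta < t_1$
  requires $\timeoutPrecommit(r) > 2\Delta$;
\item case (i), with a $\Precommit$ for $id(v)$ from a correct process: $t_1 =
  t + \timeoutPrecommit(r)$, and $t + \Delta < t_1$ requires
  $\timeoutPrecommit(r) > \Delta$;
\item case (ii): $t_1 = t + \timeoutPrecommit(r)$, and $t + \Delta < t_1$
  again requires $\timeoutPrecommit(r) > \Delta$.
\end{itemize}
The assumption $\timeoutPrecommit(r) > 2\Delta$ of
Lemma~\ref{lemma:validValue} is the strongest of the three conditions, so it
covers all cases.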
\begin{lemma} \label{lemma:termination} Algorithm~\ref{alg:tendermint} satisfies
Termination. \end{lemma}
\begin{proof} Lemma~\ref{lemma:round-synchronisation} defines a scenario in
which all correct processes decide. We now prove that within a bounded
duration after GST such a scenario will unfold. Let's assume that at time
$GST$ the highest round started by a correct process is $r_0$, and that
there exists a correct process $p$ such that the following holds: for every
correct process $c$, $lockedRound_c \le validRound_p$. Furthermore, we
assume that $p$ will be the proposer in some round $r_1 > r_0$ (this is
ensured by the $\coord$ function).
We have two cases to consider. In the first case, for all rounds $r \ge r_0$
and $r < r_1$, no correct process locks a value (sets $lockedRound$ to $r$). So
in round $r_1$ we have the scenario from
Lemma~\ref{lemma:round-synchronisation}, so all correct processes decide in
round $r_1$.
In the second case, a correct process locks a value $v$ in round $r_2$, where
$r_2 \ge r_0$ and $r_2 < r_1$. Let's assume that $r_2$ is the highest round
before $r_1$ in which some correct process $q$ locks a value. By Lemma
\ref{lemma:validValue} at the end of round $r_2$ the following holds for all
correct processes $c$: $validValue_c = lockedValue_q$ and $validRound_c = r_2$.
Then in round $r_1$, the conditions of
Lemma~\ref{lemma:round-synchronisation} hold, so all correct processes decide.
\end{proof}

rounddiag.sty → consensus/rounddiag.sty


technote.sty → consensus/technote.sty


+ 0
- 54
definitions.tex

@ -1,54 +0,0 @@
\section{Definitions}
\label{sec:definitions}
\subsection{Model}
We consider a system composed of $n$ server processes $\Pi = \{ 1, \dots, n\}$.
Server processes can be correct or faulty, where a faulty process can behave in an arbitrary way,
i.e., we consider Byzantine faults. We assume that each process has some amount of voting power, and we denote with $N$ the total voting power of all processes in the system.
Furthermore, we assume that the total voting power of faulty processes is bounded by a system parameter $f$.
Processes communicate by exchanging messages. We assume a model where processes are not part of a single administrative domain; therefore we cannot enforce direct network connectivity between all server processes.
Instead, we assume that each server process is connected to a small number of processes called peers,
such that there is an indirect communication channel between all correct server processes. Communication between processes is established using a gossip protocol.
We assume that processes communicate over a wide-area network. Formally, we model the network communication using the partially synchronous system model~\cite{DLS88:jacm}, or rather a slightly weaker variant of this model: we assume that the system alternates between good periods (during which the system is synchronous) and bad periods (during which the system is asynchronous and messages can be lost). During good periods, there is a bound $\Delta$ such that all
communication among correct processes is reliable and $\Delta$-timely, i.e., if a correct process
$p$ sends message $m$ at time $t$ to a correct process $q$, then $q$ will receive $m$
before $t+\Delta$. Note that as we do not assume direct communication channels among all correct processes, this implies that before the message $m$ reaches $q$, it will pass through a number of
correct servers that will forward the message $m$ using the gossip protocol towards $q$.
Messages among correct processes can be
delayed, dropped or duplicated during bad periods. Spoofing/impersonation attacks are assumed to be impossible also during bad periods.
The bound $\Delta$ is a system parameter whose value is not required to be known for the safety of the algorithm presented. However, the timing bounds derived for our algorithm, and thus the guaranteed latency, require knowledge of $\Delta$.
We assume that process steps (which might include sending and receiving messages) take zero time.
Processes are equipped with clocks so they can measure local timeouts.
We assume integrity of direct communication channels between peers, that is, if a process $p$ received a message $m$ from process $q$, then $q$ sent message $m$ to $p$ before. All protocol messages are signed, i.e., when a correct process $q$ receives a signed message $m$ from its peer, the process $q$ can verify who the original sender of the message was.
The details of the Tendermint gossip protocol will be discussed in a separate technical report. For the sake of this report it is sufficient to assume the following properties provided by the gossip protocol (in addition to the properties about partial synchrony of the network stated above):
\begin{itemize}
\item Messages that are being gossiped come from the consensus layer. We can think about the consensus protocol as a content creator, which defines what messages should be disseminated using the gossip protocol. A correct process creates the message for dissemination either i) explicitly, by invoking the \emph{send} function as part of the consensus protocol or ii) implicitly, by receiving a message from some other process. Note that in case ii) the gossiping of messages is implicit, i.e., it happens without an explicit send clause in the consensus algorithm whenever a correct process receives some messages in the consensus algorithm\footnote{If a message is received by a correct process at the consensus level then it is considered valid from the protocol point of view, i.e., it has a correct signature, a proper message structure and a valid height and round number.}.
\item Processes keep resending messages (in case of failures or message loss) until all their peers get them. This ensures that every message sent or received by a correct process is eventually received by all correct processes.
\end{itemize}
\subsection{Consensus}
\label{sec:consensus}
\newcommand{\propose}{\mathsf{propose}}
\newcommand{\decide}{\mathsf{decide}}
We consider a variant of the Byzantine consensus problem called Validity Predicate-based Byzantine consensus that is motivated by blockchain systems~\cite{GLR17:red-belly-bc}. The problem is defined by an agreement, a termination, and a validity
property.
\begin{itemize}
\item \emph{Agreement:} No two correct processes decide on different values.
\item \emph{Termination:} All correct processes eventually decide on a value.
\item \emph{Validity:} A decided value is valid, i.e., it satisfies the predefined predicate denoted \emph{valid()}.
\end{itemize}
This variant of the Byzantine consensus problem has an application-specific valid() predicate to indicate whether a value is valid. In the context of blockchain systems, for example, a value is not valid if it does not contain an appropriate hash of the last value (block) added to the blockchain.

BIN
paper.dvi


+ 0
- 2
proof.tex

@ -1,2 +0,0 @@
\section{Formal proof of Tendermint consensus algorithm}
\label{sec:proof}
