
\section{Introduction} \label{sec:tendermint}
Consensus is a fundamental problem in distributed computing. It is important because of its role in State Machine Replication (SMR), a generic approach for replicating services that can be modeled as a deterministic state machine~\cite{Lam78:cacm, Sch90:survey}. The key idea of this approach is that service replicas start in the same initial state and then execute requests (also called transactions) in the same order, thereby guaranteeing that replicas stay in sync with each other. The role of consensus in the SMR approach is to ensure that all replicas receive transactions in the same order. Traditionally, SMR-based systems are deployed in data-center settings (local area network), have a small number of replicas (three to seven), and are typically part of a single administrative domain (e.g., Chubby~\cite{Bur:osdi06}); therefore they handle benign (crash) failures only, as more general forms of failure (in particular, malicious or Byzantine faults) are considered to occur with only negligible probability.
The success of cryptocurrencies and blockchain systems in recent years (e.g.,~\cite{Nak2012:bitcoin, But2014:ethereum}) poses a whole new set of challenges for the design and deployment of SMR-based systems: reaching agreement over a wide area network, among a large number of nodes (hundreds or thousands) that are not part of the same administrative domain, and where a subset of nodes can behave maliciously (Byzantine faults). Furthermore, contrary to the previous data-center deployments where nodes are fully connected to each other, in blockchain systems a node is only connected to a subset of other nodes, so communication is achieved by gossip-based peer-to-peer protocols. The new requirements demand designs and algorithms that are not necessarily present in the classical academic literature on Byzantine fault tolerant consensus (or SMR) systems (e.g.,~\cite{DLS88:jacm, CL02:tcs}), as their primary focus was a different setup.
In this paper we describe a novel Byzantine fault tolerant consensus algorithm that is the core of the BFT SMR platform called Tendermint\footnote{The Tendermint platform is available open source at https://github.com/tendermint/tendermint.}. The Tendermint platform consists of a high-performance BFT SMR implementation written in Go, a flexible interface for building arbitrary deterministic applications above the consensus, and a suite of tools for deployment and management.
The Tendermint consensus algorithm is inspired by the PBFT SMR algorithm~\cite{CL99:osdi} and the DLS algorithm for authenticated faults (Algorithm 2 from \cite{DLS88:jacm}). Similar to the DLS algorithm, Tendermint proceeds in rounds\footnote{Tendermint is not presented in the basic round model of \cite{DLS88:jacm}. Furthermore, we use the term round differently than in \cite{DLS88:jacm}; in Tendermint a round denotes a sequence of communication steps instead of a single communication step as in \cite{DLS88:jacm}.}, where each round has a dedicated proposer (also called coordinator or leader) and a process proceeds to a new round as part of normal processing (not only when the proposer is faulty or suspected of being faulty by enough processes, as in PBFT). The communication pattern of each round is very similar to the ``normal'' case of PBFT. Therefore, in favorable conditions (correct proposer, timely and reliable communication between correct processes), Tendermint decides in three communication steps (the same as PBFT).
The major novelty and contribution of the Tendermint consensus algorithm is a new termination mechanism. As explained in \cite{MHS09:opodis, RMS10:dsn}, the existing BFT consensus (and SMR) algorithms for the partially synchronous system model (for example PBFT~\cite{CL99:osdi}, \cite{DLS88:jacm}, \cite{MA06:tdsc}) typically rely on the communication pattern illustrated in Figure~\ref{ch3:fig:coordinator-change} for termination. Figure~\ref{ch3:fig:coordinator-change} illustrates the messages exchanged during the proposer change when processes start a new round\footnote{There is no consistent terminology in the distributed computing literature for naming a sequence of communication steps that corresponds to a logical unit. It is sometimes called a round, a phase or a view.}. This pattern guarantees that eventually (i.e., after some Global Stabilization Time, GST) there exists a round with a correct proposer that will bring the system into a univalent configuration. Intuitively, in a round in which the proposed value is accepted by all correct processes, and communication between correct processes is timely and reliable, all correct processes decide.

\begin{figure}[tbh!] \def\rdstretch{5} \def\ystretch{3} \centering	\begin{rounddiag}{4}{2} \round{1}{~} \rdmessage{1}{1}{$v_1$}		\rdmessage{2}{1}{$v_2$} \rdmessage{3}{1}{$v_3$} \rdmessage{4}{1}{$v_4$}		\round{2}{~} \rdmessage{1}{1}{$x, [v_{1..4}]$}		\rdmessage{1}{2}{$~~~~~~x, [v_{1..4}]$} \rdmessage{1}{3}{$~~~~~~~~x,
			[v_{1..4}]$} \rdmessage{1}{4}{$~~~~~~~x, [v_{1..4}]$} \end{rounddiag}
	\vspace{-5mm} \caption{\boldmath Proposer (coordinator) change: $p_1$ is the		new proposer.} \label{ch3:fig:coordinator-change} \end{figure}  
To ensure that a proposed value is accepted by all correct processes\footnote{The proposed value is not blindly accepted by correct processes in BFT algorithms. A correct process always verifies whether the proposed value is safe to accept, so that the safety properties of consensus are not violated.}, a proposer will 1) build the global state by receiving messages from other processes, 2) select the safe value to propose, and 3) send the selected value together with the signed messages received in the first step to support it. The value $v_i$ that a correct process sends to the next proposer normally corresponds to a value the process considers acceptable for a decision:
\begin{itemize} \item in PBFT~\cite{CL99:osdi} and DLS~\cite{DLS88:jacm} it is not the value itself but a set of $2f+1$ signed messages with the same value id, \item in Fast Byzantine Paxos~\cite{MA06:tdsc} the value itself is sent. \end{itemize}
In both cases, using this mechanism in our system model (i.e., a high number of nodes over a gossip-based network) would have high communication complexity that increases with the number of processes: in the first case because the size of the message sent depends on the total number of processes, and in the second case because the value (a block of transactions) is sent by each process. The set of messages received in the first step is normally piggybacked on the proposal message (denoted by $[v_{1..4}]$ in Figure~\ref{ch3:fig:coordinator-change}) to justify the choice of the selected value $x$. Note that sending this message also does not scale with the number of processes in the system.
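As a rough illustration of this growth (a back-of-the-envelope sketch under our own simplifying assumptions, not an analysis taken from the cited papers), assume $n = 3f+1$ processes and point-to-point delivery of the proposal to every process, and let $\sigma$ denote the size of one signed message and $B$ the size of a value (block); the symbols $\sigma$ and $B$ are introduced here only for this estimate. In the first case each $v_i$ holds $2f+1$ signed messages, so if the proposer piggybacks all $n$ received messages as in Figure~\ref{ch3:fig:coordinator-change}, we obtain
\[
	|v_i| \approx (2f+1)\,\sigma = \frac{2n+1}{3}\,\sigma, \qquad
	|\mathit{proposal}| \approx n\,|v_i| = \frac{n(2n+1)}{3}\,\sigma, \qquad
	n \cdot |\mathit{proposal}| = \Theta(n^3 \sigma).
\]
For example, with $n = 100$ (so $f = 33$), each $v_i$ carries $67$ signed messages and a single proposal carries about $6700$ of them. In the second case the proposer alone receives roughly $n \cdot B$ of data in the first step. Over a gossip-based network the constants would differ, but we would expect the same dependence on $n$.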
We designed a novel termination mechanism for Tendermint that better suits the system model we consider. It does not require additional communication (neither sending new messages nor piggybacking information on the existing messages) and it is fully based on a communication pattern that is very similar to the normal case in PBFT~\cite{CL99:osdi}. Therefore, there is only a single mode of execution in Tendermint, i.e., there is no separation between the normal and the recovery mode, which is the case in other PBFT-like protocols (e.g., \cite{CL99:osdi}, \cite{Ver09:spinning} or \cite{Cle09:aardvark}). We believe this makes Tendermint simpler to understand and implement correctly.
Note that an orthogonal approach for reducing message complexity, in order to improve the scalability and decentralization (number of processes) of BFT consensus algorithms, is to use advanced cryptography (for example Boneh-Lynn-Shacham (BLS) signatures~\cite{BLS2001:crypto}), as done for example in SBFT~\cite{Gue2018:sbft}.
The remainder of the paper is organized as follows: Section~\ref{sec:definitions} defines the system model and gives the problem definitions. The Tendermint consensus algorithm is presented in Section~\ref{sec:tendermint} and the proofs are given in Section~\ref{sec:proof}. We conclude in Section~\ref{sec:conclusion}.