|
|
- \section{Introduction} \label{sec:tendermint}
-
- Consensus is a fundamental problem in distributed computing. It
- is important because of its role in State Machine Replication (SMR), a generic
- approach for replicating services that can be modeled as a deterministic state
- machine~\cite{Lam78:cacm, Sch90:survey}. The key idea of this approach is that
- service replicas start in the same initial state, and then execute requests
- (also called transactions) in the same order; thereby guaranteeing that
- replicas stay in sync with each other. The role of consensus in the SMR
- approach is ensuring that all replicas receive transactions in the same order.
- Traditionally, SMR-based systems are deployed in data-center settings
- (local area network), have a small number of replicas (three to seven) and are
- typically part of a single administrative domain (e.g., Chubby
- \cite{Bur:osdi06}); therefore they handle benign (crash) failures only, as more
- general forms of failure (in particular, malicious or Byzantine faults) are
- considered to occur with only negligible probability.
-
- The success of cryptocurrencies and blockchain systems in recent years (e.g.,
- \cite{Nak2012:bitcoin, But2014:ethereum}) poses a whole new set of challenges for
- the design and deployment of SMR-based systems: reaching agreement over a wide
- area network, among a large number of nodes (hundreds or thousands) that are not
- part of the same administrative domain, and where a subset of nodes can behave
- maliciously (Byzantine faults). Furthermore, contrary to the previous
- data-center deployments where nodes are fully connected to each other, in
- blockchain systems, a node is only connected to a subset of other nodes, so
- communication is achieved by gossip-based peer-to-peer protocols.
- These new requirements demand designs and algorithms that are not necessarily
- present in the classical academic literature on Byzantine fault tolerant
- consensus (or SMR) systems (e.g., \cite{DLS88:jacm, CL02:tcs}), as its primary
- focus was a different setup.
-
- In this paper we describe a novel Byzantine-fault tolerant consensus algorithm
- that is the core of the BFT SMR platform called Tendermint\footnote{The
- Tendermint platform is available open source at
- https://github.com/tendermint/tendermint.}. The Tendermint platform consists of
- a high-performance BFT SMR implementation written in Go, a flexible interface
- for
- building arbitrary deterministic applications above the consensus, and a suite
- of tools for deployment and management.
-
- The Tendermint consensus algorithm is inspired by the PBFT SMR
- algorithm~\cite{CL99:osdi} and the DLS algorithm for authenticated faults
- (Algorithm 2 from \cite{DLS88:jacm}). Similar to the DLS algorithm, Tendermint
- proceeds in
- rounds\footnote{Tendermint is not presented in the basic round model of
- \cite{DLS88:jacm}. Furthermore, we use the term round differently than in
- \cite{DLS88:jacm}; in Tendermint a round denotes a sequence of communication
- steps instead of a single communication step as in \cite{DLS88:jacm}.}, where each
- round has a dedicated proposer (also called coordinator or
- leader) and a process proceeds to a new round as part of normal
- processing (not only in case the proposer is faulty or suspected of being faulty
- by enough processes, as in PBFT).
- The communication pattern of each round is very similar to the ``normal'' case
- of PBFT. Therefore, in favorable conditions (correct proposer, timely and
- reliable communication between correct processes), Tendermint decides in three
- communication steps (the same as PBFT).
-
- The major novelty and contribution of the Tendermint consensus algorithm is a
- new termination mechanism. As explained in \cite{MHS09:opodis, RMS10:dsn}, the
- existing BFT consensus (and SMR) algorithms for the partially synchronous
- system model (for example PBFT~\cite{CL99:osdi}, \cite{DLS88:jacm},
- \cite{MA06:tdsc}) typically rely on the communication pattern illustrated in
- Figure~\ref{ch3:fig:coordinator-change} for termination.
- Figure~\ref{ch3:fig:coordinator-change} illustrates messages exchanged during
- the proposer change when processes start a new round\footnote{There is no
- consistent terminology in the distributed computing literature for naming a
- sequence of communication steps that corresponds to a logical unit. It is
- sometimes called a round, a phase or a view.}. It guarantees that eventually (i.e.,
- after some Global Stabilization Time, GST), there exists a round with a correct
- proposer that will bring the system into a univalent configuration.
- Intuitively, in a round in which the proposed value is accepted
- by all correct processes, and communication between correct processes is
- timely and reliable, all correct processes decide.
-
-
- \begin{figure}[tbh!] \def\rdstretch{5} \def\ystretch{3} \centering
- \begin{rounddiag}{4}{2} \round{1}{~} \rdmessage{1}{1}{$v_1$}
- \rdmessage{2}{1}{$v_2$} \rdmessage{3}{1}{$v_3$} \rdmessage{4}{1}{$v_4$}
- \round{2}{~} \rdmessage{1}{1}{$x, [v_{1..4}]$}
- \rdmessage{1}{2}{$~~~~~~x, [v_{1..4}]$} \rdmessage{1}{3}{$~~~~~~~~x,
- [v_{1..4}]$} \rdmessage{1}{4}{$~~~~~~~x, [v_{1..4}]$} \end{rounddiag}
- \vspace{-5mm} \caption{\boldmath Proposer (coordinator) change: $p_1$ is the
- new proposer.} \label{ch3:fig:coordinator-change} \end{figure}
-
- To ensure that a proposed value is accepted by all correct
- processes\footnote{The proposed value is not blindly accepted by correct
- processes in BFT algorithms. A correct process always verifies if the proposed
- value is safe to be accepted so that safety properties of consensus are not
- violated.}
- a proposer will 1) build the global state by receiving messages from other
- processes, 2) select the safe value to propose and 3) send the selected value
- together with the signed messages
- received in the first step to support it. The
- value $v_i$ that a correct process sends to the next proposer normally
- corresponds to a value the process considers as acceptable for a decision:
-
- \begin{itemize}
- \item in PBFT~\cite{CL99:osdi} and DLS~\cite{DLS88:jacm} it is
- not the value itself but a set of $2f+1$ signed messages with the same
- value id,
- \item in Fast Byzantine Paxos~\cite{MA06:tdsc} the value
- itself is sent.
- \end{itemize}
-
- In both cases, using this mechanism in our system model (i.e., a large
- number of nodes over a gossip-based network) would have high communication
- complexity that increases with the number of processes: in the first case as
- the message sent depends on the total number of processes, and in the second
- case as the value (block of transactions) is sent by each process. The set of
- messages received in the first step is normally piggybacked on the proposal
- message (denoted by $[v_{1..4}]$ in Figure~\ref{ch3:fig:coordinator-change})
- to justify the choice of the selected value $x$. Note that
- sending this message also does not scale with the number of processes in the
- system.
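-
- As a rough illustration of the first case, consider a back-of-the-envelope
- count under simplifying assumptions made only for this sketch: there are
- $n = 3f+1$ processes, size is measured in signed messages, and the new
- proposer hears from all $n$ processes as in
- Figure~\ref{ch3:fig:coordinator-change}. Each $v_i$ carries $2f+1$ signed
- messages, so the proposal carries up to $n(2f+1)$ of them and is sent to $n$
- processes:
- \[
-   n \cdot n(2f+1) = n^2(2f+1) = O(n^3).
- \]
- For $n = 4$ and $f = 1$ this gives $4 \cdot 3 = 12$ signed messages per
- proposal and $4 \cdot 12 = 48$ in total for the second step.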
-
- We designed a novel termination mechanism for Tendermint that better suits the
- system model we consider. It does not require additional communication (neither
- sending new messages nor piggybacking information on the existing messages) and
- it is fully based on a communication pattern that is very similar to the
- normal case in PBFT \cite{CL99:osdi}. Therefore, there is only a single mode of
- execution in Tendermint, i.e., there is no separation between the normal and
- the recovery mode, which is the case in other PBFT-like protocols (e.g.,
- \cite{CL99:osdi}, \cite{Ver09:spinning} or \cite{Cle09:aardvark}). We believe
- this makes Tendermint simpler to understand and implement correctly.
-
- Note that an orthogonal approach to reducing message complexity, in order to
- improve
- scalability and decentralization (number of processes) of BFT consensus
- algorithms, is to use advanced cryptography (for example Boneh-Lynn-Shacham (BLS)
- signatures \cite{BLS2001:crypto}), as done for example in SBFT
- \cite{Gue2018:sbft}.
-
- The remainder of the paper is organized as follows: Section~\ref{sec:definitions} defines
- the system model and gives the problem definitions. The Tendermint
- consensus algorithm is presented in Section~\ref{sec:tendermint} and the
- proofs are given in Section~\ref{sec:proof}. We conclude in
- Section~\ref{sec:conclusion}.
-
-
-
-
|