\section{Introduction} \label{sec:intro}

Consensus is a fundamental problem in distributed computing. It is important because of its role in State Machine Replication (SMR), a generic approach for replicating services that can be modeled as a deterministic state machine~\cite{Lam78:cacm, Sch90:survey}. The key idea of this approach is that service replicas start in the same initial state and then execute requests (also called transactions) in the same order, thereby guaranteeing that replicas stay in sync with each other. The role of consensus in the SMR approach is to ensure that all replicas receive transactions in the same order. Traditionally, deployments of SMR-based systems are in data-center settings (local area networks), have a small number of replicas (three to seven), and are typically part of a single administrative domain (e.g., Chubby~\cite{Bur:osdi06}); therefore they handle only benign (crash) failures, as more general forms of failure (in particular, malicious or Byzantine faults) are considered to occur with only negligible probability.

The success of cryptocurrencies and blockchain systems in recent years (e.g., \cite{Nak2012:bitcoin, But2014:ethereum}) poses a whole new set of challenges for the design and deployment of SMR-based systems: reaching agreement over a wide area network, among a large number of nodes (hundreds or thousands) that are not part of the same administrative domain, and where a subset of nodes can behave maliciously (Byzantine faults). Furthermore, contrary to previous data-center deployments where nodes are fully connected to each other, in blockchain systems a node is connected only to a subset of other nodes, so communication is achieved by gossip-based peer-to-peer protocols. These new requirements demand designs and algorithms that are not necessarily present in the classical academic literature on Byzantine-fault-tolerant consensus (or SMR) systems (e.g., \cite{DLS88:jacm, CL02:tcs}), whose primary focus was a different setup.

In this paper we describe a novel Byzantine-fault-tolerant consensus algorithm that is the core of the BFT SMR platform called Tendermint\footnote{The Tendermint platform is available open source at https://github.com/tendermint/tendermint.}. The Tendermint platform consists of a high-performance BFT SMR implementation written in Go, a flexible interface for building arbitrary deterministic applications above the consensus, and a suite of tools for deployment and management. The Tendermint consensus algorithm is inspired by the PBFT SMR algorithm~\cite{CL99:osdi} and the DLS algorithm for authenticated faults (Algorithm 2 from \cite{DLS88:jacm}). Similar to the DLS algorithm, Tendermint proceeds in rounds\footnote{Tendermint is not presented in the basic round model of \cite{DLS88:jacm}. Furthermore, we use the term round differently than in \cite{DLS88:jacm}; in Tendermint a round denotes a sequence of communication steps instead of a single communication step as in \cite{DLS88:jacm}.}, where each round has a dedicated proposer (also called coordinator or leader), and a process proceeds to a new round as part of normal processing (not only in case the proposer is faulty or suspected of being faulty by enough processes, as in PBFT). The communication pattern of each round is very similar to the ``normal'' case of PBFT. Therefore, under favorable conditions (correct proposer, timely and reliable communication between correct processes), Tendermint decides in three communication steps (the same as PBFT).
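To make the SMR idea described above concrete, the following minimal Go sketch shows two replicas that start from the same initial state and apply the same ordered log of transactions, and therefore end up in identical states; agreeing on that order is exactly the job of the consensus layer. The \texttt{Transaction} and \texttt{State} types and the \texttt{apply} function are hypothetical placeholders used for illustration only and are not part of the Tendermint codebase.

\begin{verbatim}
package main

import "fmt"

// Transaction and State are illustrative placeholders; in Tendermint the
// application built above the consensus defines the actual state machine.
type Transaction string
type State []string

// apply is a deterministic transition function: the same transaction
// applied to the same state always yields the same next state.
func apply(s State, tx Transaction) State {
    return append(s, string(tx))
}

func main() {
    // The consensus layer guarantees that every replica sees the same
    // ordered log of transactions.
    txLog := []Transaction{"tx1", "tx2", "tx3"}

    // Two replicas starting from the same initial state and applying
    // the log in order end up in identical states.
    var replicaA, replicaB State
    for _, tx := range txLog {
        replicaA = apply(replicaA, tx)
        replicaB = apply(replicaB, tx)
    }
    fmt.Println(replicaA, replicaB) // both print [tx1 tx2 tx3]
}
\end{verbatim}
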
The major novelty and contribution of the Tendermint consensus algorithm is a new termination mechanism. As explained in \cite{MHS09:opodis, RMS10:dsn}, the existing BFT consensus (and SMR) algorithms for the partially synchronous system model (for example PBFT~\cite{CL99:osdi}, \cite{DLS88:jacm}, \cite{MA06:tdsc}) typically rely on the communication pattern illustrated in Figure~\ref{ch3:fig:coordinator-change} for termination. Figure~\ref{ch3:fig:coordinator-change} illustrates the messages exchanged during the proposer change, when processes start a new round\footnote{There is no consistent terminology in the distributed computing literature for naming the sequence of communication steps that corresponds to a logical unit. It is sometimes called a round, a phase, or a view.}. This pattern guarantees that eventually (i.e., after some Global Stabilization Time, GST) there exists a round with a correct proposer that will bring the system into a univalent configuration. Intuitively, in a round in which the proposed value is accepted by all correct processes, and communication between correct processes is timely and reliable, all correct processes decide.

\begin{figure}[tbh!]
\def\rdstretch{5}
\def\ystretch{3}
\centering
\begin{rounddiag}{4}{2}
\round{1}{~}
\rdmessage{1}{1}{$v_1$}
\rdmessage{2}{1}{$v_2$}
\rdmessage{3}{1}{$v_3$}
\rdmessage{4}{1}{$v_4$}
\round{2}{~}
\rdmessage{1}{1}{$x, [v_{1..4}]$}
\rdmessage{1}{2}{$~~~~~~x, [v_{1..4}]$}
\rdmessage{1}{3}{$~~~~~~~~x, [v_{1..4}]$}
\rdmessage{1}{4}{$~~~~~~~x, [v_{1..4}]$}
\end{rounddiag}
\vspace{-5mm}
\caption{\boldmath Proposer (coordinator) change: $p_1$ is the new proposer.}
\label{ch3:fig:coordinator-change}
\end{figure}

To ensure that a proposed value is accepted by all correct processes\footnote{The proposed value is not blindly accepted by correct processes in BFT algorithms. A correct process always verifies whether the proposed value is safe to accept, so that the safety properties of consensus are not violated.}, a proposer will 1) build the global state by receiving messages from other processes, 2) select a safe value to propose, and 3) send the selected value together with the signed messages received in the first step that support it. The value $v_i$ that a correct process sends to the next proposer normally corresponds to a value the process considers acceptable for a decision:
\begin{itemize}
\item in PBFT~\cite{CL99:osdi} and DLS~\cite{DLS88:jacm} it is not the value itself but a set of $2f+1$ signed messages with the same value id,
\item in Fast Byzantine Paxos~\cite{MA06:tdsc} the value itself is sent.
\end{itemize}
In both cases, using this mechanism in our system model (i.e., a large number of nodes communicating over a gossip-based network) would incur high communication complexity that increases with the number of processes: in the first case because the message sent depends on the total number of processes, and in the second case because the value (a block of transactions) is sent by each process. The set of messages received in the first step is normally piggybacked on the proposal message (denoted $[v_{1..4}]$ in Figure~\ref{ch3:fig:coordinator-change}) to justify the choice of the selected value $x$. Note that this message also does not scale with the number of processes in the system, as the piggybacked set grows with the number of processes.
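To make this scalability concern concrete, the following Go sketch models the proposer-change proposal of Figure~\ref{ch3:fig:coordinator-change} for the PBFT/DLS variant, in which each process reports a set of $2f+1$ signed messages. The types are hypothetical and purely illustrative (they do not correspond to Tendermint's or PBFT's actual message format); the point is that the piggybacked justification carries one report per process, each containing $2f+1$ signed messages, so its size grows with both the number of processes $n$ and the fault threshold $f$. In the Fast Byzantine Paxos variant, each report would instead carry the value itself (a block of transactions), with a similar blow-up.

\begin{verbatim}
package main

import "fmt"

// Illustrative sketch of a classical proposer-change proposal; these types
// are hypothetical and do not correspond to any actual wire format.

// SignedVote stands for one signed message carrying a value id; a process
// reports 2f+1 of them to show that its value v_i is acceptable.
type SignedVote struct {
    ValueID   string
    Signature [64]byte // placeholder signature
}

// Report is what a process sends to the new proposer in the first step.
type Report struct {
    Votes []SignedVote // 2f+1 signed messages
}

// Proposal is the second-step message: the selected safe value x plus the
// piggybacked justification [v_1..v_n].
type Proposal struct {
    Value   string
    Justify []Report // one report per process
}

func main() {
    const n, f = 4, 1
    justify := make([]Report, n)
    for i := range justify {
        justify[i] = Report{Votes: make([]SignedVote, 2*f+1)}
    }
    p := Proposal{Value: "x", Justify: justify}

    // The justification alone contains n*(2f+1) signed messages,
    // i.e., O(n*f) signatures per proposer change.
    total := 0
    for _, r := range p.Justify {
        total += len(r.Votes)
    }
    fmt.Println(total) // 12 for n = 4, f = 1
}
\end{verbatim}
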
We designed a novel termination mechanism for Tendermint that better suits the system model we consider. It does not require additional communication (neither sending new messages nor piggybacking information on the existing messages) and it is fully based on a communication pattern that is very similar to the normal case of PBFT~\cite{CL99:osdi}. Therefore, there is only a single mode of execution in Tendermint, i.e., there is no separation between the normal and the recovery mode, as there is in other PBFT-like protocols (e.g., \cite{CL99:osdi}, \cite{Ver09:spinning} or \cite{Cle09:aardvark}). We believe this makes Tendermint simpler to understand and implement correctly. Note that an orthogonal approach for reducing message complexity, in order to improve the scalability and decentralization (number of processes) of BFT consensus algorithms, is the use of advanced cryptography (for example, Boneh-Lynn-Shacham (BLS) signatures~\cite{BLS2001:crypto}), as done for example in SBFT~\cite{Gue2018:sbft}.

The remainder of the paper is organized as follows: Section~\ref{sec:definitions} defines the system model and gives the problem definitions. The Tendermint consensus algorithm is presented in Section~\ref{sec:tendermint}, and the proofs are given in Section~\ref{sec:proof}. We conclude in Section~\ref{sec:conclusion}.