\section{Introduction} \label{sec:tendermint}

Consensus is a fundamental problem in distributed computing. It
is important because of its role in State Machine Replication (SMR), a generic
approach for replicating services that can be modeled as a deterministic state
machine~\cite{Lam78:cacm, Sch90:survey}. The key idea of this approach is that
service replicas start in the same initial state and then execute requests
(also called transactions) in the same order, thereby guaranteeing that
replicas stay in sync with each other. The role of consensus in the SMR
approach is to ensure that all replicas receive transactions in the same order.

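
As an illustration of this idea (the notation below is ours and is introduced
only for this example), let $\delta : S \times T \to S$ be the deterministic
transition function of the service, $s_0 \in S$ the common initial state, and
$t_1, t_2, \ldots, t_k \in T$ the agreed sequence of transactions. Every
correct replica then computes the same state
\[
  s_k = \delta(\ldots\delta(\delta(s_0, t_1), t_2)\ldots, t_k),
\]
because $s_k$ depends only on $s_0$ and on the order of the transactions. If
two replicas applied the same transactions in different orders, for example
$t_1, t_2$ at one replica and $t_2, t_1$ at another, their states could
diverge, since in general
$\delta(\delta(s_0, t_1), t_2) \neq \delta(\delta(s_0, t_2), t_1)$.
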
Traditionally, deployments of SMR-based systems are in data-center settings
(local area network), have a small number of replicas (three to seven), and are
typically part of a single administrative domain (e.g.,
Chubby~\cite{Bur:osdi06}); therefore they handle benign (crash) failures only,
as more general forms of failure (in particular, malicious or Byzantine faults)
are considered to occur with only negligible probability.

The success of cryptocurrencies and blockchain systems in recent years (e.g.,
\cite{Nak2012:bitcoin, But2014:ethereum}) poses a whole new set of challenges
for the design and deployment of SMR-based systems: reaching agreement over a
wide area network, among a large number of nodes (hundreds or thousands) that
are not part of the same administrative domain, and where a subset of nodes can
behave maliciously (Byzantine faults). Furthermore, contrary to the previous
data-center deployments where nodes are fully connected to each other, in
blockchain systems a node is only connected to a subset of other nodes, so
communication is achieved by gossip-based peer-to-peer protocols.

The new requirements demand designs and algorithms that are not necessarily
present in the classical academic literature on Byzantine fault tolerant
consensus (or SMR) systems (e.g., \cite{DLS88:jacm, CL02:tcs}), as the primary
focus of that literature was a different setup.

In this paper we describe a novel Byzantine fault tolerant consensus algorithm
that is the core of the BFT SMR platform called Tendermint.\footnote{The
Tendermint platform is available as open source at
https://github.com/tendermint/tendermint.} The Tendermint platform consists of
a high-performance BFT SMR implementation written in Go, a flexible interface
for building arbitrary deterministic applications above the consensus, and a
suite of tools for deployment and management.

The Tendermint consensus algorithm is inspired by the PBFT SMR
algorithm~\cite{CL99:osdi} and the DLS algorithm for authenticated faults
(Algorithm 2 from \cite{DLS88:jacm}). Similar to the DLS algorithm, Tendermint
proceeds in rounds\footnote{Tendermint is not presented in the basic round
model of \cite{DLS88:jacm}. Furthermore, we use the term round differently than
in \cite{DLS88:jacm}; in Tendermint a round denotes a sequence of communication
steps rather than a single communication step as in \cite{DLS88:jacm}.}, where
each round has a dedicated proposer (also called coordinator or
leader) and a process proceeds to a new round as part of normal
processing (not only when the proposer is faulty or suspected of being faulty
by enough processes, as in PBFT).

The communication pattern of each round is very similar to the ``normal'' case
of PBFT. Therefore, in favorable conditions (correct proposer, timely and
reliable communication between correct processes), Tendermint decides in three
communication steps (the same as PBFT).

The major novelty and contribution of the Tendermint consensus algorithm is a
new termination mechanism. As explained in \cite{MHS09:opodis, RMS10:dsn}, the
existing BFT consensus (and SMR) algorithms for the partially synchronous
system model (for example PBFT~\cite{CL99:osdi}, \cite{DLS88:jacm},
\cite{MA06:tdsc}) typically rely on the communication pattern illustrated in
Figure~\ref{ch3:fig:coordinator-change} for termination.
Figure~\ref{ch3:fig:coordinator-change} illustrates the messages exchanged
during the proposer change when processes start a new round\footnote{There is
no consistent terminology in the distributed computing literature for naming a
sequence of communication steps that corresponds to a logical unit. It is
sometimes called a round, a phase, or a view.}. It guarantees that eventually
(i.e., after some Global Stabilization Time, GST), there exists a round with a
correct proposer that will bring the system into a univalent configuration.
Intuitively, in a round in which the proposed value is accepted
by all correct processes, and communication between correct processes is
timely and reliable, all correct processes decide.

\begin{figure}[tbh!] \def\rdstretch{5} \def\ystretch{3} \centering
\begin{rounddiag}{4}{2} \round{1}{~} \rdmessage{1}{1}{$v_1$}
\rdmessage{2}{1}{$v_2$} \rdmessage{3}{1}{$v_3$} \rdmessage{4}{1}{$v_4$}
\round{2}{~} \rdmessage{1}{1}{$x, [v_{1..4}]$}
\rdmessage{1}{2}{$~~~~~~x, [v_{1..4}]$} \rdmessage{1}{3}{$~~~~~~~~x,
[v_{1..4}]$} \rdmessage{1}{4}{$~~~~~~~x, [v_{1..4}]$} \end{rounddiag}
\vspace{-5mm} \caption{\boldmath Proposer (coordinator) change: $p_1$ is the
new proposer.} \label{ch3:fig:coordinator-change} \end{figure}

To ensure that a proposed value is accepted by all correct
processes\footnote{The proposed value is not blindly accepted by correct
processes in BFT algorithms. A correct process always verifies whether the
proposed value is safe to be accepted so that safety properties of consensus
are not violated.},
a proposer will 1) build the global state by receiving messages from other
processes, 2) select the safe value to propose, and 3) send the selected value
together with the signed messages received in the first step to support it. The
value $v_i$ that a correct process sends to the next proposer normally
corresponds to a value the process considers acceptable for a decision:
\begin{itemize}
  \item in PBFT~\cite{CL99:osdi} and DLS~\cite{DLS88:jacm} it is
  not the value itself but a set of $2f+1$ signed messages with the same
  value id,
  \item in Fast Byzantine Paxos~\cite{MA06:tdsc} the value
  itself is being sent.
\end{itemize}

In both cases, using this mechanism in our system model (i.e., a large
number of nodes communicating over a gossip-based network) would have high
communication complexity that increases with the number of processes: in the
first case because the size of the message sent depends on the total number of
processes, and in the second case because the value (a block of transactions)
is sent by each process. The set of messages received in the first step is
normally piggybacked on the proposal message (denoted by $[v_{1..4}]$ in
Figure~\ref{ch3:fig:coordinator-change}) to justify the choice of the selected
value $x$. Note that sending this message also does not scale with the number
of processes in the system.

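
To give a rough sense of scale, consider the following back-of-the-envelope
estimate. It rests on two assumptions that are not stated in this section: the
usual resilience bound $n = 3f + 1$, and a proposer that piggybacks the
messages of $k$ processes with $2f+1 \le k \le n$. The estimate counts
signatures only. In the first case above, if each of the $k$ piggybacked sets
is a full set of $2f+1$ signed messages, a single proposal carries
\[
  k \, (2f+1) \;\ge\; (2f+1)^2 = \Theta(n^2)
\]
signatures. For example, $n = 100$ gives $f = 33$, so such a proposal would
carry at least $67^2 = 4489$ signatures.
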
We designed a novel termination mechanism for Tendermint that better suits the
system model we consider. It does not require additional communication (neither
sending new messages nor piggybacking information on the existing messages) and
it is fully based on a communication pattern that is very similar to the
normal case in PBFT~\cite{CL99:osdi}. Therefore, there is only a single mode of
execution in Tendermint, i.e., there is no separation between the normal and
the recovery mode, as is the case in other PBFT-like protocols (e.g.,
\cite{CL99:osdi}, \cite{Ver09:spinning} or \cite{Cle09:aardvark}). We believe
this makes Tendermint simpler to understand and implement correctly.

Note that an orthogonal approach for reducing message complexity, in order to
improve the scalability and decentralization (number of processes) of BFT
consensus algorithms, is to use advanced cryptography (for example,
Boneh-Lynn-Shacham (BLS) signatures~\cite{BLS2001:crypto}), as done for example
in SBFT~\cite{Gue2018:sbft}.

The remainder of the paper is organized as follows:
Section~\ref{sec:definitions} defines the system model and gives the problem
definitions. The Tendermint consensus algorithm is presented in
Section~\ref{sec:tendermint} and the proofs are given in
Section~\ref{sec:proof}. We conclude in Section~\ref{sec:conclusion}.