# ADR 067: Mempool Refactor

- [ADR 067: Mempool Refactor](#adr-067-mempool-refactor)
  - [Changelog](#changelog)
  - [Status](#status)
  - [Context](#context)
    - [Current Design](#current-design)
  - [Alternative Approaches](#alternative-approaches)
  - [Prior Art](#prior-art)
    - [Ethereum](#ethereum)
    - [Diem](#diem)
  - [Decision](#decision)
  - [Detailed Design](#detailed-design)
    - [CheckTx](#checktx)
    - [Mempool](#mempool)
    - [Eviction](#eviction)
    - [Gossiping](#gossiping)
    - [Performance](#performance)
  - [Future Improvements](#future-improvements)
  - [Consequences](#consequences)
    - [Positive](#positive)
    - [Negative](#negative)
    - [Neutral](#neutral)
  - [References](#references)
## Changelog

- April 19, 2021: Initial Draft (@alexanderbez)

## Status

Accepted
## Context

Tendermint Core has a reactor and data structure, the mempool, that facilitates the ephemeral storage of uncommitted transactions. Honest nodes participating in a Tendermint network gossip these uncommitted transactions to each other if they pass the application's `CheckTx`. In addition, block proposers select from the mempool a subset of uncommitted transactions to include in the next block.

Currently, the mempool in Tendermint Core is designed as a FIFO queue. In other words, transactions are included in blocks in the order they are received by a node. There is currently no explicit and prioritized ordering of these uncommitted transactions. This presents a few technical and UX challenges for operators and applications. Namely, validators are not able to prioritize transactions by their fees or any incentive-aligned mechanism. In addition, the lack of prioritization also leads to cascading effects in terms of DoS and various attack vectors on networks, e.g. [cosmos/cosmos-sdk#8224](https://github.com/cosmos/cosmos-sdk/discussions/8224).

Thus, Tendermint Core needs the ability for an application and its users to prioritize transactions in a flexible and performant manner. Specifically, we aim to improve, maintain, or add the following properties in the Tendermint mempool:

- Allow application-determined transaction priority.
- Allow efficient concurrent reads and writes.
- Allow block proposers to reap transactions efficiently by priority.
- Maintain a fixed mempool capacity by transaction size and evict lower-priority
  transactions to make room for higher-priority transactions.
- Allow transactions to be gossiped by priority efficiently.
- Allow operators to specify a maximum TTL for transactions in the mempool before
  they're automatically evicted if not selected for a block proposal in time.
- Ensure the design allows for future extensions, such as replace-by-priority and
  allowing multiple pending transactions per sender, to be incorporated easily.

Note, not all of these properties will be addressed by the proposed changes in this ADR. However, this proposal will ensure that any unaddressed properties can be addressed in an easy and extensible manner in the future.
### Current Design

![mempool](./img/mempool-v0.jpeg)

At the core of the `v0` mempool reactor is a concurrent linked list. This is the primary data structure that contains `Tx` objects that have passed `CheckTx`. When a node receives a transaction from another peer, it executes `CheckTx`, which obtains a read-lock on the `*CListMempool`. If the transaction passes `CheckTx` locally on the node, it is added to the `*CList` by obtaining a write-lock. It is also added to the `cache` and `txsMap`, both of which obtain their own respective write-locks and map a reference from the transaction hash to the `Tx` itself.

Transactions are continuously gossiped to peers whenever a new transaction is added to a local node's `*CList`, where the transaction at the front of the `*CList` is selected. Another transaction will not be gossiped until the `*CList` notifies the reader that there are more transactions to gossip.

When a proposer attempts to propose a block, they will execute `ReapMaxBytesMaxGas` on the reactor's `*CListMempool`. This call obtains a read-lock on the `*CListMempool` and selects as many transactions as possible, starting from the front of the `*CList` and moving toward the back of the list.

When a block is finally committed, a caller invokes `Update` on the reactor's `*CListMempool` with all the selected transactions. Note, the caller must also explicitly obtain a write-lock on the reactor's `*CListMempool`. This call will remove all the supplied transactions from the `txsMap` and the `*CList`, both of which obtain their own respective write-locks. In addition, the transaction may also be removed from the `cache`, which obtains its own write-lock.
## Alternative Approaches

When considering which approach to take for a priority-based, flexible, and performant mempool, there are two core candidates. The first candidate is less invasive in the required set of protocol and implementation changes: it simply extends the existing `CheckTx` ABCI method. The second candidate involves the introduction of new ABCI method(s) and would require a higher degree of complexity in protocol and implementation changes, some of which may either overlap or conflict with the upcoming introduction of [ABCI++](https://github.com/tendermint/spec/blob/master/rfc/004-abci%2B%2B.md).

For more information on the various approaches and proposals, please see the [mempool discussion](https://github.com/tendermint/tendermint/discussions/6295).
## Prior Art

### Ethereum

The Ethereum mempool, specifically [Geth](https://github.com/ethereum/go-ethereum), contains a mempool, `*TxPool`, that contains various mappings indexed by account, such as `pending`, which contains all processable transactions for accounts prioritized by nonce. It also contains a `queue`, which is the exact same mapping except that it contains transactions that are not currently processable. The mempool also contains a `priced` index of type `*txPricedList` that is a priority queue based on transaction price.
### Diem

The [Diem mempool](https://github.com/diem/diem/blob/master/mempool/README.md#implementation-details) takes a similar approach to the one we propose. Specifically, the Diem mempool contains a mapping from `Account:[]Tx`. On top of this primary mapping from account to a list of transactions are various indexes used to perform certain actions. The main index, `PriorityIndex`, is an ordered queue of transactions that are "consensus-ready" (i.e., they have a sequence number which is sequential to the current sequence number for the account). This queue is ordered by gas price so that if a client is willing to pay more (than other clients) per unit of execution, then they can enter consensus earlier.
## Decision

To incorporate a priority-based, flexible, and performant mempool in Tendermint Core, we will introduce new fields, `priority` and `sender`, into the `ResponseCheckTx` type.

We will introduce a new versioned mempool reactor, `v1`, and assume an implicit version of the current mempool reactor as `v0`. In the new `v1` mempool reactor, we largely keep the functionality the same as `v0`, except we augment the underlying data structures. Specifically, we keep a mapping of senders to transaction objects. On top of this mapping, we index transactions to provide the ability to efficiently gossip and reap transactions by priority.
## Detailed Design

### CheckTx

We introduce the following new fields into the `ResponseCheckTx` type:

```diff
message ResponseCheckTx {
  uint32 code = 1;
  bytes data = 2;
  string log = 3; // nondeterministic
  string info = 4; // nondeterministic
  int64 gas_wanted = 5 [json_name = "gas_wanted"];
  int64 gas_used = 6 [json_name = "gas_used"];
  repeated Event events = 7 [(gogoproto.nullable) = false, (gogoproto.jsontag) = "events,omitempty"];
  string codespace = 8;
+ int64 priority = 9;
+ string sender = 10;
}
```

It is entirely up to the application to determine how these fields are populated and with what values, e.g. the `sender` could be the signer and fee payer of the transaction, and the `priority` could be the cumulative sum of the fee(s). Only `sender` is required, while `priority` can be omitted, in which case it defaults to zero.
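As an illustration, a minimal sketch of how an application might populate these fields, assuming it treats the fee payer as the `sender` and the total fee as the `priority`. The `ResponseCheckTx` struct here is a simplified stand-in for the real ABCI type, and `checkTx` is a hypothetical handler, not Tendermint or SDK code:

```go
package main

import "fmt"

// ResponseCheckTx is a trimmed stand-in for the ABCI type shown in the diff
// above; only the fields relevant to this sketch are included.
type ResponseCheckTx struct {
	Code     uint32
	Priority int64
	Sender   string
}

// checkTx sketches one possible application policy: the sender is the fee
// payer and the priority is the cumulative sum of the fees.
func checkTx(feePayer string, fees []int64) ResponseCheckTx {
	var total int64
	for _, f := range fees {
		total += f
	}
	return ResponseCheckTx{Code: 0, Priority: total, Sender: feePayer}
}

func main() {
	resp := checkTx("alice", []int64{100, 25})
	fmt.Println(resp.Sender, resp.Priority) // alice 125
}
```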
### Mempool

The existing concurrent-safe linked list will be replaced by a thread-safe map of `<sender:*Tx>`, i.e. a mapping from `sender` to a single `*Tx` object, where each `*Tx` is the next valid and processable transaction from the given `sender`.

On top of this mapping, we index all transactions by priority using a thread-safe priority queue, i.e. a [max heap](https://en.wikipedia.org/wiki/Min-max_heap). When a proposer is ready to select transactions for the next block proposal, transactions are selected from this priority index in highest-priority order. When a transaction is selected and reaped, it is removed from this index and from the `<sender:*Tx>` mapping.
We define `Tx` as the following data structure:

```go
type Tx struct {
	// Tx represents the raw binary transaction data.
	Tx []byte

	// Priority defines the transaction's priority as specified by the
	// application in the ResponseCheckTx response.
	Priority int64

	// Sender defines the transaction's sender as specified by the application
	// in the ResponseCheckTx response.
	Sender string

	// Index defines the current index in the priority queue index. Note, if
	// multiple Tx indexes are needed, this field will be removed and each Tx
	// index will have its own wrapped Tx type.
	Index int
}
```
### Eviction

When `CheckTx` succeeds for a new `Tx` but the mempool is currently full, we must check if there exists a `Tx` of lower priority that can be evicted to make room for the new, higher-priority `Tx`, leaving sufficient size capacity.

If such a `Tx` exists, we find it by obtaining a read lock and sorting the priority queue index. Once sorted, we find the first `Tx` with lower priority and sufficient size such that the new `Tx` would fit within the mempool's size limit. We then remove this `Tx` from the priority queue index as well as from the `<sender:*Tx>` mapping.

This will require additional `O(n)` space and `O(n*log(n))` runtime complexity. Note that the space complexity does not depend on the size of the transactions.
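The eviction check just described can be sketched as follows. This is a simplified, lock-free illustration: the function name and the explicit `Size` field are our assumptions for the sketch, not the actual implementation:

```go
package main

import (
	"fmt"
	"sort"
)

// Tx is trimmed to the fields relevant for eviction; Size is added here for
// illustration (the real mempool tracks transaction sizes separately).
type Tx struct {
	Sender   string
	Priority int64
	Size     int64
}

// evictionCandidate returns a lower-priority transaction whose removal would
// leave room for the incoming tx, or nil if no such transaction exists.
func evictionCandidate(pool []*Tx, incoming *Tx, capacity, used int64) *Tx {
	// Sort a copy ascending by priority so the cheapest victims come first.
	sorted := append([]*Tx(nil), pool...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Priority < sorted[j].Priority })

	for _, tx := range sorted {
		if tx.Priority >= incoming.Priority {
			// Every remaining tx has equal or higher priority: no eviction.
			return nil
		}
		if used-tx.Size+incoming.Size <= capacity {
			return tx
		}
	}
	return nil
}

func main() {
	pool := []*Tx{
		{Sender: "alice", Priority: 5, Size: 300},
		{Sender: "bob", Priority: 40, Size: 200},
	}
	incoming := &Tx{Sender: "carol", Priority: 25, Size: 250}

	// The mempool is full (used == capacity); alice's tx has lower priority
	// and is large enough that evicting it makes room for carol's.
	victim := evictionCandidate(pool, incoming, 500, 500)
	fmt.Println(victim.Sender) // alice
}
```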
### Gossiping

We keep the existing thread-safe linked list as an additional index. Using this index, we can efficiently gossip transactions in the same manner as they are gossiped now (FIFO).

Gossiping transactions will not require locking any other indexes.
### Performance

Performance should largely remain unaffected apart from the space overhead of keeping an additional priority queue index and the case where we need to evict transactions from the priority queue index. There should be no reads that block writes on any index.
## Future Improvements

There are a few considerable ways in which the proposed design can be improved or expanded upon, namely transaction gossiping and the ability to support multiple transactions from the same `sender`.

With regard to transaction gossiping, we need to empirically validate whether we need to gossip by priority. In addition, the current method of gossiping may not be the most efficient: it broadcasts all the transactions a node has in its mempool to its peers. Rather, we should explore the ability to gossip transactions on a request/response basis, similar to Ethereum and other protocols. Not only would this reduce bandwidth and complexity, but it would also allow us to explore gossiping by priority or other dimensions more efficiently.

Allowing multiple transactions from the same `sender` is important and will most likely be a needed feature in the future development of the mempool, but for now it suffices to have the preliminary design agreed upon. Supporting multiple transactions per `sender` will require careful thought with regard to the interplay with the corresponding ABCI application. Regardless, the proposed design should allow for adaptations to support this feature in a non-contentious and backwards-compatible manner.
## Consequences

### Positive

- Transactions are allowed to be prioritized by the application.

### Negative

- Increased size of the `ResponseCheckTx` Protocol Buffer type.
- Causal ordering is NOT maintained.
  - It is possible that certain transactions broadcast in a particular order may
    pass `CheckTx` but not end up being committed in a block because they fail
    `CheckTx` later. e.g. Consider Tx<sub>1</sub> that sends funds from existing
    account Alice to a _new_ account Bob with priority P<sub>1</sub>, and then
    later Bob's _new_ account sends funds back to Alice in Tx<sub>2</sub> with
    priority P<sub>2</sub>, such that P<sub>2</sub> > P<sub>1</sub>. If processed
    in this order, both transactions will pass `CheckTx`. However, when a
    proposer is ready to select transactions for the next block proposal, they
    will select Tx<sub>2</sub> before Tx<sub>1</sub>, and thus Tx<sub>2</sub>
    will _fail_ because Tx<sub>1</sub> must be executed first. This is because
    there is a _causal ordering_, Tx<sub>1</sub> ➝ Tx<sub>2</sub>. These types
    of situations should be rare, as most transactions are not causally ordered,
    and they can be circumvented by simply trying again at a later point in time
    or by ensuring the "child" priority is lower than the "parent" priority. In
    other words, if parents always have priorities that are higher than their
    children, then the new mempool design will maintain causal ordering.
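The Alice/Bob scenario above can be simulated in a few lines. This is a toy model of reaping by priority: the account logic is hypothetical, not Tendermint code:

```go
package main

import "fmt"

type tx struct {
	name     string
	from, to string
	amount   int64
	priority int64
}

// apply executes a transfer against the balances, returning false when the
// sender lacks funds (the analogue of the transaction failing CheckTx later).
func apply(balances map[string]int64, t tx) bool {
	if balances[t.from] < t.amount {
		return false
	}
	balances[t.from] -= t.amount
	balances[t.to] += t.amount
	return true
}

func main() {
	// Bob's account does not exist until Tx1 funds it.
	balances := map[string]int64{"alice": 100}

	tx1 := tx{name: "Tx1", from: "alice", to: "bob", amount: 50, priority: 1}
	tx2 := tx{name: "Tx2", from: "bob", to: "alice", amount: 25, priority: 2}

	// A proposer reaping by priority executes Tx2 before Tx1, so Tx2 fails
	// even though both passed CheckTx in broadcast order.
	for _, t := range []tx{tx2, tx1} {
		fmt.Println(t.name, "ok:", apply(balances, t)) // Tx2 ok: false, Tx1 ok: true
	}
}
```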
### Neutral

- A transaction that passed `CheckTx` and entered the mempool can later be evicted
  at a future point in time if a higher-priority transaction entered while the
  mempool was full.

## References

- [ABCI++](https://github.com/tendermint/spec/blob/master/rfc/004-abci%2B%2B.md)
- [Mempool Discussion](https://github.com/tendermint/tendermint/discussions/6295)