# ADR 067: Mempool Refactor

- [ADR 067: Mempool Refactor](#adr-067-mempool-refactor)
  - [Changelog](#changelog)
  - [Status](#status)
  - [Context](#context)
    - [Current Design](#current-design)
  - [Alternative Approaches](#alternative-approaches)
  - [Prior Art](#prior-art)
    - [Ethereum](#ethereum)
    - [Diem](#diem)
  - [Decision](#decision)
  - [Detailed Design](#detailed-design)
    - [CheckTx](#checktx)
    - [Mempool](#mempool)
    - [Eviction](#eviction)
    - [Gossiping](#gossiping)
    - [Performance](#performance)
  - [Future Improvements](#future-improvements)
  - [Consequences](#consequences)
    - [Positive](#positive)
    - [Negative](#negative)
    - [Neutral](#neutral)
  - [References](#references)

## Changelog

- April 19, 2021: Initial Draft (@alexanderbez)

## Status

Accepted

## Context

Tendermint Core has a reactor and data structure, the mempool, that facilitates
the ephemeral storage of uncommitted transactions. Honest nodes participating in
a Tendermint network gossip these uncommitted transactions to each other if they
pass the application's `CheckTx`. In addition, block proposers select from the
mempool a subset of uncommitted transactions to include in the next block.

Currently, the mempool in Tendermint Core is designed as a FIFO queue. In other
words, transactions are included in blocks in the order in which they are
received by a node. There is currently no explicit, prioritized ordering of
these uncommitted transactions. This presents a few technical and UX challenges
for operators and applications.

Namely, validators are not able to prioritize transactions by their fees or any
incentive-aligned mechanism. In addition, the lack of prioritization leads
to cascading effects in terms of DoS and various attack vectors on networks,
e.g. [cosmos/cosmos-sdk#8224](https://github.com/cosmos/cosmos-sdk/discussions/8224).

Thus, Tendermint Core needs the ability for an application and its users to
prioritize transactions in a flexible and performant manner. Specifically, we
aim to either improve, maintain, or add the following properties in the
Tendermint mempool:

- Allow application-determined transaction priority.
- Allow efficient concurrent reads and writes.
- Allow block proposers to reap transactions efficiently by priority.
- Maintain a fixed mempool capacity by transaction size and evict lower-priority
  transactions to make room for higher-priority transactions.
- Allow transactions to be gossiped by priority efficiently.
- Allow operators to specify a maximum TTL for transactions in the mempool before
  they're automatically evicted if not selected for a block proposal in time.
- Ensure the design allows for future extensions, such as replace-by-priority and
  allowing multiple pending transactions per sender, to be incorporated easily.

Note that not all of these properties will be addressed by the proposed changes
in this ADR. However, this proposal ensures that any unaddressed properties can
be addressed in an easy and extensible manner in the future.

### Current Design

![mempool](./img/mempool-v0.jpeg)

At the core of the `v0` mempool reactor is a concurrent linked-list. This is the
primary data structure that contains `Tx` objects that have passed `CheckTx`.
When a node receives a transaction from another peer, it executes `CheckTx`, which
obtains a read-lock on the `*CListMempool`. If the transaction passes `CheckTx`
locally on the node, it is added to the `*CList` by obtaining a write-lock. It
is also added to the `cache` and `txsMap`, both of which obtain their own respective
write-locks and map a reference from the transaction hash to the `Tx` itself.

Transactions are continuously gossiped to peers whenever a new transaction is added
to a local node's `*CList`, where the node at the front of the `*CList` is selected.
Another transaction will not be gossiped until the `*CList` notifies the reader
that there are more transactions to gossip.

When a proposer attempts to propose a block, they will execute `ReapMaxBytesMaxGas`
on the reactor's `*CListMempool`. This call obtains a read-lock on the `*CListMempool`
and selects as many transactions as possible starting from the front of the `*CList`
and moving toward the back of the list.

When a block is finally committed, a caller invokes `Update` on the reactor's
`*CListMempool` with all the selected transactions. Note that the caller must also
explicitly obtain a write-lock on the reactor's `*CListMempool`. This call
will remove all the supplied transactions from the `txsMap` and the `*CList`, both
of which obtain their own respective write-locks. In addition, the transaction
may also be removed from the `cache`, which obtains its own write-lock.
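
For orientation, the locking pattern above can be condensed into a short sketch.
The names (`cListMempoolSketch`, `addTx`, `reap`) and the lock granularity are
illustrative stand-ins, not the reactor's actual code:

```go
package mempool

import "sync"

// cListMempoolSketch condenses the v0 design: a FIFO list guarded by a
// single RWMutex, plus separately synchronized lookup maps.
type cListMempoolSketch struct {
	mtx    sync.RWMutex // guards txs
	txs    [][]byte     // stands in for the concurrent *CList (FIFO order)
	txsMap sync.Map     // tx hash -> raw tx bytes, synchronized independently
	cache  sync.Map     // seen-tx cache keyed by tx hash
}

// addTx mirrors the post-CheckTx path: append to the FIFO list under a
// write-lock, then record the tx in the independently synchronized maps.
func (mem *cListMempoolSketch) addTx(hash string, tx []byte) {
	mem.mtx.Lock()
	mem.txs = append(mem.txs, tx)
	mem.mtx.Unlock()

	mem.txsMap.Store(hash, tx)
	mem.cache.Store(hash, struct{}{})
}

// reap mirrors ReapMaxBytesMaxGas (byte budget only): select transactions
// from the front of the FIFO list under a read-lock until the budget is hit.
func (mem *cListMempoolSketch) reap(maxBytes int64) [][]byte {
	mem.mtx.RLock()
	defer mem.mtx.RUnlock()

	var selected [][]byte
	for _, tx := range mem.txs {
		if int64(len(tx)) > maxBytes {
			break
		}
		maxBytes -= int64(len(tx))
		selected = append(selected, tx)
	}
	return selected
}
```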

## Alternative Approaches

When considering which approach to take for a priority-based, flexible, and
performant mempool, there are two core candidates. The first candidate is less
invasive in the required set of protocol and implementation changes: it simply
extends the existing `CheckTx` ABCI method. The second candidate involves the
introduction of new ABCI method(s) and would require a higher degree of
complexity in protocol and implementation changes, some of which may either
overlap or conflict with the upcoming introduction of [ABCI++](https://github.com/tendermint/spec/blob/master/rfc/004-abci%2B%2B.md).

For more information on the various approaches and proposals, please see the
[mempool discussion](https://github.com/tendermint/tendermint/discussions/6295).

## Prior Art

### Ethereum

The Ethereum mempool, specifically in [Geth](https://github.com/ethereum/go-ethereum),
is a `*TxPool` that contains various mappings indexed by account, such as
`pending`, which contains all processable transactions for each account,
prioritized by nonce. It also contains a `queue`, which is the exact same mapping
except that it holds transactions that are not currently processable. The mempool
also contains a `priced` index of type `*txPricedList`, a priority queue ordered
by transaction price.

### Diem

The [Diem mempool](https://github.com/diem/diem/blob/master/mempool/README.md#implementation-details)
takes a similar approach to the one we propose. Specifically, the Diem mempool
contains a mapping from `Account:[]Tx`. On top of this primary mapping from account
to a list of transactions sit various indexes used to perform certain actions.

The main index, `PriorityIndex`, is an ordered queue of transactions that are
“consensus-ready” (i.e., they have a sequence number which is sequential to the
current sequence number for the account). This queue is ordered by gas price so
that if a client is willing to pay more (than other clients) per unit of
execution, then they can enter consensus earlier.

## Decision

To incorporate a priority-based, flexible, and performant mempool in Tendermint Core,
we will introduce new fields, `priority` and `sender`, into the `ResponseCheckTx`
type.

We will introduce a new versioned mempool reactor, `v1`, and assume an implicit
version of the current mempool reactor as `v0`. In the new `v1` mempool reactor,
we largely keep the functionality the same as `v0`, except we augment the underlying
data structures. Specifically, we keep a mapping of senders to transaction objects.
On top of this mapping, we index transactions to provide the ability to efficiently
gossip and reap transactions by priority.

## Detailed Design

### CheckTx

We introduce the following new fields into the `ResponseCheckTx` type:

```diff
message ResponseCheckTx {
  uint32 code = 1;
  bytes data = 2;
  string log = 3; // nondeterministic
  string info = 4; // nondeterministic
  int64 gas_wanted = 5 [json_name = "gas_wanted"];
  int64 gas_used = 6 [json_name = "gas_used"];
  repeated Event events = 7 [(gogoproto.nullable) = false, (gogoproto.jsontag) = "events,omitempty"];
  string codespace = 8;
+ int64 priority = 9;
+ string sender = 10;
}
```

It is entirely up to the application to determine how these fields are populated
and with what values; e.g., the `sender` could be the signer and fee payer of the
transaction, and the `priority` could be the cumulative sum of the fee(s).

Only `sender` is required, while `priority` can be omitted, which results in the
default value of zero being used.
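
For example, an application might populate these fields as in the following
sketch. Note that `parsedTx`, `parseTx`, `Fees`, and `FeePayer` are hypothetical
application-side helpers invented for illustration; only the `Priority` and
`Sender` response fields come from this ADR:

```go
package app

import (
	abci "github.com/tendermint/tendermint/abci/types"
)

// parsedTx is a hypothetical decoded transaction; Fees and FeePayer are
// application-defined concepts, not part of Tendermint.
type parsedTx struct {
	FeePayer string
	Fees     []int64
}

// parseTx is a hypothetical decoder for the application's wire format.
func parseTx(bz []byte) (parsedTx, error) {
	// Decoding elided; a real application would unmarshal bz here.
	return parsedTx{}, nil
}

type App struct{}

// CheckTx populates the new fields: here, the sender is the fee payer and
// the priority is the cumulative sum of the fees.
func (app *App) CheckTx(req abci.RequestCheckTx) abci.ResponseCheckTx {
	tx, err := parseTx(req.Tx)
	if err != nil {
		return abci.ResponseCheckTx{Code: 1, Log: err.Error()}
	}

	var priority int64
	for _, fee := range tx.Fees {
		priority += fee
	}

	return abci.ResponseCheckTx{
		Code:     abci.CodeTypeOK,
		Priority: priority,
		Sender:   tx.FeePayer,
	}
}
```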

### Mempool

The existing concurrent-safe linked-list will be replaced by a thread-safe map
of `<sender:*Tx>`, i.e. a mapping from `sender` to a single `*Tx` object, where
each `*Tx` is the next valid and processable transaction from the given `sender`.

On top of this mapping, we index all transactions by priority using a thread-safe
priority queue, i.e. a [max heap](https://en.wikipedia.org/wiki/Min-max_heap).
When a proposer is ready to select transactions for the next block proposal,
transactions are selected from this priority index in highest-priority order.
When a transaction is selected and reaped, it is removed from this index and
from the `<sender:*Tx>` mapping.

We define `Tx` as the following data structure:

```go
type Tx struct {
	// Tx represents the raw binary transaction data.
	Tx []byte

	// Priority defines the transaction's priority as specified by the application
	// in the ResponseCheckTx response.
	Priority int64

	// Sender defines the transaction's sender as specified by the application in
	// the ResponseCheckTx response.
	Sender string

	// Index defines the current index in the priority queue index. Note, if
	// multiple Tx indexes are needed, this field will be removed and each Tx
	// index will have its own wrapped Tx type.
	Index int
}
```
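
For instance, the priority index could be built on Go's `container/heap`, with
`Less` inverted to yield a max heap and `Swap` keeping each transaction's
`Index` field in sync so entries can be removed in `O(log n)` via
`heap.Remove`. This is a sketch under those assumptions, not the final
implementation:

```go
// txPriorityQueue implements heap.Interface (from container/heap) as a max
// heap over Tx.Priority. Thread safety would come from a mutex-holding
// wrapper, omitted here for brevity.
type txPriorityQueue []*Tx

func (pq txPriorityQueue) Len() int { return len(pq) }

// Less inverts the usual min-heap ordering so Pop yields the highest priority.
func (pq txPriorityQueue) Less(i, j int) bool {
	return pq[i].Priority > pq[j].Priority
}

// Swap keeps each Tx's Index field in sync with its heap position, enabling
// heap.Remove(pq, tx.Index) and heap.Fix(pq, tx.Index).
func (pq txPriorityQueue) Swap(i, j int) {
	pq[i], pq[j] = pq[j], pq[i]
	pq[i].Index = i
	pq[j].Index = j
}

func (pq *txPriorityQueue) Push(x interface{}) {
	tx := x.(*Tx)
	tx.Index = len(*pq)
	*pq = append(*pq, tx)
}

func (pq *txPriorityQueue) Pop() interface{} {
	old := *pq
	n := len(old)
	tx := old[n-1]
	*pq = old[:n-1]
	return tx
}
```

Reaping for a proposal then amounts to repeated `heap.Pop` calls, with each
popped `Tx` also deleted from the `<sender:*Tx>` mapping.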

### Eviction

If the mempool is full when `CheckTx` successfully executes for a new `Tx`, we
must check whether there exists a `Tx` of lower priority that can be evicted to
make room for the new, higher-priority `Tx`, leaving sufficient size capacity
for it.

If such a `Tx` exists, we find it by obtaining a read lock and sorting a copy of
the priority queue index. Once sorted, we find the first `Tx` with lower priority
and sufficient size such that the new `Tx` would fit within the mempool's size
limit. We then remove this `Tx` from the priority queue index as well as the
`<sender:*Tx>` mapping.

This requires additional `O(n)` space and `O(n*log(n))` runtime complexity. Note
that the space overhead does not depend on the byte size of the transactions,
since only references to them are copied.
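
A sketch of this eviction scan follows, building on the `txPriorityQueue`
sketch above. The `mempoolSketch` type and its field names are assumptions
made for illustration, not the reactor's actual fields, and a real
implementation must also update size accounting and remove the victim from
every index:

```go
package mempool

import (
	"container/list"
	"sort"
	"sync"
)

// mempoolSketch holds the v1 indexes described in this ADR; each index has
// its own lock.
type mempoolSketch struct {
	pqMtx sync.RWMutex
	pq    txPriorityQueue // priority index from the sketch above

	txsMtx sync.RWMutex
	txsMap map[string]*Tx // the <sender:*Tx> mapping

	gossipMtx  sync.RWMutex
	gossipList *list.List // FIFO gossip index (a concurrent list in the reactor)
}

// getEvictableTx returns the first lower-priority transaction whose removal
// would free enough bytes for newTx, or nil if none exists.
func (mem *mempoolSketch) getEvictableTx(newTx *Tx) *Tx {
	// Snapshot the heap's backing slice under a read lock so sorting does
	// not disturb the heap invariant: O(n) extra space (references only).
	mem.pqMtx.RLock()
	txs := make([]*Tx, len(mem.pq))
	copy(txs, mem.pq)
	mem.pqMtx.RUnlock()

	// Sort ascending by priority: O(n*log(n)) time.
	sort.Slice(txs, func(i, j int) bool {
		return txs[i].Priority < txs[j].Priority
	})

	for _, tx := range txs {
		if tx.Priority >= newTx.Priority {
			break // only strictly lower-priority txs may be evicted
		}
		// Simplification of "the new Tx would then fit within the size limit":
		// evicting tx must free at least as many bytes as newTx needs.
		if len(tx.Tx) >= len(newTx.Tx) {
			return tx
		}
	}
	return nil
}
```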

### Gossiping

We keep the existing thread-safe linked list as an additional index. Using this
index, we can efficiently gossip transactions in the same manner as they are
gossiped now (FIFO).

Gossiping transactions will not require locking any other indexes.
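
Continuing the `mempoolSketch` from the eviction section, an insertion path
might look like the following sketch; each index takes only its own lock, which
is why the gossip routine reading the FIFO list never contends with the
priority index:

```go
package mempool

import "container/heap"

// addTx inserts tx into each index under that index's own lock. A sketch
// against the mempoolSketch type above, not the reactor's code.
func (mem *mempoolSketch) addTx(tx *Tx) {
	mem.txsMtx.Lock()
	mem.txsMap[tx.Sender] = tx // <sender:*Tx> mapping
	mem.txsMtx.Unlock()

	mem.pqMtx.Lock()
	heap.Push(&mem.pq, tx) // priority index used for reaping and eviction
	mem.pqMtx.Unlock()

	mem.gossipMtx.Lock()
	mem.gossipList.PushBack(tx) // FIFO index used for gossiping
	mem.gossipMtx.Unlock()
}
```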

### Performance

Performance should largely remain unaffected, apart from the space overhead of
keeping an additional priority queue index and the case where we need to evict
transactions from the priority queue index. There should be no reads that
block writes on any index.

## Future Improvements

There are a few notable ways in which the proposed design can be improved or
expanded upon, namely transaction gossiping and the ability to support
multiple transactions from the same `sender`.

With regard to transaction gossiping, we need to empirically validate whether we
need to gossip by priority. In addition, the current method of gossiping,
broadcasting all the transactions in a node's mempool to its peers, may not be
the most efficient. Rather, we should explore gossiping transactions on a
request/response basis, similar to Ethereum and other protocols. Not only would
this reduce bandwidth and complexity, but it would also allow us to explore
gossiping by priority or other dimensions more efficiently.

Allowing multiple transactions from the same `sender` is important and will
most likely be a needed feature in the future development of the mempool, but for
now it suffices to have the preliminary design agreed upon. Having the ability
to support multiple transactions per `sender` will require careful thought with
regard to the interplay with the corresponding ABCI application. Regardless, the
proposed design should allow for adaptations to support this feature in a
non-contentious and backwards-compatible manner.

## Consequences

### Positive

- Transactions are allowed to be prioritized by the application.

### Negative

- Increased size of the `ResponseCheckTx` Protocol Buffer type.
- Causal ordering is NOT maintained.
  - It is possible that certain transactions broadcast in a particular order may
    pass `CheckTx` but not end up being committed in a block because they fail
    `CheckTx` later. e.g. Consider Tx<sub>1</sub> that sends funds from existing
    account Alice to a _new_ account Bob with priority P<sub>1</sub> and then later
    Bob's _new_ account sends funds back to Alice in Tx<sub>2</sub> with P<sub>2</sub>,
    such that P<sub>2</sub> > P<sub>1</sub>. If executed in this order, both
    transactions will pass `CheckTx`. However, when a proposer is ready to select
    transactions for the next block proposal, they will select Tx<sub>2</sub> before
    Tx<sub>1</sub>, and thus Tx<sub>2</sub> will _fail_ because Tx<sub>1</sub> must
    be executed first. This is because there is a _causal ordering_,
    Tx<sub>1</sub> ➝ Tx<sub>2</sub>. These types of situations should be rare, as
    most transactions are not causally ordered, and they can be circumvented by simply
    trying again at a later point in time or by ensuring the "child" priority is
    lower than the "parent" priority. In other words, if parents always have
    priorities that are higher than their children's, then the new mempool design
    will maintain causal ordering.

### Neutral

- A transaction that passed `CheckTx` and entered the mempool can later be evicted
  at a future point in time if a higher-priority transaction entered while the
  mempool was full.

## References

- [ABCI++](https://github.com/tendermint/spec/blob/master/rfc/004-abci%2B%2B.md)
- [Mempool Discussion](https://github.com/tendermint/tendermint/discussions/6295)