diff --git a/docs/rfc/README.md b/docs/rfc/README.md index fa8040032..8905c0160 100644 --- a/docs/rfc/README.md +++ b/docs/rfc/README.md @@ -49,6 +49,7 @@ sections. - [RFC-009: Consensus Parameter Upgrades](./rfc-009-consensus-parameter-upgrades.md) - [RFC-010: P2P Light Client](./rfc-010-p2p-light-client.rst) - [RFC-011: Delete Gas](./rfc-011-delete-gas.md) +- [RFC-012: Event Indexing Revisited](./rfc-012-custom-indexing.md) - [RFC-013: ABCI++](./rfc-013-abci++.md) - [RFC-014: Semantic Versioning](./rfc-014-semantic-versioning.md) diff --git a/docs/rfc/rfc-012-custom-indexing.md b/docs/rfc/rfc-012-custom-indexing.md new file mode 100644 index 000000000..9dc9bdbd6 --- /dev/null +++ b/docs/rfc/rfc-012-custom-indexing.md @@ -0,0 +1,352 @@ +# RFC 012: Event Indexing Revisited + +## Changelog + +- 11-Feb-2022: Add terminological notes. +- 10-Feb-2022: Updated from review feedback. +- 07-Feb-2022: Initial draft (@creachadair) + +## Abstract + +A Tendermint node allows ABCI events associated with block and transaction +processing to be "indexed" into persistent storage. The original Tendermint +implementation provided a fixed, built-in [proprietary indexer][kv-index] for +such events. + +In response to user requests to customize indexing, [ADR 065][adr065] +introduced an "event sink" interface that allows developers (at least in +theory) to plug in alternative index storage. + +Although ADR-065 was a good first step toward customization, its implementation +model does not satisfy all the user requirements. Moreover, this approach +leaves some existing technical issues with indexing unsolved. + +This RFC documents these concerns, and discusses some potential approaches to +solving them. This RFC does _not_ propose a specific technical decision. It is +meant to unify and focus some of the disparate discussions of the topic. + + +## Background + +We begin with some important terminological context. The term "event" in +Tendermint can be confusing, as the same word is used for multiple related but +distinct concepts: + +1. **ABCI Events** refer to the key-value metadata attached to blocks and + transactions by the application. These values are represented by the ABCI + `Event` protobuf message type. + +2. **Consensus Events** refer to the data published by the Tendermint node to + its pubsub bus in response to various consensus state transitions and other + important activities, such as round updates, votes, transaction delivery, + and block completion. + +This confusion is compounded because some "consensus event" values also have +"ABCI event" metadata attached to them. Notably, block and transaction items +typically have ABCI metadata assigned by the application. + +Indexers and RPC clients subscribed to the pubsub bus receive **consensus +events**, but they identify which ones to care about using query expressions +that match against the **ABCI events** associated with them. + +In the discussion that follows, we will use the term **event item** to refer to +a datum published to or received from the pubsub bus, and **ABCI event** or +**event metadata** to refer to the key/value annotations. + +**Indexing** in this context means recording the association between certain +ABCI metadata and the blocks or transactions they're attached to. The ABCI +metadata typically carry application-specific details like sender and recipient +addresses, catgory tags, and so forth, that are not part of consensus but are +used by UI tools to find and display transactions of interest. + +The consensus node records the blocks and transactions as part of its block +store, but does not persist the application metadata. Metadata persistence is +the task of the indexer, which can be (optionally) enabled by the node +operator. + +### History + +The [original indexer][kv-index] built in to Tendermint stored index data in an +embedded [`tm-db` database][tmdb] with a proprietary key layout. +In [ADR 065][adr065], we noted that this implementation has both performance +and scaling problems under load. Moreover, the only practical way to query the +index data is via the [query filter language][query] used for event +subscription. [Issue #1161][i1161] appears to be a motivational context for that ADR. + +To mitigate both of these concerns, we introduced the [`EventSink`][esink] +interface, combining the original transaction and block indexer interfaces +along with some service plumbing. Using this interface, a developer can plug +in an indexer that uses a more efficient storage engine, and provides a more +expressive query language. As a proof-of-concept, we built a [PostgreSQL event +sink][psql] that exports data to a [PostgreSQL database][postgres]. + +Although this approach addressed some of the immediate concerns, there are +several issues for custom indexing that have not been fully addressed. Here we +will discuss them in more detail. + +For further context, including links to user reports and related work, see also +the [Pluggable custom event indexing tracking issue][i7135] issue. + +### Issue 1: Tight Coupling + +The `EventSink` interface supports multiple implementations, but plugging in +implementations still requires tight integration with the node. In particular: + +- Any custom indexer must either be written in Go and compiled in to the + Tendermint binary, or the developer must write a Go shim to communicate with + the implementation and build that into the Tendermint binary. + +- This means to support a custom indexer, it either has to be integrated into + the Tendermint core repository, or every installation that uses that indexer + must fetch or build a patched version of Tendermint. + +The problem with integrating indexers into Tendermint Core is that every user +of Tendermint Core takes a dependency on all supported indexers, including +those they never use. Even if the unused code is disabled with build tags, +users have to remember to do this or potentially be exposed to security issues +that may arise in any of the custom indexers. This is a risk for Tendermint, +which is a trust-critical component of all applications built on it. + +The problem with _not_ integrating indexers into Tendermint Core is that any +developer who wants to use a particular indexer must now fetch or build a +patched version of the core code that includes the custom indexer. Besides +being inconvenient, this makes it harder for users to upgrade their node, since +they need to either re-apply their patches directly or wait for an intermediary +to do it for them. + +Even for developers who have written their applications in Go and link with the +consensus node directly (e.g., using the [Cosmos SDK][sdk]), these issues add a +potentially significant complication to the build process. + +### Issue 2: Legacy Compatibility + +The `EventSink` interface retains several limitations of the original +proprietary indexer. These include: + +- The indexer has no control over which event items are reported. Only the + exact block and transaction events that were reported to the original indexer + are reported to a custom indexer. + +- The interface requires the implementation to define methods for the legacy + search and query API. This requirement comes from the integation with the + [event subscription RPC API][event-rpc], but actually supporting these + methods is not trivial. + +At present, only the original KV indexer implements the query methods. Even the +proof-of-concept PostgreSQL implementation simply reports errors for all calls +to these methods. + +Even for a plugin written in Go, implementing these methods "correctly" would +require parsing and translating the custom query language over whatever storage +platform the indexer uses. + +For a plugin _not_ written in Go, even beyond the cost of integration the +developer would have to re-implement the entire query language. + +### Issue 3: Indexing Delays Consensus + +Within the node, indexing hooks in to the same internal pubsub dispatcher that +is used to export event items to the [event subscription RPC API][event-rpc]. +In contrast with RPC subscribers, however, indexing is a "privileged" +subscriber: If an RPC subscriber is "too slow", the node may terminate the +subscription and disconnect the client. That means that RPC subscribers may +lose (miss) event items. The indexer, however, is "unbuffered", and the +publisher will never drop or disconnect from it. If the indexer is slow, the +publisher will block until it returns, to ensure that no event items are lost. + +In practice, this means that the performance of the indexer has a direct effect +on the performance of the consensus node: If the indexer is slow or stalls, it +will slow or halt the progress of consensus. Users have already reported this +problem even with the built-in indexer (see, for example, [#7247][i7247]). +Extending this concern to arbitrary user-defined custom indexers gives that +risk a much larger surface area. + + +## Discussion + +It is not possible to simultaneously guarantee that publishing event items will +not delay consensus, and also that all event items of interest are always +completely indexed. + +Therefore, our choice is between eliminating delay (and minimizing loss) or +eliminating loss (and minimizing delay). Currently, we take the second +approach, which has led to user complaints about consensus delays due to +indexing and subscription overhead. + +- If we agree that consensus performance supersedes index completeness, our + design choices are to constrain the likelihood and frequency of missing event + items. + +- If we decide that consensus performance is more important than index + completeness, our option is to minimize overhead on the event delivery path + and document that indexer plugins constrain the rate of consensus. + +Since we have user reports requesting both properties, we have to choose one or +the other. Since the primary job of the consensus engine is to correctly, +robustly, reliablly, and efficiently replicate application state across the +network, I believe the correct choice is to favor consensus performance. + +An important consideration for this decision is that a node does not index +application metadata separately: If indexing is disabled, there is no built-in +mechanism to go back and replay or reconstruct the data that an indexer would +have stored. The node _does_ store the blockchain itself (i.e., the blocks and +their transactions), so potentially some use cases currently handled by the +indexer could be handled by the node. For example, allowing clients to ask +whether a given transaction ID has been committed to a block could in principle +be done without an indexer, since it does not depend on application metadata. + +Inevitably, a question will arise whether we could implement both strategies +and toggle between them with a flag. That would be a worst-case scenario, +requiring us to maintain the complexity of two very-different operational +concerns. If our goal is that Tendermint should be as simple, efficient, and +trustworthy as posible, there is not a strong case for making these options +configurable: We should pick a side and commit to it. + +### Design Principles + +Although there is no unique "best" solution to the issues described above, +there are some specific principles that a solution should include: + +1. **A custom indexer should not require integration into Tendermint core.** A + developer or node operator can create, build, deploy, and use a custom + indexer with a stock build of the Tendermint consensus node. + +2. **Custom indexers cannot stall consensus.** An indexer that is slow or + stalls cannot slow down or prevent core consensus from making progress. + + The plugin interface must give node operators control over the tolerances + for acceptable indexer performance, and the means to detect when indexers + are falling outside those tolerances, but indexer failures should "fail + safe" with respect to consensus (even if that means the indexer may miss + some data, in sufficiently-extreme circumstances). + +3. **Custom indexers control which event items they index.** A custom indexer + is not limited to only the current transaction and block events, but can + observe any event item published by the node. + +4. **Custom indexing is forward-compatible.** Adding new event item types or + metadata to the consensus node should not require existing custom indexers + to be rebuilt or modified, unless they want to take advantage of the new + data. + +5. **Indexers are responsible for answering queries.** An indexer plugin is not + required to support the legacy query filter language, nor to be compatible + with the legacy RPC endpoints for accessing them. Any APIs for clients to + query a custom index are the responsibility of the indexer, not the node. + +### Open Questions + +Given the constraints outlined above, there are important design questions we +must answer to guide any specific changes: + +1. **What is an acceptable probability that, given sufficiently extreme + operational issues, an indexer might miss some number of events?** + + There are two parts to this question: One is what constitutes an extreme + operational problem, the other is how likely we are to miss some number of + events items. + + - If the consensus is that no event item must ever be missed, no matter how + bad the operational circumstances, then we _must_ accept that indexing can + slow or halt consensus arbitrarily. It is impossible to guarantee complete + index coverage without potentially unbounded delays. + + - Otherwise, how much data can we afford to lose and how often? For example, + if we can ensure no event item will be lost unless the indexer halts for + at least five minutes, is that acceptable? What probabilities and time + ranges are reasonable for real production environments? + +2. **What level of operational overhead is acceptable to impose on node + operators to support indexing?** + + Are node operators willing to configure and run custom indexers as sidecar + type processes alongside a node? How much indexer setup above and beyond the + work of setting up the underlying node in isolation is tractable in + production networks? + + The answer to this question also informs the question of whether we should + keep an "in-process" indexing option, and to what extent that option needs + to satisfy the suggested design principles. + + Relatedly, to what extent do we need to be concerned about the cost of + encoding and sending event items to an external process (e.g., as JSON blobs + or protobuf wire messages)? Given that the node already encodes event items + as JSON for subscription purposes, the overhead would be negligible for the + node itself, but the indexer would have to decode to process the results. + +3. **What (if any) query APIs does the consensus node need to export, + independent of the indexer implementation?** + + One typical example is whether the node should be able to answer queries + like "is this transaction ID in a block?" Currently, a node cannot answer + this query _unless_ it runs the built-in KV indexer. Does the node need to + continue to support that query even for nodes that disable the KV indexer, + or which use a custom indexer? + +### Informal Design Intent + +The design principles described above implicate several components of the +Tendermint node, beyond just the indexer. In the context of [ADR 075][adr075], +we are re-working the RPC event subscription API to improve some of the UX +issues discussed above for RPC clients. It is our expectation that a solution +for pluggable custom indexing will take advantage of some of the same work. + +On that basis, the design approach I am considering for custom indexing looks +something like this (subject to refinement): + +1. A custom indexer runs as a separate process from the node. + +2. The indexer subscribes to event items via the ADR 075 events API. + + This means indexers would receive event payloads as JSON rather than + protobuf, but since we already have to support JSON encoding for the RPC + interface anyway, that should not increase complexity for the node. + +3. The existing PostgreSQL indexer gets reworked to have this form, and no + longer built as part of the Tendermint core binary. + + We can retain the code in the core repository as a proof-of-concept, or + perhaps create a separate repository with contributed indexers and move it + there. + +4. (Possibly) Deprecate and remove the legacy KV indexer, or disable it by + default. If we decide to remove it, we can also remove the legacy RPC + endpoints for querying the KV indexer. + + If we plan to do this, we should also investigate providing a way for + clients to query whether a given transaction ID has landed in a block. That + serves a common need, and currently _only_ works if the KV indexer is + enabled, but could be addressed more simply using the other data a node + already has stored, without having to answer more general queries. + + +## References + +- [ADR 065: Custom Event Indexing][adr065] +- [ADR 075: RPC Event Subscription Interface][adr075] +- [Cosmos SDK][sdk] +- [Event subscription RPC][event-rpc] +- [KV transaction indexer][kv-index] +- [Pluggable custom event indexing][i7135] (#7135) +- [PostgreSQL event sink][psql] + - [PostgreSQL database][postgres] +- [Query filter language][query] +- [Stream events to postgres for indexing][i1161] (#1161) +- [Unbuffered event subscription slow down the consensus][i7247] (#7247) +- [`EventSink` interface][esink] +- [`tm-db` library][tmdb] + +[adr065]: https://github.com/tendermint/tendermint/blob/master/docs/architecture/adr-065-custom-event-indexing.md +[adr075]: https://github.com/tendermint/tendermint/blob/master/docs/architecture/adr-075-rpc-subscription.md +[esink]: https://pkg.go.dev/github.com/tendermint/tendermint/internal/state/indexer#EventSink +[event-rpc]: https://docs.tendermint.com/master/rpc/#/Websocket/subscribe +[i1161]: https://github.com/tendermint/tendermint/issues/1161 +[i7135]: https://github.com/tendermint/tendermint/issues/7135 +[i7247]: https://github.com/tendermint/tendermint/issues/7247 +[kv-index]: https://github.com/tendermint/tendermint/blob/master/internal/state/indexer/tx/kv +[postgres]: https://postgresql.org/ +[psql]: https://github.com/tendermint/tendermint/blob/master/internal/state/indexer/sink/psql +[psql]: https://github.com/tendermint/tendermint/blob/master/internal/state/indexer/sink/psql +[query]: https://pkg.go.dev/github.com/tendermint/tendermint/internal/pubsub/query/syntax +[sdk]: https://github.com/cosmos/cosmos-sdk +[tmdb]: https://pkg.go.dev/github.com/tendermint/tm-db#DB