|
|
- # RFC 012: Event Indexing Revisited
-
- ## Changelog
-
- - 11-Feb-2022: Add terminological notes.
- - 10-Feb-2022: Updated from review feedback.
- - 07-Feb-2022: Initial draft (@creachadair)
-
- ## Abstract
-
- A Tendermint node allows ABCI events associated with block and transaction
- processing to be "indexed" into persistent storage. The original Tendermint
- implementation provided a fixed, built-in [proprietary indexer][kv-index] for
- such events.
-
- In response to user requests to customize indexing, [ADR 065][adr065]
- introduced an "event sink" interface that allows developers (at least in
- theory) to plug in alternative index storage.
-
- Although ADR-065 was a good first step toward customization, its implementation
- model does not satisfy all the user requirements. Moreover, this approach
- leaves some existing technical issues with indexing unsolved.
-
- This RFC documents these concerns, and discusses some potential approaches to
- solving them. This RFC does _not_ propose a specific technical decision. It is
- meant to unify and focus some of the disparate discussions of the topic.
-
-
- ## Background
-
- We begin with some important terminological context. The term "event" in
- Tendermint can be confusing, as the same word is used for multiple related but
- distinct concepts:
-
- 1. **ABCI Events** refer to the key-value metadata attached to blocks and
- transactions by the application. These values are represented by the ABCI
- `Event` protobuf message type.
-
- 2. **Consensus Events** refer to the data published by the Tendermint node to
- its pubsub bus in response to various consensus state transitions and other
- important activities, such as round updates, votes, transaction delivery,
- and block completion.
-
- This confusion is compounded because some "consensus event" values also have
- "ABCI event" metadata attached to them. Notably, block and transaction items
- typically have ABCI metadata assigned by the application.
-
- Indexers and RPC clients subscribed to the pubsub bus receive **consensus
- events**, but they identify which ones to care about using query expressions
- that match against the **ABCI events** associated with them.
-
- In the discussion that follows, we will use the term **event item** to refer to
- a datum published to or received from the pubsub bus, and **ABCI event** or
- **event metadata** to refer to the key/value annotations.
-
- **Indexing** in this context means recording the association between certain
- ABCI metadata and the blocks or transactions they're attached to. The ABCI
- metadata typically carry application-specific details like sender and recipient
- addresses, catgory tags, and so forth, that are not part of consensus but are
- used by UI tools to find and display transactions of interest.
-
- The consensus node records the blocks and transactions as part of its block
- store, but does not persist the application metadata. Metadata persistence is
- the task of the indexer, which can be (optionally) enabled by the node
- operator.
-
- ### History
-
- The [original indexer][kv-index] built in to Tendermint stored index data in an
- embedded [`tm-db` database][tmdb] with a proprietary key layout.
- In [ADR 065][adr065], we noted that this implementation has both performance
- and scaling problems under load. Moreover, the only practical way to query the
- index data is via the [query filter language][query] used for event
- subscription. [Issue #1161][i1161] appears to be a motivational context for that ADR.
-
- To mitigate both of these concerns, we introduced the [`EventSink`][esink]
- interface, combining the original transaction and block indexer interfaces
- along with some service plumbing. Using this interface, a developer can plug
- in an indexer that uses a more efficient storage engine, and provides a more
- expressive query language. As a proof-of-concept, we built a [PostgreSQL event
- sink][psql] that exports data to a [PostgreSQL database][postgres].
-
- Although this approach addressed some of the immediate concerns, there are
- several issues for custom indexing that have not been fully addressed. Here we
- will discuss them in more detail.
-
- For further context, including links to user reports and related work, see also
- the [Pluggable custom event indexing tracking issue][i7135] issue.
-
- ### Issue 1: Tight Coupling
-
- The `EventSink` interface supports multiple implementations, but plugging in
- implementations still requires tight integration with the node. In particular:
-
- - Any custom indexer must either be written in Go and compiled in to the
- Tendermint binary, or the developer must write a Go shim to communicate with
- the implementation and build that into the Tendermint binary.
-
- - This means to support a custom indexer, it either has to be integrated into
- the Tendermint core repository, or every installation that uses that indexer
- must fetch or build a patched version of Tendermint.
-
- The problem with integrating indexers into Tendermint Core is that every user
- of Tendermint Core takes a dependency on all supported indexers, including
- those they never use. Even if the unused code is disabled with build tags,
- users have to remember to do this or potentially be exposed to security issues
- that may arise in any of the custom indexers. This is a risk for Tendermint,
- which is a trust-critical component of all applications built on it.
-
- The problem with _not_ integrating indexers into Tendermint Core is that any
- developer who wants to use a particular indexer must now fetch or build a
- patched version of the core code that includes the custom indexer. Besides
- being inconvenient, this makes it harder for users to upgrade their node, since
- they need to either re-apply their patches directly or wait for an intermediary
- to do it for them.
-
- Even for developers who have written their applications in Go and link with the
- consensus node directly (e.g., using the [Cosmos SDK][sdk]), these issues add a
- potentially significant complication to the build process.
-
- ### Issue 2: Legacy Compatibility
-
- The `EventSink` interface retains several limitations of the original
- proprietary indexer. These include:
-
- - The indexer has no control over which event items are reported. Only the
- exact block and transaction events that were reported to the original indexer
- are reported to a custom indexer.
-
- - The interface requires the implementation to define methods for the legacy
- search and query API. This requirement comes from the integation with the
- [event subscription RPC API][event-rpc], but actually supporting these
- methods is not trivial.
-
- At present, only the original KV indexer implements the query methods. Even the
- proof-of-concept PostgreSQL implementation simply reports errors for all calls
- to these methods.
-
- Even for a plugin written in Go, implementing these methods "correctly" would
- require parsing and translating the custom query language over whatever storage
- platform the indexer uses.
-
- For a plugin _not_ written in Go, even beyond the cost of integration the
- developer would have to re-implement the entire query language.
-
- ### Issue 3: Indexing Delays Consensus
-
- Within the node, indexing hooks in to the same internal pubsub dispatcher that
- is used to export event items to the [event subscription RPC API][event-rpc].
- In contrast with RPC subscribers, however, indexing is a "privileged"
- subscriber: If an RPC subscriber is "too slow", the node may terminate the
- subscription and disconnect the client. That means that RPC subscribers may
- lose (miss) event items. The indexer, however, is "unbuffered", and the
- publisher will never drop or disconnect from it. If the indexer is slow, the
- publisher will block until it returns, to ensure that no event items are lost.
-
- In practice, this means that the performance of the indexer has a direct effect
- on the performance of the consensus node: If the indexer is slow or stalls, it
- will slow or halt the progress of consensus. Users have already reported this
- problem even with the built-in indexer (see, for example, [#7247][i7247]).
- Extending this concern to arbitrary user-defined custom indexers gives that
- risk a much larger surface area.
-
-
- ## Discussion
-
- It is not possible to simultaneously guarantee that publishing event items will
- not delay consensus, and also that all event items of interest are always
- completely indexed.
-
- Therefore, our choice is between eliminating delay (and minimizing loss) or
- eliminating loss (and minimizing delay). Currently, we take the second
- approach, which has led to user complaints about consensus delays due to
- indexing and subscription overhead.
-
- - If we agree that consensus performance supersedes index completeness, our
- design choices are to constrain the likelihood and frequency of missing event
- items.
-
- - If we decide that consensus performance is more important than index
- completeness, our option is to minimize overhead on the event delivery path
- and document that indexer plugins constrain the rate of consensus.
-
- Since we have user reports requesting both properties, we have to choose one or
- the other. Since the primary job of the consensus engine is to correctly,
- robustly, reliablly, and efficiently replicate application state across the
- network, I believe the correct choice is to favor consensus performance.
-
- An important consideration for this decision is that a node does not index
- application metadata separately: If indexing is disabled, there is no built-in
- mechanism to go back and replay or reconstruct the data that an indexer would
- have stored. The node _does_ store the blockchain itself (i.e., the blocks and
- their transactions), so potentially some use cases currently handled by the
- indexer could be handled by the node. For example, allowing clients to ask
- whether a given transaction ID has been committed to a block could in principle
- be done without an indexer, since it does not depend on application metadata.
-
- Inevitably, a question will arise whether we could implement both strategies
- and toggle between them with a flag. That would be a worst-case scenario,
- requiring us to maintain the complexity of two very-different operational
- concerns. If our goal is that Tendermint should be as simple, efficient, and
- trustworthy as posible, there is not a strong case for making these options
- configurable: We should pick a side and commit to it.
-
- ### Design Principles
-
- Although there is no unique "best" solution to the issues described above,
- there are some specific principles that a solution should include:
-
- 1. **A custom indexer should not require integration into Tendermint core.** A
- developer or node operator can create, build, deploy, and use a custom
- indexer with a stock build of the Tendermint consensus node.
-
- 2. **Custom indexers cannot stall consensus.** An indexer that is slow or
- stalls cannot slow down or prevent core consensus from making progress.
-
- The plugin interface must give node operators control over the tolerances
- for acceptable indexer performance, and the means to detect when indexers
- are falling outside those tolerances, but indexer failures should "fail
- safe" with respect to consensus (even if that means the indexer may miss
- some data, in sufficiently-extreme circumstances).
-
- 3. **Custom indexers control which event items they index.** A custom indexer
- is not limited to only the current transaction and block events, but can
- observe any event item published by the node.
-
- 4. **Custom indexing is forward-compatible.** Adding new event item types or
- metadata to the consensus node should not require existing custom indexers
- to be rebuilt or modified, unless they want to take advantage of the new
- data.
-
- 5. **Indexers are responsible for answering queries.** An indexer plugin is not
- required to support the legacy query filter language, nor to be compatible
- with the legacy RPC endpoints for accessing them. Any APIs for clients to
- query a custom index are the responsibility of the indexer, not the node.
-
- ### Open Questions
-
- Given the constraints outlined above, there are important design questions we
- must answer to guide any specific changes:
-
- 1. **What is an acceptable probability that, given sufficiently extreme
- operational issues, an indexer might miss some number of events?**
-
- There are two parts to this question: One is what constitutes an extreme
- operational problem, the other is how likely we are to miss some number of
- events items.
-
- - If the consensus is that no event item must ever be missed, no matter how
- bad the operational circumstances, then we _must_ accept that indexing can
- slow or halt consensus arbitrarily. It is impossible to guarantee complete
- index coverage without potentially unbounded delays.
-
- - Otherwise, how much data can we afford to lose and how often? For example,
- if we can ensure no event item will be lost unless the indexer halts for
- at least five minutes, is that acceptable? What probabilities and time
- ranges are reasonable for real production environments?
-
- 2. **What level of operational overhead is acceptable to impose on node
- operators to support indexing?**
-
- Are node operators willing to configure and run custom indexers as sidecar
- type processes alongside a node? How much indexer setup above and beyond the
- work of setting up the underlying node in isolation is tractable in
- production networks?
-
- The answer to this question also informs the question of whether we should
- keep an "in-process" indexing option, and to what extent that option needs
- to satisfy the suggested design principles.
-
- Relatedly, to what extent do we need to be concerned about the cost of
- encoding and sending event items to an external process (e.g., as JSON blobs
- or protobuf wire messages)? Given that the node already encodes event items
- as JSON for subscription purposes, the overhead would be negligible for the
- node itself, but the indexer would have to decode to process the results.
-
- 3. **What (if any) query APIs does the consensus node need to export,
- independent of the indexer implementation?**
-
- One typical example is whether the node should be able to answer queries
- like "is this transaction ID in a block?" Currently, a node cannot answer
- this query _unless_ it runs the built-in KV indexer. Does the node need to
- continue to support that query even for nodes that disable the KV indexer,
- or which use a custom indexer?
-
- ### Informal Design Intent
-
- The design principles described above implicate several components of the
- Tendermint node, beyond just the indexer. In the context of [ADR 075][adr075],
- we are re-working the RPC event subscription API to improve some of the UX
- issues discussed above for RPC clients. It is our expectation that a solution
- for pluggable custom indexing will take advantage of some of the same work.
-
- On that basis, the design approach I am considering for custom indexing looks
- something like this (subject to refinement):
-
- 1. A custom indexer runs as a separate process from the node.
-
- 2. The indexer subscribes to event items via the ADR 075 events API.
-
- This means indexers would receive event payloads as JSON rather than
- protobuf, but since we already have to support JSON encoding for the RPC
- interface anyway, that should not increase complexity for the node.
-
- 3. The existing PostgreSQL indexer gets reworked to have this form, and no
- longer built as part of the Tendermint core binary.
-
- We can retain the code in the core repository as a proof-of-concept, or
- perhaps create a separate repository with contributed indexers and move it
- there.
-
- 4. (Possibly) Deprecate and remove the legacy KV indexer, or disable it by
- default. If we decide to remove it, we can also remove the legacy RPC
- endpoints for querying the KV indexer.
-
- If we plan to do this, we should also investigate providing a way for
- clients to query whether a given transaction ID has landed in a block. That
- serves a common need, and currently _only_ works if the KV indexer is
- enabled, but could be addressed more simply using the other data a node
- already has stored, without having to answer more general queries.
-
-
- ## References
-
- - [ADR 065: Custom Event Indexing][adr065]
- - [ADR 075: RPC Event Subscription Interface][adr075]
- - [Cosmos SDK][sdk]
- - [Event subscription RPC][event-rpc]
- - [KV transaction indexer][kv-index]
- - [Pluggable custom event indexing][i7135] (#7135)
- - [PostgreSQL event sink][psql]
- - [PostgreSQL database][postgres]
- - [Query filter language][query]
- - [Stream events to postgres for indexing][i1161] (#1161)
- - [Unbuffered event subscription slow down the consensus][i7247] (#7247)
- - [`EventSink` interface][esink]
- - [`tm-db` library][tmdb]
-
- [adr065]: https://github.com/tendermint/tendermint/blob/master/docs/architecture/adr-065-custom-event-indexing.md
- [adr075]: https://github.com/tendermint/tendermint/blob/master/docs/architecture/adr-075-rpc-subscription.md
- [esink]: https://pkg.go.dev/github.com/tendermint/tendermint/internal/state/indexer#EventSink
- [event-rpc]: https://docs.tendermint.com/master/rpc/#/Websocket/subscribe
- [i1161]: https://github.com/tendermint/tendermint/issues/1161
- [i7135]: https://github.com/tendermint/tendermint/issues/7135
- [i7247]: https://github.com/tendermint/tendermint/issues/7247
- [kv-index]: https://github.com/tendermint/tendermint/blob/master/internal/state/indexer/tx/kv
- [postgres]: https://postgresql.org/
- [psql]: https://github.com/tendermint/tendermint/blob/master/internal/state/indexer/sink/psql
- [psql]: https://github.com/tendermint/tendermint/blob/master/internal/state/indexer/sink/psql
- [query]: https://pkg.go.dev/github.com/tendermint/tendermint/internal/pubsub/query/syntax
- [sdk]: https://github.com/cosmos/cosmos-sdk
- [tmdb]: https://pkg.go.dev/github.com/tendermint/tm-db#DB
|