  1. # RFC 012: Event Indexing Revisited
  2. ## Changelog
  3. - 11-Feb-2022: Add terminological notes.
  4. - 10-Feb-2022: Updated from review feedback.
  5. - 07-Feb-2022: Initial draft (@creachadair)
  6. ## Abstract
  7. A Tendermint node allows ABCI events associated with block and transaction
  8. processing to be "indexed" into persistent storage. The original Tendermint
  9. implementation provided a fixed, built-in [proprietary indexer][kv-index] for
  10. such events.
  11. In response to user requests to customize indexing, [ADR 065][adr065]
  12. introduced an "event sink" interface that allows developers (at least in
  13. theory) to plug in alternative index storage.
  14. Although ADR-065 was a good first step toward customization, its implementation
  15. model does not satisfy all the user requirements. Moreover, this approach
  16. leaves some existing technical issues with indexing unsolved.
  17. This RFC documents these concerns, and discusses some potential approaches to
  18. solving them. This RFC does _not_ propose a specific technical decision. It is
  19. meant to unify and focus some of the disparate discussions of the topic.
  20. ## Background
  21. We begin with some important terminological context. The term "event" in
  22. Tendermint can be confusing, as the same word is used for multiple related but
  23. distinct concepts:
  24. 1. **ABCI Events** refer to the key-value metadata attached to blocks and
  25. transactions by the application. These values are represented by the ABCI
  26. `Event` protobuf message type.
  27. 2. **Consensus Events** refer to the data published by the Tendermint node to
  28. its pubsub bus in response to various consensus state transitions and other
  29. important activities, such as round updates, votes, transaction delivery,
  30. and block completion.
  31. This confusion is compounded because some "consensus event" values also have
  32. "ABCI event" metadata attached to them. Notably, block and transaction items
  33. typically have ABCI metadata assigned by the application.
  34. Indexers and RPC clients subscribed to the pubsub bus receive **consensus
  35. events**, but they identify which ones to care about using query expressions
  36. that match against the **ABCI events** associated with them.
  37. In the discussion that follows, we will use the term **event item** to refer to
  38. a datum published to or received from the pubsub bus, and **ABCI event** or
  39. **event metadata** to refer to the key/value annotations.
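To make the matching step concrete, here is a minimal sketch of how ABCI event metadata can be flattened into composite `type.key` tags and tested against a single equality condition. The `Event` and `Attribute` types are local stand-ins, not the protobuf-generated ABCI types, and `flatten` and `matchesEqual` are hypothetical helpers; the real query language supports more than equality.

```go
package main

import "fmt"

// Attribute and Event mirror the general shape of ABCI event metadata
// (a type plus key/value attributes). They are local stand-ins for
// illustration, not the real generated types.
type Attribute struct{ Key, Value string }

type Event struct {
	Type       string
	Attributes []Attribute
}

// flatten builds composite "type.key" tags from a list of events.
func flatten(events []Event) map[string][]string {
	tags := make(map[string][]string)
	for _, ev := range events {
		for _, a := range ev.Attributes {
			k := ev.Type + "." + a.Key
			tags[k] = append(tags[k], a.Value)
		}
	}
	return tags
}

// matchesEqual reports whether any value recorded under tag equals want.
// It models a single equality condition only.
func matchesEqual(tags map[string][]string, tag, want string) bool {
	for _, v := range tags[tag] {
		if v == want {
			return true
		}
	}
	return false
}

func main() {
	events := []Event{{
		Type: "transfer",
		Attributes: []Attribute{
			{Key: "sender", Value: "alice"},
			{Key: "recipient", Value: "bob"},
		},
	}}
	tags := flatten(events)
	fmt.Println(matchesEqual(tags, "transfer.sender", "alice")) // true
	fmt.Println(matchesEqual(tags, "transfer.sender", "carol")) // false
}
```

The event item itself (a block or transaction) is what the subscriber receives; the tags only decide whether it is delivered.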
  40. **Indexing** in this context means recording the association between certain
  41. ABCI metadata and the blocks or transactions they're attached to. The ABCI
  42. metadata typically carry application-specific details like sender and recipient
  43. addresses, category tags, and so forth, that are not part of consensus but are
  44. used by UI tools to find and display transactions of interest.
  45. The consensus node records the blocks and transactions as part of its block
  46. store, but does not persist the application metadata. Metadata persistence is
  47. the task of the indexer, which can be (optionally) enabled by the node
  48. operator.
  49. ### History
  50. The [original indexer][kv-index] built in to Tendermint stored index data in an
  51. embedded [`tm-db` database][tmdb] with a proprietary key layout.
  52. In [ADR 065][adr065], we noted that this implementation has both performance
  53. and scaling problems under load. Moreover, the only practical way to query the
  54. index data is via the [query filter language][query] used for event
  55. subscription. [Issue #1161][i1161] appears to have been the motivating context for that ADR.
  56. To mitigate both of these concerns, we introduced the [`EventSink`][esink]
  57. interface, combining the original transaction and block indexer interfaces
  58. along with some service plumbing. Using this interface, a developer can plug
  59. in an indexer that uses a more efficient storage engine, and provides a more
  60. expressive query language. As a proof-of-concept, we built a [PostgreSQL event
  61. sink][psql] that exports data to a [PostgreSQL database][postgres].
  62. Although this approach addressed some of the immediate concerns, there are
  63. several issues for custom indexing that have not been fully addressed. Here we
  64. will discuss them in more detail.
  65. For further context, including links to user reports and related work, see also
  66. the [Pluggable custom event indexing][i7135] tracking issue.
  67. ### Issue 1: Tight Coupling
  68. The `EventSink` interface supports multiple implementations, but plugging in
  69. implementations still requires tight integration with the node. In particular:
  70. - Any custom indexer must either be written in Go and compiled in to the
  71. Tendermint binary, or the developer must write a Go shim to communicate with
  72. the implementation and build that into the Tendermint binary.
  73. - This means that a custom indexer either has to be integrated into the
  74. Tendermint core repository, or every installation that uses that indexer
  75. must fetch or build a patched version of Tendermint.
  76. The problem with integrating indexers into Tendermint Core is that every user
  77. of Tendermint Core takes a dependency on all supported indexers, including
  78. those they never use. Even if the unused code is disabled with build tags,
  79. users have to remember to do this or potentially be exposed to security issues
  80. that may arise in any of the custom indexers. This is a risk for Tendermint,
  81. which is a trust-critical component of all applications built on it.
  82. The problem with _not_ integrating indexers into Tendermint Core is that any
  83. developer who wants to use a particular indexer must now fetch or build a
  84. patched version of the core code that includes the custom indexer. Besides
  85. being inconvenient, this makes it harder for users to upgrade their node, since
  86. they need to either re-apply their patches directly or wait for an intermediary
  87. to do it for them.
  88. Even for developers who have written their applications in Go and link with the
  89. consensus node directly (e.g., using the [Cosmos SDK][sdk]), these issues add a
  90. potentially significant complication to the build process.
  91. ### Issue 2: Legacy Compatibility
  92. The `EventSink` interface retains several limitations of the original
  93. proprietary indexer. These include:
  94. - The indexer has no control over which event items are reported. Only the
  95. exact block and transaction events that were reported to the original indexer
  96. are reported to a custom indexer.
  97. - The interface requires the implementation to define methods for the legacy
  98. search and query API. This requirement comes from the integration with the
  99. [event subscription RPC API][event-rpc], but actually supporting these
  100. methods is not trivial.
  101. At present, only the original KV indexer implements the query methods. Even the
  102. proof-of-concept PostgreSQL implementation simply reports errors for all calls
  103. to these methods.
  104. Even for a plugin written in Go, implementing these methods "correctly" would
  105. require parsing and translating the custom query language over whatever storage
  106. platform the indexer uses.
  107. For a plugin _not_ written in Go, even beyond the cost of integration, the
  108. developer would have to re-implement the entire query language.
  109. ### Issue 3: Indexing Delays Consensus
  110. Within the node, indexing hooks in to the same internal pubsub dispatcher that
  111. is used to export event items to the [event subscription RPC API][event-rpc].
  112. In contrast with RPC subscribers, however, indexing is a "privileged"
  113. subscriber: If an RPC subscriber is "too slow", the node may terminate the
  114. subscription and disconnect the client. That means that RPC subscribers may
  115. lose (miss) event items. The indexer, however, is "unbuffered", and the
  116. publisher will never drop or disconnect from it. If the indexer is slow, the
  117. publisher will block until it returns, to ensure that no event items are lost.
  118. In practice, this means that the performance of the indexer has a direct effect
  119. on the performance of the consensus node: If the indexer is slow or stalls, it
  120. will slow or halt the progress of consensus. Users have already reported this
  121. problem even with the built-in indexer (see, for example, [#7247][i7247]).
  122. Extending this concern to arbitrary user-defined custom indexers gives that
  123. risk a much larger surface area.
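The two delivery modes described above can be illustrated with plain Go channels. This is a simplified sketch and not the node's pubsub implementation; `publishBlocking` and `publishLossy` are hypothetical names chosen for the example.

```go
package main

import "fmt"

// publishBlocking models the "privileged" indexer path: the send waits until
// the subscriber can accept the item, so a slow subscriber stalls the caller.
func publishBlocking(sub chan<- string, item string) {
	sub <- item
}

// publishLossy models the RPC subscriber path: if the subscriber's buffer is
// full, the item is dropped rather than stalling the caller. It reports
// whether the item was delivered.
func publishLossy(sub chan<- string, item string) bool {
	select {
	case sub <- item:
		return true
	default:
		return false
	}
}

func main() {
	// A subscriber with room for two items that never reads from its channel.
	slow := make(chan string, 2)

	delivered, dropped := 0, 0
	for i := 0; i < 5; i++ {
		if publishLossy(slow, fmt.Sprintf("item-%d", i)) {
			delivered++
		} else {
			dropped++
		}
	}
	fmt.Println("delivered:", delivered, "dropped:", dropped) // delivered: 2 dropped: 3

	// Calling publishBlocking(slow, "item-5") here would block forever,
	// because the buffer is full and nothing is reading.
}
```

In the blocking mode the publisher's progress is bounded by the subscriber's; in the lossy mode it is not, at the cost of missed items.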
  124. ## Discussion
  125. It is not possible to simultaneously guarantee that publishing event items will
  126. not delay consensus, and also that all event items of interest are always
  127. completely indexed.
  128. Therefore, our choice is between eliminating delay (and minimizing loss) or
  129. eliminating loss (and minimizing delay). Currently, we take the second
  130. approach, which has led to user complaints about consensus delays due to
  131. indexing and subscription overhead.
  132. - If we agree that consensus performance supersedes index completeness, our
  133. design choices are to constrain the likelihood and frequency of missing event
  134. items.
  135. - If we decide that index completeness is more important than consensus
  136. performance, our option is to minimize overhead on the event delivery path
  137. and document that indexer plugins constrain the rate of consensus.
  138. Since we have user reports requesting both properties, we have to choose one or
  139. the other. Because the primary job of the consensus engine is to correctly,
  140. robustly, reliably, and efficiently replicate application state across the
  141. network, I believe the correct choice is to favor consensus performance.
  142. An important consideration for this decision is that a node does not index
  143. application metadata separately: If indexing is disabled, there is no built-in
  144. mechanism to go back and replay or reconstruct the data that an indexer would
  145. have stored. The node _does_ store the blockchain itself (i.e., the blocks and
  146. their transactions), so potentially some use cases currently handled by the
  147. indexer could be handled by the node. For example, allowing clients to ask
  148. whether a given transaction ID has been committed to a block could in principle
  149. be done without an indexer, since it does not depend on application metadata.
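As a rough illustration of why such a lookup does not depend on application metadata, the sketch below maps transaction IDs to block heights using only block contents. It assumes the transaction ID is the SHA-256 hash of the raw transaction bytes, and uses an in-memory map as a stand-in for whatever persistent structure a node would actually use; `txLocator` and its methods are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// txLocator records which block height each transaction landed in.
// It needs only the raw transactions, not any ABCI event metadata.
type txLocator struct {
	heights map[string]int64
}

func newTxLocator() *txLocator {
	return &txLocator{heights: make(map[string]int64)}
}

// txID returns the hex-encoded SHA-256 hash of the raw transaction bytes
// (assumed here to be the transaction ID).
func txID(tx []byte) string {
	sum := sha256.Sum256(tx)
	return hex.EncodeToString(sum[:])
}

// recordBlock notes the height for every transaction in a committed block.
func (l *txLocator) recordBlock(height int64, txs [][]byte) {
	for _, tx := range txs {
		l.heights[txID(tx)] = height
	}
}

// lookup reports the height at which the given transaction ID was committed.
func (l *txLocator) lookup(id string) (int64, bool) {
	h, ok := l.heights[id]
	return h, ok
}

func main() {
	loc := newTxLocator()
	loc.recordBlock(7, [][]byte{[]byte("tx-a"), []byte("tx-b")})

	h, ok := loc.lookup(txID([]byte("tx-b")))
	fmt.Println(h, ok) // 7 true

	_, ok = loc.lookup(txID([]byte("tx-c")))
	fmt.Println(ok) // false
}
```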
  150. Inevitably, a question will arise whether we could implement both strategies
  151. and toggle between them with a flag. That would be a worst-case scenario,
  152. requiring us to maintain the complexity of two very different operational
  153. concerns. If our goal is that Tendermint should be as simple, efficient, and
  154. trustworthy as possible, there is not a strong case for making these options
  155. configurable: We should pick a side and commit to it.
  156. ### Design Principles
  157. Although there is no unique "best" solution to the issues described above,
  158. there are some specific principles that a solution should include:
  159. 1. **A custom indexer should not require integration into Tendermint core.** A
  160. developer or node operator can create, build, deploy, and use a custom
  161. indexer with a stock build of the Tendermint consensus node.
  162. 2. **Custom indexers cannot stall consensus.** An indexer that is slow or
  163. stalls cannot slow down or prevent core consensus from making progress.
  164. The plugin interface must give node operators control over the tolerances
  165. for acceptable indexer performance, and the means to detect when indexers
  166. are falling outside those tolerances, but indexer failures should "fail
  167. safe" with respect to consensus (even if that means the indexer may miss
  168. some data, in sufficiently extreme circumstances).
  169. 3. **Custom indexers control which event items they index.** A custom indexer
  170. is not limited to only the current transaction and block events, but can
  171. observe any event item published by the node.
  172. 4. **Custom indexing is forward-compatible.** Adding new event item types or
  173. metadata to the consensus node should not require existing custom indexers
  174. to be rebuilt or modified, unless they want to take advantage of the new
  175. data.
  176. 5. **Indexers are responsible for answering queries.** An indexer plugin is not
  177. required to support the legacy query filter language, nor to be compatible
  178. with the legacy RPC endpoints for accessing them. Any APIs for clients to
  179. query a custom index are the responsibility of the indexer, not the node.
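Principles 3 and 4 can be sketched from the indexer's side. The example below assumes event items arrive as JSON objects with a `type` field and an opaque `data` payload; that wire shape, the type names `tx` and `new_block`, and the `selectItem` helper are all hypothetical. The point is that the indexer chooses what it indexes, and that unknown item types or extra fields are skipped rather than treated as errors.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// eventItem is a hypothetical wire shape for an event item. The encoding
// actually used by the node may differ.
type eventItem struct {
	Type string          `json:"type"`
	Data json.RawMessage `json:"data"` // left undecoded so unknown payloads pass through
}

// wanted lists the item types this particular indexer chooses to index.
var wanted = map[string]bool{"tx": true, "new_block": true}

// selectItem decodes one JSON-encoded item and reports whether this indexer
// wants it. Fields the struct does not declare are ignored by json.Unmarshal,
// and item types not listed in wanted are skipped without error.
func selectItem(raw []byte) (eventItem, bool, error) {
	var it eventItem
	if err := json.Unmarshal(raw, &it); err != nil {
		return eventItem{}, false, err
	}
	return it, wanted[it.Type], nil
}

func main() {
	inputs := []string{
		`{"type":"tx","data":{"height":5}}`,
		`{"type":"tx","data":{"height":6},"added_later":true}`,
		`{"type":"some_future_type","data":{}}`,
	}
	for _, in := range inputs {
		it, ok, err := selectItem([]byte(in))
		fmt.Println(it.Type, ok, err)
	}
}
```

An indexer written this way keeps working when the node starts publishing new item types or fields, and only needs changes if it wants to consume them.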
  180. ### Open Questions
  181. Given the constraints outlined above, there are important design questions we
  182. must answer to guide any specific changes:
  183. 1. **What is an acceptable probability that, given sufficiently extreme
  184. operational issues, an indexer might miss some number of events?**
  185. There are two parts to this question: One is what constitutes an extreme
  186. operational problem, the other is how likely we are to miss some number of
  187. event items.
  188. - If the consensus is that no event item must ever be missed, no matter how
  189. bad the operational circumstances, then we _must_ accept that indexing can
  190. slow or halt consensus arbitrarily. It is impossible to guarantee complete
  191. index coverage without potentially unbounded delays.
  192. - Otherwise, how much data can we afford to lose and how often? For example,
  193. if we can ensure no event item will be lost unless the indexer halts for
  194. at least five minutes, is that acceptable? What probabilities and time
  195. ranges are reasonable for real production environments?
  196. 2. **What level of operational overhead is acceptable to impose on node
  197. operators to support indexing?**
  198. Are node operators willing to configure and run custom indexers as
  199. sidecar-type processes alongside a node? How much indexer setup above and beyond the
  200. work of setting up the underlying node in isolation is tractable in
  201. production networks?
  202. The answer to this question also informs the question of whether we should
  203. keep an "in-process" indexing option, and to what extent that option needs
  204. to satisfy the suggested design principles.
  205. Relatedly, to what extent do we need to be concerned about the cost of
  206. encoding and sending event items to an external process (e.g., as JSON blobs
  207. or protobuf wire messages)? Given that the node already encodes event items
  208. as JSON for subscription purposes, the overhead would be negligible for the
  209. node itself, but the indexer would have to decode to process the results.
  210. 3. **What (if any) query APIs does the consensus node need to export,
  211. independent of the indexer implementation?**
  212. One typical example is whether the node should be able to answer queries
  213. like "is this transaction ID in a block?" Currently, a node cannot answer
  214. this query _unless_ it runs the built-in KV indexer. Does the node need to
  215. continue to support that query even for nodes that disable the KV indexer,
  216. or which use a custom indexer?
  217. ### Informal Design Intent
  218. The design principles described above implicate several components of the
  219. Tendermint node, beyond just the indexer. In the context of [ADR 075][adr075],
  220. we are re-working the RPC event subscription API to improve some of the UX
  221. issues discussed above for RPC clients. It is our expectation that a solution
  222. for pluggable custom indexing will take advantage of some of the same work.
  223. On that basis, the design approach I am considering for custom indexing looks
  224. something like this (subject to refinement):
  225. 1. A custom indexer runs as a separate process from the node.
  226. 2. The indexer subscribes to event items via the ADR 075 events API.
  227. This means indexers would receive event payloads as JSON rather than
  228. protobuf, but since we already have to support JSON encoding for the RPC
  229. interface anyway, that should not increase complexity for the node.
  230. 3. The existing PostgreSQL indexer is reworked to take this form, and is no
  231. longer built as part of the Tendermint core binary.
  232. We can retain the code in the core repository as a proof-of-concept, or
  233. perhaps create a separate repository with contributed indexers and move it
  234. there.
  235. 4. (Possibly) Deprecate and remove the legacy KV indexer, or disable it by
  236. default. If we decide to remove it, we can also remove the legacy RPC
  237. endpoints for querying the KV indexer.
  238. If we plan to do this, we should also investigate providing a way for
  239. clients to query whether a given transaction ID has landed in a block. That
  240. serves a common need, and currently _only_ works if the KV indexer is
  241. enabled, but could be addressed more simply using the other data a node
  242. already has stored, without having to answer more general queries.
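The first two steps can be sketched as a small out-of-process loop. This is an illustration under stated assumptions, not the ADR 075 API: `EventSource`, `Store`, `Item`, and `runOnce` are hypothetical names, and the sketch assumes only that the events API lets a client resume from a cursor. An in-memory source and store stand in for the node and the index database.

```go
package main

import "fmt"

// Item is a hypothetical event item: an opaque cursor plus a JSON payload.
type Item struct {
	Cursor string
	Data   string
}

// EventSource stands in for the node's events API. Fetch returns the items
// published after the given cursor ("" means from the oldest available).
type EventSource interface {
	Fetch(after string) ([]Item, error)
}

// Store stands in for whatever storage the indexer writes to.
type Store interface {
	Save(Item) error
}

// runOnce fetches one batch after cursor, stores each item, and returns the
// cursor of the last item stored. On error it returns the last good cursor,
// so the caller can retry from there.
func runOnce(src EventSource, st Store, cursor string) (string, error) {
	items, err := src.Fetch(cursor)
	if err != nil {
		return cursor, err
	}
	for _, it := range items {
		if err := st.Save(it); err != nil {
			return cursor, err
		}
		cursor = it.Cursor
	}
	return cursor, nil
}

// memSource is an in-memory EventSource used only for this example.
type memSource []Item

func (s memSource) Fetch(after string) ([]Item, error) {
	if after == "" {
		return s, nil
	}
	for i, it := range s {
		if it.Cursor == after {
			return s[i+1:], nil
		}
	}
	return nil, fmt.Errorf("unknown cursor %q", after)
}

// memStore is an in-memory Store used only for this example.
type memStore struct{ saved []Item }

func (m *memStore) Save(it Item) error {
	m.saved = append(m.saved, it)
	return nil
}

func main() {
	src := memSource{
		{Cursor: "c1", Data: `{"type":"tx"}`},
		{Cursor: "c2", Data: `{"type":"new_block"}`},
	}
	st := &memStore{}
	cursor, err := runOnce(src, st, "")
	fmt.Println(cursor, len(st.saved), err)
}
```

Because the loop runs in its own process and pulls at its own pace, a slow or stalled store delays only the indexer, not the node.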
  243. ## References
  244. - [ADR 065: Custom Event Indexing][adr065]
  245. - [ADR 075: RPC Event Subscription Interface][adr075]
  246. - [Cosmos SDK][sdk]
  247. - [Event subscription RPC][event-rpc]
  248. - [KV transaction indexer][kv-index]
  249. - [Pluggable custom event indexing][i7135] (#7135)
  250. - [PostgreSQL event sink][psql]
  251. - [PostgreSQL database][postgres]
  252. - [Query filter language][query]
  253. - [Stream events to postgres for indexing][i1161] (#1161)
  254. - [Unbuffered event subscription slow down the consensus][i7247] (#7247)
  255. - [`EventSink` interface][esink]
  256. - [`tm-db` library][tmdb]
  257. [adr065]: https://github.com/tendermint/tendermint/blob/master/docs/architecture/adr-065-custom-event-indexing.md
  258. [adr075]: https://github.com/tendermint/tendermint/blob/master/docs/architecture/adr-075-rpc-subscription.md
  259. [esink]: https://pkg.go.dev/github.com/tendermint/tendermint/internal/state/indexer#EventSink
  260. [event-rpc]: https://docs.tendermint.com/master/rpc/#/Websocket/subscribe
  261. [i1161]: https://github.com/tendermint/tendermint/issues/1161
  262. [i7135]: https://github.com/tendermint/tendermint/issues/7135
  263. [i7247]: https://github.com/tendermint/tendermint/issues/7247
  264. [kv-index]: https://github.com/tendermint/tendermint/blob/master/internal/state/indexer/tx/kv
  265. [postgres]: https://postgresql.org/
  266. [psql]: https://github.com/tendermint/tendermint/blob/master/internal/state/indexer/sink/psql
  268. [query]: https://pkg.go.dev/github.com/tendermint/tendermint/internal/pubsub/query/syntax
  269. [sdk]: https://github.com/cosmos/cosmos-sdk
  270. [tmdb]: https://pkg.go.dev/github.com/tendermint/tm-db#DB