=====================
RFC 005: Event System
=====================

Changelog
---------

- 2021-09-17: Initial Draft (@tychoish)

Abstract
--------

The event system within Tendermint, which supports a lot of core
functionality, also represents a major infrastructural liability. As part of
our upcoming review of the RPC interfaces and our ongoing thoughts about
stability and performance, as well as the preparation for Tendermint 1.0, we
should revisit the design and implementation of the event system. This
document discusses both the current state of the system and potential
directions for future improvement.

Background
----------

Current State of Events
~~~~~~~~~~~~~~~~~~~~~~~

The event system makes it possible for clients, both internal and external,
to receive notifications of state replication events, such as new blocks,
new transactions, and validator set changes, as well as intermediate events
during consensus. Because the event system is highly cross-cutting, the
behavior and performance of event publication and subscription have a large
impact on all of Tendermint.

The subscription service is exposed over the RPC interface, but it also powers
indexing (e.g. to an external database) and is the mechanism by which
``BroadcastTxCommit`` waits for transactions to land in a block.

The current pubsub mechanism relies on a couple of buffered channels: primarily
one between all event creators and subscribers, but also one for each
subscription. As a result of this design, given the right combination of slow
subscription consumers, the event system can put backpressure on the consensus
state machine and on message gossiping in the network, causing nodes to lag.

Improvements
~~~~~~~~~~~~

The current system relies on implicit, bounded queues built from buffered
channels. Although thread-safe, this design can force all activity within
Tendermint to serialize when it does not need to. Additionally, timeouts for
subscription consumers, which are related to the implementation of the RPC
layer, may complicate the use of the system.

References
~~~~~~~~~~

- Legacy Implementation

  - `publication of events <https://github.com/tendermint/tendermint/blob/master/libs/pubsub/pubsub.go#L333-L345>`_
  - `send operation <https://github.com/tendermint/tendermint/blob/master/libs/pubsub/pubsub.go#L489-L527>`_
  - `send loop <https://github.com/tendermint/tendermint/blob/master/libs/pubsub/pubsub.go#L381-L402>`_

- Related RFCs

  - `RFC 002: IPC Ecosystem <./rfc-002-ipc-ecosystem.md>`_
  - `RFC 003: Performance Questions <./rfc-003-performance-questions.md>`_

Discussion
----------

Changes to Published Events
~~~~~~~~~~~~~~~~~~~~~~~~~~~

As part of this process, the Tendermint team should study the existing event
types and ensure that there are viable production use cases for subscriptions
to each of them. Intuitively, it seems plausible that some events (e.g.
``TimeoutWait`` or ``NewRoundStep``) may not be usable outside of Tendermint,
and it might make sense to remove them. Certainly, it would be good to avoid
maintaining infrastructure for unused or unhelpful messages indefinitely.

Blocking Subscription
~~~~~~~~~~~~~~~~~~~~~

The blocking subscription mechanism makes it possible to have *send*
operations into the subscription channel be unbuffered (the event processing
channel is still buffered). In the blocking case, delivery of an event to one
blocking subscription can hold up processing of that event for other,
non-blocking subscriptions. The main use case for blocking subscriptions
appears to be ensuring that a transaction has been committed to a block for
``BroadcastTxCommit``. Removing blocking subscriptions entirely, and
potentially finding another way to implement ``BroadcastTxCommit``, could lead
to important simplifications and improvements to throughput without requiring
large changes.

Subscription Identification
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Before `#6386 <https://github.com/tendermint/tendermint/pull/6386>`_, all
subscriptions were identified by the combination of a client ID and a query.
That change made it possible to identify any subscription by its ID alone, but
compatibility with the legacy identification scheme means that there is a good
deal of legacy code, as well as client-side efficiency, that could be
improved.

Pubsub Changes
~~~~~~~~~~~~~~

The pubsub core should be implemented in a way that removes the possibility of
backpressure from the event system impacting the core system, *or* of one
subscription impacting the behavior of another area of the system.
Additionally, because the current system is implemented entirely in terms of a
collection of buffered channels, the event system (and large numbers of
subscriptions) can be a source of memory pressure.

These changes could include:

- explicit cancellation and timeouts propagated from callers (e.g. RPC
  endpoints), which should be done using contexts.
- a subscription system that can spill to disk to avoid putting memory
  pressure on the core behavior of the node (consensus, gossip).
- subscriptions implemented as cursors rather than channels, with either
  condition variables to simulate the existing "push" API or a client-side
  iterator API with some kind of long-polling interface.
