====================
RFC 000: P2P Roadmap
====================

Changelog
---------

- 2021-08-20: Completed initial draft and distributed via a gist
- 2021-08-25: Migrated as an RFC and changed format

Abstract
--------

This document discusses the future of peer network management in Tendermint,
with a particular focus on features, semantics, and a proposed roadmap.
Specifically, we consider libp2p as a toolkit for implementing some
fundamentals.

Background
----------

For the 0.35 release cycle the switching/routing layer of Tendermint was
replaced. This work was done "in place," and produced a version of Tendermint
that was backward-compatible and interoperable with previous versions of the
software. While there are new p2p/peer management constructs in the new
version (e.g. ``PeerManager`` and ``Router``), the main effect of this change
was to simplify the ways that other components within Tendermint interacted
with the peer management layer, and to make it possible for higher-level
components (specifically the reactors) to be used and tested more
independently.

This refactoring, though a major undertaking, was necessary to enable future
development and iteration on this aspect of Tendermint. There are also a
number of potential user-facing features that depend heavily on the p2p layer:
additional transport protocols, transport compression, and improved resilience
to network partitions. These improvements to the modularity, stability, and
reliability of the p2p system will also make ongoing maintenance and feature
development easier in the rest of Tendermint.

Critique of Current Peer-to-Peer Infrastructure
-----------------------------------------------

The current (refactored) P2P stack is an improvement on the previous iteration
(legacy), but as of 0.35, there remains room for improvement in the design and
implementation of the P2P layer.

Some limitations of the current stack include:

- heavy reliance on buffering to avoid backups in the flow of components,
  which is fragile to maintain, can lead to unexpected memory usage patterns,
  and forces the routing layer to make decisions about when messages should
  be discarded.
- the current p2p stack relies on convention (rather than the compiler) to
  enforce the API boundaries and conventions between reactors and the router,
  making it very easy to write "wrong" reactor code or introduce a bad
  dependency.
- the current stack is probably more complex and difficult to maintain because
  the legacy system must coexist with the new components in 0.35. When the
  legacy stack is removed, some simple changes will become possible that could
  reduce the complexity of the new system (e.g. `#6598
  <https://github.com/tendermint/tendermint/issues/6598>`_).
- the current stack encapsulates a lot of information about peers, and makes
  it difficult to expose that information to monitoring/observability tools.
  This general opacity also makes it difficult to interact with the peer
  system from other areas of the code base (e.g. tests, reactors).
- the legacy stack gave operators some control to force the system to dial
  new peers or seed nodes, or to manipulate the topology of the system *in
  situ*. The current stack can't easily provide this, and while the new stack
  may have better behavior, it leaves operators' hands tied.

Some of these issues will be resolved early in the 0.36 cycle, with the
removal of the legacy components. The 0.36 release also provides the
opportunity to make changes to the protocol, as the release will not be
compatible with previous releases.

Areas for Development
---------------------

These sections describe features that may make sense to include in a Phase 2
of a P2P project.

Internal Message Passing
~~~~~~~~~~~~~~~~~~~~~~~~

Currently, there's no provision for intranode communication using the P2P
layer, which means that when two reactors need to interact with each other,
they have to depend on each other's interfaces and initialization. These
interactions (e.g. transitions between blocksync and consensus) could be
changed from procedure calls to message passing.

This is a relatively simple change and could be implemented with the following
components:

- a constant to represent "local" delivery as the ``To`` field on
  ``p2p.Envelope``.
- a special path for routing local messages that doesn't require message
  serialization (protobuf marshalling/unmarshalling).
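The two components above can be sketched as follows. Every name here
(``LocalNodeID``, ``dispatch``, the simplified ``Envelope``) is hypothetical
and illustrates only the shape of the change, not the actual Tendermint API:

```go
package main

import "fmt"

// NodeID and Envelope are simplified stand-ins for the p2p types; the
// real p2p.Envelope carries a protobuf message payload.
type NodeID string

// LocalNodeID is a hypothetical sentinel value meaning "deliver intranode."
const LocalNodeID NodeID = ""

type Envelope struct {
	To      NodeID
	Message any
}

// dispatch routes an envelope: local envelopes skip serialization and go
// straight to an in-process channel; remote ones would be marshalled
// (protobuf) and handed to the transport.
func dispatch(e Envelope, local chan<- Envelope, remote func(Envelope) error) error {
	if e.To == LocalNodeID {
		local <- e // no marshalling round-trip for intranode delivery
		return nil
	}
	return remote(e)
}

func main() {
	local := make(chan Envelope, 1)
	_ = dispatch(Envelope{To: LocalNodeID, Message: "state-transition"}, local, nil)
	fmt.Println((<-local).Message) // state-transition
}
```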

Adding these semantics, particularly in conjunction with synchronous
semantics, provides a solution to dependency graph problems currently present
in the Tendermint codebase, which will simplify development and make it
possible to isolate components for testing.

Eventually, this will also make it possible to have a logical Tendermint node
running in multiple processes or in a collection of containers, although the
use case for this may be debatable.

Synchronous Semantics (Paired Request/Response)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In the current system, all messages are sent with fire-and-forget semantics,
and there's no coupling between a request sent via the p2p layer and a
response. Synchronous semantics would simplify the implementation of the
state and block sync reactors, and make intra-node message passing more
powerful.

For some interactions, like gossiping transactions between the mempools of
different nodes, fire-and-forget semantics make sense, but for other
operations the missing link between requests and responses leads either to
inefficiency when a node fails to respond or becomes unavailable, or to code
that is just difficult to follow.

To support this kind of work, the protocol would need to accommodate some kind
of request/response ID to allow identifying out-of-order responses over a
single connection. Additionally, the programming model of the ``p2p.Channel``
would need to be expanded to accommodate some kind of *future* or similar
paradigm, to make it viable to write reactor code without requiring the
reactor developer to wrestle with lower-level concurrency constructs.
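One possible shape for this, sketched with hypothetical names (the real
``p2p.Channel`` has no such API today): requests are tagged with an ID, and
the caller receives a future, here a receive-only channel, that resolves when
the matching response arrives, even if responses come back out of order:

```go
package main

import (
	"fmt"
	"sync"
)

// pending tracks in-flight requests by ID; all of these names are
// illustrative, not part of the actual Tendermint API.
type pending struct {
	mu      sync.Mutex
	nextID  uint64
	waiting map[uint64]chan string
}

func newPending() *pending {
	return &pending{waiting: make(map[uint64]chan string)}
}

// SendRequest allocates a request ID and returns a future for the response.
func (p *pending) SendRequest() (id uint64, future <-chan string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.nextID++
	ch := make(chan string, 1)
	p.waiting[p.nextID] = ch
	return p.nextID, ch
}

// Deliver matches an incoming response to its request by ID, which is what
// allows out-of-order responses over a single connection.
func (p *pending) Deliver(id uint64, resp string) {
	p.mu.Lock()
	ch, ok := p.waiting[id]
	delete(p.waiting, id)
	p.mu.Unlock()
	if ok {
		ch <- resp
	}
}

func main() {
	p := newPending()
	id1, f1 := p.SendRequest()
	id2, f2 := p.SendRequest()
	// responses arrive out of order, but still pair with their requests
	p.Deliver(id2, "block:2")
	p.Deliver(id1, "block:1")
	fmt.Println(<-f1, <-f2, id1 != id2) // block:1 block:2 true
}
```

A reactor written against this model waits on the future instead of tracking
peer state with its own lower-level concurrency machinery.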

Timeout Handling (QoS)
~~~~~~~~~~~~~~~~~~~~~~

Currently, all timeouts, buffering, and QoS features are handled at the router
layer, and the reactors are implemented in ways that assume/require
asynchronous operation. This both increases the required complexity at the
routing layer, and means that misbehavior at the reactor level is difficult to
detect or attribute. Additionally, the current system provides three main
parameters to control quality of service:

- buffer sizes for channels and queues.
- priorities for channels.
- queue implementation details for shedding load.

These end up being quite coarse controls, and changing the settings is
difficult: because the queues and channels can buffer large numbers of
messages, it can be hard to see the impact of a given change, particularly in
our extant test environment. In general, we should endeavor to:

- set real timeouts, via contexts, on most message send operations, so that
  senders rather than queues can be responsible for timeout logic.
  Additionally, this will make it possible to avoid sending messages during
  shutdown.
- reduce (to the greatest extent possible) the amount of buffering in channels
  and the queues, to more readily surface backpressure and reduce the
  potential for buildup of stale messages.

Stream Based Connection Handling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Currently the transport layer is message-based, which makes sense from a
mental model of how the protocol works, but makes transports and connection
types more difficult to implement: it forces a higher-level view of the
connection and interaction, which is harder to implement for novel transport
types and makes it more likely that message-based caching and rate limiting
will be implemented at the transport layer rather than at a more appropriate
level.

Under a stream-based model, the transport would be responsible for negotiating
the connection and the handshake, and would otherwise behave like a
socket/file descriptor with ``Read`` and ``Write`` methods.

While this was included in the initial design for the new P2P layer, it may be
obviated entirely if the transport and peer layer is replaced with libp2p,
which is primarily stream-based.

Service Discovery
~~~~~~~~~~~~~~~~~

In the current system, Tendermint assumes that all nodes in a network are
largely equivalent, and nodes tend to be "chatty", making many requests of
large numbers of peers and waiting for peers to (hopefully) respond. While
this works and has allowed Tendermint to get to a certain point, it both
produces a theoretical scaling bottleneck and makes it harder to test and
verify components of the system.

In addition to a peer's identity and connection information, peers should be
able to advertise a number of services or capabilities, and node operators or
developers should be able to specify peer capability requirements (e.g.
target at least <x>-percent of peers with <y> capability).

Because these capabilities may be useful in selecting peers to send messages
to, it may make sense to extend Tendermint's message addressing capability to
allow reactors to send messages to groups of peers based on role, rather than
only allowing addressing to one or all peers.

Having a good service discovery mechanism may pair well with the synchronous
semantics (request/response) work, as it allows reactors to "make a request of
a peer with <x> capability and wait for the response," rather than forcing the
reactors to track the capabilities or state of specific peers.
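A minimal sketch of what capability-based selection might look like, with
entirely hypothetical types (nothing here exists in the current codebase):

```go
package main

import "fmt"

// Peer pairs identity with an advertised capability set; in a real
// design the capabilities would come from peer exchange or handshakes.
type Peer struct {
	ID           string
	Capabilities map[string]bool
}

// withCapability filters known peers down to those advertising the given
// capability, which could back group addressing such as "send to all
// state-sync providers."
func withCapability(peers []Peer, capability string) []string {
	var ids []string
	for _, p := range peers {
		if p.Capabilities[capability] {
			ids = append(ids, p.ID)
		}
	}
	return ids
}

func main() {
	peers := []Peer{
		{ID: "a", Capabilities: map[string]bool{"statesync": true}},
		{ID: "b", Capabilities: map[string]bool{"mempool": true}},
		{ID: "c", Capabilities: map[string]bool{"statesync": true}},
	}
	fmt.Println(withCapability(peers, "statesync")) // [a c]
}
```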

Solutions
---------

Continued Homegrown Implementation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The current peer system is homegrown and is conceptually compatible with the
needs of the project, and while there are limitations to the system, the p2p
layer is not (currently, as of 0.35) a major source of bugs or friction during
development.

However, the current implementation makes a number of allowances for
interoperability, and there are a collection of iterative improvements that
should be considered in the next couple of releases. To maintain the current
implementation, upcoming work would include:

- changing the ``Transport`` mechanism to facilitate easier implementations.
- implementing different ``Transport`` handlers to be able to manage peer
  connections using different protocols (e.g. QUIC, etc.).
- entirely removing the constructs and implementations of the legacy peer
  implementation.
- establishing and enforcing clearer chains of responsibility for connection
  establishment (e.g. handshaking, setup), which is currently shared between
  three components.
- reporting better metrics regarding the state of peers and network
  connectivity, which are opaque outside of the system. This is constrained
  at the moment as a side effect of the split responsibility for connection
  establishment.
- extending the PEX system to include service information, so that nodes in
  the network need not be homogeneous.

While maintaining a bespoke peer management layer would seem to distract from
development of core functionality, the truth is that (once the legacy code is
removed) the scope of the peer layer is relatively small from a maintenance
perspective, and having control at this layer might actually afford the
project the ability to iterate more rapidly on some features.

LibP2P
~~~~~~

LibP2P provides components that, approximately, account for the
``PeerManager`` and ``Transport`` components of the current (new) P2P stack.
The Go APIs seem reasonable, and being able to externalize the implementation
details of peer and connection management seems like it could provide a lot
of benefits, particularly in supporting a more active ecosystem.

In general the API provides the kind of stream-based, multi-protocol,
idiomatic baseline needed for implementing a peer layer. Additionally,
because libp2p handles peer exchange and connection management at a lower
level, adopting it would make it possible to remove a good deal of code in
favor of just using libp2p. Having said that, Tendermint's P2P layer covers a
greater scope (e.g. message routing to different peers), and that layer is
something Tendermint might want to retain.

There are a number of unknowns that require more research, including how much
of a peer database the Tendermint engine itself needs to maintain in order to
support higher-level operations (consensus, statesync); it might be the case
that our internal systems need to know much less about peers than otherwise
specified. Similarly, the current system has a notion of peer scoring that
cannot be communicated to libp2p. This may be fine, as peer scoring is only
used to support peer exchange (PEX), which would become a property of libp2p
and would no longer be expressed in its current higher-level form.

In general, the effort to switch to libp2p would involve:

- timing it during an appropriate protocol-breaking window, as it doesn't seem
  viable to support both libp2p *and* the current p2p protocol.
- providing some in-memory testing network to support the use case that the
  current ``p2p.MemoryNetwork`` provides.
- re-homing the ``p2p.Router`` implementation on top of libp2p components to
  be able to maintain the current reactor implementations.

Open questions include:

- How much local buffering should we be doing? We should figure out what the
  expected behavior of libp2p is for QoS-type functionality, and whether our
  requirements mean we would need to implement this ourselves on top of it.
- If Tendermint were to use libp2p, how would libp2p's stability guarantees
  (protocol, etc.) impact/constrain Tendermint's stability guarantees?
- What kind of introspection does libp2p provide, and to what extent would
  this change or constrain the kind of observability that Tendermint is able
  to provide?
- How do efforts to select "the best" (healthy, close, well-behaving, etc.)
  peers work out if Tendermint is not maintaining a local peer database?
- Would adding additional higher-level semantics (internal message passing,
  request/response pairs, service discovery, etc.) facilitate removing some
  of the direct linkages between constructs/components in the system, and
  reduce the need for Tendermint nodes to maintain state about their peers?

References
----------

- `Tracking Ticket for P2P Refactor Project <https://github.com/tendermint/tendermint/issues/5670>`_
- `ADR 61: P2P Refactor Scope <../architecture/adr-061-p2p-refactor-scope.md>`_
- `ADR 62: P2P Architecture and Abstraction <../architecture/adr-062-p2p-architecture.md>`_