# ADR 073: Adopt LibP2P

## Changelog

- 2021-11-02: Initial Draft (@tychoish)

## Status

Proposed.

## Context

As part of the 0.35 development cycle, the Tendermint team completed
the first phase of the work described in ADRs 61 and 62, which included a
large-scale refactoring of the reactors and the p2p message
routing. This replaced the switch and many of the other legacy
components without breaking protocol or network-level
interoperability, but left the legacy connection/socket handling code
in place.

Following the release, the team re-examined the state of the code
and the design, as well as Tendermint's requirements. The notes
from that process are available in the [P2P Roadmap
RFC][rfc].

This ADR supersedes the decisions made in ADRs 61 and 62, but
builds on the completed portions of this work. Previously, peer
management, message handling, and the higher-level business logic
(e.g., "the reactors") were intermingled, and core elements of the
p2p system were responsible for orchestrating higher-level business
logic. Refactoring the legacy components made it more obvious that
this entanglement of responsibilities had an outsized influence on the
entire implementation, making it difficult to iterate within the
current abstractions. As a result, it would not be viable to maintain
interoperability with legacy systems while also achieving many of our
broader objectives.

libp2p is a thoroughly specified peer-to-peer networking stack,
designed specifically for systems such as ours. Adopting libp2p as the
basis of Tendermint will allow the Tendermint team to focus more of
their time on other differentiating aspects of the system, and will
make it possible for the ecosystem as a whole to take advantage of the
tooling and ongoing work of the libp2p platform.

## Alternative Approaches

As discussed in the [P2P Roadmap RFC][rfc], the primary alternative would be to
continue development of Tendermint's home-grown peer-to-peer
layer. While that would give the Tendermint team maximal control
over the peer system, the current design is unexceptional on its
own merits, and the prospective maintenance burden for this system
exceeds our tolerances for the medium term.

Tendermint can and should differentiate itself not on the basis of
its networking implementation or peer management tools, but by providing
a consistent operator experience, a battle-tested consensus algorithm,
and an ergonomic user experience.

## Decision

Tendermint will adopt libp2p during the 0.37 development cycle,
replacing the bespoke Tendermint P2P stack. This will remove the
`Endpoint`, `Transport`, `Connection`, and `PeerManager` abstractions
and leave the reactors, `p2p.Router`, and `p2p.Channel`
abstractions in place.

libp2p may obviate the need for a dedicated peer exchange (PEX)
reactor, which would in turn also obviate the need for a dedicated
seed mode. If this is the case, then all of this functionality would
be removed.

If it turns out (based on the advice of Protocol Labs) that it makes
sense to maintain separate pubsub or gossipsub topics
per message type, then the `Router` abstraction could also
be entirely subsumed.

## Detailed Design

### Implementation Changes

The seams in the P2P implementation between the higher-level
constructs (reactors), the routing layer (`Router`), and the lower-level
connection and peer management code make this operation
relatively straightforward to implement. A key goal in this design
is to minimize the impact on the reactors (ideally leaving them
unchanged) and to completely remove the lower-level components
(e.g., `Transport`, `Connection`, and `PeerManager`) using the
separation afforded by the `Router` layer. The current state of the
code makes these changes relatively surgical and limited to a small
number of methods:

- `p2p.Router.OpenChannel` will still return a `Channel` structure,
  which will continue to serve as a pipe between the reactors and the
  `Router`. The implementation will no longer need its queue
  implementation, and will instead start goroutines that
  are responsible for routing the messages from the channel to libp2p
  primitives, replacing the current `p2p.Router.routeChannel`.
- The current `p2p.Router.dialPeers` and `p2p.Router.acceptPeers`
  methods are responsible for establishing outbound and inbound
  connections, respectively. These methods will be removed, along with
  `p2p.Router.openConnection`, and the libp2p connection manager will
  be responsible for maintaining network connectivity.
- The `p2p.Channel` interface will change to replace Go
  channels with a method-based interface for sending messages.
  New methods on this object will take contexts to support safe
  cancellation, return errors, and block rather than
  running asynchronously. The `Out` channel, through which
  reactors send messages to peers, will be replaced by a `Send`
  method, and the `Error` channel will be replaced by an `Error`
  method.
- Reactors will be passed an interface that will allow them to
  access peer information from libp2p. This will supplant the
  `p2p.PeerUpdates` subscription.
- Add some kind of heartbeat message at the application level
  (e.g., with a reactor), potentially connected to libp2p's DHT, to be
  used by reactors for service discovery, message targeting, or other
  features.
- Replace the existing/legacy handshake protocol with [Noise](http://www.noiseprotocol.org/noise.html).
  94. - Add some kind of heartbeat message at the application level
  95. (e.g. with a reactor,) potentially connected to libp2p's DHT to be
  96. used by reactors for service discovery, message targeting, or other
  97. features.
  98. - Replace the existing/legacy handshake protocol with [Noise](http://www.noiseprotocol.org/noise.html).
  99. This project will initially use the TCP-based transport protocols within
  100. libp2p. QUIC is also available as an option that we may implement later.
  101. We will not support mixed networks in the initial release, but will
  102. revisit that possibility later if there is a demonstrated need.
  103. ### Upgrade and Compatibility
  104. Because the routers and all current P2P libraries are `internal`
  105. packages and not part of the public API, the only changes to the public
  106. API surface area of Tendermint will be different configuration
  107. file options, replacing the current P2P options with options relevant
  108. to libp2p.
  109. However, it will not be possible to run a network with both networking
  110. stacks active at once, so the upgrade to the version of Tendermint
  111. will need to be coordinated between all nodes of the network. This is
  112. consistent with the expectations around upgrades for Tendermint moving
  113. forward, and will help manage both the complexity of the project and
  114. the implementation timeline.
  115. ## Open Questions
  116. - What is the role of Protocol Labs in the implementation of libp2p in
  117. tendermint, both during the initial implementation and on an ongoing
  118. basis thereafter?
  119. - Should all P2P traffic for a given node be pushed to a single topic,
  120. so that a topic maps to a specific ChainID, or should
  121. each reactor (or type of message) have its own topic? How many
  122. topics can a libp2p network support? Is there testing that validates
  123. the capabilities?
  124. - Tendermint presently provides a very coarse QoS-like functionality
  125. using priorities based on message-type.
  126. This intuitively/theoretically ensures that evidence and consensus
  127. messages don't get starved by blocksync/statesync messages. It's
  128. unclear if we can or should attempt to replicate this with libp2p.
  129. - What kind of QoS functionality does libp2p provide and what kind of
  130. metrics does libp2p provide about it's QoS functionality?
  131. - Is it possible to store additional (and potentially arbitrary)
  132. information into the DHT as part of the heartbeats between nodes,
  133. such as the latest height, and then access that in the
  134. reactors. How frequently can the DHT be updated?
  135. - Does it make sense to have reactors continue to consume inbound
  136. messages from a Channel (`In`) or is there another interface or
  137. pattern that we should consider?
  138. - We should avoid exposing Go channels when possible, and likely
  139. some kind of alternate iterator likely makes sense for processing
  140. messages within the reactors.
  141. - What are the security and protocol implications of tracking
  142. information from peer heartbeats and exposing that to reactors?
  143. - How much (or how little) configuration can Tendermint provide for
  144. libp2p, particularly on the first release?
  145. - In general, we should not support byo-functionality for libp2p
  146. components within Tendermint, and reduce the configuration surface
  147. area, as much as possible.
  148. - What are the best ways to provide request/response semantics for
  149. reactors on top of libp2p? Will it be possible to add
  150. request/response semantics in a future release or is there
  151. anticipatory work that needs to be done as part of the initial
  152. release?
  153. ## Consequences
  154. ### Positive
  155. - Reduce the maintenance burden for the Tendermint Core team by
  156. removing a large swath of legacy code that has proven to be
  157. difficult to modify safely.
  158. - Remove the responsibility for maintaining and developing the entire
  159. peer management system (p2p) and stack.
  160. - Provide users with a more stable peer and networking system,
  161. Tendermint can improve operator experience and network stability.
  162. ### Negative
  163. - By deferring to library implementations for peer management and
  164. networking, Tendermint loses some flexibility for innovating at the
  165. peer and networking level. However, Tendermint should be innovating
  166. primarily at the consensus layer, and libp2p does not preclude
  167. optimization or development in the peer layer.
  168. - Libp2p is a large dependency and Tendermint would become dependent
  169. upon Protocol Labs' release cycle and prioritization for bug
  170. fixes. If this proves onerous, it's possible to maintain a vendor
  171. fork of relevant components as needed.
  172. ### Neutral
  173. - N/A

## References

- [ADR 61: P2P Refactor Scope][adr61]
- [ADR 62: P2P Architecture][adr62]
- [P2P Roadmap RFC][rfc]

[adr61]: ./adr-061-p2p-refactor-scope.md
[adr62]: ./adr-062-p2p-architecture.md
[rfc]: ../rfc/rfc-000-p2p.rst