|
|
@ -0,0 +1,235 @@ |
|
|
|
# ADR 073: Adopt LibP2P |
|
|
|
|
|
|
|
## Changelog |
|
|
|
|
|
|
|
- 2021-11-02: Initial Draft (@tychoish) |
|
|
|
|
|
|
|
## Status |
|
|
|
|
|
|
|
Proposed. |
|
|
|
|
|
|
|
## Context |
|
|
|
|
|
|
|
|
|
|
|
As part of the 0.35 development cycle, the Tendermint team completed |
|
|
|
the first phase of the work described in ADRs 61 and 62, which included a |
|
|
|
large scale refactoring of the reactors and the p2p message |
|
|
|
routing. This replaced the switch and many of the other legacy |
|
|
|
components without breaking protocol or network-level |
|
|
|
interoperability and left the legacy connection/socket handling code. |
|
|
|
|
|
|
|
Following the release, the team has reexamined the state of the code |
|
|
|
and the design, as well as Tendermint's requirements. The notes |
|
|
|
from that process are available in the [P2P Roadmap |
|
|
|
RFC][rfc]. |
|
|
|
|
|
|
|
This ADR supersedes the decisions made in ADRs 60 and 61, but |
|
|
|
builds on the completed portions of this work. Previously, the |
|
|
|
boundaries of peer management, message handling, and the higher level |
|
|
|
business logic (e.g., "the reactors") were intermingled, and core |
|
|
|
elements of the p2p system were responsible for the orchestration of |
|
|
|
higher-level business logic. Refactoring the legacy components |
|
|
|
made it more obvious that this entanglement of responsibilities |
|
|
|
had outsized influence on the entire implementation, making |
|
|
|
it difficult to iterate within the current abstractions. |
|
|
|
It would not be viable to maintain interoperability with legacy |
|
|
|
systems while also achieving many of our broader objectives. |
|
|
|
|
|
|
|
LibP2P is a thoroughly-specified implementation of a peer-to-peer |
|
|
|
networking stack, designed specifically for systems such as |
|
|
|
ours. Adopting LibP2P as the basis of Tendermint will allow the |
|
|
|
Tendermint team to focus more of their time on other differentiating |
|
|
|
aspects of the system, and make it possible for the ecosystem as a |
|
|
|
whole to take advantage of tooling and efforts of the LibP2P |
|
|
|
platform. |
|
|
|
|
|
|
|
## Alternative Approaches |
|
|
|
|
|
|
|
As discussed in the [P2P Roadmap RFC][rfc], the primary alternative would be to |
|
|
|
continue development of Tendermint's home-grown peer-to-peer |
|
|
|
layer. While that would give the Tendermint team maximal control |
|
|
|
over the peer system, the current design is unexceptional on its |
|
|
|
own merits, and the prospective maintenance burden for this system |
|
|
|
exceeds our tolerances for the medium term. |
|
|
|
|
|
|
|
Tendermint can and should differentiate itself not on the basis of |
|
|
|
its networking implementation or peer management tools, but providing |
|
|
|
a consistent operator experience, a battle-tested consensus algorithm, |
|
|
|
and an ergonomic user experience. |
|
|
|
|
|
|
|
## Decision |
|
|
|
|
|
|
|
Tendermint will adopt libp2p during the 0.37 development cycle, |
|
|
|
replacing the bespoke Tendermint P2P stack. This will remove the |
|
|
|
`Endpoint`, `Transport`, `Connection`, and `PeerManager` abstractions |
|
|
|
and leave the reactors, `p2p.Router` and `p2p.Channel` |
|
|
|
abstractions. |
|
|
|
|
|
|
|
LibP2P may obviate the need for a dedicated peer exchange (PEX) |
|
|
|
reactor, which would also in turn obviate the need for a dedicated |
|
|
|
seed mode. If this is the case, then all of this functionality would |
|
|
|
be removed. |
|
|
|
|
|
|
|
If it turns out (based on the advice of Protocol Labs) that it makes |
|
|
|
sense to maintain separate pubsub or gossipsub topics |
|
|
|
per-message-type, then the `Router` abstraction could also |
|
|
|
be entirely subsumed. |
|
|
|
|
|
|
|
## Detailed Design |
|
|
|
|
|
|
|
### Implementation Changes |
|
|
|
|
|
|
|
The seams in the P2P implementation between the higher level |
|
|
|
constructs (reactors), the routing layer (`Router`) and the lower |
|
|
|
level connection and peer management code make this operation |
|
|
|
relatively straightforward to implement. A key |
|
|
|
goal in this design is to minimize the impact on the reactors |
|
|
|
(potentially entirely,) and completely remove the lower level |
|
|
|
components (e.g., `Transport`, `Connection` and `PeerManager`) using the |
|
|
|
separation afforded by the `Router` layer. The current state of the |
|
|
|
code makes these changes relatively surgical, and limited to a small |
|
|
|
number of methods: |
|
|
|
|
|
|
|
- `p2p.Router.OpenChannel` will still return a `Channel` structure |
|
|
|
which will continue to serve as a pipe between the reactors and the |
|
|
|
`Router`. The implementation will no longer need the queue |
|
|
|
implementation, and will instead start goroutines that |
|
|
|
are responsible for routing the messages from the channel to libp2p |
|
|
|
fundamentals, replacing the current `p2p.Router.routeChannel`. |
|
|
|
|
|
|
|
- The current `p2p.Router.dialPeers` and `p2p.Router.acceptPeers`, |
|
|
|
are responsible for establishing outbound and inbound connections, |
|
|
|
respectively. These methods will be removed, along with |
|
|
|
`p2p.Router.openConnection`, and the libp2p connection manager will |
|
|
|
be responsible for maintaining network connectivity. |
|
|
|
|
|
|
|
- The `p2p.Channel` interface will change to replace Go |
|
|
|
channels with a more functional interface for sending messages. |
|
|
|
New methods on this object will take contexts to support safe |
|
|
|
cancellation, and return errors, and will block rather than |
|
|
|
running asynchronously. The `Out` channel through which |
|
|
|
reactors send messages to Peers, will be replaced by a `Send` |
|
|
|
method, and the Error channel will be replaced by an `Error` |
|
|
|
method. |
|
|
|
|
|
|
|
- Reactors will be passed an interface that will allow them to |
|
|
|
access Peer information from libp2p. This will supplant the |
|
|
|
`p2p.PeerUpdates` subscription. |
|
|
|
|
|
|
|
- Add some kind of heartbeat message at the application level |
|
|
|
(e.g. with a reactor,) potentially connected to libp2p's DHT to be |
|
|
|
used by reactors for service discovery, message targeting, or other |
|
|
|
features. |
|
|
|
|
|
|
|
- Replace the existing/legacy handshake protocol with [Noise](http://www.noiseprotocol.org/noise.html). |
|
|
|
|
|
|
|
This project will initially use the TCP-based transport protocols within |
|
|
|
libp2p. QUIC is also available as an option that we may implement later. |
|
|
|
We will not support mixed networks in the initial release, but will |
|
|
|
revisit that possibility later if there is a demonstrated need. |
|
|
|
|
|
|
|
### Upgrade and Compatibility |
|
|
|
|
|
|
|
Because the routers and all current P2P libraries are `internal` |
|
|
|
packages and not part of the public API, the only changes to the public |
|
|
|
API surface area of Tendermint will be different configuration |
|
|
|
file options, replacing the current P2P options with options relevant |
|
|
|
to libp2p. |
|
|
|
|
|
|
|
However, it will not be possible to run a network with both networking |
|
|
|
stacks active at once, so the upgrade to the version of Tendermint |
|
|
|
will need to be coordinated between all nodes of the network. This is |
|
|
|
consistent with the expectations around upgrades for Tendermint moving |
|
|
|
forward, and will help manage both the complexity of the project and |
|
|
|
the implementation timeline. |
|
|
|
|
|
|
|
## Open Questions |
|
|
|
|
|
|
|
- What is the role of Protocol Labs in the implementation of libp2p in |
|
|
|
tendermint, both during the initial implementation and on an ongoing |
|
|
|
basis thereafter? |
|
|
|
|
|
|
|
- Should all P2P traffic for a given node be pushed to a single topic, |
|
|
|
so that a topic maps to a specific ChainID, or should |
|
|
|
each reactor (or type of message) have its own topic? How many |
|
|
|
topics can a libp2p network support? Is there testing that validates |
|
|
|
the capabilities? |
|
|
|
|
|
|
|
- Tendermint presently provides a very coarse QoS-like functionality |
|
|
|
using priorities based on message-type. |
|
|
|
This intuitively/theoretically ensures that evidence and consensus |
|
|
|
messages don't get starved by blocksync/statesync messages. It's |
|
|
|
unclear if we can or should attempt to replicate this with libp2p. |
|
|
|
|
|
|
|
- What kind of QoS functionality does libp2p provide and what kind of |
|
|
|
metrics does libp2p provide about it's QoS functionality? |
|
|
|
|
|
|
|
- Is it possible to store additional (and potentially arbitrary) |
|
|
|
information into the DHT as part of the heartbeats between nodes, |
|
|
|
such as the latest height, and then access that in the |
|
|
|
reactors. How frequently can the DHT be updated? |
|
|
|
|
|
|
|
- Does it make sense to have reactors continue to consume inbound |
|
|
|
messages from a Channel (`In`) or is there another interface or |
|
|
|
pattern that we should consider? |
|
|
|
|
|
|
|
- We should avoid exposing Go channels when possible, and likely |
|
|
|
some kind of alternate iterator likely makes sense for processing |
|
|
|
messages within the reactors. |
|
|
|
|
|
|
|
- What are the security and protocol implications of tracking |
|
|
|
information from peer heartbeats and exposing that to reactors? |
|
|
|
|
|
|
|
- How much (or how little) configuration can Tendermint provide for |
|
|
|
libp2p, particularly on the first release? |
|
|
|
|
|
|
|
- In general, we should not support byo-functionality for libp2p |
|
|
|
components within Tendermint, and reduce the configuration surface |
|
|
|
area, as much as possible. |
|
|
|
|
|
|
|
- What are the best ways to provide request/response semantics for |
|
|
|
reactors on top of libp2p? Will it be possible to add |
|
|
|
request/response semantics in a future release or is there |
|
|
|
anticipatory work that needs to be done as part of the initial |
|
|
|
release? |
|
|
|
|
|
|
|
## Consequences |
|
|
|
|
|
|
|
### Positive |
|
|
|
|
|
|
|
- Reduce the maintenance burden for the Tendermint Core team by |
|
|
|
removing a large swath of legacy code that has proven to be |
|
|
|
difficult to modify safely. |
|
|
|
|
|
|
|
- Remove the responsibility for maintaining and developing the entire |
|
|
|
peer management system (p2p) and stack. |
|
|
|
|
|
|
|
- Provide users with a more stable peer and networking system, |
|
|
|
Tendermint can improve operator experience and network stability. |
|
|
|
|
|
|
|
### Negative |
|
|
|
|
|
|
|
- By deferring to library implementations for peer management and |
|
|
|
networking, Tendermint loses some flexibility for innovating at the |
|
|
|
peer and networking level. However, Tendermint should be innovating |
|
|
|
primarily at the consensus layer, and libp2p does not preclude |
|
|
|
optimization or development in the peer layer. |
|
|
|
|
|
|
|
- Libp2p is a large dependency and Tendermint would become dependent |
|
|
|
upon Protocol Labs' release cycle and prioritization for bug |
|
|
|
fixes. If this proves onerous, it's possible to maintain a vendor |
|
|
|
fork of relevant components as needed. |
|
|
|
|
|
|
|
### Neutral |
|
|
|
|
|
|
|
- N/A |
|
|
|
|
|
|
|
## References |
|
|
|
|
|
|
|
- [ADR 61: P2P Refactor Scope][adr61] |
|
|
|
- [ADR 62: P2P Architecture][adr62] |
|
|
|
- [P2P Roadmap RFC][rfc] |
|
|
|
|
|
|
|
[adr61]: ./adr-061-p2p-refactor-scope.md |
|
|
|
[adr62]: ./adr-062-p2p-architecture.md |
|
|
|
[rfc]: ../rfc/rfc-000-p2p.rst |