====================
RFC 000: P2P Roadmap
====================

Changelog
---------

- 2021-08-20: Completed initial draft and distributed via a gist
- 2021-08-25: Migrated as an RFC and changed format

Abstract
--------

This document discusses the future of peer network management in Tendermint,
with a particular focus on features, semantics, and a proposed roadmap.
Specifically, we consider libp2p as a toolkit for implementing some of the
fundamentals.

Background
----------

For the 0.35 release cycle the switching/routing layer of Tendermint was
replaced. This work was done "in place," and produced a version of Tendermint
that was backward-compatible and interoperable with previous versions of the
software. While there are new p2p/peer management constructs in the new
version (e.g. ``PeerManager`` and ``Router``), the main effect of this change
was to simplify the ways that other components within Tendermint interacted
with the peer management layer, and to make it possible for higher-level
components (specifically the reactors) to be used and tested more
independently.

This refactoring, while a major undertaking, was necessary to enable future
development and iteration on this aspect of Tendermint. A number of
potential user-facing features also depend heavily on the p2p layer:
additional transport protocols, transport compression, and improved
resilience to network partitions. These improvements to the modularity,
stability, and reliability of the p2p system will also make ongoing
maintenance and feature development easier in the rest of Tendermint.

Critique of Current Peer-to-Peer Infrastructure
-----------------------------------------------

The current (refactored) P2P stack is an improvement on the previous
("legacy") iteration, but as of 0.35 there remains room for improvement in
the design and implementation of the P2P layer.

Some limitations of the current stack include:

- heavy reliance on buffering to avoid backups in the flow of messages
  between components, which is fragile to maintain, can lead to unexpected
  memory usage patterns, and forces the routing layer to make decisions
  about when messages should be discarded.

- the current p2p stack relies on convention (rather than the compiler) to
  enforce the API boundaries between reactors and the router, making it
  very easy to write "wrong" reactor code or introduce a bad dependency.

- the current stack is likely more complex and difficult to maintain than
  necessary because the legacy system must coexist with the new components
  in 0.35. When the legacy stack is removed, some simple changes become
  possible that could reduce the complexity of the new system (e.g. `#6598
  <https://github.com/tendermint/tendermint/issues/6598>`_).

- the current stack encapsulates a lot of information about peers, and
  makes it difficult to expose that information to monitoring/observability
  tools. This general opacity also makes it difficult to interact with the
  peer system from other areas of the code base (e.g. tests, reactors).

- the legacy stack gave operators some control to force the system to dial
  new peers or seed nodes, or to manipulate the topology of the system *in
  situ*. The current stack can't easily provide this, and while the new
  stack may have better default behavior, it leaves operators' hands tied.

Some of these issues will be resolved early in the 0.36 cycle, with the
removal of the legacy components.

The 0.36 release also provides the opportunity to make changes to the
protocol, as the release will not be compatible with previous releases.

Areas for Development
---------------------

The following sections describe features that may make sense to include in
Phase 2 of the P2P project.

Internal Message Passing
~~~~~~~~~~~~~~~~~~~~~~~~

Currently, there is no provision for intranode communication using the P2P
layer, which means that when two reactors need to interact with each other,
they must depend on each other's interfaces and initialization. Changing
these interactions (e.g. transitions between blocksync and consensus) from
procedure calls to message passing would remove these direct dependencies.

This is a relatively simple change and could be implemented with the
following components (a sketch follows the list):

- a constant to represent "local" delivery as the ``To`` field on
  ``p2p.Envelope``.

- a special path for routing local messages that doesn't require message
  serialization (protobuf marshalling/unmarshalling).

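As a minimal sketch, with simplified stand-ins for the real ``p2p.Envelope``
and ``Router`` types (the ``LocalNodeID`` sentinel and the two delivery
channels are hypothetical, not existing Tendermint API):

.. code-block:: go

   package p2p

   // NodeID identifies a peer; one reserved value marks intranode delivery.
   type NodeID string

   // LocalNodeID is a hypothetical sentinel To value meaning "deliver to a
   // reactor within this node".
   const LocalNodeID NodeID = "local"

   // Envelope carries a message and its destination.
   type Envelope struct {
       To      NodeID
       Message interface{} // normally a proto.Message
   }

   // Router is reduced here to its two delivery paths.
   type Router struct {
       local  chan Envelope // in-process delivery, no serialization
       remote chan Envelope // existing path: marshal and send to a peer
   }

   func (r *Router) route(e Envelope) {
       if e.To == LocalNodeID {
           // Fast path: hand the message object directly to the
           // destination reactor, skipping protobuf round-trips.
           r.local <- e
           return
       }
       r.remote <- e
   }
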
Adding these semantics, particularly in conjunction with synchronous
semantics, provides a solution to dependency graph problems currently
present in the Tendermint codebase, which will simplify development and
make it possible to isolate components for testing.

Eventually, this will also make it possible to have a logical Tendermint
node running in multiple processes or in a collection of containers,
although the use case for this may be debatable.

Synchronous Semantics (Paired Request/Response)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In the current system, all messages are sent with fire-and-forget
semantics, and there is no coupling between a request sent via the p2p
layer and its response. Paired request/response semantics would simplify
the implementation of the state sync and block sync reactors, and make
intra-node message passing more powerful.

For some interactions, like gossiping transactions between the mempools of
different nodes, fire-and-forget semantics make sense, but for other
operations the missing link between requests and responses leads either to
inefficiency when a node fails to respond or becomes unavailable, or to
code that is simply difficult to follow.

To support this kind of work, the protocol would need to accommodate some
kind of request/response ID to allow identifying out-of-order responses
over a single connection. Additionally, the programming model of the
``p2p.Channel`` would need to be expanded to accommodate some kind of
*future* or similar paradigm, so that reactor code can be written without
requiring reactor developers to wrestle with lower-level concurrency
constructs. A sketch of such an API follows.

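As an illustration only (``RequestID``, ``pendingMap``, and the
channel-as-future shape are assumptions rather than a settled design), the
bookkeeping for paired requests and responses might look like:

.. code-block:: go

   package p2p

   import (
       "context"
       "sync"
   )

   // RequestID matches a response to its originating request on a single
   // connection, even when responses arrive out of order.
   type RequestID uint64

   // Response wraps whatever the remote peer sent back for a RequestID.
   type Response struct {
       Message interface{}
   }

   // pendingMap tracks in-flight requests awaiting responses.
   type pendingMap struct {
       mu      sync.Mutex
       nextID  RequestID
       pending map[RequestID]chan Response
   }

   // SendRequest allocates an ID, asks the caller to transmit the tagged
   // message, and returns a future-like channel that yields the matching
   // response, or is closed when ctx expires.
   func (p *pendingMap) SendRequest(ctx context.Context, send func(RequestID)) <-chan Response {
       p.mu.Lock()
       p.nextID++
       id := p.nextID
       ch := make(chan Response, 1)
       p.pending[id] = ch
       p.mu.Unlock()

       send(id) // the caller marshals and sends the message carrying id

       go func() {
           <-ctx.Done()
           p.mu.Lock()
           if _, ok := p.pending[id]; ok {
               delete(p.pending, id)
               close(ch) // timeout or cancellation: wake the waiter
           }
           p.mu.Unlock()
       }()
       return ch
   }

   // deliver routes an incoming response to the waiting caller, if any.
   func (p *pendingMap) deliver(id RequestID, res Response) {
       p.mu.Lock()
       defer p.mu.Unlock()
       if ch, ok := p.pending[id]; ok {
           ch <- res // buffered; never blocks
           delete(p.pending, id)
       }
   }

A reactor could then write ``res, ok := <-ch`` and treat a closed channel as
a timed-out peer, with no explicit lower-level concurrency handling.
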
Timeout Handling (QoS)
~~~~~~~~~~~~~~~~~~~~~~

Currently, all timeouts, buffering, and QoS features are handled at the
router layer, and the reactors are implemented in ways that assume/require
asynchronous operation. This both increases the required complexity at the
routing layer and means that misbehavior at the reactor level is difficult
to detect or attribute. Additionally, the current system provides three
main parameters to control quality of service:

- buffer sizes for channels and queues

- priorities for channels

- queue implementation details for shedding load

These end up being quite coarse controls, and changing the settings is
difficult: because the queues and channels can buffer large numbers of
messages, it can be hard to see the impact of a given change, particularly
in our extant test environment. In general, we should endeavor to:

- set real timeouts, via contexts, on most message send operations, so that
  senders rather than queues are responsible for timeout logic (see the
  sketch after this list). Additionally, this will make it possible to
  avoid sending messages during shutdown.

- reduce (to the greatest extent possible) the amount of buffering in
  channels and queues, to more readily surface backpressure and reduce the
  potential for buildup of stale messages.

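A minimal sketch of a context-aware send, reusing the ``Envelope`` type from
the earlier sketch (``sendWithTimeout`` and the two-second deadline are
illustrative, not proposed defaults):

.. code-block:: go

   package p2p

   import (
       "context"
       "time"
   )

   // sendWithTimeout makes the caller, not an internal queue, own the
   // timeout policy for a send.
   func sendWithTimeout(ctx context.Context, out chan<- Envelope, e Envelope) error {
       ctx, cancel := context.WithTimeout(ctx, 2*time.Second)
       defer cancel()

       select {
       case out <- e:
           return nil
       case <-ctx.Done():
           // The sender observes backpressure immediately, rather than
           // the message languishing in a deep buffer; cancelling the
           // root context during shutdown also stops new sends here.
           return ctx.Err()
       }
   }
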
Stream Based Connection Handling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Currently the transport layer is message based, which matches the mental
model of how the protocol works, but makes transports and connection types
more difficult to implement: it forces a higher-level view of the
connection onto each transport, which makes novel transport types harder to
implement and makes it more likely that message-based caching and rate
limiting will be implemented at the transport layer rather than at a more
appropriate level.

The transport, then, would be responsible for negotiating the connection
and the handshake, and would otherwise behave like a socket/file descriptor
with ``Read`` and ``Write`` methods, as sketched below.

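A rough sketch of that contract (the interface names and the per-channel
stream mapping are assumptions, not settled design):

.. code-block:: go

   package p2p

   import (
       "context"
       "io"
   )

   // Connection is what a transport hands back after it has negotiated the
   // connection and completed the handshake.
   type Connection interface {
       // OpenStream starts a new logical stream, e.g. one per channel ID.
       OpenStream(ctx context.Context, channelID byte) (Stream, error)
       // AcceptStream blocks until the remote peer opens a stream.
       AcceptStream(ctx context.Context) (Stream, error)
       Close() error
   }

   // Stream behaves like a socket/file descriptor; message framing,
   // caching, and rate limiting live above this layer.
   type Stream interface {
       io.Reader
       io.Writer
       io.Closer
   }
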
While this was included in the initial design for the new P2P layer, it may
be obviated entirely if the transport and peer layers are replaced with
libp2p, which is primarily stream based.

Service Discovery
~~~~~~~~~~~~~~~~~

In the current system, Tendermint assumes that all nodes in a network are
largely equivalent, and nodes tend to be "chatty," making many requests of
large numbers of peers and waiting for peers to (hopefully) respond. While
this works and has allowed Tendermint to get to a certain point, it both
produces a theoretical scaling bottleneck and makes it harder to test and
verify components of the system.

In addition to a peer's identity and connection information, peers should
be able to advertise a number of services or capabilities, and node
operators or developers should be able to specify peer capability
requirements (e.g. target at least <x>-percent of peers with <y>
capability).

These capabilities would be useful in selecting peers to send messages to.
It may also make sense to extend Tendermint's message addressing capability
to allow reactors to send messages to groups of peers based on role, rather
than only allowing addressing to one or all peers.

Having a good service discovery mechanism may pair well with the
synchronous semantics (request/response) work, as it allows reactors to
"make a request of a peer with <x> capability and wait for the response,"
rather than forcing reactors to track the capabilities or state of specific
peers. A sketch of capability-aware peer selection follows.

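As an illustration, reusing the ``NodeID`` type from the earlier sketch (the
``Capability`` type, the capability names, and ``selectPeers`` are all
hypothetical):

.. code-block:: go

   package p2p

   // Capability names a service that a peer advertises alongside its
   // identity and connection information.
   type Capability string

   const (
       CapBlockSync Capability = "blocksync"
       CapStateSync Capability = "statesync"
   )

   // PeerInfo pairs a peer's identity with its advertised capabilities.
   type PeerInfo struct {
       ID           NodeID
       Capabilities map[Capability]bool
   }

   // selectPeers returns up to max peers advertising the wanted
   // capability, letting a reactor address "any peer that can serve
   // snapshots" instead of tracking individual peer state.
   func selectPeers(peers []PeerInfo, want Capability, max int) []NodeID {
       var out []NodeID
       for _, p := range peers {
           if p.Capabilities[want] {
               out = append(out, p.ID)
               if len(out) == max {
                   break
               }
           }
       }
       return out
   }
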
Solutions
---------

Continued Homegrown Implementation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The current peer system is homegrown and conceptually compatible with the
needs of the project, and while there are limitations to the system, the
p2p layer is not (currently, as of 0.35) a major source of bugs or friction
during development.

However, the current implementation makes a number of allowances for
interoperability, and there is a collection of iterative improvements that
should be considered in the next couple of releases. To maintain the
current implementation, upcoming work would include:

- change the ``Transport`` mechanism to facilitate easier implementations.

- implement different ``Transport`` handlers to be able to manage peer
  connections using different protocols (e.g. QUIC, etc.).

- entirely remove the constructs and implementations of the legacy peer
  implementation.

- establish and enforce clearer chains of responsibility for connection
  establishment (e.g. handshaking, setup), which is currently shared
  between three components.

- report better metrics regarding the state of peers and network
  connectivity, which is currently opaque outside of the system. This is
  constrained at the moment as a side effect of the split responsibility
  for connection establishment.

- extend the PEX system to include service information, so that nodes in
  the network need not be homogeneous.

While maintaining a bespoke peer management layer would seem to distract
from development of core functionality, the truth is that (once the legacy
code is removed) the scope of the peer layer is relatively small from a
maintenance perspective, and having control at this layer might actually
afford the project the ability to iterate more rapidly on some features.

LibP2P
~~~~~~

LibP2P provides components that, approximately, account for the
``PeerManager`` and ``Transport`` components of the current (new) P2P
stack. The Go APIs seem reasonable, and being able to externalize the
implementation details of peer and connection management seems like it
could provide a lot of benefits, particularly in supporting a more active
ecosystem.

In general, the API provides the kind of stream-based, multi-protocol,
idiomatic baseline needed for implementing a peer layer. Additionally,
because libp2p handles peer exchange and connection management at a lower
level, adopting it would make it possible to remove a good deal of code in
favor of libp2p itself. Having said that, Tendermint's P2P layer covers a
greater scope (e.g. message routing to different peers), and that layer is
something that Tendermint might want to retain.

There are a number of unknowns that require more research, including how
much of a peer database the Tendermint engine itself needs to maintain in
order to support higher-level operations (consensus, statesync); it might
be the case that our internal systems need to know much less about peers
than currently specified. Similarly, the current system has a notion of
peer scoring that cannot be communicated to libp2p. This may be fine, as
scoring is only used to support peer exchange (PEX), which would become a
property of libp2p and would not be expressed in its current higher-level
form.

In general, the effort to switch to libp2p would involve:

- timing it during an appropriate protocol-breaking window, as it doesn't
  seem viable to support both libp2p *and* the current p2p protocol.

- providing some in-memory testing network to support the use case that the
  current ``p2p.MemoryNetwork`` provides.

- re-homing the ``p2p.Router`` implementation on top of libp2p components,
  to be able to maintain the current reactor implementations (see the
  sketch after this list).

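Very roughly, and with the caveat that import paths, options, and protocol
ID strings vary between libp2p versions (this is illustrative, not a tested
integration), the shape of such a re-homing might be:

.. code-block:: go

   package main

   import (
       "bufio"
       "context"
       "fmt"

       "github.com/libp2p/go-libp2p"
       "github.com/libp2p/go-libp2p-core/network"
   )

   func main() {
       ctx := context.Background()

       // libp2p negotiates transports, handshakes, and peer identity.
       host, err := libp2p.New(ctx)
       if err != nil {
           panic(err)
       }
       defer host.Close()

       // One libp2p protocol per Tendermint channel; a real Router would
       // decode messages here and feed them to the reactor machinery.
       host.SetStreamHandler("/tendermint/consensus/1", func(s network.Stream) {
           defer s.Close()
           line, _ := bufio.NewReader(s).ReadString('\n')
           fmt.Printf("message from %s: %s", s.Conn().RemotePeer(), line)
       })

       select {} // serve until the process is killed
   }
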
Open questions include:

- how much local buffering should we be doing? We should figure out what
  behavior libp2p provides for QoS-type functionality, and whether our
  requirements mean that we would need to implement this ourselves on top
  of it.

- if Tendermint were going to use libp2p, how would libp2p's stability
  guarantees (protocol, etc.) impact/constrain Tendermint's stability
  guarantees?

- what kind of introspection does libp2p provide, and to what extent would
  this change or constrain the kind of observability that Tendermint is
  able to provide?

- how do efforts to select "the best" (healthy, close, well-behaving, etc.)
  peers work out if Tendermint is not maintaining a local peer database?

- would adding additional higher-level semantics (internal message passing,
  request/response pairs, service discovery, etc.) facilitate removing some
  of the direct linkages between constructs/components in the system, and
  reduce the need for Tendermint nodes to maintain state about their peers?

References
----------

- `Tracking Ticket for P2P Refactor Project <https://github.com/tendermint/tendermint/issues/5670>`_
- `ADR 61: P2P Refactor Scope <../architecture/adr-061-p2p-refactor-scope.md>`_
- `ADR 62: P2P Architecture and Abstraction <../architecture/adr-062-p2p-architecture.md>`_