====================
RFC 000: P2P Roadmap
====================

Changelog
---------

- 2021-08-20: Completed initial draft and distributed via a gist
- 2021-08-25: Migrated as an RFC and changed format

Abstract
--------

This document discusses the future of peer network management in Tendermint,
with a particular focus on features, semantics, and a proposed roadmap.
Specifically, we consider libp2p as a toolkit for implementing some of the
fundamentals.

Background
----------

For the 0.35 release cycle the switching/routing layer of Tendermint was
replaced. This work was done "in place," and produced a version of Tendermint
that was backward-compatible and interoperable with previous versions of the
software. While there are new p2p/peer management constructs in the new
version (e.g. ``PeerManager`` and ``Router``), the main effect of this change
was to simplify the ways that other components within Tendermint interacted
with the peer management layer, and to make it possible for higher-level
components (specifically the reactors) to be used and tested more
independently.

This refactoring, while a major undertaking, was necessary to enable future
development and iteration on this aspect of Tendermint. A number of
potential user-facing features also depend heavily on the p2p layer:
additional transport protocols, transport compression, and improved
resilience to network partitions. These improvements to the modularity,
stability, and reliability of the p2p system will also make ongoing
maintenance and feature development easier in the rest of Tendermint.

Critique of Current Peer-to-Peer Infrastructure
-----------------------------------------------

The current (refactored) P2P stack is an improvement on the previous
("legacy") iteration, but as of 0.35 there remains room for improvement in
the design and implementation of the P2P layer.

Some limitations of the current stack include:

- heavy reliance on buffering to avoid backups in the flow of messages
  between components, which is fragile to maintain, can lead to unexpected
  memory usage patterns, and forces the routing layer to make decisions
  about when messages should be discarded.

- the current p2p stack relies on convention (rather than the compiler) to
  enforce the API boundaries between reactors and the router, making it
  very easy to write "wrong" reactor code or introduce a bad dependency.

- the current stack is likely more complex and difficult to maintain than
  necessary because the legacy system must coexist with the new components
  in 0.35. When the legacy stack is removed, some simple changes become
  possible that could reduce the complexity of the new system (e.g. `#6598
  <https://github.com/tendermint/tendermint/issues/6598>`_).

- the current stack encapsulates a lot of information about peers, and
  makes it difficult to expose that information to monitoring/observability
  tools. This general opacity also makes it difficult to interact with the
  peer system from other areas of the code base (e.g. tests, reactors).

- the legacy stack gave operators some control to force the system to dial
  new peers or seed nodes, or to manipulate the topology of the system *in
  situ*. The current stack can't easily provide this, and while the new
  stack may have better default behavior, it leaves operators' hands tied.

Some of these issues will be resolved early in the 0.36 cycle, with the
removal of the legacy components.

The 0.36 release also provides the opportunity to make changes to the
protocol, as the release will not be compatible with previous releases.

Areas for Development
---------------------

The following sections describe features that may make sense to include in
Phase 2 of the P2P project.

Internal Message Passing
~~~~~~~~~~~~~~~~~~~~~~~~

Currently, there is no provision for intranode communication using the P2P
layer, which means that when two reactors need to interact with each other,
they must depend on each other's interfaces and initialization. Changing
these interactions (e.g. transitions between blocksync and consensus) from
procedure calls to message passing would remove these direct dependencies.

This is a relatively simple change and could be implemented with the
following components (a sketch follows the list):

- a constant to represent "local" delivery as the ``To`` field on
  ``p2p.Envelope``.

- a special path for routing local messages that doesn't require message
  serialization (protobuf marshalling/unmarshalling).

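As a minimal sketch, with simplified stand-ins for the real ``p2p.Envelope``
and ``Router`` types (the ``LocalNodeID`` sentinel and the two delivery
channels are hypothetical, not existing Tendermint API):

.. code-block:: go

   package p2p

   // NodeID identifies a peer; one reserved value marks intranode delivery.
   type NodeID string

   // LocalNodeID is a hypothetical sentinel To value meaning "deliver to a
   // reactor within this node".
   const LocalNodeID NodeID = "local"

   // Envelope carries a message and its destination.
   type Envelope struct {
       To      NodeID
       Message interface{} // normally a proto.Message
   }

   // Router is reduced here to its two delivery paths.
   type Router struct {
       local  chan Envelope // in-process delivery, no serialization
       remote chan Envelope // existing path: marshal and send to a peer
   }

   func (r *Router) route(e Envelope) {
       if e.To == LocalNodeID {
           // Fast path: hand the message object directly to the
           // destination reactor, skipping protobuf round-trips.
           r.local <- e
           return
       }
       r.remote <- e
   }
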
Adding these semantics, particularly in conjunction with synchronous
semantics, provides a solution to dependency graph problems currently
present in the Tendermint codebase, which will simplify development and
make it possible to isolate components for testing.

Eventually, this will also make it possible to have a logical Tendermint
node running in multiple processes or in a collection of containers,
although the use case for this may be debatable.

Synchronous Semantics (Paired Request/Response)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In the current system, all messages are sent with fire-and-forget
semantics, and there is no coupling between a request sent via the p2p
layer and its response. Paired request/response semantics would simplify
the implementation of the state sync and block sync reactors, and make
intra-node message passing more powerful.

For some interactions, like gossiping transactions between the mempools of
different nodes, fire-and-forget semantics make sense, but for other
operations the missing link between requests and responses leads either to
inefficiency when a node fails to respond or becomes unavailable, or to
code that is simply difficult to follow.

To support this kind of work, the protocol would need to accommodate some
kind of request/response ID to allow identifying out-of-order responses
over a single connection. Additionally, the programming model of the
``p2p.Channel`` would need to be expanded to accommodate some kind of
*future* or similar paradigm, so that reactor code can be written without
requiring reactor developers to wrestle with lower-level concurrency
constructs. A sketch of such an API follows.

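As an illustration only (``RequestID``, ``pendingMap``, and the
channel-as-future shape are assumptions rather than a settled design), the
bookkeeping for paired requests and responses might look like:

.. code-block:: go

   package p2p

   import (
       "context"
       "sync"
   )

   // RequestID matches a response to its originating request on a single
   // connection, even when responses arrive out of order.
   type RequestID uint64

   // Response wraps whatever the remote peer sent back for a RequestID.
   type Response struct {
       Message interface{}
   }

   // pendingMap tracks in-flight requests awaiting responses.
   type pendingMap struct {
       mu      sync.Mutex
       nextID  RequestID
       pending map[RequestID]chan Response
   }

   // SendRequest allocates an ID, asks the caller to transmit the tagged
   // message, and returns a future-like channel that yields the matching
   // response, or is closed when ctx expires.
   func (p *pendingMap) SendRequest(ctx context.Context, send func(RequestID)) <-chan Response {
       p.mu.Lock()
       p.nextID++
       id := p.nextID
       ch := make(chan Response, 1)
       p.pending[id] = ch
       p.mu.Unlock()

       send(id) // the caller marshals and sends the message carrying id

       go func() {
           <-ctx.Done()
           p.mu.Lock()
           if _, ok := p.pending[id]; ok {
               delete(p.pending, id)
               close(ch) // timeout or cancellation: wake the waiter
           }
           p.mu.Unlock()
       }()
       return ch
   }

   // deliver routes an incoming response to the waiting caller, if any.
   func (p *pendingMap) deliver(id RequestID, res Response) {
       p.mu.Lock()
       defer p.mu.Unlock()
       if ch, ok := p.pending[id]; ok {
           ch <- res // buffered; never blocks
           delete(p.pending, id)
       }
   }

A reactor could then write ``res, ok := <-ch`` and treat a closed channel as
a timed-out peer, with no explicit lower-level concurrency handling.
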
Timeout Handling (QoS)
~~~~~~~~~~~~~~~~~~~~~~

Currently, all timeouts, buffering, and QoS features are handled at the
router layer, and the reactors are implemented in ways that assume/require
asynchronous operation. This both increases the required complexity at the
routing layer and means that misbehavior at the reactor level is difficult
to detect or attribute. Additionally, the current system provides three
main parameters to control quality of service:

- buffer sizes for channels and queues

- priorities for channels

- queue implementation details for shedding load

These end up being quite coarse controls, and changing the settings is
difficult: because the queues and channels can buffer large numbers of
messages, it can be hard to see the impact of a given change, particularly
in our extant test environment. In general, we should endeavor to:

- set real timeouts, via contexts, on most message send operations, so that
  senders rather than queues are responsible for timeout logic (see the
  sketch after this list). Additionally, this will make it possible to
  avoid sending messages during shutdown.

- reduce (to the greatest extent possible) the amount of buffering in
  channels and queues, to more readily surface backpressure and reduce the
  potential for buildup of stale messages.

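A minimal sketch of a context-aware send, reusing the ``Envelope`` type from
the earlier sketch (``sendWithTimeout`` and the two-second deadline are
illustrative, not proposed defaults):

.. code-block:: go

   package p2p

   import (
       "context"
       "time"
   )

   // sendWithTimeout makes the caller, not an internal queue, own the
   // timeout policy for a send.
   func sendWithTimeout(ctx context.Context, out chan<- Envelope, e Envelope) error {
       ctx, cancel := context.WithTimeout(ctx, 2*time.Second)
       defer cancel()

       select {
       case out <- e:
           return nil
       case <-ctx.Done():
           // The sender observes backpressure immediately, rather than
           // the message languishing in a deep buffer; cancelling the
           // root context during shutdown also stops new sends here.
           return ctx.Err()
       }
   }
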
Stream Based Connection Handling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Currently the transport layer is message based, which matches the mental
model of how the protocol works, but makes transports and connection types
more difficult to implement: it forces a higher-level view of the
connection onto each transport, which makes novel transport types harder to
implement and makes it more likely that message-based caching and rate
limiting will be implemented at the transport layer rather than at a more
appropriate level.

The transport, then, would be responsible for negotiating the connection
and the handshake, and would otherwise behave like a socket/file descriptor
with ``Read`` and ``Write`` methods, as sketched below.

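A rough sketch of that contract (the interface names and the per-channel
stream mapping are assumptions, not settled design):

.. code-block:: go

   package p2p

   import (
       "context"
       "io"
   )

   // Connection is what a transport hands back after it has negotiated the
   // connection and completed the handshake.
   type Connection interface {
       // OpenStream starts a new logical stream, e.g. one per channel ID.
       OpenStream(ctx context.Context, channelID byte) (Stream, error)
       // AcceptStream blocks until the remote peer opens a stream.
       AcceptStream(ctx context.Context) (Stream, error)
       Close() error
   }

   // Stream behaves like a socket/file descriptor; message framing,
   // caching, and rate limiting live above this layer.
   type Stream interface {
       io.Reader
       io.Writer
       io.Closer
   }
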
While this was included in the initial design for the new P2P layer, it may
be obviated entirely if the transport and peer layers are replaced with
libp2p, which is primarily stream based.

Service Discovery
~~~~~~~~~~~~~~~~~

In the current system, Tendermint assumes that all nodes in a network are
largely equivalent, and nodes tend to be "chatty," making many requests of
large numbers of peers and waiting for peers to (hopefully) respond. While
this works and has allowed Tendermint to get to a certain point, it both
produces a theoretical scaling bottleneck and makes it harder to test and
verify components of the system.

In addition to a peer's identity and connection information, peers should
be able to advertise a number of services or capabilities, and node
operators or developers should be able to specify peer capability
requirements (e.g. target at least <x>-percent of peers with <y>
capability).

These capabilities would be useful in selecting peers to send messages to.
It may also make sense to extend Tendermint's message addressing capability
to allow reactors to send messages to groups of peers based on role, rather
than only allowing addressing to one or all peers.

Having a good service discovery mechanism may pair well with the
synchronous semantics (request/response) work, as it allows reactors to
"make a request of a peer with <x> capability and wait for the response,"
rather than forcing reactors to track the capabilities or state of specific
peers. A sketch of capability-aware peer selection follows.

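As an illustration, reusing the ``NodeID`` type from the earlier sketch (the
``Capability`` type, the capability names, and ``selectPeers`` are all
hypothetical):

.. code-block:: go

   package p2p

   // Capability names a service that a peer advertises alongside its
   // identity and connection information.
   type Capability string

   const (
       CapBlockSync Capability = "blocksync"
       CapStateSync Capability = "statesync"
   )

   // PeerInfo pairs a peer's identity with its advertised capabilities.
   type PeerInfo struct {
       ID           NodeID
       Capabilities map[Capability]bool
   }

   // selectPeers returns up to max peers advertising the wanted
   // capability, letting a reactor address "any peer that can serve
   // snapshots" instead of tracking individual peer state.
   func selectPeers(peers []PeerInfo, want Capability, max int) []NodeID {
       var out []NodeID
       for _, p := range peers {
           if p.Capabilities[want] {
               out = append(out, p.ID)
               if len(out) == max {
                   break
               }
           }
       }
       return out
   }
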
Solutions
---------

Continued Homegrown Implementation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The current peer system is homegrown and conceptually compatible with the
needs of the project, and while there are limitations to the system, the
p2p layer is not (currently, as of 0.35) a major source of bugs or friction
during development.

However, the current implementation makes a number of allowances for
interoperability, and there is a collection of iterative improvements that
should be considered in the next couple of releases. To maintain the
current implementation, upcoming work would include:

- change the ``Transport`` mechanism to facilitate easier implementations.

- implement different ``Transport`` handlers to be able to manage peer
  connections using different protocols (e.g. QUIC, etc.).

- entirely remove the constructs and implementations of the legacy peer
  implementation.

- establish and enforce clearer chains of responsibility for connection
  establishment (e.g. handshaking, setup), which is currently shared
  between three components.

- report better metrics regarding the state of peers and network
  connectivity, which is currently opaque outside of the system. This is
  constrained at the moment as a side effect of the split responsibility
  for connection establishment.

- extend the PEX system to include service information, so that nodes in
  the network need not be homogeneous.

While maintaining a bespoke peer management layer would seem to distract
from development of core functionality, the truth is that (once the legacy
code is removed) the scope of the peer layer is relatively small from a
maintenance perspective, and having control at this layer might actually
afford the project the ability to iterate more rapidly on some features.

LibP2P
~~~~~~

LibP2P provides components that, approximately, account for the
``PeerManager`` and ``Transport`` components of the current (new) P2P
stack. The Go APIs seem reasonable, and being able to externalize the
implementation details of peer and connection management seems like it
could provide a lot of benefits, particularly in supporting a more active
ecosystem.

In general, the API provides the kind of stream-based, multi-protocol,
idiomatic baseline needed for implementing a peer layer. Additionally,
because libp2p handles peer exchange and connection management at a lower
level, adopting it would make it possible to remove a good deal of code in
favor of libp2p itself. Having said that, Tendermint's P2P layer covers a
greater scope (e.g. message routing to different peers), and that layer is
something that Tendermint might want to retain.

There are a number of unknowns that require more research, including how
much of a peer database the Tendermint engine itself needs to maintain in
order to support higher-level operations (consensus, statesync); it might
be the case that our internal systems need to know much less about peers
than currently specified. Similarly, the current system has a notion of
peer scoring that cannot be communicated to libp2p. This may be fine, as
scoring is only used to support peer exchange (PEX), which would become a
property of libp2p and would not be expressed in its current higher-level
form.

In general, the effort to switch to libp2p would involve:

- timing it during an appropriate protocol-breaking window, as it doesn't
  seem viable to support both libp2p *and* the current p2p protocol.

- providing some in-memory testing network to support the use case that the
  current ``p2p.MemoryNetwork`` provides.

- re-homing the ``p2p.Router`` implementation on top of libp2p components,
  to be able to maintain the current reactor implementations (see the
  sketch after this list).

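Very roughly, and with the caveat that import paths, options, and protocol
ID strings vary between libp2p versions (this is illustrative, not a tested
integration), the shape of such a re-homing might be:

.. code-block:: go

   package main

   import (
       "bufio"
       "context"
       "fmt"

       "github.com/libp2p/go-libp2p"
       "github.com/libp2p/go-libp2p-core/network"
   )

   func main() {
       ctx := context.Background()

       // libp2p negotiates transports, handshakes, and peer identity.
       host, err := libp2p.New(ctx)
       if err != nil {
           panic(err)
       }
       defer host.Close()

       // One libp2p protocol per Tendermint channel; a real Router would
       // decode messages here and feed them to the reactor machinery.
       host.SetStreamHandler("/tendermint/consensus/1", func(s network.Stream) {
           defer s.Close()
           line, _ := bufio.NewReader(s).ReadString('\n')
           fmt.Printf("message from %s: %s", s.Conn().RemotePeer(), line)
       })

       select {} // serve until the process is killed
   }
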
Open questions include:

- how much local buffering should we be doing? We should figure out what
  behavior libp2p provides for QoS-type functionality, and whether our
  requirements mean that we would need to implement this ourselves on top
  of it.

- if Tendermint were going to use libp2p, how would libp2p's stability
  guarantees (protocol, etc.) impact/constrain Tendermint's stability
  guarantees?

- what kind of introspection does libp2p provide, and to what extent would
  this change or constrain the kind of observability that Tendermint is
  able to provide?

- how do efforts to select "the best" (healthy, close, well-behaving, etc.)
  peers work out if Tendermint is not maintaining a local peer database?

- would adding additional higher-level semantics (internal message passing,
  request/response pairs, service discovery, etc.) facilitate removing some
  of the direct linkages between constructs/components in the system, and
  reduce the need for Tendermint nodes to maintain state about their peers?

References
----------

- `Tracking Ticket for P2P Refactor Project <https://github.com/tendermint/tendermint/issues/5670>`_
- `ADR 61: P2P Refactor Scope <../architecture/adr-061-p2p-refactor-scope.md>`_
- `ADR 62: P2P Architecture and Abstraction <../architecture/adr-062-p2p-architecture.md>`_