From 23abb0de8b5b7bf77bfcf95fbbbc7715420b04e4 Mon Sep 17 00:00:00 2001
From: Sam Kleinman
Date: Fri, 27 Aug 2021 12:14:59 -0400
Subject: [PATCH] rfc: p2p next steps (#6866)

---
 docs/rfc/README.md               |   2 +
 docs/rfc/rfc-000-p2p-roadmap.rst | 316 +++++++++++++++++++++++++++++++
 2 files changed, 318 insertions(+)
 create mode 100644 docs/rfc/rfc-000-p2p-roadmap.rst

diff --git a/docs/rfc/README.md b/docs/rfc/README.md
index c05853aca..68733c8e8 100644
--- a/docs/rfc/README.md
+++ b/docs/rfc/README.md
@@ -37,4 +37,6 @@ sections.

 ## Table of Contents

+- [RFC-000: P2P Roadmap](./rfc-000-p2p-roadmap.rst)
+
diff --git a/docs/rfc/rfc-000-p2p-roadmap.rst b/docs/rfc/rfc-000-p2p-roadmap.rst
new file mode 100644
index 000000000..64dda773e
--- /dev/null
+++ b/docs/rfc/rfc-000-p2p-roadmap.rst
@@ -0,0 +1,316 @@
+====================
+RFC 000: P2P Roadmap
+====================
+
+Changelog
+---------
+
+- 2021-08-20: Completed initial draft and distributed via a gist
+- 2021-08-25: Migrated as an RFC and changed format
+
+Abstract
+--------
+
+This document discusses the future of peer network management in Tendermint,
+with a particular focus on features, semantics, and a proposed roadmap.
+Specifically, we consider libp2p as a toolkit for implementing some of these
+fundamentals.
+
+Background
+----------
+
+For the 0.35 release cycle, the switching/routing layer of Tendermint was
+replaced. This work was done "in place," and produced a version of Tendermint
+that was backward-compatible and interoperable with previous versions of the
+software. While there are new p2p/peer management constructs in the new
+version (e.g. ``PeerManager`` and ``Router``), the main effect of this change
+was to simplify the ways that other components within Tendermint interact
+with the peer management layer, and to make it possible for higher-level
+components (specifically the reactors) to be used and tested more
+independently.
+
+This refactoring, which was a major undertaking, was entirely necessary to
+enable future development of and iteration on this aspect of Tendermint.
+There are also a number of potential user-facing features that depend heavily
+on the p2p layer: additional transport protocols, transport compression, and
+improved resilience to network partitions. These improvements to the
+modularity, stability, and reliability of the p2p system will also make
+ongoing maintenance and feature development easier in the rest of Tendermint.
+
+Critique of Current Peer-to-Peer Infrastructure
+-----------------------------------------------
+
+The current (refactored) P2P stack is an improvement on the previous
+iteration (legacy), but as of 0.35, there remains room for improvement in the
+design and implementation of the P2P layer.
+
+Some limitations of the current stack include:
+
+- heavy reliance on buffering to avoid backups in the flow of messages
+  between components, which is fragile to maintain, can lead to unexpected
+  memory usage patterns, and forces the routing layer to make decisions about
+  when messages should be discarded.
+
+- the current p2p stack relies on convention (rather than the compiler) to
+  enforce the API boundaries and conventions between reactors and the router,
+  making it very easy to write "wrong" reactor code or introduce a bad
+  dependency.
+
+- the current stack is probably more complex and difficult to maintain than
+  it needs to be, because the legacy system must coexist with the new
+  components in 0.35.
+  When the legacy stack is removed, there are some simple changes that will
+  become possible and could reduce the complexity of the new system (e.g.
+  `#6598 `_).
+
+- the current stack encapsulates a lot of information about peers, and makes
+  it difficult to expose that information to monitoring/observability tools.
+  This general opacity also makes it difficult to interact with the peer
+  system from other areas of the code base (e.g. tests, reactors).
+
+- the legacy stack gave operators some control to force the system to dial
+  new peers or seed nodes, or to manipulate the topology of the system *in
+  situ*. The current stack can't easily provide this, and while the new stack
+  may have better behavior, it does leave operators' hands tied.
+
+Some of these issues will be resolved early in the 0.36 cycle, with the
+removal of the legacy components.
+
+The 0.36 release also provides the opportunity to make changes to the
+protocol, as the release will not be compatible with previous releases.
+
+Areas for Development
+---------------------
+
+These sections describe features that may make sense to include in a Phase 2
+of a P2P project.
+
+Internal Message Passing
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+Currently, there's no provision for intranode communication using the P2P
+layer, which means that when two reactors need to interact with each other,
+they have to depend on each other's interfaces and initialization. Changing
+these interactions (e.g. transitions between blocksync and consensus) from
+procedure calls to message passing would remove this coupling.
+
+This is a relatively simple change and could be implemented with the
+following components:
+
+- a constant to represent "local" delivery in the ``To`` field on
+  ``p2p.Envelope``.
+
+- a special path for routing local messages that doesn't require message
+  serialization (protobuf marshaling/unmarshaling).
+
+Adding these semantics, particularly in conjunction with the synchronous
+semantics described below, provides a solution to the dependency graph
+problems currently present in the Tendermint codebase, which will simplify
+development and make it possible to isolate components for testing.
+
+Eventually, this will also make it possible to have a logical Tendermint node
+running in multiple processes or in a collection of containers, although the
+use case for this may be debatable.
+
+Synchronous Semantics (Paired Request/Response)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+In the current system, all messages are sent with fire-and-forget semantics,
+and there is no coupling between a request sent via the p2p layer and a
+response. Paired request/response semantics would simplify the implementation
+of the state and block sync reactors, and make intranode message passing more
+powerful.
+
+For some interactions, like gossiping transactions between the mempools of
+different nodes, fire-and-forget semantics make sense, but for other
+operations the missing link between requests and responses leads either to
+inefficiency, when a node fails to respond or becomes unavailable, or to code
+that is difficult to follow.
+
+To support this kind of work, the protocol would need to accommodate some
+kind of request/response ID, to allow identifying out-of-order responses over
+a single connection. Additionally, the programming model of ``p2p.Channel``
+would need to be expanded to accommodate some kind of *future* or similar
+paradigm, so that reactor developers can write reactor code without wrestling
+with lower-level concurrency constructs.
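+
+As a rough illustration, the sketch below shows one shape this could take in
+Go. Everything here (the ``RequestID`` field, the ``PendingRequests``
+tracker, and the ``SendRequest`` helper) is hypothetical and elides
+serialization and routing; it is a sketch of the idea under those
+assumptions, not a proposed implementation:
+
+.. code-block:: go
+
+    package p2p
+
+    import (
+        "context"
+        "sync"
+    )
+
+    // Envelope carries a message to a peer; a nonzero RequestID
+    // (hypothetical field) pairs a response with its request, so that
+    // out-of-order responses on a single connection can be matched up.
+    type Envelope struct {
+        To        string // peer ID, or a reserved "local" value
+        RequestID uint64
+        Message   []byte
+    }
+
+    // PendingRequests tracks in-flight requests, handing each caller a
+    // one-shot future channel that its response will be delivered on.
+    type PendingRequests struct {
+        mu      sync.Mutex
+        nextID  uint64
+        waiting map[uint64]chan Envelope
+    }
+
+    func NewPendingRequests() *PendingRequests {
+        return &PendingRequests{waiting: make(map[uint64]chan Envelope)}
+    }
+
+    // Register allocates a request ID and its future.
+    func (p *PendingRequests) Register() (uint64, <-chan Envelope) {
+        p.mu.Lock()
+        defer p.mu.Unlock()
+        p.nextID++
+        ch := make(chan Envelope, 1)
+        p.waiting[p.nextID] = ch
+        return p.nextID, ch
+    }
+
+    // Resolve delivers a response to whoever is waiting on its ID.
+    func (p *PendingRequests) Resolve(resp Envelope) {
+        p.mu.Lock()
+        ch, ok := p.waiting[resp.RequestID]
+        delete(p.waiting, resp.RequestID)
+        p.mu.Unlock()
+        if ok {
+            ch <- resp // buffered; never blocks
+        }
+    }
+
+    // SendRequest blocks until a response arrives or ctx expires, so the
+    // sender, rather than a queue, owns the timeout logic.
+    func SendRequest(ctx context.Context, send func(Envelope) error,
+        pending *PendingRequests, req Envelope) (Envelope, error) {
+        id, future := pending.Register()
+        req.RequestID = id
+        if err := send(req); err != nil {
+            return Envelope{}, err
+        }
+        select {
+        case resp := <-future:
+            return resp, nil
+        case <-ctx.Done():
+            return Envelope{}, ctx.Err()
+        }
+    }
+
+Because the caller holds a context, a design along these lines also dovetails
+with the timeout handling discussed in the next section: timeouts live with
+the sender rather than in the routing layer's queues.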
+
+Timeout Handling (QoS)
+~~~~~~~~~~~~~~~~~~~~~~
+
+Currently, all timeouts, buffering, and QoS features are handled at the
+router layer, and the reactors are implemented in ways that assume/require
+asynchronous operation. This both increases the required complexity at the
+routing layer and means that misbehavior at the reactor level is difficult to
+detect or attribute. Additionally, the current system provides three main
+parameters to control quality of service:
+
+- buffer sizes for channels and queues
+
+- priorities for channels
+
+- queue implementation details for shedding load
+
+These end up being quite coarse controls, and changing the settings is
+difficult: because the queues and channels can buffer large numbers of
+messages, it can be hard to see the impact of a given change, particularly in
+our existing test environment. In general, we should endeavor to:
+
+- set real timeouts, via contexts, on most message send operations, so that
+  senders rather than queues can be responsible for timeout logic.
+  Additionally, this will make it possible to avoid sending messages during
+  shutdown.
+
+- reduce (to the greatest extent possible) the amount of buffering in
+  channels and queues, to more readily surface backpressure and reduce the
+  potential for buildup of stale messages.
+
+Stream Based Connection Handling
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Currently the transport layer is message-based, which makes sense as a mental
+model of how the protocol works, but makes transports and connection types
+more difficult to implement: it forces a higher-level view of the connection
+and interaction, which is hard to satisfy for novel transport types, and it
+makes it more likely that message-based caching and rate limiting will be
+implemented at the transport layer rather than at a more appropriate level.
+
+Under a stream-based model, the transport would be responsible for
+negotiating the connection and the handshake, and would otherwise behave like
+a socket/file descriptor with ``Read`` and ``Write`` methods.
+
+While this was included in the initial design for the new P2P layer, it may
+be obviated entirely if the transport and peer layer is replaced with libp2p,
+which is primarily stream based.
+
+Service Discovery
+~~~~~~~~~~~~~~~~~
+
+In the current system, Tendermint assumes that all nodes in a network are
+largely equivalent, and nodes tend to be "chatty," making many requests of
+large numbers of peers and waiting for peers to (hopefully) respond. While
+this works and has allowed Tendermint to get to a certain point, it both
+produces a theoretical scaling bottleneck and makes it harder to test and
+verify components of the system.
+
+In addition to a peer's identity and connection information, peers should be
+able to advertise a number of services or capabilities, and node operators or
+developers should be able to specify peer capability requirements (e.g.
+target at least some percentage of peers with a given capability).
+
+Because these capabilities may be useful in selecting peers to send messages
+to, it may make sense to extend Tendermint's message addressing capability to
+allow reactors to send messages to groups of peers based on role, rather than
+only allowing addressing to one or all peers.
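+
+As a purely illustrative sketch, capability-aware peer selection might look
+something like the following. The ``PeerInfo`` type, the function names, and
+the threshold check are all hypothetical, not part of the current Tendermint
+API:
+
+.. code-block:: go
+
+    package p2p
+
+    // PeerInfo pairs a peer's ID with the capabilities it advertises,
+    // e.g. via an extended PEX exchange (hypothetical type).
+    type PeerInfo struct {
+        ID           string
+        Capabilities map[string]bool
+    }
+
+    // PeersWithCapability lets a reactor address "all peers that can
+    // serve snapshots" rather than only one peer or every peer.
+    func PeersWithCapability(peers []PeerInfo, capability string) []PeerInfo {
+        var matched []PeerInfo
+        for _, p := range peers {
+            if p.Capabilities[capability] {
+                matched = append(matched, p)
+            }
+        }
+        return matched
+    }
+
+    // MeetsThreshold reports whether at least frac (0.0-1.0) of known
+    // peers advertise a capability, the kind of requirement an operator
+    // might express in configuration.
+    func MeetsThreshold(peers []PeerInfo, capability string, frac float64) bool {
+        if len(peers) == 0 {
+            return false
+        }
+        matched := len(PeersWithCapability(peers, capability))
+        return float64(matched) >= frac*float64(len(peers))
+    }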
+
+Having a good service discovery mechanism may pair well with the synchronous
+semantics (request/response) work, as it allows reactors to "make a request
+of a peer with a given capability and wait for the response," rather than
+forcing the reactors to track the capabilities or state of specific peers.
+
+Solutions
+---------
+
+Continued Homegrown Implementation
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The current peer system is homegrown and is conceptually compatible with the
+needs of the project, and while there are limitations to the system, the p2p
+layer is not (currently, as of 0.35) a major source of bugs or friction
+during development.
+
+However, the current implementation makes a number of allowances for
+interoperability, and there is a collection of iterative improvements that
+should be considered in the next couple of releases. To maintain the current
+implementation, upcoming work would include:
+
+- changing the ``Transport`` mechanism to facilitate easier implementations.
+
+- implementing different ``Transport`` handlers to be able to manage peer
+  connections using different protocols (e.g. QUIC).
+
+- entirely removing the constructs and implementation of the legacy peer
+  system.
+
+- establishing and enforcing clearer chains of responsibility for connection
+  establishment (e.g. handshaking, setup), which is currently shared among
+  three components.
+
+- reporting better metrics regarding the state of peers and network
+  connectivity, which are currently opaque outside of the system. This is
+  constrained at the moment as a side effect of the split responsibility for
+  connection establishment.
+
+- extending the PEX system to include service information, so that nodes in
+  the network need not be homogeneous.
+
+While maintaining a bespoke peer management layer would seem to distract from
+development of core functionality, the truth is that (once the legacy code is
+removed) the scope of the peer layer is relatively small from a maintenance
+perspective, and having control at this layer might actually afford the
+project the ability to iterate more rapidly on some features.
+
+LibP2P
+~~~~~~
+
+LibP2P provides components that, approximately, account for the
+``PeerManager`` and ``Transport`` components of the current (new) P2P stack.
+The Go APIs seem reasonable, and being able to externalize the implementation
+details of peer and connection management seems like it could provide a lot
+of benefits, particularly in supporting a more active ecosystem.
+
+In general, the API provides the kind of stream-based, multi-protocol,
+idiomatic baseline needed for implementing a peer layer. Additionally,
+because libp2p handles peer exchange and connection management at a lower
+level, adopting it would make it possible to remove a good deal of code.
+Having said that, Tendermint's P2P layer covers a greater scope (e.g. message
+routing to different peers), and that layer is something that Tendermint
+might want to retain.
+
+There are a number of unknowns that require more research, including how much
+of a peer database the Tendermint engine itself needs to maintain in order to
+support higher-level operations (consensus, statesync); it might be the case
+that our internal systems need to know much less about peers than currently
+assumed.
+Similarly, the current system has a notion of peer scoring that cannot be
+communicated to libp2p; this may be fine, as it is only used to support peer
+exchange (PEX), which would become a property of libp2p rather than being
+expressed in its current, higher-level form.
+
+In general, the effort to switch to libp2p would involve:
+
+- timing it during an appropriate protocol-breaking window, as it doesn't
+  seem viable to support both libp2p *and* the current p2p protocol.
+
+- providing some in-memory testing network to support the use case that the
+  current ``p2p.MemoryNetwork`` provides.
+
+- re-homing the ``p2p.Router`` implementation on top of libp2p components to
+  be able to maintain the current reactor implementations.
+
+Open questions include:
+
+- how much local buffering should we be doing? We should figure out what
+  behavior libp2p provides for QoS-type functionality, and whether our
+  requirements mean that we would need to implement this ourselves on top of
+  it.
+
+- if Tendermint were to use libp2p, how would libp2p's stability guarantees
+  (protocol, etc.) impact/constrain Tendermint's stability guarantees?
+
+- what kind of introspection does libp2p provide, and to what extent would
+  this change or constrain the kind of observability that Tendermint is able
+  to provide?
+
+- how do efforts to select "the best" (healthy, close, well-behaved, etc.)
+  peers work out if Tendermint is not maintaining a local peer database?
+
+- would adding additional higher-level semantics (internal message passing,
+  request/response pairs, service discovery, etc.) facilitate removing some
+  of the direct linkages between constructs/components in the system, and
+  reduce the need for Tendermint nodes to maintain state about their peers?
+
+References
+----------
+
+- `Tracking Ticket for P2P Refactor Project `_
+- `ADR 61: P2P Refactor Scope <../architecture/adr-061-p2p-refactor-scope.md>`_
+- `ADR 62: P2P Architecture and Abstraction <../architecture/adr-062-p2p-architecture.md>`_