diff --git a/docs/architecture/README.md b/docs/architecture/README.md index a4e326274..7025a72f6 100644 --- a/docs/architecture/README.md +++ b/docs/architecture/README.md @@ -97,3 +97,4 @@ Note the context/background should be written in the present tense. - [ADR-041: Proposer-Selection-via-ABCI](./adr-041-proposer-selection-via-abci.md) - [ADR-045: ABCI-Evidence](./adr-045-abci-evidence.md) - [ADR-057: RPC](./adr-057-RPC.md) +- [ADR-069: Node Initialization](./adr-069-flexible-node-initialization.md) diff --git a/docs/architecture/adr-069-flexible-node-intitalization.md b/docs/architecture/adr-069-flexible-node-intitalization.md new file mode 100644 index 000000000..ec66725be --- /dev/null +++ b/docs/architecture/adr-069-flexible-node-intitalization.md @@ -0,0 +1,273 @@ +# ADR 069: Flexible Node Initialization + +## Changlog + +- 2021-06-09: Initial Draft (@tychoish) + +- 2021-07-21: Major Revision (@tychoish) + +## Status + +Proposed. + +## Context + +In an effort to support [Go-API-Stability](./adr-060-go-api-stability.md), +during the 0.35 development cycle, we have attempted to reduce the the API +surface area by moving most of the interface of the `node` package into +unexported functions, as well as moving the reactors to an `internal` +package. Having this coincide with the 0.35 release made a lot of sense +because these interfaces were _already_ changing as a result of the `p2p` +[refactor](./adr-061-p2p-refactor-scope.md), so it made sense to think a bit +more about how tendermint exposes this API. + +While the interfaces of the P2P layer and most of the node package are already +internalized, this precludes some operational patterns that are important to +users who use tendermint as a library. Specifically, introspecting the +tendermint node service and replacing components is not supported in the latest +version of the code, and some of these use cases would require maintaining a +vendor copy of the code. Adding these features requires rather extensive +(internal/implementation) changes to the `node` and `rpc` packages, and this +ADR describes a model for changing the way that tendermint nodes initialize, in +service of providing this kind of functionality. + +We consider node initialization, because the current implemention +provides strong connections between all components, as well as between +the components of the node and the RPC layer, and being able to think +about the interactions of these components will help enable these +features and help define the requirements of the node package. + +## Alternative Approaches + +These alternatives are presented to frame the design space and to +contextualize the decision in terms of product requirements. These +ideas are not inherently bad, and may even be possible or desireable +in the (distant) future, and merely provide additional context for how +we, in the moment came to our decision(s). + +### Do Nothing + +The current implementation is functional and sufficient for the vast +majority of use cases (e.g., all users of the Cosmos-SDK as well as +anyone who runs tendermint and the ABCI application in separate +processes). In the current implementation, and even previous versions, +modifying node initialization or injecting custom components required +copying most of the `node` package, which required such users +to maintain a vendored copy of tendermint. + +While this is (likely) not tenable in the long term, as users do want +more modularity, and the current service implementation is brittle and +difficult to maintain, in the short term it may be possible to delay +implementation somewhat. Eventually, however, we will need to make the +`node` package easier to maintain and reason about. + +### Generic Service Pluggability + +One possible system design would export interfaces (in the Golang +sense) for all components of the system, to permit runtime dependency +injection of all components in the system, so that users can compose +tendermint nodes of arbitrary user-supplied components. + +Although this level of customization would provide benefits, it would be a huge +undertaking (particularly with regards to API design work) that we do not have +scope for at the moment. Eventually providing support for some kinds of +pluggability may be useful, so the current solution does not explicitly +foreclose the possibility of this alternative. + +### Abstract Dependency Based Startup and Shutdown + +The main proposal in this document makes tendermint node initialization simpler +and more abstract, but the system lacks a number of +features which daemon/service initialization could provide, such as a +system allowing the authors of services to control initialization and shutdown order +of components using dependency relationships. + +Such a system could work by allowing services to declare +initialization order dependencies to other reactors (by ID, perhaps) +so that the node could decide the initialization based on the +dependencies declared by services rather than requiring the node to +encode this logic directly. + +This level of configuration is probably more complicated than is needed. Given +that the authors of components in the current implementation of tendermint +already *do* need to know about other components, a dependency-based system +would probably be overly-abstract at this stage. + +## Decisions + +- To the greatest extent possible, factor the code base so that + packages are responsible for their own initialization, and minimize + the amount of code in the `node` package itself. + +- As a design goal, reduce direct coupling and dependencies between + components in the implementation of `node`. + +- Begin iterating on a more-flexible internal framework for + initializing tendermint nodes to make the initatilization process + less hard-coded by the implementation of the node objects. + + - Reactors should not need to expose their interfaces *within* the + implementation of the node type + + - This refactoring should be entirely opaque to users. + + - These node initialization changes should not require a + reevaluation of the `service.Service` or a generic initialization + orchestration framework. + +- Do not proactively provide a system for injecting + components/services within a tendtermint node, though make it + possible to retrofit this kind of plugability in the future if + needed. + +- Prioritize implementation of p2p-based statesync reactor to obviate + need for users to inject a custom state-sync provider. + +## Detailed Design + +The [current +nodeImpl](https://github.com/tendermint/tendermint/blob/master/node/node.go#L47) +includes direct references to the implementations of each of the +reactors, which should be replaced by references to `service.Service` +objects. This will require moving construction of the [rpc +service](https://github.com/tendermint/tendermint/blob/master/node/node.go#L771) +into the constructor of +[makeNode](https://github.com/tendermint/tendermint/blob/master/node/node.go#L126). One +possible implementation of this would be to eliminate the current +`ConfigureRPC` method on the node package and instead [configure it +here](https://github.com/tendermint/tendermint/pull/6798/files#diff-375d57e386f20eaa5f09f02bb9d28bfc48ac3dca18d0325f59492208219e5618R441). + +To avoid adding complexity to the `node` package, we will add a +composite service implementation to the `service` package +that implements `service.Service` and is composed of a sequence of +underlying `service.Service` objects and handles their +startup/shutdown in the specified sequential order. + +Consensus, blocksync (*née* fast sync), and statesync all depend on +each other, and have significant initialization dependencies that are +presently encoded in the `node` package. As part of this change, a +new package/component (likely named `blocks` located at +`internal/blocks`) will encapsulate the initialization of these block +management areas of the code. + +### Injectable Component Option + +This section briefly describes a possible implementation for +user-supplied services running within a node. This should not be +implemented unless user-supplied components are a hard requirement for +a user. + +In order to allow components to be replaced, a new public function +will be added to the public interface of `node` with a signature that +resembles the following: + +```go +func NewWithServices(conf *config.Config, + logger log.Logger, + cf proxy.ClientCreator, + gen *types.GenesisDoc, + srvs []service.Service, +) (service.Service, error) { +``` + +The `service.Service` objects will be initialized in the order supplied, after +all pre-configured/default services have started (and shut down in reverse +order). The given services may implement additional interfaces, allowing them +to replace specific default services. `NewWithServices` will validate input +service lists with the following rules: + +- None of the services may already be running. +- The caller may not supply more than one replacement reactor for a given + default service type. + +If callers violate any of these rules, `NewWithServices` will return +an error. To retract support for this kind of operation in the future, +the function can be modified to *always* return an error. + +## Consequences + +### Positive + +- The node package will become easier to maintain. + +- It will become easier to add additional services within tendermint + nodes. + +- It will become possible to replace default components in the node + package without vendoring the tendermint repo and modifying internal + code. + +- The current end-to-end (e2e) test suite will be able to prevent any + regressions, and the new functionality can be thoroughly unit tested. + +- The scope of this project is very narrow, which minimizes risk. + +### Negative + +- This increases our reliance on the `service.Service` interface which + is probably not an interface that we want to fully commit to. + +- This proposal implements a fairly minimal set of functionality and + leaves open the possibility for many additional features which are + not included in the scope of this proposal. + +### Neutral + +N/A + +## Open Questions + +- To what extent does this new initialization framework need to accommodate + the legacy p2p stack? Would it be possible to delay a great deal of this + work to the 0.36 cycle to avoid this complexity? + + - Answer: _depends on timing_, and the requirement to ship pluggable reactors in 0.35. + +- Where should additional public types be exported for the 0.35 + release? + + Related to the general project of API stabilization we want to deprecate + the `types` package, and move its contents into a new `pkg` hierarchy; + however, the design of the `pkg` interface is currently underspecified. + If `types` is going to remain for the 0.35 release, then we should consider + the impact of using multiple organizing modalities for this code within a + single release. + +## Future Work + +- Improve or simplify the `service.Service` interface. There are some + pretty clear limitations with this interface as written (there's no + way to timeout slow startup or shut down, the cycle between the + `service.BaseService` and `service.Service` implementations is + troubling, the default panic in `OnReset` seems troubling.) + +- As part of the refactor of `service.Service` have all services/nodes + respect the lifetime of a `context.Context` object, and avoid the + current practice of creating `context.Context` objects in p2p and + reactor code. This would be required for in-process multi-tenancy. + +- Support explicit dependencies between components and allow for + parallel startup, so that different reactors can startup at the same + time, where possible. + +## References + +- [this + branch](https://github.com/tendermint/tendermint/tree/tychoish/scratch-node-minimize) + contains experimental work in the implementation of the node package + to unwind some of the hard dependencies between components. + +- [the component + graph](https://peter.bourgon.org/go-for-industrial-programming/#the-component-graph) + as a framing for internal service construction. + +## Appendix + +### Dependencies + +There's a relationship between the blockchain and consensus reactor +described by the following dependency graph makes replacing some of +these components more difficult relative to other reactors or +components. + +![consensus blockchain dependency graph](./img/consensus_blockchain.png) diff --git a/docs/architecture/img/consensus_blockchain.png b/docs/architecture/img/consensus_blockchain.png new file mode 100644 index 000000000..dd0f4daa8 Binary files /dev/null and b/docs/architecture/img/consensus_blockchain.png differ