
# ADR 069: Flexible Node Initialization

## Changelog

- 2021-06-09: Initial Draft (@tychoish)
- 2021-07-21: Major Revision (@tychoish)

## Status

Proposed.

## Context

In an effort to support [Go-API-Stability](./adr-060-go-api-stability.md),
during the 0.35 development cycle, we have attempted to reduce the API
surface area by moving most of the interface of the `node` package into
unexported functions, as well as moving the reactors to an `internal`
package. Having this coincide with the 0.35 release made a lot of sense
because these interfaces were _already_ changing as a result of the `p2p`
[refactor](./adr-061-p2p-refactor-scope.md), so it made sense to think a bit
more about how tendermint exposes this API.

While the interfaces of the P2P layer and most of the node package are already
internalized, this precludes some operational patterns that are important to
users who use tendermint as a library. Specifically, introspecting the
tendermint node service and replacing components is not supported in the latest
version of the code, and some of these use cases would require maintaining a
vendored copy of the code. Adding these features requires rather extensive
(internal/implementation) changes to the `node` and `rpc` packages, and this
ADR describes a model for changing the way that tendermint nodes initialize, in
service of providing this kind of functionality.

We focus on node initialization because the current implementation
tightly couples all components to one another, and also couples the
components of the node to the RPC layer. Understanding the interactions
among these components will help enable these features and help define
the requirements of the `node` package.

## Alternative Approaches

These alternatives are presented to frame the design space and to
contextualize the decision in terms of product requirements. These
ideas are not inherently bad, and may even be possible or desirable
in the (distant) future; they merely provide additional context for how
we, at the time, came to our decision(s).

### Do Nothing

The current implementation is functional and sufficient for the vast
majority of use cases (e.g., all users of the Cosmos-SDK as well as
anyone who runs tendermint and the ABCI application in separate
processes). In the current implementation, as in previous versions,
modifying node initialization or injecting custom components required
copying most of the `node` package, which forced such users
to maintain a vendored copy of tendermint.

This is (likely) not tenable in the long term, because users do want
more modularity and the current service implementation is brittle and
difficult to maintain. In the short term, however, it may be possible to
delay implementation somewhat. Eventually we will need to make the
`node` package easier to maintain and reason about.

### Generic Service Pluggability

One possible system design would export interfaces (in the Golang
sense) for all components of the system to permit runtime dependency
injection, so that users can compose tendermint nodes from arbitrary
user-supplied components.

Although this level of customization would provide benefits, it would be a huge
undertaking (particularly with regard to API design work) that we do not have
scope for at the moment. Eventually providing support for some kinds of
pluggability may be useful, so the current solution does not explicitly
foreclose the possibility of this alternative.

### Abstract Dependency-Based Startup and Shutdown

The main proposal in this document makes tendermint node initialization simpler
and more abstract, but the system lacks a number of
features which daemon/service initialization could provide. One example is a
system that lets the authors of services control the initialization and shutdown
order of components using dependency relationships.

Such a system could work by allowing services to declare
initialization-order dependencies on other reactors (by ID, perhaps),
so that the node could derive the initialization order from the
dependencies declared by services rather than encoding this logic
directly.

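
To make the mechanism concrete, the following Go sketch shows how a node could derive a start order from dependencies declared by ID, using Kahn's topological sort. The `startOrder` function and the service IDs and edges in `main` are hypothetical illustrations. They are not tendermint code or its real dependency graph. Under this approach, shutdown would simply walk the resulting order in reverse.

```go
package main

import (
	"fmt"
	"sort"
)

// startOrder computes an initialization order in which every service starts
// after all of the services it depends on, using Kahn's topological sort.
// deps maps a service ID to the IDs it depends on (hypothetical scheme).
// Ready services are sorted at each step so the result is deterministic.
func startOrder(deps map[string][]string) ([]string, error) {
	indegree := map[string]int{}        // number of unmet dependencies per ID
	dependents := map[string][]string{} // reverse edges: ID -> IDs that need it
	for id, ds := range deps {
		if _, ok := indegree[id]; !ok {
			indegree[id] = 0
		}
		for _, d := range ds {
			if _, ok := indegree[d]; !ok {
				indegree[d] = 0
			}
			indegree[id]++
			dependents[d] = append(dependents[d], id)
		}
	}

	var ready []string
	for id, n := range indegree {
		if n == 0 {
			ready = append(ready, id)
		}
	}

	var order []string
	for len(ready) > 0 {
		sort.Strings(ready)
		id := ready[0]
		ready = ready[1:]
		order = append(order, id)
		for _, next := range dependents[id] {
			indegree[next]--
			if indegree[next] == 0 {
				ready = append(ready, next)
			}
		}
	}
	if len(order) != len(indegree) {
		return nil, fmt.Errorf("dependency cycle among services")
	}
	return order, nil
}

func main() {
	// Illustrative dependencies only; not tendermint's actual graph.
	order, err := startOrder(map[string][]string{
		"consensus": {"blocksync", "mempool"},
		"blocksync": {"statesync", "p2p"},
		"statesync": {"p2p"},
		"mempool":   {"p2p"},
		"p2p":       nil,
	})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("start:", order) // shutdown would walk this slice in reverse
}
```

With the illustrative map, the computed order is `p2p`, `mempool`, `statesync`, `blocksync`, `consensus`. A cycle produces an error instead of an order, and that is one of the failure modes such a system would have to surface to service authors.
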
This level of configuration is probably more complicated than is needed. Given
that the authors of components in the current implementation of tendermint
already *do* need to know about other components, a dependency-based system
would probably be overly abstract at this stage.

## Decisions

- To the greatest extent possible, factor the code base so that
  packages are responsible for their own initialization, and minimize
  the amount of code in the `node` package itself.
- As a design goal, reduce direct coupling and dependencies between
  components in the implementation of `node`.
- Begin iterating on a more flexible internal framework for
  initializing tendermint nodes, to make the initialization process
  less hard-coded by the implementation of the node objects.
- Reactors should not need to expose their interfaces *within* the
  implementation of the node type.
- This refactoring should be entirely opaque to users.
- These node initialization changes should not require a
  reevaluation of the `service.Service` interface or a generic initialization
  orchestration framework.
- Do not proactively provide a system for injecting
  components/services within a tendermint node, though make it
  possible to retrofit this kind of pluggability in the future if
  needed.
- Prioritize implementation of the p2p-based statesync reactor to obviate
  the need for users to inject a custom state-sync provider.

## Detailed Design

The [current
nodeImpl](https://github.com/tendermint/tendermint/blob/master/node/node.go#L47)
includes direct references to the implementations of each of the
reactors, which should be replaced by references to `service.Service`
objects. This will require moving construction of the [rpc
service](https://github.com/tendermint/tendermint/blob/master/node/node.go#L771)
into the [makeNode](https://github.com/tendermint/tendermint/blob/master/node/node.go#L126)
constructor. One possible implementation of this would be to eliminate the current
`ConfigureRPC` method on the node package and instead [configure it
here](https://github.com/tendermint/tendermint/pull/6798/files#diff-375d57e386f20eaa5f09f02bb9d28bfc48ac3dca18d0325f59492208219e5618R441).

To avoid adding complexity to the `node` package, we will add a
composite service implementation to the `service` package.
It will implement `service.Service`, be composed of a sequence of
underlying `service.Service` objects, and handle their
startup/shutdown in the specified sequential order.

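
The following minimal Go sketch shows one possible shape for such a composite service. It assumes a simplified `Service` interface with only `Start`, `Stop`, and `String`, whereas the real `service.Service` has more methods. It also assumes that shutdown runs in reverse start order, as the Injectable Component Option section describes for user-supplied services. `Composite`, `NewComposite`, and the toy `recorder` type are hypothetical names.

```go
package main

import "fmt"

// Service is a simplified stand-in for tendermint's service.Service,
// reduced to the methods this sketch needs.
type Service interface {
	Start() error
	Stop() error
	String() string
}

// Composite is itself a Service: it starts its children in the given order
// and stops the ones it started in reverse order, so the node can hold one
// reference instead of one per reactor.
type Composite struct {
	children []Service
	started  int // how many children (a prefix of children) are running
}

var _ Service = (*Composite)(nil) // compile-time interface check

func NewComposite(children ...Service) *Composite {
	return &Composite{children: children}
}

func (c *Composite) String() string { return "composite" }

// Start starts each child in sequence. If a child fails, the children that
// already started are stopped (best effort) before the error is returned.
func (c *Composite) Start() error {
	for i, s := range c.children {
		if err := s.Start(); err != nil {
			c.started = i
			_ = c.Stop()
			return fmt.Errorf("starting %s: %w", s, err)
		}
	}
	c.started = len(c.children)
	return nil
}

// Stop stops the running children in reverse order and reports the first error.
func (c *Composite) Stop() error {
	var first error
	for i := c.started - 1; i >= 0; i-- {
		if err := c.children[i].Stop(); err != nil && first == nil {
			first = fmt.Errorf("stopping %s: %w", c.children[i], err)
		}
	}
	c.started = 0
	return first
}

// recorder is a toy Service that appends its lifecycle events to a shared log.
type recorder struct {
	name string
	log  *[]string
	fail bool // if true, Start returns an error
}

func (r recorder) String() string { return r.name }

func (r recorder) Start() error {
	if r.fail {
		return fmt.Errorf("%s failed to start", r.name)
	}
	*r.log = append(*r.log, "start "+r.name)
	return nil
}

func (r recorder) Stop() error {
	*r.log = append(*r.log, "stop "+r.name)
	return nil
}

func main() {
	var events []string
	node := NewComposite(
		recorder{name: "p2p", log: &events},
		recorder{name: "mempool", log: &events},
		recorder{name: "consensus", log: &events},
	)
	if err := node.Start(); err != nil {
		fmt.Println(err)
	}
	_ = node.Stop()
	fmt.Println(events)
	// [start p2p start mempool start consensus stop consensus stop mempool stop p2p]
}
```

In this sketch, when a later child fails to start, the composite stops the children that already started. That choice keeps a partially started node from leaking running reactors.
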
Consensus, blocksync (*née* fast sync), and statesync all depend on
each other and have significant initialization dependencies that are
presently encoded in the `node` package. As part of this change, a
new package/component (likely named `blocks` and located at
`internal/blocks`) will encapsulate the initialization of these block
management areas of the code.

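
One way to picture what `internal/blocks` might encapsulate is a single service whose `Start` drives the statesync, blocksync, and consensus handoff internally. The Go sketch below is hypothetical. `blockManager`, its function-valued fields, and the idea of passing a height from one phase to the next are illustrative assumptions. The sketch also assumes statesync is enabled, although it is optional in practice.

```go
package main

import "fmt"

// blockManager is a hypothetical sketch of the proposed internal/blocks
// component. The function fields are placeholders for the statesync,
// blocksync, and consensus reactors; their signatures are assumptions.
type blockManager struct {
	// stateSync restores application state from a snapshot and returns
	// the height it restored to.
	stateSync func() (int64, error)
	// blockSync fetches and applies blocks after height and returns the
	// height it caught up to.
	blockSync func(height int64) (int64, error)
	// consensus takes over once the node has caught up to height.
	consensus func(height int64) error
}

// Start runs the three phases in dependency order, passing the height
// reached by each phase to the next. The node only calls Start and never
// needs references to the individual reactors.
func (m *blockManager) Start() error {
	height, err := m.stateSync()
	if err != nil {
		return fmt.Errorf("statesync: %w", err)
	}
	if height, err = m.blockSync(height); err != nil {
		return fmt.Errorf("blocksync: %w", err)
	}
	return m.consensus(height)
}

func main() {
	m := &blockManager{
		stateSync: func() (int64, error) { return 100, nil },
		blockSync: func(h int64) (int64, error) { return h + 50, nil },
		consensus: func(h int64) error {
			fmt.Println("consensus starting after height", h) // prints 150
			return nil
		},
	}
	if err := m.Start(); err != nil {
		fmt.Println(err)
	}
}
```
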
### Injectable Component Option

This section briefly describes a possible implementation for
user-supplied services running within a node. This should not be
implemented unless user-supplied components are a hard requirement for
a user.

In order to allow components to be replaced, a new public function
will be added to the public interface of `node` with a signature that
resembles the following:

```go
func NewWithServices(
	conf *config.Config,
	logger log.Logger,
	cf proxy.ClientCreator,
	gen *types.GenesisDoc,
	srvs []service.Service,
) (service.Service, error)
```

The `service.Service` objects will be started in the order supplied, after
all pre-configured/default services have started (and will be shut down in reverse
order). The given services may implement additional interfaces, allowing them
to replace specific default services. `NewWithServices` will validate input
service lists with the following rules:

- None of the services may already be running.
- The caller may not supply more than one replacement reactor for a given
  default service type.

If callers violate any of these rules, `NewWithServices` will return
an error. To retract support for this kind of operation in the future,
the function can be modified to *always* return an error.

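
The two validation rules fit in a few lines of Go. This sketch makes two assumptions. First, it uses a hypothetical `Replacement` interface, whose `Replaces() string` method names the default service type being replaced, to stand in for the "additional interfaces" mentioned above. Second, it uses a simplified `Service` interface that exposes only `IsRunning` and `String`. None of these names is a settled API, and `plain` and `replacer` are toy types for the example.

```go
package main

import "fmt"

// Service is a simplified stand-in for service.Service.
type Service interface {
	IsRunning() bool
	String() string
}

// Replacement is a hypothetical optional interface: a user-supplied service
// implements it to declare which default service type it replaces.
type Replacement interface {
	Service
	Replaces() string
}

// validateServices applies the two rules listed for NewWithServices:
// no service may already be running, and each default service type may
// be replaced at most once.
func validateServices(srvs []Service) error {
	replacedBy := map[string]string{} // default service type -> replacing service
	for _, s := range srvs {
		if s.IsRunning() {
			return fmt.Errorf("service %s is already running", s)
		}
		r, ok := s.(Replacement)
		if !ok {
			continue // an additional service, not a replacement
		}
		kind := r.Replaces()
		if prev, dup := replacedBy[kind]; dup {
			return fmt.Errorf("%s and %s both replace default service %q", prev, s, kind)
		}
		replacedBy[kind] = s.String()
	}
	return nil
}

// plain and replacer are toy services for the example below.
type plain struct {
	name    string
	running bool
}

func (p plain) IsRunning() bool { return p.running }
func (p plain) String() string  { return p.name }

type replacer struct {
	plain
	target string
}

func (r replacer) Replaces() string { return r.target }

func main() {
	err := validateServices([]Service{
		replacer{plain{name: "custom-statesync-a"}, "statesync"},
		replacer{plain{name: "custom-statesync-b"}, "statesync"},
	})
	fmt.Println(err)
}
```

This sketch returns on the first violation to keep the error simple. Collecting every violation before returning would be an equally reasonable choice.
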
## Consequences

### Positive

- The node package will become easier to maintain.
- It will become easier to add additional services within tendermint
  nodes.
- It will become possible to replace default components in the node
  package without vendoring the tendermint repo and modifying internal
  code.
- The current end-to-end (e2e) test suite will be able to prevent any
  regressions, and the new functionality can be thoroughly unit tested.
- The scope of this project is very narrow, which minimizes risk.

### Negative

- This increases our reliance on the `service.Service` interface, which
  is probably not an interface that we want to fully commit to.
- This proposal implements a fairly minimal set of functionality and
  leaves open the possibility for many additional features which are
  not included in the scope of this proposal.

### Neutral

N/A

## Open Questions

- To what extent does this new initialization framework need to accommodate
  the legacy p2p stack? Would it be possible to delay a great deal of this
  work to the 0.36 cycle to avoid this complexity?
  - Answer: _depends on timing_, and on the requirement to ship pluggable reactors in 0.35.
- Where should additional public types be exported for the 0.35
  release?

  Related to the general project of API stabilization, we want to deprecate
  the `types` package and move its contents into a new `pkg` hierarchy;
  however, the design of the `pkg` interface is currently underspecified.
  If `types` is going to remain for the 0.35 release, then we should consider
  the impact of using multiple organizing modalities for this code within a
  single release.

## Future Work

- Improve or simplify the `service.Service` interface. There are some
  pretty clear limitations with this interface as written: there's no
  way to time out a slow startup or shutdown, the cycle between the
  `service.BaseService` and `service.Service` implementations is
  troubling, and the default panic in `OnReset` seems problematic.
- As part of the refactor of `service.Service`, have all services/nodes
  respect the lifetime of a `context.Context` object, and avoid the
  current practice of creating `context.Context` objects in p2p and
  reactor code. This would be required for in-process multi-tenancy.
- Support explicit dependencies between components and allow for
  parallel startup, so that different reactors can start up at the same
  time, where possible.

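
For the `context.Context` item above, here is a hypothetical Go sketch of a context-driven lifecycle. It also addresses the missing startup timeout. `ContextService`, `slowService`, `startWithTimeout`, and `demo` are illustrative names and not an existing tendermint interface.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// ContextService is a hypothetical revision of service.Service in which a
// caller-supplied context bounds the service's lifetime.
type ContextService interface {
	// Start returns once the service is ready; the service keeps running
	// until ctx is cancelled.
	Start(ctx context.Context) error
}

// slowService simulates a component whose startup takes delay.
type slowService struct{ delay time.Duration }

func (s slowService) Start(ctx context.Context) error {
	select {
	case <-time.After(s.delay):
		return nil // ready; a real service would run until ctx is done
	case <-ctx.Done():
		return ctx.Err() // cancelled before startup finished
	}
}

// startWithTimeout starts s with a lifetime derived from parent, but gives up
// and cancels the service's context if startup exceeds timeout. On success it
// returns the cancel function, which is how the caller later stops the service.
func startWithTimeout(parent context.Context, s ContextService, timeout time.Duration) (context.CancelFunc, error) {
	ctx, cancel := context.WithCancel(parent)
	done := make(chan error, 1) // buffered so the goroutine never blocks
	go func() { done <- s.Start(ctx) }()
	select {
	case err := <-done:
		if err != nil {
			cancel()
			return nil, err
		}
		return cancel, nil
	case <-time.After(timeout):
		cancel()
		return nil, fmt.Errorf("startup exceeded %s", timeout)
	}
}

// demo starts a simulated service taking delayMs to start, allowing timeoutMs.
func demo(delayMs, timeoutMs int) error {
	svc := slowService{delay: time.Duration(delayMs) * time.Millisecond}
	stop, err := startWithTimeout(context.Background(), svc, time.Duration(timeoutMs)*time.Millisecond)
	if err != nil {
		return err
	}
	stop() // shutting down is just cancelling the context
	return nil
}

func main() {
	fmt.Println(demo(5, 100)) // fast startup succeeds
	fmt.Println(demo(100, 5)) // slow startup is abandoned after the timeout
}
```

In this shape, stopping a service means cancelling its context. That would also let an embedding application run several nodes in one process with independent lifetimes.
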
## References

- [the component
  graph](https://peter.bourgon.org/go-for-industrial-programming/#the-component-graph)
  as a framing for internal service construction.

## Appendix

### Dependencies

The relationship between the blockchain and consensus reactors,
described by the following dependency graph, makes replacing some of
these components more difficult relative to other reactors or
components.

![consensus blockchain dependency graph](./img/consensus_blockchain.png)