  1. ========================================
  2. RFC 004: E2E Test Framework Enhancements
  3. ========================================
  4. Changelog
  5. ---------
  6. - 2021-09-14: started initial draft (@tychoish)
  7. Abstract
  8. --------
  9. This document discusses a series of improvements to the e2e test framework
  10. that we can consider during the next few releases to help boost confidence in
  11. Tendermint releases, and improve developer efficiency.
  12. Background
  13. ----------
  14. During the 0.35 release cycle, the e2e tests were a source of great
  15. value, helping to identify a number of bugs before release. At the same time,
  16. the tests did not pass consistently during this cycle, which reduced
  17. their value and forced the core development team to spend time and energy
  18. maintaining the tests and chasing down issues in the e2e tests and the test
  19. harness. The experience of this release cycle suggests a series of
  20. improvements to the test framework. This document attempts to capture
  21. these improvements, along with their motivations and potential impact.
  22. Projects
  23. --------
  24. Flexible Workload Generation
  25. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  26. Presently the e2e suite contains a single workload generation pattern, which
  27. exists simply to ensure that the test networks have some work during their
  28. runs. However, the shape and volume of the work are very consistent and
  29. deliberately gentle, to help ensure test reliability.
  30. We don't need a complex workload generation framework, but being able to have
  31. a few different workload shapes available for test networks, both generated and
  32. hand-crafted, would be useful.
  33. Workload patterns/configurations might include:
  34. - transaction targeting patterns (include light nodes, round robin, target
  35. individual nodes)
  36. - variable transaction size over time.
  37. - transaction broadcast option (synchronously, checked, fire-and-forget,
  38. mixed).
  39. - number of transactions to submit.
  40. - non-transaction workloads (evidence submission, queries, event subscriptions).
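The dimensions listed above could be captured in a small configuration type. The following is a minimal sketch, not a description of the existing framework: `Workload`, `TargetPattern`, `BroadcastMode`, and `pickTarget` are hypothetical names introduced only for illustration.

```go
package main

import "fmt"

// TargetPattern is a hypothetical enumeration of transaction targeting patterns.
type TargetPattern int

const (
	TargetRoundRobin TargetPattern = iota // cycle through all nodes in order
	TargetSingle                          // always send to the first node
)

// BroadcastMode is a hypothetical enumeration of the broadcast options above.
type BroadcastMode string

const (
	BroadcastSync          BroadcastMode = "sync"
	BroadcastChecked       BroadcastMode = "checked"
	BroadcastFireAndForget BroadcastMode = "fire-and-forget"
)

// Workload describes one workload shape. All fields are illustrative.
type Workload struct {
	Target    TargetPattern
	Broadcast BroadcastMode
	TxCount   int             // number of transactions to submit
	TxSize    func(i int) int // size in bytes of the i-th transaction
}

// pickTarget returns the index of the node that receives the i-th
// transaction. numNodes must be positive.
func (w Workload) pickTarget(i, numNodes int) int {
	switch w.Target {
	case TargetSingle:
		return 0
	default:
		return i % numNodes
	}
}

func main() {
	w := Workload{
		Target:    TargetRoundRobin,
		Broadcast: BroadcastSync,
		TxCount:   5,
		TxSize:    func(i int) int { return 64 * (i + 1) }, // grows over time
	}
	for i := 0; i < w.TxCount; i++ {
		fmt.Printf("tx %d: node=%d size=%dB mode=%s\n",
			i, w.pickTarget(i, 3), w.TxSize(i), w.Broadcast)
	}
}
```

Expressing transaction size as a function of the transaction index is one simple way to get "variable size over time" without a separate scheduling mechanism.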
  41. Configurable Generator
  42. ~~~~~~~~~~~~~~~~~~~~~~
  43. The nightly e2e suite is defined by the `testnet generator
  44. <https://github.com/tendermint/tendermint/blob/master/test/e2e/generator/generate.go#L13-L65>`_,
  45. and it's difficult to add dimensions or change the focus of the test suite in
  46. any way without modifying the implementation of the generator. If the
  47. generator were more configurable, potentially via a file rather than in
  48. the Go implementation, we could modify the focus of the test suite on the
  49. fly.
  50. Features that we might want to configure:
  51. - number of test networks to generate of various topologies, to improve
  52. coverage of different configurations.
  53. - test application configurations (to modify the latency of ABCI calls, etc.)
  54. - size of test networks.
  55. - workload shape and behavior.
  56. - initial sync and catch-up configurations.
  57. The testnet generator currently provides runtime options for limiting the
  58. generator to specific types of P2P stacks, and for generating multiple groups
  59. of test cases to support parallelism. The goal is to extend this pattern and
  60. avoid hardcoding the matrix of test cases in the generator code. Once the
  61. testnet configuration generation behavior is configurable at runtime,
  62. developers may be able to use the e2e framework to validate changes before
  63. merging them, rather than learning a day later that they broke the e2e tests.
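As an illustration of moving the test matrix out of the generator code, the sketch below loads the dimensions from a document and expands them into test cases. It is a sketch under stated assumptions: `GeneratorConfig`, `Case`, `Expand`, and `parseConfig` are hypothetical, the dimension values are placeholders, and JSON is used only to stay within the Go standard library (the actual file format is an open question).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// GeneratorConfig is a hypothetical, file-loadable description of the matrix.
type GeneratorConfig struct {
	Topologies []string `json:"topologies"`
	Sizes      []int    `json:"sizes"`
	Workloads  []string `json:"workloads"`
}

// Case is one generated combination of the configured dimensions.
type Case struct {
	Topology string
	Size     int
	Workload string
}

// Expand returns the full cartesian product of the configured dimensions.
func (c GeneratorConfig) Expand() []Case {
	var cases []Case
	for _, t := range c.Topologies {
		for _, s := range c.Sizes {
			for _, w := range c.Workloads {
				cases = append(cases, Case{Topology: t, Size: s, Workload: w})
			}
		}
	}
	return cases
}

// parseConfig decodes a configuration document into a GeneratorConfig.
func parseConfig(data []byte) (GeneratorConfig, error) {
	var cfg GeneratorConfig
	err := json.Unmarshal(data, &cfg)
	return cfg, err
}

func main() {
	raw := []byte(`{"topologies":["single","quad"],"sizes":[1,4],"workloads":["gentle"]}`)
	cfg, err := parseConfig(raw)
	if err != nil {
		panic(err)
	}
	for _, c := range cfg.Expand() {
		fmt.Printf("%+v\n", c)
	}
}
```

With this shape, changing the focus of a run means editing the document rather than the generator.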
  64. In addition to the autogenerated suite, it might make sense to maintain a
  65. small collection of hand-crafted cases that exercise configurations of
  66. concern, to run as part of the nightly (or less frequent) loop.
  67. Implementation Plan Structure
  68. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  69. As a development team, we should determine which features should affect e2e
  70. testing early in the development cycle. If we intend to modify the e2e
  71. tests to exercise a feature, we should identify this early and begin the
  72. integration process as early as possible.
  73. To facilitate this, we should adopt a practice whereby we exercise specific
  74. features that are currently under development more rigorously in the e2e
  75. suite, and then as development stabilizes we can reduce the number or weight
  76. of these features in the suite.
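One possible way to adjust the "weight" of a feature is to attach a relative weight to each option the generator chooses between, raising it while the feature is under active development and lowering it as the feature stabilizes. The sketch below assumes that approach; the `weightedChoice` type, the option names, and the weights are hypothetical.

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
)

// weightedChoice maps an option name to its relative weight (hypothetical).
type weightedChoice map[string]uint

// choose picks one option with probability proportional to its weight.
// intn must return a value in [0, n), for example (*rand.Rand).Intn.
func (wc weightedChoice) choose(intn func(n int) int) string {
	keys := make([]string, 0, len(wc))
	total := 0
	for k, w := range wc {
		keys = append(keys, k)
		total += int(w)
	}
	if total == 0 {
		return ""
	}
	sort.Strings(keys) // fixed order so a given seed is reproducible
	n := intn(total)
	for _, k := range keys {
		n -= int(wc[k])
		if n < 0 {
			return k
		}
	}
	return "" // not reached when intn honors its contract
}

func main() {
	// A feature under active development gets a higher weight.
	syncModes := weightedChoice{"statesync": 3, "blocksync": 1, "none": 1}
	r := rand.New(rand.NewSource(1))
	counts := map[string]int{}
	for i := 0; i < 1000; i++ {
		counts[syncModes.choose(r.Intn)]++
	}
	fmt.Println(counts)
}
```

Taking the random source as a function keeps the selection logic deterministic under test.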
  77. As of 0.35 there are essentially two end-to-end tests: the suite of 64
  78. generated test networks, and the hand-crafted `ci.toml` test case. The
  79. generated test cases help provide systematic coverage, while the `ci` run
  80. provides coverage for a large number of features.
  81. Reduce Cycle Time
  82. ~~~~~~~~~~~~~~~~~
  83. One of the barriers to leveraging the e2e framework, and one of the challenges
  84. in debugging failures, is that the cycle time of a single test iteration is
  85. quite high: 5 minutes to build the docker image, plus the time to run the test
  86. or tests.
  87. There are a number of improvements and enhancements that can reduce the cycle
  88. time in practice:
  89. - reduce the amount of time required to build the docker image used in these
  90. tests. Without the dependency on CGo, the tendermint binaries could be
  91. (cross) compiled outside of the docker container and then injected into
  92. it, which would take better advantage of docker's native caching.
  93. That said, without the dependency on CGo there would be no hard requirement
  94. for the e2e tests to use docker at all.
  95. - support test parallelism. Because of the way the testnets are orchestrated,
  96. a single system can really only run one network at a time. For executions
  97. (local or remote) with more resources, there's no reason not to run a few
  98. networks in parallel to reduce the feedback time.
  99. - prune testnet configurations that are unlikely to provide good signal, to
  100. shorten the time to feedback.
  101. - apply some kind of tiered approach to test execution, to improve the
  102. legibility of the test result. For example, order tests by the dependency of
  103. their features, or run test networks without perturbations before running
  104. that configuration with perturbations, to be able to isolate the impact of
  105. specific features.
  106. - orchestrate the test harness directly from go test rather than via a special
  107. harness and shell scripts so e2e tests may more natively fit into developers'
  108. existing workflows.
  109. Many of these improvements, particularly reducing the build time, will also
  110. reduce the time to get feedback during automated builds.
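To make the parallelism item concrete, the following sketch runs several test networks with a bounded number in flight at once. It assumes the orchestration has already been changed so that networks do not interfere with one another; `runAll`, `errStalled`, and the network names are hypothetical, and the `run` callback stands in for whatever actually starts and checks a network.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errStalled is a placeholder failure used by the example.
var errStalled = errors.New("testnet stalled")

// runAll runs every named testnet with at most limit running concurrently
// and returns the failures keyed by testnet name.
func runAll(names []string, limit int, run func(name string) error) map[string]error {
	if limit < 1 {
		limit = 1
	}
	sem := make(chan struct{}, limit) // counting semaphore
	var (
		mu sync.Mutex
		wg sync.WaitGroup
	)
	failures := make(map[string]error)
	for _, name := range names {
		wg.Add(1)
		sem <- struct{}{} // blocks while limit runs are in flight
		go func(name string) {
			defer wg.Done()
			defer func() { <-sem }()
			if err := run(name); err != nil {
				mu.Lock()
				failures[name] = err
				mu.Unlock()
			}
		}(name)
	}
	wg.Wait()
	return failures
}

func main() {
	names := []string{"net-a", "net-b", "net-c"}
	failures := runAll(names, 2, func(name string) error {
		if name == "net-b" {
			return errStalled // pretend one network stalls
		}
		return nil
	})
	fmt.Println(len(failures), "of", len(names), "testnets failed")
}
```

Setting the limit to 1 reproduces the current serial behavior, so the limit can be raised only on systems with the resources for it.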
  111. Deeper Insights
  112. ~~~~~~~~~~~~~~~
  113. When a test network fails, it's incredibly difficult to understand *why* the
  114. network failed, as the current system provides very little insight into the
  115. system outside of the process logs. When a test network stalls or fails,
  116. developers should be able to quickly and easily get a sense of the state of
  117. the network and all nodes.
  118. Improvements in pursuit of this goal include functionality that would help
  119. node operators in production environments, by improving the quality and utility
  120. of the logging messages and other reported metrics, as well as
  121. tools to collect and aggregate this data for developers in the context of test
  122. networks.
  123. - Interleave messages from all nodes in the network to be able to correlate
  124. events during the test run.
  125. - Collect structured metrics of the system operation (CPU/MEM/IO) during the
  126. test run, as well as from each tendermint/application process.
  127. - Build (simple) tools to be able to render and summarize the data collected
  128. during the test run to answer basic questions about test outcome.
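Interleaving messages from all nodes amounts to merging the per-node logs into a single timeline ordered by timestamp. The sketch below shows that step only, assuming log lines have already been parsed into a timestamp, node name, and message; `Entry`, `entry`, and `interleave` are hypothetical names, and the node names are placeholders.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// Entry is a hypothetical parsed log line from one node.
type Entry struct {
	Time time.Time
	Node string
	Msg  string
}

// entry builds an Entry whose timestamp is sec seconds after the Unix epoch.
func entry(node string, sec int64, msg string) Entry {
	return Entry{Time: time.Unix(sec, 0), Node: node, Msg: msg}
}

// interleave merges per-node logs into one timeline ordered by timestamp.
// Ties are broken by node name; entries from the same node keep their order.
func interleave(logs map[string][]Entry) []Entry {
	var all []Entry
	for _, entries := range logs {
		all = append(all, entries...)
	}
	sort.SliceStable(all, func(i, j int) bool {
		if !all[i].Time.Equal(all[j].Time) {
			return all[i].Time.Before(all[j].Time)
		}
		return all[i].Node < all[j].Node
	})
	return all
}

func main() {
	logs := map[string][]Entry{
		"validator01": {entry("validator01", 1, "proposed block"), entry("validator01", 3, "committed block")},
		"validator02": {entry("validator02", 2, "received proposal")},
	}
	for _, e := range interleave(logs) {
		fmt.Printf("%s %-12s %s\n", e.Time.UTC().Format(time.RFC3339), e.Node, e.Msg)
	}
}
```

This relies on node clocks being comparable, which holds when all nodes run on one host but would need care for remote executions.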
  129. Flexible Assertions
  130. ~~~~~~~~~~~~~~~~~~~
  131. Currently, all assertions run for every test network, which makes the
  132. assertions pretty bland and the framework primarily useful as a smoke-test
  133. framework. It might be useful to be able to write and run different
  134. tests for different configurations. This could allow us to test outside of the
  135. happy path.
  136. In general our existing assertions occupy a small fraction of the total test
  137. time, so adding a few extra test assertions would cost relatively little
  138. and could help build confidence.
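One way to run different assertions for different configurations is to pair each assertion with a predicate that decides which networks it applies to. This is a minimal sketch of that idea, not the existing test layout: `Testnet`, `Assertion`, `runAssertions`, and the assertion names are hypothetical, and the perturbation check is a placeholder.

```go
package main

import "fmt"

// Testnet is a hypothetical, heavily simplified view of a network's config.
type Testnet struct {
	Name          string
	Nodes         int
	Perturbations bool
}

// Assertion pairs a check with a predicate selecting the networks it covers.
// A nil Applies means the assertion runs for every network.
type Assertion struct {
	Name    string
	Applies func(Testnet) bool
	Check   func(Testnet) error
}

// runAssertions runs the applicable assertions and reports which ones ran
// and which of those failed.
func runAssertions(t Testnet, all []Assertion) (ran, failed []string) {
	for _, a := range all {
		if a.Applies != nil && !a.Applies(t) {
			continue
		}
		ran = append(ran, a.Name)
		if err := a.Check(t); err != nil {
			failed = append(failed, a.Name)
		}
	}
	return ran, failed
}

// defaultAssertions returns an illustrative set of assertions.
func defaultAssertions() []Assertion {
	return []Assertion{
		{
			Name: "has-nodes",
			Check: func(t Testnet) error {
				if t.Nodes < 1 {
					return fmt.Errorf("%s: no nodes", t.Name)
				}
				return nil
			},
		},
		{
			Name:    "recovers-after-perturbation",
			Applies: func(t Testnet) bool { return t.Perturbations },
			Check:   func(t Testnet) error { return nil }, // placeholder check
		},
	}
}

func main() {
	ran, failed := runAssertions(Testnet{Name: "plain", Nodes: 4}, defaultAssertions())
	fmt.Println("ran:", ran, "failed:", failed)
}
```

Assertions without a predicate keep today's behavior of running everywhere, so existing checks would not need to change.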
  139. Additional Kinds of Testing
  140. ~~~~~~~~~~~~~~~~~~~~~~~~~~~
  141. The existing e2e suite exercises networks of nodes that have a homogeneous
  142. tendermint version and a stable configuration, and that are expected to make
  143. progress. There are many other possible test configurations that may be
  144. interesting to engage with. These could include dimensions such as:
  145. - Multi-version testing to exercise our compatibility guarantees for networks
  146. that might have different tendermint versions.
  147. - As a flavor of multi-version testing, include upgrade testing, to build
  148. confidence in migration code and procedures.
  149. - Additional test applications, particularly practical-type applications
  150. including some that use gaiad and/or the cosmos-sdk. Test-only applications
  151. that simulate other kinds of applications (e.g. variable application
  152. operation latency).
  153. - Tests of "non-viable" configurations that ensure that forbidden combinations
  154. lead to halts.
  155. References
  156. ----------
  157. - `ADR 66: End-to-End Testing <../architecture/adr-66-e2e-testing.md>`_