|
|
- ========================================
- RFC 004: E2E Test Framework Enhancements
- ========================================
-
- Changelog
- ---------
-
- - 2021-09-14: started initial draft (@tychoish)
-
- Abstract
- --------
-
- This document discusses a series of improvements to the e2e test framework
- that we can consider during the next few releases to help boost confidence in
- Tendermint releases and improve developer efficiency.
-
- Background
- ----------
-
- During the 0.35 release cycle, the E2E tests were a source of great
- value, helping to identify a number of bugs before release. At the same time,
- the tests did not pass consistently during this cycle, which reduced their
- value and forced the core development team to allocate time and energy to
- maintaining and chasing down issues with the e2e tests and the test
- harness. The experience of this release cycle suggests a series of
- improvements to the test framework, and this document attempts to capture
- these improvements, along with their motivations and potential impact.
-
- Projects
- --------
-
- Flexible Workload Generation
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
- Presently the e2e suite contains a single workload generation pattern, which
- exists simply to ensure that the test networks have some work during their
- runs. However, the shape and volume of the work are very consistent and
- deliberately gentle, to help ensure test reliability.
-
- We don't need a complex workload generation framework, but being able to have
- a few different workload shapes available for test networks, both generated and
- hand-crafted, would be useful.
-
- Workload patterns/configurations might include:
-
- - transaction targeting patterns (include light nodes, round robin, target
- individual nodes)
-
- - variable transaction size over time.
-
- - transaction broadcast option (synchronously, checked, fire-and-forget,
- mixed).
-
- - number of transactions to submit.
-
- non-transaction workloads (evidence submission, query, event subscription).
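-
- As a rough illustration of the options above, a workload shape could be
- captured in a small Go value. This is only a sketch: the `Workload` type, its
- fields, and the `BroadcastMode` names are hypothetical and do not exist in
- the e2e framework today.
-
```go
package main

import (
	"errors"
	"fmt"
)

// BroadcastMode names the hypothetical broadcast options from the list above.
type BroadcastMode string

const (
	BroadcastSync          BroadcastMode = "sync"
	BroadcastChecked       BroadcastMode = "checked"
	BroadcastFireAndForget BroadcastMode = "fire-and-forget"
	BroadcastMixed         BroadcastMode = "mixed"
)

// Workload is a hypothetical description of one workload shape.
type Workload struct {
	Targets   []string      // nodes that receive transactions, visited round robin
	Mode      BroadcastMode // how each transaction is broadcast
	MinTxSize int           // size in bytes of the first transaction
	MaxTxSize int           // size in bytes of the last transaction
	NumTxs    int           // total number of transactions to submit
}

// Validate rejects workloads that could not be executed.
func (w Workload) Validate() error {
	switch {
	case len(w.Targets) == 0:
		return errors.New("workload needs at least one target node")
	case w.NumTxs <= 0:
		return errors.New("workload needs a positive transaction count")
	case w.MinTxSize <= 0 || w.MaxTxSize < w.MinTxSize:
		return errors.New("workload needs 0 < MinTxSize <= MaxTxSize")
	}
	switch w.Mode {
	case BroadcastSync, BroadcastChecked, BroadcastFireAndForget, BroadcastMixed:
		return nil
	}
	return fmt.Errorf("unknown broadcast mode %q", w.Mode)
}

// TargetFor returns the node that receives the i-th transaction (round robin).
func (w Workload) TargetFor(i int) string {
	return w.Targets[i%len(w.Targets)]
}

// TxSizeFor ramps the transaction size linearly from MinTxSize to MaxTxSize.
func (w Workload) TxSizeFor(i int) int {
	if w.NumTxs <= 1 {
		return w.MinTxSize
	}
	return w.MinTxSize + (w.MaxTxSize-w.MinTxSize)*i/(w.NumTxs-1)
}

func main() {
	w := Workload{
		Targets: []string{"validator01", "full01", "light01"},
		Mode:    BroadcastSync, MinTxSize: 64, MaxTxSize: 1024, NumTxs: 5,
	}
	if err := w.Validate(); err != nil {
		fmt.Println("invalid workload:", err)
		return
	}
	for i := 0; i < w.NumTxs; i++ {
		fmt.Printf("tx %d -> %s (%d bytes)\n", i, w.TargetFor(i), w.TxSizeFor(i))
	}
}
```
-
- A hand-crafted workload would be a literal of this kind, while a generated
- one could fill in the same fields from a seed.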
-
- Configurable Generator
- ~~~~~~~~~~~~~~~~~~~~~~
-
- The nightly e2e suite is defined by the `testnet generator
- <https://github.com/tendermint/tendermint/blob/master/test/e2e/generator/generate.go#L13-L65>`_,
- and it's difficult to add dimensions or change the focus of the test suite in
- any way without modifying the implementation of the generator. If the
- generator were more configurable, potentially via a file rather than in
- the Go implementation, we could modify the focus of the test suite on the
- fly.
-
- Features that we might want to configure:
-
- - number of test networks to generate of various topologies, to improve
- coverage of different configurations.
-
- - test application configurations (to modify the latency of ABCI calls, etc.)
-
- - size of test networks.
-
- - workload shape and behavior.
-
- - initial sync and catch-up configurations.
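-
- One way to picture a file-driven generator is as a matrix of dimensions that
- is expanded into every combination at runtime. The sketch below is an
- assumption, not the current generator: the `Matrix` type, `ParseMatrix`, and
- the dimension names are hypothetical, and JSON is used only so that the
- example needs nothing beyond the standard library.
-
```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// Matrix maps a dimension name to the values the generator should combine.
type Matrix map[string][]string

// ParseMatrix decodes a matrix from a JSON document.
func ParseMatrix(data []byte) (Matrix, error) {
	var m Matrix
	if err := json.Unmarshal(data, &m); err != nil {
		return nil, err
	}
	return m, nil
}

// Expand returns every combination of dimension values. Dimensions are
// visited in sorted key order so the output is deterministic. A dimension
// with no values yields no combinations at all.
func (m Matrix) Expand() []map[string]string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	combos := []map[string]string{{}}
	for _, k := range keys {
		var next []map[string]string
		for _, c := range combos {
			for _, v := range m[k] {
				n := make(map[string]string, len(c)+1)
				for ck, cv := range c {
					n[ck] = cv
				}
				n[k] = v
				next = append(next, n)
			}
		}
		combos = next
	}
	return combos
}

func main() {
	// Hypothetical configuration file contents.
	cfg := []byte(`{
		"topology": ["single", "quad", "large"],
		"p2p": ["legacy", "new"],
		"workload": ["gentle", "bursty"]
	}`)
	m, err := ParseMatrix(cfg)
	if err != nil {
		fmt.Println("bad config:", err)
		return
	}
	for i, c := range m.Expand() {
		fmt.Printf("testnet %02d: %v\n", i, c)
	}
}
```
-
- Changing the focus of the suite would then mean editing the file rather than
- the generator code.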
-
- The testnet generator currently provides runtime options for limiting the
- generator to specific types of P2P stacks, and for generating multiple groups
- of test cases to support parallelism. The goal is to extend this pattern and
- avoid hardcoding the matrix of test cases in the generator code. Once the
- testnet configuration generation behavior is configurable at runtime,
- developers may be able to use the e2e framework to validate changes before
- landing them, rather than discovering a day later that they broke the
- nightly e2e tests.
-
- In addition to the autogenerated suite, it might make sense to maintain a
- small collection of hand-crafted cases that exercise configurations of
- concern, to run as part of the nightly (or less frequent) loop.
-
- Implementation Plan Structure
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
- As a development team, we should determine which features should impact the
- e2e tests early in the development cycle. If we intend to modify the e2e
- tests to exercise a feature, we should identify this early and begin the
- integration process as soon as possible.
-
- To facilitate this, we should adopt a practice whereby we exercise specific
- features that are currently under development more rigorously in the e2e
- suite, and then as development stabilizes we can reduce the number or weight
- of these features in the suite.
-
- As of 0.35 there are essentially two end-to-end tests: the suite of 64
- generated test networks, and the hand-crafted `ci.toml` test case. The
- generated test cases help provide systematic coverage, while the `ci` run
- provides coverage for a large number of features.
-
- Reduce Cycle Time
- ~~~~~~~~~~~~~~~~~
-
- One of the barriers to leveraging the e2e framework, and one of the challenges
- in debugging failures, is that the cycle time of running a single test
- iteration is quite high: 5 minutes to build the docker image, plus the time to
- run the test or tests.
-
- There are a number of improvements and enhancements that can reduce the cycle
- time in practice:
-
- reduce the amount of time required to build the docker image used in these
- tests. Without the dependency on CGo, the tendermint binaries could be
- (cross) compiled outside of the docker container and then injected into
- it, which would take better advantage of docker's native caching.
- That said, without the dependency on CGo there would be no hard requirement
- for the e2e tests to use docker at all.
-
- support test parallelism. Because of the way the testnets are orchestrated,
- a single system can really only run one network at a time. For executions
- (local or remote) with more resources, there's no reason not to run a few
- networks in parallel to reduce the feedback time.
-
- prune testnet configurations that are unlikely to provide good signal, to
- shorten the time to feedback.
-
- apply some kind of tiered approach to test execution, to improve the
- legibility of the test results. For example, order tests by the dependency of
- their features, or run test networks without perturbations before running
- that configuration with perturbations, to be able to isolate the impact of
- specific features.
-
- orchestrate the test harness directly from `go test` rather than via a special
- harness and shell scripts, so that e2e tests fit more natively into
- developers' existing workflows.
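-
- The parallelism item above amounts to running testnets with a bounded number
- of concurrent workers. A minimal sketch follows, assuming each testnet can be
- started in isolation; `RunParallel` and the `run` callback are hypothetical
- names, not part of the current runner.
-
```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errStalled is a placeholder failure used by the example below.
var errStalled = errors.New("network stalled")

// RunParallel calls run once per testnet name, with at most limit calls in
// flight at any time, and returns the result of each call keyed by name.
func RunParallel(names []string, limit int, run func(name string) error) map[string]error {
	if limit < 1 {
		limit = 1
	}
	results := make(map[string]error, len(names))
	var mu sync.Mutex
	var wg sync.WaitGroup
	sem := make(chan struct{}, limit) // counting semaphore

	for _, name := range names {
		name := name
		wg.Add(1)
		sem <- struct{}{} // blocks while limit runs are in flight
		go func() {
			defer wg.Done()
			defer func() { <-sem }()
			err := run(name)
			mu.Lock()
			results[name] = err
			mu.Unlock()
		}()
	}
	wg.Wait()
	return results
}

func main() {
	names := []string{"gen-0000", "gen-0001", "gen-0002", "gen-0003"}
	// The callback stands in for starting, loading, and testing one network.
	results := RunParallel(names, 2, func(name string) error {
		if name == "gen-0002" {
			return errStalled
		}
		return nil
	})
	for _, name := range names {
		fmt.Printf("%s: %v\n", name, results[name])
	}
}
```
-
- The limit would be chosen from the resources of the machine, and a limit of
- one reproduces the current sequential behavior.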
-
- Many of these improvements, particularly reducing the build time, will also
- reduce the time to get feedback during automated builds.
-
- Deeper Insights
- ~~~~~~~~~~~~~~~
-
- When a test network fails, it's incredibly difficult to understand *why* the
- network failed, as the current system provides very little insight into the
- system outside of the process logs. When a test network stalls or fails,
- developers should be able to quickly and easily get a sense of the state of
- the network and all nodes.
-
- Improvements in pursuit of this goal include functionality that would help
- node operators in production environments, by improving the quality and
- utility of the logging messages and other reported metrics, as well as
- tools to collect and aggregate this data for developers in the context of test
- networks:
-
- - Interleave messages from all nodes in the network to be able to correlate
- events during the test run.
-
- - Collect structured metrics of the system operation (CPU/MEM/IO) during the
- test run, as well as from each tendermint/application process.
-
- - Build (simple) tools to be able to render and summarize the data collected
- during the test run to answer basic questions about test outcome.
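-
- Interleaving messages from all nodes can be as simple as merging the per-node
- logs by timestamp. The sketch below assumes that log lines have already been
- parsed into a timestamp, a node name, and a message. The `Entry` type and the
- `Interleave` function are hypothetical.
-
```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// Entry is one parsed log line from one node.
type Entry struct {
	Time time.Time
	Node string
	Msg  string
}

// Interleave merges per-node logs into a single timeline ordered by time.
// Entries with equal timestamps are ordered by node name, and entries from
// the same node keep their original relative order.
func Interleave(logs map[string][]Entry) []Entry {
	var all []Entry
	for _, entries := range logs {
		all = append(all, entries...)
	}
	sort.SliceStable(all, func(i, j int) bool {
		if !all[i].Time.Equal(all[j].Time) {
			return all[i].Time.Before(all[j].Time)
		}
		return all[i].Node < all[j].Node
	})
	return all
}

// at builds an example timestamp from seconds since the Unix epoch.
func at(sec int64) time.Time { return time.Unix(sec, 0).UTC() }

func main() {
	logs := map[string][]Entry{
		"validator01": {
			{at(1), "validator01", "proposed block 10"},
			{at(4), "validator01", "committed block 10"},
		},
		"validator02": {
			{at(2), "validator02", "received proposal 10"},
			{at(3), "validator02", "prevoted block 10"},
		},
	}
	for _, e := range Interleave(logs) {
		fmt.Printf("%s %-12s %s\n", e.Time.Format(time.RFC3339), e.Node, e.Msg)
	}
}
```
-
- This relies on node clocks being comparable, which holds when all nodes run
- on one host but would need care across machines.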
-
- Flexible Assertions
- ~~~~~~~~~~~~~~~~~~~
-
- Currently, all assertions run for every test network, which makes the
- assertions fairly generic and the framework useful primarily for smoke
- testing. It might be useful to be able to write and run different
- tests for different configurations. This could allow us to test outside of the
- happy path.
-
- In general our existing assertions occupy a fraction of the total test time,
- so adding a few extra test assertions would cost relatively little and could
- help build confidence.
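-
- Running different tests for different configurations could be modeled by
- pairing each assertion with a predicate over the testnet. The following is a
- sketch under that assumption. The `Testnet` fields, the `Assertion` type, and
- the example checks are hypothetical placeholders.
-
```go
package main

import (
	"errors"
	"fmt"
)

// Testnet carries the hypothetical properties that assertions select on.
type Testnet struct {
	Name       string
	Validators int
	Perturbed  bool // whether nodes are disrupted during the run
}

// Assertion pairs a check with the configurations it applies to.
type Assertion struct {
	Name      string
	AppliesTo func(Testnet) bool
	Check     func(Testnet) error
}

// Always selects every testnet, which matches the current behavior.
func Always(Testnet) bool { return true }

// Select returns the assertions that apply to the given testnet.
func Select(all []Assertion, tn Testnet) []Assertion {
	var out []Assertion
	for _, a := range all {
		if a.AppliesTo(tn) {
			out = append(out, a)
		}
	}
	return out
}

// suite is an example registry. The checks are stand-ins for real tests.
var suite = []Assertion{
	{
		Name:      "has validators",
		AppliesTo: Always,
		Check: func(tn Testnet) error {
			if tn.Validators == 0 {
				return errors.New("no validators configured")
			}
			return nil
		},
	},
	{
		Name:      "recovers after perturbation",
		AppliesTo: func(tn Testnet) bool { return tn.Perturbed },
		Check:     func(Testnet) error { return nil }, // placeholder
	},
}

func main() {
	for _, tn := range []Testnet{
		{Name: "stable", Validators: 4},
		{Name: "chaotic", Validators: 4, Perturbed: true},
	} {
		for _, a := range Select(suite, tn) {
			fmt.Printf("%s: %s: %v\n", tn.Name, a.Name, a.Check(tn))
		}
	}
}
```
-
- Assertions that use `Always` keep the smoke-test coverage, while the others
- run only where they are meaningful.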
-
- Additional Kinds of Testing
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
- The existing e2e suite exercises networks of nodes that have a homogeneous
- Tendermint version and a stable configuration, and that are expected to make
- progress. There are many other possible test configurations that may be
- interesting to engage with. These could include dimensions such as:
-
- - Multi-version testing to exercise our compatibility guarantees for networks
- that might have different tendermint versions.
-
- As a flavor of multi-version testing, include upgrade testing, to build
- confidence in migration code and procedures.
-
- Additional test applications, particularly practical-type applications
- including some that use gaiad and/or the cosmos-sdk. Test-only applications
- that simulate other kinds of applications (e.g. variable application
- operation latency).
-
- - Tests of "non-viable" configurations that ensure that forbidden combinations
- lead to halts.
-
- References
- ----------
-
- - `ADR 66: End-to-End Testing <../architecture/adr-66-e2e-testing.md>`_
|