========================================
RFC 004: E2E Test Framework Enhancements
========================================

Changelog
---------

- 2021-09-14: started initial draft (@tychoish)

Abstract
--------

This document describes a series of improvements to the e2e test framework
that we can consider during the next few releases, both to increase
confidence in Tendermint releases and to improve developer efficiency.

Background
----------

During the 0.35 release cycle, the e2e tests were a source of great value,
helping to identify a number of bugs before release. At the same time, the
tests did not pass consistently during that period, which reduced their
value and forced the core development team to spend time and energy
maintaining the tests and chasing down issues with the test harness. The
experience of this release cycle suggests a series of improvements to the
test framework, and this document attempts to capture those improvements,
along with their motivations and potential impact.

Projects
--------

Flexible Workload Generation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Presently the e2e suite contains a single workload generation pattern, which
exists simply to ensure that the test networks have some work to do during
their runs. However, the shape and volume of that work is consistent and
deliberately gentle, in order to help ensure test reliability.

We don't need a complex workload generation framework, but having a few
different workload shapes available for test networks, both generated and
hand-crafted, would be useful.

Workload patterns and configurations might include:

- transaction targeting patterns (include light nodes, round robin, target
  individual nodes)
- variable transaction size over time
- transaction broadcast options (synchronous, checked, fire-and-forget, or
  mixed)
- number of transactions to submit
- non-transaction workloads (evidence submission, queries, event
  subscriptions)
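
As a sketch of what a small set of shapes could look like, the following Go
fragment defines a hypothetical `WorkloadShape` type carrying a few of the
knobs listed above. The type, its field names, and the targeting values are
all illustrative and not part of the current framework:

```go
package main

import (
	"fmt"
	"math/rand"
)

// WorkloadShape is a hypothetical description of one workload pattern.
// The fields are illustrative, not part of the existing e2e framework.
type WorkloadShape struct {
	Name          string
	TxBytes       int    // size of each generated transaction
	TxPerSecond   int    // submission rate
	Broadcast     string // "sync", "commit", or "async"
	TargetPattern string // "round-robin", "single-node", or "random"
}

// NextTarget picks the node that receives the i-th transaction under the
// shape's targeting pattern.
func (w WorkloadShape) NextTarget(nodes []string, i int) string {
	switch w.TargetPattern {
	case "single-node":
		return nodes[0]
	case "round-robin":
		return nodes[i%len(nodes)]
	default: // random targeting
		return nodes[rand.Intn(len(nodes))]
	}
}

func main() {
	// A "gentle" shape, roughly like today's single built-in workload.
	gentle := WorkloadShape{Name: "gentle", TxBytes: 128, TxPerSecond: 10,
		Broadcast: "sync", TargetPattern: "round-robin"}
	nodes := []string{"validator01", "validator02", "full01"}
	for i := 0; i < 4; i++ {
		fmt.Printf("tx %d -> %s\n", i, gentle.NextTarget(nodes, i))
	}
}
```

Hand-crafted workloads could then simply be literal values of this type
checked into the test manifests, while generated suites sample over the
fields.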
Configurable Generator
~~~~~~~~~~~~~~~~~~~~~~

The nightly e2e suite is defined by the `testnet generator
<https://github.com/tendermint/tendermint/blob/master/test/e2e/generator/generate.go#L13-L65>`_,
and it's difficult to add dimensions or change the focus of the test suite
without modifying the generator's implementation. If the generator were more
configurable, potentially via a file rather than in the Go implementation,
we could modify the focus of the test suite on the fly.

Features that we might want to configure:

- number of test networks to generate of various topologies, to improve
  coverage of different configurations
- test application configurations (to modify the latency of ABCI calls,
  etc.)
- size of test networks
- workload shape and behavior
- initial sync and catch-up configurations
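
Purely as a sketch, a file-driven generator might load a configuration like
the one below and expand it into a matrix of testnet parameters. The
`GeneratorConfig` type and every field in it are hypothetical, and JSON is
used here only to keep the example dependency-free; the real format could
just as well be TOML:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// GeneratorConfig is a hypothetical file-based configuration for the
// testnet generator; none of these knobs exist today.
type GeneratorConfig struct {
	NumNetworks int      `json:"num_networks"`
	NodesPerNet []int    `json:"nodes_per_net"` // candidate network sizes
	ABCILatency []string `json:"abci_latency"`  // e.g. "0ms", "200ms"
	Workloads   []string `json:"workloads"`     // named workload shapes
}

// Matrix expands the config into one (size, latency, workload) tuple per
// generated testnet, cycling through the candidate values.
func (c GeneratorConfig) Matrix() []string {
	out := make([]string, 0, c.NumNetworks)
	for i := 0; i < c.NumNetworks; i++ {
		out = append(out, fmt.Sprintf("nodes=%d latency=%s workload=%s",
			c.NodesPerNet[i%len(c.NodesPerNet)],
			c.ABCILatency[i%len(c.ABCILatency)],
			c.Workloads[i%len(c.Workloads)]))
	}
	return out
}

func main() {
	raw := []byte(`{"num_networks": 4, "nodes_per_net": [4, 8],
		"abci_latency": ["0ms", "200ms"], "workloads": ["gentle", "bursty"]}`)
	var cfg GeneratorConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		panic(err)
	}
	for _, tc := range cfg.Matrix() {
		fmt.Println(tc)
	}
}
```

The point is only that changing the test matrix becomes an edit to a config
file rather than a change to the generator's Go code.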
The testnet generator currently provides runtime options for limiting it to
specific types of P2P stacks, and for generating multiple groups of test
cases to support parallelism. The goal is to extend this pattern and avoid
hardcoding the matrix of test cases in the generator code. Once testnet
generation is configurable at runtime, developers could use the e2e
framework to validate changes before landing them, rather than discovering
broken e2e tests a day later.

In addition to the autogenerated suite, it might make sense to maintain a
small collection of hand-crafted cases that exercise configurations of
concern, to run as part of the nightly (or less frequent) loop.

Implementation Plan Structure
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

As a development team, we should determine early in the development cycle
which features should impact the e2e tests, and if we intend to modify the
e2e tests to exercise a feature, we should identify this early and begin
the integration process as soon as possible.

To facilitate this, we should adopt a practice of exercising features that
are currently under development more rigorously in the e2e suite, and then
reducing the number or weight of those features in the suite as development
stabilizes.

As of 0.35 there are essentially two kinds of end-to-end tests: the suite
of 64 generated test networks, and the hand-crafted `ci.toml` test case.
The generated test cases provide systematic coverage, while the `ci` run
provides coverage for a large number of features.

Reduce Cycle Time
~~~~~~~~~~~~~~~~~

One of the barriers to leveraging the e2e framework, and one of the
challenges in debugging failures, is that the cycle time for a single test
iteration is quite high: 5 minutes to build the docker image, plus the time
to run the test or tests.

There are a number of improvements and enhancements that could reduce the
cycle time in practice:

- reduce the amount of time required to build the docker image used in
  these tests. Without the dependency on CGo, the tendermint binaries could
  be (cross) compiled outside of the docker container and then injected
  into it, which would take better advantage of docker's native caching.
  Indeed, without the dependency on CGo there would be no hard requirement
  for the e2e tests to use docker at all.

- support test parallelism. Because of the way the testnets are
  orchestrated, a single system can really only run one network at a time.
  For executions (local or remote) with more resources, there's no reason
  not to run a few networks in parallel to reduce the feedback time.

- prune testnet configurations that are unlikely to provide good signal, to
  shorten the time to feedback.

- apply some kind of tiered approach to test execution, to improve the
  legibility of the test results. For example, order tests by the
  dependency of their features, or run test networks without perturbations
  before running the same configuration with perturbations, to be able to
  isolate the impact of specific features.

- orchestrate the test harness directly from `go test` rather than via a
  special harness and shell scripts, so that e2e tests fit more natively
  into developers' existing workflows.
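
To illustrate the parallelism idea, here is a minimal sketch of a harness
loop that runs at most a fixed number of testnets concurrently. The
`runTestnet` function is a stand-in for the real build-and-run step, which
would shell out to docker compose or similar:

```go
package main

import (
	"fmt"
	"sync"
)

// runTestnet stands in for building and running one generated testnet;
// a real harness would invoke the runner for the given manifest.
func runTestnet(manifest string) error {
	fmt.Println("running", manifest)
	return nil
}

// runAll executes testnets with at most `parallel` running at once, a
// sketch of how the harness could use extra local or remote resources.
func runAll(manifests []string, parallel int) []error {
	sem := make(chan struct{}, parallel)
	errs := make([]error, len(manifests))
	var wg sync.WaitGroup
	for i, m := range manifests {
		wg.Add(1)
		go func(i int, m string) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot
			defer func() { <-sem }() // release it
			errs[i] = runTestnet(m)
		}(i, m)
	}
	wg.Wait()
	return errs
}

func main() {
	manifests := []string{"net01.toml", "net02.toml", "net03.toml"}
	for _, err := range runAll(manifests, 2) {
		if err != nil {
			panic(err)
		}
	}
	fmt.Println("all testnets passed")
}
```

If the harness were driven from `go test`, the same effect falls out of
subtests marked parallel, with `-parallel` controlling the bound.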
Many of these improvements, particularly reducing the build time, will also
reduce the time to get feedback during automated builds.

Deeper Insights
~~~~~~~~~~~~~~~

When a test network fails, it's incredibly difficult to understand *why*
the network failed, as the current system provides very little insight into
the system beyond the process logs. When a test network stalls or fails,
developers should be able to quickly and easily get a sense of the state of
the network and all of its nodes.

Improvements in pursuit of this goal include functionality that would help
node operators in production environments, by improving the quality and
utility of log messages and other reported metrics, but also tools to
collect and aggregate this data for developers in the context of test
networks:

- Interleave messages from all nodes in the network, to be able to
  correlate events during the test run.

- Collect structured metrics of system operation (CPU/memory/IO) during the
  test run, as well as from each tendermint/application process.

- Build (simple) tools to render and summarize the data collected during
  the test run, to answer basic questions about the test outcome.
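
A minimal sketch of the log-interleaving idea: merge per-node log entries
into a single timeline ordered by timestamp. The `entry` type and the
millisecond timestamps are assumptions for illustration; a real tool would
parse timestamps out of the actual log format:

```go
package main

import (
	"fmt"
	"sort"
)

// entry is one parsed log line from a node's output.
type entry struct {
	node string
	ts   int64 // unix millis; a real tool would parse the log timestamp
	msg  string
}

// interleave merges per-node logs into one timeline so that events across
// the network can be correlated when diagnosing a failed run.
func interleave(logs map[string][]entry) []string {
	var all []entry
	for _, es := range logs {
		all = append(all, es...)
	}
	sort.Slice(all, func(i, j int) bool { return all[i].ts < all[j].ts })
	out := make([]string, len(all))
	for i, e := range all {
		out[i] = fmt.Sprintf("%d %s: %s", e.ts, e.node, e.msg)
	}
	return out
}

func main() {
	logs := map[string][]entry{
		"validator01": {
			{"validator01", 100, "entering precommit"},
			{"validator01", 300, "committed block 5"},
		},
		"validator02": {
			{"validator02", 200, "timed out waiting for prevotes"},
		},
	}
	for _, line := range interleave(logs) {
		fmt.Println(line)
	}
}
```

The same merged stream is a natural place to splice in collected metrics
samples, so resource spikes line up against consensus events.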
Flexible Assertions
~~~~~~~~~~~~~~~~~~~

Currently, all assertions run for every test network, which keeps the
assertions pretty bland and makes the framework useful primarily as a
smoke-test framework. It would be useful to be able to write and run
different tests for different configurations, which would allow us to test
outside of the happy path.

In general, our existing assertions occupy a small fraction of the total
test time, so the relative cost of adding a few extra assertions would be
limited, and they could help build confidence.
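
One possible shape for configuration-dependent assertions, purely as a
sketch: pair each assertion with a predicate over the network's manifest so
that only applicable assertions run for a given testnet. All type and field
names here are illustrative, not drawn from the existing framework:

```go
package main

import "fmt"

// Network captures the parts of a testnet manifest that an assertion
// might care about (illustrative fields only).
type Network struct {
	Name          string
	Perturbations bool
	HasLightNodes bool
}

// Assertion pairs a named check with a predicate saying which networks it
// applies to, so not every assertion runs on every network.
type Assertion struct {
	Name    string
	Applies func(Network) bool
}

// assertionsFor selects the assertions applicable to one network.
func assertionsFor(net Network, all []Assertion) []string {
	var names []string
	for _, a := range all {
		if a.Applies(net) {
			names = append(names, a.Name)
		}
	}
	return names
}

func main() {
	all := []Assertion{
		{"blocks-produced", func(Network) bool { return true }},
		{"light-client-verifies", func(n Network) bool { return n.HasLightNodes }},
		{"recovers-after-restart", func(n Network) bool { return n.Perturbations }},
	}
	net := Network{Name: "net01", Perturbations: true}
	fmt.Println(assertionsFor(net, all))
}
```

Assertions that deliberately expect failure (e.g. that a non-viable network
halts) fit the same structure, which is what opens the door to testing
outside the happy path.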
Additional Kinds of Testing
~~~~~~~~~~~~~~~~~~~~~~~~~~~

The existing e2e suite exercises networks of nodes that run a homogeneous
tendermint version with a stable configuration, and that are expected to
make progress. There are many other possible test configurations that may
be interesting to explore, along dimensions such as:

- Multi-version testing, to exercise our compatibility guarantees for
  networks that run different tendermint versions.

- As a flavor of multi-version testing, upgrade testing, to build
  confidence in migration code and procedures.

- Additional test applications, particularly practical applications,
  including some that use gaiad and/or the cosmos-sdk, as well as test-only
  applications that simulate other kinds of applications (e.g. variable
  application operation latency).

- Tests of "non-viable" configurations, to ensure that forbidden
  combinations lead to halts.

References
----------

- `ADR 66: End-to-End Testing <../architecture/adr-66-e2e-testing.md>`_