diff --git a/docs/rfc/README.md b/docs/rfc/README.md
index 727af2718..02bc3d9ca 100644
--- a/docs/rfc/README.md
+++ b/docs/rfc/README.md
@@ -40,5 +40,6 @@ sections.
 - [RFC-000: P2P Roadmap](./rfc-000-p2p-roadmap.rst)
 - [RFC-001: Storage Engines](./rfc-001-storage-engine.rst)
 - [RFC-002: Interprocess Communication](./rfc-002-ipc-ecosystem.md)
+- [RFC-004: E2E Test Framework Enhancements](./rfc-004-e2e-framework.rst)
diff --git a/docs/rfc/rfc-004-e2e-framework.rst b/docs/rfc/rfc-004-e2e-framework.rst
new file mode 100644
index 000000000..8508ca173
--- /dev/null
+++ b/docs/rfc/rfc-004-e2e-framework.rst
@@ -0,0 +1,213 @@
+========================================
+RFC 004: E2E Test Framework Enhancements
+========================================
+
+Changelog
+---------
+
+- 2021-09-14: started initial draft (@tychoish)
+
+Abstract
+--------
+
+This document discusses a series of improvements to the e2e test framework
+that we can consider during the next few releases to help boost confidence in
+Tendermint releases and improve developer efficiency.
+
+Background
+----------
+
+During the 0.35 release cycle, the E2E tests were a source of great
+value, helping to identify a number of bugs before release. At the same time,
+the tests were not consistently passing, which reduced their value and forced
+the core development team to spend time and energy maintaining the e2e tests
+and the test harness and chasing down issues with them. The experience of this
+release cycle suggests a series of improvements to the test framework; this
+document attempts to capture them, along with their motivations and potential
+impact.
+
+Projects
+--------
+
+Flexible Workload Generation
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Presently the e2e suite contains a single workload generation pattern, which
+exists simply to ensure that the test networks have some work during their
+runs.
+However, the shape and volume of the work are very consistent and
+deliberately gentle, to help ensure test reliability.
+
+We don't need a complex workload generation framework, but being able to have
+a few different workload shapes available for test networks, both generated and
+hand-crafted, would be useful.
+
+Workload patterns/configurations might include:
+
+- transaction targeting patterns (include light nodes, round robin, target
+  individual nodes).
+
+- variable transaction size over time.
+
+- transaction broadcast option (synchronously, checked, fire-and-forget,
+  mixed).
+
+- number of transactions to submit.
+
+- non-transaction workloads (evidence submission, queries, event
+  subscriptions).
+
+Configurable Generator
+~~~~~~~~~~~~~~~~~~~~~~
+
+The nightly e2e suite is defined by the `testnet generator
+`_,
+and it's difficult to add dimensions or change the focus of the test suite in
+any way without modifying the implementation of the generator. If the
+generator were more configurable, potentially via a file rather than in
+the Go implementation, we could modify the focus of the test suite on the
+fly.
+
+Features that we might want to configure:
+
+- number of test networks to generate of various topologies, to improve
+  coverage of different configurations.
+
+- test application configurations (to modify the latency of ABCI calls, etc.)
+
+- size of test networks.
+
+- workload shape and behavior.
+
+- initial sync and catch-up configurations.
+
+The workload generator currently provides runtime options for limiting the
+generator to specific types of P2P stacks, and for generating multiple groups
+of test cases to support parallelism. The goal is to extend this pattern and
+avoid hardcoding the matrix of test cases in the generator code. Once the
+testnet configuration generation behavior is configurable at runtime,
+developers may be able to use the e2e framework to validate changes before
+landing them, rather than discovering a day later that they break the e2e
+tests.
+
+In addition to the autogenerated suite, it might make sense to maintain a
+small collection of hand-crafted cases that exercise configurations of
+concern, to run as part of the nightly (or less frequent) loop.
+
+Implementation Plan Structure
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+As a development team, we should determine early in the development cycle
+which features should impact the e2e testing. If we intend to modify the e2e
+tests to exercise a feature, we should identify this and begin the
+integration process as early as possible.
+
+To facilitate this, we should adopt a practice whereby we exercise specific
+features that are currently under development more rigorously in the e2e
+suite, and then as development stabilizes we can reduce the number or weight
+of these features in the suite.
+
+As of 0.35 there are essentially two end-to-end tests: the suite of 64
+generated test networks, and the hand-crafted `ci.toml` test case. The
+generated test cases help provide systematic coverage, while the `ci` run
+provides coverage for a large number of features.
+
+Reduce Cycle Time
+~~~~~~~~~~~~~~~~~
+
+One of the barriers to leveraging the e2e framework, and one of the challenges
+in debugging failures, is that the cycle time of running a single test
+iteration is quite high: 5 minutes to build the docker image, plus the time to
+run the test or tests.
+
+There are a number of improvements and enhancements that can reduce the cycle
+time in practice:
+
+- reduce the amount of time required to build the docker image used in these
+  tests. Without the dependency on CGo, the tendermint binaries could be
+  (cross) compiled outside of the docker container and then injected into
+  it, which would take better advantage of docker's native caching.
+  That said, without the dependency on CGo there would be no hard requirement
+  for the e2e tests to use docker at all.
+
+- support test parallelism.
+  Because of the way the testnets are orchestrated,
+  a single system can really only run one network at a time. For executions
+  (local or remote) with more resources, there's no reason not to run a few
+  networks in parallel to reduce the feedback time.
+
+- prune testnet configurations that are unlikely to provide good signal, to
+  shorten the time to feedback.
+
+- apply some kind of tiered approach to test execution, to improve the
+  legibility of the test result. For example, order tests by the dependency of
+  their features, or run test networks without perturbations before running
+  that configuration with perturbations, to be able to isolate the impact of
+  specific features.
+
+- orchestrate the test harness directly from go test rather than via a special
+  harness and shell scripts, so e2e tests may more natively fit into
+  developers' existing workflows.
+
+Many of these improvements, particularly reducing the build time, will also
+reduce the time to get feedback during automated builds.
+
+Deeper Insights
+~~~~~~~~~~~~~~~
+
+When a test network fails, it's incredibly difficult to understand *why* the
+network failed, as the current system provides very little insight into the
+system outside of the process logs. When a test network stalls or fails,
+developers should be able to quickly and easily get a sense of the state of
+the network and all nodes.
+
+Improvements in pursuit of this goal include functionality that would help
+node operators in production environments, by improving the quality and
+utility of the logging messages and other reported metrics, as well as
+tools to collect and aggregate this data for developers in the context of test
+networks.
+
+- Interleave messages from all nodes in the network to be able to correlate
+  events during the test run.
+
+- Collect structured metrics of the system operation (CPU/MEM/IO) during the
+  test run, as well as from each tendermint/application process.
+
+- Build (simple) tools to be able to render and summarize the data collected
+  during the test run to answer basic questions about test outcome.
+
+Flexible Assertions
+~~~~~~~~~~~~~~~~~~~
+
+Currently, all assertions run for every test network, which makes the
+assertions pretty bland and the framework primarily useful as a smoke-test
+framework. It might be useful to be able to write and run different
+tests for different configurations. This could allow us to test outside of the
+happy-path.
+
+In general our existing assertions occupy a fraction of the total test time,
+so adding a few extra test assertions would cost relatively little and could
+help build confidence.
+
+Additional Kinds of Testing
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The existing e2e suite exercises networks of nodes that have a homogeneous
+tendermint version and stable configuration, and that are expected to make
+progress. There are many other possible test configurations that may be
+interesting to engage with. These could include dimensions such as:
+
+- Multi-version testing to exercise our compatibility guarantees for networks
+  that might have different tendermint versions.
+
+- As a flavor of multi-version testing, include upgrade testing, to build
+  confidence in migration code and procedures.
+
+- Additional test applications, particularly practical-type applications
+  including some that use gaiad and/or the cosmos-sdk, as well as test-only
+  applications that simulate other kinds of applications (e.g. variable
+  application operation latency).
+
+- Tests of "non-viable" configurations that ensure that forbidden combinations
+  lead to halts.
+
+References
+----------
+
+- `ADR 66: End-to-End Testing <../architecture/adr-66-e2e-testing.md>`_