It seems weird in retrospect that we allow networks to contain
applications that use different ABCI protocols.
(cherry picked from commit f2a8f5e054)
Co-authored-by: Sam Kleinman <garen@tychoish.com>
This PR tackles the case of using the e2e application in a long lived testnet. The application continually saves snapshots (usually every 100 blocks) which after a while bloats the size of the application. This PR prunes older snapshots so that only the most recent 10 snapshots remain.
(cherry picked from commit 5703ae2fb3)
Co-authored-by: Callum Waters <cmwaters19@gmail.com>
I observed a couple of problems with the generator in some recent tests:
- there were a couple of hybrid test cases which did not have any
legacy nodes (randomness and all.) I change the probability to
produce more reliable results.
- added options to the generation to be able to add a max (to
compliment the earlier min) number of nodes for local testing.
- added an option to support reversing the sort order so "more
complex" networks were first, as well as tweaked some of the point
values.
- this refactored the generators cli parsing to be a bit more clear.
In the last run, there were two problems at the RPC layer returned
from light nodes' RPC end points. I think exercising the light client
proxy RPC system is something that can/should be done via unit
testing, and that likely these errors are (in production) transient
and (in CI) very likely to fail for test environment issues.
If the e2e tests error, they leave all of the e2e state around including containers and networks etc.
We should clean this up when the tests shuts down, even if it exits in error.
This should address last night's failure. We've taken the perspective
of "the load generator shouldn't cause tests to fail" in recent
days/weeks, and I think this is just a next step along that line. The
e2e tests shouldn't test performance.
I included some comments indicating the ways that this isn't ideal (it
is perhaps not), and I think that if test networks could make
assertions about the required rate, that might be a cool future
improvement (and good, perhaps, for system benchmarking.)
I think the `Sync` check covers our primary use case, and perhaps we
can turn this back on in the future after some kind of event-system
rewrite, or RPC rewrite that will avoid the serverside timeout.
These are mostly the timeouts that I think we're still hitting in CI.
At this point, the tests (on master) pass on my local machine (which is quite beefy) so I think this is just the first in (perhaps?) a sequence of changes that attempt to change timeouts and load patterns so that the tests pass in CI more reliably.
The previous implemention of hybrid set testing, which was entirely my
own creation, was a bit peculiar, and I think this probably clears thins up.
The previous implementation had far fewer legacy nodes in hybrid
networks, *and* also for some reason that I can't quite explain,
caused a test case to fail.