In the last run, there were two problems at the RPC layer returned
from light nodes' RPC end points. I think exercising the light client
proxy RPC system is something that can/should be done via unit
testing, and that likely these errors are (in production) transient
and (in CI) very likely to fail for test environment issues.
If the e2e tests error, they leave all of the e2e state around including containers and networks etc.
We should clean this up when the tests shuts down, even if it exits in error.
This should address last night's failure. We've taken the perspective
of "the load generator shouldn't cause tests to fail" in recent
days/weeks, and I think this is just a next step along that line. The
e2e tests shouldn't test performance.
I included some comments indicating the ways that this isn't ideal (it
is perhaps not), and I think that if test networks could make
assertions about the required rate, that might be a cool future
improvement (and good, perhaps, for system benchmarking.)
These are mostly the timeouts that I think we're still hitting in CI.
At this point, the tests (on master) pass on my local machine (which is quite beefy) so I think this is just the first in (perhaps?) a sequence of changes that attempt to change timeouts and load patterns so that the tests pass in CI more reliably.
The 0.35 release cycle renamed the 'fastsync' functionality to 'blocksync'. This change brings the configuration parameters in line with that change. Namely, it updates the configuration file `[fastsync]` field to be `[blocksync]` and changes the command line flag and config file parameters `--fast-sync` and `fast-sync` to `--enable-block-sync` and `enable-block-sync` respectively.
Error messages were added to help users encountering these changes be able to quickly make the needed update to their files/scripts.
When using the old command line argument for fast-sync, the following is printed
```
./build/tendermint start --proxy-app=kvstore --consensus.create-empty-blocks=false --fast-sync=false
ERROR: invalid argument "false" for "--fast-sync" flag: --fast-sync has been deprecated, please use --enable-block-sync
```
When using one of the old config file parameters, the following is printed:
```
./build/tendermint start --proxy-app=kvstore --consensus.create-empty-blocks=false
ERROR: error in config file: a configuration parameter named 'fast-sync' was found in the configuration file. The 'fast-sync' parameter has been renamed to 'enable-block-sync', please update the 'fast-sync' field in your configuration file to 'enable-block-sync'
```
This is just a configuration change to default to using the new stack
unless explicitly disabled (e.g. `UseLegacy`) this renames the
configuration value and makes the configuration logic more clear.
The legacy option is good to retain as a fallback if the new stack has
issues operationally, but we should make sure that most of the time
we're using the new stack.
In the transaction load generator, the e2e test harness previously distributed load randomly to hosts, which was a source of test non-determinism. This change distributes the load generation to the different nodes in the set in a round robin fashion, to produce more reliable results, but does not otherwise change the behavior of the test harness.
I realized after my last commit that my change made a following line of code a bit redundant.
(alternatively my last change was redunadnt to the existing code.)
I took this oppertunity to make some minor cleanups and logging changes to the node changes which I hope will make tests a bit more clear.
This tweaks sleeps around pertubations, based on a theory that our
tests with "kill" pertubations restart the nodes fast enough the peers
haven't marked it down when it tries to reconnect. In my local test
runs, this clears out *most* of the test failures that I've seen,
except for one evidence-related test-harness problem (which should be
handled separately.)
## Description
Expose p2p functions for use in the sdk.
These functions could also be copied over to the sdk. I dont have a preference of which is better.
This PR make some tweaks to backfill after running e2e tests:
- Separates sync and backfill as two distinct processes that the node calls. The reason is because if sync fails then the node should fail but if backfill fails it is still possible to proceed.
- Removes peers who don't have the block at a height from the local peer list. As the process goes backwards if a node doesn't have a block at a height they're likely pruning blocks and thus they won't have any prior ones either.
- Sleep when we've run out of peers, then try again.