This should address last night's failure. We've taken the perspective
of "the load generator shouldn't cause tests to fail" in recent
days/weeks, and I think this is just a next step along that line. The
e2e tests shouldn't test performance.
I included some comments indicating the ways that this isn't ideal (it
is perhaps not), and I think that if test networks could make
assertions about the required rate, that might be a cool future
improvement (and good, perhaps, for system benchmarking.)
These are mostly the timeouts that I think we're still hitting in CI.
At this point, the tests (on master) pass on my local machine (which is quite beefy) so I think this is just the first in (perhaps?) a sequence of changes that attempt to change timeouts and load patterns so that the tests pass in CI more reliably.
In the transaction load generator, the e2e test harness previously distributed load randomly to hosts, which was a source of test non-determinism. This change distributes the load generation to the different nodes in the set in a round robin fashion, to produce more reliable results, but does not otherwise change the behavior of the test harness.