Having looked at our network address parsing and connection code, it
really looks like we're not doing anything on top of what the standard
library is doing (both in terms using `net.ParseIP` and also
`net.Dial`,) and I don't think we need to run the tests 2x the number
of times just to run through different areas of the standard
library. I think most of our users are going to be using IPv4, and
would be down to fully remove this dimension as well, if we find it's
making noise, but for now I think it's fine.
This tweaks sleeps around pertubations, based on a theory that our
tests with "kill" pertubations restart the nodes fast enough the peers
haven't marked it down when it tries to reconnect. In my local test
runs, this clears out *most* of the test failures that I've seen,
except for one evidence-related test-harness problem (which should be
handled separately.)
## Description
Expose p2p functions for use in the sdk.
These functions could also be copied over to the sdk. I dont have a preference of which is better.
This PR make some tweaks to backfill after running e2e tests:
- Separates sync and backfill as two distinct processes that the node calls. The reason is because if sync fails then the node should fail but if backfill fails it is still possible to proceed.
- Removes peers who don't have the block at a height from the local peer list. As the process goes backwards if a node doesn't have a block at a height they're likely pruning blocks and thus they won't have any prior ones either.
- Sleep when we've run out of peers, then try again.
I believe that this, in my testing seems to help the e2e state-sync
tests complete more reliably, by fixing some potential, range-related
slice building, as well as the way the test app hashes snapshots.
Additionally, and I'm not sure if we want to do this, but I added this
hook to the reactor that re-sends the request for snapshots during the
retry. This helps in tests prevent systems from getting stuck, but I
think in reality, it might create more traffic, and operators would
just restart a state-syncing node to get a similar effect.