Updates #8077. The panic handler for consensus currently attempts to effect a
clean shutdown, but this can leave a failed node running in an unknown state
for an arbitrary amount of time after the failure.
Since a panic at this point means consensus is already irrecoverably broken, we
should not allow the node to continue executing. After making a best effort to
shut down the writeahead log, re-panic to ensure the node will terminate before
any further state transitions are processed.
Even with this change, it is possible some transitions may occur while the
cleanup is happening. It might be preferable to abort unconditionally without
any attempt at cleanup.
Related changes:
- Clean up the creation of WAL directories.
- Filter WAL close errors at rethrow.
Our test cases spew a lot of files and directories around $TMPDIR. Make more
thorough use of the testing package's TempDir methods to ensure these are
cleaned up.
In a few cases, this required plumbing test contexts through existing helper
code. In a couple places an explicit path was required, to work around cases
where we do global setup during a TestMain function. Those cases probably
deserve more thorough cleansing (preferably with fire), but for now I have just
worked around it to keep focused on the cleanup.
During file rotation and WAL shutdown, there was a race condition between users
of an autofile and its termination. To fix this, ensure operations on an
autofile are properly synchronized, and report errors when attempting to use an
autofile after it was closed.
Notably:
- Simplify the cancellation protocol between signal and Close.
- Exclude writers to an autofile during rotation.
- Add documentation about what is going on.
There is a lot more that could be improved here, but this addresses the more
obvious races that have been panicking unit tests.
## Description
Internalize some libs. This reduces the amount ot public API tendermint is supporting. The moved libraries are mainly ones that are used within Tendermint-core.