I see this panic in tests occasionally, and I don't think there's any
need to close this channel:
- it's only sent to in one place which has a select case with a
default clause, so there's no chance of deadlocks.
- the only place we recieve from it thas a timeout.
This contains two major changes:
- Remove the legacy test logging method, and just explicitly call the
noop logger. This is just to make the test logging behavior more
coherent and clear.
- Move the logging in the light package from the testing.T logger to
the noop logger. It's really the case that we very rarely need/want
to consider test logs unless we're doing reproductions and running a
narrow set of tests.
In most cases, I (for one) prefer to run in verbose mode so I can
watch progress of tests, but I basically never need to consider
logs. If I do want to see logs, then I can edit in the testing.T
logger locally (which is what you have to do today, anyway.)
closes: #7756
# What does this pull request change?
This pull request adds a new runbook for operators enountering errors related to the new Proposer-Based Timestamps algorithm. The goal of this runbook is to give operators a set of clear steps that they can follow if they are having issues producing blocks because of clock synchronization problems.
This pull request also renames the `*PrevoteDelay` metrics to drop the term `MessageDelay`. These metrics provide a combined view of `message_delay` + `synchrony` so the name may be confusing.
# Questions to reviewers
* Are there ways to make the set of steps clearer or are there any pieces that seem confusing?
The events switch code is largely vestigal and is responsible for
wiring between the consensus state machine and the consensus
reactor. While there might have been a need, historicallly to managed
these subscriptions at runtime, it's nolonger used: subscriptions are
registered during startup, and then the switch shuts down at at the
end.
Eventually the EventSwitch should be replaced by a much smaller
implementation of an eventloop in the consensus state machine, but
cutting down on the scope of the event switch will help clarify the
requirements from the consensus side.
This change implements the logic for the PrepareProposal ABCI++ method call. The main logic for creating and issuing the PrepareProposal request lives in execution.go and is tested in a set of new tests in execution_test.go. This change also updates the mempool mock to use a mockery generated version and removes much of the plumbing for the no longer used ABCIResponses.
Updates #8077. The panic handler for consensus currently attempts to effect a
clean shutdown, but this can leave a failed node running in an unknown state
for an arbitrary amount of time after the failure.
Since a panic at this point means consensus is already irrecoverably broken, we
should not allow the node to continue executing. After making a best effort to
shut down the writeahead log, re-panic to ensure the node will terminate before
any further state transitions are processed.
Even with this change, it is possible some transitions may occur while the
cleanup is happening. It might be preferable to abort unconditionally without
any attempt at cleanup.
Related changes:
- Clean up the creation of WAL directories.
- Filter WAL close errors at rethrow.
This change set implements the most recent version of `FinalizeBlock`.
# What does this change actually contain?
* This change set is rather large but fear not! The majority of the files touched and changes are renaming `ResponseDeliverTx` to `ExecTxResult`. This should be a pretty inoffensive change since they're effectively the same type but with a different name.
* The `execBlockOnProxyApp` was totally removed since it served as just a wrapper around the logic that is now mostly encapsulated within `FinalizeBlock`
* The `updateState` helper function has been made a public method on `State`. It was being exposed as a shim through the testing infrastructure, so this seemed innocuous.
* Tests already existed to ensure that the application received the `ByzantineValidators` and the `ValidatorUpdates`, but one was fixed up to ensure that `LastCommitInfo` was being sent across.
* Tests were removed from the `psql` indexer that seemed to search for an event in the indexer that was not being created.
# Questions for reviewers
* We store this [ABCIResponses](5721a13ab1/proto/tendermint/state/types.pb.go (L37)) type in the data base as the block results. This type has changed since v0.35 to contain the `FinalizeBlock` response. I'm wondering if we need to do any shimming to keep the old data retrieveable?
* Similarly, this change is exposed via the RPC through [ResultBlockResults](5721a13ab1/rpc/coretypes/responses.go (L69)) changing. Should we somehow shim or notify for this change?
closes: #7658
This is mostly an extremely small change where I double a somewhat
arbitrarly set timeout from 1m to 2m for an entire test. When I put
these timeouts in the test, they were arbitrary based on my local
performance (which is quite fact,) and I expected that they'd need to
be tweaked in the future.
A big chunk of this PR is reworking a collection of helper functions
that produce somewhat intractable messages when a test fails, so that
the error messages take up less vertical space, hopefully without
losing any debugability.
This change implements the spec for `ProcessProposal`. It first calls the Tendermint block validation logic to check that all of the proposed block fields are well formed and do not violate any of the rules for Tendermint to consider the block valid and then passes the validated block the `ProcessProposal`.
This change also adds additional fixtures to test the change. It adds the `baseMock` types that holds a mock as well as a reference to `BaseApplication`. If the function was not setup by the test on the contained mock Application, the type delegates to the `BaseApplication` and returns what `BaseApplication` returns.
The change also switches the `makeState` helper to take an arg struct so that an ABCI application can be plumbed through when needed.
closes: #7656
This is the first step in removing the mutex from ABCI applications:
making our test applications hold mutexes, which this does, hopefully
with zero impact. If this lands well, then we can explore deleting the
other mutexes (in the ABCI server and the clients.) While this change
is not user impacting at all, removing the other mutexes *will* be.
In persuit of this, I've changed the KV app somewhat, to put almost
all of the logic in the base application and make the persistent
application mostly be a wrapper on top of that with a different
storage layer.
* event: Added Events after evidence validation; evidence: refactored AddEvidence
Added context and Metrics as parameter for the pool constructor
* evidence: pushed event firing into evidence pool and added metrics to represent the size of the evpool
* state: fixed parameters of evpool mock functions
* evidence: added test to confirm events are generated
* Removed obsolete EvidenceEventPublisher interface
* evidence: pool removed error on missing eventbus
Went through #2871, there are several issues, this PR tries to tackle the `HasVoteMessage` with an invalid validator index sent by a bad peer and it prevents the bad vote goes to the peerMsgQueue.
Future work, check other bad message cases and plumbing the reactor errors with the peer manager and then can disconnect the peer sending the bad messages.
* Rebased and git-squashed the commits in PR #6546
migrate abci to finalizeBlock
work on abci, proxy and mempool
abciresponse, blok events, indexer, some tests
fix some tests
fix errors
fix errors in abci
fix tests amd errors
* Fixes after rebasing PR#6546
* Restored height to RequestFinalizeBlock & other
* Fixed more UTs
* Fixed kvstore
* More UT fixes
* last TC fixed
* make format
* Update internal/consensus/mempool_test.go
Co-authored-by: William Banfield <4561443+williambanfield@users.noreply.github.com>
* Addressed @williambanfield's comments
* Fixed UTs
* Addressed last comments from @williambanfield
* make format
Co-authored-by: marbar3778 <marbar3778@yahoo.com>
Co-authored-by: William Banfield <4561443+williambanfield@users.noreply.github.com>
Our test cases spew a lot of files and directories around $TMPDIR. Make more
thorough use of the testing package's TempDir methods to ensure these are
cleaned up.
In a few cases, this required plumbing test contexts through existing helper
code. In a couple places an explicit path was required, to work around cases
where we do global setup during a TestMain function. Those cases probably
deserve more thorough cleansing (preferably with fire), but for now I have just
worked around it to keep focused on the cleanup.
This change adds logic to double the message delay bound after every 10 rounds. Alternatives to this somewhat magic number were discussed. Specifically, whether or not to make '10' modifiable as a parameter was discussed. Since this behavior only exists to ensure liveness in the case that these values were poorly chosen to begin with, a method to configure this value was not created. Chains that notice many 'untimely' rounds per the [relevant metric](https://github.com/tendermint/tendermint/pull/7709) are expected to take action to increase the configured message delay to more accurately match the conditions of the network.
closes: https://github.com/tendermint/spec/issues/371