I see this panic in tests occasionally, and I don't think there's any
need to close this channel:
- it's only sent to in one place which has a select case with a
default clause, so there's no chance of deadlocks.
- the only place we recieve from it thas a timeout.
closes: #7756
# What does this pull request change?
This pull request adds a new runbook for operators enountering errors related to the new Proposer-Based Timestamps algorithm. The goal of this runbook is to give operators a set of clear steps that they can follow if they are having issues producing blocks because of clock synchronization problems.
This pull request also renames the `*PrevoteDelay` metrics to drop the term `MessageDelay`. These metrics provide a combined view of `message_delay` + `synchrony` so the name may be confusing.
# Questions to reviewers
* Are there ways to make the set of steps clearer or are there any pieces that seem confusing?
This change implements the logic for the PrepareProposal ABCI++ method call. The main logic for creating and issuing the PrepareProposal request lives in execution.go and is tested in a set of new tests in execution_test.go. This change also updates the mempool mock to use a mockery generated version and removes much of the plumbing for the no longer used ABCIResponses.
Updates #8077. The panic handler for consensus currently attempts to effect a
clean shutdown, but this can leave a failed node running in an unknown state
for an arbitrary amount of time after the failure.
Since a panic at this point means consensus is already irrecoverably broken, we
should not allow the node to continue executing. After making a best effort to
shut down the writeahead log, re-panic to ensure the node will terminate before
any further state transitions are processed.
Even with this change, it is possible some transitions may occur while the
cleanup is happening. It might be preferable to abort unconditionally without
any attempt at cleanup.
Related changes:
- Clean up the creation of WAL directories.
- Filter WAL close errors at rethrow.
This change implements the spec for `ProcessProposal`. It first calls the Tendermint block validation logic to check that all of the proposed block fields are well formed and do not violate any of the rules for Tendermint to consider the block valid and then passes the validated block the `ProcessProposal`.
This change also adds additional fixtures to test the change. It adds the `baseMock` types that holds a mock as well as a reference to `BaseApplication`. If the function was not setup by the test on the contained mock Application, the type delegates to the `BaseApplication` and returns what `BaseApplication` returns.
The change also switches the `makeState` helper to take an arg struct so that an ABCI application can be plumbed through when needed.
closes: #7656
* event: Added Events after evidence validation; evidence: refactored AddEvidence
Added context and Metrics as parameter for the pool constructor
* evidence: pushed event firing into evidence pool and added metrics to represent the size of the evpool
* state: fixed parameters of evpool mock functions
* evidence: added test to confirm events are generated
* Removed obsolete EvidenceEventPublisher interface
* evidence: pool removed error on missing eventbus
This change adds logic to double the message delay bound after every 10 rounds. Alternatives to this somewhat magic number were discussed. Specifically, whether or not to make '10' modifiable as a parameter was discussed. Since this behavior only exists to ensure liveness in the case that these values were poorly chosen to begin with, a method to configure this value was not created. Chains that notice many 'untimely' rounds per the [relevant metric](https://github.com/tendermint/tendermint/pull/7709) are expected to take action to increase the configured message delay to more accurately match the conditions of the network.
closes: https://github.com/tendermint/spec/issues/371
This pull request merges in the changes for implementing Proposer-based timestamps into `master`. The power was primarily being done in the `wb/proposer-based-timestamps` branch, with changes being merged into that branch during development. This pull request represents an amalgamation of the changes made into that development branch. All of the changes that were placed into that branch have been cleanly rebased on top of the latest `master`. The changes compile and the tests pass insofar as our tests in general pass.
### Note To Reviewers
These changes have been extensively reviewed during development. There is not much new here. In the interest of making effective use of time, I would recommend against trying to perform a complete audit of the changes presented and instead examine for mistakes that may have occurred during the process of rebasing the changes. I gave the complete change set a first pass for any issues, but additional eyes would be very appreciated.
In sum, this change set does the following:
closes#6942
merges in #6849
There are no further uses of this package anywhere in Tendermint.
All the uses in the Cosmos SDK are for types that now work correctly with the
standard encoding/json package.
## What does this pull request do?
This pull requests adds two metrics intended for use in calculating an experimental value for `MessageDelay`.
The metrics are as follows:
```
# HELP tendermint_consensus_complete_prevote_message_delay Difference in seconds between the proposal timestamp and the timestamp of the prevote that achieved 100% of the voting power in the prevote step.
# TYPE tendermint_consensus_complete_prevote_message_delay gauge
tendermint_consensus_complete_prevote_message_delay{chain_id="test-chain-aZbwF1"} 0.013025505
# HELP tendermint_consensus_quorum_prevote_message_delay Difference in seconds between the proposal timestamp and the timestamp of the prevote that achieved a quorum in the prevote step.
# TYPE tendermint_consensus_quorum_prevote_message_delay gauge
tendermint_consensus_quorum_prevote_message_delay{chain_id="test-chain-aZbwF1"} 0.013025505
```
## Why this change?
For more information on what these metrics are calculating, see #7202. The aim is to merge to backport these metrics to v0.34 and run nodes on a few popular chains with these metrics to determine the experimental values for `MessageDelay` on these popular chains and use these to select our default `SynchronyParams.MessageDelay` value.
## Why Gauges for the metrics?
Gauges allow us to overwrite the metric on each successive observation. We can then capture these metrics over time to track the highest and lowest observed value.
Where possible, replace uses of the custom JSON library with the standard
library. The custom library treats interface and unnamed lteral types
differently, so this change avoids those even where it would probably be safe
to switch them.
This continues the push of plumbing contexts through tendermint. I
attempted to find all goroutines in the production code (non-test) and
made sure that these threads would exit when their contexts were
canceled, and I believe this PR does that.
This is a very small change, but removes a method from the
`service.Service` interface (a win!) and forces callers to explicitly
pass loggers in to objects during construction rather than (later)
injecting them. There's not a real need for this kind of lazy
construction of loggers, and I think a decent potential for confusion
for mutable loggers.
The main concern I have is that this changes the constructor API for
ABCI clients. I think this is fine, and I suspect that as we plumb
contexts through, and make changes to the RPC services there'll be a
number of similar sorts of changes to various (quasi) public
interfaces, which I think we should welcome.