
ADR-74: Migrate Timeout Parameters to Consensus Parameters (#7503)

related to: #7274 and #7275 

Still somewhat uncertain on two things that I'd appreciate more feedback on:
1. The optional temporary local overrides. Perhaps this is superfluous and we can simply make the transition without the override?
2. Whether this set of parameters is large enough to let application developers create the chains they want, but not so large as to be needlessly complex.
pull/7601/head
William Banfield 2 years ago
committed by GitHub
commit 2f75899320
2 changed files with 204 additions and 0 deletions
  1. docs/architecture/README.md (+1, -0)
  2. docs/architecture/adr-074-timeout-params.md (+203, -0)

docs/architecture/README.md (+1, -0)

@@ -101,3 +101,4 @@ Note the context/background should be written in the present tense.
- [ADR-057: RPC](./adr-057-RPC.md)
- [ADR-069: Node Initialization](./adr-069-flexible-node-initialization.md)
- [ADR-071: Proposer-Based Timestamps](adr-071-proposer-based-timestamps.md)
- [ADR-074: Migrate Timeout Parameters to Consensus Parameters](./adr-074-timeout-params.md)

docs/architecture/adr-074-timeout-params.md (+203, -0)

@@ -0,0 +1,203 @@
# ADR 74: Migrate Timeout Parameters to Consensus Parameters
## Changelog
- 03-Jan-2022: Initial draft (@williambanfield)
- 13-Jan-2022: Updated to indicate work on upgrade path needed (@williambanfield)
## Status
Proposed
## Context
### Background
Tendermint's consensus timeout parameters are currently configured locally by each validator
in the validator's [config.toml][config-toml].
This means that the validators on a Tendermint network may have different timeouts
from each other. There is no reason for validators on the same network to configure
different timeout values. Proper functioning of the Tendermint consensus algorithm
relies on these parameters being uniform across validators.
The configurable values are as follows:
* `TimeoutPropose`
    * How long the consensus algorithm waits for a proposal block before issuing a prevote.
    * If no proposal arrives by `TimeoutPropose`, then the consensus algorithm will issue a nil prevote.
* `TimeoutProposeDelta`
    * How much the `TimeoutPropose` grows each round.
* `TimeoutPrevote`
    * How long the consensus algorithm waits after receiving +2/3 prevotes with
      no quorum for a value before issuing a precommit for nil.
      (See the [arXiv paper][arxiv-paper], Algorithm 1, Line 34)
* `TimeoutPrevoteDelta`
    * How much the `TimeoutPrevote` increases with each round.
* `TimeoutPrecommit`
    * How long the consensus algorithm waits after receiving +2/3 precommits that
      do not have a quorum for a value before entering the next round.
      (See the [arXiv paper][arxiv-paper], Algorithm 1, Line 47)
* `TimeoutPrecommitDelta`
    * How much the `TimeoutPrecommit` increases with each round.
* `TimeoutCommit`
    * How long the consensus algorithm waits after committing a block but before starting the new height.
    * This gives a validator a chance to receive slow precommits.
* `SkipTimeoutCommit`
    * Make progress as soon as the node has 100% of the precommits.
### Overview of Change
We will consolidate the timeout parameters and migrate them from the node-local
`config.toml` file into the network-global consensus parameters.
The 8 timeout parameters will be consolidated down to 6. These will be as follows:
* `TimeoutPropose`
    * Same as current `TimeoutPropose`.
* `TimeoutProposeDelta`
    * Same as current `TimeoutProposeDelta`.
* `TimeoutVote`
    * How long validators wait for votes in both the prevote
      and precommit phase of the consensus algorithm. This parameter subsumes
      the current `TimeoutPrevote` and `TimeoutPrecommit` parameters.
* `TimeoutVoteDelta`
    * How much the `TimeoutVote` will grow each successive round.
      This parameter subsumes the current `TimeoutPrevoteDelta` and `TimeoutPrecommitDelta`
      parameters.
* `TimeoutCommit`
    * Same as current `TimeoutCommit`.
* `EnableTimeoutCommitBypass`
    * Same as current `SkipTimeoutCommit`, renamed for clarity.
A safe default will be provided by Tendermint for each of these parameters and
networks will be able to update the parameters as they see fit. Local updates
to these parameters will no longer be possible; instead, the application will control
updating the parameters. Applications using the Cosmos SDK will automatically be
able to change the values of these consensus parameters [via a governance proposal][cosmos-sdk-consensus-params].
This change is low-risk. While parameters are locally configurable, many running chains
do not change them from their default values. For example, initializing
a node on Osmosis, Terra, and the Cosmos Hub using their `init` command produces
a `config.toml` with Tendermint's default values for these parameters.
### Why this parameter consolidation?
Reducing the number of parameters is good for UX. Fewer superfluous parameters make
running and operating a Tendermint network less confusing.
The Prevote and Precommit messages are similar in size and require similar amounts
of processing, so there is no strong need for them to be configured separately.
The `TimeoutPropose` parameter governs how long Tendermint will wait for the proposed
block to be gossiped. Blocks are much larger than votes and therefore tend to be
gossiped much more slowly. It therefore makes sense to keep `TimeoutPropose` and
the `TimeoutProposeDelta` as parameters separate from the vote timeouts.
`TimeoutCommit` is used by chains to ensure that the network waits for the votes from
slower validators before proceeding to the next height. Without this timeout, the votes
from slower validators would consistently not be included in blocks and those validators
would not be counted as 'up' from the chain's perspective. Being down damages a validator's
reputation and causes potential stakers to think twice before delegating to that validator.
`TimeoutCommit` also prevents the network from producing the next height as soon as validators
on the fastest hardware with a summed voting power of +2/3 of the network's total have
completed execution of the block. Allowing the network to proceed as soon as the fastest
+2/3 completed execution would have a cumulative effect over heights, eventually
leaving slower validators unable to participate in consensus at all. `TimeoutCommit`
therefore allows networks to have greater variability in hardware. Additional
discussion of this can be found in [tendermint issue 5911][tendermint-issue-5911-comment]
and [spec issue 359][spec-issue-359].
## Alternative Approaches
### Hardcode the parameters
Many Tendermint networks run on similar cloud-hosted infrastructure. Therefore,
they have similar bandwidth and machine resources. The timings for propagating votes
and blocks are likely to be reasonably similar across networks. As a result, the
timeout parameters are good candidates for being hardcoded. Hardcoding the timeouts
in Tendermint would mean entirely removing these parameters from any configuration
that could be altered by either an application or a node operator. Instead,
Tendermint would ship with a set of timeouts and all applications using Tendermint
would use this exact same set of values.
While Tendermint nodes often run with similar bandwidth and on similar cloud-hosted
machines, there are enough points of variability to make configuring
consensus timeouts meaningful. Namely, Tendermint network topologies are likely to be
very different from chain to chain. Additionally, applications may vary greatly in
how long the `Commit` phase may take. Applications that perform more work during `Commit`
require a longer `TimeoutCommit` to allow the application to complete its work
and be prepared for the next height.
## Decision
The decision has been made to implement this work, with the caveat that work on the
specific mechanism for introducing the new parameters to chains is still ongoing.
## Detailed Design
### New Consensus Parameters
A new `TimeoutParams` `message` will be added to the [params.proto file][consensus-params-proto].
This message will have the following form:
```proto
message TimeoutParams {
  google.protobuf.Duration propose = 1;
  google.protobuf.Duration propose_delta = 2;
  google.protobuf.Duration vote = 3;
  google.protobuf.Duration vote_delta = 4;
  google.protobuf.Duration commit = 5;
  bool enable_commit_timeout_bypass = 6;
}
```
This new message will be added as a field into the [`ConsensusParams`
message][consensus-params-proto]. The same default values that are [currently
set for these parameters][current-timeout-defaults] in the local configuration
file will be used as the defaults for these new consensus parameters in the
[consensus parameter defaults][default-consensus-params].
The new consensus parameters will be subject to the same
[validity rules][time-param-validation] as the current configuration values,
namely, each value must be non-negative.
### Migration
The new consensus parameters will be added during an upcoming release. In this
release, the old `config.toml` parameters will cease to control the timeouts and
an error will be logged on nodes that continue to specify these values. The specific
mechanism by which these parameters will be added to a chain is being discussed in
[RFC-009][rfc-009] and will be decided ahead of the next release.
The specific mechanism for adding these parameters depends on work related to
[soft upgrades][soft-upgrades], which is still ongoing.
## Consequences
### Positive
* Timeout parameters will be equal across all of the validators in a Tendermint network.
* Remove superfluous timeout parameters.
### Negative
### Neutral
* Timeout parameters require consensus to change.
## References
[consensus-params-proto]: https://github.com/tendermint/spec/blob/a00de7199f5558cdd6245bbbcd1d8405ccfb8129/proto/tendermint/types/params.proto#L11
[hashed-params]: https://github.com/tendermint/tendermint/blob/7cdf560173dee6773b80d1c574a06489d4c394fe/types/params.go#L49
[default-consensus-params]: https://github.com/tendermint/tendermint/blob/7cdf560173dee6773b80d1c574a06489d4c394fe/types/params.go#L79
[current-timeout-defaults]: https://github.com/tendermint/tendermint/blob/7cdf560173dee6773b80d1c574a06489d4c394fe/config/config.go#L955
[config-toml]: https://github.com/tendermint/tendermint/blob/5cc980698a3402afce76b26693ab54b8f67f038b/config/toml.go#L425-L440
[cosmos-sdk-consensus-params]: https://github.com/cosmos/cosmos-sdk/issues/6197
[time-param-validation]: https://github.com/tendermint/tendermint/blob/7cdf560173dee6773b80d1c574a06489d4c394fe/config/config.go#L1038
[tendermint-issue-5911-comment]: https://github.com/tendermint/tendermint/issues/5911#issuecomment-973560381
[spec-issue-359]: https://github.com/tendermint/spec/issues/359
[arxiv-paper]: https://arxiv.org/pdf/1807.04938.pdf
[soft-upgrades]: https://github.com/tendermint/spec/pull/222
[rfc-009]: https://github.com/tendermint/tendermint/pull/7524
