*light: rpc /status returns status of light client ; code refactoring
light: moved lightClientInfo into light.go, renamed String to ID
test/e2e: Return light client trusted height instead of SyncInfo trusted height
test/e2e/start.go: Not waiting for light client to catch up in tests. Removed querying of syncInfo in start if the node is a light node
* light: Removed call to primary /status. Added trustedPeriod to light info
* light/provider: added ID function to return IP of primary and witnesses
* light/provider/http/http_test: renamed String() to ID()
This change has two main effects:
1. Remove most of the Async methods from the abci.Client interface.
Remaining are FlushAsync, CommitTxAsync, and DeliverTxAsync.
2. Rename the synchronous methods to remove the "Sync" suffix.
The rest of the change is updating the implementations, subsets, and mocks of
the interface, along with the call sites that point to them.
* Fix stringly-typed mock stubs.
* Rename helper method.
After #7592, @cmwaters noticed that the logic for re-using old timestamps for proposals may not work with proposer-based timestamps. This change removes the logic to re-use old proposal timestamps since it is no longer correct. Two proposals with different timestamps can no longer be treated as equivalent. Signing a proposal that only differs by timestamp in the new algorithm can be thought of as roughly equivalent to signing a proposal that only differs by `BlockID` in the old scheme.
I also investigated the codebase and checked for any place we updated a timestamp using the pattern `(Timestamp = |Timestamp: )` and saw no additional places where we are updating the timestamp of a proposal message.
Here is the output of that search:
```
privval/file.go:372: vote.Timestamp = timestamp
privval/file.go:453: lastVote.Timestamp = now
privval/file.go:454: newVote.Timestamp = now
internal/test/factory/commit.go:25: Timestamp: now,
internal/test/factory/vote.go:34: Timestamp: time,
internal/consensus/state.go:2261: Timestamp: cs.voteTime(),
internal/consensus/state.go:2286: vote.Timestamp = v.Timestamp
light/detector.go:414: ev.Timestamp = common.Time
light/detector.go:418: ev.Timestamp = trusted.Time
types/block.go:616: Timestamp: ts,
types/block.go:725: Timestamp: cs.Timestamp,
types/block.go:736: cs.Timestamp = csp.Timestamp
types/block.go:800: Timestamp: commitSig.Timestamp,
types/evidence.go:84: Timestamp: blockTime,
types/evidence.go:190: dve.Timestamp = evidenceTime
types/evidence.go:202: Timestamp: dve.Timestamp,
types/evidence.go:228: Timestamp: pb.Timestamp,
types/evidence.go:382: Timestamp: %v}#%X`,
types/evidence.go:491: l.Timestamp = evidenceTime
types/evidence.go:517: Timestamp: l.Timestamp,
types/evidence.go:546: Timestamp: lpb.Timestamp,
types/evidence.go:722: Timestamp: time,
types/vote.go:80: Timestamp: vote.Timestamp,
types/vote.go:216: Timestamp: vote.Timestamp,
types/vote.go:240: vote.Timestamp = pv.Timestamp
types/test_util.go:27: Timestamp: now,
types/proposal.go:44: Timestamp: tmtime.Now(),
types/proposal.go:132: pb.Timestamp = p.Timestamp
types/proposal.go:157: p.Timestamp = pp.Timestamp
types/canonical.go:49: Timestamp: proposal.Timestamp,
types/canonical.go:62: Timestamp: vote.Timestamp,
test/e2e/runner/evidence.go:186: Timestamp: evTime,
```
This averts a log-after-close issue. We should probably also chase the shutdown
issues, but since ABCI clients should generally only shut down once per process
I don't think this is a real priority, and the trace is hairy.
The test filter was looking for "TestGoFiles", which does not include tests in
a separate package (e.g., "package foo_test" for "package foo").
This caused several packages not to be tested in CI, including:
github.com/tendermint/tendermint/abci/client
github.com/tendermint/tendermint/crypto
github.com/tendermint/tendermint/crypto/tmhash
github.com/tendermint/tendermint/internal/eventbus
github.com/tendermint/tendermint/internal/evidence
github.com/tendermint/tendermint/internal/inspect
github.com/tendermint/tendermint/internal/jsontypes
github.com/tendermint/tendermint/internal/libs/protoio
github.com/tendermint/tendermint/internal/libs/sync
github.com/tendermint/tendermint/internal/p2p/pex
github.com/tendermint/tendermint/internal/pubsub
github.com/tendermint/tendermint/internal/pubsub/query
github.com/tendermint/tendermint/internal/pubsub/query/syntax
github.com/tendermint/tendermint/internal/state/indexer
github.com/tendermint/tendermint/internal/state/indexer/block/kv
github.com/tendermint/tendermint/libs/json
github.com/tendermint/tendermint/libs/log
github.com/tendermint/tendermint/libs/os
github.com/tendermint/tendermint/light
github.com/tendermint/tendermint/light/provider/http
github.com/tendermint/tendermint/privval/grpc
github.com/tendermint/tendermint/proto/tendermint/blocksync
github.com/tendermint/tendermint/proto/tendermint/consensus
github.com/tendermint/tendermint/proto/tendermint/statesync
github.com/tendermint/tendermint/rpc/client
github.com/tendermint/tendermint/rpc/client/mock
github.com/tendermint/tendermint/test/e2e/tests
github.com/tendermint/tendermint/test/fuzz/mempool
github.com/tendermint/tendermint/test/fuzz/p2p/secretconnection
github.com/tendermint/tendermint/test/fuzz/rpc/jsonrpc/server
Updates #7626 and #7634.
The interaction between defers and t.Cleanup can be delicate.
For this case, which regularly flakes in CI, be explicit:
Defer the closes and waits before making any attempt to leaktest.
During file rotation and WAL shutdown, there was a race condition between users
of an autofile and its termination. To fix this, ensure operations on an
autofile are properly synchronized, and report errors when attempting to use an
autofile after it was closed.
Notably:
- Simplify the cancellation protocol between signal and Close.
- Exclude writers to an autofile during rotation.
- Add documentation about what is going on.
There is a lot more that could be improved here, but this addresses the more
obvious races that have been panicking unit tests.
## What does this pull request do?
This pull requests adds two metrics intended for use in calculating an experimental value for `MessageDelay`.
The metrics are as follows:
```
# HELP tendermint_consensus_complete_prevote_message_delay Difference in seconds between the proposal timestamp and the timestamp of the prevote that achieved 100% of the voting power in the prevote step.
# TYPE tendermint_consensus_complete_prevote_message_delay gauge
tendermint_consensus_complete_prevote_message_delay{chain_id="test-chain-aZbwF1"} 0.013025505
# HELP tendermint_consensus_quorum_prevote_message_delay Difference in seconds between the proposal timestamp and the timestamp of the prevote that achieved a quorum in the prevote step.
# TYPE tendermint_consensus_quorum_prevote_message_delay gauge
tendermint_consensus_quorum_prevote_message_delay{chain_id="test-chain-aZbwF1"} 0.013025505
```
## Why this change?
For more information on what these metrics are calculating, see #7202. The aim is to merge to backport these metrics to v0.34 and run nodes on a few popular chains with these metrics to determine the experimental values for `MessageDelay` on these popular chains and use these to select our default `SynchronyParams.MessageDelay` value.
## Why Gauges for the metrics?
Gauges allow us to overwrite the metric on each successive observation. We can then capture these metrics over time to track the highest and lowest observed value.
This commit changes the behaviour of the /unconfirmed_txs endpoint by replacing limit with a page and perPage parameter for pagination.
The test case for unconfirmed_txs have been accommodated to properly test this change and the documentation for the API as well.
The custom error types in the provider package did not propagate their wrapped
underlying reasons, making it difficult for the test to check that the correct
error was observed.
- Fix the custom errors to have a true underlying error (not just a string).
- Add Unwrap methods to support inspection by errors.Is.
- Update usage in a few places.
- Fix the test to check for acceptable variation.
Fixes#7609.
After writing and then reading a bunch of random messages, the test was
checking that it did not read the same number of messages that it wrote.
The sense of this check was inverted; they should match.
Introduced by accident in #7522. I'm not sure why this did not show up in CI.
Edit: I now know why it didn't show up in ci: #7608.
Add package jsontypes that implements a subset of the custom libs/json
package. Specifically it handles encoding and decoding of interface types
wrapped in "tagged" JSON objects. It omits the deep reflection on arbitrary
types, preserving only the handling of type tags wrapper encoding.
- Register interface types (Evidence, PubKey, PrivKey) for tagged encoding.
- Update the existing implementations to satisfy the type.
- Register those types with the jsontypes registry.
- Add string tags to 64-bit integer fields where needed.
- Add marshalers to structs that export interface-typed fields.
related to: #7274 and #7275
Still somewhat uncertain on two things that I'd appreciate more feedback on:
1. The optional temporary local overrides. Perhaps this is superfluous and we can simply make the transition without the override?
2. If this set of parameters seems to be large enough to allow application developers to create the chains they want but not so large as to be needlessly complex.
The parameters for RPC GET requests are parsed from query arguments in the
request URL. Rework this code to remove the need for tmjson. The structure of a
call still requires reflection, and still works the same way as before, but the
code structure has been simplified and cleaned up a bit.
Points of note:
- Consolidate handling of pointer types, so we only need to dereference once.
- Reduce the number of allocations of reflective types.
- Report errors for unsupported types rather than returning untyped nil.
Update the tests as well. There was one test case that checked for an error on
a behaviour the OpenAPI docs explicitly demonstrates as supported, so I fixed
that test case, and also added some new ones for cases that weren't checked.
Related:
* Update e2e base Go image to 1.17 (to match config).
Require that RPC functions take a context as their first argument, and return
an error as either their only result, or the second of two results.
This does not change how functions are dispatched, but will make it a little
easier to make more invasive changes in the near future.
Instead of taking a comma-separated string of parameter names, take each
parameter name as a separate argument. Now that we no longer have an extra flag
for caching, this fits nicely into a variadic trailer.
* Update all usage of NewRPCFunc and NewWSRPCFunc.
Define interfaces for the various methods a service may implement. This is
basically just the set of things on Environment that are exported as RPCs, but
these are also implemented by the light proxy.
* internal/rpc: use NewRoutesMap to construct routes on service start
* light/proxy: use NewRoutesMap to construct RPC routes
Rather than installing two separate panic handlers, defer the bookkeeping
separately from recovery, and lift the delegated handler call out to the top
level of the wrapper.
Also: Regularize the server middleware wrappers.
Add writeRPCResponse and writeHTTPResponse helpers, that handle the way RPC
responses are written to HTTP replies. These replace the exported helpers.
Visible effects:
- JSON results are now marshaled without indentation.
- HTTP status codes are now normalized.
- Cache control headers are no longer set.
Details:
- When writing a response to a URL (GET) request, do not marshal the whole
JSON-RPC object into the body, only encode the result or the error object.
This is a user-visible change.
- Do not change the HTTP status code for RPC errors. The RPC error already
reports what went wrong, the HTTP status should only report problems with the
HTTP transaction itself. This is a user-visible change.
- Encode JSON without indentation in POST response bodies. This is mainly cosmetic
but saves quite a bit of response data. Indent is still applied to GET responses to make
life easier for code examples.
- Remove an obsolete TODO about reporting an HTTP error on websocket upgrade.
Nothing needed to change; the upgrader already reports an error.
- Report an HTTP error when starting the server loop fails.
- Improve logging for encoding errors.
- Log less aggressively.
In two cases, we check for the content of an error right after asserting that
no error occurs. Fix the sense of those checks.
In one case, we check that there is no error with the diagnostic "expected
error". It's not clear whether this means "an error was expected" (which is
what I believe) or "we got the expected error". However, given the way the mock
plumbing is set up, the first interpretation seems right.