tendermint

Commit Graph

Author	SHA1	Message	Date
Sam Kleinman	c33be0a410	state: propagate error from state store (#8171) * state: propagate error from state store * fix lint	3 years ago
Sam Kleinman	c680cca96e	consensus: reduce size of test fixtures and logging rate (#8172) We can reduce the size of test fixtures (which will improve test reliability) without impacting these tests' primary role (which is correctness). Also, reducing the logging in these tests will make them easier to read, which will be a good quality-of-life improvement for devs.	3 years ago
William Banfield	485c96b0d3	consensus: change lock handling in reactor and handleMsg for RoundState (forward-port #7994 #7992) (#8139) Related to #8157	3 years ago
Sam Kleinman	9a833a8495	consensus: skip channel close during shutdown (#8155) I see this panic in tests occasionally, and I don't think there's any need to close this channel: - it's only sent to in one place, which has a select case with a default clause, so there's no chance of deadlocks. - the only place we receive from it has a timeout.	3 years ago
Sam Kleinman	0bded371c5	testing: logger cleanup (#8153) This contains two major changes: - Remove the legacy test logging method, and just explicitly call the noop logger. This is just to make the test logging behavior more coherent and clear. - Move the logging in the light package from the testing.T logger to the noop logger. We very rarely need or want to consider test logs unless we're doing reproductions and running a narrow set of tests. In most cases, I (for one) prefer to run in verbose mode so I can watch the progress of tests, but I basically never need to consider logs. If I do want to see logs, I can edit in the testing.T logger locally (which is what you have to do today, anyway).	3 years ago
Sam Kleinman	12d13cd31d	mempool: reduce size of test (#8152) This is failing intermittently, but it's a really simple test, and I suspect that we're just running into thread scheduling issues on CI nodes. I don't think making the test smaller reduces its utility.	3 years ago
William Banfield	bba8367aac	state: panic on ResponsePrepareProposal validation error (#8145) * state: panic on ResponsePrepareProposal validation error * lint++ Co-authored-by: Sam Kleinman <garen@tychoish.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>	3 years ago
Sam Kleinman	f1a8f47d4d	types: minor cleanup of unused or minimally used types (#8154)	3 years ago
Sam Kleinman	f61e6e4201	autofile: remove vestigial close mechanism (#8150)	3 years ago
Sam Kleinman	1db41663c7	consensus: avoid race in accessing channel (#8149)	3 years ago
Sam Kleinman	5e0e05f938	consensus: avoid persistent kvstore in tests (#8148)	3 years ago
Sam Kleinman	5bb51aab03	events: remove service aspects of event switch (#8146)	3 years ago
Sam Kleinman	13f7501950	blocksync: remove intermediate channel (#8140) Based on local testing, I'm now convinced that this is OK; I also think the new p2p layer has more caching and queuing.	3 years ago
JayT106	4400b0f6d3	p2p: adjust max non-persistent peer score (#8137) Guarantee that persistent peers have the highest connection priority. peerStore.Ranked returns peers with the same score in an arbitrary order.	3 years ago
Sam Kleinman	a68e356596	consensus: avoid extra close channel (#8144) Saw this in a test panic; it doesn't seem necessary.	3 years ago
William Banfield	7c91b53999	docs: PBTS synchrony issues runbook (#8129) closes: #7756 # What does this pull request change? This pull request adds a new runbook for operators encountering errors related to the new Proposer-Based Timestamps algorithm. The goal of this runbook is to give operators a set of clear steps that they can follow if they are having issues producing blocks because of clock synchronization problems. This pull request also renames the `PrevoteDelay` metrics to drop the term `MessageDelay`. These metrics provide a combined view of `message_delay` + `synchrony`, so the name may be confusing. # Questions to reviewers Are there ways to make the set of steps clearer, or are there any pieces that seem confusing?	3 years ago
Sam Kleinman	1dd8807cc3	mempool: test harness should expose application (#8143) This is minor, but I was trying to write a test and realized that the application reference in the harness isn't actually used, which is quite confusing.	3 years ago
Sam Kleinman	07b46d5a05	blocksync: drop redundant shutdown mechanisms (#8136)	3 years ago
Sam Kleinman	7a0b05f22d	libs/events: remove unnecessary unsubscription code (#8135) The events switch code is largely vestigial and is responsible for wiring between the consensus state machine and the consensus reactor. While there may historically have been a need to manage these subscriptions at runtime, that is no longer the case: subscriptions are registered during startup, and then the switch shuts down at the end. Eventually the EventSwitch should be replaced by a much smaller implementation of an event loop in the consensus state machine, but cutting down on the scope of the event switch will help clarify the requirements from the consensus side.	3 years ago
Sam Kleinman	bedb68078c	libs/clist: remove unused surface area (#8134)	3 years ago
Sam Kleinman	48b1952f18	state: avoid panics for marshaling errors (#8125)	3 years ago
William Banfield	68c624f5de	abci++: synchronize PrepareProposal with the newest version of the spec (#8094) This change implements the logic for the PrepareProposal ABCI++ method call. The main logic for creating and issuing the PrepareProposal request lives in execution.go and is tested in a set of new tests in execution_test.go. This change also updates the mempool mock to use a mockery-generated version and removes much of the plumbing for the no-longer-used ABCIResponses.	3 years ago
Sam Kleinman	faf123bda2	autofile: reduce minor panic and docs changes (#8122) * autofile: reduce minor panic and docs changes * fix lint	3 years ago
Sam Kleinman	da5c09cf6f	cleanup: remove commented code (#8123)	3 years ago
Sam Kleinman	b08dd93d88	libs/log: remove Must constructor (#8120) * libs/log: remove Must constructor * Update test/e2e/node/main.go Co-authored-by: M. J. Fromberger <michael.j.fromberger@gmail.com> * use stdlog Co-authored-by: M. J. Fromberger <michael.j.fromberger@gmail.com>	3 years ago
M. J. Fromberger	e9bc33d807	consensus: ensure the node terminates on consensus failure (#8111) Updates #8077. The panic handler for consensus currently attempts to effect a clean shutdown, but this can leave a failed node running in an unknown state for an arbitrary amount of time after the failure. Since a panic at this point means consensus is already irrecoverably broken, we should not allow the node to continue executing. After making a best effort to shut down the write-ahead log, re-panic to ensure the node will terminate before any further state transitions are processed. Even with this change, it is possible some transitions may occur while the cleanup is happening. It might be preferable to abort unconditionally without any attempt at cleanup. Related changes: - Clean up the creation of WAL directories. - Filter WAL close errors at rethrow.	3 years ago
M. J. Fromberger	658a7661c5	p2p: remove unnecessary panic handling in PEX reactor (#8110) The message handling in this reactor is all under control of the reactor itself, and does not call out to callbacks or other externally-supplied code. It doesn't need to check for panics. - Remove an irrelevant channel ID check. - Remove an unnecessary panic recovery wrapper.	3 years ago
M. J. Fromberger	89b4321af2	p2p: update polling interval calculation for PEX requests (#8106) The PEX reactor has a simple feedback control mechanism to decide how often to poll peers for peer address updates. The idea is to poll more frequently when less is known about the network, and to decrease frequency as knowledge grows. This change solves two problems: 1. In some cases we may poll a peer "too often" and get dropped by that peer for spamming. 2. The first successful peer update with any content resets the polling timer to a very long time (10m), meaning that if we are unlucky enough to get an incomplete reply while the network is small, we may not try again for a very long time. This may contribute to difficulties bootstrapping sync. The main change here is to only update the interval when new information is added to the system, and not (as before) whenever a request is sent out to a peer. The rate computation is essentially the same as before, although the code has been simplified a bit, and I consolidated some of the error handling so that we don't have to check in multiple places for the same conditions. Related changes: - Improve error diagnostics for too-soon and overflow conditions. - Clean up state handling in the poll interval computation. - Pin the minimum interval to avert the chance of PEX spamming a peer.	3 years ago
JayT106	d9c9675e2a	p2p+flowrate: rate control refactor (#7828) Add `CurrentTransferRate` to the flowrate package, because only the status of the transfer rate has been used.	3 years ago
Sam Kleinman	c35d6d6e2c	node: pass eventbus at construction time (#8084) * node: pass eventbus at construction time * remove cruft	3 years ago
Sam Kleinman	691cb52528	statesync: avoid leaking a thread during tests (#8085) * statesync: avoid leaking a thread during tests * fix	3 years ago
Sam Kleinman	01266881b8	evidence: manage and initialize state objects more clearly in the pool (#8080)	3 years ago
M. J. Fromberger	2df5c85a8d	Fix govet errors for %w use in test errors. (#8083) The %w syntax is a fmt.Errorf thing, not supported by the testing package.	3 years ago
Sam Kleinman	8df7b6103f	proxy: collapse trifurcated abci.Client (#8067)	3 years ago
Sam Kleinman	f1659ce329	consensus: fix TestInvalidState race and reporting (#8071)	3 years ago
William Banfield	0b8a62c87b	abci: Synchronize FinalizeBlock with the updated specification (#7983) This change set implements the most recent version of `FinalizeBlock`. # What does this change actually contain? * This change set is rather large, but fear not! Most of the touched files and changes rename `ResponseDeliverTx` to `ExecTxResult`. This should be a pretty inoffensive change, since they're effectively the same type with a different name. * `execBlockOnProxyApp` was removed entirely, since it served only as a wrapper around logic that is now mostly encapsulated within `FinalizeBlock`. * The `updateState` helper function has been made a public method on `State`. It was being exposed as a shim through the testing infrastructure, so this seemed innocuous. * Tests already existed to ensure that the application received the `ByzantineValidators` and the `ValidatorUpdates`, but one was fixed up to ensure that `LastCommitInfo` was being sent across. * Tests were removed from the `psql` indexer that seemed to search for an event in the indexer that was not being created. # Questions for reviewers * We store this `ABCIResponses` type (`5721a13ab1/proto/tendermint/state/types.pb.go`, L37) in the database as the block results. This type has changed since v0.35 to contain the `FinalizeBlock` response. I'm wondering if we need to do any shimming to keep the old data retrievable? * Similarly, this change is exposed via the RPC through the changed `ResultBlockResults` type (`5721a13ab1/rpc/coretypes/responses.go`, L69). Should we somehow shim or notify users of this change? closes: #7658	3 years ago
Sam Kleinman	0167f0d527	node: nodes should fetch state on startup (#8062)	3 years ago
Sam Kleinman	9d98484845	node: excise node handle within rpc env (#8063)	3 years ago
M. J. Fromberger	63ff2f052d	Remove now-unused and deprecated Subscribe methods. (#8064) Both pubsub and eventbus are internal packages now, and all the existing use has been updated to use the SubscribeWithArgs methods instead.	3 years ago
Sam Kleinman	a3881f0fb1	consensus: improve wal test cleanup (#8059) I believe that this gets rid of our temp-file related test errors.	3 years ago
M. J. Fromberger	af96ef2fe4	rpc: set a minimum long-polling interval for Events (#8050) Since the goal of reading events at the head of the event log is to satisfy a subscription-style interface, there is no point in allowing head polling with no wait interval. The pagination case already bypasses long polling, so the extra option is unnecessary. Set a minimum default long-polling interval for the head case. Add a test for minimum delay.	3 years ago
M. J. Fromberger	a22942504c	p2p: re-enable tests previously disabled (#8049)	3 years ago
Sam Kleinman	21087563eb	consensus: validator set changes test cleanup (#8035) This is mostly an extremely small change where I double a somewhat arbitrarily set timeout from 1m to 2m for an entire test. When I put these timeouts in the test, they were arbitrary, based on my local performance (which is quite fast), and I expected that they'd need to be tweaked in the future. A big chunk of this PR is reworking a collection of helper functions that produce somewhat intractable messages when a test fails, so that the error messages take up less vertical space, hopefully without losing any debuggability.	3 years ago
Sam Kleinman	a965f03c15	statesync: avoid compounding retry logic for fetching consensus parameters (#8032) We're waiting between trying witnesses (which shouldn't be necessary, because the witnesses shouldn't depend on each other) and also between attempts, and really the outer sleep should be enough.	3 years ago
Sam Kleinman	58dc172611	p2p: plumb rudimentary service discovery to reactors and update statesync (#8030) This is a little coarse, but the idea is that the peer-up event we send to reactors will include information about the channels a peer has, which reactors can then use to reject peers (if needed). This solves the problem where statesync would hang in test networks (and presumably real ones) when we attempted to statesync from seed nodes, thereby hanging silently forever.	3 years ago
Sam Kleinman	f25b7ceeb2	consensus: make orchestration more reliable for invalid precommit test (#8013) Co-authored-by: M. J. Fromberger <fromberger@interchain.io>	3 years ago
Sam Kleinman	a153f82433	p2p: ignore transport close error during cleanup (#8011) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>	3 years ago
William Banfield	c80734e5af	state: synchronize the ProcessProposal implementation with the latest version of the spec (#7961) This change implements the spec for `ProcessProposal`. It first calls the Tendermint block validation logic to check that all of the proposed block fields are well formed and do not violate any of the rules for Tendermint to consider the block valid, and then passes the validated block to `ProcessProposal`. This change also adds additional fixtures to test the change. It adds the `baseMock` type, which holds a mock as well as a reference to `BaseApplication`. If the function was not set up by the test on the contained mock Application, the type delegates to the `BaseApplication` and returns what `BaseApplication` returns. The change also switches the `makeState` helper to take an arg struct so that an ABCI application can be plumbed through when needed. closes: #7656	3 years ago
Sam Kleinman	89dbebd1c5	p2p: retry failed connections slightly more aggressively (#8010) * p2p: retry failed connections slightly more aggressively * fix dial interval test	3 years ago
Sam Kleinman	c8ae5db50e	p2p: relax pong timeout (#8007)	3 years ago


341 Commits (c33be0a4106fcefe77bf67eef020a3f32932341f)