tendermint


Author	SHA1	Message	Date
William Banfield	177850a2c9	statesync: remove deadlock on init fail (#7029 ) When statesync is stopped during shutdown, it can deadlock. A dump of goroutines reveals that this is related to the peerUpdates channel not returning anything on its `Done()` channel when `OnStop` is called. While this is occurring, `processPeerUpdate` is attempting to acquire the reactor lock, and it appears that this lock can never be acquired. I looked for the places where the lock may accidentally remain locked and cleaned them up in hopes of eradicating the issue. Dumps of the relevant goroutines are shown below. Note that the line numbers below are relative to the code in the `v0.35.0-rc1` tag. ``` goroutine 36 [chan receive]: github.com/tendermint/tendermint/internal/statesync.(*Reactor).OnStop(0xc00058f200) github.com/tendermint/tendermint/internal/statesync/reactor.go:243 +0x117 github.com/tendermint/tendermint/libs/service.(*BaseService).Stop(0xc00058f200, 0x0, 0x0) github.com/tendermint/tendermint/libs/service/service.go:171 +0x323 github.com/tendermint/tendermint/node.(*nodeImpl).OnStop(0xc0001ea240) github.com/tendermint/tendermint/node/node.go:769 +0x132 github.com/tendermint/tendermint/libs/service.(*BaseService).Stop(0xc0001ea240, 0x0, 0x0) github.com/tendermint/tendermint/libs/service/service.go:171 +0x323 github.com/tendermint/tendermint/cmd/tendermint/commands.NewRunNodeCmd.func1.1() github.com/tendermint/tendermint/cmd/tendermint/commands/run_node.go:143 +0x62 github.com/tendermint/tendermint/libs/os.TrapSignal.func1(0xc000629500, 0x7fdb52f96358, 0xc0002b5030, 0xc00000daa0) github.com/tendermint/tendermint/libs/os/os.go:26 +0x102 created by github.com/tendermint/tendermint/libs/os.TrapSignal github.com/tendermint/tendermint/libs/os/os.go:22 +0xe6 goroutine 188 [semacquire]: sync.runtime_SemacquireMutex(0xc00026b1cc, 0x0, 0x1) runtime/sema.go:71 +0x47 sync.(*Mutex).lockSlow(0xc00026b1c8) sync/mutex.go:138 +0x105 sync.(*Mutex).Lock(...) sync/mutex.go:81 sync.(*RWMutex).Lock(0xc00026b1c8) sync/rwmutex.go:111 +0x90 github.com/tendermint/tendermint/internal/statesync.(*Reactor).processPeerUpdate(0xc00026b080, 0xc000650008, 0x28, 0x124de90, 0x4) github.com/tendermint/tendermint/internal/statesync/reactor.go:849 +0x1a5 github.com/tendermint/tendermint/internal/statesync.(*Reactor).processPeerUpdates(0xc00026b080) github.com/tendermint/tendermint/internal/statesync/reactor.go:883 +0xab created by github.com/tendermint/tendermint/internal/statesync.(*Reactor).OnStart github.com/tendermint/tendermint/internal/statesync/reactor.go:219 +0xcd ```	3 years ago
Sam Kleinman	71c6682b57	statesync: clean up reactor/syncer lifecycle (#6995 ) I've noticed a number of situations where the statesync reactor blocks waiting for peers (or similar); I've moved things around to improve outcomes in local tests.	3 years ago
Sam Kleinman	1c4950dbd2	state: move package to internal (#6964 )	3 years ago
JayT106	84ffaaaf37	statesync/rpc: metrics for the statesync and the rpc SyncInfo (#6795 )	3 years ago
Sam Kleinman	9dfdc62eb7	proxy: move proxy package to internal (#6953 )	3 years ago
Callum Waters	bda948e814	statesync: implement p2p state provider (#6807 )	3 years ago
Sam Kleinman	8228936155	e2e: extend timeouts in test harness (#6694 )	3 years ago
Callum Waters	051e127d38	light: correctly handle contexts (#6687 )	3 years ago
Sam Kleinman	ae5f98881b	p2p: make NodeID and NetAddress public (#6583 )	3 years ago
Callum Waters	6e238b5b9d	statesync: make fetching chunks more robust (#6587 )	3 years ago
Aleksandr Bezobchuk	7d961b55b2	state sync: tune request timeout and chunkers (#6566 )	3 years ago
Callum Waters	74af343f28	statesync: tune backfill process (#6565 ) This PR makes some tweaks to backfill after running e2e tests: - Separates sync and backfill into two distinct processes that the node calls. The reason is that if sync fails, the node should fail, but if backfill fails, it is still possible to proceed. - Removes peers that don't have the block at a given height from the local peer list. Because the process goes backwards, a node that doesn't have the block at a height is likely pruning blocks and thus won't have any earlier ones either. - Sleeps when we've run out of peers, then tries again.	3 years ago
Sam Kleinman	a855f96946	p2p: renames for reactors and routing layer internal moves (#6547 )	3 years ago
Marko	719e028e00	libs: internalize some packages (#6366 ) ## Description Internalize some libs. This reduces the amount of public API Tendermint supports. The moved libraries are mainly ones used within Tendermint-core.	4 years ago
Sam Kleinman	d36a5905a6	statesync: improve e2e test outcomes (#6378 ) In my testing, this seems to help the e2e state-sync tests complete more reliably by fixing some potential range-related slice-building issues, as well as the way the test app hashes snapshots. Additionally (and I'm not sure if we want to do this), I added a hook to the reactor that re-sends the request for snapshots during the retry. In tests, this helps prevent systems from getting stuck, but in practice it might create more traffic, and operators would just restart a state-syncing node to get a similar effect.	4 years ago
Callum Waters	162f67cf26	correct spelling to US English (#6077 )	4 years ago
Anton Kaliaev	d76add65a6	libs/log: format []byte as hexadecimal string (uppercased) (#5960 ) Closes: #5806 Co-authored-by: Lanie Hei <heixx011@umn.edu>	4 years ago
Erik Grinaker	1b6df6783d	p2p: replace PeerID with NodeID	4 years ago
Aleksandr Bezobchuk	a879eb444d	p2p: state sync reactor refactor (#5671 )	4 years ago
Anton Kaliaev	e13b4386ff	abci: modify Client interface and socket client (#5673 ) `abci.Client`: - Sync and Async methods now accept a context for cancellation * grpc client uses the context to cancel both Sync and Async requests * local client ignores the context parameter * socket client uses the context to cancel Sync requests, and to drop Async requests before sending them if the context was cancelled beforehand - Async methods return an error * socket client returns an error immediately if the queue is full for Async requests * local client always returns a nil error * grpc client returns an error if the context was cancelled before we got a response or before the receiving queue had space for the response (not to be confused with the sending queue of the socket client) - specify client semantics in [doc.go](https://raw.githubusercontent.com/tendermint/tendermint/27112fffa62276bc016d56741f686f0f77931748/abci/client/doc.go) `mempool.TxInfo` - add optional `Context` to `TxInfo`, which can be used to cancel the `CheckTx` request Closes #5190	4 years ago
Erik Grinaker	f83ecdad1d	config: add state sync discovery_time setting (#5399 ) Reduces the state sync discovery time from 20 to 15 seconds, and makes it configurable.	4 years ago
Anton Kaliaev	85a4be87a7	rpc/client: take context as first param (#5347 ) Closes #5145. Also applies to light/client.	4 years ago
Marko	2ac5a559b4	libs: wrap mutexes for build flag with godeadlock (#5126 ) ## Description This PR wraps the stdlib sync.(RW)Mutex & godeadlock.(RW)Mutex. This enables using go-deadlock via a build flag instead of using sed to replace sync with godeadlock in all files Closes: #3242	4 years ago
Erik Grinaker	59a17b28a7	proto: improve enums (#5099 ) Fixes some minor issues with Protobuf enums, not likely to break anything. Branched off of #5096, rebase to `master` before merging.	4 years ago
Marko	dedf0d2350	proto: folder structure adhere to buf (#5025 )	4 years ago
Marko	b9af87c4ea	state: proto migration (#4951 )	5 years ago
Marko	4e6a844d6f	statesync: use Protobuf instead of Amino for p2p traffic (#4943 ) ## Description Closes: #XXX	5 years ago
Erik Grinaker	81c2798df0	abci: fix protobuf lint issues Fix some linter issues to conform with the Protobuf style guide. The state sync enum changes are ok to break since it's not released yet. Personally I find the uppercase kind of ugly, but that's what the guide says. Couldn't find a way to generate camel case in Go, short of specifying custom names for each and every enum variant. Another option would be to simply disable the enum case lint.	5 years ago
Erik Grinaker	511ab6717c	add state sync reactor (#4705 ) Fixes #828. Adds state sync, as outlined in [ADR-053](https://github.com/tendermint/tendermint/blob/master/docs/architecture/adr-053-state-sync-prototype.md). See related PRs in Cosmos SDK (https://github.com/cosmos/cosmos-sdk/pull/5803) and Gaia (https://github.com/cosmos/gaia/pull/327). This is split out of the previous PR #4645, and branched off of the ABCI interface in #4704. * Adds a new P2P reactor which exchanges snapshots with peers, and bootstraps an empty local node from remote snapshots when requested. * Adds a new configuration section `[statesync]` that enables state sync and configures the light client. Also enables `statesync:info` logging by default. * Integrates state sync into node startup. Does not support the v2 blockchain reactor, since it needs some reorganization to defer startup.	5 years ago

13 Commits (44b5d330b07103d079dfbbe610f4cdba914fdbc6)
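Commit 177850a2c9 attributes the shutdown deadlock to `processPeerUpdate` blocking on a reactor lock that "may remain locked accidentally." The Go sketch below isolates that class of bug: an early return that skips `Unlock`, contrasted with a deferred `Unlock` that releases the mutex on every path. The `reactor` type, its methods, and `canLock` are hypothetical simplifications, not Tendermint's code or the actual patch. It assumes Go 1.18+ for `sync.RWMutex.TryLock`.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// reactor is a hypothetical, heavily simplified stand-in for a reactor
// that guards its peer set with a mutex. It is NOT Tendermint's reactor.
type reactor struct {
	mtx   sync.RWMutex
	peers map[string]bool
}

// addPeerLeaky acquires the lock but returns early on the error path
// without releasing it, leaving the mutex held forever.
func (r *reactor) addPeerLeaky(id string) error {
	r.mtx.Lock()
	if id == "" {
		return errors.New("empty peer ID") // BUG: r.mtx is never unlocked
	}
	r.peers[id] = true
	r.mtx.Unlock()
	return nil
}

// addPeer pairs Lock with a deferred Unlock, so every return path,
// including the error path, releases the mutex.
func (r *reactor) addPeer(id string) error {
	r.mtx.Lock()
	defer r.mtx.Unlock()
	if id == "" {
		return errors.New("empty peer ID")
	}
	r.peers[id] = true
	return nil
}

// canLock reports whether the mutex is currently free, without blocking.
// TryLock is used only for this demonstration; production code rarely needs it.
func canLock(mtx *sync.RWMutex) bool {
	if mtx.TryLock() {
		mtx.Unlock()
		return true
	}
	return false
}

func main() {
	leaky := &reactor{peers: map[string]bool{}}
	_ = leaky.addPeerLeaky("")
	// Any later Lock call (like processPeerUpdate in the dump) would block forever here.
	fmt.Println("leaky: lock free after error?", canLock(&leaky.mtx))

	fixed := &reactor{peers: map[string]bool{}}
	_ = fixed.addPeer("")
	fmt.Println("fixed: lock free after error?", canLock(&fixed.mtx))
}
```

Running it prints `false` for the leaky variant and `true` for the fixed one. `defer` is the idiomatic choice here because it ties the release to function exit rather than to each individual return statement.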