tendermint

Commit Graph

Author	SHA1	Message	Date
Sam Kleinman	7ed57ef5f9	statesync: more orderly dispatcher shutdown (#7601 )	3 years ago
Sam Kleinman	e07c4cdcf2	node: collapse initialization internals (#7567 )	3 years ago
Sam Kleinman	5bf1bdcfb4	reactors: skip log on some routine cancels (#7556 )	3 years ago
Sam Kleinman	fc36c7782f	statesync: reactor and channel construction (#7529 )	3 years ago
Sam Kleinman	bef120dadf	contexts: remove all TODO instances (#7466 )	3 years ago
Sam Kleinman	d0e03f01fc	sync: remove special mutexes (#7438 )	3 years ago
Sam Kleinman	65c0aaee5e	p2p: use receive for channel iteration (#7425 )	3 years ago
Sam Kleinman	bd6dc3ca88	p2p: refactor channel Send/out (#7414 )	3 years ago
Sam Kleinman	cb88bd3941	p2p: migrate to use new interface for channel errors (#7403 ) * p2p: migrate to use new interface for channel errors * Update internal/p2p/p2ptest/require.go Co-authored-by: M. J. Fromberger <michael.j.fromberger@gmail.com> * rename * feedback Co-authored-by: M. J. Fromberger <michael.j.fromberger@gmail.com>	3 years ago
Sam Kleinman	892f5d9524	service: cleanup mempool and peer update shutdown (#7401 )	3 years ago
Sam Kleinman	0ff3d4b89d	service: cleanup close channel in reactors (#7399 )	3 years ago
Sam Kleinman	a62ac27047	service: remove exported logger from base implementation (#7381 )	3 years ago
Sam Kleinman	8a991e288c	service: plumb contexts to all (most) threads (#7363 ) This continues the push of plumbing contexts through tendermint. I attempted to find all goroutines in the production code (non-test) and made sure that these threads would exit when their contexts were canceled, and I believe this PR does that.	3 years ago
Sam Kleinman	6ab62fe7b6	service: remove stop method and use contexts (#7292 )	3 years ago
Sam Kleinman	ca8f004112	p2p: remove final shims from p2p package (#7136 ) This is, perhaps, the trivial final piece of #7075 that I've been working on. There's more work to be done: - push more of the setup into the packages themselves - move channel-based sending/filtering out of the - simplify the buffering throughout the p2p stack.	3 years ago
Sam Kleinman	cbe6ad6cd5	p2p: flatten channel descriptor (#7132 )	3 years ago
Sam Kleinman	0900ea8396	p2p: channel shim cleanup (#7129 )	3 years ago
Sam Kleinman	f4a56f4034	p2p: refactor channel description (#7130 ) This is another small sliver of #7075, with the intention of removing the legacy shim layer related to channel registration.	3 years ago
Sam Kleinman	ded310093e	lint: fix collection of stale errors (#7090 ) A few things that had been annoying.	3 years ago
Sam Kleinman	1b5bb5348f	p2p: cleanup unused arguments (#7079 ) This is mostly just reading through the output of unparam, after noticing that there were a few places where we were ignoring some arguments.	3 years ago
William Banfield	243c62cc68	statesync: improve rare p2p race condition (#7042 ) This is intended to fix a test failure that occurs in the p2p state provider. The issue presents as the state provider timing out while waiting for the consensus params response. This can occur because the statesync reactor may attempt to respond to the params request before the state provider is ready to read it. The reactor then hits the `default` case seen here and never sends on the channel. The state provider then blocks waiting for a response that never arrives, because the reactor opted not to send it.	3 years ago
William Banfield	177850a2c9	statesync: remove deadlock on init fail (#7029 ) When statesync is stopped during shutdown, it can deadlock. A dump of goroutines reveals that this is related to the peerUpdates channel not returning anything on its `Done()` channel when `OnStop` is called. While this is occurring, `processPeerUpdate` is attempting to acquire the reactor lock, and it appears that this lock can never be acquired. I looked for the places where the lock may remain locked accidentally and cleaned them up in the hope of eradicating the issue. Dumps of the relevant goroutines may be found below. Note that the line numbers below are relative to the code in the `v0.35.0-rc1` tag. ``` goroutine 36 [chan receive]: github.com/tendermint/tendermint/internal/statesync.(*Reactor).OnStop(0xc00058f200) github.com/tendermint/tendermint/internal/statesync/reactor.go:243 +0x117 github.com/tendermint/tendermint/libs/service.(*BaseService).Stop(0xc00058f200, 0x0, 0x0) github.com/tendermint/tendermint/libs/service/service.go:171 +0x323 github.com/tendermint/tendermint/node.(*nodeImpl).OnStop(0xc0001ea240) github.com/tendermint/tendermint/node/node.go:769 +0x132 github.com/tendermint/tendermint/libs/service.(*BaseService).Stop(0xc0001ea240, 0x0, 0x0) github.com/tendermint/tendermint/libs/service/service.go:171 +0x323 github.com/tendermint/tendermint/cmd/tendermint/commands.NewRunNodeCmd.func1.1() github.com/tendermint/tendermint/cmd/tendermint/commands/run_node.go:143 +0x62 github.com/tendermint/tendermint/libs/os.TrapSignal.func1(0xc000629500, 0x7fdb52f96358, 0xc0002b5030, 0xc00000daa0) github.com/tendermint/tendermint/libs/os/os.go:26 +0x102 created by github.com/tendermint/tendermint/libs/os.TrapSignal github.com/tendermint/tendermint/libs/os/os.go:22 +0xe6 goroutine 188 [semacquire]: sync.runtime_SemacquireMutex(0xc00026b1cc, 0x0, 0x1) runtime/sema.go:71 +0x47 sync.(*Mutex).lockSlow(0xc00026b1c8) sync/mutex.go:138 +0x105 sync.(*Mutex).Lock(...) sync/mutex.go:81 sync.(*RWMutex).Lock(0xc00026b1c8) sync/rwmutex.go:111 +0x90 github.com/tendermint/tendermint/internal/statesync.(*Reactor).processPeerUpdate(0xc00026b080, 0xc000650008, 0x28, 0x124de90, 0x4) github.com/tendermint/tendermint/internal/statesync/reactor.go:849 +0x1a5 github.com/tendermint/tendermint/internal/statesync.(*Reactor).processPeerUpdates(0xc00026b080) github.com/tendermint/tendermint/internal/statesync/reactor.go:883 +0xab created by github.com/tendermint/tendermint/internal/statesync.(*Reactor).OnStart github.com/tendermint/tendermint/internal/statesync/reactor.go:219 +0xcd ```	3 years ago
Sam Kleinman	23fe6fd2f9	statesync: ensure test network properly configured (#7026 ) This test reliably gets hung up on network configuration (which may be a real issue), but its network setup is hand-cranked, and we should ensure that the test focuses on its core assertions and doesn't fail for test-architecture reasons.	3 years ago
Sam Kleinman	9a16d930c6	statesync: add logging while waiting for peers (#7007 )	3 years ago
Sam Kleinman	71c6682b57	statesync: clean up reactor/syncer lifecycle (#6995 ) I've been noticing that there are a number of situations where the statesync reactor blocks waiting for peers (or similar); I've moved things around to improve outcomes in local tests.	3 years ago
Sam Kleinman	bb8ffcb95b	store: move package to internal (#6978 )	3 years ago
Sam Kleinman	1c4950dbd2	state: move package to internal (#6964 )	3 years ago
JayT106	84ffaaaf37	statesync/rpc: metrics for the statesync and the rpc SyncInfo (#6795 )	3 years ago
Sam Kleinman	9dfdc62eb7	proxy: move proxy package to internal (#6953 )	3 years ago
Callum Waters	bda948e814	statesync: implement p2p state provider (#6807 )	3 years ago
JayT106	e70445f942	statesync/event: emit statesync start/end event (#6700 )	3 years ago
Callum Waters	6dd0cf92c8	router/statesync: add helpful log messages (#6724 )	3 years ago
Sam Kleinman	ab5c63eff3	statesync: increase dispatcher timeout (#6714 )	3 years ago
Callum Waters	a12e2bbb60	statesync: use initial height as a floor to backfilling (#6709 )	3 years ago
William Banfield	cabd916517	Revert "statesync: keep peer despite lightblock query fail (#6692 )" (#6696 ) * Revert "statesync: keep peer despite lightblock query fail (#6692)" This reverts commit `50b00dff71`.	3 years ago
William Banfield	50b00dff71	statesync: keep peer despite lightblock query fail (#6692 ) When a peer responds with no lightblock for the height we queried, we call the [removePeer method](https://github.com/tendermint/tendermint/blob/master/internal/statesync/reactor.go#L339). This removes the peer from the [dispatcher's list of called peers](`ad65883152/internal/statesync/dispatcher.go (L159)`). When the dispatcher then receives responses from the removed peer, it [drops their responses](`ad65883152/internal/statesync/dispatcher.go (L130)`). These responses may be meaningful or contain a block or data that would help statesync proceed. [The logs](https://gist.github.com/tychoish/34a1f61eaae3c36c23efc7d0001e805c), when this change is applied, show an additional 3 networking testnets passing. addresses: #6691	3 years ago
Callum Waters	051e127d38	light: correctly handle contexts (#6687 )	3 years ago
Aleksandr Bezobchuk	1dec3e139a	add stacktrace to panic logs (#6662 )	3 years ago
Callum Waters	a1e1e6c290	test: fix non-deterministic backfill test (#6648 )	3 years ago
Sam Kleinman	917180dfd2	p2p: reduce buffering on channels (#6609 ) Having smaller buffers in each reactor/channel will mean that there will be fewer stale messages.	3 years ago
Callum Waters	6e238b5b9d	statesync: make fetching chunks more robust (#6587 )	3 years ago
Callum Waters	25bb556fee	p2p: increase queue size to 16MB (#6588 )	3 years ago
Aleksandr Bezobchuk	7d961b55b2	state sync: tune request timeout and chunkers (#6566 )	3 years ago
Callum Waters	74af343f28	statesync: tune backfill process (#6565 ) This PR makes some tweaks to backfill after running e2e tests: - Separates sync and backfill into two distinct processes that the node calls, because if sync fails the node should fail, but if backfill fails it is still possible to proceed. - Removes peers who don't have the block at a height from the local peer list. Because the process goes backwards, if a node doesn't have a block at a height it is likely pruning blocks and thus won't have any prior ones either. - Sleeps when we've run out of peers, then tries again.	4 years ago
Callum Waters	6f6ac5c04e	state sync: reverse sync implementation (#6463 )	4 years ago
Sam Kleinman	a855f96946	p2p: renames for reactors and routing layer internal moves (#6547 )	4 years ago
Marko	719e028e00	libs: internalize some packages (#6366 ) ## Description Internalize some libs. This reduces the amount of public API Tendermint supports. The moved libraries are mainly ones that are used within Tendermint-core.	4 years ago
Sam Kleinman	d36a5905a6	statesync: improve e2e test outcomes (#6378 ) In my testing, this seems to help the e2e state-sync tests complete more reliably by fixing some potential range-related slice building, as well as the way the test app hashes snapshots. Additionally (and I'm not sure we want to do this), I added a hook to the reactor that re-sends the request for snapshots during the retry. In tests this helps prevent systems from getting stuck, but in practice it might create more traffic, and operators would just restart a state-syncing node to get a similar effect.	4 years ago
Aleksandr Bezobchuk	a554005136	p2p: revised router message scheduling (#6126 )	4 years ago
Aleksandr Bezobchuk	16bbe8c862	consensus: p2p refactor (#5969 )	4 years ago

46 Commits (dbe2146d0a01e5986a4fd27e8b2c7461eacaa883)