When shutting down blocksync, the process can hang completely. A dump of running goroutines reveals that this is due to goroutines not listening on the correct shutdown signal: the `poolRoutine` goroutine does not wait on `pool.Quit`. The `poolRoutine` receives no other shutdown signal during `OnStop` because it must stop before `r.closeCh` is closed. Currently the `poolRoutine` listens only on `closeCh`, which will not be closed until the `poolRoutine` stops and calls `poolWG.Done()`, so shutdown deadlocks.
This change also moves the `requestRoutine()` into the `OnStart` method to make it more visible, since it does not rely on anything spawned in the `poolRoutine`.
```
goroutine 183 [semacquire]:
sync.runtime_Semacquire(0xc0000d3bd8)
runtime/sema.go:56 +0x45
sync.(*WaitGroup).Wait(0xc0000d3bd0)
sync/waitgroup.go:130 +0x65
github.com/tendermint/tendermint/internal/blocksync/v0.(*Reactor).OnStop(0xc0000d3a00)
github.com/tendermint/tendermint/internal/blocksync/v0/reactor.go:193 +0x47
github.com/tendermint/tendermint/libs/service.(*BaseService).Stop(0xc0000d3a00, 0x0, 0x0)
github.com/tendermint/tendermint/libs/service/service.go:171 +0x323
github.com/tendermint/tendermint/node.(*nodeImpl).OnStop(0xc00052c000)
github.com/tendermint/tendermint/node/node.go:758 +0xc62
github.com/tendermint/tendermint/libs/service.(*BaseService).Stop(0xc00052c000, 0x0, 0x0)
github.com/tendermint/tendermint/libs/service/service.go:171 +0x323
github.com/tendermint/tendermint/cmd/tendermint/commands.NewRunNodeCmd.func1.1()
github.com/tendermint/tendermint/cmd/tendermint/commands/run_node.go:143 +0x62
github.com/tendermint/tendermint/libs/os.TrapSignal.func1(0xc000df6d20, 0x7f04a68da900, 0xc0004a8930, 0xc0005a72d8)
github.com/tendermint/tendermint/libs/os/os.go:26 +0x102
created by github.com/tendermint/tendermint/libs/os.TrapSignal
github.com/tendermint/tendermint/libs/os/os.go:22 +0xe6
goroutine 161 [select]:
github.com/tendermint/tendermint/internal/blocksync/v0.(*Reactor).poolRoutine(0xc0000d3a00, 0x0)
github.com/tendermint/tendermint/internal/blocksync/v0/reactor.go:464 +0x2b3
created by github.com/tendermint/tendermint/internal/blocksync/v0.(*Reactor).OnStart
github.com/tendermint/tendermint/internal/blocksync/v0/reactor.go:174 +0xf1
goroutine 162 [select]:
github.com/tendermint/tendermint/internal/blocksync/v0.(*Reactor).processBlockSyncCh(0xc0000d3a00)
github.com/tendermint/tendermint/internal/blocksync/v0/reactor.go:310 +0x151
created by github.com/tendermint/tendermint/internal/blocksync/v0.(*Reactor).OnStart
github.com/tendermint/tendermint/internal/blocksync/v0/reactor.go:177 +0x54
goroutine 163 [select]:
github.com/tendermint/tendermint/internal/blocksync/v0.(*Reactor).processPeerUpdates(0xc0000d3a00)
github.com/tendermint/tendermint/internal/blocksync/v0/reactor.go:363 +0x12b
created by github.com/tendermint/tendermint/internal/blocksync/v0.(*Reactor).OnStart
github.com/tendermint/tendermint/internal/blocksync/v0/reactor.go:178 +0x76
```
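For illustration, a minimal self-contained sketch of the fix described above (the types and field names are simplified stand-ins, not the actual reactor API): the pool routine must also select on the pool's own quit signal, because `closeCh` is only closed after the routine has already exited.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type pool struct{ quit chan struct{} }

type reactor struct {
	pool    *pool
	closeCh chan struct{} // closed only after poolWG.Wait() returns in OnStop
	poolWG  sync.WaitGroup
}

func (r *reactor) poolRoutine() {
	defer r.poolWG.Done()
	for {
		select {
		case <-r.pool.quit: // the shutdown signal actually delivered during OnStop
			return
		case <-r.closeCh: // if this were the only case, shutdown would deadlock
			return
		case <-time.After(10 * time.Millisecond):
			// normal request/processing work would happen here
		}
	}
}

func main() {
	r := &reactor{pool: &pool{quit: make(chan struct{})}, closeCh: make(chan struct{})}
	r.poolWG.Add(1)
	go r.poolRoutine()

	// OnStop: stop the pool first, wait for poolRoutine, then close closeCh.
	close(r.pool.quit)
	r.poolWG.Wait()
	close(r.closeCh)
	fmt.Println("clean shutdown")
}
```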
The code in the Tendermint repository makes heavy use of import aliasing.
This is made necessary by our extensive reuse of common base package names, and
by repetition of similar names across different subdirectories.
Unfortunately we have not been very consistent about which packages we alias in
various circumstances, and the aliases we use vary. In the spirit of the advice
in the style guide and https://github.com/golang/go/wiki/CodeReviewComments#imports,
this change makes an effort to clean up and normalize import aliasing.
This change makes no API or behavioral changes. It is a pure cleanup intended
to help make the code more readable to developers (including myself) trying to
understand what is being imported where.
Only unexported names have been modified, and the changes were generated and
applied mechanically with gofmt -r and comby, respecting the lexical and
syntactic rules of Go. Even so, I did not fix every inconsistency. Where the
changes would be too disruptive, I left it alone.
The principles I followed in this cleanup are:
- Remove aliases that restate the package name.
- Remove aliases where the base package name is unambiguous.
- Move overly-terse abbreviations from the import to the usage site.
- Fix lexical issues (remove underscores, remove capitalization).
- Fix import groupings to more closely match the style guide.
- Group blank (side-effecting) imports and ensure they are commented.
- Add aliases to multiple imports with the same base package name.
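For illustration only, an import block that follows these principles might look like the following (the specific packages and aliases are examples, not part of this change):

```go
import (
	// Standard library.
	"context"
	"fmt"

	// Aliased only where the base name ("types") would otherwise be ambiguous.
	abci "github.com/tendermint/tendermint/abci/types"
	"github.com/tendermint/tendermint/libs/log"
	tmproto "github.com/tendermint/tendermint/proto/tendermint/types"

	// Blank (side-effecting) imports are grouped and commented.
	_ "net/http/pprof" // registers pprof handlers on http.DefaultServeMux
)
```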
This commit extends the fix in #6518 so that all other goroutines which run concurrently with processBlockchainCh can safely send data to the blockchain out channel via a bridge channel. This eliminates all possible data races between sending on and closing the blockchainCh.Out channel at the same time.

Fixes #6516
There is a possible data race/panic between processBlockchainCh and processPeerUpdates, since we send to blockchainCh.Out in one goroutine and close the channel in the other. The race shows up in some GitHub Actions runs.

This commit fixes the race by adding a peerUpdatesCh as a bridge between processPeerUpdates and processBlockchainCh: the former sends to this channel, and the latter listens and forwards the messages to the blockchainCh.Out channel.

Updates #6516
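For illustration, a minimal self-contained sketch of this bridge-channel pattern (the channel names and types below are simplified stand-ins, not the reactor's actual API): only the goroutine that owns the out channel ever sends on it or closes it; other goroutines go through the bridge.

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	out := make(chan string)    // stands in for blockchainCh.Out
	bridge := make(chan string) // stands in for peerUpdatesCh
	done := make(chan struct{})

	var wg sync.WaitGroup

	// Consumer of out (e.g. the router side).
	wg.Add(1)
	go func() {
		defer wg.Done()
		for msg := range out {
			fmt.Println("forwarded:", msg)
		}
	}()

	// processBlockchainCh: sole owner of out; forwards from the bridge.
	wg.Add(1)
	go func() {
		defer wg.Done()
		defer close(out) // safe: no other goroutine ever sends on out
		for {
			select {
			case msg := <-bridge:
				out <- msg
			case <-done:
				return
			}
		}
	}()

	// processPeerUpdates: sends via the bridge, never touches out directly.
	bridge <- "peer update"
	close(done)
	wg.Wait()
}
```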
This cleans up the `Router` code and adds a bunch of tests. These sorts of systems are a real pain to test, since they have a bunch of asynchronous goroutines living their own lives, so the test coverage is decent but not fantastic. Luckily we've been able to move all of the complex peer management and transport logic outside of the router, as synchronous components that are much easier to test, so the core router logic is fairly small and simple.
This also provides some initial test tooling in `p2p/p2ptest` that automatically sets up in-memory networks and channels for use in integration tests. It also includes channel-oriented test asserters in `p2p/p2ptest/require.go`, but these have primarily been written for router testing and should probably be adapted or extended for reactor testing.
## Description
Fixes the data race in usage of `WaitGroup`. Specifically, the case where we invoke `Wait` _before_ the first delta `Add` call when the current waitgroup counter is zero. See https://golang.org/pkg/sync/#WaitGroup.Add.
Still not sure how this manifests itself in a test since the reactor has to be stopped virtually immediately after being started (I think?).
Regardless, this is the appropriate fix.
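For illustration, a minimal self-contained sketch of the `WaitGroup` rule in question (the reactor names here are simplified stand-ins, not the actual fix): the counter is incremented synchronously before any goroutine is spawned, so `Wait` can never race with the first `Add`.

```go
package main

import (
	"fmt"
	"sync"
)

type reactor struct {
	wg   sync.WaitGroup
	quit chan struct{}
}

func (r *reactor) OnStart() {
	r.quit = make(chan struct{})
	// Add is called here, in the starting goroutine, NOT inside poolRoutine.
	// Calling Add concurrently with Wait while the counter is zero is the
	// data race described in the sync.WaitGroup documentation.
	r.wg.Add(1)
	go r.poolRoutine()
}

func (r *reactor) poolRoutine() {
	defer r.wg.Done()
	<-r.quit
}

func (r *reactor) OnStop() {
	close(r.quit)
	r.wg.Wait() // safe even if OnStop runs immediately after OnStart
}

func main() {
	r := &reactor{}
	r.OnStart()
	r.OnStop()
	fmt.Println("stopped cleanly")
}
```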
closes: #5968
The blockchain/vX reactor priority was decreased because, during normal operation (i.e. when the node is not fast syncing), blockchain priority can't be the same as the consensus reactor priority. Otherwise, it would theoretically be possible to slow down consensus by constantly requesting blocks from the node.

NOTE: ideally, blockchain/vX reactor priority would be dynamic, e.g. 10 (max) while the node is fast syncing, then decreased to 5 once fast sync is done (only to serve blocks to other nodes). That's not possible now, so I decided to focus on normal operation (priority = 5).

Evidence and consensus-critical messages are more important than the mempool ones, hence their priorities are bumped by 1 (from 5 to 6).

The statesync reactor priority was changed from 1 to 5 to match the blockchain/vX priority.
Refs https://github.com/tendermint/tendermint/issues/5816
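For illustration, a tiny stand-in (not the actual p2p channel descriptor type) showing the relative priorities described above, where a higher value is drained first:

```go
package main

import "fmt"

type channelDescriptor struct {
	Name     string
	Priority int
}

func main() {
	descriptors := []channelDescriptor{
		{Name: "consensus", Priority: 6},  // bumped from 5 to 6
		{Name: "evidence", Priority: 6},   // bumped from 5 to 6
		{Name: "blockchain", Priority: 5}, // decreased so it cannot compete with consensus
		{Name: "statesync", Priority: 5},  // raised from 1 to match blockchain
	}
	for _, d := range descriptors {
		fmt.Printf("%-10s priority=%d\n", d.Name, d.Priority)
	}
}
```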
After a reactor has failed to parse an incoming message, it shouldn't output the "bad" data into the logs, as that data is unfiltered and could have anything in it. (We also don't think this information is helpful to have in the logs anyways.)
- drop Height & Base from StatusRequest
It does not make sense, nor is it used anywhere currently. Also, there seems to be no trace of these fields in ADR-40 (blockchain reactor v2).
- change PacketMsg#EOF type from int32 to bool
check the bcR.fastSync flag in `OnStop`
fixes the `service/service.go:161 Not stopping BlockPool -- have not been started yet {"impl": "BlockPool"}` error when the process is killed
Fixes #828. Adds state sync, as outlined in [ADR-053](https://github.com/tendermint/tendermint/blob/master/docs/architecture/adr-053-state-sync-prototype.md). See related PRs in Cosmos SDK (https://github.com/cosmos/cosmos-sdk/pull/5803) and Gaia (https://github.com/cosmos/gaia/pull/327).
This is split out of the previous PR #4645, and branched off of the ABCI interface in #4704.
* Adds a new P2P reactor which exchanges snapshots with peers, and bootstraps an empty local node from remote snapshots when requested.
* Adds a new configuration section `[statesync]` that enables state sync and configures the light client. Also enables `statesync:info` logging by default.
* Integrates state sync into node startup. Does not support the v2 blockchain reactor, since it needs some reorganization to defer startup.
to prevent malicious nodes from sending us large messages (~21MB, which
is the default `RecvMessageCapacity`)
This allows us to remove the unnecessary `maxMsgSize` check in `decodeMsg`: since each channel already has its message capacity set to `maxMsgSize`, there's no need to check it again there.
Closes #1503
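For illustration, a minimal self-contained sketch of the idea above (the types and limit are simplified stand-ins, not the actual p2p API): once the channel layer enforces a per-channel receive limit, decoding no longer needs its own size check.

```go
package main

import (
	"errors"
	"fmt"
)

const maxMsgSize = 1048576 // example limit; the real value depends on the reactor

// receive models the channel layer: it rejects oversized payloads before they
// ever reach the reactor's decode step.
func receive(payload []byte) ([]byte, error) {
	if len(payload) > maxMsgSize {
		return nil, errors.New("message exceeds RecvMessageCapacity")
	}
	return payload, nil
}

// decodeMsg no longer repeats the size check; it can assume the payload is
// already within bounds.
func decodeMsg(payload []byte) string {
	return fmt.Sprintf("decoded %d bytes", len(payload))
}

func main() {
	if p, err := receive(make([]byte, 512)); err == nil {
		fmt.Println(decodeMsg(p))
	}
	if _, err := receive(make([]byte, maxMsgSize+1)); err != nil {
		fmt.Println("rejected:", err)
	}
}
```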
* Added BlockStore.DeleteBlock()
* Added initial block pruner prototype
* wip
* Added BlockStore.PruneBlocks()
* Added consensus setting for block pruning
* Added BlockStore base
* Error on replay if base does not have blocks
* Handle missing blocks when sending VoteSetMaj23Message
* Error message tweak
* Properly update blockstore state
* Error message fix again
* blockchain: ignore peer missing blocks
* Added FIXME
* Added test for block replay with truncated history
* Handle peer base in blockchain reactor
* Improved replay error handling
* Added tests for Store.PruneBlocks()
* Fix non-RPC handling of truncated block history
* Panic on missing block meta in needProofBlock()
* Updated changelog
* Handle truncated block history in RPC layer
* Added info about earliest block in /status RPC
* Reorder height and base in blockchain reactor messages
* Updated changelog
* Fix tests
* Appease linter
* Minor review fixes
* Non-empty BlockStores should always have base > 0
* Update code to assume base > 0 invariant
* Added blockstore tests for pruning to 0
* Make sure we don't prune below the current base
* Added BlockStore.Size()
* config: added retain_blocks recommendations
* Update v1 blockchain reactor to handle blockstore base
* Added state database pruning
* Propagate errors on missing validator sets
* Comment tweaks
* Improved error message
Co-Authored-By: Anton Kaliaev <anton.kalyaev@gmail.com>
* use ABCI field ResponseCommit.retain_height instead of retain-blocks config option
* remove State.RetainHeight, return value instead
* fix minor issues
* rename pruneHeights() to pruneBlocks()
* noop to fix GitHub borkage
Co-authored-by: Anton Kaliaev <anton.kalyaev@gmail.com>
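For illustration, a minimal self-contained sketch of the pruning flow described in the list above, in particular `PruneBlocks()` driven by `ResponseCommit.retain_height` (the block store type here is a simplified stand-in, not the actual API):

```go
package main

import (
	"errors"
	"fmt"
)

type blockStore struct {
	base   int64 // lowest retained height, always > 0 for a non-empty store
	height int64 // highest stored height
}

// pruneBlocks removes blocks below retainHeight and returns how many were pruned.
func (bs *blockStore) pruneBlocks(retainHeight int64) (int64, error) {
	if retainHeight <= bs.base {
		return 0, nil // nothing to do; never prune below the current base
	}
	if retainHeight > bs.height {
		return 0, errors.New("cannot prune beyond the latest height")
	}
	pruned := retainHeight - bs.base
	bs.base = retainHeight
	return pruned, nil
}

func main() {
	bs := &blockStore{base: 1, height: 100}
	// Pretend the ABCI app returned ResponseCommit.retain_height = 90.
	pruned, err := bs.pruneBlocks(90)
	if err != nil {
		panic(err)
	}
	fmt.Printf("pruned %d blocks, new base %d\n", pruned, bs.base)
}
```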
* libs/common: refactor libs common 3
- move nil.go into types folder and make private
- move service & baseservice out of common into service pkg
ref #4147
Signed-off-by: Marko Baricevic <marbar3778@yahoo.com>
* add changelog entry
cleanup to add linter
grpc change:
https://godoc.org/google.golang.org/grpc#WithContextDialer
https://godoc.org/google.golang.org/grpc#WithDialer
grpc/grpc-go#2627
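For reference, a small sketch of this dialer migration (the address and dial function are placeholders, not code from the repository): `grpc.WithDialer` is deprecated in favor of `grpc.WithContextDialer`, which receives a context instead of a timeout.

```go
package main

import (
	"context"
	"net"

	"google.golang.org/grpc"
)

func main() {
	dialer := func(ctx context.Context, addr string) (net.Conn, error) {
		var d net.Dialer
		return d.DialContext(ctx, "tcp", addr)
	}

	conn, err := grpc.Dial(
		"127.0.0.1:26658", // placeholder address
		grpc.WithInsecure(),
		grpc.WithContextDialer(dialer), // replaces the deprecated grpc.WithDialer
	)
	if err != nil {
		panic(err)
	}
	defer conn.Close()
}
```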
prometheus change:
due to UninstrumentedHandler being deprecated in the future
empty branch = an empty if or else statement
I didn't delete them entirely, just commented them out, since I couldn't find a reason to have them.
I could not replicate issue #3406, but if we want to keep the body commented out, then we should comment out the if statement as well.
* goroutines in blockchain reactor
* Added reference to the go routine diagram
* Initial commit
* cleanup
* Undo testing_logger change, committed by mistake
* Fix the test loggers
* pulled some fsm code into pool.go
* added pool tests
* changes to the design
added block requests under peer
moved the request trigger into the reactor poolRoutine, now triggered by a ticker
in general, moved everything required for making block requests smarter into the poolRoutine
added a simple map of heights to keep track of what will need to be requested next
added a few more tests
* send errors to FSM in a different channel than blocks
send errors (RemovePeer) from switch on a different channel than the
one receiving blocks
renamed channels
added more pool tests
* more pool tests
* lint errors
* more tests
* more tests
* switch fast sync to new implementation
* fixed data race in tests
* cleanup
* finished fsm tests
* address golangci comments :)
* address golangci comments :)
* Added timeout on next block needed to advance
* updating docs and cleanup
* fix issue in test from previous cleanup
* cleanup
* Added termination scenarios, tests and more cleanup
* small fixes to adr, comments and cleanup
* Fix bug in sendRequest()
If we tried to send a request to a peer not present in the switch, a missing continue statement caused the request to be blackholed at a peer that had been removed and was never retried (see the sketch after this list).
While this bug was manifesting, the reactor kept asking for other blocks that would be stored and never consumed. Added the number of unconsumed blocks to the math for requesting blocks ahead of the current processing height, so eventually no more blocks will be requested until the already received ones are consumed.
* remove bpPeer's didTimeout field
* Use distinct err codes for peer timeout and FSM timeouts
* Don't allow peers to update with lower height
* review comments from Ethan and Zarko
* some cleanup, renaming, comments
* Move block execution in separate goroutine
* Remove pool's numPending
* review comments
* fix lint, remove old blockchain reactor and duplicates in fsm tests
* small reorg around peer after review comments
* add the reactor spec
* verify block only once
* review comments
* change to int for max number of pending requests
* cleanup and godoc
* Add configuration flag fast sync version
* golangci fixes
* fix config template
* move both reactor versions under blockchain
* cleanup, golint, renaming stuff
* updated documentation, fixed more golint warnings
* integrate with behavior package
* sync with master
* gofmt
* add changelog_pending entry
* move to improvements
* suggestion to changelog entry
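For illustration, a minimal self-contained sketch of the sendRequest() fix noted in the list above (the switch and peer types are simplified stand-ins, not the actual reactor API): if the peer is no longer in the switch, the loop must continue rather than silently dropping the request.

```go
package main

import "fmt"

type peerSet map[string]bool // stands in for the switch's peer set

func sendRequests(peers peerSet, requests map[string]int64) {
	for peerID, height := range requests {
		if !peers[peerID] {
			// The peer was removed from the switch; without this continue the
			// request would be assigned to a dead peer and never retried.
			fmt.Println("peer", peerID, "not in switch, will retry elsewhere")
			continue
		}
		fmt.Println("requesting height", height, "from", peerID)
	}
}

func main() {
	peers := peerSet{"peer-a": true} // peer-b has been removed
	sendRequests(peers, map[string]int64{"peer-a": 10, "peer-b": 11})
}
```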
Fixes #3457
The issue is that writing a BlockRequest to the requestsCh channel also starts a timer that stops the peer 15s later if no block has been received, but popping a BlockRequest from requestsCh and sending it out may be delayed by more than 15s, so the peer ends up being stopped with the error "send nothing to us".
Extracting the requestsCh handling into its own goroutine ensures that every BlockRequest is handled in a timely manner.
Instead of the requestsCh handling, we should probably pull the didProcessCh handling into a separate goroutine, since that is the one "starving" the other channel handlers. I believe that the way it is right now, we still have issues with high delays in errorsCh handling that might cause requests to be sent to invalid/disconnected peers.
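For illustration, a minimal self-contained sketch of this idea (the channel names are stand-ins, not the reactor's actual channels): handling a latency-sensitive channel in the same select loop as the others lets a slow handler starve it, so its handling is pulled into its own goroutine.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	requestsCh := make(chan int)
	errorsCh := make(chan error)
	done := make(chan struct{})

	var wg sync.WaitGroup

	// Dedicated goroutine: requests are sent out as soon as they are popped,
	// so the 15s peer timer cannot expire just because another handler is slow.
	wg.Add(1)
	go func() {
		defer wg.Done()
		for {
			select {
			case req := <-requestsCh:
				fmt.Println("sending BlockRequest for height", req)
			case <-done:
				return
			}
		}
	}()

	// Main poolRoutine-style select keeps handling the remaining channels.
	wg.Add(1)
	go func() {
		defer wg.Done()
		for {
			select {
			case err := <-errorsCh:
				fmt.Println("peer error:", err)
			case <-done:
				return
			}
		}
	}()

	requestsCh <- 42
	time.Sleep(10 * time.Millisecond)
	close(done)
	wg.Wait()
}
```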