package consensus

import (
	"fmt"
	"runtime/debug"
	"time"

	cstypes "github.com/tendermint/tendermint/internal/consensus/types"
	tmsync "github.com/tendermint/tendermint/internal/libs/sync"
	"github.com/tendermint/tendermint/internal/p2p"
	sm "github.com/tendermint/tendermint/internal/state"
	"github.com/tendermint/tendermint/libs/bits"
	tmevents "github.com/tendermint/tendermint/libs/events"
	"github.com/tendermint/tendermint/libs/log"
	"github.com/tendermint/tendermint/libs/service"
	tmcons "github.com/tendermint/tendermint/proto/tendermint/consensus"
	tmproto "github.com/tendermint/tendermint/proto/tendermint/types"
	"github.com/tendermint/tendermint/types"
)

var (
	_ service.Service = (*Reactor)(nil)
	_ p2p.Wrapper     = (*tmcons.Message)(nil)

	// ChannelShims contains a map of ChannelDescriptorShim objects, where each
	// object wraps a reference to a legacy p2p ChannelDescriptor and the corresponding
	// p2p proto.Message the new p2p Channel is responsible for handling.
	//
	// TODO: Remove once p2p refactor is complete.
	// ref: https://github.com/tendermint/tendermint/issues/5670
	ChannelShims = map[p2p.ChannelID]*p2p.ChannelDescriptorShim{
		StateChannel: {
			MsgType: new(tmcons.Message),
			Descriptor: &p2p.ChannelDescriptor{
				ID:                  byte(StateChannel),
				Priority:            8,
				SendQueueCapacity:   64,
				RecvMessageCapacity: maxMsgSize,
				RecvBufferCapacity:  128,
				MaxSendBytes:        12000,
			},
		},
		DataChannel: {
			MsgType: new(tmcons.Message),
			Descriptor: &p2p.ChannelDescriptor{
				// TODO: Consider a split between gossiping current block and catchup
				// stuff. Once we gossip the whole block there is nothing left to send
				// until next height or round.
				ID:                  byte(DataChannel),
				Priority:            12,
				SendQueueCapacity:   64,
				RecvBufferCapacity:  512,
				RecvMessageCapacity: maxMsgSize,
				MaxSendBytes:        40000,
			},
		},
		VoteChannel: {
			MsgType: new(tmcons.Message),
			Descriptor: &p2p.ChannelDescriptor{
				ID:                  byte(VoteChannel),
				Priority:            10,
				SendQueueCapacity:   64,
				RecvBufferCapacity:  128,
				RecvMessageCapacity: maxMsgSize,
				MaxSendBytes:        150,
			},
		},
		VoteSetBitsChannel: {
			MsgType: new(tmcons.Message),
			Descriptor: &p2p.ChannelDescriptor{
				ID:                  byte(VoteSetBitsChannel),
				Priority:            5,
				SendQueueCapacity:   8,
				RecvBufferCapacity:  128,
				RecvMessageCapacity: maxMsgSize,
				MaxSendBytes:        50,
			},
		},
	}
)

const (
	StateChannel       = p2p.ChannelID(0x20)
	DataChannel        = p2p.ChannelID(0x21)
	VoteChannel        = p2p.ChannelID(0x22)
	VoteSetBitsChannel = p2p.ChannelID(0x23)

	maxMsgSize = 1048576 // 1MB; NOTE: keep in sync with types.PartSet sizes.

	blocksToContributeToBecomeGoodPeer = 10000
	votesToContributeToBecomeGoodPeer  = 10000

	listenerIDConsensus = "consensus-reactor"
)

type ReactorOption func(*Reactor)
// NOTE: Temporary interface for switching to block sync; we should get rid of v0.
// See: https://github.com/tendermint/tendermint/issues/4595
type BlockSyncReactor interface {
	SwitchToBlockSync(sm.State) error

	GetMaxPeerBlockHeight() int64

	// GetTotalSyncedTime returns the duration since block sync started.
	GetTotalSyncedTime() time.Duration

	// GetRemainingSyncTime returns an estimate of the time remaining until the
	// node is fully synced. It returns 0 if block sync is not running or if fewer
	// than 100 blocks have been synced.
	GetRemainingSyncTime() time.Duration
}
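
// The following is an illustrative sketch, not part of the original file: it shows
// how a caller holding a BlockSyncReactor might report sync progress. The bsReactor
// and logger variables, and how they are obtained, are assumptions for illustration.
//
//	var bsReactor BlockSyncReactor // assumed to be wired up by the node's setup code
//	logger.Info(
//		"block sync progress",
//		"max_peer_height", bsReactor.GetMaxPeerBlockHeight(),
//		"elapsed", bsReactor.GetTotalSyncedTime(),
//		"remaining", bsReactor.GetRemainingSyncTime(),
//	)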
//go:generate ../../scripts/mockery_generate.sh ConsSyncReactor

// ConsSyncReactor defines an interface used for testing the abilities of
// node.startStateSync.
type ConsSyncReactor interface {
	SwitchToConsensus(sm.State, bool)
	SetStateSyncingMetrics(float64)
	SetBlockSyncingMetrics(float64)
}

// Reactor defines a reactor for the consensus service.
type Reactor struct {
	service.BaseService

	state    *State
	eventBus *types.EventBus
	Metrics  *Metrics

	mtx      tmsync.RWMutex
	peers    map[types.NodeID]*PeerState
	waitSync bool

	stateCh       *p2p.Channel
	dataCh        *p2p.Channel
	voteCh        *p2p.Channel
	voteSetBitsCh *p2p.Channel
	peerUpdates   *p2p.PeerUpdates

	// NOTE: We need a dedicated stateCloseCh channel for signaling closure of
	// the StateChannel, because the StateChannel message handler performs a send
	// on the VoteSetBitsChannel. This is an antipattern, so having this dedicated
	// channel, stateCloseCh, is necessary to avoid data races.
	stateCloseCh chan struct{}
	closeCh      chan struct{}
}
// NewReactor returns a reference to a new consensus reactor, which implements
// the service.Service interface. It accepts a logger, consensus state, references
// to relevant p2p Channels and a channel to listen for peer updates on. The
// reactor will close all p2p Channels when stopping.
func NewReactor(
	logger log.Logger,
	cs *State,
	stateCh *p2p.Channel,
	dataCh *p2p.Channel,
	voteCh *p2p.Channel,
	voteSetBitsCh *p2p.Channel,
	peerUpdates *p2p.PeerUpdates,
	waitSync bool,
	options ...ReactorOption,
) *Reactor {
	r := &Reactor{
		state:         cs,
		waitSync:      waitSync,
		peers:         make(map[types.NodeID]*PeerState),
		Metrics:       NopMetrics(),
		stateCh:       stateCh,
		dataCh:        dataCh,
		voteCh:        voteCh,
		voteSetBitsCh: voteSetBitsCh,
		peerUpdates:   peerUpdates,
		stateCloseCh:  make(chan struct{}),
		closeCh:       make(chan struct{}),
	}
	r.BaseService = *service.NewBaseService(logger, "Consensus", r)

	for _, opt := range options {
		opt(r)
	}

	return r
}
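
// A minimal construction sketch, for illustration only (not part of the original
// file). The channel values, consensusState, eventBus, and logger are assumed to
// come from the node's setup code; ReactorMetrics and NopMetrics are defined in
// this package.
//
//	r := NewReactor(
//		logger,
//		consensusState,
//		stateCh, dataCh, voteCh, voteSetBitsCh,
//		peerUpdates,
//		true, // waitSync: wait for state/block sync before starting consensus
//		ReactorMetrics(NopMetrics()),
//	)
//	r.SetEventBus(eventBus)
//	if err := r.Start(); err != nil {
//		logger.Error("failed to start consensus reactor", "err", err)
//	}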
// OnStart starts separate goroutines for each p2p Channel and listens for
// envelopes on each. It also listens for peer updates and handles messages on
// that p2p channel accordingly. The caller must be sure to execute OnStop to
// ensure the outbound p2p Channels are closed.
func (r *Reactor) OnStart() error {
	r.Logger.Debug("consensus wait sync", "wait_sync", r.WaitSync())

	// start routine that computes peer statistics for evaluating peer quality
	//
	// TODO: Evaluate if we need this to be synchronized via WaitGroup as to not
	// leak the goroutine when stopping the reactor.
	go r.peerStatsRoutine()

	r.subscribeToBroadcastEvents()

	if !r.WaitSync() {
		if err := r.state.Start(); err != nil {
			return err
		}
	}

	go r.processStateCh()
	go r.processDataCh()
	go r.processVoteCh()
	go r.processVoteSetBitsCh()
	go r.processPeerUpdates()

	return nil
}
// OnStop stops the reactor by signaling to all spawned goroutines to exit and
// blocking until they all exit, as well as unsubscribing from events and stopping
// state.
func (r *Reactor) OnStop() {
	r.unsubscribeFromBroadcastEvents()

	if err := r.state.Stop(); err != nil {
		r.Logger.Error("failed to stop consensus state", "err", err)
	}

	if !r.WaitSync() {
		r.state.Wait()
	}

	r.mtx.Lock()
	// Close and wait for each of the peers to shutdown.
	// This is safe to perform with the lock since none of the peers require the
	// lock to complete any of the methods that the waitgroup is waiting on.
	for _, state := range r.peers {
		state.closer.Close()
		state.broadcastWG.Wait()
	}
	r.mtx.Unlock()

	// Close the StateChannel goroutine separately since it uses its own channel
	// to signal closure.
	close(r.stateCloseCh)
	<-r.stateCh.Done()

	// Close closeCh to signal to all spawned goroutines to gracefully exit. All
	// p2p Channels should execute Close().
	close(r.closeCh)

	// Wait for all p2p Channels to be closed before returning. This ensures we
	// can easily reason about synchronization of all p2p Channels and ensure no
	// panics will occur.
	<-r.voteSetBitsCh.Done()
	<-r.dataCh.Done()
	<-r.voteCh.Done()
	<-r.peerUpdates.Done()
}
// SetEventBus sets the reactor's event bus.
func (r *Reactor) SetEventBus(b *types.EventBus) {
	r.eventBus = b
	r.state.SetEventBus(b)
}

// WaitSync returns whether the consensus reactor is waiting for state/block sync.
func (r *Reactor) WaitSync() bool {
	r.mtx.RLock()
	defer r.mtx.RUnlock()

	return r.waitSync
}

// ReactorMetrics sets the reactor's metrics as an option function.
func ReactorMetrics(metrics *Metrics) ReactorOption {
	return func(r *Reactor) { r.Metrics = metrics }
}

// SwitchToConsensus switches from block-sync mode to consensus mode. It resets
// the state, turns off block-sync, and starts the consensus state-machine.
func (r *Reactor) SwitchToConsensus(state sm.State, skipWAL bool) {
	r.Logger.Info("switching to consensus")

	// we have no votes, so reconstruct LastCommit from SeenCommit
	if state.LastBlockHeight > 0 {
		r.state.reconstructLastCommit(state)
	}

	// NOTE: The line below causes broadcastNewRoundStepRoutine() to broadcast a
	// NewRoundStepMessage.
	r.state.updateToState(state)

	r.mtx.Lock()
	r.waitSync = false
	r.mtx.Unlock()

	r.Metrics.BlockSyncing.Set(0)
	r.Metrics.StateSyncing.Set(0)

	if skipWAL {
		r.state.doWALCatchup = false
	}

	if err := r.state.Start(); err != nil {
		panic(fmt.Sprintf(`failed to start consensus state: %v

conS:
%+v

conR:
%+v`, err, r.state, r))
	}

	d := types.EventDataBlockSyncStatus{Complete: true, Height: state.LastBlockHeight}
	if err := r.eventBus.PublishEventBlockSyncStatus(d); err != nil {
		r.Logger.Error("failed to emit the blocksync complete event", "err", err)
	}
}
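
// Illustrative sketch (not from the original file): a block-sync reactor that has
// decided the node is caught up hands control over to consensus roughly like this.
// The consensusReactor value, the caughtUp flag, and blocksSynced are assumptions
// for illustration.
//
//	if caughtUp {
//		if conR, ok := consensusReactor.(*Reactor); ok {
//			// skip WAL catchup if we actually synced blocks during block sync
//			conR.SwitchToConsensus(state, blocksSynced > 0)
//		}
//	}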
// String returns a string representation of the Reactor.
//
// NOTE: For now, it is just a hard-coded string to avoid accessing unprotected
// shared variables.
//
// TODO: improve!
func (r *Reactor) String() string {
	return "ConsensusReactor"
}

// StringIndented returns an indented string representation of the Reactor.
func (r *Reactor) StringIndented(indent string) string {
	r.mtx.RLock()
	defer r.mtx.RUnlock()

	s := "ConsensusReactor{\n"
	s += indent + " " + r.state.StringIndented(indent+" ") + "\n"

	for _, ps := range r.peers {
		s += indent + " " + ps.StringIndented(indent+" ") + "\n"
	}

	s += indent + "}"

	return s
}

// GetPeerState returns PeerState for a given NodeID.
func (r *Reactor) GetPeerState(peerID types.NodeID) (*PeerState, bool) {
	r.mtx.RLock()
	defer r.mtx.RUnlock()

	ps, ok := r.peers[peerID]
	return ps, ok
}
func (r *Reactor) broadcastNewRoundStepMessage(rs *cstypes.RoundState) {
	r.stateCh.Out <- p2p.Envelope{
		Broadcast: true,
		Message:   makeRoundStepMessage(rs),
	}
}

func (r *Reactor) broadcastNewValidBlockMessage(rs *cstypes.RoundState) {
	psHeader := rs.ProposalBlockParts.Header()
	r.stateCh.Out <- p2p.Envelope{
		Broadcast: true,
		Message: &tmcons.NewValidBlock{
			Height:             rs.Height,
			Round:              rs.Round,
			BlockPartSetHeader: psHeader.ToProto(),
			BlockParts:         rs.ProposalBlockParts.BitArray().ToProto(),
			IsCommit:           rs.Step == cstypes.RoundStepCommit,
		},
	}
}

func (r *Reactor) broadcastHasVoteMessage(vote *types.Vote) {
	r.stateCh.Out <- p2p.Envelope{
		Broadcast: true,
		Message: &tmcons.HasVote{
			Height: vote.Height,
			Round:  vote.Round,
			Type:   vote.Type,
			Index:  vote.ValidatorIndex,
		},
	}
}

// subscribeToBroadcastEvents subscribes for new round steps and votes using the
// internal pubsub defined in the consensus state to broadcast them to peers
// upon receiving.
func (r *Reactor) subscribeToBroadcastEvents() {
	err := r.state.evsw.AddListenerForEvent(
		listenerIDConsensus,
		types.EventNewRoundStepValue,
		func(data tmevents.EventData) {
			r.broadcastNewRoundStepMessage(data.(*cstypes.RoundState))
			select {
			case r.state.onStopCh <- data.(*cstypes.RoundState):
			default:
			}
		},
	)
	if err != nil {
		r.Logger.Error("failed to add listener for events", "err", err)
	}

	err = r.state.evsw.AddListenerForEvent(
		listenerIDConsensus,
		types.EventValidBlockValue,
		func(data tmevents.EventData) {
			r.broadcastNewValidBlockMessage(data.(*cstypes.RoundState))
		},
	)
	if err != nil {
		r.Logger.Error("failed to add listener for events", "err", err)
	}

	err = r.state.evsw.AddListenerForEvent(
		listenerIDConsensus,
		types.EventVoteValue,
		func(data tmevents.EventData) {
			r.broadcastHasVoteMessage(data.(*types.Vote))
		},
	)
	if err != nil {
		r.Logger.Error("failed to add listener for events", "err", err)
	}
}

func (r *Reactor) unsubscribeFromBroadcastEvents() {
	r.state.evsw.RemoveListener(listenerIDConsensus)
}

func makeRoundStepMessage(rs *cstypes.RoundState) *tmcons.NewRoundStep {
	return &tmcons.NewRoundStep{
		Height:                rs.Height,
		Round:                 rs.Round,
		Step:                  uint32(rs.Step),
		SecondsSinceStartTime: int64(time.Since(rs.StartTime).Seconds()),
		LastCommitRound:       rs.LastCommit.GetRound(),
	}
}

func (r *Reactor) sendNewRoundStepMessage(peerID types.NodeID) {
	rs := r.state.GetRoundState()
	msg := makeRoundStepMessage(rs)
	r.stateCh.Out <- p2p.Envelope{
		To:      peerID,
		Message: msg,
	}
}
func (r *Reactor) gossipDataForCatchup(rs *cstypes.RoundState, prs *cstypes.PeerRoundState, ps *PeerState) {
	logger := r.Logger.With("height", prs.Height).With("peer", ps.peerID)

	if index, ok := prs.ProposalBlockParts.Not().PickRandom(); ok {
		// ensure that the peer's PartSetHeader is correct
		blockMeta := r.state.blockStore.LoadBlockMeta(prs.Height)
		if blockMeta == nil {
			logger.Error(
				"failed to load block meta",
				"our_height", rs.Height,
				"blockstore_base", r.state.blockStore.Base(),
				"blockstore_height", r.state.blockStore.Height(),
			)

			time.Sleep(r.state.config.PeerGossipSleepDuration)
			return
		} else if !blockMeta.BlockID.PartSetHeader.Equals(prs.ProposalBlockPartSetHeader) {
			logger.Info(
				"peer ProposalBlockPartSetHeader mismatch; sleeping",
				"block_part_set_header", blockMeta.BlockID.PartSetHeader,
				"peer_block_part_set_header", prs.ProposalBlockPartSetHeader,
			)

			time.Sleep(r.state.config.PeerGossipSleepDuration)
			return
		}

		part := r.state.blockStore.LoadBlockPart(prs.Height, index)
		if part == nil {
			logger.Error(
				"failed to load block part",
				"index", index,
				"block_part_set_header", blockMeta.BlockID.PartSetHeader,
				"peer_block_part_set_header", prs.ProposalBlockPartSetHeader,
			)

			time.Sleep(r.state.config.PeerGossipSleepDuration)
			return
		}

		partProto, err := part.ToProto()
		if err != nil {
			logger.Error("failed to convert block part to proto", "err", err)

			time.Sleep(r.state.config.PeerGossipSleepDuration)
			return
		}

		logger.Debug("sending block part for catchup", "round", prs.Round, "index", index)
		r.dataCh.Out <- p2p.Envelope{
			To: ps.peerID,
			Message: &tmcons.BlockPart{
				Height: prs.Height, // not our height, so it does not matter.
				Round:  prs.Round,  // not our height, so it does not matter
				Part:   *partProto,
			},
		}

		return
	}

	time.Sleep(r.state.config.PeerGossipSleepDuration)
}
func (r *Reactor) gossipDataRoutine(ps *PeerState) {
	logger := r.Logger.With("peer", ps.peerID)

	defer ps.broadcastWG.Done()

OUTER_LOOP:
	for {
		if !r.IsRunning() {
			return
		}

		select {
		case <-ps.closer.Done():
			// The peer is marked for removal via a PeerUpdate as the doneCh was
			// explicitly closed to signal we should exit.
			return
		default:
		}

		rs := r.state.GetRoundState()
		prs := ps.GetRoundState()

		// Send proposal Block parts?
		if rs.ProposalBlockParts.HasHeader(prs.ProposalBlockPartSetHeader) {
			if index, ok := rs.ProposalBlockParts.BitArray().Sub(prs.ProposalBlockParts.Copy()).PickRandom(); ok {
				part := rs.ProposalBlockParts.GetPart(index)
				partProto, err := part.ToProto()
				if err != nil {
					logger.Error("failed to convert block part to proto", "err", err)
					return
				}

				logger.Debug("sending block part", "height", prs.Height, "round", prs.Round)
				r.dataCh.Out <- p2p.Envelope{
					To: ps.peerID,
					Message: &tmcons.BlockPart{
						Height: rs.Height, // this tells peer that this part applies to us
						Round:  rs.Round,  // this tells peer that this part applies to us
						Part:   *partProto,
					},
				}

				ps.SetHasProposalBlockPart(prs.Height, prs.Round, index)
				continue OUTER_LOOP
			}
		}

		// if the peer is on a previous height that we have, help catch up
		blockStoreBase := r.state.blockStore.Base()
		if blockStoreBase > 0 && 0 < prs.Height && prs.Height < rs.Height && prs.Height >= blockStoreBase {
			heightLogger := logger.With("height", prs.Height)

			// If we never received the commit message from the peer, the block parts
			// will not be initialized.
			if prs.ProposalBlockParts == nil {
				blockMeta := r.state.blockStore.LoadBlockMeta(prs.Height)
				if blockMeta == nil {
					heightLogger.Error(
						"failed to load block meta",
						"blockstoreBase", blockStoreBase,
						"blockstoreHeight", r.state.blockStore.Height(),
					)

					time.Sleep(r.state.config.PeerGossipSleepDuration)
				} else {
					ps.InitProposalBlockParts(blockMeta.BlockID.PartSetHeader)
				}

				// Continue the loop since prs is a copy and not affected by this
				// initialization.
				continue OUTER_LOOP
			}

			r.gossipDataForCatchup(rs, prs, ps)
			continue OUTER_LOOP
		}

		// if height and round don't match, sleep
		if (rs.Height != prs.Height) || (rs.Round != prs.Round) {
			time.Sleep(r.state.config.PeerGossipSleepDuration)
			continue OUTER_LOOP
		}

		// By here, height and round match.
		// Proposal block parts were already matched and sent if any were wanted.
		// (These can match on hash so the round doesn't matter)
		// Now consider sending other things, like the Proposal itself.

		// Send Proposal && ProposalPOL BitArray?
		if rs.Proposal != nil && !prs.Proposal {
			// Proposal: share the proposal metadata with peer.
			{
				propProto := rs.Proposal.ToProto()

				logger.Debug("sending proposal", "height", prs.Height, "round", prs.Round)
				r.dataCh.Out <- p2p.Envelope{
					To: ps.peerID,
					Message: &tmcons.Proposal{
						Proposal: *propProto,
					},
				}

				// NOTE: A peer might have received a different proposal message, so
				// this Proposal msg will be rejected!
				ps.SetHasProposal(rs.Proposal)
			}

			// ProposalPOL: lets peer know which POL votes we have so far. The peer
			// must receive ProposalMessage first. Note, rs.Proposal was validated,
			// so rs.Proposal.POLRound <= rs.Round, so we definitely have
			// rs.Votes.Prevotes(rs.Proposal.POLRound).
			if 0 <= rs.Proposal.POLRound {
				pPol := rs.Votes.Prevotes(rs.Proposal.POLRound).BitArray()
				pPolProto := pPol.ToProto()

				logger.Debug("sending POL", "height", prs.Height, "round", prs.Round)
				r.dataCh.Out <- p2p.Envelope{
					To: ps.peerID,
					Message: &tmcons.ProposalPOL{
						Height:           rs.Height,
						ProposalPolRound: rs.Proposal.POLRound,
						ProposalPol:      *pPolProto,
					},
				}
			}

			continue OUTER_LOOP
		}

		// nothing to do -- sleep
		time.Sleep(r.state.config.PeerGossipSleepDuration)
		continue OUTER_LOOP
	}
}
// pickSendVote picks a vote and sends it to the peer. It will return true if
// there is a vote to send and false otherwise.
func (r *Reactor) pickSendVote(ps *PeerState, votes types.VoteSetReader) bool {
	if vote, ok := ps.PickVoteToSend(votes); ok {
		r.Logger.Debug("sending vote message", "ps", ps, "vote", vote)
		r.voteCh.Out <- p2p.Envelope{
			To: ps.peerID,
			Message: &tmcons.Vote{
				Vote: vote.ToProto(),
			},
		}

		ps.SetHasVote(vote)
		return true
	}

	return false
}

func (r *Reactor) gossipVotesForHeight(rs *cstypes.RoundState, prs *cstypes.PeerRoundState, ps *PeerState) bool {
	logger := r.Logger.With("height", prs.Height).With("peer", ps.peerID)

	// if there are lastCommits to send...
	if prs.Step == cstypes.RoundStepNewHeight {
		if r.pickSendVote(ps, rs.LastCommit) {
			logger.Debug("picked rs.LastCommit to send")
			return true
		}
	}

	// if there are POL prevotes to send...
	if prs.Step <= cstypes.RoundStepPropose && prs.Round != -1 && prs.Round <= rs.Round && prs.ProposalPOLRound != -1 {
		if polPrevotes := rs.Votes.Prevotes(prs.ProposalPOLRound); polPrevotes != nil {
			if r.pickSendVote(ps, polPrevotes) {
				logger.Debug("picked rs.Prevotes(prs.ProposalPOLRound) to send", "round", prs.ProposalPOLRound)
				return true
			}
		}
	}

	// if there are prevotes to send...
	if prs.Step <= cstypes.RoundStepPrevoteWait && prs.Round != -1 && prs.Round <= rs.Round {
		if r.pickSendVote(ps, rs.Votes.Prevotes(prs.Round)) {
			logger.Debug("picked rs.Prevotes(prs.Round) to send", "round", prs.Round)
			return true
		}
	}

	// if there are precommits to send...
	if prs.Step <= cstypes.RoundStepPrecommitWait && prs.Round != -1 && prs.Round <= rs.Round {
		if r.pickSendVote(ps, rs.Votes.Precommits(prs.Round)) {
			logger.Debug("picked rs.Precommits(prs.Round) to send", "round", prs.Round)
			return true
		}
	}

	// if there are prevotes to send... (which are needed because of the validBlock mechanism)
	if prs.Round != -1 && prs.Round <= rs.Round {
		if r.pickSendVote(ps, rs.Votes.Prevotes(prs.Round)) {
			logger.Debug("picked rs.Prevotes(prs.Round) to send", "round", prs.Round)
			return true
		}
	}

	// if there are POLPrevotes to send...
	if prs.ProposalPOLRound != -1 {
		if polPrevotes := rs.Votes.Prevotes(prs.ProposalPOLRound); polPrevotes != nil {
			if r.pickSendVote(ps, polPrevotes) {
				logger.Debug("picked rs.Prevotes(prs.ProposalPOLRound) to send", "round", prs.ProposalPOLRound)
				return true
			}
		}
	}

	return false
}
func (r *Reactor) gossipVotesRoutine(ps *PeerState) {
	logger := r.Logger.With("peer", ps.peerID)

	defer ps.broadcastWG.Done()

	// XXX: simple hack to throttle logs upon sleep
	logThrottle := 0

OUTER_LOOP:
	for {
		if !r.IsRunning() {
			return
		}

		select {
		case <-ps.closer.Done():
			// The peer is marked for removal via a PeerUpdate as the doneCh was
			// explicitly closed to signal we should exit.
			return

		default:
		}

		rs := r.state.GetRoundState()
		prs := ps.GetRoundState()

		switch logThrottle {
		case 1: // first sleep
			logThrottle = 2

		case 2: // no more sleep
			logThrottle = 0
		}

		// if height matches, then send LastCommit, Prevotes, and Precommits
		if rs.Height == prs.Height {
			if r.gossipVotesForHeight(rs, prs, ps) {
				continue OUTER_LOOP
			}
		}

		// special catchup logic -- if peer is lagging by height 1, send LastCommit
		if prs.Height != 0 && rs.Height == prs.Height+1 {
			if r.pickSendVote(ps, rs.LastCommit) {
				logger.Debug("picked rs.LastCommit to send", "height", prs.Height)
				continue OUTER_LOOP
			}
		}

		// catchup logic -- if peer is lagging by more than 1, send Commit
		blockStoreBase := r.state.blockStore.Base()
		if blockStoreBase > 0 && prs.Height != 0 && rs.Height >= prs.Height+2 && prs.Height >= blockStoreBase {
			// Load the block commit for prs.Height, which contains precommit
			// signatures for prs.Height.
			if commit := r.state.blockStore.LoadBlockCommit(prs.Height); commit != nil {
				if r.pickSendVote(ps, commit) {
					logger.Debug("picked Catchup commit to send", "height", prs.Height)
					continue OUTER_LOOP
				}
			}
		}

		if logThrottle == 0 {
			// we sent nothing -- sleep
			logThrottle = 1

			logger.Debug(
				"no votes to send; sleeping",
				"rs.Height", rs.Height,
				"prs.Height", prs.Height,
				"localPV", rs.Votes.Prevotes(rs.Round).BitArray(), "peerPV", prs.Prevotes,
				"localPC", rs.Votes.Precommits(rs.Round).BitArray(), "peerPC", prs.Precommits,
			)
		} else if logThrottle == 2 {
			logThrottle = 1
		}

		time.Sleep(r.state.config.PeerGossipSleepDuration)
		continue OUTER_LOOP
	}
}

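// queryMaj23Routine periodically sends VoteSetMaj23 messages to the peer for
// any height/round where we have seen a +2/3 majority, prompting the peer to
// respond with the votes it has for that block ID.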
// NOTE: `queryMaj23Routine` has a simple crude design since it only comes
// into play for liveness when there's a signature DDoS attack happening.
func (r *Reactor) queryMaj23Routine(ps *PeerState) {
	defer ps.broadcastWG.Done()

OUTER_LOOP:
	for {
		if !r.IsRunning() {
			return
		}

		select {
		case <-ps.closer.Done():
			// The peer is marked for removal via a PeerUpdate as the doneCh was
			// explicitly closed to signal we should exit.
			return

		default:
		}

		// maybe send Height/Round/Prevotes
		{
			rs := r.state.GetRoundState()
			prs := ps.GetRoundState()

			if rs.Height == prs.Height {
				if maj23, ok := rs.Votes.Prevotes(prs.Round).TwoThirdsMajority(); ok {
					r.stateCh.Out <- p2p.Envelope{
						To: ps.peerID,
						Message: &tmcons.VoteSetMaj23{
							Height:  prs.Height,
							Round:   prs.Round,
							Type:    tmproto.PrevoteType,
							BlockID: maj23.ToProto(),
						},
					}

					time.Sleep(r.state.config.PeerQueryMaj23SleepDuration)
				}
			}
		}

		// maybe send Height/Round/Precommits
		{
			rs := r.state.GetRoundState()
			prs := ps.GetRoundState()

			if rs.Height == prs.Height {
				if maj23, ok := rs.Votes.Precommits(prs.Round).TwoThirdsMajority(); ok {
					r.stateCh.Out <- p2p.Envelope{
						To: ps.peerID,
						Message: &tmcons.VoteSetMaj23{
							Height:  prs.Height,
							Round:   prs.Round,
							Type:    tmproto.PrecommitType,
							BlockID: maj23.ToProto(),
						},
					}

					time.Sleep(r.state.config.PeerQueryMaj23SleepDuration)
				}
			}
		}

		// maybe send Height/Round/ProposalPOL
		{
			rs := r.state.GetRoundState()
			prs := ps.GetRoundState()

			if rs.Height == prs.Height && prs.ProposalPOLRound >= 0 {
				if maj23, ok := rs.Votes.Prevotes(prs.ProposalPOLRound).TwoThirdsMajority(); ok {
					r.stateCh.Out <- p2p.Envelope{
						To: ps.peerID,
						Message: &tmcons.VoteSetMaj23{
							Height:  prs.Height,
							Round:   prs.ProposalPOLRound,
							Type:    tmproto.PrevoteType,
							BlockID: maj23.ToProto(),
						},
					}

					time.Sleep(r.state.config.PeerQueryMaj23SleepDuration)
				}
			}
		}

		// Little point sending LastCommitRound/LastCommit, these are fleeting and
		// non-blocking.

		// maybe send Height/CatchupCommitRound/CatchupCommit
		{
			prs := ps.GetRoundState()
			if prs.CatchupCommitRound != -1 && prs.Height > 0 && prs.Height <= r.state.blockStore.Height() &&
				prs.Height >= r.state.blockStore.Base() {
				if commit := r.state.LoadCommit(prs.Height); commit != nil {
					r.stateCh.Out <- p2p.Envelope{
						To: ps.peerID,
						Message: &tmcons.VoteSetMaj23{
							Height:  prs.Height,
							Round:   commit.Round,
							Type:    tmproto.PrecommitType,
							BlockID: commit.BlockID.ToProto(),
						},
					}

					time.Sleep(r.state.config.PeerQueryMaj23SleepDuration)
				}
			}
		}

		time.Sleep(r.state.config.PeerQueryMaj23SleepDuration)
		continue OUTER_LOOP
	}
}

// processPeerUpdate processes a peer update message. For new or reconnected
// peers, we create a peer state if one does not exist for the peer, which
// should always be the case, and we spawn all the relevant goroutines to
// broadcast messages to the peer. During peer removal, we remove the peer
// from our set of peers and signal to all spawned goroutines to gracefully
// exit in a non-blocking manner.
func (r *Reactor) processPeerUpdate(peerUpdate p2p.PeerUpdate) {
	r.Logger.Debug("received peer update", "peer", peerUpdate.NodeID, "status", peerUpdate.Status)

	r.mtx.Lock()
	defer r.mtx.Unlock()

	switch peerUpdate.Status {
	case p2p.PeerStatusUp:
		// Do not allow starting new broadcasting goroutines after reactor shutdown
		// has been initiated. This can happen after we've manually closed all
		// peer goroutines and closed r.closeCh, but the router still sends in-flight
		// peer updates.
		if !r.IsRunning() {
			return
		}

		var (
			ps *PeerState
			ok bool
		)

		ps, ok = r.peers[peerUpdate.NodeID]
		if !ok {
			ps = NewPeerState(r.Logger, peerUpdate.NodeID)
			r.peers[peerUpdate.NodeID] = ps
		}

		if !ps.IsRunning() {
			// Set the peer state's closer to signal to all spawned goroutines to exit
			// when the peer is removed. We also set the running state to ensure we
			// do not spawn multiple instances of the same goroutines, and finally we
			// set the waitgroup counter so we know when all goroutines have exited.
			ps.broadcastWG.Add(3)
			ps.SetRunning(true)

			// start goroutines for this peer
			go r.gossipDataRoutine(ps)
			go r.gossipVotesRoutine(ps)
			go r.queryMaj23Routine(ps)

			// Send our state to the peer. If we're block-syncing, broadcast a
			// RoundStepMessage later upon SwitchToConsensus().
			if !r.waitSync {
				go r.sendNewRoundStepMessage(ps.peerID)
			}
		}

	case p2p.PeerStatusDown:
		ps, ok := r.peers[peerUpdate.NodeID]
		if ok && ps.IsRunning() {
			// signal to all spawned goroutines for the peer to gracefully exit
			ps.closer.Close()

			go func() {
				// Wait for all spawned broadcast goroutines to exit before marking the
				// peer state as no longer running and removing it from the peers map.
				ps.broadcastWG.Wait()

				r.mtx.Lock()
				delete(r.peers, peerUpdate.NodeID)
				r.mtx.Unlock()

				ps.SetRunning(false)
			}()
		}
	}
}

// handleStateMessage handles envelopes sent from peers on the StateChannel.
// An error is returned if the message is unrecognized or if validation fails.
// If we fail to find the peer state for the envelope sender, we perform a no-op
// and return. This can happen when we process the envelope after the peer is
// removed.
func (r *Reactor) handleStateMessage(envelope p2p.Envelope, msgI Message) error {
	ps, ok := r.GetPeerState(envelope.From)
	if !ok || ps == nil {
		r.Logger.Debug("failed to find peer state", "peer", envelope.From, "ch_id", "StateChannel")
		return nil
	}

	switch msg := envelope.Message.(type) {
	case *tmcons.NewRoundStep:
		r.state.mtx.RLock()
		initialHeight := r.state.state.InitialHeight
		r.state.mtx.RUnlock()

		if err := msgI.(*NewRoundStepMessage).ValidateHeight(initialHeight); err != nil {
			r.Logger.Error("peer sent us an invalid msg", "msg", msg, "err", err)
			return err
		}

		ps.ApplyNewRoundStepMessage(msgI.(*NewRoundStepMessage))

	case *tmcons.NewValidBlock:
		ps.ApplyNewValidBlockMessage(msgI.(*NewValidBlockMessage))

	case *tmcons.HasVote:
		ps.ApplyHasVoteMessage(msgI.(*HasVoteMessage))

	case *tmcons.VoteSetMaj23:
		r.state.mtx.RLock()
		height, votes := r.state.Height, r.state.Votes
		r.state.mtx.RUnlock()

		if height != msg.Height {
			return nil
		}

		vsmMsg := msgI.(*VoteSetMaj23Message)

		// peer claims to have a maj23 for some BlockID at <H,R,S>
		err := votes.SetPeerMaj23(msg.Round, msg.Type, ps.peerID, vsmMsg.BlockID)
		if err != nil {
			return err
		}

		// Respond with a VoteSetBitsMessage showing which votes we have and
		// consequently which we don't have.
		var ourVotes *bits.BitArray
		switch vsmMsg.Type {
		case tmproto.PrevoteType:
			ourVotes = votes.Prevotes(msg.Round).BitArrayByBlockID(vsmMsg.BlockID)

		case tmproto.PrecommitType:
			ourVotes = votes.Precommits(msg.Round).BitArrayByBlockID(vsmMsg.BlockID)

		default:
			panic("bad VoteSetBitsMessage field type; forgot to add a check in ValidateBasic?")
		}

		eMsg := &tmcons.VoteSetBits{
			Height:  msg.Height,
			Round:   msg.Round,
			Type:    msg.Type,
			BlockID: msg.BlockID,
		}

		if votesProto := ourVotes.ToProto(); votesProto != nil {
			eMsg.Votes = *votesProto
		}

		r.voteSetBitsCh.Out <- p2p.Envelope{
			To:      envelope.From,
			Message: eMsg,
		}

	default:
		return fmt.Errorf("received unknown message on StateChannel: %T", msg)
	}

	return nil
}

// handleDataMessage handles envelopes sent from peers on the DataChannel. If we
// fail to find the peer state for the envelope sender, we perform a no-op and
// return. This can happen when we process the envelope after the peer is
// removed.
func (r *Reactor) handleDataMessage(envelope p2p.Envelope, msgI Message) error {
	logger := r.Logger.With("peer", envelope.From, "ch_id", "DataChannel")

	ps, ok := r.GetPeerState(envelope.From)
	if !ok || ps == nil {
		r.Logger.Debug("failed to find peer state")
		return nil
	}

	if r.WaitSync() {
		logger.Info("ignoring message received during sync", "msg", fmt.Sprintf("%T", msgI))
		return nil
	}

	switch msg := envelope.Message.(type) {
	case *tmcons.Proposal:
		pMsg := msgI.(*ProposalMessage)

		ps.SetHasProposal(pMsg.Proposal)
		r.state.peerMsgQueue <- msgInfo{pMsg, envelope.From}

	case *tmcons.ProposalPOL:
		ps.ApplyProposalPOLMessage(msgI.(*ProposalPOLMessage))

	case *tmcons.BlockPart:
		bpMsg := msgI.(*BlockPartMessage)

		ps.SetHasProposalBlockPart(bpMsg.Height, bpMsg.Round, int(bpMsg.Part.Index))
		r.Metrics.BlockParts.With("peer_id", string(envelope.From)).Add(1)
		r.state.peerMsgQueue <- msgInfo{bpMsg, envelope.From}

	default:
		return fmt.Errorf("received unknown message on DataChannel: %T", msg)
	}

	return nil
}

// handleVoteMessage handles envelopes sent from peers on the VoteChannel. If we
// fail to find the peer state for the envelope sender, we perform a no-op and
// return. This can happen when we process the envelope after the peer is
// removed.
func (r *Reactor) handleVoteMessage(envelope p2p.Envelope, msgI Message) error {
	logger := r.Logger.With("peer", envelope.From, "ch_id", "VoteChannel")

	ps, ok := r.GetPeerState(envelope.From)
	if !ok || ps == nil {
		r.Logger.Debug("failed to find peer state")
		return nil
	}

	if r.WaitSync() {
		logger.Info("ignoring message received during sync", "msg", msgI)
		return nil
	}

	switch msg := envelope.Message.(type) {
	case *tmcons.Vote:
		r.state.mtx.RLock()
		height, valSize, lastCommitSize := r.state.Height, r.state.Validators.Size(), r.state.LastCommit.Size()
		r.state.mtx.RUnlock()

		vMsg := msgI.(*VoteMessage)

		ps.EnsureVoteBitArrays(height, valSize)
		ps.EnsureVoteBitArrays(height-1, lastCommitSize)
		ps.SetHasVote(vMsg.Vote)

		r.state.peerMsgQueue <- msgInfo{vMsg, envelope.From}

	default:
		return fmt.Errorf("received unknown message on VoteChannel: %T", msg)
	}

	return nil
}

// handleVoteSetBitsMessage handles envelopes sent from peers on the
// VoteSetBitsChannel. If we fail to find the peer state for the envelope sender,
// we perform a no-op and return. This can happen when we process the envelope
// after the peer is removed.
func (r *Reactor) handleVoteSetBitsMessage(envelope p2p.Envelope, msgI Message) error {
	logger := r.Logger.With("peer", envelope.From, "ch_id", "VoteSetBitsChannel")

	ps, ok := r.GetPeerState(envelope.From)
	if !ok || ps == nil {
		r.Logger.Debug("failed to find peer state")
		return nil
	}

	if r.WaitSync() {
		logger.Info("ignoring message received during sync", "msg", msgI)
		return nil
	}

	switch msg := envelope.Message.(type) {
	case *tmcons.VoteSetBits:
		r.state.mtx.RLock()
		height, votes := r.state.Height, r.state.Votes
		r.state.mtx.RUnlock()

		vsbMsg := msgI.(*VoteSetBitsMessage)

		if height == msg.Height {
			var ourVotes *bits.BitArray

			switch msg.Type {
			case tmproto.PrevoteType:
				ourVotes = votes.Prevotes(msg.Round).BitArrayByBlockID(vsbMsg.BlockID)

			case tmproto.PrecommitType:
				ourVotes = votes.Precommits(msg.Round).BitArrayByBlockID(vsbMsg.BlockID)

			default:
				panic("bad VoteSetBitsMessage field type; forgot to add a check in ValidateBasic?")
			}

			ps.ApplyVoteSetBitsMessage(vsbMsg, ourVotes)
		} else {
			ps.ApplyVoteSetBitsMessage(vsbMsg, nil)
		}

	default:
		return fmt.Errorf("received unknown message on VoteSetBitsChannel: %T", msg)
	}

	return nil
}

// handleMessage handles an Envelope sent from a peer on a specific p2p Channel.
// It will handle errors and any possible panics gracefully. A caller can handle
// any error returned by sending a PeerError on the respective channel.
//
// NOTE: We process these messages even when we're block syncing. Messages affect
// either a peer state or the consensus state. Peer state updates can happen in
// parallel, but processing of proposals, block parts, and votes is ordered by
// the p2p channel.
//
// NOTE: We block on consensus state for proposals, block parts, and votes.
func (r *Reactor) handleMessage(chID p2p.ChannelID, envelope p2p.Envelope) (err error) {
	defer func() {
		if e := recover(); e != nil {
			err = fmt.Errorf("panic in processing message: %v", e)
			r.Logger.Error(
				"recovering from processing message panic",
				"err", err,
				"stack", string(debug.Stack()),
			)
		}
	}()

	// We wrap the envelope's message in a Proto wire type so we can convert it
	// back to the domain type that the individual channel message handlers work
	// with. We do this here once to avoid having to do it for each individual
	// message type, and because a large part of the core business logic depends
	// on these domain types rather than the raw Proto types.
	protoMsg := new(tmcons.Message)
	if err := protoMsg.Wrap(envelope.Message); err != nil {
		return err
	}

	msgI, err := MsgFromProto(protoMsg)
	if err != nil {
		return err
	}

	r.Logger.Debug("received message", "ch_id", chID, "message", msgI, "peer", envelope.From)

	switch chID {
	case StateChannel:
		err = r.handleStateMessage(envelope, msgI)

	case DataChannel:
		err = r.handleDataMessage(envelope, msgI)

	case VoteChannel:
		err = r.handleVoteMessage(envelope, msgI)

	case VoteSetBitsChannel:
		err = r.handleVoteSetBitsMessage(envelope, msgI)

	default:
		err = fmt.Errorf("unknown channel ID (%d) for envelope (%v)", chID, envelope)
	}

	return err
}

// processStateCh initiates a blocking process where we listen for and handle
// envelopes on the StateChannel. Any error encountered during message
// execution will result in a PeerError being sent on the StateChannel. When
// the reactor is stopped, we will catch the signal and close the p2p Channel
// gracefully.
func (r *Reactor) processStateCh() {
	defer r.stateCh.Close()

	for {
		select {
		case envelope := <-r.stateCh.In:
			if err := r.handleMessage(r.stateCh.ID, envelope); err != nil {
				r.Logger.Error("failed to process message", "ch_id", r.stateCh.ID, "envelope", envelope, "err", err)
				r.stateCh.Error <- p2p.PeerError{
					NodeID: envelope.From,
					Err:    err,
				}
			}

		case <-r.stateCloseCh:
			r.Logger.Debug("stopped listening on StateChannel; closing...")
			return
		}
	}
}

// processDataCh initiates a blocking process where we listen for and handle
// envelopes on the DataChannel. Any error encountered during message
// execution will result in a PeerError being sent on the DataChannel. When
// the reactor is stopped, we will catch the signal and close the p2p Channel
// gracefully.
func (r *Reactor) processDataCh() {
	defer r.dataCh.Close()

	for {
		select {
		case envelope := <-r.dataCh.In:
			if err := r.handleMessage(r.dataCh.ID, envelope); err != nil {
				r.Logger.Error("failed to process message", "ch_id", r.dataCh.ID, "envelope", envelope, "err", err)
				r.dataCh.Error <- p2p.PeerError{
					NodeID: envelope.From,
					Err:    err,
				}
			}

		case <-r.closeCh:
			r.Logger.Debug("stopped listening on DataChannel; closing...")
			return
		}
	}
}

// processVoteCh initiates a blocking process where we listen for and handle
// envelopes on the VoteChannel. Any error encountered during message
// execution will result in a PeerError being sent on the VoteChannel. When
// the reactor is stopped, we will catch the signal and close the p2p Channel
// gracefully.
func (r *Reactor) processVoteCh() {
	defer r.voteCh.Close()

	for {
		select {
		case envelope := <-r.voteCh.In:
			if err := r.handleMessage(r.voteCh.ID, envelope); err != nil {
				r.Logger.Error("failed to process message", "ch_id", r.voteCh.ID, "envelope", envelope, "err", err)
				r.voteCh.Error <- p2p.PeerError{
					NodeID: envelope.From,
					Err:    err,
				}
			}

		case <-r.closeCh:
			r.Logger.Debug("stopped listening on VoteChannel; closing...")
			return
		}
	}
}

// processVoteSetBitsCh initiates a blocking process where we listen for and
// handle envelopes on the VoteSetBitsChannel. Any error encountered during
// message execution will result in a PeerError being sent on the
// VoteSetBitsChannel. When the reactor is stopped, we will catch the signal
// and close the p2p Channel gracefully.
func (r *Reactor) processVoteSetBitsCh() {
	defer r.voteSetBitsCh.Close()

	for {
		select {
		case envelope := <-r.voteSetBitsCh.In:
			if err := r.handleMessage(r.voteSetBitsCh.ID, envelope); err != nil {
				r.Logger.Error("failed to process message", "ch_id", r.voteSetBitsCh.ID, "envelope", envelope, "err", err)
				r.voteSetBitsCh.Error <- p2p.PeerError{
					NodeID: envelope.From,
					Err:    err,
				}
			}

		case <-r.closeCh:
			r.Logger.Debug("stopped listening on VoteSetBitsChannel; closing...")
			return
		}
	}
}

// processPeerUpdates initiates a blocking process where we listen for and handle
// PeerUpdate messages. When the reactor is stopped, we will catch the signal and
// close the p2p PeerUpdatesCh gracefully.
func (r *Reactor) processPeerUpdates() {
	defer r.peerUpdates.Close()

	for {
		select {
		case peerUpdate := <-r.peerUpdates.Updates():
			r.processPeerUpdate(peerUpdate)

		case <-r.closeCh:
			r.Logger.Debug("stopped listening on peer updates channel; closing...")
			return
		}
	}
}

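// peerStatsRoutine records vote and block-part contributions per peer as
// messages are processed by the consensus state, and emits a PeerStatusGood
// update once a peer has contributed enough messages.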
func (r *Reactor) peerStatsRoutine() {
	for {
		if !r.IsRunning() {
			r.Logger.Info("stopping peerStatsRoutine")
			return
		}

		select {
		case msg := <-r.state.statsMsgQueue:
			ps, ok := r.GetPeerState(msg.PeerID)
			if !ok || ps == nil {
				r.Logger.Debug("attempt to update stats for non-existent peer", "peer", msg.PeerID)
				continue
			}

			switch msg.Msg.(type) {
			case *VoteMessage:
				if numVotes := ps.RecordVote(); numVotes%votesToContributeToBecomeGoodPeer == 0 {
					r.peerUpdates.SendUpdate(p2p.PeerUpdate{
						NodeID: msg.PeerID,
						Status: p2p.PeerStatusGood,
					})
				}

			case *BlockPartMessage:
				if numParts := ps.RecordBlockPart(); numParts%blocksToContributeToBecomeGoodPeer == 0 {
					r.peerUpdates.SendUpdate(p2p.PeerUpdate{
						NodeID: msg.PeerID,
						Status: p2p.PeerStatusGood,
					})
				}
			}

		case <-r.closeCh:
			return
		}
	}
}

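// GetConsensusState returns the consensus state managed by the reactor.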
func (r *Reactor) GetConsensusState() *State {
	return r.state
}

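// SetStateSyncingMetrics sets the StateSyncing metric to the given value.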
func (r *Reactor) SetStateSyncingMetrics(v float64) {
	r.Metrics.StateSyncing.Set(v)
}

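// SetBlockSyncingMetrics sets the BlockSyncing metric to the given value.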
func (r *Reactor) SetBlockSyncingMetrics(v float64) {
	r.Metrics.BlockSyncing.Set(v)
}