* crypto: expose MaxAunts for documentation purposes
* types: update godoc for new maxes
* docs: make hard-coded limits more explicit
* wal: add todo to clarify max size
* shorten lines in test
* cs: panic only when WAL#WriteSync fails
- modify WAL#Write and WAL#WriteSync to return an error
* fix test
* types: validate Part#Proof
add ValidateBasic to crypto/merkle/SimpleProof
* cs: limit max bit array size and block parts count
* cs: test new limits
* cs: only assert important stuff
* update changelog and bump version to 0.32.7
* fixes after Ethan's review
* align max wal msg and max consensus msg sizes
* fix tests
* fix test
* add change log for 31.11
* Fix long line errors in abci, crypto, and libs packages
* Fix long lines in p2p and rpc packages
* Fix long lines in abci, state, and tools packages
* Fix long lines in behaviour and blockchain packages
* Fix long lines in cmd and config packages
* Begin fixing long lines in consensus package
* Finish fixing long lines in consensus package
* Add lll exclusion for lines containing URLs
* Fix long lines in crypto package
* Fix long lines in evidence package
* Fix long lines in mempool and node packages
* Fix long lines in libs package
* Fix long lines in lite package
* Fix new long line in node package
* Fix long lines in p2p package
* Ignore gocritic warning
* Fix long lines in privval package
* Fix long lines in rpc package
* Fix long lines in scripts package
* Fix long lines in state package
* Fix long lines in tools package
* Fix long lines in types package
* Enable lll linter
As per #3043, this adds a ticker to sync the WAL every 2s while the WAL is running.
* Flush WAL every 2s
This adds a ticker that flushes the WAL every 2s while the WAL is
running. This is related to #3043.
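A minimal sketch of the ticker-based flush described above, assuming a hypothetical `baseWAL` type with a `FlushAndSync` method and a 2s flush interval; the names are illustrative, not the exact ones in the codebase:
```go
package wal

import (
	"log"
	"time"
)

// baseWAL is a hypothetical sketch of a WAL that flushes its buffer to disk
// every flushInterval (e.g. 2s) while it is running.
type baseWAL struct {
	flushTicker   *time.Ticker
	flushInterval time.Duration
	quit          chan struct{}
}

// FlushAndSync stands in for flushing buffered data and fsyncing the file.
func (w *baseWAL) FlushAndSync() error {
	// ... flush the underlying autofile group in the real code ...
	return nil
}

// processFlushTicks runs in its own goroutine and flushes on every tick
// until the WAL is stopped.
func (w *baseWAL) processFlushTicks() {
	for {
		select {
		case <-w.flushTicker.C:
			if err := w.FlushAndSync(); err != nil {
				log.Printf("periodic WAL flush failed: %v", err)
			}
		case <-w.quit:
			return
		}
	}
}
```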
* Fix spelling
* Increase timeout to 2mins for slower build environments
* Make WAL sync interval configurable
* Add TODO to replace testChan with more comprehensive testBus
* Remove extraneous debug statement
* Remove testChan in favour of using system time
As per
https://github.com/tendermint/tendermint/pull/3300#discussion_r255886586,
this removes the `testChan` WAL member and replaces the approach with a
system time-oriented one. In this new approach, we keep track of the
system time at which each flush and periodic flush successfully
occurred.
The naming of the various functions is also updated here to be more
consistent with "flushing" as opposed to "sync'ing".
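A rough illustration of the system-time bookkeeping described above; the field and method names are assumptions for the sketch, not the ones used in the PR:
```go
package wal

import (
	"sync"
	"time"
)

// flushClock is a hypothetical sketch of tracking when the WAL was last
// flushed, so tests can assert on timing instead of relying on a testChan.
type flushClock struct {
	mtx           sync.Mutex
	lastFlushedAt time.Time
}

// markFlushed records the time of a successful (periodic or manual) flush.
func (c *flushClock) markFlushed() {
	c.mtx.Lock()
	defer c.mtx.Unlock()
	c.lastFlushedAt = time.Now()
}

// flushedSince reports whether a flush happened after t.
func (c *flushClock) flushedSince(t time.Time) bool {
	c.mtx.Lock()
	defer c.mtx.Unlock()
	return c.lastFlushedAt.After(t)
}
```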
* Update naming convention and ensure lock for timestamp update
* Add Flush method as part of WAL interface
Adds a `Flush` method as part of the WAL interface to enforce the idea
that we can manually trigger a WAL flush from outside of the WAL. This
is employed in the consensus state management to flush the WAL prior to
signing votes/proposals, as per https://github.com/tendermint/tendermint/issues/3043#issuecomment-453853630
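Sketched below is roughly what such an interface extension looks like; the method set is abbreviated and only indicative of the real `WAL` interface:
```go
package consensus

// WAL is a sketch of the consensus write-ahead-log interface with an
// explicit flush hook. Only the parts relevant to this change are shown.
type WAL interface {
	// Write appends a message to the WAL (buffered).
	Write(msg interface{}) error
	// WriteSync appends a message and fsyncs before returning.
	WriteSync(msg interface{}) error
	// Flush forces buffered data to disk, e.g. before signing a vote or
	// proposal, so we never sign something the WAL has not persisted.
	Flush() error
}
```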
* Update CHANGELOG_PENDING
* Remove mutex approach and replace with DI
The dependency injection approach to dealing with testing concerns can
achieve effects similar to some kind of "testing bus"-based approach. This
commit introduces an example of this, where instead of relying on
(potentially fragile) timing of things between the code and the test, we
inject code into the function under test that can signal the test
through a channel.
This allows us to avoid the `time.Sleep()`-based approach previously
employed.
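A compact sketch of that injection idea: the function under test accepts an optional hook (a purely hypothetical callback here) that the test uses to receive a signal instead of sleeping:
```go
package wal

import "errors"

// syncer is a stand-in for the component whose flush we want to observe.
type syncer struct{ closed bool }

func (s *syncer) FlushAndSync() error {
	if s.closed {
		return errors.New("wal is closed")
	}
	return nil
}

// flushWith is a hypothetical injection point: production code passes nil,
// tests pass a callback that signals a channel once the flush has completed,
// avoiding time.Sleep-based synchronization.
func flushWith(s *syncer, onFlushed func()) error {
	if err := s.FlushAndSync(); err != nil {
		return err
	}
	if onFlushed != nil {
		onFlushed()
	}
	return nil
}
```
In a test, one might pass `func() { signal <- struct{}{} }` and block on the channel with a deadline.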
* Update comment on WAL flushing during vote signing
Co-Authored-By: thanethomson <connect@thanethomson.com>
* Simplify flush interval definition
Co-Authored-By: thanethomson <connect@thanethomson.com>
* Expand commentary on WAL disk flushing
Co-Authored-By: thanethomson <connect@thanethomson.com>
* Add broken test to illustrate WAL sync test problem
Removes test-related state (dependency injection code) from the WAL data
structure and adds test code to illustrate the problem with using
`WALGenerateNBlocks` and `wal.SearchForEndHeight` to test periodic
sync'ing.
* Fix test error messages
* Use WAL group buffer size to check for flush
A function is added to `libs/autofile/group.go#Group` in order to return
the size of the buffered data (i.e. data that has not yet been flushed
to disk). The test now checks that, prior to a `time.Sleep`, the group
buffer has data in it. After the `time.Sleep` (during which time the
periodic flush should have been called), the buffer should be empty.
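The shape of that test, sketched under the assumption of a `BufferedSize()` accessor (the real accessor name may differ):
```go
package autofile_test

import (
	"testing"
	"time"
)

// group is a minimal stand-in for autofile.Group with a buffered-size
// accessor; the real type and method names may differ.
type group interface {
	WriteLine(s string) error
	BufferedSize() int
}

// assertPeriodicFlush sketches the check described above: data sits in the
// buffer right after a write, and is gone once the periodic flush has run.
func assertPeriodicFlush(t *testing.T, g group, flushInterval time.Duration) {
	if err := g.WriteLine("some WAL entry"); err != nil {
		t.Fatal(err)
	}
	if g.BufferedSize() == 0 {
		t.Fatal("expected unflushed data in the group buffer")
	}
	time.Sleep(2 * flushInterval) // give the periodic flush time to fire
	if g.BufferedSize() != 0 {
		t.Fatal("expected the periodic flush to have emptied the buffer")
	}
}
```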
* Remove config root dir removal from #3291
* Add godoc for NewWAL mentioning periodic sync
Earlier this week somebody posted this in GoS Riot chat:
```
E[2019-02-12|10:38:37.596] Corrupted entry. Skipping... module=consensus wal=/home/gaia/.gaiad/data/cs.wal/wal err="DataCorruptionError[length 878916964 exceeded maximum possible value of 1048576 bytes]"
E[2019-02-12|10:38:37.596] Corrupted entry. Skipping... module=consensus wal=/home/gaia/.gaiad/data/cs.wal/wal err="DataCorruptionError[length 825701731 exceeded maximum possible value of 1048576 bytes]"
E[2019-02-12|10:38:37.596] Corrupted entry. Skipping... module=consensus wal=/home/gaia/.gaiad/data/cs.wal/wal err="DataCorruptionError[length 1631073634 exceeded maximum possible value of 1048576 bytes]"
E[2019-02-12|10:38:37.596] Corrupted entry. Skipping... module=consensus wal=/home/gaia/.gaiad/data/cs.wal/wal err="DataCorruptionError[length 912418148 exceeded maximum possible value of 1048576 bytes]"
E[2019-02-12|10:38:37.600] Corrupted entry. Skipping... module=consensus wal=/home/gaia/.gaiad/data/cs.wal/wal err="DataCorruptionError[failed to read data: EOF]"
E[2019-02-12|10:38:37.600] Error on catchup replay. Proceeding to start ConsensusState anyway module=consensus err="Cannot replay height 7242. WAL does not contain #ENDHEIGHT for 7241"
E[2019-02-12|10:38:37.861] Error dialing peer module=p2p err="dial tcp 35.183.126.181:26656: i/o timeout
```
Note the length error messages. What has most likely happened is that the length field got corrupted. I've looked at the code and noticed that we don't check the msg size during encoding. This PR fixes that. It also improves a few error messages in WALDecoder.
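A sketch of the encoder-side check, mirroring the limit the decoder already enforces; the constant name and framing here are illustrative:
```go
package wal

import (
	"encoding/binary"
	"fmt"
	"io"
)

// maxMsgSizeBytes mirrors the 1 MB limit from the error messages above;
// the exact constant in the codebase may be named differently.
const maxMsgSizeBytes = 1024 * 1024

// writeFramed refuses to write a message whose length exceeds the limit,
// so a reader can never see a plausible frame with an absurd length field.
func writeFramed(w io.Writer, data []byte) error {
	if len(data) > maxMsgSizeBytes {
		return fmt.Errorf("msg is too big: %d bytes, max: %d bytes",
			len(data), maxMsgSizeBytes)
	}
	var length [4]byte
	binary.BigEndian.PutUint32(length[:], uint32(len(data)))
	if _, err := w.Write(length[:]); err != nil {
		return err
	}
	_, err := w.Write(data)
	return err
}
```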
* node: decrease retry conn timeout in test
Should fix #3256
The retry timeout was set to the default, which is the same as the
accept timeout, so it's no wonder this would fail. Here we decrease the
retry timeout so we can try many times before the accept timeout.
* p2p: increase handshake timeout in test
This fails sometimes, presumably because the handshake timeout is so low
(only 50ms). So increase it to 1s. Should fix #3187
* privval: fix race with ping. closes #3237
Pings happen in a go-routine and can happen concurrently with other
messages. Since we use a request/response protocol, we expect to send a
request and get back the corresponding response. But with pings
happening concurrently, this assumption could be violated. We were using
a mutex, but only an RWMutex, whose RLock was held while sending
messages - this was to allow the underlying connection to be replaced if
it fails. Turns out we actually need to use a full lock (not just a read
lock) to prevent multiple requests from happening concurrently.
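A sketch of serializing the whole request/response round trip behind a full mutex; the type names are hypothetical, not the actual privval client:
```go
package privval

import "sync"

// connection is the minimal request/response surface we rely on.
type connection interface {
	Send(req interface{}) error
	Recv() (interface{}, error)
}

// signerClient sketches the client described above. Holding a full mutex
// for the whole round trip guarantees that a ping cannot interleave with a
// sign request and steal its response.
type signerClient struct {
	mtx  sync.Mutex // full lock, not RWMutex: one request/response at a time
	conn connection
}

func (c *signerClient) call(req interface{}) (interface{}, error) {
	c.mtx.Lock()
	defer c.mtx.Unlock()
	if err := c.conn.Send(req); err != nil {
		return nil, err
	}
	return c.conn.Recv()
}
```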
* node: fix test name. DelayedStop -> DelayedStart
* autofile: Wait() method
In the TestWALTruncate test in consensus/wal_test.go we remove the WAL
directory at the end of the test. However, wal.Stop() does not
properly wait for the autofile group to finish shutting down, so it
was possible for the group's go-routine to still be running when the
cleanup happened, which caused a panic because the directory had
disappeared.
Here we add a Wait() method to properly wait until the go-routine exits
so we can safely clean up. This fixes #2852.
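One minimal way to build such a Wait() method, sketched with a done channel that the processing goroutine closes on exit (field names are illustrative):
```go
package autofile

// Group sketches only the shutdown-related parts relevant here.
type Group struct {
	quit chan struct{} // closed by Stop()
	done chan struct{} // closed by the processing goroutine when it exits
}

func (g *Group) processTicks() {
	defer close(g.done)
	for {
		select {
		// ... rotation / flush ticks elided ...
		case <-g.quit:
			return
		}
	}
}

// Wait blocks until the processing goroutine has exited, so callers can
// safely remove the WAL directory afterwards.
func (g *Group) Wait() {
	<-g.done
}
```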
* fix Group.RotateFile: call Flush() before renaming. #2428
* fix some review issues. #2428
refactor Group's config: replace setter members with initial options
* fix a typo
* fix a timing window between rename and write.
* fix a syntax error.
* change option name Get_ to With_
* fix review issue
* fix review issue
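A sketch of the construction-time options style referred to above (the Get_ → With_ rename); the concrete option names and defaults are hypothetical:
```go
package autofile

// Group is reduced to the fields relevant to this sketch.
type Group struct {
	headSizeLimit  int64
	totalSizeLimit int64
}

// GroupOption configures a Group at construction time instead of via
// setter members; the options below follow the With_ naming convention
// mentioned above but are illustrative only.
type GroupOption func(*Group)

func WithHeadSizeLimit(limit int64) GroupOption {
	return func(g *Group) { g.headSizeLimit = limit }
}

func WithTotalSizeLimit(limit int64) GroupOption {
	return func(g *Group) { g.totalSizeLimit = limit }
}

// OpenGroup applies defaults first, then any caller-supplied options.
func OpenGroup(headPath string, opts ...GroupOption) (*Group, error) {
	g := &Group{headSizeLimit: 10 * 1024 * 1024} // illustrative default
	for _, opt := range opts {
		opt(g)
	}
	// ... open headPath and start the processing goroutine in the real code ...
	return g, nil
}
```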
* remove ConsensusParams.TxSize and ConsensusParams.BlockGossip
Refs #2347
* block part size is now fixed
Refs #2347
* use max data size, not max bytes for tx limit
Refs #2347
comment out failing consensus tests for now
rewrite rpc httpclient to use new pubsub package
import pubsub as tmpubsub, query as tmquery
make event IDs constants
EventKey -> EventTypeKey
rename EventsPubsub to PubSub
mempool does not use pubsub
rename eventsSub to pubsub
new subscribe API
fix channel size issues and consensus tests bugs
refactor rpc client
add missing discardFromChan method
add mutex
rename pubsub to eventBus
remove IsRunning from WSRPCConnection interface (not needed)
add a comment in broadcastNewRoundStepsAndVotes
rename registerEventCallbacks to broadcastNewRoundStepsAndVotes
See https://dave.cheney.net/2014/03/19/channel-axioms
stop eventBuses after reactor tests
remove unnecessary Unsubscribe
return subscribe helper function
move discardFromChan to where it is used
subscribe now returns an err
this gives us the ability to refuse to subscribe if pubsub is at its max
capacity.
use context to control overflow
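A simplified sketch of such a subscribe API, returning an error at capacity and honoring the caller's context; the types and fields are illustrative, not the actual pubsub package API:
```go
package pubsub

import (
	"context"
	"errors"
)

var ErrSubscriptionLimitReached = errors.New("pubsub: max subscriptions reached")

type cmd struct {
	clientID string
	query    string
	out      chan<- interface{}
}

// Server is a pared-down sketch of a pubsub server with a subscriber cap.
// It is not goroutine-safe; the real server serializes work through its
// command loop.
type Server struct {
	cmds chan cmd
	max  int
	subs int
}

// Subscribe refuses new subscribers once the cap is hit and aborts if the
// caller's context is cancelled before the command is accepted.
func (s *Server) Subscribe(ctx context.Context, clientID, query string, out chan<- interface{}) error {
	if s.subs >= s.max {
		return ErrSubscriptionLimitReached
	}
	select {
	case s.cmds <- cmd{clientID, query, out}:
		s.subs++
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}
```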
cache queries
handle err when subscribing in replay_test
rename testClientID to testSubscriber
extract var
set channel buffer capacity to 1 in replay_file
fix byzantine_test
unsubscribe from single event, not all events
refactor httpclient to return events to appropriate channels
return failing testReplayCrashBeforeWriteVote test
fix TestValidatorSetChanges
refactor code a bit
fix testReplayCrashBeforeWriteVote
add comment
fix TestValidatorSetChanges
fixes from Bucky's review
update comment [ci skip]
test TxEventBuffer
update changelog
fix TestValidatorSetChanges (2nd attempt)
only do wg.Done when no errors
benchmark event bus
create pubsub server inside NewEventBus
only expose config params (later if needed)
set buffer capacity to 0 so we are not testing cache
new tx event format: key = "Tx" plus a tag {"tx.hash": XYZ}
This should allow subscribing to all transactions, or to a specific one
using a query: "tm.events.type = Tx and tx.hash = '013ABF99434...'"
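A hedged sketch of the key-plus-tags event shape and the query from the entry above; the types and helper names are illustrative, not the actual event bus API:
```go
package events

import "fmt"

// TxEvent sketches the new event format: a fixed key ("Tx") plus tags that
// a query can match on, e.g. tx.hash.
type TxEvent struct {
	Key  string
	Tags map[string]string
}

func newTxEvent(txHash string) TxEvent {
	return TxEvent{
		Key:  "Tx",
		Tags: map[string]string{"tx.hash": txHash},
	}
}

// queryFor builds the query for a single transaction; subscribing with just
// "tm.events.type = Tx" would match all transactions instead.
func queryFor(txHash string) string {
	return fmt.Sprintf("tm.events.type = Tx and tx.hash = '%s'", txHash)
}
```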
use TimeoutCommit instead of afterPublishEventNewBlockTimeout
TimeoutCommit is the time a node waits after committing a block, before
it goes into the next height. So it will finish everything from the last
block, but then wait a bit. The idea is this gives it time to hear more
votes from other validators, to strengthen the commit it includes in the
next block. But it also gives it time to hear about new transactions.
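A sketch of how a test might wait on TimeoutCommit instead of a hand-tuned afterPublishEventNewBlockTimeout; the config type here is a stand-in, not the real consensus config:
```go
package consensus_test

import (
	"testing"
	"time"
)

// consensusConfig is a stand-in for the real consensus config; the only
// field we care about here is TimeoutCommit.
type consensusConfig struct {
	TimeoutCommit time.Duration
}

// waitForNextBlockEvent waits for the new-block event with a deadline
// derived from TimeoutCommit plus some slack, instead of a fixed timeout.
func waitForNextBlockEvent(t *testing.T, cfg consensusConfig, newBlock <-chan struct{}) {
	select {
	case <-newBlock:
	case <-time.After(cfg.TimeoutCommit + 500*time.Millisecond):
		t.Fatal("timed out waiting for the next block event")
	}
}
```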
waitForBlockWithUpdatedVals
rewrite WAL crash tests
Task:
test that we can recover from any WAL crash.
Solution:
the old tests relied on the event hub running in the same thread (we
were injecting the private validator's last signature).
when considering a rewrite, we looked at two possible approaches: write
a "fuzzy" testing system where the WAL crashes upon receiving a new
message, or inject failures and trigger them in tests using something
like https://github.com/coreos/gofail.
remove sleep
no cs.Lock around wal.Save
test different cases (empty block, non-empty block, ...)
comments
add comments
test 4 cases: empty block, non-empty block, non-empty block with smaller part size, many blocks
fixes as per Bucky's last review
reset subscriptions on UnsubscribeAll
use a simple counter to track the message for which we panicked
also, set a smaller part size for all test cases
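A sketch of the failure-injection idea behind the rewritten crash tests: a WAL wrapper that panics after a configurable number of messages, using the simple counter mentioned above. The type and field names are illustrative stand-ins, not the actual test helper:
```go
package consensus_test

import "fmt"

// wal is the minimal surface the wrapper needs from the real WAL.
type wal interface {
	Write(msg interface{}) error
}

// crashingWAL panics once it has seen msgsBeforeCrash messages, simulating
// a node dying mid-write; the recovery path is then exercised on restart.
type crashingWAL struct {
	next            wal
	msgsBeforeCrash int
	msgsWritten     int // the simple counter mentioned above
}

func (w *crashingWAL) Write(msg interface{}) error {
	if w.msgsWritten >= w.msgsBeforeCrash {
		panic(fmt.Sprintf("simulated WAL crash after %d messages", w.msgsWritten))
	}
	w.msgsWritten++
	return w.next.Write(msg)
}
```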