tendermint

Commit Graph

Author	SHA1	Message	Date
M. J. Fromberger	a7eb95065d	autofile: ensure files are not reopened after closing (#7628) During file rotation and WAL shutdown, there was a race condition between users of an autofile and its termination. To fix this, ensure operations on an autofile are properly synchronized, and report errors when attempting to use an autofile after it was closed. Notably: - Simplify the cancellation protocol between signal and Close. - Exclude writers to an autofile during rotation. - Add documentation about what is going on. There is a lot more that could be improved here, but this addresses the more obvious races that have been panicking unit tests.	3 years ago
Sam Kleinman	82b65868ce	node+autofile: avoid leaks detected during WAL shutdown (#7599)	3 years ago
Sam Kleinman	3c8955e4b8	errors: formating cleanup (#7507)	3 years ago
Sam Kleinman	a62ac27047	service: remove exported logger from base implemenation (#7381)	3 years ago
Sam Kleinman	a823d167bc	service: cleanup base implementation and some caller implementations (#7301)	3 years ago
Sam Kleinman	6ab62fe7b6	service: remove stop method and use contexts (#7292)	3 years ago
rene	736364178a	fix typo in log message (#6653) Co-authored-by: Callum Waters <cmwaters19@gmail.com>	3 years ago
Marko	719e028e00	libs: internalize some packages (#6366) ## Description Internalize some libs. This reduces the amount ot public API tendermint is supporting. The moved libraries are mainly ones that are used within Tendermint-core.	3 years ago
Marko	0ed8dba991	lint: enable errcheck (#5336) ## Description Enable errcheck linter throughout the codebase Closes: #5059	4 years ago
Erik Grinaker	8f48c49543	Fix some golangci-lint warnings (#4448)	5 years ago
Erik Grinaker	b712c1cbb5	autofile: resolve relative paths (#4390) Fixes #2649	5 years ago
Marko	27b00cf8d1	libs/common: refactor libs common 3 (#4232) * libs/common: refactor libs common 3 - move nil.go into types folder and make private - move service & baseservice out of common into service pkg ref #4147 Signed-off-by: Marko Baricevic <marbar3778@yahoo.com> * add changelog entry	5 years ago
Marko	f9cce282da	gocritic (2/2) (#3864) Refs #3262	5 years ago
zjubfd	2233dd45bd	libs: remove useless code in group (#3504) * lib: remove useless code in group * update change log * Update CHANGELOG_PENDING.md Co-Authored-By: guagualvcha <baifudong@lancai.cn>	6 years ago
Anton Kaliaev	ec9bff5234	rename WAL#Flush to WAL#FlushAndSync (#3345) * rename WAL#Flush to WAL#FlushAndSync - rename auto#Flush to auto#FlushAndSync - cleanup WAL interface to not leak implementation details! * remove Group() * add WALReader interface and return it in SearchForEndHeight() - add interface assertions Refs #3337 * replace WALReader with io.ReadCloser	6 years ago
Thane Thomson	dff3deb2a9	cs: sync WAL more frequently (#3300) As per #3043, this adds a ticker to sync the WAL every 2s while the WAL is running. * Flush WAL every 2s This adds a ticker that flushes the WAL every 2s while the WAL is running. This is related to #3043. * Fix spelling * Increase timeout to 2mins for slower build environments * Make WAL sync interval configurable * Add TODO to replace testChan with more comprehensive testBus * Remove extraneous debug statement * Remove testChan in favour of using system time As per https://github.com/tendermint/tendermint/pull/3300#discussion_r255886586, this removes the `testChan` WAL member and replaces the approach with a system time-oriented one. In this new approach, we keep track of the system time at which each flush and periodic flush successfully occurred. The naming of the various functions is also updated here to be more consistent with "flushing" as opposed to "sync'ing". * Update naming convention and ensure lock for timestamp update * Add Flush method as part of WAL interface Adds a `Flush` method as part of the WAL interface to enforce the idea that we can manually trigger a WAL flush from outside of the WAL. This is employed in the consensus state management to flush the WAL prior to signing votes/proposals, as per https://github.com/tendermint/tendermint/issues/3043#issuecomment-453853630 * Update CHANGELOG_PENDING * Remove mutex approach and replace with DI The dependency injection approach to dealing with testing concerns could allow similar effects to some kind of "testing bus"-based approach. This commit introduces an example of this, where instead of relying on (potentially fragile) timing of things between the code and the test, we inject code into the function under test that can signal the test through a channel. This allows us to avoid the `time.Sleep()`-based approach previously employed. * Update comment on WAL flushing during vote signing Co-Authored-By: thanethomson <connect@thanethomson.com> * Simplify flush interval definition Co-Authored-By: thanethomson <connect@thanethomson.com> * Expand commentary on WAL disk flushing Co-Authored-By: thanethomson <connect@thanethomson.com> * Add broken test to illustrate WAL sync test problem Removes test-related state (dependency injection code) from the WAL data structure and adds test code to illustrate the problem with using `WALGenerateNBlocks` and `wal.SearchForEndHeight` to test periodic sync'ing. * Fix test error messages * Use WAL group buffer size to check for flush A function is added to `libs/autofile/group.go#Group` in order to return the size of the buffered data (i.e. data that has not yet been flushed to disk). The test now checks that, prior to a `time.Sleep`, the group buffer has data in it. After the `time.Sleep` (during which time the periodic flush should have been called), the buffer should be empty. * Remove config root dir removal from #3291 * Add godoc for NewWAL mentioning periodic sync	6 years ago
Ethan Buchman	45b70ae031	fix non deterministic test failures and race in privval socket (#3258) * node: decrease retry conn timeout in test Should fix #3256 The retry timeout was set to the default, which is the same as the accept timeout, so it's no wonder this would fail. Here we decrease the retry timeout so we can try many times before the accept timeout. * p2p: increase handshake timeout in test This fails sometimes, presumably because the handshake timeout is so low (only 50ms). So increase it to 1s. Should fix #3187 * privval: fix race with ping. closes #3237 Pings happen in a go-routine and can happen concurrently with other messages. Since we use a request/response protocol, we expect to send a request and get back the corresponding response. But with pings happening concurrently, this assumption could be violated. We were using a mutex, but only a RWMutex, where the RLock was being held for sending messages - this was to allow the underlying connection to be replaced if it fails. Turns out we actually need to use a full lock (not just a read lock) to prevent multiple requests from happening concurrently. * node: fix test name. DelayedStop -> DelayedStart * autofile: Wait() method In the TestWALTruncate in consensus/wal_test.go we remove the WAL directory at the end of the test. However the wal.Stop() does not properly wait for the autofile group to finish shutting down. Hence it was possible that the group's go-routine is still running when the cleanup happens, which causes a panic since the directory disappeared. Here we add a Wait() method to properly wait until the go-routine exits so we can safely clean up. This fixes #2852.	6 years ago
Anton Kaliaev	13badc1d29	[autofile/group] do not panic when checking size It's OK if the head will grow a little bit bigger, but we'll avoid panic. Refs #2703	6 years ago
Anton Kaliaev	d178ea9eaf	use our logger in autofile/group	6 years ago
Anton Kaliaev	5b1b1ea58a	[libs/autofile] fix DATA RACE by removing openFile() call (#2539) There's a time window after we call RotateFile() where autofile#index+1 does not exist. It will be created during the next call to Write(). BUT if somebody calls NewReader() before Write(), it will fail with "open /tmp/wal#index+1/wal: no such file or directory" We must create file (either by calling gr.Head.openFile() or directly) during NewReader() to ensure read calls succeed. Closes #2538	6 years ago
goolAdapter	110b07fb3f	libs: Call Flush() before rename #2428 (#2439) * fix Group.RotateFile need call Flush() before rename. #2428 * fix some review issue. #2428 refactor Group's config: replace setting member with initial option * fix a handwriting mistake * fix a time window error between rename and write. * fix a syntax mistake. * change option name Get_ to With_ * fix review issue * fix review issue	6 years ago
Ethan Buchman	9e940b95ad	libs/autofile: bring back loops (#2261) * libs/autofile: bring back loops * changelog, version	6 years ago
Dev Ojha	2756be5a59	libs: Remove usage of custom Fmt, in favor of fmt.Sprintf (#2199) * libs: Remove usage of custom Fmt, in favor of fmt.Sprintf Closes #2193 * Fix bug that was masked by custom Fmt!	6 years ago
Anton Kaliaev	b1cff0f9bf	[libs/autofile] create a Group ticker on Start 1) no need to stop the ticker in createTestGroup() method 2) now there is a symmetry - we start the ticker in OnStart(), we stop it in OnStop() Refs #2072	6 years ago
Anton Kaliaev	b33f73eaf1	stop autofile and autogroup properly NOTE: from the ticker#Stop documentation: ``` Stop does not close the channel, to prevent a read from the channel succeeding incorrectly. https://golang.org/src/time/tick.go?s=1318:1341#L35 ``` Refs #2072	6 years ago
Zach Ramsay	44dad6d70b	Revert "detele everything" This reverts commit `d02c5d1e30`.	6 years ago
Zach Ramsay	d02c5d1e30	detele everything	6 years ago
Ethan Buchman	d55243f0e6	fix import paths	6 years ago
Ethan Buchman	ae3bf81833	mv tmlibs files to libs dir	6 years ago
Anton Kaliaev	e0985bf566	flush on stop & function to close group as opposite to OpenGroup	7 years ago
Thomas Corbière	ee67e34519	Fix lint errors (#190) * use increment and decrement operators. * remove unnecessary else branches. * fix receiver names. * remove omittable code. * fix dot imports.	7 years ago
Anton Kaliaev	668698584d	[autofile] test GroupReader more extensively (Refs #69)	7 years ago
Anton Kaliaev	81591e288e	fix metalinter warnings	7 years ago
Anton Kaliaev	21b2c26fb1	GroupReader#Read: return io.EOF if file is empty	7 years ago
Anton Kaliaev	c75ddd0fa3	return err if empty slice given	7 years ago
Anton Kaliaev	35e81018e9	add MinIndex method to Group	7 years ago
Anton Kaliaev	aace56018a	add Read method to GroupReader	7 years ago
Anton Kaliaev	45095e83e7	add Write method to autofile/Group	7 years ago
Anton Kaliaev	498fb1134a	write docs for autofile/group	7 years ago
Zach Ramsay	3c57c24921	linting: next round of fixes	7 years ago
Zach Ramsay	d6e03d2368	linting: add to Makefile & do some fixes	7 years ago
Anton Kaliaev	d71d1394ec	call fsync after flush (Refs #573) short: flushing the bufio buffer is not enough to ensure data consistency. long: Saving an entry to the WAL calls writeLine to append data to the autofile group backing the WAL, then calls group.Flush() to flush that data to persistent storage. group.Flush() in turn proxies to headBuf.flush(), flushing the active bufio.BufferedWriter. However, BufferedWriter wraps a Writer, not another BufferedWriter, and the way it flushes is by calling io.Writer.Write() to clear the BufferedWriter's buffer. The io.Writer we're wrapping here is AutoFile, whose Write method calls os.File.Write(), performing an unbuffered write to the operating system, where, I assume, it sits in the OS buffers awaiting sync. This means that Wal.Save does not, in fact, ensure the saved operation is synced to disk before returning.	7 years ago
Anton Kaliaev	74a7f8c92b	[autofile] close file before renaming it this might fix our windows bug https://github.com/tendermint/tendermint/issues/444 `0980f8e197`	8 years ago
Ethan Buchman	2f8551d3b6	go-common -> tmlibs	8 years ago
Ethan Buchman	900be74e8f	update import paths	8 years ago
Ethan Buchman	a893bb119b	merge go-autofile	8 years ago
Jae Kwon	0416e0aa9c	Close opened files	8 years ago
Jae Kwon	2a306419c8	Remove spurious fmt	8 years ago
Jae Kwon	dd12bd8f1b	Fix checkTotalSizeLimit bug; remove more than 1 file at a time	8 years ago
Jae Kwon	a528af55d3	Group is a BaseService; TotalSizeLimit enforced; tests fixed	8 years ago

8 Commits (d68d25dcd5d6a66aa4f9a67bd79e9f09cee84458)