tendermint

Commit Graph

Author	SHA1	Message	Date
M. J. Fromberger	3e9ecd8197	Prepare changelog for v0.35.0-rc4. (#7181 )	3 years ago
Sam Kleinman	e40a8468a4	config: backport file writing changes (#7182 )	3 years ago
M. J. Fromberger	85086d7452	Fix metric cardinality left over from backport (#7180 ) One of the patched uses in #7161 missed the message type field, triggering panic failures from Prometheus.	3 years ago
mergify[bot]	8314f24d79	pubsub: Use distinct client IDs for test subscriptions. (#7178 ) (#7179 ) Fixes #7176. Some of the benchmarks create a bunch of different subscriptions all sharing the same query. These were all using the same client ID, which violates one of the subscriber rules. Ensure each subscriber gets a unique ID. This has been broken as long as this library has been in the repo—I tracked it back to `bb9aa85d` and it was already failing there, so I think this never really worked. I'm not sure these test anything useful, but at least now they run. (cherry picked from commit `1fd7060542`) Co-authored-by: M. J. Fromberger <fromberger@interchain.io>	3 years ago
mergify[bot]	dd1471da91	p2p: add message type into the send/recv bytes metrics (backport #7155 ) (#7161 ) * p2p: add message type into the send/recv bytes metrics (#7155) This pull request adds a new "message_type" label to the send/recv bytes metrics calculated in the p2p code. Below is a snippet of the updated metrics that includes the new label: ``` tendermint_p2p_peer_receive_bytes_total{chID="32",chain_id="ci",message_type="consensus_HasVote",peer_id="2551a13ed720101b271a5df4816d1e4b3d3bd133"} 652 tendermint_p2p_peer_receive_bytes_total{chID="32",chain_id="ci",message_type="consensus_HasVote",peer_id="4b1068420ef739db63377250553562b9a978708a"} 631 tendermint_p2p_peer_receive_bytes_total{chID="32",chain_id="ci",message_type="consensus_HasVote",peer_id="927c50a5e508c747830ce3ba64a3f70fdda58ef2"} 631 tendermint_p2p_peer_receive_bytes_total{chID="32",chain_id="ci",message_type="consensus_NewRoundStep",peer_id="2551a13ed720101b271a5df4816d1e4b3d3bd133"} 393 tendermint_p2p_peer_receive_bytes_total{chID="32",chain_id="ci",message_type="consensus_NewRoundStep",peer_id="4b1068420ef739db63377250553562b9a978708a"} 357 tendermint_p2p_peer_receive_bytes_total{chID="32",chain_id="ci",message_type="consensus_NewRoundStep",peer_id="927c50a5e508c747830ce3ba64a3f70fdda58ef2"} 386 ``` (cherry picked from commit `b4bc6bb4e8`)	3 years ago
mergify[bot]	c6d62cc8b2	docs: fix broken links and layout (#7154 ) (#7163 ) This PR does a few minor touch ups to the docs (cherry picked from commit `ce89292712`) Co-authored-by: Callum Waters <cmwaters19@gmail.com>	3 years ago
mergify[bot]	ce6014ddf5	docs: add reactor sections (backport #6510 ) (#7151 )	3 years ago
mergify[bot]	e62a75b627	state: add height assertion to rollback function (#7143 ) (#7148 ) (cherry picked from commit `a8ff617773`) Co-authored-by: Callum Waters <cmwaters19@gmail.com>	3 years ago
mergify[bot]	dbc72e0d69	mempool: remove panic when recheck-tx was not sent to ABCI application (#7134 ) (#7142 ) This pull request fixes a panic that exists in both mempools. The panic occurs when the ABCI client misses a response from the ABCI application. This happens when the ABCI client drops the request because its queue is full. The fix is to loop through the ordered list of recheck-tx in the callback until one matches the currently observed recheck request. (cherry picked from commit `b0130c88fb`) Co-authored-by: William Banfield <4561443+williambanfield@users.noreply.github.com>	3 years ago
mergify[bot]	57e4e18ba3	build: Fix build-docker to include the full context. (#7114 ) (#7116 ) Fixes #7068. The build-docker rule relies on being able to run make build-linux, but did not pull the Makefile into the build context. There are various ways to fix this, but this was probably the smallest. (cherry picked from commit `6538776e6a`) Co-authored-by: M. J. Fromberger <fromberger@interchain.io>	3 years ago
mergify[bot]	b7fe214b81	Revert "abci: change client to use multi-reader mutexes (#6306 )" (backport #7106 ) (#7110 ) * Revert "abci: change client to use multi-reader mutexes (#6306)" (#7106) This reverts commit `1c4dbe30d4`. (cherry picked from commit `34a3fcd8fc`)	3 years ago
mergify[bot]	66e8eec194	light: Update links in package docs. (#7099 ) (#7101 ) Fixes #7098. The light client documentation moved to the spec repository. I was not able to figure out what happened to light-client-protocol.md; it was removed in #5252, but no corresponding file exists in the spec repository. Since the spec also discusses the protocol, this change simply links to the spec and removes the non-functional reference. Alternatively, we could link to the top-level [light client doc](https://docs.tendermint.com/master/tendermint-core/light-client.html) if you think that's better. (cherry picked from commit `48295955ed`) Co-authored-by: M. J. Fromberger <fromberger@interchain.io>	3 years ago
mergify[bot]	22e33aba98	e2e: light nodes should use builtin abci app (#7095 ) (#7097 ) (cherry picked from commit `befd669794`) Co-authored-by: Sam Kleinman <garen@tychoish.com>	3 years ago
mergify[bot]	af85f7e917	e2e: abci protocol should be consistent across networks (#7078 ) (#7086 ) It seems weird in retrospect that we allow networks to contain applications that use different ABCI protocols. (cherry picked from commit `f2a8f5e054`) Co-authored-by: Sam Kleinman <garen@tychoish.com>	3 years ago
mergify[bot]	f0cd54825f	cli: allow node operator to rollback last state (backport #7033 ) (#7081 )	3 years ago
M. J. Fromberger	98bc4f0e2b	Update changelog for v0.35.0-rc3. (#7074 )	3 years ago
mergify[bot]	bff85fc07b	mempool,rpc: add removetx rpc method (#7047 ) (#7065 ) Addresses one of the concerns with #7041. Provides a mechanism (via the RPC interface) to delete a single transaction, described by its hash, from the mempool. The method returns an error if the transaction cannot be found. Once the transaction is removed it remains in the cache and cannot be resubmitted until the cache is cleared or it expires from the cache. (cherry picked from commit `851d2e3bde`) Co-authored-by: Sam Kleinman <garen@tychoish.com>	3 years ago
mergify[bot]	4a952885c5	e2e: automatically prune old app snapshots (#7034 ) (#7063 ) This PR tackles the case of using the e2e application in a long lived testnet. The application continually saves snapshots (usually every 100 blocks) which after a while bloats the size of the application. This PR prunes older snapshots so that only the most recent 10 snapshots remain. (cherry picked from commit `5703ae2fb3`) Co-authored-by: Callum Waters <cmwaters19@gmail.com>	3 years ago
William Banfield	42ed5d75a5	consensus: wait until peerUpdates channel is closed to close remaining peers (#7058 ) (#7060 ) The race occurred as a result of a goroutine launched by `processPeerUpdate` racing with the `OnStop` method. The `processPeerUpdates` goroutine deletes from the map as `OnStop` is reading from it. This change updates the `OnStop` method to wait for the peer updates channel to be done before closing the peers. It also copies the map contents to a new map so that `OnStop` does not conflict with the view of the map seen by the goroutine created in `processPeerUpdate`.	3 years ago
M. J. Fromberger	be684091ae	Revert "Consolidate related changelog entries. (#7056 )" (#7061 ) This reverts commits: `c16cd72c0a` `6ef847fdfe` We decided on another release candidate to sort out SDK merge issues.	3 years ago
M. J. Fromberger	c16cd72c0a	Consolidate related changelog entries. (#7056 )	3 years ago
M. J. Fromberger	6ef847fdfe	Consolidate release candidate changelogs for v0.35. (#7052 )	3 years ago
M. J. Fromberger	f361ce09b3	Update Go toolchains to 1.17 in Actions workflows. (#7049 )	3 years ago
William Banfield	243c62cc68	statesync: improve rare p2p race condition (#7042 ) This is intended to fix a test failure that occurs in the p2p state provider. The issue presents as the state provider timing out while waiting for the consensus params response. This can occur because the statesync reactor may attempt to respond to the params request before the state provider is ready to read it. The reactor then hits the `default` case of its `select` and never sends the response on the channel. The state provider then blocks waiting for a response and never receives one, because the reactor opted not to send it.	3 years ago
William Banfield	177850a2c9	statesync: remove deadlock on init fail (#7029 ) When statesync is stopped during shutdown, it can deadlock. A dump of goroutines reveals that this is related to the peerUpdates channel not returning anything on its `Done()` channel when `OnStop` is called. While this is occurring, `processPeerUpdate` is attempting to acquire the reactor lock, and it appears that this lock can never be acquired. I looked for the places where the lock may remain locked accidentally and cleaned them up in the hope of eradicating the issue. Dumps of the relevant goroutines may be found below. Note that the line numbers below are relative to the code in the `v0.35.0-rc1` tag. ``` goroutine 36 [chan receive]: github.com/tendermint/tendermint/internal/statesync.(*Reactor).OnStop(0xc00058f200) github.com/tendermint/tendermint/internal/statesync/reactor.go:243 +0x117 github.com/tendermint/tendermint/libs/service.(*BaseService).Stop(0xc00058f200, 0x0, 0x0) github.com/tendermint/tendermint/libs/service/service.go:171 +0x323 github.com/tendermint/tendermint/node.(*nodeImpl).OnStop(0xc0001ea240) github.com/tendermint/tendermint/node/node.go:769 +0x132 github.com/tendermint/tendermint/libs/service.(*BaseService).Stop(0xc0001ea240, 0x0, 0x0) github.com/tendermint/tendermint/libs/service/service.go:171 +0x323 github.com/tendermint/tendermint/cmd/tendermint/commands.NewRunNodeCmd.func1.1() github.com/tendermint/tendermint/cmd/tendermint/commands/run_node.go:143 +0x62 github.com/tendermint/tendermint/libs/os.TrapSignal.func1(0xc000629500, 0x7fdb52f96358, 0xc0002b5030, 0xc00000daa0) github.com/tendermint/tendermint/libs/os/os.go:26 +0x102 created by github.com/tendermint/tendermint/libs/os.TrapSignal github.com/tendermint/tendermint/libs/os/os.go:22 +0xe6 goroutine 188 [semacquire]: sync.runtime_SemacquireMutex(0xc00026b1cc, 0x0, 0x1) runtime/sema.go:71 +0x47 sync.(*Mutex).lockSlow(0xc00026b1c8) sync/mutex.go:138 +0x105 sync.(*Mutex).Lock(...) sync/mutex.go:81 sync.(*RWMutex).Lock(0xc00026b1c8) sync/rwmutex.go:111 +0x90 github.com/tendermint/tendermint/internal/statesync.(*Reactor).processPeerUpdate(0xc00026b080, 0xc000650008, 0x28, 0x124de90, 0x4) github.com/tendermint/tendermint/internal/statesync/reactor.go:849 +0x1a5 github.com/tendermint/tendermint/internal/statesync.(*Reactor).processPeerUpdates(0xc00026b080) github.com/tendermint/tendermint/internal/statesync/reactor.go:883 +0xab created by github.com/tendermint/tendermint/internal/statesync.(*Reactor).OnStart github.com/tendermint/tendermint/internal/statesync/reactor.go:219 +0xcd ```	3 years ago
M. J. Fromberger	bdd815ebc9	Align atomic struct field for compatibility in 32-bit ABIs. (#7037 ) The layout of struct fields means that interior fields may not be properly aligned for 64-bit access. Fixes #7000.	3 years ago
M. J. Fromberger	77052370cc	Update default config template to match mapstructure keys. (#7036 ) Fix a couple of cases where we updated the keys in the config reader, but forgot to update some of their uses in the default template. Fixes #7031.	3 years ago
William Banfield	6a0d9c832a	blocksync: fix shutdown deadlock issue (#7030 ) When shutting down blocksync, it is observed that the process can hang completely. A dump of running goroutines reveals that this is due to goroutines not listening on the correct shutdown signal. Namely, the `poolRoutine` goroutine does not wait on `pool.Quit`. The `poolRoutine` does not receive any other shutdown signal during `OnStop` because it must stop before the `r.closeCh` is closed. Currently the `poolRoutine` listens on the `closeCh`, which will not close until the `poolRoutine` stops and calls `poolWG.Done()`. This change also puts the `requestRoutine()` in the `OnStart` method to make it more visible, since it does not rely on anything that is spawned in the `poolRoutine`. ``` goroutine 183 [semacquire]: sync.runtime_Semacquire(0xc0000d3bd8) runtime/sema.go:56 +0x45 sync.(*WaitGroup).Wait(0xc0000d3bd0) sync/waitgroup.go:130 +0x65 github.com/tendermint/tendermint/internal/blocksync/v0.(*Reactor).OnStop(0xc0000d3a00) github.com/tendermint/tendermint/internal/blocksync/v0/reactor.go:193 +0x47 github.com/tendermint/tendermint/libs/service.(*BaseService).Stop(0xc0000d3a00, 0x0, 0x0) github.com/tendermint/tendermint/libs/service/service.go:171 +0x323 github.com/tendermint/tendermint/node.(*nodeImpl).OnStop(0xc00052c000) github.com/tendermint/tendermint/node/node.go:758 +0xc62 github.com/tendermint/tendermint/libs/service.(*BaseService).Stop(0xc00052c000, 0x0, 0x0) github.com/tendermint/tendermint/libs/service/service.go:171 +0x323 github.com/tendermint/tendermint/cmd/tendermint/commands.NewRunNodeCmd.func1.1() github.com/tendermint/tendermint/cmd/tendermint/commands/run_node.go:143 +0x62 github.com/tendermint/tendermint/libs/os.TrapSignal.func1(0xc000df6d20, 0x7f04a68da900, 0xc0004a8930, 0xc0005a72d8) github.com/tendermint/tendermint/libs/os/os.go:26 +0x102 created by github.com/tendermint/tendermint/libs/os.TrapSignal github.com/tendermint/tendermint/libs/os/os.go:22 +0xe6 goroutine 161 [select]: github.com/tendermint/tendermint/internal/blocksync/v0.(*Reactor).poolRoutine(0xc0000d3a00, 0x0) github.com/tendermint/tendermint/internal/blocksync/v0/reactor.go:464 +0x2b3 created by github.com/tendermint/tendermint/internal/blocksync/v0.(*Reactor).OnStart github.com/tendermint/tendermint/internal/blocksync/v0/reactor.go:174 +0xf1 goroutine 162 [select]: github.com/tendermint/tendermint/internal/blocksync/v0.(*Reactor).processBlockSyncCh(0xc0000d3a00) github.com/tendermint/tendermint/internal/blocksync/v0/reactor.go:310 +0x151 created by github.com/tendermint/tendermint/internal/blocksync/v0.(*Reactor).OnStart github.com/tendermint/tendermint/internal/blocksync/v0/reactor.go:177 +0x54 goroutine 163 [select]: github.com/tendermint/tendermint/internal/blocksync/v0.(*Reactor).processPeerUpdates(0xc0000d3a00) github.com/tendermint/tendermint/internal/blocksync/v0/reactor.go:363 +0x12b created by github.com/tendermint/tendermint/internal/blocksync/v0.(*Reactor).OnStart github.com/tendermint/tendermint/internal/blocksync/v0/reactor.go:178 +0x76 ```	3 years ago
Tess Rinearson	c9d92f5f19	.github: remove tessr and bez from codeowners (#7028 )	3 years ago
Sam Kleinman	23fe6fd2f9	statesync: ensure test network properly configured (#7026 ) This test reliably gets hung up on network configuration (which may be a real issue), but its network setup is hand-cranked, and we should ensure that the test focuses on its core assertions and doesn't fail for test-architecture reasons.	3 years ago
M. J. Fromberger	962caeae65	Make doc site index default to the latest release (#7023 ) Fix the order of lines in docs/versions so that v0.34 is last (the current release). Related changes: - Update docs/DOCS_README.md to reflect the current state of how we publish the site. - Fix the build-docs target in Makefile to not perturb the package-lock.json during the build. - Fix the Makefile rule to not clobber package-lock.json.	3 years ago
Sam Kleinman	8758078786	consensus: avoid unbuffered channel in state test (#7025 )	3 years ago
Sam Kleinman	b1dfbb8bc3	e2e: generator ensure p2p modes (#7021 )	3 years ago
Sam Kleinman	c18470a5f1	e2e: use network size in load generator (#7019 )	3 years ago
M. J. Fromberger	ea539dcb98	Update changelog for v0.35.0-rc2. (#7011 ) Also, relinkify and update the bounty URL.	3 years ago
Sam Kleinman	e35a42fc68	e2e: use smaller transactions (#7016 ) 75% of the failures in the last run used the 10kb transactions. I'd like to dial the size back and see if things improve further.	3 years ago
lklimek	1bd1593f20	fix: race condition in p2p_switch and pex_reactor (#7015 ) Closes https://github.com/tendermint/tendermint/issues/7014	3 years ago
Sam Kleinman	6be36613c9	e2e: reduce number of stateless nodes in test networks (#7010 )	3 years ago
Sam Kleinman	9a16d930c6	statesync: add logging while waiting for peers (#7007 )	3 years ago
Sam Kleinman	8023a2aeef	e2e: add generator tests (#7008 )	3 years ago
Sam Kleinman	6eaa3b24d6	ci: use cheaper codecov data collection (#7009 )	3 years ago
Sam Kleinman	b150ea6b3e	e2e: avoid seed nodes when statesyncing (#7006 )	3 years ago
Sam Kleinman	b879f71e8e	e2e: reduce log noise (#7004 )	3 years ago
dependabot[bot]	bce7c2f73b	build(deps): Bump google.golang.org/grpc from 1.40.0 to 1.41.0 (#7003 )	3 years ago
Callum Waters	60a6c6fb1a	e2e: allow running of single node using the e2e app (#6982 )	3 years ago
Sam Kleinman	fb9eaf576a	e2e: improve chances of statesyncing success (#7001 ) This reduces the chance that a node gets stuck block syncing, which seemed to happen a lot in last night's run.	3 years ago
Sam Kleinman	37ca98a544	e2e: reduce number of statesyncs in test networks (#6999 )	3 years ago
Sam Kleinman	c101fa17ab	e2e: add limit and sort to generator (#6998 ) I observed a couple of problems with the generator in some recent tests: - there were a couple of hybrid test cases that did not have any legacy nodes (randomness and all), so I changed the probability to produce more reliable results. - added options to the generator to set a maximum number of nodes (to complement the earlier minimum) for local testing. - added an option to reverse the sort order so that "more complex" networks come first, and tweaked some of the point values. - refactored the generator's CLI parsing to be a bit clearer.	3 years ago
M. J. Fromberger	118bfe2087	abci: Flush socket requests and responses immediately. (#6997 ) The main effect of this change is to flush the socket client and server message encoding buffers immediately once the message is fully and correctly encoded. This allows us to remove the timer and some other special cases, without changing the observed behaviour of the system. -- Background The socket protocol client and server each use a buffered writer to encode request and response messages onto the underlying connection. This reduces the possibility of a single message being split across multiple writes, but has the side-effect that a request may remain buffered for some time. The implementation worked around this by keeping a ticker that occasionally triggers a flush, and by flushing the writer in response to an explicit request baked into the client/server protocol (see also #6994). These workarounds are both unnecessary: Once a message has been dequeued for sending and fully encoded in wire format, there is no real use keeping all or part of it buffered locally. Moreover, using an asynchronous process to flush the buffer makes the round-trip performance of the request unpredictable. -- Benchmarks Code: https://play.golang.org/p/0ChUOxJOiHt I found no pre-existing performance benchmarks to justify the flush pattern, but a natural question is whether this will significantly harm client/server performance. To test this, I implemented a simple benchmark that transfers randomly-sized byte buffers from a no-op "client" to a no-op "server" over a Unix-domain socket, using a buffered writer, both with and without explicit flushes after each write. As the following data show, flushing every time (FLUSH=true) does reduce raw throughput, but not by a significant amount except for very small request sizes, where the transfer time is already trivial (1.9μs). Given that the client is calibrated for 1MiB transactions, the overhead is not meaningful. 
The percentage in each section is the speedup for flushing only when the buffer is full, relative to flushing every block. The benchmark uses the default buffer size (4096 bytes), which is the same value used by the socket client and server implementation: FLUSH NBLOCKS MAX AVG TOTAL ELAPSED TIME/BLOCK; false 3957471 512 255 1011165416 2.00018873s 505ns; true 1068568 512 255 273064368 2.000217051s 1.871µs (73%); false 536096 4096 2048 1098066401 2.000229108s 3.731µs; true 477911 4096 2047 978746731 2.000177825s 4.185µs (10.8%); false 124595 16384 8181 1019340160 2.000235086s 16.053µs; true 120995 16384 8179 989703064 2.000329349s 16.532µs (2.9%); false 2114 1048576 525693 1111316541 2.000479928s 946.3µs; true 2083 1048576 526379 1096449173 2.001817137s 961.025µs (1.5%). Note also that the FLUSH=false baseline is actually faster than the production code, which flushes more often than the buffer filling up would require. Moreover, the timer slows down the overall transaction rate of the client and server, independent of how fast the socket transfer is, so the loss on a real workload is probably much less.	3 years ago
Sam Kleinman	71c6682b57	statesync: clean up reactor/syncer lifecycle (#6995 ) I've been noticing that there are a number of situations where the statesync reactor blocks waiting for peers (or similar), so I've moved things around to improve outcomes in local tests.	3 years ago
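The `message_type` label added in #7155 (backported in #7161) takes values such as `consensus_HasVote`. Below is a minimal sketch of how such a value could be derived from a message's Go type with reflection. It assumes the label is simply `package_Type` of the concrete message. `messageTypeLabel` and `HasVote` are hypothetical names, not tendermint's code.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// HasVote is a hypothetical stand-in for a consensus message type.
type HasVote struct{ Height int64 }

// messageTypeLabel is a hypothetical helper that turns a message's dynamic
// type, e.g. "*consensus.HasVote", into a label value like "consensus_HasVote".
func messageTypeLabel(msg interface{}) string {
	name := reflect.TypeOf(msg).String() // "*main.HasVote" in this program
	name = strings.TrimLeft(name, "*")   // drop pointer markers
	return strings.ReplaceAll(name, ".", "_")
}

func main() {
	// The label comes from the type, so the number of distinct values is
	// bounded by the number of message types, not by message contents.
	fmt.Println(messageTypeLabel(&HasVote{Height: 1})) // prints "main_HasVote"
	fmt.Println(messageTypeLabel(HasVote{}))           // prints "main_HasVote"
}
```

Every call site still has to supply the same set of label names. #7180 fixes one use that missed the message type field, which triggered panics from Prometheus.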
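#7178 gives each benchmark subscriber its own client ID, because subscriptions that share a query must not also share an ID. The sketch below models that rule. It assumes a client may hold at most one subscription per query. `registry`, `Subscribe`, and `errAlreadySubscribed` are hypothetical and simplified, not the `libs/pubsub` API.

```go
package main

import (
	"errors"
	"fmt"
)

// errAlreadySubscribed models the assumed rule: one subscription per
// (client ID, query) pair.
var errAlreadySubscribed = errors.New("already subscribed")

// registry is a hypothetical stand-in for the pubsub server's bookkeeping.
type registry struct {
	subs map[string]map[string]bool // client ID -> set of queries
}

func newRegistry() *registry {
	return &registry{subs: make(map[string]map[string]bool)}
}

// Subscribe records (clientID, query) and rejects duplicates.
func (r *registry) Subscribe(clientID, query string) error {
	queries, ok := r.subs[clientID]
	if !ok {
		queries = make(map[string]bool)
		r.subs[clientID] = queries
	}
	if queries[query] {
		return errAlreadySubscribed
	}
	queries[query] = true
	return nil
}

func main() {
	r := newRegistry()
	const query = "tm.event = 'Tx'"
	// Give each benchmark subscriber a distinct ID instead of reusing one.
	for i := 0; i < 3; i++ {
		if err := r.Subscribe(fmt.Sprintf("sub-%d", i), query); err != nil {
			panic(err)
		}
	}
	// Reusing an ID for the same query is rejected.
	fmt.Println(r.Subscribe("sub-0", query)) // prints "already subscribed"
}
```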
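#7134 replaces a mempool panic with a forward scan over the ordered recheck list. Below is a minimal sketch of that matching step. It assumes transactions are compared by value and that skipped entries are only reported. `recheckQueue` and `onResponse` are hypothetical names, and the commit message does not say what the real mempools do with the skipped entries.

```go
package main

import "fmt"

// recheckQueue is a hypothetical model of the ordered list of transactions
// whose recheck requests were handed to the ABCI client.
type recheckQueue struct {
	pending []string
}

// onResponse matches a recheck response against the queue. A request that
// the client dropped never gets a response, so instead of panicking when the
// head of the queue does not match, it scans forward to the matching entry.
// It returns the entries skipped over and whether a match was found; on a
// miss the queue is left unchanged.
func (q *recheckQueue) onResponse(tx string) (skipped []string, ok bool) {
	for i, p := range q.pending {
		if p == tx {
			skipped = append([]string(nil), q.pending[:i]...)
			q.pending = q.pending[i+1:]
			return skipped, true
		}
	}
	return nil, false
}

func main() {
	q := &recheckQueue{pending: []string{"tx1", "tx2", "tx3"}}
	// The request for tx1 was dropped, so the first response is for tx2.
	skipped, ok := q.onResponse("tx2")
	fmt.Println(skipped, ok, q.pending) // prints "[tx1] true [tx3]"
}
```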
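#7034 prunes the e2e application's snapshots so that only the 10 most recent remain. Below is a sketch of the selection step over snapshot heights. It assumes snapshots are identified by height alone. `pruneSnapshots` is a hypothetical helper, and deleting the files is left out.

```go
package main

import (
	"fmt"
	"sort"
)

// keepRecent is the number of snapshots retained, per the commit message.
const keepRecent = 10

// pruneSnapshots splits snapshot heights into those to keep (the most recent
// keepRecent) and those to delete, both in ascending order.
func pruneSnapshots(heights []uint64) (keep, remove []uint64) {
	sorted := append([]uint64(nil), heights...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	if len(sorted) <= keepRecent {
		return sorted, nil
	}
	cut := len(sorted) - keepRecent
	return sorted[cut:], sorted[:cut]
}

func main() {
	var heights []uint64
	for h := uint64(100); h <= 1500; h += 100 { // a snapshot every 100 blocks
		heights = append(heights, h)
	}
	keep, remove := pruneSnapshots(heights)
	fmt.Println(len(keep), keep[0], len(remove)) // prints "10 600 5"
}
```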
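#7058 makes the consensus reactor's `OnStop` wait until peer updates are done and then close the remaining peers from a copy of the map. The sketch below is a minimal model of that shutdown order. It assumes the update channel is closed by its owner and that a separate done channel signals the update goroutine's exit. The `reactor` type and its fields are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// reactor is a hypothetical model: peers is mutated by a peer-update
// goroutine and read again during shutdown.
type reactor struct {
	mtx         sync.Mutex
	peers       map[string]chan struct{} // peer ID -> per-peer close signal
	peerUpdates chan string              // IDs of peers that went down; closed by its owner
	updatesDone chan struct{}            // closed when processPeerUpdates returns
}

func newReactor(ids ...string) *reactor {
	r := &reactor{
		peers:       make(map[string]chan struct{}),
		peerUpdates: make(chan string),
		updatesDone: make(chan struct{}),
	}
	for _, id := range ids {
		r.peers[id] = make(chan struct{})
	}
	return r
}

// processPeerUpdates closes and forgets peers as they disconnect.
func (r *reactor) processPeerUpdates() {
	defer close(r.updatesDone)
	for id := range r.peerUpdates {
		r.mtx.Lock()
		if ch, ok := r.peers[id]; ok {
			close(ch)
			delete(r.peers, id)
		}
		r.mtx.Unlock()
	}
}

// OnStop waits for the update goroutine to finish, then closes the remaining
// peers from a copy of the map taken under the lock, so it never iterates
// the map while another goroutine deletes from it.
func (r *reactor) OnStop() int {
	<-r.updatesDone

	r.mtx.Lock()
	remaining := make(map[string]chan struct{}, len(r.peers))
	for id, ch := range r.peers {
		remaining[id] = ch
	}
	r.mtx.Unlock()

	for _, ch := range remaining {
		close(ch)
	}
	return len(remaining)
}

func main() {
	r := newReactor("a", "b", "c")
	go r.processPeerUpdates()
	r.peerUpdates <- "b"    // peer b disconnects
	close(r.peerUpdates)    // the channel's owner shuts it down
	fmt.Println(r.OnStop()) // prints "2"
}
```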
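#7029 cleans up paths that could leave the statesync reactor lock held, which leaves `processPeerUpdate` blocked forever. A common Go guard against this is to pair each `Lock` with a deferred `Unlock`, so every return path releases the lock. The commit message does not say which technique it used. The sketch assumes a simplified `reactor` with hypothetical `addPeer` and `numPeers` methods.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errNoPeerID = errors.New("empty peer ID")

// reactor is a hypothetical, simplified model of a lock-protected peer set.
type reactor struct {
	mtx   sync.RWMutex
	peers map[string]bool
}

// addPeer records a peer. The deferred Unlock runs on every return path,
// including the early error return, so the lock cannot be left held.
func (r *reactor) addPeer(id string) error {
	r.mtx.Lock()
	defer r.mtx.Unlock()
	if id == "" {
		return errNoPeerID // without the defer, this path would leak the lock
	}
	r.peers[id] = true
	return nil
}

func (r *reactor) numPeers() int {
	r.mtx.RLock()
	defer r.mtx.RUnlock()
	return len(r.peers)
}

func main() {
	r := &reactor{peers: make(map[string]bool)}
	fmt.Println(r.addPeer(""))   // prints "empty peer ID"
	fmt.Println(r.addPeer("p1")) // prints "<nil>"; this would deadlock if the lock had leaked
	fmt.Println(r.numPeers())    // prints "1"
}
```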
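#7030 makes blocksync's `poolRoutine` exit on `pool.Quit` rather than on `r.closeCh`, because `closeCh` is only closed after `poolRoutine` has returned. The sketch below is a minimal model of that ordering. It uses hypothetical `pool` and `reactor` types and assumes a shutdown sequence of stopping the pool, waiting for the goroutine, and then closing `closeCh`.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// pool is a hypothetical stand-in for the block pool; quit closes on Stop.
type pool struct{ quit chan struct{} }

func (p *pool) Quit() <-chan struct{} { return p.quit }
func (p *pool) Stop()                 { close(p.quit) }

// reactor is a hypothetical model of the blocksync reactor's shutdown state.
type reactor struct {
	pool    *pool
	closeCh chan struct{} // closed only after poolWG.Wait returns
	poolWG  sync.WaitGroup
}

// poolRoutine exits on pool.Quit. Waiting on closeCh here would deadlock,
// because closeCh is closed only after this goroutine has called Done.
func (r *reactor) poolRoutine() {
	defer r.poolWG.Done()
	ticker := time.NewTicker(time.Millisecond)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			// periodic sync work would go here
		case <-r.pool.Quit():
			return
		}
	}
}

func (r *reactor) OnStop() {
	r.pool.Stop()    // signals poolRoutine through Quit
	r.poolWG.Wait()  // returns because poolRoutine observes Quit
	close(r.closeCh) // only now is closeCh closed
}

func main() {
	r := &reactor{pool: &pool{quit: make(chan struct{})}, closeCh: make(chan struct{})}
	r.poolWG.Add(1)
	go r.poolRoutine()
	r.OnStop()
	fmt.Println("stopped cleanly")
}
```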

7896 Commits (v0.35.0-rc4)