tendermint

Commit Graph

Author	SHA1	Message	Date
Sam Kleinman	b15b2c1b78	flowrate: cleanup unused files (#7158) I saw one of these tests fail, and it looked like it was using code that wasn't being called anywhere, so I deleted it and dropped the package-name aliasing.	3 years ago
William Banfield	b4bc6bb4e8	p2p: add message type into the send/recv bytes metrics (#7155) This pull request adds a new "message_type" label to the send/recv bytes metrics calculated in the p2p code. Below is a snippet of the updated metrics that includes the new label: ``` tendermint_p2p_peer_receive_bytes_total{chID="32",chain_id="ci",message_type="consensus_HasVote",peer_id="2551a13ed720101b271a5df4816d1e4b3d3bd133"} 652 tendermint_p2p_peer_receive_bytes_total{chID="32",chain_id="ci",message_type="consensus_HasVote",peer_id="4b1068420ef739db63377250553562b9a978708a"} 631 tendermint_p2p_peer_receive_bytes_total{chID="32",chain_id="ci",message_type="consensus_HasVote",peer_id="927c50a5e508c747830ce3ba64a3f70fdda58ef2"} 631 tendermint_p2p_peer_receive_bytes_total{chID="32",chain_id="ci",message_type="consensus_NewRoundStep",peer_id="2551a13ed720101b271a5df4816d1e4b3d3bd133"} 393 tendermint_p2p_peer_receive_bytes_total{chID="32",chain_id="ci",message_type="consensus_NewRoundStep",peer_id="4b1068420ef739db63377250553562b9a978708a"} 357 tendermint_p2p_peer_receive_bytes_total{chID="32",chain_id="ci",message_type="consensus_NewRoundStep",peer_id="927c50a5e508c747830ce3ba64a3f70fdda58ef2"} 386 ```	3 years ago
Sam Kleinman	23be048294	p2p: use correct transport configuration (#7152)	3 years ago
Callum Waters	68ca65f5d7	pex: remove legacy proto messages (#7147) This PR implements the proto changes made in https://github.com/tendermint/spec/pull/352, removing the legacy messages that were used in the pex reactor.	3 years ago
Callum Waters	a8ff617773	state: add height assertion to rollback function (#7143)	3 years ago
William Banfield	b0130c88fb	mempool: remove panic when recheck-tx was not sent to ABCI application (#7134) This pull request fixes a panic that exists in both mempools. The panic occurs when the ABCI client misses a response from the ABCI application, which happens when the ABCI client drops the request because its queue is full. The fix is to loop through the ordered list of recheck-tx entries in the callback until one matches the currently observed recheck request.	3 years ago
Sam Kleinman	ca8f004112	p2p: remove final shims from p2p package (#7136) This is, perhaps, the trivial final piece of #7075 that I've been working on. There's more work to be done: - push more of the setup into the packages themselves - move channel-based sending/filtering out of the - simplify the buffering throughout the p2p stack.	3 years ago
Sam Kleinman	7143f14a63	p2p: simplify open channel interface (#7133) A fourth #7075 component patch to simplify the channel creation interface.	3 years ago
Sam Kleinman	cbe6ad6cd5	p2p: flatten channel descriptor (#7132)	3 years ago
Sam Kleinman	0900ea8396	p2p: channel shim cleanup (#7129)	3 years ago
Sam Kleinman	f4a56f4034	p2p: refactor channel description (#7130) This is another small sliver of #7075, with the intention of removing the legacy shim layer related to channel registration.	3 years ago
Marko	66a11fe527	blocksync: remove v0 folder structure (#7128) Remove v0 blocksync folder structure.	3 years ago
Jared Zhou	b95c261981	rpc: fix typo in broadcast commit (#7124)	3 years ago
M. J. Fromberger	86f00135dd	rpc: Remove the deprecated gRPC interface to the RPC service (#7121) This change removes the partial gRPC interface to the RPC service, which was deprecated in resolution of #6718. Details: - rpc: Remove the client and server interfaces and proto definitions. - Remove the gRPC settings from the config library. - Remove gRPC setup for the RPC service in the node startup. - Fix various test helpers to remove gRPC bits. - Remove the --rpc.grpc-laddr flag from the CLI. Note that to satisfy the protobuf interface check, this change also includes a temporary edit to buf.yaml, which I will revert after this is merged.	3 years ago
William Banfield	ff7b0e638e	p2p: fix priority queue bytes pending calculation (#7120) This metric describes itself as 'pending' but never actually decrements when messages are removed from the queue. This change fixes that by decrementing the metric when the data is removed from the queue.	3 years ago
William Banfield	36a1acff52	internal/proxy: add initial set of abci metrics (#7115) This PR adds an initial set of metrics for ABCI. The initial metrics enable the calculation of timing histograms and call counts for each of the ABCI methods. The metrics are also labeled as either 'sync' or 'async' to indicate whether the method call was performed using ABCI's `*Async` methods. An example of these metrics is included here for reference: ``` tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="0.0001"} 0 tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="0.0004"} 5 tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="0.002"} 12 tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="0.009"} 13 tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="0.02"} 13 tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="0.1"} 13 tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="0.65"} 13 tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="2"} 13 tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="6"} 13 tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="25"} 13 tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="+Inf"} 13 tendermint_abci_connection_method_timing_sum{chain_id="ci",method="commit",type="sync"} 0.007802058000000001 tendermint_abci_connection_method_timing_count{chain_id="ci",method="commit",type="sync"} 13 ``` These metrics can easily be graphed using Prometheus's `histogram_quantile(...)` function to pick out a particular quantile to graph or examine. I chose buckets that roughly estimate the expected range of times for ABCI operations. They start at 0.0001 seconds and range up to 25 seconds. The hope is that this range captures enough possible times to be useful for us and operators.	3 years ago
Sam Kleinman	4781d04d18	node: always close database engine (#7113)	3 years ago
Sam Kleinman	34a3fcd8fc	Revert "abci: change client to use multi-reader mutexes (#6306)" (#7106) This reverts commit `1c4dbe30d4`.	3 years ago
Sam Kleinman	ded310093e	lint: fix collection of stale errors (#7090) A few things that had been annoying.	3 years ago
Sam Kleinman	3646b635d3	p2p, types: remove legacy NetAddress type (#7084)	3 years ago
Callum Waters	59404003ee	p2p: rename pexV2 to pex (#7088)	3 years ago
Sam Kleinman	1b5bb5348f	p2p: cleanup unused arguments (#7079) This is mostly just from reading through the output of unparam, after noticing that there were a few places where we were ignoring some arguments.	3 years ago
Callum Waters	4ca130d226	cli: allow node operator to rollback last state (#7033)	3 years ago
Sam Kleinman	5bf30bb049	p2p: cleanup transport interface (#7071) This is another batch of things to clean up in the legacy P2P system.	3 years ago
Sam Kleinman	851d2e3bde	mempool,rpc: add removetx rpc method (#7047) Addresses one of the concerns with #7041. Provides a mechanism (via the RPC interface) to delete a single transaction, described by its hash, from the mempool. The method returns an error if the transaction cannot be found. Once the transaction is removed, it remains in the cache and cannot be resubmitted until the cache is cleared or it expires from the cache.	3 years ago
Sam Kleinman	3ea81bfaa7	p2p: remove wdrr queue (#7064) This code hasn't been battle-tested, and it seems to have grown increasingly flaky in tests. Given our general direction of reducing queue complexity over the next couple of releases, I think it makes sense to remove it.	3 years ago
Sam Kleinman	03ad7d6f20	p2p: delete legacy stack initial pass (#7035) A few notes: - this is not all the deletion that we can do, but this is the most "simple" case: it leaves in shims, and there's some trivial additional cleanup to the transport that can happen but that requires writing more code, and I wanted this to be easy to review above all else. - This should land after we cut the branch for 0.35, but I'm anticipating that to happen soon, and I wanted to run this through CI.	3 years ago
William Banfield	f5b9c210ca	consensus: wait until peerUpdates channel is closed to close remaining peers (#7058) The race occurred as a result of a goroutine launched by `processPeerUpdate` racing with the `OnStop` method. The `processPeerUpdates` goroutine deletes from the map as `OnStop` is reading from it. This change updates the `OnStop` method to wait for the peer updates channel to be done before closing the peers. It also copies the map contents to a new map so that it will not conflict with the view of the map that the goroutine created in `processPeerUpdate` sees.	3 years ago
Sam Kleinman	cb69ed8135	blocksync/v2: remove unsupported reactor (#7046) This commit should be one of the first to land as part of the v0.36 cycle after cutting the 0.35 branch. The blocksync/v2 reactor was originally implemented as an experiment to produce an implementation of the blockstack protocol that would be easier to test and validate, but it was never appropriately operationalized and this implementation was never fully debugged. When the p2p layer was refactored as part of the 0.35 cycle, the v2 implementation was not refactored and it was left in the codebase but not removed. This commit just removes all references to it.	3 years ago
William Banfield	243c62cc68	statesync: improve rare p2p race condition (#7042) This is intended to fix a test failure that occurs in the p2p state provider. The issue presents as the state provider timing out while waiting for the consensus params response. This can occur because the statesync reactor may attempt to respond to the params request before the state provider is ready to read it. The reactor then hits the `default` case seen here and never sends on the channel. The state provider then blocks waiting for a response and never receives one, because the reactor opted not to send it.	3 years ago
William Banfield	177850a2c9	statesync: remove deadlock on init fail (#7029) When statesync is stopped during shutdown, it can deadlock. A dump of goroutines reveals that this is related to the peerUpdates channel not returning anything on its `Done()` channel when `OnStop` is called. As this is occurring, `processPeerUpdate` is attempting to acquire the reactor lock. It appears that this lock can never be acquired. I looked for the places where the lock may remain locked accidentally and cleaned them up in the hope of eradicating the issue. Dumps of the relevant goroutines may be found below. Note that the line numbers below are relative to the code in the `v0.35.0-rc1` tag. ``` goroutine 36 [chan receive]: github.com/tendermint/tendermint/internal/statesync.(*Reactor).OnStop(0xc00058f200) github.com/tendermint/tendermint/internal/statesync/reactor.go:243 +0x117 github.com/tendermint/tendermint/libs/service.(*BaseService).Stop(0xc00058f200, 0x0, 0x0) github.com/tendermint/tendermint/libs/service/service.go:171 +0x323 github.com/tendermint/tendermint/node.(*nodeImpl).OnStop(0xc0001ea240) github.com/tendermint/tendermint/node/node.go:769 +0x132 github.com/tendermint/tendermint/libs/service.(*BaseService).Stop(0xc0001ea240, 0x0, 0x0) github.com/tendermint/tendermint/libs/service/service.go:171 +0x323 github.com/tendermint/tendermint/cmd/tendermint/commands.NewRunNodeCmd.func1.1() github.com/tendermint/tendermint/cmd/tendermint/commands/run_node.go:143 +0x62 github.com/tendermint/tendermint/libs/os.TrapSignal.func1(0xc000629500, 0x7fdb52f96358, 0xc0002b5030, 0xc00000daa0) github.com/tendermint/tendermint/libs/os/os.go:26 +0x102 created by github.com/tendermint/tendermint/libs/os.TrapSignal github.com/tendermint/tendermint/libs/os/os.go:22 +0xe6 goroutine 188 [semacquire]: sync.runtime_SemacquireMutex(0xc00026b1cc, 0x0, 0x1) runtime/sema.go:71 +0x47 sync.(*Mutex).lockSlow(0xc00026b1c8) sync/mutex.go:138 +0x105 sync.(*Mutex).Lock(...) sync/mutex.go:81 sync.(*RWMutex).Lock(0xc00026b1c8) sync/rwmutex.go:111 +0x90 github.com/tendermint/tendermint/internal/statesync.(*Reactor).processPeerUpdate(0xc00026b080, 0xc000650008, 0x28, 0x124de90, 0x4) github.com/tendermint/tendermint/internal/statesync/reactor.go:849 +0x1a5 github.com/tendermint/tendermint/internal/statesync.(*Reactor).processPeerUpdates(0xc00026b080) github.com/tendermint/tendermint/internal/statesync/reactor.go:883 +0xab created by github.com/tendermint/tendermint/internal/statesync.(*Reactor).OnStart github.com/tendermint/tendermint/internal/statesync/reactor.go:219 +0xcd ```	3 years ago
M. J. Fromberger	bdd815ebc9	Align atomic struct field for compatibility in 32-bit ABIs. (#7037) The layout of struct fields means that interior fields may not be properly aligned for 64-bit access. Fixes #7000.	3 years ago
William Banfield	6a0d9c832a	blocksync: fix shutdown deadlock issue (#7030) When shutting down blocksync, the process has been observed to hang completely. A dump of running goroutines reveals that this is due to goroutines not listening on the correct shutdown signal. Namely, the `poolRoutine` goroutine does not wait on `pool.Quit`. The `poolRoutine` does not receive any other shutdown signal during `OnStop` because it must stop before the `r.closeCh` is closed. Currently the `poolRoutine` listens on the `closeCh`, which will not close until the `poolRoutine` stops and calls `poolWG.Done()`. This change also puts the `requestRoutine()` in the `OnStart` method to make it more visible, since it does not rely on anything that is spawned in the `poolRoutine`. ``` goroutine 183 [semacquire]: sync.runtime_Semacquire(0xc0000d3bd8) runtime/sema.go:56 +0x45 sync.(*WaitGroup).Wait(0xc0000d3bd0) sync/waitgroup.go:130 +0x65 github.com/tendermint/tendermint/internal/blocksync/v0.(*Reactor).OnStop(0xc0000d3a00) github.com/tendermint/tendermint/internal/blocksync/v0/reactor.go:193 +0x47 github.com/tendermint/tendermint/libs/service.(*BaseService).Stop(0xc0000d3a00, 0x0, 0x0) github.com/tendermint/tendermint/libs/service/service.go:171 +0x323 github.com/tendermint/tendermint/node.(*nodeImpl).OnStop(0xc00052c000) github.com/tendermint/tendermint/node/node.go:758 +0xc62 github.com/tendermint/tendermint/libs/service.(*BaseService).Stop(0xc00052c000, 0x0, 0x0) github.com/tendermint/tendermint/libs/service/service.go:171 +0x323 github.com/tendermint/tendermint/cmd/tendermint/commands.NewRunNodeCmd.func1.1() github.com/tendermint/tendermint/cmd/tendermint/commands/run_node.go:143 +0x62 github.com/tendermint/tendermint/libs/os.TrapSignal.func1(0xc000df6d20, 0x7f04a68da900, 0xc0004a8930, 0xc0005a72d8) github.com/tendermint/tendermint/libs/os/os.go:26 +0x102 created by github.com/tendermint/tendermint/libs/os.TrapSignal github.com/tendermint/tendermint/libs/os/os.go:22 +0xe6 goroutine 161 [select]: github.com/tendermint/tendermint/internal/blocksync/v0.(*Reactor).poolRoutine(0xc0000d3a00, 0x0) github.com/tendermint/tendermint/internal/blocksync/v0/reactor.go:464 +0x2b3 created by github.com/tendermint/tendermint/internal/blocksync/v0.(*Reactor).OnStart github.com/tendermint/tendermint/internal/blocksync/v0/reactor.go:174 +0xf1 goroutine 162 [select]: github.com/tendermint/tendermint/internal/blocksync/v0.(*Reactor).processBlockSyncCh(0xc0000d3a00) github.com/tendermint/tendermint/internal/blocksync/v0/reactor.go:310 +0x151 created by github.com/tendermint/tendermint/internal/blocksync/v0.(*Reactor).OnStart github.com/tendermint/tendermint/internal/blocksync/v0/reactor.go:177 +0x54 goroutine 163 [select]: github.com/tendermint/tendermint/internal/blocksync/v0.(*Reactor).processPeerUpdates(0xc0000d3a00) github.com/tendermint/tendermint/internal/blocksync/v0/reactor.go:363 +0x12b created by github.com/tendermint/tendermint/internal/blocksync/v0.(*Reactor).OnStart github.com/tendermint/tendermint/internal/blocksync/v0/reactor.go:178 +0x76 ```	3 years ago
Sam Kleinman	23fe6fd2f9	statesync: ensure test network properly configured (#7026) This test reliably gets hung up on network configuration (which may be a real issue), but its network setup is hand-cranked, and we should ensure that the test focuses on its core assertions and doesn't fail for test-architecture reasons.	3 years ago
Sam Kleinman	8758078786	consensus: avoid unbuffered channel in state test (#7025)	3 years ago
lklimek	1bd1593f20	fix: race condition in p2p_switch and pex_reactor (#7015) Closes https://github.com/tendermint/tendermint/issues/7014	3 years ago
Sam Kleinman	9a16d930c6	statesync: add logging while waiting for peers (#7007)	3 years ago
Callum Waters	60a6c6fb1a	e2e: allow running of single node using the e2e app (#6982)	3 years ago
Sam Kleinman	71c6682b57	statesync: clean up reactor/syncer lifecycle (#6995) I've been noticing that there are a number of situations where the statesync reactor blocks waiting for peers (or similar); I've moved things around to improve outcomes in local tests.	3 years ago
Sam Kleinman	b203c91799	rpc: implement BroadcastTxCommit without event subscriptions (#6984)	3 years ago
Sam Kleinman	bb8ffcb95b	store: move package to internal (#6978)	3 years ago
M. J. Fromberger	cf7537ea5f	cleanup: Reduce and normalize import path aliasing. (#6975) The code in the Tendermint repository makes heavy use of import aliasing. This is made necessary by our extensive reuse of common base package names, and by repetition of similar names across different subdirectories. Unfortunately we have not been very consistent about which packages we alias in various circumstances, and the aliases we use vary. In the spirit of the advice in the style guide and https://github.com/golang/go/wiki/CodeReviewComments#imports, this change makes an effort to clean up and normalize import aliasing. This change makes no API or behavioral changes. It is a pure cleanup intended to help make the code more readable to developers (including myself) trying to understand what is being imported where. Only unexported names have been modified, and the changes were generated and applied mechanically with gofmt -r and comby, respecting the lexical and syntactic rules of Go. Even so, I did not fix every inconsistency. Where the changes would be too disruptive, I left it alone. The principles I followed in this cleanup are: - Remove aliases that restate the package name. - Remove aliases where the base package name is unambiguous. - Move overly-terse abbreviations from the import to the usage site. - Fix lexical issues (remove underscores, remove capitalization). - Fix import groupings to more closely match the style guide. - Group blank (side-effecting) imports and ensure they are commented. - Add aliases to multiple imports with the same base package name.	3 years ago
M. J. Fromberger	41ac5b90c5	Fix script paths in go:generate directives. (#6973) We moved some files further down in the directory structure in #6964, which caused the relative paths to the mockery wrapper to stop working. There does not seem to be an obvious way to get the module root as a default environment variable, so for now I just added the extra up-slashes.	3 years ago
Sam Kleinman	1c4950dbd2	state: move package to internal (#6964)	3 years ago
Sam Kleinman	07d10184a1	inspect: remove duplicated construction path (#6966)	3 years ago
JayT106	84ffaaaf37	statesync/rpc: metrics for the statesync and the rpc SyncInfo (#6795)	3 years ago
Sam Kleinman	9dfdc62eb7	proxy: move proxy package to internal (#6953)	3 years ago
William Banfield	382947ce93	rfc: add performance taxonomy rfc (#6921) This document attempts to capture and discuss some of the areas of Tendermint that seem to be cited as causing performance issues. I'm hoping to continue to gather feedback and input on this document to better understand what issues Tendermint performance may cause for our users. The overall goal of this document is to allow the maintainers and community to get a better sense of these issues and to be better able to discuss them and weigh trade-offs about any proposed performance-focused changes. This document does not aim to propose any performance improvements. It does suggest useful places for benchmarks and places where additional metrics would be useful for diagnosing and further understanding Tendermint performance. Please comment with areas where my reasoning seems off or with additional areas where Tendermint performance may be causing user pain.	3 years ago
William Banfield	63aeb50665	upgrading: add information into the UPGRADING.md for users of the codebase wishing to upgrade (#6898) * add information on upgrading to the new p2p library * clarify p2p backwards compatibility * reorder p2p queue list * add demo for p2p selection * fix spacing in upgrading	3 years ago
Callum Waters	8fe651ba30	e2e: clean up generation of evidence (#6904)	3 years ago
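
The recheck fix in #7134 replaces a panic with a scan forward through the ordered recheck list. The Go sketch below illustrates that idea under stated assumptions: `recheckCursor` and `onResponse` are hypothetical names, and the model assumes that any unanswered entries ahead of the matching one were dropped by the full client queue. It is not the mempool's actual code.

```go
package main

import "fmt"

// recheckCursor is a hypothetical model of recheck bookkeeping: txs holds
// the transactions sent for recheck, in order, and next is the index of the
// response we expect next.
type recheckCursor struct {
	txs  []string
	next int
}

// onResponse handles a recheck response for tx. Instead of panicking when tx
// is not the expected entry, it scans forward for a match. Entries skipped on
// the way are assumed to have had their requests dropped. If no match exists,
// the cursor is left unchanged.
func (c *recheckCursor) onResponse(tx string) (skipped []string, found bool) {
	for i := c.next; i < len(c.txs); i++ {
		if c.txs[i] == tx {
			skipped = c.txs[c.next:i]
			c.next = i + 1
			return skipped, true
		}
	}
	return nil, false
}

func main() {
	c := &recheckCursor{txs: []string{"tx1", "tx2", "tx3"}}
	c.onResponse("tx1")
	// The request for tx2 was dropped, so the next response is for tx3.
	skipped, found := c.onResponse("tx3")
	fmt.Println(skipped, found) // [tx2] true
}
```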

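
#7120 makes the pending-bytes metric decrement when data leaves the queue. A minimal sketch of that bookkeeping follows, assuming a plain FIFO and an integer counter in place of the real priority queue and metrics library; `byteQueue`, `push`, and `pop` are hypothetical names.

```go
package main

import "fmt"

// byteQueue is a hypothetical FIFO standing in for the p2p priority queue;
// pendingBytes plays the role of the "bytes pending" metric.
type byteQueue struct {
	msgs         [][]byte
	pendingBytes int
}

// push enqueues msg and counts its bytes as pending.
func (q *byteQueue) push(msg []byte) {
	q.msgs = append(q.msgs, msg)
	q.pendingBytes += len(msg)
}

// pop removes the oldest message and subtracts its size, so pendingBytes
// reflects only what is still queued (the step the original code skipped).
func (q *byteQueue) pop() ([]byte, bool) {
	if len(q.msgs) == 0 {
		return nil, false
	}
	msg := q.msgs[0]
	q.msgs = q.msgs[1:]
	q.pendingBytes -= len(msg)
	return msg, true
}

func main() {
	q := &byteQueue{}
	q.push([]byte("hello"))
	q.push([]byte("hi"))
	q.pop()
	fmt.Println(q.pendingBytes) // 2
}
```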

127 Commits (b15b2c1b78ef7483837798623ba7e3a039a97b4c)