tendermint

Commit Graph

Author	SHA1	Message	Date
Sam Kleinman	ae5f98881b	p2p: make NodeID and NetAddress public (#6583 )	3 years ago
Sam Kleinman	bed58a749f	p2p: address audit issues with the peer manager (#6603 )	3 years ago
Sam Kleinman	d228afc548	p2p: avoid retry delay in error case (#6591 )	3 years ago
Sam Kleinman	a855f96946	p2p: renames for reactors and routing layer internal moves (#6547 )	3 years ago
Marko	719e028e00	libs: internalize some packages (#6366 ) ## Description Internalize some libs. This reduces the amount of public API tendermint is supporting. The moved libraries are mainly ones that are used within Tendermint-core.	3 years ago
Sam Kleinman	0781ca3f50	p2p/pex: cleanup to pex internals and peerManager interface (#6476 )	4 years ago
Aleksandr Bezobchuk	bc643b19c4	p2p: support private peer IDs in new p2p stack (#6409 ) Pass a set of private peer ids to the `PeerManager` and any node that exists in this set is not returned in the `Advertise` method. closes: #6405	4 years ago
Callum Waters	9efc20c963	p2p: improve PEX reactor (#6305 )	4 years ago
Sam Kleinman	0f41f7465c	p2p: extend e2e tests for new p2p framework (#6323 )	4 years ago
Sam Kleinman	91506bf25d	p2p: simple peer scoring (#6277 )	4 years ago
Sam Kleinman	acbe3f6570	P2P: Evidence Reactor Test Refactor (#6238 )	4 years ago
Erik Grinaker	9b6d6a3ad0	p2p: tighten up Router and add tests (#6044 ) This cleans up the `Router` code and adds a bunch of tests. These sorts of systems are a real pain to test, since they have a bunch of asynchronous goroutines living their own lives, so the test coverage is decent but not fantastic. Luckily we've been able to move all of the complex peer management and transport logic outside of the router, as synchronous components that are much easier to test, so the core router logic is fairly small and simple. This also provides some initial test tooling in `p2p/p2ptest` that automatically sets up in-memory networks and channels for use in integration tests. It also includes channel-oriented test asserters in `p2p/p2ptest/require.go`, but these have primarily been written for router testing and should probably be adapted or extended for reactor testing.	4 years ago
Erik Grinaker	2aad26e2f1	p2p: tighten up and test PeerManager (#6034 ) This tightens up the `PeerManager` and related code, adds a ton of tests, and fixes a bunch of inconsistencies and bugs.	4 years ago
Erik Grinaker	fc71882f74	p2p: add tests and fix bugs for `NodeAddress` and `NodeID` (#6021 ) This renames `PeerAddress` to `NodeAddress`, moves it and `NodeID` into a separate file `address.go`, adds tests for them, and fixes a bunch of bugs and inconsistencies.	4 years ago
Erik Grinaker	1f39f808e1	p2p: tighten up and test Transport API (#6020 ) This tightens up the new P2P `Transport` API and infrastructure, fixes a bunch of bugs and inconsistencies, and adds tests.	4 years ago
Erik Grinaker	50b8907581	p2p: clean up new Transport infrastructure (#6017 ) This revises the new P2P `Transport` interface and does some preliminary code cleanups and simplifications. The major change here is to add `Connection.Handshake()` for performing node handshakes (once the stream transport API is implemented, this can be done entirely independent of the transport). This moves most of the handshaking logic into the `Router`, such as prevention of head-of-line blocking, validation of peer's `NodeInfo`, controlling timeouts, and so on. This significantly simplifies transports, completely removes the need for internal goroutines, and shares common logic across all transports. This also allows varying the handshake `NodeInfo` across peers, e.g. to vary `ListenAddr`. Similarly, connection filtering is also moved into the switch/router so that it can be shared between transports.	4 years ago
Erik Grinaker	fe5b312337	p2p: resolve PEX addresses in PEX reactor (#5980 ) This changes the new prototype PEX reactor to resolve peer address URLs into IP/port PEX addresses itself. Branched off of #5974. I've spent some time thinking about address handling in the P2P stack. We currently use `PeerAddress` URLs everywhere, except for two places: when dialing a peer, and when exchanging addresses via PEX. We had two options: 1. Resolve addresses to endpoints inside `PeerManager`. This would introduce a lot of added complexity: we would have to track connection statistics per endpoint, have goroutines that asynchronously resolve and refresh these endpoints, deal with resolve scheduling before dialing (which is trickier than it sounds since it involves multiple goroutines in the peer manager and router and messes with peer rating order), handle IP address visibility issues, and so on. 2. Resolve addresses to endpoints (IP/port) only where they're used: when dialing, and in PEX. Everywhere else we use URLs. I went with 2, because this significantly simplifies the handling of hostname resolution, and because I really think the PEX reactor should migrate to exchanging URLs instead of IP/port numbers anyway -- this allows operators to use DNS names for validators (and can easily migrate them to new IPs and/or load balance requests), and also allows different protocols (e.g. QUIC and `MemoryTransport`). Happy to discuss this.	4 years ago
Erik Grinaker	51aca684b8	p2p: add prototype PEX reactor for new stack (#5971 ) This adds a prototype PEX reactor for the new P2P stack.	4 years ago
Erik Grinaker	13e772c916	p2p: add PeerManager.Advertise() (#5957 ) Adds a naïve `PeerManager.Advertise()` method that the new PEX reactor can use to fetch addresses to advertise, as well as some other `FIXME`s on address advertisement.	4 years ago
Erik Grinaker	81daaacae9	p2p: simplify PeerManager upgrade logic (#5962 ) Follow-up from #5947, branched off of #5954. This simplifies the upgrade logic by adding explicit eviction requests, which can also be useful for other use-cases (e.g. if we need to ban a peer that's misbehaving). Changes: * Add `evict` map which queues up peers to explicitly evict. * `upgrading` now only tracks peers that we're upgrading via dialing (`DialNext` → `Dialed`/`DialFailed`). * `Dialed` will unmark `upgrading`, and queue `evict` if still beyond capacity. * `Accepted` will pick a random lower-scored peer to upgrade to, if appropriate, and doesn't care about `upgrading` (the dial will fail later, since it's already connected). * `EvictNext` will return a peer scheduled in `evict` if any, otherwise if beyond capacity just evict the lowest-scored peer. This limits all of the `upgrading` logic to `DialNext`, `Dialed`, and `DialFailed`, making it much simpler, and it should generally do the right thing in all cases I can think of.	4 years ago
Erik Grinaker	a741314c97	p2p: improve peerStore prototype (#5954 ) This improves the `peerStore` prototype by e.g.: * Using a database with Protobuf for persistence, but also keeping full peer set in memory for performance. * Simplifying the API, by taking/returning struct copies for safety, and removing errors for in-memory operations. * Caching the ranked peer set, as a temporary solution until a better data structure is implemented. * Adding `PeerManagerOptions.MaxPeers` and pruning the peer store (based on rank) when it's full. * Rewriting `PeerAddress` to be independent of `url.URL`, normalizing it and tightening semantics.	4 years ago
Erik Grinaker	7e0436c6e6	p2p: make PeerManager.DialNext() and EvictNext() block (#5947 ) See #5936 and #5938 for background. The plan was initially to have `DialNext()` and `EvictNext()` return a channel. However, implementing this became unnecessarily complicated and error-prone. As an example, the channel would be both consumed and populated (via method calls) by the same driving method (e.g. `Router.dialPeers()`) which could easily cause deadlocks where a method call blocked while sending on the channel that the caller itself was responsible for consuming (but couldn't since it was busy making the method call). It would also require a set of goroutines in the peer manager that would interact with the goroutines in the router in non-obvious ways, and fully populating the channel on startup could cause deadlocks with other startup tasks. Several issues like these made the solution hard to reason about. I therefore simply made `DialNext()` and `EvictNext()` block until the next peer was available, using internal triggers to wake these methods up in a non-blocking fashion when any relevant state changes occurred. This proved much simpler to reason about, since there are no goroutines in the peer manager (except for trivial retry timers), nor any blocking channel sends, and it instead relies entirely on the existing goroutine structure of the router for concurrency. This also happens to be the same pattern used by the `Transport.Accept()` API, following Go stdlib conventions, so all router goroutines end up using a consistent pattern as well.	4 years ago
Erik Grinaker	670e9b427b	p2p: improve PeerManager prototype (#5936 ) This improves the prototype peer manager by: * Exporting `PeerManager`, making it accessible by e.g. reactors. * Replacing `Router.SubscribePeerUpdates()` with `PeerManager.Subscribe()`. * Tracking address/peer connection statistics, and retrying dial failures with exponential backoff. * Prioritizing peers, with persistent peers configuration. * Limiting simultaneous connections. * Evicting peers and upgrading to higher-priority peers. * Tracking peer heights, as a workaround for legacy shared peer state APIs. This is getting to a point where we need to determine precise semantics and implement tests, so we should figure out whether it's a reasonable abstraction that we want to use. The main questions are around the API model (i.e. synchronous method calls with the router polling the manager, vs. an event-driven model using channels, vs. the peer manager calling methods on the router to connect/disconnect peers), and who should have the responsibility of managing actual connections (currently the router, while the manager only tracks peer state).	4 years ago
Erik Grinaker	96215a06ed	p2p: add prototype peer lifecycle manager (#5882 ) This adds a prototype peer lifecycle manager, `peerManager`, which stores peer data in an internal `peerStore`. The overall idea here is to have methods for peer lifecycle events which exchange a very narrow subset of peer data, and to keep all of the peer metadata (i.e. the `peerInfo` struct) internal, to decouple this from the router and simplify concurrency control. See `peerManager` GoDoc for more information. The router is still responsible for actually dialing and accepting peer connections, and routing messages across them, but the peer manager is responsible for determining which peers to dial next, preventing multiple connections being established for the same peer (e.g. both inbound and outbound), and making sure we don't dial the same peer several times in parallel. Later it will also track retries and exponential backoff, as well as peer and address quality. It also assumes responsibility for peer updates subscriptions. It's a bit unclear to me whether we want the peer manager to take on the responsibility of actually dialing and accepting connections as well, or if it should only be tracking peer state for the router while the router is responsible for all transport concerns. Let's revisit this later.	4 years ago
Erik Grinaker	c61cd3fd05	p2p: add Router prototype (#5831 ) Early but functional prototype of the new `p2p.Router`, see its GoDoc comment for details on how it works. Expect much of this logic to change and improve as we evolve the new P2P stack. There is a simple test that sets up an in-memory network of four routers with reactors and passes messages between them, but otherwise no exhaustive tests since this is very much a work-in-progress.	4 years ago
Aleksandr Bezobchuk	e986602649	evidence: p2p refactor (#5747 )	4 years ago
Erik Grinaker	91bef75f62	p2p: rename PubKeyToID to NodeIDFromPubKey	4 years ago
Erik Grinaker	1b6df6783d	p2p: replace PeerID with NodeID	4 years ago
Erik Grinaker	8e7d431f6f	p2p: rename ID to NodeID	4 years ago
Erik Grinaker	bcfc889f25	p2p: implement new Transport interface (#5791 ) This implements a new `Transport` interface and related types for the P2P refactor in #5670. Previously, `conn.MConnection` was very tightly coupled to the `Peer` implementation -- in order to allow alternative non-multiplexed transports (e.g. QUIC), MConnection has now been moved below the `Transport` interface, as `MConnTransport`, and decoupled from the peer. Since the `p2p` package is not covered by our Go API stability, this is not considered a breaking change, and not listed in the changelog. The initial approach was to implement the new interface in its final form (which also involved possible protocol changes, see https://github.com/tendermint/spec/pull/227). However, it turned out that this would require a large amount of changes to existing P2P code because of the previous tight coupling between `Peer` and `MConnection` and the reliance on subtleties in the MConnection behavior. Instead, I have broadened the `Transport` interface to expose much of the existing MConnection interface, preserved much of the existing MConnection logic and behavior in the transport implementation, and tried to make as few changes to the rest of the P2P stack as possible. We will instead reduce this interface gradually as we refactor other parts of the P2P stack. The low-level transport code and protocol (e.g. MConnection, SecretConnection and so on) has not been significantly changed, and refactoring this is not a priority until we come up with a plan for QUIC adoption, as we may end up discarding the MConnection code entirely. There are no tests of the new `MConnTransport`, as this code is likely to evolve as we proceed with the P2P refactor, but tests should be added before a final release. The E2E tests are sufficient for basic validation in the meanwhile.	4 years ago
Aleksandr Bezobchuk	a879eb444d	p2p: state sync reactor refactor (#5671 )	4 years ago
Marko	098ebaee22	p2p: reduce log severity (#5338 ) ## Description This PR aims to reduce the amount of `Logger.Error(..)` calls. Many of these calls are benign and do not need any intervention. Went from: ``` node1 | E[2020-09-08|14:32:48.407] Connection failed @ recvRoutine (reading byte) module=p2p peer=af8747a81383f40583ae8790d2cc1f92cc7e4a35@192.167.10.4:26656 conn=MConn{192.167.10.4:26656} err="read tcp 192.167.10.3:48614->192.167.10.4:26656: read: connection reset by peer" node1 | E[2020-09-08|14:32:48.407] Stopping peer for error module=p2p peer="Peer{MConn{192.167.10.4:26656} `af8747a813` out}" err="read tcp 192.167.10.3:48614->192.167.10.4:26656: read: connection reset by peer" node1 | E[2020-09-08|14:32:48.407] Error while stopping peer module=p2p peer=af8747a81383f40583ae8790d2cc1f92cc7e4a35@192.167.10.4:26656 err="already stopped" node1 | E[2020-09-08|14:32:48.408] MConnection flush failed module=p2p peer=af8747a81383f40583ae8790d2cc1f92cc7e4a35@192.167.10.4:26656 err="write tcp 192.167.10.3:48614->192.167.10.4:26656: use of closed network connection" ``` To: ``` node1 | E[2020-09-08|14:42:54.023] Stopping peer for error module=p2p peer="Peer{MConn{192.167.10.5:37844} `e3d01d1795` in}" err=EOF ``` Closes: #4937	4 years ago
Marko	fbdf8b098e	mocks: update with 2.2.1 (#5294 ) ## Description When downloading mockery I ran into an issue where we were using the old version. This PR updates to a more recent version. changelog? Closes: #XXX	4 years ago
Marko	6ccccb0933	lint: errcheck (#5091 ) ## Description add more error checks to tests gonna do a third PR that tackles the non test cases	4 years ago
Marko	7e2cc1db5e	linter: (1/2) enable errcheck (#5064 ) ## Description partially cleanup in preparation for errcheck i ignored a bunch of defer errors in tests but with the update to go 1.14 we can use `t.Cleanup(func() { if err := <>; err != nil {..}}` to cover those errors, I will do this in pr number two of enabling errcheck. ref #5059	4 years ago
Erik Grinaker	511ab6717c	add state sync reactor (#4705 ) Fixes #828. Adds state sync, as outlined in [ADR-053](https://github.com/tendermint/tendermint/blob/master/docs/architecture/adr-053-state-sync-prototype.md). See related PRs in Cosmos SDK (https://github.com/cosmos/cosmos-sdk/pull/5803) and Gaia (https://github.com/cosmos/gaia/pull/327). This is split out of the previous PR #4645, and branched off of the ABCI interface in #4704. * Adds a new P2P reactor which exchanges snapshots with peers, and bootstraps an empty local node from remote snapshots when requested. * Adds a new configuration section `[statesync]` that enables state sync and configures the light client. Also enables `statesync:info` logging by default. * Integrates state sync into node startup. Does not support the v2 blockchain reactor, since it needs some reorganization to defer startup.	5 years ago
Marko	7b52f51700	libs/common: Refactor libs/common 5 (#4240 ) * libs/common: Refactor libs/common 5 - move mathematical functions and types out of `libs/common` to math pkg - move net functions out of `libs/common` to net pkg - move string functions out of `libs/common` to strings pkg - move async functions out of `libs/common` to async pkg - move bit functions out of `libs/common` to bits pkg - move cmap functions out of `libs/common` to cmap pkg - move os functions out of `libs/common` to os pkg Signed-off-by: Marko Baricevic <marbar3778@yahoo.com> * fix testing issues * fix tests closes #41417 woooooooooooooooooo kill the cmn pkg Signed-off-by: Marko Baricevic <marbar3778@yahoo.com> * add changelog entry * fix goimport issues * run gofmt	5 years ago
Marko	27b00cf8d1	libs/common: refactor libs common 3 (#4232 ) * libs/common: refactor libs common 3 - move nil.go into types folder and make private - move service & baseservice out of common into service pkg ref #4147 Signed-off-by: Marko Baricevic <marbar3778@yahoo.com> * add changelog entry	5 years ago
Sean Braithwaite	c9ef824ddf	p2p: Per channel metrics (#3666 ) (#3677 ) * Add `chID` label to sent/receive byte metrics * add changelog pending entry	6 years ago
Ethan Buchman	882622ec10	Fixes tendermint/tendermint#3522 * OriginalAddr -> SocketAddr OriginalAddr records the originally dialed address for outbound peers, rather than the peer's self reported address. For inbound peers, it was nil. Here, we rename it to SocketAddr and for inbound peers, set it to the RemoteAddr of the connection. * use SocketAddr Numerous places in the code call peer.NodeInfo().NetAddress(). However, this call to NetAddress() may perform a DNS lookup if the reported NodeInfo.ListenAddr includes a name. Failure of this lookup returns a nil address, which can lead to panics in the code. Instead, call peer.SocketAddr() to return the static address of the connection. * remove nodeInfo.NetAddress() Expose `transport.NetAddress()`, a static result determined when the transport is created. Removing NetAddress() from the nodeInfo prevents accidental DNS lookups. * fixes from review * linter * fixes from review	6 years ago
Anton Kaliaev	2449bf7300	p2p: file descriptor leaks (#3150 ) * close peer's connection to avoid fd leak Fixes #2967 * rename peer#Addr to RemoteAddr * fix test * fixes after Ethan's review * bring back the check * changelog entry * write a test for switch#acceptRoutine * increase timeouts? :( * remove extra assertNPeersWithTimeout * simplify test * assert number of peers (just to be safe) * Cleanup in OnStop * run tests with verbose flag on CircleCI * spawn a reading routine to prevent connection from closing * get port from the listener random port is faster, but often results in ``` panic: listen tcp 127.0.0.1:44068: bind: address already in use [recovered] panic: listen tcp 127.0.0.1:44068: bind: address already in use goroutine 79 [running]: testing.tRunner.func1(0xc0001bd600) /usr/local/go/src/testing/testing.go:792 +0x387 panic(0x974d20, 0xc0001b0500) /usr/local/go/src/runtime/panic.go:513 +0x1b9 github.com/tendermint/tendermint/p2p.MakeSwitch(0xc0000f42a0, 0x0, 0x9fb9cc, 0x9, 0x9fc346, 0xb, 0xb42128, 0x0, 0x0, 0x0, ...) /home/vagrant/go/src/github.com/tendermint/tendermint/p2p/test_util.go:182 +0xa28 github.com/tendermint/tendermint/p2p.MakeConnectedSwitches(0xc0000f42a0, 0x2, 0xb42128, 0xb41eb8, 0x4f1205, 0xc0001bed80, 0x4f16ed) /home/vagrant/go/src/github.com/tendermint/tendermint/p2p/test_util.go:75 +0xf9 github.com/tendermint/tendermint/p2p.MakeSwitchPair(0xbb8d20, 0xc0001bd600, 0xb42128, 0x2f7, 0x4f16c0) /home/vagrant/go/src/github.com/tendermint/tendermint/p2p/switch_test.go:94 +0x4c github.com/tendermint/tendermint/p2p.TestSwitches(0xc0001bd600) /home/vagrant/go/src/github.com/tendermint/tendermint/p2p/switch_test.go:117 +0x58 testing.tRunner(0xc0001bd600, 0xb42038) /usr/local/go/src/testing/testing.go:827 +0xbf created by testing.(*T).Run /usr/local/go/src/testing/testing.go:878 +0x353 exit status 2 FAIL github.com/tendermint/tendermint/p2p 0.350s ```	6 years ago
Ethan Buchman	6168b404a7	p2p: NewMultiplexTransport takes an MConnConfig (#2869 ) * p2p: NewMultiplexTransport takes an MConnConfig * changelog * move test func to test file	6 years ago
Ethan Buchman	0d5e0d2f13	p2p/conn: FlushStop. Use in pex. Closes #2092 (#2802 ) * p2p/conn: FlushStop. Use in pex. Closes #2092 In seed mode, we call StopPeer immediately after Send. Since flushing msgs to the peer happens in the background, the peer connection is often closed before the messages are actually sent out. The new FlushStop method allows all msgs to first be written and flushed out on the conn before it is closed. * fix dummy peer * typo * fixes from review * more comments * ensure pex doesn't call FlushStop more than once FlushStop is not safe to call more than once, but we call it from Receive in a go-routine so Receive doesn't block. To ensure we only call it once, we use the lastReceivedRequests map - if an entry already exists, then FlushStop should already have been called and we can return.	6 years ago
Ethan Buchman	6e9aee5460	p2p: peer-id -> peer_id (#2771 ) * p2p: peer-id -> peer_id * update changelog	6 years ago
Ethan Buchman	746d137f86	p2p: Restore OriginalAddr (#2668 ) * p2p: bring back OriginalAddr * p2p: set OriginalAddr * update changelog	6 years ago
Ethan Buchman	0baa7588c2	p2p: NodeInfo is an interface; General cleanup (#2556 ) * p2p: NodeInfo is an interface * (squash) fixes from review * (squash) more fixes from review * p2p: remove peerConn.HandshakeTimeout * p2p: NodeInfo is two interfaces. Remove String() * fixes from review * remove test code from peer.RemoteIP() * p2p: remove peer.OriginalAddr(). See #2618 * use a mockPeer in peer_set_test.go * p2p: fix testNodeInfo naming * p2p: remove unused var * remove testRandNodeInfo * fix linter * fix retry dialing self * fix rpc	6 years ago
Zarko Milosevic	12675ecd92	consensus: Wait timeout precommit before starting new round (#2493 ) * Disable transitioning to new round upon 2/3+ of Precommit nils Pull in ensureVote test function from https://github.com/tendermint/tendermint/pull/2132 * Add several ensureX test methods to wrap channel read with timeout * Revert panic in tests	6 years ago
Matthew Slipper	587116dae1	metrics: Add additional metrics to p2p and consensus (#2425 ) * Add additional metrics to p2p and consensus Partially addresses https://github.com/cosmos/cosmos-sdk/issues/2169. * WIP * Updates from code review * Updates from code review * Add instrumentation namespace to configuration * Fix test failure * Updates from code review * Add quotes * Add atomic load * Use storeint64 * Use addInt64 in writePacketMsgTo	6 years ago
Alexander Simmerl	bdd01310a0	p2p: Integrate new Transport We are swapping the existing listener implementation with the newly introduced Transport and its default implementation MultiplexTransport, removing a large chunk of old connection setup and handling scattered over the Peer and Switch code. The Switch requires a Transport now and handles externally passed Peer filters.	6 years ago
Dev Ojha	2756be5a59	libs: Remove usage of custom Fmt, in favor of fmt.Sprintf (#2199 ) * libs: Remove usage of custom Fmt, in favor of fmt.Sprintf Closes #2193 * Fix bug that was masked by custom Fmt!	6 years ago

4 Commits (a341a626e0e3431b657535ddc4f43456b25a040f)