This revises the new P2P `Transport` interface and does some preliminary code cleanups and simplifications.
The major change here is to add `Connection.Handshake()` for performing node handshakes (once the stream transport API is implemented, this can be done entirely independent of the transport). This moves most of the handshaking logic into the `Router`, such as prevention of head-of-line blocking, validation of peer's `NodeInfo`, controlling timeouts, and so on. This significantly simplifies transports, completely removes the need for internal goroutines, and shares common logic across all transports. This also allows varying the handshake `NodeInfo` across peers, e.g. to vary `ListenAddr`. Similarly, connection filtering is also moved into the switch/router so that it can be shared between transports.
While debugging the mempool issue (#5796), I've noticed we're spending
quite a bit of time encoding blobs of data, which never get printed! The
reason is filtering occurs on the level below, so Go runtime rightfully
evaluates function arguments.
I think it's okay to not format raw bytes.
This implements a new `Transport` interface and related types for the P2P refactor in #5670. Previously, `conn.MConnection` was very tightly coupled to the `Peer` implementation -- in order to allow alternative non-multiplexed transports (e.g. QUIC), MConnection has now been moved below the `Transport` interface, as `MConnTransport`, and decoupled from the peer. Since the `p2p` package is not covered by our Go API stability, this is not considered a breaking change, and not listed in the changelog.
The initial approach was to implement the new interface in its final form (which also involved possible protocol changes, see https://github.com/tendermint/spec/pull/227). However, it turned out that this would require a large amount of changes to existing P2P code because of the previous tight coupling between `Peer` and `MConnection` and the reliance on subtleties in the MConnection behavior. Instead, I have broadened the `Transport` interface to expose much of the existing MConnection interface, preserved much of the existing MConnection logic and behavior in the transport implementation, and tried to make as few changes to the rest of the P2P stack as possible. We will instead reduce this interface gradually as we refactor other parts of the P2P stack.
The low-level transport code and protocol (e.g. MConnection, SecretConnection and so on) has not been significantly changed, and refactoring this is not a priority until we come up with a plan for QUIC adoption, as we may end up discarding the MConnection code entirely.
There are no tests of the new `MConnTransport`, as this code is likely to evolve as we proceed with the P2P refactor, but tests should be added before a final release. The E2E tests are sufficient for basic validation in the meanwhile.
closes: #5770closes: #5769
also, include node ID in the output (#5769) and modify NodeKey to use
value semantics (it makes perfect sense for NodeKey to not be a
pointer).
Closes#4603
Commands used (VIM):
```
:args `rg -l errors.Wrap`
:argdo normal @q | update
```
where q is a macros rewriting the `errors.Wrap` to `fmt.Errorf`.
* lint: golint issue fixes
- on my local machine golint is a lot stricter than the bot so slowly going through and fixing things.
Signed-off-by: Marko Baricevic <marbar3778@yahoo.com>
* more fixes from golint
* remove isPeerPersistentFn
* add changelog entry
* libs/common: Refactor libs/common 5
- move mathematical functions and types out of `libs/common` to math pkg
- move net functions out of `libs/common` to net pkg
- move string functions out of `libs/common` to strings pkg
- move async functions out of `libs/common` to async pkg
- move bit functions out of `libs/common` to bits pkg
- move cmap functions out of `libs/common` to cmap pkg
- move os functions out of `libs/common` to os pkg
Signed-off-by: Marko Baricevic <marbar3778@yahoo.com>
* fix testing issues
* fix tests
closes#41417
woooooooooooooooooo kill the cmn pkg
Signed-off-by: Marko Baricevic <marbar3778@yahoo.com>
* add changelog entry
* fix goimport issues
* run gofmt
* libs/common: refactor libs common 3
- move nil.go into types folder and make private
- move service & baseservice out of common into service pkg
ref #4147
Signed-off-by: Marko Baricevic <marbar3778@yahoo.com>
* add changelog entry
* libs/common: refactor libs/common 2
- move random function to there own pkg
Signed-off-by: Marko Baricevic <marbar3778@yahoo.com>
* change imports and usage throughout repo
* fix goimports
* add changelog entry
implementation spec of Improved Trusted Peering ADR-050 by B-Harvest
- add unconditional_peer_ids and persistent_peers_max_dial_period to config
- add unconditionalPeerIDs map to Switch struct
default config value of persistent_peers_max_dial_period is 0s(disabled)
Refs #4072, #4053
* (1/2) of replace errors.go with github.com/pkg/errors
ref #3862
- step one in removing instances of errors.go in favor of github.com/pkg/errors
Signed-off-by: Marko Baricevic <marbar3778@yahoo.com>
* gofmt
* add in /store
* node: allow replacing existing p2p.Reactor(s)
using [`CustomReactors`
option](https://godoc.org/github.com/tendermint/tendermint/node#CustomReactors).
Warning: beware of accidental name clashes. Here is the list of existing
reactors: MEMPOOL, BLOCKCHAIN, CONSENSUS, EVIDENCE, PEX.
* check the absence of "CUSTOM" prefix
* merge 2 tests
* add doc.go to node package
* fix peer state init to late
Peer does not have a state yet. We set it in AddPeer.
We need an new interface before mconnection is started
* pex message to soon
fix reconnection pex send too fast,
error is caused lastReceivedRequests is still
not deleted when a peer reconnected
* add test case for initpeer
* add prove case
* remove potentially infinite loop
* Update consensus/reactor.go
Co-Authored-By: guagualvcha <baifudong@lancai.cn>
* Update consensus/reactor_test.go
Co-Authored-By: guagualvcha <baifudong@lancai.cn>
* document Reactor interface better
* refactor TestReactorReceiveDoesNotPanicIfAddPeerHasntBeenCalledYet
* fix merge conflicts
* blockchain: remove peer's ID from the pool in InitPeer
Refs #3338
* pex: resetPeersRequestsInfo both upon InitPeer and RemovePeer
* ensure RemovePeer is always called before InitPeer
by removing the peer from the switch last (after we've stopped it and
removed from all reactors)
* add some comments for ConsensusReactor#InitPeer
* fix pex reactor
* format code
* fix spelling
* update changelog
* remove unused methods
* do not clear lastReceivedRequests upon error
only in RemovePeer
* call InitPeer before we start the peer!
* add a comment to InitPeer
* write a test
* use waitUntilSwitchHasAtLeastNPeers func
* bring back timeouts
* Test to ensure Receive panics if InitPeer has not been called
## Description
also
handle errors from DialPeersAsync
remove nil addr from log msg
fix TestPEXReactorDoesNotDisconnectFromPersistentPeerInSeedMode
This is a follow-up from
#3593 (review)
Fixes most of the #3617, except #3593 (comment)
## Commits
* rpc: /dial_peers: only mark peers as persistent if flag is on
also
- handle errors from DialPeersAsync
- remove nil addr from log msg
- fix TestPEXReactorDoesNotDisconnectFromPersistentPeerInSeedMode
This is a follow-up from
https://github.com/tendermint/tendermint/pull/3593#pullrequestreview-233556909
* remove a call to AddPersistentPeers
TestDialFail will trigger a reconnect
## Description
Previously only outbound peers can be persistent.
Now, even if the peer is inbound, if it's marked as persistent, when/if conn is lost,
Tendermint will try to reconnect. This part is actually optional and can be reverted.
Plus, seed won't disconnect from inbound peer if it's marked as
persistent. Fixes#3362
## Commits
* make persistent prop independent of conn direction
Previously only outbound peers can be persistent. Now, even if the peer
is inbound, if it's marked as persistent, when/if conn is lost,
Tendermint will try to reconnect.
Plus, seed won't disconnect from inbound peer if it's marked as
persistent. Fixes#3362
* fix TestPEXReactorDialPeer test
* add a changelog entry
* update changelog
* add two tests
* reformat code
* test UnsafeDialPeers and UnsafeDialSeeds
* add TestSwitchDialPeersAsync
* spec: update p2p/config spec
* fixes after Ismail's review
* Apply suggestions from code review
Co-Authored-By: melekes <anton.kalyaev@gmail.com>
* fix merge conflict
* remove sleep from TestPEXReactorDoesNotDisconnectFromPersistentPeerInSeedMode
We don't need it actually.
* Remove deprecated PanicXXX functions from codebase
As per discussion over
[here](https://github.com/tendermint/tendermint/pull/3456#discussion_r278423492),
we need to remove these `PanicXXX` functions and eliminate our
dependence on them. In this PR, each and every `PanicXXX` function call
is replaced with a simple `panic` call.
* add a changelog entry
A prior change to address accidental DNS lookups introduced the
SocketAddr on peer, which was then used to add it to the addressbook.
Which in turn swallowed the self reported port of the peer, which is
important on a reconnect. This change revives the NetAddress on NodeInfo
which the Peer carries, but now returns an error to avoid nil
dereferencing another issue observed in the past. Additionally we could
potentially address #3532, yet the original problem statemenf of that
issue stands.
As a drive-by optimisation `MarkAsGood` now takes only a `p2p.ID` which
makes it interface a bit stricter and leaner.
* add actionable advice for ErrAddrBookNonRoutable err
Should replace https://github.com/tendermint/tendermint/pull/3463
* reorder checks in addrbook#addAddress so
ErrAddrBookPrivate is returned first
and do not log error in DialPeersAsync if the address is private
because it's not an error
ListOfKnownAddresses is removed
panic if addrbook size is less than zero
CrawlPeers does not attempt to connect to existing or peers we're currently dialing
various perf. fixes
improved tests (though not complete)
move IsDialingOrExistingAddress check into DialPeerWithAddress (Fixes#2716)
* addrbook: preallocate memory when saving addrbook to file
* addrbook: remove oldestFirst struct and check for ID
* oldestFirst replaced with sort.Slice
* ID is now mandatory, so no need to check
* addrbook: remove ListOfKnownAddresses
GetSelection is used instead in seed mode.
* addrbook: panic if size is less than 0
* rewrite addrbook#saveToFile to not use a counter
* test AttemptDisconnects func
* move IsDialingOrExistingAddress check into DialPeerWithAddress
* save and cleanup crawl peer data
* get rid of DefaultSeedDisconnectWaitPeriod
* make linter happy
* fix TestPEXReactorSeedMode
* fix comment
* add a changelog entry
* Apply suggestions from code review
Co-Authored-By: melekes <anton.kalyaev@gmail.com>
* rename ErrDialingOrExistingAddress to ErrCurrentlyDialingOrExistingAddress
* lowercase errors
* do not persist seed data
pros:
- no extra files
- less IO
cons:
- if the node crashes, seed might crawl a peer too soon
* fixes after Ethan's review
* add a changelog entry
* we should only consult Switch about peers
checking addrbook size does not make sense since only PEX reactor uses
it for dialing peers!
https://github.com/tendermint/tendermint/pull/3011#discussion_r270948875
* OriginalAddr -> SocketAddr
OriginalAddr records the originally dialed address for outbound peers,
rather than the peer's self reported address. For inbound peers, it was
nil. Here, we rename it to SocketAddr and for inbound peers, set it to
the RemoteAddr of the connection.
* use SocketAddr
Numerous places in the code call peer.NodeInfo().NetAddress().
However, this call to NetAddress() may perform a DNS lookup if the
reported NodeInfo.ListenAddr includes a name. Failure of this lookup
returns a nil address, which can lead to panics in the code.
Instead, call peer.SocketAddr() to return the static address of the
connection.
* remove nodeInfo.NetAddress()
Expose `transport.NetAddress()`, a static result determined
when the transport is created. Removing NetAddress() from the nodeInfo
prevents accidental DNS lookups.
* fixes from review
* linter
* fixes from review
* p2p: refactor Switch#Broadcast func
- call wg.Add only once
- do not call peers.List twice!
* bad for perfomance
* peers list can change in between calls!
Refs #3306
* p2p: use time.Ticker instead of RepeatTimer
no need in RepeatTimer since we don't Reset them
Refs #3306
* libs/common: remove RepeatTimer (also TimerMaker and Ticker interface)
"ancient code that’s caused no end of trouble" Ethan
I believe there's much simplier way to write a ticker than can be reset
https://medium.com/@arpith/resetting-a-ticker-in-go-63858a2c17ec
* fix dirty data in peerset
startInitPeer before PeerSet add the peer,
once mconnection start and Receive of one
Reactor faild, will try to remove it from PeerSet
while PeerSet still not contain the peer. Fix
this by change the order.
* fix test FilterDuplicate
* fix start/stop race
* fix err
* close peer's connection to avoid fd leak
Fixes#2967
* rename peer#Addr to RemoteAddr
* fix test
* fixes after Ethan's review
* bring back the check
* changelog entry
* write a test for switch#acceptRoutine
* increase timeouts? :(
* remove extra assertNPeersWithTimeout
* simplify test
* assert number of peers (just to be safe)
* Cleanup in OnStop
* run tests with verbose flag on CircleCI
* spawn a reading routine to prevent connection from closing
* get port from the listener
random port is faster, but often results in
```
panic: listen tcp 127.0.0.1:44068: bind: address already in use [recovered]
panic: listen tcp 127.0.0.1:44068: bind: address already in use
goroutine 79 [running]:
testing.tRunner.func1(0xc0001bd600)
/usr/local/go/src/testing/testing.go:792 +0x387
panic(0x974d20, 0xc0001b0500)
/usr/local/go/src/runtime/panic.go:513 +0x1b9
github.com/tendermint/tendermint/p2p.MakeSwitch(0xc0000f42a0, 0x0, 0x9fb9cc, 0x9, 0x9fc346, 0xb, 0xb42128, 0x0, 0x0, 0x0, ...)
/home/vagrant/go/src/github.com/tendermint/tendermint/p2p/test_util.go:182 +0xa28
github.com/tendermint/tendermint/p2p.MakeConnectedSwitches(0xc0000f42a0, 0x2, 0xb42128, 0xb41eb8, 0x4f1205, 0xc0001bed80, 0x4f16ed)
/home/vagrant/go/src/github.com/tendermint/tendermint/p2p/test_util.go:75 +0xf9
github.com/tendermint/tendermint/p2p.MakeSwitchPair(0xbb8d20, 0xc0001bd600, 0xb42128, 0x2f7, 0x4f16c0)
/home/vagrant/go/src/github.com/tendermint/tendermint/p2p/switch_test.go:94 +0x4c
github.com/tendermint/tendermint/p2p.TestSwitches(0xc0001bd600)
/home/vagrant/go/src/github.com/tendermint/tendermint/p2p/switch_test.go:117 +0x58
testing.tRunner(0xc0001bd600, 0xb42038)
/usr/local/go/src/testing/testing.go:827 +0xbf
created by testing.(*T).Run
/usr/local/go/src/testing/testing.go:878 +0x353
exit status 2
FAIL github.com/tendermint/tendermint/p2p 0.350s
```
* p2p: panic on transport error
Addresses #2823. Currently, the acceptRoutine exits if the transport returns
an error trying to accept a new connection. Once this happens, the node
can't accept any new connections. So here, we panic instead. While we
could potentially be more intelligent by rerunning the acceptRoutine, the
error may indicate something more fundamental (eg. file desriptor limit)
that requires a restart anyways. We can leave it to process managers to
handle that restart, and notify operators about the panic.
* changelog
* p2p: re-check after sleeps
* use NodeInfo as an interface
* Revert "use NodeInfo as an interface"
This reverts commit 5f7d055e6c.
* Revert "p2p: re-check after sleeps"
This reverts commit 7f41070da0.
* preserve dial to itself
* ignore ensured connections while re-connecting
* re-check after sleep
* keep protocol definition on net addresses
* decrease log level
* Revert "preserve dial to itself"
This reverts commit 0c6e0fc58d.
* correct func comment according to modification
Co-Authored-By: mgurevin <mehmet@gurevin.net>
* Require addressbook to only store addresses with valid ID
* Do not shut down peer immediately after sending pex addrs in SeedMode
* p2p: fix#2773
* seed mode: use go-routine to sleep before stopping peer
* p2p: NodeInfo is an interface
* (squash) fixes from review
* (squash) more fixes from review
* p2p: remove peerConn.HandshakeTimeout
* p2p: NodeInfo is two interfaces. Remove String()
* fixes from review
* remove test code from peer.RemoteIP()
* p2p: remove peer.OriginalAddr(). See #2618
* use a mockPeer in peer_set_test.go
* p2p: fix testNodeInfo naming
* p2p: remove unused var
* remove testRandNodeInfo
* fix linter
* fix retry dialing self
* fix rpc
We are swapping the exisiting listener implementation with the newly
introduced Transport and its default implementation MultiplexTransport,
removing a large chunk of old connection setup and handling scattered
over the Peer and Switch code. The Switch requires a Transport now and
handles externally passed Peer filters.
* ignore existing peers in DialPeersAsync
Fixes#2253
* rename HasPeerWithAddress to IsDialingOrExistingAddress
[breaking] remove Switch#IsDialing
* check if addrBook is nil
to be consistent with other usages of addrBook across Switch
* different log messages for 2 use-cases
* [p2p/pex] connect to more than 10 peers
also, remove DefaultMinNumOutboundPeers because a) I am not sure it's
needed b) it's super confusing
look closely
```
maxPeers := sw.config.MaxNumPeers - DefaultMinNumOutboundPeers
if maxPeers <= sw.peers.Size() {
sw.Logger.Info("Ignoring inbound connection: already have enough peers", "address", inConn.RemoteAddr().String(), "numPeers", sw.peers.Size(), "max", maxPeers)
```
we print maxPeers = config.MaxPeers - DefaultMinNumOutboundPeers. So we
may not have enough peers even though we say we have enough.
Refs #2130
* update spec
* replace MaxNumPeers with MaxNumInboundPeers/MaxNumOutboundPeers
Refs #2130
* update changelog
* make max rpc conns formula visible to users
* update spec
* docs: note max outbound peers excludes persistent