While working on tendermint my colleague @jinmannwong fixed a few of the unit tests that we found to be flaky in our CI. We thought that you might find this useful, see below for comments.
Attempt to fix this the below test failure by waiting for the listener to get ready. I am not at all convinced that this is the correct fix - the below indicates that the TCP socket was closed after it was set up - but I'm unable to come up with an actionable hypothesis for what caused it.
```
2020/05/14 17:25:11 Failed to accept conn: accept tcp 127.0.0.1:44737: use of closed network connection
2020/05/14 17:25:11 Failed to accept conn: accept tcp 127.0.0.1:42589: use of closed network connection
2020/05/14 17:25:11 Failed to accept conn: accept tcp 127.0.0.1:40905: use of closed network connection
2020/05/14 17:25:12 Failed to accept conn: accept tcp 127.0.0.1:39847: use of closed network connection
2020/05/14 17:25:12 Failed to accept conn: accept tcp 127.0.0.1:39989: use of closed network connection
2020/05/14 17:25:12 Failed to accept conn: accept tcp 127.0.0.1:43587: use of closed network connection
2020/05/14 17:25:12 Failed to accept conn: accept tcp 127.0.0.1:35415: use of closed network connection
2020/05/14 17:25:12 Failed to accept conn: accept tcp 127.0.0.1:38657: use of closed network connection
2020/05/14 17:25:12 Failed to accept conn: accept tcp 127.0.0.1:38217: use of closed network connection
2020/05/14 17:25:13 Failed to accept conn: accept tcp 127.0.0.1:42247: use of closed network connection
2020/05/14 17:25:16 Failed to accept conn: accept tcp 127.0.0.1:39705: use of closed network connection
2020/05/14 17:25:16 Failed to accept conn: accept tcp 127.0.0.1:39491: use of closed network connection
2020/05/14 17:25:16 Failed to accept conn: accept tcp 127.0.0.1:37107: use of closed network connection
2020/05/14 17:25:16 Failed to accept conn: accept tcp 127.0.0.1:39909: use of closed network connection
2020/05/14 17:25:16 Failed to accept conn: accept tcp 127.0.0.1:37987: use of closed network connection
2020/05/14 17:25:16 Failed to accept conn: accept tcp 127.0.0.1:41505: use of closed network connection
2020/05/14 17:25:16 Failed to accept conn: accept tcp 127.0.0.1:39121: use of closed network connection
2020/05/14 17:25:16 Failed to accept conn: accept tcp 127.0.0.1:46569: use of closed network connection
2020/05/14 17:25:16 Failed to accept conn: accept tcp 127.0.0.1:45643: use of closed network connection
2020/05/14 17:25:16 Failed to accept conn: accept tcp 127.0.0.1:35289: use of closed network connection
--- FAIL: TestTransportMultiplexAcceptMultiple (0.43s)
transport_test.go:200: auth failure: handshake failed: EOF
FAIL
```
* p2p: log error in transport tests
* p2p: exit from fast peer only when handshake is done
TestTransportMultiplexAcceptNonBlocking
fixes panic: write to a closed channel
* p2p: increase timeout in TestTransportMultiplexConnFilterTimeout
Fixes https://github.com/tendermint/tendermint/issues/4854#issuecomment-630739200
* p2p: yield control to another goroutine manually
* increase timeout in TestTransportMultiplexAcceptNonBlocking
Fixes#3521
The function NewNetAddressStringWithOptionalID is from a time when peer
IDs were optional. They're not anymore. So this should be renamed to
NewNetAddressString and should ensure the ID is provided.
* update changelog
* use NewNetAddress in transport tests
* use NewNetAddress in TestTransportMultiplexAcceptMultiple
* OriginalAddr -> SocketAddr
OriginalAddr records the originally dialed address for outbound peers,
rather than the peer's self reported address. For inbound peers, it was
nil. Here, we rename it to SocketAddr and for inbound peers, set it to
the RemoteAddr of the connection.
* use SocketAddr
Numerous places in the code call peer.NodeInfo().NetAddress().
However, this call to NetAddress() may perform a DNS lookup if the
reported NodeInfo.ListenAddr includes a name. Failure of this lookup
returns a nil address, which can lead to panics in the code.
Instead, call peer.SocketAddr() to return the static address of the
connection.
* remove nodeInfo.NetAddress()
Expose `transport.NetAddress()`, a static result determined
when the transport is created. Removing NetAddress() from the nodeInfo
prevents accidental DNS lookups.
* fixes from review
* linter
* fixes from review
ref: [#3010 (comment)](https://github.com/tendermint/tendermint/issues/3010#issuecomment-464287627)
> I tried searching for code where we authenticate a peer against its NetAddress.ID and couldn't find it. I don't see a reason to switch to Noise, but a need to ensure that the node's ID is authenticated e.g. after dialing from the address book.
* p2p: check secret conn id matches dialed id
* Fix all p2p tests & make code compile
* add simple test for dialing with wrong ID
* update changelog
* address review comments
* yet another place where to use IDAddressString and fix
testSetupMultiplexTransport
* p2p: NodeInfo is an interface
* (squash) fixes from review
* (squash) more fixes from review
* p2p: remove peerConn.HandshakeTimeout
* p2p: NodeInfo is two interfaces. Remove String()
* fixes from review
* remove test code from peer.RemoteIP()
* p2p: remove peer.OriginalAddr(). See #2618
* use a mockPeer in peer_set_test.go
* p2p: fix testNodeInfo naming
* p2p: remove unused var
* remove testRandNodeInfo
* fix linter
* fix retry dialing self
* fix rpc
This is the implementation for the design described in ADR 12[0]. It's
the first step of a larger refactor of the p2p package as tracked in
interface bundling all concerns of low-level connection handling and
isolating the rest of peer lifecycle management from the specifics of
the low-level internet protocols. Even if the swappable implementation
will never be utilised, already the isolation of conn related code in
one place will help with the reasoning about execution path and
addressation of security sensitive issues surfaced through bounty
programs and audits.
We deliberately decided to not have Peer filtering and other management
in the Transport, its sole responsibility is the translation of
connections to Peers, handing those to the caller fully setup. It's the
responsibility of the caller to reject those and or keep track. Peer
filtering will take place in the Switch and can be inspected in a the
following commit.
This changeset additionally is an exercise in clean separation of logic
and other infrastructural concerns like logging and instrumentation. By
leveraging a clean and minimal interface. How this looks can be seen in
a follow-up change.
Design #2069[2]
Refs #2067[3]
Fixes #2047[4]
Fixes #2046[5]
changes:
* describe Transport interface
* implement new default Transport: MultiplexTransport
* test MultiplexTransport with new constraints
* implement ConnSet for concurrent management of net.Conn, synchronous
to PeerSet
* implement and expose duplicate IP filter
* implemnt TransportOption for optional parametirisation
[0] https://github.com/tendermint/tendermint/blob/master/docs/architecture/adr-012-peer-transport.md
[1] https://github.com/tendermint/tendermint/issues/2067
[2] https://github.com/tendermint/tendermint/pull/2069
[3] https://github.com/tendermint/tendermint/issues/2067
[4] https://github.com/tendermint/tendermint/issues/2047
[5] https://github.com/tendermint/tendermint/issues/2046