p2p: implement new Transport interface (#5791)

This implements a new `Transport` interface and related types for the P2P refactor in #5670. Previously, `conn.MConnection` was very tightly coupled to the `Peer` implementation; in order to allow alternative non-multiplexed transports (e.g. QUIC), MConnection has now been moved below the `Transport` interface, as `MConnTransport`, and decoupled from the peer. Since the `p2p` package is not covered by our Go API stability guarantees, this is not considered a breaking change and is not listed in the changelog.

The initial approach was to implement the new interface in its final form (which also involved possible protocol changes, see https://github.com/tendermint/spec/pull/227). However, it turned out that this would require a large number of changes to existing P2P code because of the tight coupling between `Peer` and `MConnection` and the reliance on subtleties in MConnection's behavior. Instead, I have broadened the `Transport` interface to expose much of the existing MConnection interface, preserved much of the existing MConnection logic and behavior in the transport implementation, and tried to make as few changes to the rest of the P2P stack as possible. We will instead reduce this interface gradually as we refactor other parts of the P2P stack.

The low-level transport code and protocol (e.g. MConnection, SecretConnection, and so on) have not been significantly changed, and refactoring them is not a priority until we come up with a plan for QUIC adoption, as we may end up discarding the MConnection code entirely.

There are no tests of the new `MConnTransport`, as this code is likely to evolve as we proceed with the P2P refactor, but tests should be added before a final release. The E2E tests are sufficient for basic validation in the meantime.
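For illustration, here is a minimal Go sketch of the shape such a transport abstraction can take. Every name and signature below (`Connection`, `SendMessage`, `ReceiveMessage`, the `Dial`/`Accept` methods) is hypothetical rather than the actual `p2p` API from this commit; the point is only how MConnection-style multiplexing can sit behind a transport-agnostic interface.

```go
// Package p2p: hypothetical sketch, not the actual tendermint/p2p API.
package p2p

import (
	"context"
	"io"
)

// Connection is a single peer connection established by a Transport.
// A multiplexed implementation (e.g. one backed by MConnection) would
// carry several logical channels over one underlying connection.
type Connection interface {
	// SendMessage sends a message on the given logical channel ID.
	SendMessage(chID byte, msg []byte) error
	// ReceiveMessage blocks until a message arrives on any channel.
	ReceiveMessage() (chID byte, msg []byte, err error)
	io.Closer
}

// Transport establishes and accepts connections, hiding the wire
// protocol (TCP+MConnection today, possibly QUIC later) from the
// rest of the P2P stack.
type Transport interface {
	// Accept blocks until an inbound connection arrives, following
	// the net.Listener convention from the Go standard library.
	Accept(ctx context.Context) (Connection, error)
	// Dial opens an outbound connection to the given address.
	Dial(ctx context.Context, addr string) (Connection, error)
	io.Closer
}
```

Keeping channel-level send/receive on `Connection` is what lets the existing MConnection logic live below the interface while non-multiplexed transports implement the same surface.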
p2p: make PeerManager.DialNext() and EvictNext() block (#5947)

See #5936 and #5938 for background.

The initial plan was to have `DialNext()` and `EvictNext()` return a channel. However, implementing this became unnecessarily complicated and error-prone. For example, the channel would be both consumed and populated (via method calls) by the same driving method (e.g. `Router.dialPeers()`), which could easily cause deadlocks: a method call could block while sending on the channel that the caller itself was responsible for consuming, but couldn't consume, since it was busy making the method call. It would also require a set of goroutines in the peer manager that would interact with the goroutines in the router in non-obvious ways, and fully populating the channel on startup could cause deadlocks with other startup tasks. Several issues like these made the solution hard to reason about.

I therefore simply made `DialNext()` and `EvictNext()` block until the next peer is available, using internal triggers to wake these methods up in a non-blocking fashion when any relevant state change occurs. This proved much simpler to reason about, since there are no goroutines in the peer manager (except for trivial retry timers) and no blocking channel sends; it relies entirely on the existing goroutine structure of the router for concurrency. This also happens to be the same pattern used by the `Transport.Accept()` API, following Go standard library conventions, so all router goroutines end up using a consistent pattern as well.
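A minimal sketch of the blocking-with-wakeup pattern described above, under the assumption of a single dialable-address queue; all names are illustrative and the real `PeerManager` tracks far more state:

```go
// Package p2p: illustrative sketch of a blocking DialNext with
// non-blocking internal wakeup triggers; not the actual implementation.
package p2p

import (
	"context"
	"sync"
)

type peerManager struct {
	mtx      sync.Mutex
	dialable []string      // candidate peer addresses (illustrative)
	wakeDial chan struct{} // buffered with capacity 1
}

func newPeerManager() *peerManager {
	return &peerManager{wakeDial: make(chan struct{}, 1)}
}

// notifyDial wakes a blocked DialNext without ever blocking itself:
// if a wakeup is already pending, the send is simply dropped.
func (m *peerManager) notifyDial() {
	select {
	case m.wakeDial <- struct{}{}:
	default:
	}
}

// Add records a new candidate peer and triggers a dial wakeup.
func (m *peerManager) Add(addr string) {
	m.mtx.Lock()
	m.dialable = append(m.dialable, addr)
	m.mtx.Unlock()
	m.notifyDial()
}

// DialNext blocks until a peer is available to dial or ctx is done,
// mirroring the blocking style of Transport.Accept().
func (m *peerManager) DialNext(ctx context.Context) (string, error) {
	for {
		m.mtx.Lock()
		if len(m.dialable) > 0 {
			addr := m.dialable[0]
			m.dialable = m.dialable[1:]
			m.mtx.Unlock()
			return addr, nil
		}
		m.mtx.Unlock()
		select {
		case <-m.wakeDial:
			// State may have changed; loop and re-check under the lock.
		case <-ctx.Done():
			return "", ctx.Err()
		}
	}
}
```

The buffered channel of capacity 1 combined with a non-blocking send is what makes the triggers safe: state-mutating callers never wait on a consumer, pending wakeups coalesce, and the only goroutine that blocks is the router goroutine already dedicated to dialing.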
4 years ago
p2p: make PeerManager.DialNext() and EvictNext() block (#5947) See #5936 and #5938 for background. The plan was initially to have `DialNext()` and `EvictNext()` return a channel. However, implementing this became unnecessarily complicated and error-prone. As an example, the channel would be both consumed and populated (via method calls) by the same driving method (e.g. `Router.dialPeers()`) which could easily cause deadlocks where a method call blocked while sending on the channel that the caller itself was responsible for consuming (but couldn't since it was busy making the method call). It would also require a set of goroutines in the peer manager that would interact with the goroutines in the router in non-obvious ways, and fully populating the channel on startup could cause deadlocks with other startup tasks. Several issues like these made the solution hard to reason about. I therefore simply made `DialNext()` and `EvictNext()` block until the next peer was available, using internal triggers to wake these methods up in a non-blocking fashion when any relevant state changes occurred. This proved much simpler to reason about, since there are no goroutines in the peer manager (except for trivial retry timers), nor any blocking channel sends, and it instead relies entirely on the existing goroutine structure of the router for concurrency. This also happens to be the same pattern used by the `Transport.Accept()` API, following Go stdlib conventions, so all router goroutines end up using a consistent pattern as well.
4 years ago
p2p: make PeerManager.DialNext() and EvictNext() block (#5947) See #5936 and #5938 for background. The plan was initially to have `DialNext()` and `EvictNext()` return a channel. However, implementing this became unnecessarily complicated and error-prone. As an example, the channel would be both consumed and populated (via method calls) by the same driving method (e.g. `Router.dialPeers()`) which could easily cause deadlocks where a method call blocked while sending on the channel that the caller itself was responsible for consuming (but couldn't since it was busy making the method call). It would also require a set of goroutines in the peer manager that would interact with the goroutines in the router in non-obvious ways, and fully populating the channel on startup could cause deadlocks with other startup tasks. Several issues like these made the solution hard to reason about. I therefore simply made `DialNext()` and `EvictNext()` block until the next peer was available, using internal triggers to wake these methods up in a non-blocking fashion when any relevant state changes occurred. This proved much simpler to reason about, since there are no goroutines in the peer manager (except for trivial retry timers), nor any blocking channel sends, and it instead relies entirely on the existing goroutine structure of the router for concurrency. This also happens to be the same pattern used by the `Transport.Accept()` API, following Go stdlib conventions, so all router goroutines end up using a consistent pattern as well.
4 years ago
p2p: make PeerManager.DialNext() and EvictNext() block (#5947) See #5936 and #5938 for background. The plan was initially to have `DialNext()` and `EvictNext()` return a channel. However, implementing this became unnecessarily complicated and error-prone. As an example, the channel would be both consumed and populated (via method calls) by the same driving method (e.g. `Router.dialPeers()`) which could easily cause deadlocks where a method call blocked while sending on the channel that the caller itself was responsible for consuming (but couldn't since it was busy making the method call). It would also require a set of goroutines in the peer manager that would interact with the goroutines in the router in non-obvious ways, and fully populating the channel on startup could cause deadlocks with other startup tasks. Several issues like these made the solution hard to reason about. I therefore simply made `DialNext()` and `EvictNext()` block until the next peer was available, using internal triggers to wake these methods up in a non-blocking fashion when any relevant state changes occurred. This proved much simpler to reason about, since there are no goroutines in the peer manager (except for trivial retry timers), nor any blocking channel sends, and it instead relies entirely on the existing goroutine structure of the router for concurrency. This also happens to be the same pattern used by the `Transport.Accept()` API, following Go stdlib conventions, so all router goroutines end up using a consistent pattern as well.
4 years ago
p2p: make PeerManager.DialNext() and EvictNext() block (#5947) See #5936 and #5938 for background. The plan was initially to have `DialNext()` and `EvictNext()` return a channel. However, implementing this became unnecessarily complicated and error-prone. As an example, the channel would be both consumed and populated (via method calls) by the same driving method (e.g. `Router.dialPeers()`) which could easily cause deadlocks where a method call blocked while sending on the channel that the caller itself was responsible for consuming (but couldn't since it was busy making the method call). It would also require a set of goroutines in the peer manager that would interact with the goroutines in the router in non-obvious ways, and fully populating the channel on startup could cause deadlocks with other startup tasks. Several issues like these made the solution hard to reason about. I therefore simply made `DialNext()` and `EvictNext()` block until the next peer was available, using internal triggers to wake these methods up in a non-blocking fashion when any relevant state changes occurred. This proved much simpler to reason about, since there are no goroutines in the peer manager (except for trivial retry timers), nor any blocking channel sends, and it instead relies entirely on the existing goroutine structure of the router for concurrency. This also happens to be the same pattern used by the `Transport.Accept()` API, following Go stdlib conventions, so all router goroutines end up using a consistent pattern as well.
4 years ago
p2p: make PeerManager.DialNext() and EvictNext() block (#5947) See #5936 and #5938 for background. The plan was initially to have `DialNext()` and `EvictNext()` return a channel. However, implementing this became unnecessarily complicated and error-prone. As an example, the channel would be both consumed and populated (via method calls) by the same driving method (e.g. `Router.dialPeers()`) which could easily cause deadlocks where a method call blocked while sending on the channel that the caller itself was responsible for consuming (but couldn't since it was busy making the method call). It would also require a set of goroutines in the peer manager that would interact with the goroutines in the router in non-obvious ways, and fully populating the channel on startup could cause deadlocks with other startup tasks. Several issues like these made the solution hard to reason about. I therefore simply made `DialNext()` and `EvictNext()` block until the next peer was available, using internal triggers to wake these methods up in a non-blocking fashion when any relevant state changes occurred. This proved much simpler to reason about, since there are no goroutines in the peer manager (except for trivial retry timers), nor any blocking channel sends, and it instead relies entirely on the existing goroutine structure of the router for concurrency. This also happens to be the same pattern used by the `Transport.Accept()` API, following Go stdlib conventions, so all router goroutines end up using a consistent pattern as well.
4 years ago
p2p: make PeerManager.DialNext() and EvictNext() block (#5947) See #5936 and #5938 for background. The plan was initially to have `DialNext()` and `EvictNext()` return a channel. However, implementing this became unnecessarily complicated and error-prone. As an example, the channel would be both consumed and populated (via method calls) by the same driving method (e.g. `Router.dialPeers()`) which could easily cause deadlocks where a method call blocked while sending on the channel that the caller itself was responsible for consuming (but couldn't since it was busy making the method call). It would also require a set of goroutines in the peer manager that would interact with the goroutines in the router in non-obvious ways, and fully populating the channel on startup could cause deadlocks with other startup tasks. Several issues like these made the solution hard to reason about. I therefore simply made `DialNext()` and `EvictNext()` block until the next peer was available, using internal triggers to wake these methods up in a non-blocking fashion when any relevant state changes occurred. This proved much simpler to reason about, since there are no goroutines in the peer manager (except for trivial retry timers), nor any blocking channel sends, and it instead relies entirely on the existing goroutine structure of the router for concurrency. This also happens to be the same pattern used by the `Transport.Accept()` API, following Go stdlib conventions, so all router goroutines end up using a consistent pattern as well.
4 years ago
p2p: make PeerManager.DialNext() and EvictNext() block (#5947) See #5936 and #5938 for background. The plan was initially to have `DialNext()` and `EvictNext()` return a channel. However, implementing this became unnecessarily complicated and error-prone. As an example, the channel would be both consumed and populated (via method calls) by the same driving method (e.g. `Router.dialPeers()`) which could easily cause deadlocks where a method call blocked while sending on the channel that the caller itself was responsible for consuming (but couldn't since it was busy making the method call). It would also require a set of goroutines in the peer manager that would interact with the goroutines in the router in non-obvious ways, and fully populating the channel on startup could cause deadlocks with other startup tasks. Several issues like these made the solution hard to reason about. I therefore simply made `DialNext()` and `EvictNext()` block until the next peer was available, using internal triggers to wake these methods up in a non-blocking fashion when any relevant state changes occurred. This proved much simpler to reason about, since there are no goroutines in the peer manager (except for trivial retry timers), nor any blocking channel sends, and it instead relies entirely on the existing goroutine structure of the router for concurrency. This also happens to be the same pattern used by the `Transport.Accept()` API, following Go stdlib conventions, so all router goroutines end up using a consistent pattern as well.
4 years ago
p2p: make PeerManager.DialNext() and EvictNext() block (#5947) See #5936 and #5938 for background. The plan was initially to have `DialNext()` and `EvictNext()` return a channel. However, implementing this became unnecessarily complicated and error-prone. As an example, the channel would be both consumed and populated (via method calls) by the same driving method (e.g. `Router.dialPeers()`) which could easily cause deadlocks where a method call blocked while sending on the channel that the caller itself was responsible for consuming (but couldn't since it was busy making the method call). It would also require a set of goroutines in the peer manager that would interact with the goroutines in the router in non-obvious ways, and fully populating the channel on startup could cause deadlocks with other startup tasks. Several issues like these made the solution hard to reason about. I therefore simply made `DialNext()` and `EvictNext()` block until the next peer was available, using internal triggers to wake these methods up in a non-blocking fashion when any relevant state changes occurred. This proved much simpler to reason about, since there are no goroutines in the peer manager (except for trivial retry timers), nor any blocking channel sends, and it instead relies entirely on the existing goroutine structure of the router for concurrency. This also happens to be the same pattern used by the `Transport.Accept()` API, following Go stdlib conventions, so all router goroutines end up using a consistent pattern as well.
4 years ago
p2p: make PeerManager.DialNext() and EvictNext() block (#5947) See #5936 and #5938 for background. The plan was initially to have `DialNext()` and `EvictNext()` return a channel. However, implementing this became unnecessarily complicated and error-prone. As an example, the channel would be both consumed and populated (via method calls) by the same driving method (e.g. `Router.dialPeers()`) which could easily cause deadlocks where a method call blocked while sending on the channel that the caller itself was responsible for consuming (but couldn't since it was busy making the method call). It would also require a set of goroutines in the peer manager that would interact with the goroutines in the router in non-obvious ways, and fully populating the channel on startup could cause deadlocks with other startup tasks. Several issues like these made the solution hard to reason about. I therefore simply made `DialNext()` and `EvictNext()` block until the next peer was available, using internal triggers to wake these methods up in a non-blocking fashion when any relevant state changes occurred. This proved much simpler to reason about, since there are no goroutines in the peer manager (except for trivial retry timers), nor any blocking channel sends, and it instead relies entirely on the existing goroutine structure of the router for concurrency. This also happens to be the same pattern used by the `Transport.Accept()` API, following Go stdlib conventions, so all router goroutines end up using a consistent pattern as well.
4 years ago
p2p: make PeerManager.DialNext() and EvictNext() block (#5947) See #5936 and #5938 for background. The plan was initially to have `DialNext()` and `EvictNext()` return a channel. However, implementing this became unnecessarily complicated and error-prone. As an example, the channel would be both consumed and populated (via method calls) by the same driving method (e.g. `Router.dialPeers()`) which could easily cause deadlocks where a method call blocked while sending on the channel that the caller itself was responsible for consuming (but couldn't since it was busy making the method call). It would also require a set of goroutines in the peer manager that would interact with the goroutines in the router in non-obvious ways, and fully populating the channel on startup could cause deadlocks with other startup tasks. Several issues like these made the solution hard to reason about. I therefore simply made `DialNext()` and `EvictNext()` block until the next peer was available, using internal triggers to wake these methods up in a non-blocking fashion when any relevant state changes occurred. This proved much simpler to reason about, since there are no goroutines in the peer manager (except for trivial retry timers), nor any blocking channel sends, and it instead relies entirely on the existing goroutine structure of the router for concurrency. This also happens to be the same pattern used by the `Transport.Accept()` API, following Go stdlib conventions, so all router goroutines end up using a consistent pattern as well.
4 years ago
p2p: make PeerManager.DialNext() and EvictNext() block (#5947) See #5936 and #5938 for background. The plan was initially to have `DialNext()` and `EvictNext()` return a channel. However, implementing this became unnecessarily complicated and error-prone. As an example, the channel would be both consumed and populated (via method calls) by the same driving method (e.g. `Router.dialPeers()`) which could easily cause deadlocks where a method call blocked while sending on the channel that the caller itself was responsible for consuming (but couldn't since it was busy making the method call). It would also require a set of goroutines in the peer manager that would interact with the goroutines in the router in non-obvious ways, and fully populating the channel on startup could cause deadlocks with other startup tasks. Several issues like these made the solution hard to reason about. I therefore simply made `DialNext()` and `EvictNext()` block until the next peer was available, using internal triggers to wake these methods up in a non-blocking fashion when any relevant state changes occurred. This proved much simpler to reason about, since there are no goroutines in the peer manager (except for trivial retry timers), nor any blocking channel sends, and it instead relies entirely on the existing goroutine structure of the router for concurrency. This also happens to be the same pattern used by the `Transport.Accept()` API, following Go stdlib conventions, so all router goroutines end up using a consistent pattern as well.
4 years ago
p2p: make PeerManager.DialNext() and EvictNext() block (#5947) See #5936 and #5938 for background. The plan was initially to have `DialNext()` and `EvictNext()` return a channel. However, implementing this became unnecessarily complicated and error-prone. As an example, the channel would be both consumed and populated (via method calls) by the same driving method (e.g. `Router.dialPeers()`) which could easily cause deadlocks where a method call blocked while sending on the channel that the caller itself was responsible for consuming (but couldn't since it was busy making the method call). It would also require a set of goroutines in the peer manager that would interact with the goroutines in the router in non-obvious ways, and fully populating the channel on startup could cause deadlocks with other startup tasks. Several issues like these made the solution hard to reason about. I therefore simply made `DialNext()` and `EvictNext()` block until the next peer was available, using internal triggers to wake these methods up in a non-blocking fashion when any relevant state changes occurred. This proved much simpler to reason about, since there are no goroutines in the peer manager (except for trivial retry timers), nor any blocking channel sends, and it instead relies entirely on the existing goroutine structure of the router for concurrency. This also happens to be the same pattern used by the `Transport.Accept()` API, following Go stdlib conventions, so all router goroutines end up using a consistent pattern as well.
4 years ago
p2p: make PeerManager.DialNext() and EvictNext() block (#5947) See #5936 and #5938 for background. The plan was initially to have `DialNext()` and `EvictNext()` return a channel. However, implementing this became unnecessarily complicated and error-prone. As an example, the channel would be both consumed and populated (via method calls) by the same driving method (e.g. `Router.dialPeers()`) which could easily cause deadlocks where a method call blocked while sending on the channel that the caller itself was responsible for consuming (but couldn't since it was busy making the method call). It would also require a set of goroutines in the peer manager that would interact with the goroutines in the router in non-obvious ways, and fully populating the channel on startup could cause deadlocks with other startup tasks. Several issues like these made the solution hard to reason about. I therefore simply made `DialNext()` and `EvictNext()` block until the next peer was available, using internal triggers to wake these methods up in a non-blocking fashion when any relevant state changes occurred. This proved much simpler to reason about, since there are no goroutines in the peer manager (except for trivial retry timers), nor any blocking channel sends, and it instead relies entirely on the existing goroutine structure of the router for concurrency. This also happens to be the same pattern used by the `Transport.Accept()` API, following Go stdlib conventions, so all router goroutines end up using a consistent pattern as well.
4 years ago
p2p: make PeerManager.DialNext() and EvictNext() block (#5947) See #5936 and #5938 for background. The plan was initially to have `DialNext()` and `EvictNext()` return a channel. However, implementing this became unnecessarily complicated and error-prone. As an example, the channel would be both consumed and populated (via method calls) by the same driving method (e.g. `Router.dialPeers()`) which could easily cause deadlocks where a method call blocked while sending on the channel that the caller itself was responsible for consuming (but couldn't since it was busy making the method call). It would also require a set of goroutines in the peer manager that would interact with the goroutines in the router in non-obvious ways, and fully populating the channel on startup could cause deadlocks with other startup tasks. Several issues like these made the solution hard to reason about. I therefore simply made `DialNext()` and `EvictNext()` block until the next peer was available, using internal triggers to wake these methods up in a non-blocking fashion when any relevant state changes occurred. This proved much simpler to reason about, since there are no goroutines in the peer manager (except for trivial retry timers), nor any blocking channel sends, and it instead relies entirely on the existing goroutine structure of the router for concurrency. This also happens to be the same pattern used by the `Transport.Accept()` API, following Go stdlib conventions, so all router goroutines end up using a consistent pattern as well.
4 years ago
p2p: make PeerManager.DialNext() and EvictNext() block (#5947) See #5936 and #5938 for background. The plan was initially to have `DialNext()` and `EvictNext()` return a channel. However, implementing this became unnecessarily complicated and error-prone. As an example, the channel would be both consumed and populated (via method calls) by the same driving method (e.g. `Router.dialPeers()`) which could easily cause deadlocks where a method call blocked while sending on the channel that the caller itself was responsible for consuming (but couldn't since it was busy making the method call). It would also require a set of goroutines in the peer manager that would interact with the goroutines in the router in non-obvious ways, and fully populating the channel on startup could cause deadlocks with other startup tasks. Several issues like these made the solution hard to reason about. I therefore simply made `DialNext()` and `EvictNext()` block until the next peer was available, using internal triggers to wake these methods up in a non-blocking fashion when any relevant state changes occurred. This proved much simpler to reason about, since there are no goroutines in the peer manager (except for trivial retry timers), nor any blocking channel sends, and it instead relies entirely on the existing goroutine structure of the router for concurrency. This also happens to be the same pattern used by the `Transport.Accept()` API, following Go stdlib conventions, so all router goroutines end up using a consistent pattern as well.
4 years ago
p2p: make PeerManager.DialNext() and EvictNext() block (#5947) See #5936 and #5938 for background. The plan was initially to have `DialNext()` and `EvictNext()` return a channel. However, implementing this became unnecessarily complicated and error-prone. As an example, the channel would be both consumed and populated (via method calls) by the same driving method (e.g. `Router.dialPeers()`) which could easily cause deadlocks where a method call blocked while sending on the channel that the caller itself was responsible for consuming (but couldn't since it was busy making the method call). It would also require a set of goroutines in the peer manager that would interact with the goroutines in the router in non-obvious ways, and fully populating the channel on startup could cause deadlocks with other startup tasks. Several issues like these made the solution hard to reason about. I therefore simply made `DialNext()` and `EvictNext()` block until the next peer was available, using internal triggers to wake these methods up in a non-blocking fashion when any relevant state changes occurred. This proved much simpler to reason about, since there are no goroutines in the peer manager (except for trivial retry timers), nor any blocking channel sends, and it instead relies entirely on the existing goroutine structure of the router for concurrency. This also happens to be the same pattern used by the `Transport.Accept()` API, following Go stdlib conventions, so all router goroutines end up using a consistent pattern as well.
4 years ago
p2p: make PeerManager.DialNext() and EvictNext() block (#5947) See #5936 and #5938 for background. The plan was initially to have `DialNext()` and `EvictNext()` return a channel. However, implementing this became unnecessarily complicated and error-prone. As an example, the channel would be both consumed and populated (via method calls) by the same driving method (e.g. `Router.dialPeers()`) which could easily cause deadlocks where a method call blocked while sending on the channel that the caller itself was responsible for consuming (but couldn't since it was busy making the method call). It would also require a set of goroutines in the peer manager that would interact with the goroutines in the router in non-obvious ways, and fully populating the channel on startup could cause deadlocks with other startup tasks. Several issues like these made the solution hard to reason about. I therefore simply made `DialNext()` and `EvictNext()` block until the next peer was available, using internal triggers to wake these methods up in a non-blocking fashion when any relevant state changes occurred. This proved much simpler to reason about, since there are no goroutines in the peer manager (except for trivial retry timers), nor any blocking channel sends, and it instead relies entirely on the existing goroutine structure of the router for concurrency. This also happens to be the same pattern used by the `Transport.Accept()` API, following Go stdlib conventions, so all router goroutines end up using a consistent pattern as well.
4 years ago
p2p: make PeerManager.DialNext() and EvictNext() block (#5947) See #5936 and #5938 for background. The plan was initially to have `DialNext()` and `EvictNext()` return a channel. However, implementing this became unnecessarily complicated and error-prone. As an example, the channel would be both consumed and populated (via method calls) by the same driving method (e.g. `Router.dialPeers()`) which could easily cause deadlocks where a method call blocked while sending on the channel that the caller itself was responsible for consuming (but couldn't since it was busy making the method call). It would also require a set of goroutines in the peer manager that would interact with the goroutines in the router in non-obvious ways, and fully populating the channel on startup could cause deadlocks with other startup tasks. Several issues like these made the solution hard to reason about. I therefore simply made `DialNext()` and `EvictNext()` block until the next peer was available, using internal triggers to wake these methods up in a non-blocking fashion when any relevant state changes occurred. This proved much simpler to reason about, since there are no goroutines in the peer manager (except for trivial retry timers), nor any blocking channel sends, and it instead relies entirely on the existing goroutine structure of the router for concurrency. This also happens to be the same pattern used by the `Transport.Accept()` API, following Go stdlib conventions, so all router goroutines end up using a consistent pattern as well.
4 years ago
p2p: make PeerManager.DialNext() and EvictNext() block (#5947) See #5936 and #5938 for background. The plan was initially to have `DialNext()` and `EvictNext()` return a channel. However, implementing this became unnecessarily complicated and error-prone. As an example, the channel would be both consumed and populated (via method calls) by the same driving method (e.g. `Router.dialPeers()`) which could easily cause deadlocks where a method call blocked while sending on the channel that the caller itself was responsible for consuming (but couldn't since it was busy making the method call). It would also require a set of goroutines in the peer manager that would interact with the goroutines in the router in non-obvious ways, and fully populating the channel on startup could cause deadlocks with other startup tasks. Several issues like these made the solution hard to reason about. I therefore simply made `DialNext()` and `EvictNext()` block until the next peer was available, using internal triggers to wake these methods up in a non-blocking fashion when any relevant state changes occurred. This proved much simpler to reason about, since there are no goroutines in the peer manager (except for trivial retry timers), nor any blocking channel sends, and it instead relies entirely on the existing goroutine structure of the router for concurrency. This also happens to be the same pattern used by the `Transport.Accept()` API, following Go stdlib conventions, so all router goroutines end up using a consistent pattern as well.
4 years ago
p2p: make PeerManager.DialNext() and EvictNext() block (#5947) See #5936 and #5938 for background. The plan was initially to have `DialNext()` and `EvictNext()` return a channel. However, implementing this became unnecessarily complicated and error-prone. As an example, the channel would be both consumed and populated (via method calls) by the same driving method (e.g. `Router.dialPeers()`) which could easily cause deadlocks where a method call blocked while sending on the channel that the caller itself was responsible for consuming (but couldn't since it was busy making the method call). It would also require a set of goroutines in the peer manager that would interact with the goroutines in the router in non-obvious ways, and fully populating the channel on startup could cause deadlocks with other startup tasks. Several issues like these made the solution hard to reason about. I therefore simply made `DialNext()` and `EvictNext()` block until the next peer was available, using internal triggers to wake these methods up in a non-blocking fashion when any relevant state changes occurred. This proved much simpler to reason about, since there are no goroutines in the peer manager (except for trivial retry timers), nor any blocking channel sends, and it instead relies entirely on the existing goroutine structure of the router for concurrency. This also happens to be the same pattern used by the `Transport.Accept()` API, following Go stdlib conventions, so all router goroutines end up using a consistent pattern as well.
4 years ago
p2p: make PeerManager.DialNext() and EvictNext() block (#5947) See #5936 and #5938 for background. The plan was initially to have `DialNext()` and `EvictNext()` return a channel. However, implementing this became unnecessarily complicated and error-prone. As an example, the channel would be both consumed and populated (via method calls) by the same driving method (e.g. `Router.dialPeers()`) which could easily cause deadlocks where a method call blocked while sending on the channel that the caller itself was responsible for consuming (but couldn't since it was busy making the method call). It would also require a set of goroutines in the peer manager that would interact with the goroutines in the router in non-obvious ways, and fully populating the channel on startup could cause deadlocks with other startup tasks. Several issues like these made the solution hard to reason about. I therefore simply made `DialNext()` and `EvictNext()` block until the next peer was available, using internal triggers to wake these methods up in a non-blocking fashion when any relevant state changes occurred. This proved much simpler to reason about, since there are no goroutines in the peer manager (except for trivial retry timers), nor any blocking channel sends, and it instead relies entirely on the existing goroutine structure of the router for concurrency. This also happens to be the same pattern used by the `Transport.Accept()` API, following Go stdlib conventions, so all router goroutines end up using a consistent pattern as well.
4 years ago
p2p: make PeerManager.DialNext() and EvictNext() block (#5947) See #5936 and #5938 for background. The plan was initially to have `DialNext()` and `EvictNext()` return a channel. However, implementing this became unnecessarily complicated and error-prone. As an example, the channel would be both consumed and populated (via method calls) by the same driving method (e.g. `Router.dialPeers()`) which could easily cause deadlocks where a method call blocked while sending on the channel that the caller itself was responsible for consuming (but couldn't since it was busy making the method call). It would also require a set of goroutines in the peer manager that would interact with the goroutines in the router in non-obvious ways, and fully populating the channel on startup could cause deadlocks with other startup tasks. Several issues like these made the solution hard to reason about. I therefore simply made `DialNext()` and `EvictNext()` block until the next peer was available, using internal triggers to wake these methods up in a non-blocking fashion when any relevant state changes occurred. This proved much simpler to reason about, since there are no goroutines in the peer manager (except for trivial retry timers), nor any blocking channel sends, and it instead relies entirely on the existing goroutine structure of the router for concurrency. This also happens to be the same pattern used by the `Transport.Accept()` API, following Go stdlib conventions, so all router goroutines end up using a consistent pattern as well.
4 years ago
p2p: make PeerManager.DialNext() and EvictNext() block (#5947) See #5936 and #5938 for background. The plan was initially to have `DialNext()` and `EvictNext()` return a channel. However, implementing this became unnecessarily complicated and error-prone. As an example, the channel would be both consumed and populated (via method calls) by the same driving method (e.g. `Router.dialPeers()`) which could easily cause deadlocks where a method call blocked while sending on the channel that the caller itself was responsible for consuming (but couldn't since it was busy making the method call). It would also require a set of goroutines in the peer manager that would interact with the goroutines in the router in non-obvious ways, and fully populating the channel on startup could cause deadlocks with other startup tasks. Several issues like these made the solution hard to reason about. I therefore simply made `DialNext()` and `EvictNext()` block until the next peer was available, using internal triggers to wake these methods up in a non-blocking fashion when any relevant state changes occurred. This proved much simpler to reason about, since there are no goroutines in the peer manager (except for trivial retry timers), nor any blocking channel sends, and it instead relies entirely on the existing goroutine structure of the router for concurrency. This also happens to be the same pattern used by the `Transport.Accept()` API, following Go stdlib conventions, so all router goroutines end up using a consistent pattern as well.
4 years ago
p2p: make PeerManager.DialNext() and EvictNext() block (#5947) See #5936 and #5938 for background. The plan was initially to have `DialNext()` and `EvictNext()` return a channel. However, implementing this became unnecessarily complicated and error-prone. As an example, the channel would be both consumed and populated (via method calls) by the same driving method (e.g. `Router.dialPeers()`) which could easily cause deadlocks where a method call blocked while sending on the channel that the caller itself was responsible for consuming (but couldn't since it was busy making the method call). It would also require a set of goroutines in the peer manager that would interact with the goroutines in the router in non-obvious ways, and fully populating the channel on startup could cause deadlocks with other startup tasks. Several issues like these made the solution hard to reason about. I therefore simply made `DialNext()` and `EvictNext()` block until the next peer was available, using internal triggers to wake these methods up in a non-blocking fashion when any relevant state changes occurred. This proved much simpler to reason about, since there are no goroutines in the peer manager (except for trivial retry timers), nor any blocking channel sends, and it instead relies entirely on the existing goroutine structure of the router for concurrency. This also happens to be the same pattern used by the `Transport.Accept()` API, following Go stdlib conventions, so all router goroutines end up using a consistent pattern as well.
4 years ago
p2p: make PeerManager.DialNext() and EvictNext() block (#5947) See #5936 and #5938 for background. The plan was initially to have `DialNext()` and `EvictNext()` return a channel. However, implementing this became unnecessarily complicated and error-prone. As an example, the channel would be both consumed and populated (via method calls) by the same driving method (e.g. `Router.dialPeers()`) which could easily cause deadlocks where a method call blocked while sending on the channel that the caller itself was responsible for consuming (but couldn't since it was busy making the method call). It would also require a set of goroutines in the peer manager that would interact with the goroutines in the router in non-obvious ways, and fully populating the channel on startup could cause deadlocks with other startup tasks. Several issues like these made the solution hard to reason about. I therefore simply made `DialNext()` and `EvictNext()` block until the next peer was available, using internal triggers to wake these methods up in a non-blocking fashion when any relevant state changes occurred. This proved much simpler to reason about, since there are no goroutines in the peer manager (except for trivial retry timers), nor any blocking channel sends, and it instead relies entirely on the existing goroutine structure of the router for concurrency. This also happens to be the same pattern used by the `Transport.Accept()` API, following Go stdlib conventions, so all router goroutines end up using a consistent pattern as well.
4 years ago
p2p: make PeerManager.DialNext() and EvictNext() block (#5947) See #5936 and #5938 for background. The plan was initially to have `DialNext()` and `EvictNext()` return a channel. However, implementing this became unnecessarily complicated and error-prone. As an example, the channel would be both consumed and populated (via method calls) by the same driving method (e.g. `Router.dialPeers()`) which could easily cause deadlocks where a method call blocked while sending on the channel that the caller itself was responsible for consuming (but couldn't since it was busy making the method call). It would also require a set of goroutines in the peer manager that would interact with the goroutines in the router in non-obvious ways, and fully populating the channel on startup could cause deadlocks with other startup tasks. Several issues like these made the solution hard to reason about. I therefore simply made `DialNext()` and `EvictNext()` block until the next peer was available, using internal triggers to wake these methods up in a non-blocking fashion when any relevant state changes occurred. This proved much simpler to reason about, since there are no goroutines in the peer manager (except for trivial retry timers), nor any blocking channel sends, and it instead relies entirely on the existing goroutine structure of the router for concurrency. This also happens to be the same pattern used by the `Transport.Accept()` API, following Go stdlib conventions, so all router goroutines end up using a consistent pattern as well.
4 years ago
p2p: make PeerManager.DialNext() and EvictNext() block (#5947) See #5936 and #5938 for background. The plan was initially to have `DialNext()` and `EvictNext()` return a channel. However, implementing this became unnecessarily complicated and error-prone. As an example, the channel would be both consumed and populated (via method calls) by the same driving method (e.g. `Router.dialPeers()`) which could easily cause deadlocks where a method call blocked while sending on the channel that the caller itself was responsible for consuming (but couldn't since it was busy making the method call). It would also require a set of goroutines in the peer manager that would interact with the goroutines in the router in non-obvious ways, and fully populating the channel on startup could cause deadlocks with other startup tasks. Several issues like these made the solution hard to reason about. I therefore simply made `DialNext()` and `EvictNext()` block until the next peer was available, using internal triggers to wake these methods up in a non-blocking fashion when any relevant state changes occurred. This proved much simpler to reason about, since there are no goroutines in the peer manager (except for trivial retry timers), nor any blocking channel sends, and it instead relies entirely on the existing goroutine structure of the router for concurrency. This also happens to be the same pattern used by the `Transport.Accept()` API, following Go stdlib conventions, so all router goroutines end up using a consistent pattern as well.
4 years ago
p2p: make PeerManager.DialNext() and EvictNext() block (#5947) See #5936 and #5938 for background. The plan was initially to have `DialNext()` and `EvictNext()` return a channel. However, implementing this became unnecessarily complicated and error-prone. As an example, the channel would be both consumed and populated (via method calls) by the same driving method (e.g. `Router.dialPeers()`) which could easily cause deadlocks where a method call blocked while sending on the channel that the caller itself was responsible for consuming (but couldn't since it was busy making the method call). It would also require a set of goroutines in the peer manager that would interact with the goroutines in the router in non-obvious ways, and fully populating the channel on startup could cause deadlocks with other startup tasks. Several issues like these made the solution hard to reason about. I therefore simply made `DialNext()` and `EvictNext()` block until the next peer was available, using internal triggers to wake these methods up in a non-blocking fashion when any relevant state changes occurred. This proved much simpler to reason about, since there are no goroutines in the peer manager (except for trivial retry timers), nor any blocking channel sends, and it instead relies entirely on the existing goroutine structure of the router for concurrency. This also happens to be the same pattern used by the `Transport.Accept()` API, following Go stdlib conventions, so all router goroutines end up using a consistent pattern as well.
4 years ago
p2p: make PeerManager.DialNext() and EvictNext() block (#5947) See #5936 and #5938 for background. The plan was initially to have `DialNext()` and `EvictNext()` return a channel. However, implementing this became unnecessarily complicated and error-prone. As an example, the channel would be both consumed and populated (via method calls) by the same driving method (e.g. `Router.dialPeers()`) which could easily cause deadlocks where a method call blocked while sending on the channel that the caller itself was responsible for consuming (but couldn't since it was busy making the method call). It would also require a set of goroutines in the peer manager that would interact with the goroutines in the router in non-obvious ways, and fully populating the channel on startup could cause deadlocks with other startup tasks. Several issues like these made the solution hard to reason about. I therefore simply made `DialNext()` and `EvictNext()` block until the next peer was available, using internal triggers to wake these methods up in a non-blocking fashion when any relevant state changes occurred. This proved much simpler to reason about, since there are no goroutines in the peer manager (except for trivial retry timers), nor any blocking channel sends, and it instead relies entirely on the existing goroutine structure of the router for concurrency. This also happens to be the same pattern used by the `Transport.Accept()` API, following Go stdlib conventions, so all router goroutines end up using a consistent pattern as well.
4 years ago
p2p: make PeerManager.DialNext() and EvictNext() block (#5947) See #5936 and #5938 for background. The plan was initially to have `DialNext()` and `EvictNext()` return a channel. However, implementing this became unnecessarily complicated and error-prone. As an example, the channel would be both consumed and populated (via method calls) by the same driving method (e.g. `Router.dialPeers()`) which could easily cause deadlocks where a method call blocked while sending on the channel that the caller itself was responsible for consuming (but couldn't since it was busy making the method call). It would also require a set of goroutines in the peer manager that would interact with the goroutines in the router in non-obvious ways, and fully populating the channel on startup could cause deadlocks with other startup tasks. Several issues like these made the solution hard to reason about. I therefore simply made `DialNext()` and `EvictNext()` block until the next peer was available, using internal triggers to wake these methods up in a non-blocking fashion when any relevant state changes occurred. This proved much simpler to reason about, since there are no goroutines in the peer manager (except for trivial retry timers), nor any blocking channel sends, and it instead relies entirely on the existing goroutine structure of the router for concurrency. This also happens to be the same pattern used by the `Transport.Accept()` API, following Go stdlib conventions, so all router goroutines end up using a consistent pattern as well.
4 years ago
p2p: make PeerManager.DialNext() and EvictNext() block (#5947) See #5936 and #5938 for background. The plan was initially to have `DialNext()` and `EvictNext()` return a channel. However, implementing this became unnecessarily complicated and error-prone. As an example, the channel would be both consumed and populated (via method calls) by the same driving method (e.g. `Router.dialPeers()`) which could easily cause deadlocks where a method call blocked while sending on the channel that the caller itself was responsible for consuming (but couldn't since it was busy making the method call). It would also require a set of goroutines in the peer manager that would interact with the goroutines in the router in non-obvious ways, and fully populating the channel on startup could cause deadlocks with other startup tasks. Several issues like these made the solution hard to reason about. I therefore simply made `DialNext()` and `EvictNext()` block until the next peer was available, using internal triggers to wake these methods up in a non-blocking fashion when any relevant state changes occurred. This proved much simpler to reason about, since there are no goroutines in the peer manager (except for trivial retry timers), nor any blocking channel sends, and it instead relies entirely on the existing goroutine structure of the router for concurrency. This also happens to be the same pattern used by the `Transport.Accept()` API, following Go stdlib conventions, so all router goroutines end up using a consistent pattern as well.
4 years ago
p2p: make PeerManager.DialNext() and EvictNext() block (#5947) See #5936 and #5938 for background. The plan was initially to have `DialNext()` and `EvictNext()` return a channel. However, implementing this became unnecessarily complicated and error-prone. As an example, the channel would be both consumed and populated (via method calls) by the same driving method (e.g. `Router.dialPeers()`) which could easily cause deadlocks where a method call blocked while sending on the channel that the caller itself was responsible for consuming (but couldn't since it was busy making the method call). It would also require a set of goroutines in the peer manager that would interact with the goroutines in the router in non-obvious ways, and fully populating the channel on startup could cause deadlocks with other startup tasks. Several issues like these made the solution hard to reason about. I therefore simply made `DialNext()` and `EvictNext()` block until the next peer was available, using internal triggers to wake these methods up in a non-blocking fashion when any relevant state changes occurred. This proved much simpler to reason about, since there are no goroutines in the peer manager (except for trivial retry timers), nor any blocking channel sends, and it instead relies entirely on the existing goroutine structure of the router for concurrency. This also happens to be the same pattern used by the `Transport.Accept()` API, following Go stdlib conventions, so all router goroutines end up using a consistent pattern as well.
4 years ago
p2p: file descriptor leaks (#3150) * close peer's connection to avoid fd leak Fixes #2967 * rename peer#Addr to RemoteAddr * fix test * fixes after Ethan's review * bring back the check * changelog entry * write a test for switch#acceptRoutine * increase timeouts? :( * remove extra assertNPeersWithTimeout * simplify test * assert number of peers (just to be safe) * Cleanup in OnStop * run tests with verbose flag on CircleCI * spawn a reading routine to prevent connection from closing * get port from the listener random port is faster, but often results in ``` panic: listen tcp 127.0.0.1:44068: bind: address already in use [recovered] panic: listen tcp 127.0.0.1:44068: bind: address already in use goroutine 79 [running]: testing.tRunner.func1(0xc0001bd600) /usr/local/go/src/testing/testing.go:792 +0x387 panic(0x974d20, 0xc0001b0500) /usr/local/go/src/runtime/panic.go:513 +0x1b9 github.com/tendermint/tendermint/p2p.MakeSwitch(0xc0000f42a0, 0x0, 0x9fb9cc, 0x9, 0x9fc346, 0xb, 0xb42128, 0x0, 0x0, 0x0, ...) /home/vagrant/go/src/github.com/tendermint/tendermint/p2p/test_util.go:182 +0xa28 github.com/tendermint/tendermint/p2p.MakeConnectedSwitches(0xc0000f42a0, 0x2, 0xb42128, 0xb41eb8, 0x4f1205, 0xc0001bed80, 0x4f16ed) /home/vagrant/go/src/github.com/tendermint/tendermint/p2p/test_util.go:75 +0xf9 github.com/tendermint/tendermint/p2p.MakeSwitchPair(0xbb8d20, 0xc0001bd600, 0xb42128, 0x2f7, 0x4f16c0) /home/vagrant/go/src/github.com/tendermint/tendermint/p2p/switch_test.go:94 +0x4c github.com/tendermint/tendermint/p2p.TestSwitches(0xc0001bd600) /home/vagrant/go/src/github.com/tendermint/tendermint/p2p/switch_test.go:117 +0x58 testing.tRunner(0xc0001bd600, 0xb42038) /usr/local/go/src/testing/testing.go:827 +0xbf created by testing.(*T).Run /usr/local/go/src/testing/testing.go:878 +0x353 exit status 2 FAIL github.com/tendermint/tendermint/p2p 0.350s ```
6 years ago
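The listener-port fix mentioned in the commit message follows a standard Go pattern: bind to port 0 and read the OS-assigned port back from the listener, rather than picking a random port up front. A minimal sketch (assuming nothing beyond the standard library):

```go
package main

import (
	"fmt"
	"net"
)

func main() {
	// Bind to port 0 so the OS picks a free port, avoiding the
	// "bind: address already in use" flakiness of random ports.
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	defer ln.Close() // always close listeners and conns to avoid fd leaks

	// Read the actual port back from the listener.
	port := ln.Addr().(*net.TCPAddr).Port
	fmt.Println("listening on port", port)
}
```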
package p2p

import (
	"context"
	"errors"
	"fmt"
	"io"
	"math"
	"math/rand"
	"net"
	"runtime/debug"
	"sort"
	"sync"
	"time"

	"github.com/gogo/protobuf/proto"
	"github.com/google/orderedcode"
	dbm "github.com/tendermint/tm-db"

	"github.com/tendermint/tendermint/libs/cmap"
	"github.com/tendermint/tendermint/libs/log"
	"github.com/tendermint/tendermint/libs/service"
	tmconn "github.com/tendermint/tendermint/p2p/conn"
	p2pproto "github.com/tendermint/tendermint/proto/tendermint/p2p"
)
// PeerStatus specifies peer statuses.
type PeerStatus string

const (
	PeerStatusNew     = PeerStatus("new")     // New peer which we haven't tried to contact yet.
	PeerStatusUp      = PeerStatus("up")      // Peer which we have an active connection to.
	PeerStatusDown    = PeerStatus("down")    // Peer which we're temporarily disconnected from.
	PeerStatusRemoved = PeerStatus("removed") // Peer which has been removed.
	PeerStatusBanned  = PeerStatus("banned")  // Peer which is banned for misbehavior.
)
// PeerError is a peer error reported by a reactor via the Error channel. The
// severity may cause the peer to be disconnected or banned depending on policy.
type PeerError struct {
	PeerID   NodeID
	Err      error
	Severity PeerErrorSeverity
}

// PeerErrorSeverity determines the severity of a peer error.
type PeerErrorSeverity string

const (
	PeerErrorSeverityLow      PeerErrorSeverity = "low"      // Mostly ignored.
	PeerErrorSeverityHigh     PeerErrorSeverity = "high"     // May disconnect.
	PeerErrorSeverityCritical PeerErrorSeverity = "critical" // Ban.
)
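// A hypothetical example of a reactor reporting an error (sketch only; the
// errCh and peerID names are illustrative, not from this file):
//
//	errCh <- PeerError{
//		PeerID:   peerID,
//		Err:      errors.New("received invalid block"),
//		Severity: PeerErrorSeverityCritical, // per the policy above, this bans the peer
//	}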
// PeerUpdatesCh defines a wrapper around a PeerUpdate go channel that allows
// a reactor to listen for peer updates and safely close it when stopping.
type PeerUpdatesCh struct {
	closeOnce sync.Once

	// updatesCh defines the go channel in which the router sends peer updates to
	// reactors. Each reactor will have its own PeerUpdatesCh to listen for updates
	// from.
	updatesCh chan PeerUpdate

	// doneCh is used to signal that a PeerUpdatesCh is closed. It is the
	// reactor's responsibility to invoke Close.
	doneCh chan struct{}
}

// NewPeerUpdates returns a reference to a new PeerUpdatesCh.
func NewPeerUpdates(updatesCh chan PeerUpdate) *PeerUpdatesCh {
	return &PeerUpdatesCh{
		updatesCh: updatesCh,
		doneCh:    make(chan struct{}),
	}
}

// Updates returns a read-only go channel where a consuming reactor can listen
// for peer updates sent from the router.
func (puc *PeerUpdatesCh) Updates() <-chan PeerUpdate {
	return puc.updatesCh
}

// Close closes the PeerUpdatesCh channel. It should only be closed by the
// respective reactor when stopping, after ensuring nothing is listening for
// updates.
//
// NOTE: After a PeerUpdatesCh is closed, the router may safely assume it can no
// longer send on the internal updatesCh, but it should NEVER explicitly close
// it, as that could result in panics by sending on a closed channel.
func (puc *PeerUpdatesCh) Close() {
	puc.closeOnce.Do(func() {
		close(puc.doneCh)
	})
}

// Done returns a read-only version of the PeerUpdatesCh's internal doneCh go
// channel, which the router uses to determine when it can safely stop sending
// peer updates.
func (puc *PeerUpdatesCh) Done() <-chan struct{} {
	return puc.doneCh
}
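// A hypothetical usage sketch (illustrative only; handle and reactorQuit are
// stand-ins, not part of this file). The reactor consumes Updates() until it
// stops and then calls Close(), while the router selects on Done() so it never
// blocks sending to a reactor that has gone away:
//
//	// Reactor side:
//	for {
//		select {
//		case update := <-peerUpdates.Updates():
//			handle(update)
//		case <-reactorQuit:
//			peerUpdates.Close()
//			return
//		}
//	}
//
//	// Router side: abandon the send once the reactor signals Done().
//	select {
//	case updatesCh <- PeerUpdate{PeerID: id, Status: PeerStatusUp}:
//	case <-peerUpdates.Done():
//	}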
// PeerUpdate is a peer status update for reactors.
type PeerUpdate struct {
	PeerID NodeID
	Status PeerStatus
}

// PeerScore is a numeric score assigned to a peer (higher is better).
type PeerScore uint16

const (
	// PeerScorePersistent is added for persistent peers.
	PeerScorePersistent PeerScore = 100
)

// PeerManager manages peer lifecycle information, using a peerStore for
// underlying storage. Its primary purpose is to determine which peers to
// connect to next, make sure a peer only has a single active connection
// (either inbound or outbound), and evict peers to make room for
// higher-scored peers. It does not manage actual connections (this is handled
// by the Router), only the peer lifecycle state.
//
// We track dialing and connected states independently. This allows us to
// accept an inbound connection from a peer while the router is also dialing
// an outbound connection to that same peer, which will cause the dialer to
// eventually error when attempting to mark the peer as connected. This also
// avoids race conditions where multiple goroutines may end up dialing a peer
// if an incoming connection was briefly accepted and disconnected while we
// were also dialing.
//
// For an outbound connection, the flow is as follows:
// - DialNext: returns a peer address to dial, marking the peer as dialing.
// - DialFailed: reports a dial failure, unmarking the peer as dialing.
// - Dialed: successfully dialed, unmarking as dialing and marking as
//   connected (or erroring if already connected).
// - Ready: routing is up, broadcasts a PeerStatusUp peer update to
//   subscribers.
// - Disconnected: peer disconnects, unmarking as connected and broadcasting
//   a PeerStatusDown peer update.
//
// For an inbound connection, the flow is as follows:
// - Accepted: successfully accepted connection, marking as connected (or
//   erroring if already connected).
// - Ready: routing is up, broadcasts a PeerStatusUp peer update to
//   subscribers.
// - Disconnected: peer disconnects, unmarking as connected and broadcasting
//   a PeerStatusDown peer update.
//
// When evicting peers, either because peers are explicitly scheduled for
// eviction or we are connected to too many peers, the flow is as follows:
// - EvictNext: if marked evict and connected, unmark evict and mark evicting.
//   If beyond MaxConnected, pick the lowest-scored peer and mark evicting.
// - Disconnected: unmark connected, evicting, evict, and broadcast a
//   PeerStatusDown peer update.
//
// If all connection slots are full (at MaxConnected), we can use up to
// MaxConnectedUpgrade additional connections to probe any higher-scored
// unconnected peers, and if we reach them (or they reach us) we allow the
// connection and evict a lower-scored peer. We mark the lower-scored peer as
// upgrading[from]=to to make sure no other higher-scored peers can claim the
// same one for an upgrade. The flow is as follows:
// - Accepted: if upgrade is possible, mark connected and add lower-scored to
//   evict.
// - DialNext: if upgrade is possible, mark upgrading[from]=to and dialing.
// - DialFailed: unmark upgrading[from]=to and dialing.
// - Dialed: unmark upgrading[from]=to and dialing, mark as connected, add
//   lower-scored to evict.
// - EvictNext: pick peer from evict, mark as evicting.
// - Disconnected: unmark connected, upgrading[from]=to, evict, evicting.
//
// FIXME: The old stack supports ABCI-based peer ID filtering via
// /p2p/filter/id/<ID> queries; we should implement this here as well, by
// taking a peer ID filtering callback in PeerManagerOptions and configuring
// it during Node setup.
type PeerManager struct {
	options     PeerManagerOptions
	wakeDialCh  chan struct{} // wakes up DialNext() on relevant peer changes
	wakeEvictCh chan struct{} // wakes up EvictNext() on relevant peer changes
	closeCh     chan struct{} // signal channel for Close()
	closeOnce   sync.Once

	mtx           sync.Mutex
	store         *peerStore
	dialing       map[NodeID]bool                   // peers being dialed (DialNext -> Dialed/DialFail)
	upgrading     map[NodeID]NodeID                 // peers claimed for upgrade (DialNext -> Dialed/DialFail)
	connected     map[NodeID]bool                   // connected peers (Dialed/Accepted -> Disconnected)
	evict         map[NodeID]bool                   // peers scheduled for eviction (Connected -> EvictNext)
	evicting      map[NodeID]bool                   // peers being evicted (EvictNext -> Disconnected)
	subscriptions map[*PeerUpdatesCh]*PeerUpdatesCh // keyed by struct identity (address)
}
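
// The outbound flow above, as a minimal illustrative sketch of how a router
// goroutine might drive it (not part of this file's API; dialPeer is a
// hypothetical transport helper):
//
//	func dialLoop(ctx context.Context, m *PeerManager) {
//		for {
//			id, address, err := m.DialNext(ctx)
//			if err != nil {
//				return // context cancelled
//			}
//			if err := dialPeer(ctx, address); err != nil {
//				_ = m.DialFailed(id, address) // becomes eligible for retry later
//				continue
//			}
//			if err := m.Dialed(id, address); err != nil {
//				continue // e.g. already connected via an inbound connection
//			}
//			m.Ready(id) // broadcasts PeerStatusUp to subscribers
//		}
//	}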

// PeerManagerOptions specifies options for a PeerManager.
type PeerManagerOptions struct {
	// PersistentPeers are peers that we want to maintain persistent
	// connections to. These will be scored higher than other peers, and if
	// MaxConnectedUpgrade is non-zero any lower-scored peers will be evicted
	// if necessary to make room for these.
	PersistentPeers []NodeID

	// MaxPeers is the maximum number of peers to track information about,
	// i.e. store in the peer store. When exceeded, the lowest-scored
	// unconnected peers will be deleted. 0 means no limit.
	MaxPeers uint16

	// MaxConnected is the maximum number of connected peers (inbound and
	// outbound). 0 means no limit.
	MaxConnected uint16

	// MaxConnectedUpgrade is the maximum number of additional connections to
	// use for probing any better-scored peers to upgrade to when all
	// connection slots are full. 0 disables peer upgrading.
	//
	// For example, if we are already connected to MaxConnected peers, but we
	// know or learn about better-scored peers (e.g. configured persistent
	// peers) that we are not connected to, then we can probe these peers by
	// using up to MaxConnectedUpgrade connections, and once connected evict
	// the lowest-scored connected peers. This also works for inbound
	// connections, i.e. if a higher-scored peer attempts to connect to us, we
	// can accept the connection and evict a lower-scored peer.
	MaxConnectedUpgrade uint16

	// MinRetryTime is the minimum time to wait between retries. Retry times
	// double for each retry, up to MaxRetryTime. 0 disables retries.
	MinRetryTime time.Duration

	// MaxRetryTime is the maximum time to wait between retries. 0 means
	// no maximum, in which case the retry time will keep doubling.
	MaxRetryTime time.Duration

	// MaxRetryTimePersistent is the maximum time to wait between retries for
	// peers listed in PersistentPeers. 0 uses MaxRetryTime instead.
	MaxRetryTimePersistent time.Duration

	// RetryTimeJitter is the upper bound of a random interval added to
	// retry times, to avoid thundering herds. 0 disables jitter.
	RetryTimeJitter time.Duration
}
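
// A minimal construction sketch (the option values are illustrative only;
// dbm.NewMemDB() is the in-memory backend from the imported tm-db package):
//
//	peerManager, err := NewPeerManager(dbm.NewMemDB(), PeerManagerOptions{
//		MaxPeers:            1000,             // prune lowest-scored peers beyond this
//		MaxConnected:        64,               // hard cap on concurrent connections
//		MaxConnectedUpgrade: 4,                // extra slots for probing better peers
//		MinRetryTime:        time.Second,      // backoff starts here and doubles
//		MaxRetryTime:        10 * time.Minute, // backoff ceiling
//		RetryTimeJitter:     5 * time.Second,  // spreads out retry storms
//	})
//	if err != nil {
//		return err
//	}
//	defer peerManager.Close()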

// NewPeerManager creates a new peer manager.
func NewPeerManager(peerDB dbm.DB, options PeerManagerOptions) (*PeerManager, error) {
	store, err := newPeerStore(peerDB)
	if err != nil {
		return nil, err
	}
	peerManager := &PeerManager{
		options: options,
		closeCh: make(chan struct{}),

		// We use a buffer of size 1 for these trigger channels, with
		// non-blocking sends. This ensures that if e.g. wakeDial() is called
		// multiple times before the initial trigger is picked up we only
		// process the trigger once.
		//
		// FIXME: This should maybe be a libs/sync type.
		wakeDialCh:  make(chan struct{}, 1),
		wakeEvictCh: make(chan struct{}, 1),

		store:         store,
		dialing:       map[NodeID]bool{},
		upgrading:     map[NodeID]NodeID{},
		connected:     map[NodeID]bool{},
		evict:         map[NodeID]bool{},
		evicting:      map[NodeID]bool{},
		subscriptions: map[*PeerUpdatesCh]*PeerUpdatesCh{},
	}
	if err = peerManager.configurePeers(); err != nil {
		return nil, err
	}
	if err = peerManager.prunePeers(); err != nil {
		return nil, err
	}
	return peerManager, nil
}

// configurePeers configures peers in the peer store with ephemeral runtime
// configuration, e.g. setting peerInfo.Persistent based on
// PeerManagerOptions.PersistentPeers. The caller must hold the mutex lock
// (or, as during construction, have exclusive access to the manager).
func (m *PeerManager) configurePeers() error {
	for _, peerID := range m.options.PersistentPeers {
		if peer, ok := m.store.Get(peerID); ok {
			peer.Persistent = true
			if err := m.store.Set(peer); err != nil {
				return err
			}
		}
	}
	return nil
}

// prunePeers removes peers from the peer store if it contains more than
// MaxPeers peers. The lowest-scored non-connected peers are removed.
// The caller must hold the mutex lock; taking it here would deadlock
// callers such as Add() that already hold it.
func (m *PeerManager) prunePeers() error {
	if m.options.MaxPeers == 0 || m.store.Size() <= int(m.options.MaxPeers) {
		return nil
	}
	ranked := m.store.Ranked()
	for i := len(ranked) - 1; i >= 0; i-- {
		peerID := ranked[i].ID
		switch {
		case m.store.Size() <= int(m.options.MaxPeers):
			// A bare break here would only exit the switch, not the loop.
			return nil
		case m.dialing[peerID]:
		case m.connected[peerID]:
		case m.evicting[peerID]:
		default:
			if err := m.store.Delete(peerID); err != nil {
				return err
			}
		}
	}
	return nil
}

// Close closes the peer manager, releasing resources allocated with it
// (specifically any running goroutines).
func (m *PeerManager) Close() {
	m.closeOnce.Do(func() {
		close(m.closeCh)
	})
}

// Add adds a peer to the manager, given as an address. If the peer already
// exists, the address is added to it.
func (m *PeerManager) Add(address NodeAddress) error {
	if err := address.Validate(); err != nil {
		return err
	}
	m.mtx.Lock()
	defer m.mtx.Unlock()

	peer, ok := m.store.Get(address.NodeID)
	if !ok {
		peer = m.makePeerInfo(address.NodeID)
	}
	if _, ok := peer.AddressInfo[address.String()]; !ok {
		peer.AddressInfo[address.String()] = &peerAddressInfo{Address: address}
	}
	if err := m.store.Set(peer); err != nil {
		return err
	}
	if err := m.prunePeers(); err != nil {
		return err
	}
	m.wakeDial()
	return nil
}

// Advertise returns a list of peer addresses to advertise to a peer.
//
// FIXME: This is fairly naïve and only returns the addresses of the
// highest-ranked peers.
func (m *PeerManager) Advertise(peerID NodeID, limit uint16) []NodeAddress {
	m.mtx.Lock()
	defer m.mtx.Unlock()

	addresses := make([]NodeAddress, 0, limit)
	for _, peer := range m.store.Ranked() {
		if peer.ID == peerID {
			continue
		}
		for _, addressInfo := range peer.AddressInfo {
			if len(addresses) >= int(limit) {
				return addresses
			}
			addresses = append(addresses, addressInfo.Address)
		}
	}
	return addresses
}

// makePeerInfo creates a peerInfo for a new peer.
func (m *PeerManager) makePeerInfo(id NodeID) peerInfo {
	isPersistent := false
	for _, p := range m.options.PersistentPeers {
		if id == p {
			isPersistent = true
			break
		}
	}
	return peerInfo{
		ID:          id,
		Persistent:  isPersistent,
		AddressInfo: map[string]*peerAddressInfo{},
	}
}

// Subscribe subscribes to peer updates. The caller must consume the peer
// updates in a timely fashion and close the subscription when done, since
// delivery is guaranteed and will block peer connection/disconnection
// otherwise.
func (m *PeerManager) Subscribe() *PeerUpdatesCh {
	// FIXME: We may want to use a size-1 buffer here. When the router
	// broadcasts a peer update it has to loop over all of the subscriptions,
	// and we want to avoid blocking and waiting for a context switch before
	// continuing to the next subscription. This also prevents tail latencies
	// from compounding across updates. We also want to make sure the
	// subscribers are reasonably in sync, so it should be kept at 1. However,
	// this should be benchmarked first.
	peerUpdates := NewPeerUpdates(make(chan PeerUpdate))
	m.mtx.Lock()
	m.subscriptions[peerUpdates] = peerUpdates
	m.mtx.Unlock()

	go func() {
		<-peerUpdates.Done()
		m.mtx.Lock()
		delete(m.subscriptions, peerUpdates)
		m.mtx.Unlock()
	}()
	return peerUpdates
}
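
// A subscriber consumption sketch (illustrative; it assumes the Updates()
// accessor on PeerUpdatesCh defined earlier in this file, and quitCh is a
// hypothetical shutdown signal owned by the reactor):
//
//	peerUpdates := peerManager.Subscribe()
//	defer peerUpdates.Close()
//	for {
//		select {
//		case update := <-peerUpdates.Updates():
//			switch update.Status {
//			case PeerStatusUp:
//				// set up per-peer state
//			case PeerStatusDown:
//				// tear down per-peer state
//			}
//		case <-quitCh:
//			return
//		}
//	}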

// broadcast broadcasts a peer update to all subscriptions. The caller must
// already hold the mutex lock. This means the mutex is held for the duration
// of the broadcast, which ensures that all subscriptions receive all updates
// in the same order.
//
// FIXME: Consider using more fine-grained mutexes here, and/or a channel to
// enforce ordering of updates.
func (m *PeerManager) broadcast(peerUpdate PeerUpdate) {
	for _, sub := range m.subscriptions {
		select {
		case sub.updatesCh <- peerUpdate:
		case <-sub.doneCh:
		}
	}
}

// DialNext finds an appropriate peer address to dial, and marks it as dialing.
// If no peer is found, or all connection slots are full, it blocks until one
// becomes available. The caller must call Dialed() or DialFailed() for the
// returned peer. The context can be used to cancel the call.
func (m *PeerManager) DialNext(ctx context.Context) (NodeID, NodeAddress, error) {
	for {
		id, address, err := m.TryDialNext()
		if err != nil || id != "" {
			return id, address, err
		}
		select {
		case <-m.wakeDialCh:
		case <-ctx.Done():
			return "", NodeAddress{}, ctx.Err()
		}
	}
}

// TryDialNext is equivalent to DialNext(), but immediately returns an empty
// peer ID if no peers or connection slots are available.
func (m *PeerManager) TryDialNext() (NodeID, NodeAddress, error) {
	m.mtx.Lock()
	defer m.mtx.Unlock()

	// We allow dialing MaxConnected+MaxConnectedUpgrade peers. Including
	// MaxConnectedUpgrade allows us to probe additional peers that have a
	// higher score than existing peers, and evict a lower-scored peer if
	// the dial succeeds.
	if m.options.MaxConnected > 0 &&
		len(m.connected)+len(m.dialing) >= int(m.options.MaxConnected)+int(m.options.MaxConnectedUpgrade) {
		return "", NodeAddress{}, nil
	}
	for _, peer := range m.store.Ranked() {
		if m.dialing[peer.ID] || m.connected[peer.ID] {
			continue
		}
		for _, addressInfo := range peer.AddressInfo {
			if time.Since(addressInfo.LastDialFailure) < m.retryDelay(addressInfo.DialFailures, peer.Persistent) {
				continue
			}

			// We now have an eligible address to dial. If we're full but have
			// upgrade capacity (as checked above), we find a lower-scored peer
			// we can replace and mark it as upgrading so no one else claims it.
			//
			// If we don't find one, there is no point in trying additional
			// peers, since they will all have the same or lower score than
			// this peer (they are ordered by score via peerStore.Ranked).
			if m.options.MaxConnected > 0 && len(m.connected) >= int(m.options.MaxConnected) {
				upgradeFromPeer := m.findUpgradeCandidate(peer.ID, peer.Score())
				if upgradeFromPeer == "" {
					return "", NodeAddress{}, nil
				}
				m.upgrading[upgradeFromPeer] = peer.ID
			}
			m.dialing[peer.ID] = true
			return peer.ID, addressInfo.Address, nil
		}
	}
	return "", NodeAddress{}, nil
}

// wakeDial is used to notify DialNext about changes that *may* cause new
// peers to become eligible for dialing, such as peer disconnections and
// retry timeouts.
func (m *PeerManager) wakeDial() {
	// The channel has a 1-size buffer. A non-blocking send ensures
	// we only queue up at most 1 trigger between each DialNext().
	select {
	case m.wakeDialCh <- struct{}{}:
	default:
	}
}

// wakeEvict is used to notify EvictNext about changes that *may* cause
// peers to become eligible for eviction, such as peer upgrades.
func (m *PeerManager) wakeEvict() {
	// The channel has a 1-size buffer. A non-blocking send ensures
	// we only queue up at most 1 trigger between each EvictNext().
	select {
	case m.wakeEvictCh <- struct{}{}:
	default:
	}
}

// retryDelay calculates a dial retry delay using exponential backoff, based on
// retry settings in PeerManagerOptions. If MinRetryTime is 0, this returns
// MaxInt64 (i.e. an infinite retry delay, effectively disabling retries).
func (m *PeerManager) retryDelay(failures uint32, persistent bool) time.Duration {
	if failures == 0 {
		return 0
	}
	if m.options.MinRetryTime == 0 {
		return time.Duration(math.MaxInt64)
	}
	maxDelay := m.options.MaxRetryTime
	if persistent && m.options.MaxRetryTimePersistent > 0 {
		maxDelay = m.options.MaxRetryTimePersistent
	}

	delay := m.options.MinRetryTime * time.Duration(math.Pow(2, float64(failures)))
	if maxDelay > 0 && delay > maxDelay {
		delay = maxDelay
	}
	// Guard against RetryTimeJitter == 0, since rand.Int63n panics for n <= 0.
	if m.options.RetryTimeJitter > 0 {
		// FIXME: This should use a PeerManager-scoped RNG.
		delay += time.Duration(rand.Int63n(int64(m.options.RetryTimeJitter))) // nolint:gosec
	}
	return delay
}
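
// For example, with MinRetryTime = 1s, MaxRetryTime = 1m, and
// RetryTimeJitter = 5s, successive failures back off as:
//
//	failures=1: 1s * 2^1 = 2s  (+ up to 5s jitter)
//	failures=2: 1s * 2^2 = 4s  (+ jitter)
//	failures=5: 1s * 2^5 = 32s (+ jitter)
//	failures=6: 64s, capped to MaxRetryTime = 1m (+ jitter)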

// DialFailed reports a failed dial attempt. This will make the peer available
// for dialing again when appropriate.
//
// FIXME: This should probably delete or mark bad addresses/peers after some time.
func (m *PeerManager) DialFailed(peerID NodeID, address NodeAddress) error {
	m.mtx.Lock()
	defer m.mtx.Unlock()

	delete(m.dialing, peerID)
	for from, to := range m.upgrading {
		if to == peerID {
			delete(m.upgrading, from) // Unmark failed upgrade attempt.
		}
	}

	peer, ok := m.store.Get(peerID)
	if !ok { // Peer may have been removed while dialing, ignore.
		return nil
	}
	addressInfo, ok := peer.AddressInfo[address.String()]
	if !ok {
		return nil // Assume the address has been removed, ignore.
	}
	addressInfo.LastDialFailure = time.Now().UTC()
	addressInfo.DialFailures++
	if err := m.store.Set(peer); err != nil {
		return err
	}

	// We spawn a goroutine that notifies DialNext() again when the retry
	// timeout has elapsed, so that we can consider dialing it again.
	go func() {
		retryDelay := m.retryDelay(addressInfo.DialFailures, peer.Persistent)
		if retryDelay == time.Duration(math.MaxInt64) {
			return
		}
		// Use an explicit timer with deferred cleanup instead of
		// time.After(), to avoid leaking goroutines on PeerManager.Close().
		timer := time.NewTimer(retryDelay)
		defer timer.Stop()
		select {
		case <-timer.C:
			m.wakeDial()
		case <-m.closeCh:
		}
	}()

	m.wakeDial()
	return nil
}

// Dialed marks a peer as successfully dialed. Any further incoming connections
// will be rejected, and once disconnected the peer may be dialed again.
func (m *PeerManager) Dialed(peerID NodeID, address NodeAddress) error {
	m.mtx.Lock()
	defer m.mtx.Unlock()

	delete(m.dialing, peerID)

	var upgradeFromPeer NodeID
	for from, to := range m.upgrading {
		if to == peerID {
			delete(m.upgrading, from)
			upgradeFromPeer = from
			// Don't break, just in case this peer was marked as upgrading for
			// multiple lower-scored peers (shouldn't really happen).
		}
	}

	if m.connected[peerID] {
		return fmt.Errorf("peer %q is already connected", peerID)
	}
	if m.options.MaxConnected > 0 &&
		len(m.connected) >= int(m.options.MaxConnected)+int(m.options.MaxConnectedUpgrade) {
		return fmt.Errorf("already connected to maximum number of peers")
	}

	peer, ok := m.store.Get(peerID)
	if !ok {
		return fmt.Errorf("peer %q was removed while dialing", peerID)
	}
	now := time.Now().UTC()
	peer.LastConnected = now
	if addressInfo, ok := peer.AddressInfo[address.String()]; ok {
		addressInfo.DialFailures = 0
		addressInfo.LastDialSuccess = now
		// If not found, assume the address has been removed.
	}
	if err := m.store.Set(peer); err != nil {
		return err
	}

	if upgradeFromPeer != "" && m.options.MaxConnected > 0 &&
		len(m.connected) >= int(m.options.MaxConnected) {
		// Look for an even lower-scored peer that may have appeared
		// since we started the upgrade.
		if p, ok := m.store.Get(upgradeFromPeer); ok {
			if u := m.findUpgradeCandidate(p.ID, p.Score()); u != "" {
				upgradeFromPeer = u
			}
		}
		m.evict[upgradeFromPeer] = true
	}
	m.connected[peerID] = true
	m.wakeEvict()

	return nil
}

// Accepted marks an incoming peer connection successfully accepted. If the
// peer is already connected, or we don't allow additional connections, this
// will return an error.
//
// If full but MaxConnectedUpgrade is non-zero and the incoming peer is
// better-scored than any existing peers, then we accept it and evict a
// lower-scored peer.
//
// NOTE: We can't take an address here, since e.g. TCP uses a different port
// number for outbound traffic than inbound traffic, so the peer's endpoint
// wouldn't necessarily be an appropriate address to dial.
//
// FIXME: When we accept a connection from a peer, we should register that
// peer's address in the peer store so that we can dial it later. In order to
// do that, we'll need to get the remote address after all, but as noted above
// that can't be the remote endpoint since that will usually have the wrong
// port number.
func (m *PeerManager) Accepted(peerID NodeID) error {
	m.mtx.Lock()
	defer m.mtx.Unlock()

	if m.connected[peerID] {
		return fmt.Errorf("peer %q is already connected", peerID)
	}
	if m.options.MaxConnected > 0 &&
		len(m.connected) >= int(m.options.MaxConnected)+int(m.options.MaxConnectedUpgrade) {
		return fmt.Errorf("already connected to maximum number of peers")
	}

	peer, ok := m.store.Get(peerID)
	if !ok {
		peer = m.makePeerInfo(peerID)
	}

	// If all connection slots are full, but we allow upgrades (and we checked
	// above that we have upgrade capacity), then we can look for a
	// lower-scored peer to replace and, if found, accept the connection
	// anyway and evict it.
	var upgradeFromPeer NodeID
	if m.options.MaxConnected > 0 && len(m.connected) >= int(m.options.MaxConnected) {
		upgradeFromPeer = m.findUpgradeCandidate(peer.ID, peer.Score())
		if upgradeFromPeer == "" {
			return fmt.Errorf("already connected to maximum number of peers")
		}
	}

	peer.LastConnected = time.Now().UTC()
	if err := m.store.Set(peer); err != nil {
		return err
	}

	m.connected[peerID] = true
	if upgradeFromPeer != "" {
		m.evict[upgradeFromPeer] = true
	}
	m.wakeEvict()
	return nil
}
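
// The inbound flow above, sketched from the accepting side (illustrative;
// acceptConnection and closeConnection are hypothetical transport helpers):
//
//	func acceptLoop(ctx context.Context, m *PeerManager) {
//		for {
//			peerID, err := acceptConnection(ctx)
//			if err != nil {
//				return
//			}
//			if err := m.Accepted(peerID); err != nil {
//				closeConnection(peerID) // full, or already connected
//				continue
//			}
//			m.Ready(peerID) // broadcasts PeerStatusUp to subscribers
//		}
//	}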

// Ready marks a peer as ready, broadcasting status updates to subscribers. The
// peer must already be marked as connected. This is separate from Dialed() and
// Accepted() to allow the router to set up its internal queues before reactors
// start sending messages.
func (m *PeerManager) Ready(peerID NodeID) {
	m.mtx.Lock()
	defer m.mtx.Unlock()

	if m.connected[peerID] {
		m.broadcast(PeerUpdate{
			PeerID: peerID,
			Status: PeerStatusUp,
		})
	}
}

// Disconnected unmarks a peer as connected, allowing new connections to be
// established.
func (m *PeerManager) Disconnected(peerID NodeID) error {
	m.mtx.Lock()
	defer m.mtx.Unlock()

	delete(m.connected, peerID)
	delete(m.upgrading, peerID)
	delete(m.evict, peerID)
	delete(m.evicting, peerID)
	m.broadcast(PeerUpdate{
		PeerID: peerID,
		Status: PeerStatusDown,
	})
	m.wakeDial()
	return nil
}

// EvictNext returns the next peer to evict (i.e. disconnect). If no evictable
// peers are found, the call will block until one becomes available or the
// context is cancelled.
func (m *PeerManager) EvictNext(ctx context.Context) (NodeID, error) {
	for {
		id, err := m.TryEvictNext()
		if err != nil || id != "" {
			return id, err
		}
		select {
		case <-m.wakeEvictCh:
		case <-ctx.Done():
			return "", ctx.Err()
		}
	}
}
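
// An eviction loop sketch (illustrative; closePeerConnection is a
// hypothetical transport helper, and the router is expected to call
// Disconnected once the connection is actually torn down):
//
//	func evictLoop(ctx context.Context, m *PeerManager) {
//		for {
//			peerID, err := m.EvictNext(ctx)
//			if err != nil {
//				return
//			}
//			closePeerConnection(peerID)
//			_ = m.Disconnected(peerID)
//		}
//	}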

// TryEvictNext is equivalent to EvictNext, but immediately returns an empty
// node ID if no evictable peers are found.
func (m *PeerManager) TryEvictNext() (NodeID, error) {
	m.mtx.Lock()
	defer m.mtx.Unlock()

	// If any connected peers are explicitly scheduled for eviction, we return
	// a random one.
	for peerID := range m.evict {
		delete(m.evict, peerID)
		if m.connected[peerID] && !m.evicting[peerID] {
			m.evicting[peerID] = true
			return peerID, nil
		}
	}

	// If we're below capacity, we don't need to evict anything.
	if m.options.MaxConnected == 0 ||
		len(m.connected)-len(m.evicting) <= int(m.options.MaxConnected) {
		return "", nil
	}

	// If we're above capacity, just pick the lowest-ranked peer to evict.
	ranked := m.store.Ranked()
	for i := len(ranked) - 1; i >= 0; i-- {
		peer := ranked[i]
		if m.connected[peer.ID] && !m.evicting[peer.ID] {
			m.evicting[peer.ID] = true
			return peer.ID, nil
		}
	}

	return "", nil
}

// findUpgradeCandidate looks for a lower-scored peer that we could evict
// to make room for the given peer. Returns an empty ID if none is found.
// The caller must hold the mutex lock.
func (m *PeerManager) findUpgradeCandidate(id NodeID, score PeerScore) NodeID {
	ranked := m.store.Ranked()
	for i := len(ranked) - 1; i >= 0; i-- {
		candidate := ranked[i]
		switch {
		case candidate.Score() >= score:
			return "" // no further peers can be scored lower, due to sorting
		case !m.connected[candidate.ID]:
		case m.evict[candidate.ID]:
		case m.evicting[candidate.ID]:
		case m.upgrading[candidate.ID] != "":
		default:
			return candidate.ID
		}
	}
	return ""
}

// GetHeight returns a peer's height, as reported via SetHeight. If the peer
// or height is unknown, this returns 0.
//
// FIXME: This is a temporary workaround for the peer state stored via the
// legacy Peer.Set() and Peer.Get() APIs, used to share height state between
// the consensus and mempool reactors. These dependencies should be removed
// from the reactors, which should instead query this information independently
// via new P2P protocol additions.
func (m *PeerManager) GetHeight(peerID NodeID) int64 {
	m.mtx.Lock()
	defer m.mtx.Unlock()

	peer, _ := m.store.Get(peerID)
	return peer.Height
}

// SetHeight stores a peer's height, making it available via GetHeight. If the
// peer is unknown, it is created.
//
// FIXME: This is a temporary workaround for the peer state stored via the
// legacy Peer.Set() and Peer.Get() APIs, used to share height state between
// the consensus and mempool reactors. These dependencies should be removed
// from the reactors, which should instead query this information independently
// via new P2P protocol additions.
func (m *PeerManager) SetHeight(peerID NodeID, height int64) error {
	m.mtx.Lock()
	defer m.mtx.Unlock()

	peer, ok := m.store.Get(peerID)
	if !ok {
		peer = m.makePeerInfo(peerID)
	}
	peer.Height = height
	return m.store.Set(peer)
}

// peerStore stores information about peers. It is not thread-safe, assuming
// it is only used by PeerManager, which handles concurrency control. This
// allows the manager to execute multiple operations atomically via its own
// mutex.
//
// The entire set of peers is kept in memory, for performance. It is loaded
// from disk on initialization, and any changes are written back to disk
// (without fsync, since we can afford to lose recent writes).
type peerStore struct {
	db     dbm.DB
	peers  map[NodeID]*peerInfo
	ranked []*peerInfo // cache for Ranked(), nil invalidates cache
}

// newPeerStore creates a new peer store, loading all persisted peers from the
// database into memory.
func newPeerStore(db dbm.DB) (*peerStore, error) {
	store := &peerStore{
		db: db,
	}
	if err := store.loadPeers(); err != nil {
		return nil, err
	}
	return store, nil
}

// loadPeers loads all peers from the database into memory.
func (s *peerStore) loadPeers() error {
	peers := make(map[NodeID]*peerInfo)

	start, end := keyPeerInfoRange()
	iter, err := s.db.Iterator(start, end)
	if err != nil {
		return err
	}
	defer iter.Close()
	for ; iter.Valid(); iter.Next() {
		// FIXME: We may want to tolerate failures here, by simply logging
		// the errors and ignoring the faulty peer entries.
		msg := new(p2pproto.PeerInfo)
		if err := proto.Unmarshal(iter.Value(), msg); err != nil {
			return fmt.Errorf("invalid peer Protobuf data: %w", err)
		}
		peer, err := peerInfoFromProto(msg)
		if err != nil {
			return fmt.Errorf("invalid peer data: %w", err)
		}
		peers[peer.ID] = peer
	}
	if iter.Error() != nil {
		return iter.Error()
	}
	s.peers = peers
	s.ranked = nil // invalidate cache if populated
	return nil
}

// Get fetches a peer. The boolean indicates whether the peer existed or not.
// The returned peer info is a copy, and can be mutated at will.
func (s *peerStore) Get(id NodeID) (peerInfo, bool) {
	peer, ok := s.peers[id]
	return peer.Copy(), ok
}

// Set stores peer data. The input data will be copied, and can safely be
// reused by the caller.
func (s *peerStore) Set(peer peerInfo) error {
	if err := peer.Validate(); err != nil {
		return err
	}
	peer = peer.Copy()

	// FIXME: We may want to optimize this by avoiding saving to the database
	// if there haven't been any changes to persisted fields.
	bz, err := peer.ToProto().Marshal()
	if err != nil {
		return err
	}
	if err = s.db.Set(keyPeerInfo(peer.ID), bz); err != nil {
		return err
	}

	if current, ok := s.peers[peer.ID]; !ok || current.Score() != peer.Score() {
		// If the peer is new, or its score changed, we invalidate the
		// Ranked() cache.
		s.peers[peer.ID] = &peer
		s.ranked = nil
	} else {
		// Otherwise, since s.ranked contains pointers to the old data and we
		// want those pointers to remain valid with the new data, we have to
		// update the existing pointer address.
		*current = peer
	}

	return nil
}

// Delete deletes a peer, or does nothing if it does not exist.
func (s *peerStore) Delete(id NodeID) error {
	if _, ok := s.peers[id]; !ok {
		return nil
	}
	if err := s.db.Delete(keyPeerInfo(id)); err != nil {
		return err
	}
	delete(s.peers, id)
	s.ranked = nil
	return nil
}

// List retrieves all peers in an arbitrary order. The returned data is a
// copy, and can be mutated at will.
func (s *peerStore) List() []peerInfo {
	peers := make([]peerInfo, 0, len(s.peers))
	for _, peer := range s.peers {
		peers = append(peers, peer.Copy())
	}
	return peers
}

// Ranked returns a list of peers ordered by score (better peers first). Peers
// with equal scores are returned in an arbitrary order. The returned list must
// not be mutated or accessed concurrently by the caller, since it returns
// pointers to internal peerStore data for performance.
//
// Ranked is used to determine which peers to dial, which ones to evict, and
// which ones to delete completely.
//
// FIXME: For now, we simply maintain a cache in s.ranked which is invalidated
// by setting it to nil, but if necessary we should use a better data structure
// for this (e.g. a heap or ordered map).
//
// FIXME: The scoring logic is currently very naïve, see peerInfo.Score().
func (s *peerStore) Ranked() []*peerInfo {
	if s.ranked != nil {
		return s.ranked
	}
	s.ranked = make([]*peerInfo, 0, len(s.peers))
	for _, peer := range s.peers {
		s.ranked = append(s.ranked, peer)
	}
	sort.Slice(s.ranked, func(i, j int) bool {
		// FIXME: If necessary, consider precomputing scores before sorting,
		// to reduce the number of Score() calls.
		return s.ranked[i].Score() > s.ranked[j].Score()
	})
	return s.ranked
}
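
// Since Ranked() returns pointers into the store, callers that need to mutate
// peer data should go through Get()/Set() instead (illustrative sketch, where
// store is a *peerStore):
//
//	for _, p := range store.Ranked() { // read-only iteration is fine
//		peer, ok := store.Get(p.ID) // Get returns a mutable copy
//		if !ok {
//			continue
//		}
//		peer.Persistent = true
//		if err := store.Set(peer); err != nil { // Set persists the copy
//			return err
//		}
//	}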

// Size returns the number of peers in the peer store.
func (s *peerStore) Size() int {
	return len(s.peers)
}

// peerInfo contains peer information stored in a peerStore.
type peerInfo struct {
	ID            NodeID
	AddressInfo   map[string]*peerAddressInfo
	LastConnected time.Time

	// These fields are ephemeral, i.e. not persisted to the database.
	Persistent bool
	Height     int64
}

// peerInfoFromProto converts a Protobuf PeerInfo message to a peerInfo,
// erroring if the data is invalid.
func peerInfoFromProto(msg *p2pproto.PeerInfo) (*peerInfo, error) {
	p := &peerInfo{
		ID:          NodeID(msg.ID),
		AddressInfo: map[string]*peerAddressInfo{},
	}
	if msg.LastConnected != nil {
		p.LastConnected = *msg.LastConnected
	}
	for _, addr := range msg.AddressInfo {
		addressInfo, err := peerAddressInfoFromProto(addr)
		if err != nil {
			return nil, err
		}
		p.AddressInfo[addressInfo.Address.String()] = addressInfo
	}
	return p, p.Validate()
}

// ToProto converts the peerInfo to p2pproto.PeerInfo for database storage. The
// Protobuf type only contains persisted fields, while ephemeral fields are
// discarded. The returned message may contain pointers to original data, since
// it is expected to be serialized immediately.
func (p *peerInfo) ToProto() *p2pproto.PeerInfo {
	msg := &p2pproto.PeerInfo{
		ID:            string(p.ID),
		LastConnected: &p.LastConnected,
	}
	for _, addressInfo := range p.AddressInfo {
		msg.AddressInfo = append(msg.AddressInfo, addressInfo.ToProto())
	}
	if msg.LastConnected.IsZero() {
		msg.LastConnected = nil
	}
	return msg
}

// Copy returns a deep copy of the peer info.
func (p *peerInfo) Copy() peerInfo {
	if p == nil {
		return peerInfo{}
	}
	c := *p
	// Allocate a fresh map: copying the struct only copies the map reference,
	// so c.AddressInfo would otherwise alias p.AddressInfo.
	c.AddressInfo = make(map[string]*peerAddressInfo, len(p.AddressInfo))
	for id, addressInfo := range p.AddressInfo {
		addressInfoCopy := addressInfo.Copy()
		c.AddressInfo[id] = &addressInfoCopy
	}
	return c
}

// Score calculates a score for the peer. Higher-scored peers will be
// preferred over lower-scored ones.
func (p *peerInfo) Score() PeerScore {
	var score PeerScore
	if p.Persistent {
		score += PeerScorePersistent
	}
	return score
}

// Validate validates the peer info.
func (p *peerInfo) Validate() error {
	if p.ID == "" {
		return errors.New("no peer ID")
	}
	return nil
}

// peerAddressInfo contains information and statistics about a peer address.
type peerAddressInfo struct {
	Address         NodeAddress
	LastDialSuccess time.Time
	LastDialFailure time.Time
	DialFailures    uint32 // since last successful dial
}

// peerAddressInfoFromProto converts a Protobuf PeerAddressInfo message
// to a peerAddressInfo.
func peerAddressInfoFromProto(msg *p2pproto.PeerAddressInfo) (*peerAddressInfo, error) {
	address, err := ParseNodeAddress(msg.Address)
	if err != nil {
		return nil, fmt.Errorf("invalid address %q: %w", msg.Address, err)
	}
	addressInfo := &peerAddressInfo{
		Address:      address,
		DialFailures: msg.DialFailures,
	}
	if msg.LastDialSuccess != nil {
		addressInfo.LastDialSuccess = *msg.LastDialSuccess
	}
	if msg.LastDialFailure != nil {
		addressInfo.LastDialFailure = *msg.LastDialFailure
	}
	return addressInfo, addressInfo.Validate()
}

// ToProto converts the address info into a Protobuf message for serialization.
func (a *peerAddressInfo) ToProto() *p2pproto.PeerAddressInfo {
	msg := &p2pproto.PeerAddressInfo{
		Address:         a.Address.String(),
		LastDialSuccess: &a.LastDialSuccess,
		LastDialFailure: &a.LastDialFailure,
		DialFailures:    a.DialFailures,
	}
	if msg.LastDialSuccess.IsZero() {
		msg.LastDialSuccess = nil
	}
	if msg.LastDialFailure.IsZero() {
		msg.LastDialFailure = nil
	}
	return msg
}

// Copy returns a copy of the address info.
func (a *peerAddressInfo) Copy() peerAddressInfo {
	return *a
}

// Validate validates the address info.
func (a *peerAddressInfo) Validate() error {
	return a.Address.Validate()
}

// These are database key prefixes.
const (
	prefixPeerInfo int64 = 1
)

// keyPeerInfo generates a peerInfo database key.
func keyPeerInfo(id NodeID) []byte {
	key, err := orderedcode.Append(nil, prefixPeerInfo, string(id))
	if err != nil {
		panic(err)
	}
	return key
}

// keyPeerInfoRange generates start/end keys for the entire peerInfo key range.
func keyPeerInfoRange() ([]byte, []byte) {
	start, err := orderedcode.Append(nil, prefixPeerInfo, "")
	if err != nil {
		panic(err)
	}
	end, err := orderedcode.Append(nil, prefixPeerInfo, orderedcode.Infinity)
	if err != nil {
		panic(err)
	}
	return start, end
}
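
// A sketch of how these keys fit together (illustrative only):
//
//	start, end := keyPeerInfoRange()     // bounds covering every keyPeerInfo(id)
//	iter, err := db.Iterator(start, end) // iterates all peers in key order
//
// orderedcode.Append encodes the (prefixPeerInfo, id) tuple such that
// byte-wise key ordering matches tuple ordering, so the ("", Infinity)
// bounds span exactly the peerInfo keyspace.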

// ============================================================================
// Types and business logic below may be deprecated.
//
// TODO: Rename once legacy p2p types are removed.
// ref: https://github.com/tendermint/tendermint/issues/5670
// ============================================================================

//go:generate mockery --case underscore --name Peer

const metricsTickerDuration = 10 * time.Second

// Peer is an interface representing a peer connected on a reactor.
type Peer interface {
	service.Service
	FlushStop()

	ID() NodeID           // peer's cryptographic ID
	RemoteIP() net.IP     // remote IP of the connection
	RemoteAddr() net.Addr // remote address of the connection

	IsOutbound() bool   // did we dial the peer
	IsPersistent() bool // do we redial this peer when we disconnect

	CloseConn() error // close original connection

	NodeInfo() NodeInfo // peer's info
	Status() tmconn.ConnectionStatus
	SocketAddr() *NetAddress // actual address of the socket

	Send(byte, []byte) bool
	TrySend(byte, []byte) bool

	Set(string, interface{})
	Get(string) interface{}
}
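
// A reactor-side send sketch (illustrative; chID and msgBytes stand for a
// channel ID and an encoded message for that channel):
//
//	if ok := peer.Send(chID, msgBytes); !ok {
//		// The peer is stopped, doesn't know the channel, or its send queue
//		// stayed full past the MConnection timeout; callers typically just
//		// drop the message.
//	}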

//----------------------------------------------------------

// peerConn contains the raw connection and its config.
type peerConn struct {
	outbound   bool
	persistent bool
	conn       Connection

	ip net.IP // cached RemoteIP()
}

func newPeerConn(outbound, persistent bool, conn Connection) peerConn {
	return peerConn{
		outbound:   outbound,
		persistent: persistent,
		conn:       conn,
	}
}

// RemoteIP returns the IP from the connection's remote endpoint. Note that
// with a value receiver, the assignment to pc.ip only caches within this
// call; the field mainly serves connections constructed with a pre-resolved
// IP.
func (pc peerConn) RemoteIP() net.IP {
	if pc.ip == nil {
		pc.ip = pc.conn.RemoteEndpoint().IP
	}
	return pc.ip
}

// peer implements Peer.
//
// Before using a peer, you will need to perform a handshake on connection.
type peer struct {
	service.BaseService

	// raw peerConn and the multiplex connection
	peerConn

	// peer's node info and the channels it knows about
	// channels = nodeInfo.Channels
	// cached to avoid copying nodeInfo in hasChannel
	nodeInfo    NodeInfo
	channels    []byte
	reactors    map[byte]Reactor
	onPeerError func(Peer, interface{})

	// User data
	Data *cmap.CMap

	metrics       *Metrics
	metricsTicker *time.Ticker
}

type PeerOption func(*peer)

func newPeer(
	nodeInfo NodeInfo,
	pc peerConn,
	reactorsByCh map[byte]Reactor,
	onPeerError func(Peer, interface{}),
	options ...PeerOption,
) *peer {
	p := &peer{
		peerConn:      pc,
		nodeInfo:      nodeInfo,
		channels:      nodeInfo.Channels, // TODO
		reactors:      reactorsByCh,
		onPeerError:   onPeerError,
		Data:          cmap.NewCMap(),
		metricsTicker: time.NewTicker(metricsTickerDuration),
		metrics:       NopMetrics(),
	}
	p.BaseService = *service.NewBaseService(nil, "Peer", p)
	for _, option := range options {
		option(p)
	}
	return p
}

// onError calls the peer error callback.
func (p *peer) onError(err interface{}) {
	p.onPeerError(p, err)
}

// String returns a string representation of the peer.
func (p *peer) String() string {
	if p.outbound {
		return fmt.Sprintf("Peer{%v %v out}", p.conn, p.ID())
	}
	return fmt.Sprintf("Peer{%v %v in}", p.conn, p.ID())
}

//---------------------------------------------------
// Implements service.Service

// SetLogger implements BaseService.
func (p *peer) SetLogger(l log.Logger) {
	p.Logger = l
}

// OnStart implements BaseService.
func (p *peer) OnStart() error {
	if err := p.BaseService.OnStart(); err != nil {
		return err
	}

	go p.processMessages()
	go p.metricsReporter()

	return nil
}

// processMessages processes messages received from the connection.
func (p *peer) processMessages() {
	defer func() {
		if r := recover(); r != nil {
			p.Logger.Error("peer message processing panic", "err", r, "stack", string(debug.Stack()))
			p.onError(fmt.Errorf("panic during peer message processing: %v", r))
		}
	}()

	for {
		chID, msg, err := p.conn.ReceiveMessage()
		if err != nil {
			p.onError(err)
			return
		}
		reactor, ok := p.reactors[byte(chID)]
		if !ok {
			p.onError(fmt.Errorf("unknown channel %v", chID))
			return
		}
		reactor.Receive(byte(chID), p, msg)
	}
}

// FlushStop mimics OnStop but additionally ensures that all successful
// .Send() calls will get flushed before closing the connection.
// NOTE: it is not safe to call this method more than once.
func (p *peer) FlushStop() {
	p.metricsTicker.Stop()
	p.BaseService.OnStop()
	if err := p.conn.FlushClose(); err != nil {
		p.Logger.Debug("error while stopping peer", "err", err)
	}
}

// OnStop implements BaseService.
func (p *peer) OnStop() {
	p.metricsTicker.Stop()
	p.BaseService.OnStop()
	if err := p.conn.Close(); err != nil {
		p.Logger.Debug("error while stopping peer", "err", err)
	}
}

//---------------------------------------------------
// Implements Peer

// ID returns the peer's ID - the hex encoded hash of its pubkey.
func (p *peer) ID() NodeID {
	return p.nodeInfo.ID()
}

// IsOutbound returns true if the connection is outbound, false otherwise.
func (p *peer) IsOutbound() bool {
	return p.peerConn.outbound
}

// IsPersistent returns true if the peer is persistent, false otherwise.
func (p *peer) IsPersistent() bool {
	return p.peerConn.persistent
}

// NodeInfo returns a copy of the peer's NodeInfo.
func (p *peer) NodeInfo() NodeInfo {
	return p.nodeInfo
}

// SocketAddr returns the address of the socket.
// For outbound peers, it's the address dialed (after DNS resolution).
// For inbound peers, it's the address returned by the underlying connection
// (not what's reported in the peer's NodeInfo).
func (p *peer) SocketAddr() *NetAddress {
	endpoint := p.peerConn.conn.RemoteEndpoint()
	return &NetAddress{
		ID:   p.ID(),
		IP:   endpoint.IP,
		Port: endpoint.Port,
	}
}

// Status returns the peer's ConnectionStatus.
func (p *peer) Status() tmconn.ConnectionStatus {
	return p.conn.Status()
}

// Send sends msg bytes to the channel identified by chID. It returns false if
// the send queue is still full after the timeout specified by MConnection.
func (p *peer) Send(chID byte, msgBytes []byte) bool {
	if !p.IsRunning() {
		// see Switch#Broadcast, where we fetch the list of peers and loop over
		// them - while we're looping, one peer may be removed and stopped.
		return false
	} else if !p.hasChannel(chID) {
		return false
	}
	res, err := p.conn.SendMessage(ChannelID(chID), msgBytes)
	if err == io.EOF {
		return false
	} else if err != nil {
		p.onError(err)
		return false
	}
	if res {
		labels := []string{
			"peer_id", string(p.ID()),
			"chID", fmt.Sprintf("%#x", chID),
		}
		p.metrics.PeerSendBytesTotal.With(labels...).Add(float64(len(msgBytes)))
	}
	return res
}

// TrySend sends msg bytes to the channel identified by chID. It immediately
// returns false if the send queue is full.
func (p *peer) TrySend(chID byte, msgBytes []byte) bool {
	if !p.IsRunning() {
		return false
	} else if !p.hasChannel(chID) {
		return false
	}
	res, err := p.conn.TrySendMessage(ChannelID(chID), msgBytes)
	if err == io.EOF {
		return false
	} else if err != nil {
		p.onError(err)
		return false
	}
	if res {
		labels := []string{
			"peer_id", string(p.ID()),
			"chID", fmt.Sprintf("%#x", chID),
		}
		p.metrics.PeerSendBytesTotal.With(labels...).Add(float64(len(msgBytes)))
	}
	return res
}

// Get the data for a given key.
func (p *peer) Get(key string) interface{} {
	return p.Data.Get(key)
}

// Set sets the data for the given key.
func (p *peer) Set(key string, data interface{}) {
	p.Data.Set(key, data)
}

// hasChannel returns true if the peer reported
// knowing about the given chID.
func (p *peer) hasChannel(chID byte) bool {
	for _, ch := range p.channels {
		if ch == chID {
			return true
		}
	}
	// NOTE: probably will want to remove this
	// but could be helpful while the feature is new
	p.Logger.Debug("Unknown channel for peer", "channel", chID, "channels", p.channels)
	return false
}

// CloseConn closes the original connection. Used for cleaning up in cases
// where the peer had not been started at all.
func (p *peer) CloseConn() error {
	return p.peerConn.conn.Close()
}

//---------------------------------------------------
// methods only used for testing
// TODO: can we remove these?

// CloseConn closes the underlying connection.
func (pc *peerConn) CloseConn() {
	pc.conn.Close()
}

// RemoteAddr returns the peer's remote network address.
func (p *peer) RemoteAddr() net.Addr {
	endpoint := p.conn.RemoteEndpoint()
	return &net.TCPAddr{
		IP:   endpoint.IP,
		Port: int(endpoint.Port),
	}
}

//---------------------------------------------------

func PeerMetrics(metrics *Metrics) PeerOption {
	return func(p *peer) {
		p.metrics = metrics
	}
}

func (p *peer) metricsReporter() {
	for {
		select {
		case <-p.metricsTicker.C:
			status := p.conn.Status()
			var sendQueueSize float64
			for _, chStatus := range status.Channels {
				sendQueueSize += float64(chStatus.SendQueueSize)
			}
			p.metrics.PeerPendingSendBytes.With("peer_id", string(p.ID())).Set(sendQueueSize)
		case <-p.Quit():
			return
		}
	}
}