
1673 lines
50 KiB

p2p: implement new Transport interface (#5791)

This implements a new `Transport` interface and related types for the P2P refactor in #5670. Previously, `conn.MConnection` was very tightly coupled to the `Peer` implementation -- in order to allow alternative non-multiplexed transports (e.g. QUIC), MConnection has now been moved below the `Transport` interface, as `MConnTransport`, and decoupled from the peer. Since the `p2p` package is not covered by our Go API stability guarantees, this is not considered a breaking change, and is not listed in the changelog.

The initial approach was to implement the new interface in its final form (which also involved possible protocol changes, see https://github.com/tendermint/spec/pull/227). However, it turned out that this would require a large number of changes to existing P2P code because of the previous tight coupling between `Peer` and `MConnection` and the reliance on subtleties in the MConnection behavior. Instead, I have broadened the `Transport` interface to expose much of the existing MConnection interface, preserved much of the existing MConnection logic and behavior in the transport implementation, and tried to make as few changes to the rest of the P2P stack as possible. We will instead reduce this interface gradually as we refactor other parts of the P2P stack.

The low-level transport code and protocol (e.g. MConnection, SecretConnection and so on) has not been significantly changed, and refactoring it is not a priority until we come up with a plan for QUIC adoption, as we may end up discarding the MConnection code entirely. There are no tests of the new `MConnTransport`, as this code is likely to evolve as we proceed with the P2P refactor, but tests should be added before a final release. The E2E tests are sufficient for basic validation in the meantime.
4 years ago
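The shape of such a transport abstraction can be sketched in Go. This is a minimal illustration of the idea the commit describes (a `Transport` that hides whether the underlying connection is multiplexed), not the actual Tendermint API: the method signatures and the in-memory implementation below are assumptions made for the example.

```go
package main

import "fmt"

// Connection is a hypothetical per-peer connection handed out by a Transport.
// A multiplexed MConnection-style transport and a QUIC-style transport could
// both satisfy it without the Peer layer knowing the difference.
type Connection interface {
	SendMessage(chID byte, msg []byte) error
	ReceiveMessage() (chID byte, msg []byte, err error)
	Close() error
}

// Transport accepts inbound connections and dials outbound ones.
// Accept blocks until a connection arrives, following the net.Listener
// convention from the Go standard library.
type Transport interface {
	Accept() (Connection, error)
	Dial(addr string) (Connection, error)
	Close() error
}

// memConn is a toy in-memory Connection: messages are framed as a channel-ID
// byte followed by the payload.
type memConn struct {
	in, out chan []byte
}

func (c *memConn) SendMessage(chID byte, msg []byte) error {
	c.out <- append([]byte{chID}, msg...)
	return nil
}

func (c *memConn) ReceiveMessage() (byte, []byte, error) {
	b := <-c.in
	return b[0], b[1:], nil
}

func (c *memConn) Close() error { return nil }

// memTransport hands out connected pairs: Dial returns one end and queues the
// other end for Accept.
type memTransport struct {
	accepts chan Connection
}

func newMemTransport() *memTransport {
	return &memTransport{accepts: make(chan Connection, 1)}
}

func (t *memTransport) Accept() (Connection, error) { return <-t.accepts, nil }

func (t *memTransport) Dial(addr string) (Connection, error) {
	a := make(chan []byte, 16)
	b := make(chan []byte, 16)
	t.accepts <- &memConn{in: a, out: b}
	return &memConn{in: b, out: a}, nil
}

func (t *memTransport) Close() error { return nil }

func main() {
	t := newMemTransport()
	dialer, _ := t.Dial("peer@localhost")
	acceptor, _ := t.Accept()
	dialer.SendMessage(0x01, []byte("hello"))
	ch, msg, _ := acceptor.ReceiveMessage()
	fmt.Printf("channel=%#x msg=%s\n", ch, msg)
}
```

The point of the indirection is that peer lifecycle code depends only on `Transport`/`Connection`, so swapping MConnection for another wire protocol becomes a matter of providing a new implementation rather than rewriting the peer layer.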
p2p: make PeerManager.DialNext() and EvictNext() block (#5947)

See #5936 and #5938 for background. The plan was initially to have `DialNext()` and `EvictNext()` return a channel. However, implementing this became unnecessarily complicated and error-prone. As an example, the channel would be both consumed and populated (via method calls) by the same driving method (e.g. `Router.dialPeers()`), which could easily cause deadlocks where a method call blocked while sending on the channel that the caller itself was responsible for consuming (but couldn't, since it was busy making the method call). It would also require a set of goroutines in the peer manager that would interact with the goroutines in the router in non-obvious ways, and fully populating the channel on startup could cause deadlocks with other startup tasks. Several issues like these made the solution hard to reason about.

I therefore simply made `DialNext()` and `EvictNext()` block until the next peer was available, using internal triggers to wake these methods up in a non-blocking fashion when any relevant state changes occurred. This proved much simpler to reason about, since there are no goroutines in the peer manager (except for trivial retry timers), nor any blocking channel sends, and it instead relies entirely on the existing goroutine structure of the router for concurrency. This also happens to be the same pattern used by the `Transport.Accept()` API, following Go stdlib conventions, so all router goroutines end up using a consistent pattern as well.
4 years ago
p2p: make PeerManager.DialNext() and EvictNext() block (#5947) See #5936 and #5938 for background. The plan was initially to have `DialNext()` and `EvictNext()` return a channel. However, implementing this became unnecessarily complicated and error-prone. As an example, the channel would be both consumed and populated (via method calls) by the same driving method (e.g. `Router.dialPeers()`) which could easily cause deadlocks where a method call blocked while sending on the channel that the caller itself was responsible for consuming (but couldn't since it was busy making the method call). It would also require a set of goroutines in the peer manager that would interact with the goroutines in the router in non-obvious ways, and fully populating the channel on startup could cause deadlocks with other startup tasks. Several issues like these made the solution hard to reason about. I therefore simply made `DialNext()` and `EvictNext()` block until the next peer was available, using internal triggers to wake these methods up in a non-blocking fashion when any relevant state changes occurred. This proved much simpler to reason about, since there are no goroutines in the peer manager (except for trivial retry timers), nor any blocking channel sends, and it instead relies entirely on the existing goroutine structure of the router for concurrency. This also happens to be the same pattern used by the `Transport.Accept()` API, following Go stdlib conventions, so all router goroutines end up using a consistent pattern as well.
4 years ago
p2p: make PeerManager.DialNext() and EvictNext() block (#5947) See #5936 and #5938 for background. The plan was initially to have `DialNext()` and `EvictNext()` return a channel. However, implementing this became unnecessarily complicated and error-prone. As an example, the channel would be both consumed and populated (via method calls) by the same driving method (e.g. `Router.dialPeers()`) which could easily cause deadlocks where a method call blocked while sending on the channel that the caller itself was responsible for consuming (but couldn't since it was busy making the method call). It would also require a set of goroutines in the peer manager that would interact with the goroutines in the router in non-obvious ways, and fully populating the channel on startup could cause deadlocks with other startup tasks. Several issues like these made the solution hard to reason about. I therefore simply made `DialNext()` and `EvictNext()` block until the next peer was available, using internal triggers to wake these methods up in a non-blocking fashion when any relevant state changes occurred. This proved much simpler to reason about, since there are no goroutines in the peer manager (except for trivial retry timers), nor any blocking channel sends, and it instead relies entirely on the existing goroutine structure of the router for concurrency. This also happens to be the same pattern used by the `Transport.Accept()` API, following Go stdlib conventions, so all router goroutines end up using a consistent pattern as well.
4 years ago
p2p: make PeerManager.DialNext() and EvictNext() block (#5947) See #5936 and #5938 for background. The plan was initially to have `DialNext()` and `EvictNext()` return a channel. However, implementing this became unnecessarily complicated and error-prone. As an example, the channel would be both consumed and populated (via method calls) by the same driving method (e.g. `Router.dialPeers()`) which could easily cause deadlocks where a method call blocked while sending on the channel that the caller itself was responsible for consuming (but couldn't since it was busy making the method call). It would also require a set of goroutines in the peer manager that would interact with the goroutines in the router in non-obvious ways, and fully populating the channel on startup could cause deadlocks with other startup tasks. Several issues like these made the solution hard to reason about. I therefore simply made `DialNext()` and `EvictNext()` block until the next peer was available, using internal triggers to wake these methods up in a non-blocking fashion when any relevant state changes occurred. This proved much simpler to reason about, since there are no goroutines in the peer manager (except for trivial retry timers), nor any blocking channel sends, and it instead relies entirely on the existing goroutine structure of the router for concurrency. This also happens to be the same pattern used by the `Transport.Accept()` API, following Go stdlib conventions, so all router goroutines end up using a consistent pattern as well.
4 years ago
p2p: make PeerManager.DialNext() and EvictNext() block (#5947) See #5936 and #5938 for background. The plan was initially to have `DialNext()` and `EvictNext()` return a channel. However, implementing this became unnecessarily complicated and error-prone. As an example, the channel would be both consumed and populated (via method calls) by the same driving method (e.g. `Router.dialPeers()`) which could easily cause deadlocks where a method call blocked while sending on the channel that the caller itself was responsible for consuming (but couldn't since it was busy making the method call). It would also require a set of goroutines in the peer manager that would interact with the goroutines in the router in non-obvious ways, and fully populating the channel on startup could cause deadlocks with other startup tasks. Several issues like these made the solution hard to reason about. I therefore simply made `DialNext()` and `EvictNext()` block until the next peer was available, using internal triggers to wake these methods up in a non-blocking fashion when any relevant state changes occurred. This proved much simpler to reason about, since there are no goroutines in the peer manager (except for trivial retry timers), nor any blocking channel sends, and it instead relies entirely on the existing goroutine structure of the router for concurrency. This also happens to be the same pattern used by the `Transport.Accept()` API, following Go stdlib conventions, so all router goroutines end up using a consistent pattern as well.
4 years ago
p2p: make PeerManager.DialNext() and EvictNext() block (#5947) See #5936 and #5938 for background. The plan was initially to have `DialNext()` and `EvictNext()` return a channel. However, implementing this became unnecessarily complicated and error-prone. As an example, the channel would be both consumed and populated (via method calls) by the same driving method (e.g. `Router.dialPeers()`) which could easily cause deadlocks where a method call blocked while sending on the channel that the caller itself was responsible for consuming (but couldn't since it was busy making the method call). It would also require a set of goroutines in the peer manager that would interact with the goroutines in the router in non-obvious ways, and fully populating the channel on startup could cause deadlocks with other startup tasks. Several issues like these made the solution hard to reason about. I therefore simply made `DialNext()` and `EvictNext()` block until the next peer was available, using internal triggers to wake these methods up in a non-blocking fashion when any relevant state changes occurred. This proved much simpler to reason about, since there are no goroutines in the peer manager (except for trivial retry timers), nor any blocking channel sends, and it instead relies entirely on the existing goroutine structure of the router for concurrency. This also happens to be the same pattern used by the `Transport.Accept()` API, following Go stdlib conventions, so all router goroutines end up using a consistent pattern as well.
4 years ago
p2p: make PeerManager.DialNext() and EvictNext() block (#5947) See #5936 and #5938 for background. The plan was initially to have `DialNext()` and `EvictNext()` return a channel. However, implementing this became unnecessarily complicated and error-prone. As an example, the channel would be both consumed and populated (via method calls) by the same driving method (e.g. `Router.dialPeers()`) which could easily cause deadlocks where a method call blocked while sending on the channel that the caller itself was responsible for consuming (but couldn't since it was busy making the method call). It would also require a set of goroutines in the peer manager that would interact with the goroutines in the router in non-obvious ways, and fully populating the channel on startup could cause deadlocks with other startup tasks. Several issues like these made the solution hard to reason about. I therefore simply made `DialNext()` and `EvictNext()` block until the next peer was available, using internal triggers to wake these methods up in a non-blocking fashion when any relevant state changes occurred. This proved much simpler to reason about, since there are no goroutines in the peer manager (except for trivial retry timers), nor any blocking channel sends, and it instead relies entirely on the existing goroutine structure of the router for concurrency. This also happens to be the same pattern used by the `Transport.Accept()` API, following Go stdlib conventions, so all router goroutines end up using a consistent pattern as well.
4 years ago
p2p: make PeerManager.DialNext() and EvictNext() block (#5947) See #5936 and #5938 for background. The plan was initially to have `DialNext()` and `EvictNext()` return a channel. However, implementing this became unnecessarily complicated and error-prone. As an example, the channel would be both consumed and populated (via method calls) by the same driving method (e.g. `Router.dialPeers()`) which could easily cause deadlocks where a method call blocked while sending on the channel that the caller itself was responsible for consuming (but couldn't since it was busy making the method call). It would also require a set of goroutines in the peer manager that would interact with the goroutines in the router in non-obvious ways, and fully populating the channel on startup could cause deadlocks with other startup tasks. Several issues like these made the solution hard to reason about. I therefore simply made `DialNext()` and `EvictNext()` block until the next peer was available, using internal triggers to wake these methods up in a non-blocking fashion when any relevant state changes occurred. This proved much simpler to reason about, since there are no goroutines in the peer manager (except for trivial retry timers), nor any blocking channel sends, and it instead relies entirely on the existing goroutine structure of the router for concurrency. This also happens to be the same pattern used by the `Transport.Accept()` API, following Go stdlib conventions, so all router goroutines end up using a consistent pattern as well.
4 years ago
p2p: make PeerManager.DialNext() and EvictNext() block (#5947) See #5936 and #5938 for background. The plan was initially to have `DialNext()` and `EvictNext()` return a channel. However, implementing this became unnecessarily complicated and error-prone. As an example, the channel would be both consumed and populated (via method calls) by the same driving method (e.g. `Router.dialPeers()`) which could easily cause deadlocks where a method call blocked while sending on the channel that the caller itself was responsible for consuming (but couldn't since it was busy making the method call). It would also require a set of goroutines in the peer manager that would interact with the goroutines in the router in non-obvious ways, and fully populating the channel on startup could cause deadlocks with other startup tasks. Several issues like these made the solution hard to reason about. I therefore simply made `DialNext()` and `EvictNext()` block until the next peer was available, using internal triggers to wake these methods up in a non-blocking fashion when any relevant state changes occurred. This proved much simpler to reason about, since there are no goroutines in the peer manager (except for trivial retry timers), nor any blocking channel sends, and it instead relies entirely on the existing goroutine structure of the router for concurrency. This also happens to be the same pattern used by the `Transport.Accept()` API, following Go stdlib conventions, so all router goroutines end up using a consistent pattern as well.
4 years ago
p2p: make PeerManager.DialNext() and EvictNext() block (#5947) See #5936 and #5938 for background. The plan was initially to have `DialNext()` and `EvictNext()` return a channel. However, implementing this became unnecessarily complicated and error-prone. As an example, the channel would be both consumed and populated (via method calls) by the same driving method (e.g. `Router.dialPeers()`) which could easily cause deadlocks where a method call blocked while sending on the channel that the caller itself was responsible for consuming (but couldn't since it was busy making the method call). It would also require a set of goroutines in the peer manager that would interact with the goroutines in the router in non-obvious ways, and fully populating the channel on startup could cause deadlocks with other startup tasks. Several issues like these made the solution hard to reason about. I therefore simply made `DialNext()` and `EvictNext()` block until the next peer was available, using internal triggers to wake these methods up in a non-blocking fashion when any relevant state changes occurred. This proved much simpler to reason about, since there are no goroutines in the peer manager (except for trivial retry timers), nor any blocking channel sends, and it instead relies entirely on the existing goroutine structure of the router for concurrency. This also happens to be the same pattern used by the `Transport.Accept()` API, following Go stdlib conventions, so all router goroutines end up using a consistent pattern as well.
4 years ago
p2p: make PeerManager.DialNext() and EvictNext() block (#5947) See #5936 and #5938 for background. The plan was initially to have `DialNext()` and `EvictNext()` return a channel. However, implementing this became unnecessarily complicated and error-prone. As an example, the channel would be both consumed and populated (via method calls) by the same driving method (e.g. `Router.dialPeers()`) which could easily cause deadlocks where a method call blocked while sending on the channel that the caller itself was responsible for consuming (but couldn't since it was busy making the method call). It would also require a set of goroutines in the peer manager that would interact with the goroutines in the router in non-obvious ways, and fully populating the channel on startup could cause deadlocks with other startup tasks. Several issues like these made the solution hard to reason about. I therefore simply made `DialNext()` and `EvictNext()` block until the next peer was available, using internal triggers to wake these methods up in a non-blocking fashion when any relevant state changes occurred. This proved much simpler to reason about, since there are no goroutines in the peer manager (except for trivial retry timers), nor any blocking channel sends, and it instead relies entirely on the existing goroutine structure of the router for concurrency. This also happens to be the same pattern used by the `Transport.Accept()` API, following Go stdlib conventions, so all router goroutines end up using a consistent pattern as well.
4 years ago
p2p: make PeerManager.DialNext() and EvictNext() block (#5947) See #5936 and #5938 for background. The plan was initially to have `DialNext()` and `EvictNext()` return a channel. However, implementing this became unnecessarily complicated and error-prone. As an example, the channel would be both consumed and populated (via method calls) by the same driving method (e.g. `Router.dialPeers()`) which could easily cause deadlocks where a method call blocked while sending on the channel that the caller itself was responsible for consuming (but couldn't since it was busy making the method call). It would also require a set of goroutines in the peer manager that would interact with the goroutines in the router in non-obvious ways, and fully populating the channel on startup could cause deadlocks with other startup tasks. Several issues like these made the solution hard to reason about. I therefore simply made `DialNext()` and `EvictNext()` block until the next peer was available, using internal triggers to wake these methods up in a non-blocking fashion when any relevant state changes occurred. This proved much simpler to reason about, since there are no goroutines in the peer manager (except for trivial retry timers), nor any blocking channel sends, and it instead relies entirely on the existing goroutine structure of the router for concurrency. This also happens to be the same pattern used by the `Transport.Accept()` API, following Go stdlib conventions, so all router goroutines end up using a consistent pattern as well.
4 years ago
p2p: make PeerManager.DialNext() and EvictNext() block (#5947) See #5936 and #5938 for background. The plan was initially to have `DialNext()` and `EvictNext()` return a channel. However, implementing this became unnecessarily complicated and error-prone. As an example, the channel would be both consumed and populated (via method calls) by the same driving method (e.g. `Router.dialPeers()`) which could easily cause deadlocks where a method call blocked while sending on the channel that the caller itself was responsible for consuming (but couldn't since it was busy making the method call). It would also require a set of goroutines in the peer manager that would interact with the goroutines in the router in non-obvious ways, and fully populating the channel on startup could cause deadlocks with other startup tasks. Several issues like these made the solution hard to reason about. I therefore simply made `DialNext()` and `EvictNext()` block until the next peer was available, using internal triggers to wake these methods up in a non-blocking fashion when any relevant state changes occurred. This proved much simpler to reason about, since there are no goroutines in the peer manager (except for trivial retry timers), nor any blocking channel sends, and it instead relies entirely on the existing goroutine structure of the router for concurrency. This also happens to be the same pattern used by the `Transport.Accept()` API, following Go stdlib conventions, so all router goroutines end up using a consistent pattern as well.
4 years ago
p2p: make PeerManager.DialNext() and EvictNext() block (#5947) See #5936 and #5938 for background. The plan was initially to have `DialNext()` and `EvictNext()` return a channel. However, implementing this became unnecessarily complicated and error-prone. As an example, the channel would be both consumed and populated (via method calls) by the same driving method (e.g. `Router.dialPeers()`) which could easily cause deadlocks where a method call blocked while sending on the channel that the caller itself was responsible for consuming (but couldn't since it was busy making the method call). It would also require a set of goroutines in the peer manager that would interact with the goroutines in the router in non-obvious ways, and fully populating the channel on startup could cause deadlocks with other startup tasks. Several issues like these made the solution hard to reason about. I therefore simply made `DialNext()` and `EvictNext()` block until the next peer was available, using internal triggers to wake these methods up in a non-blocking fashion when any relevant state changes occurred. This proved much simpler to reason about, since there are no goroutines in the peer manager (except for trivial retry timers), nor any blocking channel sends, and it instead relies entirely on the existing goroutine structure of the router for concurrency. This also happens to be the same pattern used by the `Transport.Accept()` API, following Go stdlib conventions, so all router goroutines end up using a consistent pattern as well.
4 years ago
p2p: make PeerManager.DialNext() and EvictNext() block (#5947) See #5936 and #5938 for background. The plan was initially to have `DialNext()` and `EvictNext()` return a channel. However, implementing this became unnecessarily complicated and error-prone. As an example, the channel would be both consumed and populated (via method calls) by the same driving method (e.g. `Router.dialPeers()`) which could easily cause deadlocks where a method call blocked while sending on the channel that the caller itself was responsible for consuming (but couldn't since it was busy making the method call). It would also require a set of goroutines in the peer manager that would interact with the goroutines in the router in non-obvious ways, and fully populating the channel on startup could cause deadlocks with other startup tasks. Several issues like these made the solution hard to reason about. I therefore simply made `DialNext()` and `EvictNext()` block until the next peer was available, using internal triggers to wake these methods up in a non-blocking fashion when any relevant state changes occurred. This proved much simpler to reason about, since there are no goroutines in the peer manager (except for trivial retry timers), nor any blocking channel sends, and it instead relies entirely on the existing goroutine structure of the router for concurrency. This also happens to be the same pattern used by the `Transport.Accept()` API, following Go stdlib conventions, so all router goroutines end up using a consistent pattern as well.
4 years ago
p2p: make PeerManager.DialNext() and EvictNext() block (#5947) See #5936 and #5938 for background. The plan was initially to have `DialNext()` and `EvictNext()` return a channel. However, implementing this became unnecessarily complicated and error-prone. As an example, the channel would be both consumed and populated (via method calls) by the same driving method (e.g. `Router.dialPeers()`) which could easily cause deadlocks where a method call blocked while sending on the channel that the caller itself was responsible for consuming (but couldn't since it was busy making the method call). It would also require a set of goroutines in the peer manager that would interact with the goroutines in the router in non-obvious ways, and fully populating the channel on startup could cause deadlocks with other startup tasks. Several issues like these made the solution hard to reason about. I therefore simply made `DialNext()` and `EvictNext()` block until the next peer was available, using internal triggers to wake these methods up in a non-blocking fashion when any relevant state changes occurred. This proved much simpler to reason about, since there are no goroutines in the peer manager (except for trivial retry timers), nor any blocking channel sends, and it instead relies entirely on the existing goroutine structure of the router for concurrency. This also happens to be the same pattern used by the `Transport.Accept()` API, following Go stdlib conventions, so all router goroutines end up using a consistent pattern as well.
4 years ago
p2p: make PeerManager.DialNext() and EvictNext() block (#5947) See #5936 and #5938 for background. The plan was initially to have `DialNext()` and `EvictNext()` return a channel. However, implementing this became unnecessarily complicated and error-prone. As an example, the channel would be both consumed and populated (via method calls) by the same driving method (e.g. `Router.dialPeers()`) which could easily cause deadlocks where a method call blocked while sending on the channel that the caller itself was responsible for consuming (but couldn't since it was busy making the method call). It would also require a set of goroutines in the peer manager that would interact with the goroutines in the router in non-obvious ways, and fully populating the channel on startup could cause deadlocks with other startup tasks. Several issues like these made the solution hard to reason about. I therefore simply made `DialNext()` and `EvictNext()` block until the next peer was available, using internal triggers to wake these methods up in a non-blocking fashion when any relevant state changes occurred. This proved much simpler to reason about, since there are no goroutines in the peer manager (except for trivial retry timers), nor any blocking channel sends, and it instead relies entirely on the existing goroutine structure of the router for concurrency. This also happens to be the same pattern used by the `Transport.Accept()` API, following Go stdlib conventions, so all router goroutines end up using a consistent pattern as well.
4 years ago
p2p: make PeerManager.DialNext() and EvictNext() block (#5947) See #5936 and #5938 for background. The plan was initially to have `DialNext()` and `EvictNext()` return a channel. However, implementing this became unnecessarily complicated and error-prone. As an example, the channel would be both consumed and populated (via method calls) by the same driving method (e.g. `Router.dialPeers()`) which could easily cause deadlocks where a method call blocked while sending on the channel that the caller itself was responsible for consuming (but couldn't since it was busy making the method call). It would also require a set of goroutines in the peer manager that would interact with the goroutines in the router in non-obvious ways, and fully populating the channel on startup could cause deadlocks with other startup tasks. Several issues like these made the solution hard to reason about. I therefore simply made `DialNext()` and `EvictNext()` block until the next peer was available, using internal triggers to wake these methods up in a non-blocking fashion when any relevant state changes occurred. This proved much simpler to reason about, since there are no goroutines in the peer manager (except for trivial retry timers), nor any blocking channel sends, and it instead relies entirely on the existing goroutine structure of the router for concurrency. This also happens to be the same pattern used by the `Transport.Accept()` API, following Go stdlib conventions, so all router goroutines end up using a consistent pattern as well.
4 years ago
p2p: file descriptor leaks (#3150) * close peer's connection to avoid fd leak Fixes #2967 * rename peer#Addr to RemoteAddr * fix test * fixes after Ethan's review * bring back the check * changelog entry * write a test for switch#acceptRoutine * increase timeouts? :( * remove extra assertNPeersWithTimeout * simplify test * assert number of peers (just to be safe) * Cleanup in OnStop * run tests with verbose flag on CircleCI * spawn a reading routine to prevent connection from closing * get port from the listener random port is faster, but often results in ``` panic: listen tcp 127.0.0.1:44068: bind: address already in use [recovered] panic: listen tcp 127.0.0.1:44068: bind: address already in use goroutine 79 [running]: testing.tRunner.func1(0xc0001bd600) /usr/local/go/src/testing/testing.go:792 +0x387 panic(0x974d20, 0xc0001b0500) /usr/local/go/src/runtime/panic.go:513 +0x1b9 github.com/tendermint/tendermint/p2p.MakeSwitch(0xc0000f42a0, 0x0, 0x9fb9cc, 0x9, 0x9fc346, 0xb, 0xb42128, 0x0, 0x0, 0x0, ...) /home/vagrant/go/src/github.com/tendermint/tendermint/p2p/test_util.go:182 +0xa28 github.com/tendermint/tendermint/p2p.MakeConnectedSwitches(0xc0000f42a0, 0x2, 0xb42128, 0xb41eb8, 0x4f1205, 0xc0001bed80, 0x4f16ed) /home/vagrant/go/src/github.com/tendermint/tendermint/p2p/test_util.go:75 +0xf9 github.com/tendermint/tendermint/p2p.MakeSwitchPair(0xbb8d20, 0xc0001bd600, 0xb42128, 0x2f7, 0x4f16c0) /home/vagrant/go/src/github.com/tendermint/tendermint/p2p/switch_test.go:94 +0x4c github.com/tendermint/tendermint/p2p.TestSwitches(0xc0001bd600) /home/vagrant/go/src/github.com/tendermint/tendermint/p2p/switch_test.go:117 +0x58 testing.tRunner(0xc0001bd600, 0xb42038) /usr/local/go/src/testing/testing.go:827 +0xbf created by testing.(*T).Run /usr/local/go/src/testing/testing.go:878 +0x353 exit status 2 FAIL github.com/tendermint/tendermint/p2p 0.350s ```
6 years ago
4 years ago
p2p: implement new Transport interface (#5791) This implements a new `Transport` interface and related types for the P2P refactor in #5670. Previously, `conn.MConnection` was very tightly coupled to the `Peer` implementation -- in order to allow alternative non-multiplexed transports (e.g. QUIC), MConnection has now been moved below the `Transport` interface, as `MConnTransport`, and decoupled from the peer. Since the `p2p` package is not covered by our Go API stability, this is not considered a breaking change, and not listed in the changelog. The initial approach was to implement the new interface in its final form (which also involved possible protocol changes, see https://github.com/tendermint/spec/pull/227). However, it turned out that this would require a large amount of changes to existing P2P code because of the previous tight coupling between `Peer` and `MConnection` and the reliance on subtleties in the MConnection behavior. Instead, I have broadened the `Transport` interface to expose much of the existing MConnection interface, preserved much of the existing MConnection logic and behavior in the transport implementation, and tried to make as few changes to the rest of the P2P stack as possible. We will instead reduce this interface gradually as we refactor other parts of the P2P stack. The low-level transport code and protocol (e.g. MConnection, SecretConnection and so on) has not been significantly changed, and refactoring this is not a priority until we come up with a plan for QUIC adoption, as we may end up discarding the MConnection code entirely. There are no tests of the new `MConnTransport`, as this code is likely to evolve as we proceed with the P2P refactor, but tests should be added before a final release. The E2E tests are sufficient for basic validation in the meanwhile.
4 years ago
p2p: implement new Transport interface (#5791) This implements a new `Transport` interface and related types for the P2P refactor in #5670. Previously, `conn.MConnection` was very tightly coupled to the `Peer` implementation -- in order to allow alternative non-multiplexed transports (e.g. QUIC), MConnection has now been moved below the `Transport` interface, as `MConnTransport`, and decoupled from the peer. Since the `p2p` package is not covered by our Go API stability, this is not considered a breaking change, and not listed in the changelog. The initial approach was to implement the new interface in its final form (which also involved possible protocol changes, see https://github.com/tendermint/spec/pull/227). However, it turned out that this would require a large amount of changes to existing P2P code because of the previous tight coupling between `Peer` and `MConnection` and the reliance on subtleties in the MConnection behavior. Instead, I have broadened the `Transport` interface to expose much of the existing MConnection interface, preserved much of the existing MConnection logic and behavior in the transport implementation, and tried to make as few changes to the rest of the P2P stack as possible. We will instead reduce this interface gradually as we refactor other parts of the P2P stack. The low-level transport code and protocol (e.g. MConnection, SecretConnection and so on) has not been significantly changed, and refactoring this is not a priority until we come up with a plan for QUIC adoption, as we may end up discarding the MConnection code entirely. There are no tests of the new `MConnTransport`, as this code is likely to evolve as we proceed with the P2P refactor, but tests should be added before a final release. The E2E tests are sufficient for basic validation in the meanwhile.
4 years ago
p2p: implement new Transport interface (#5791) This implements a new `Transport` interface and related types for the P2P refactor in #5670. Previously, `conn.MConnection` was very tightly coupled to the `Peer` implementation -- in order to allow alternative non-multiplexed transports (e.g. QUIC), MConnection has now been moved below the `Transport` interface, as `MConnTransport`, and decoupled from the peer. Since the `p2p` package is not covered by our Go API stability, this is not considered a breaking change, and not listed in the changelog. The initial approach was to implement the new interface in its final form (which also involved possible protocol changes, see https://github.com/tendermint/spec/pull/227). However, it turned out that this would require a large amount of changes to existing P2P code because of the previous tight coupling between `Peer` and `MConnection` and the reliance on subtleties in the MConnection behavior. Instead, I have broadened the `Transport` interface to expose much of the existing MConnection interface, preserved much of the existing MConnection logic and behavior in the transport implementation, and tried to make as few changes to the rest of the P2P stack as possible. We will instead reduce this interface gradually as we refactor other parts of the P2P stack. The low-level transport code and protocol (e.g. MConnection, SecretConnection and so on) has not been significantly changed, and refactoring this is not a priority until we come up with a plan for QUIC adoption, as we may end up discarding the MConnection code entirely. There are no tests of the new `MConnTransport`, as this code is likely to evolve as we proceed with the P2P refactor, but tests should be added before a final release. The E2E tests are sufficient for basic validation in the meanwhile.
4 years ago
p2p: implement new Transport interface (#5791) This implements a new `Transport` interface and related types for the P2P refactor in #5670. Previously, `conn.MConnection` was very tightly coupled to the `Peer` implementation -- in order to allow alternative non-multiplexed transports (e.g. QUIC), MConnection has now been moved below the `Transport` interface, as `MConnTransport`, and decoupled from the peer. Since the `p2p` package is not covered by our Go API stability, this is not considered a breaking change, and not listed in the changelog. The initial approach was to implement the new interface in its final form (which also involved possible protocol changes, see https://github.com/tendermint/spec/pull/227). However, it turned out that this would require a large amount of changes to existing P2P code because of the previous tight coupling between `Peer` and `MConnection` and the reliance on subtleties in the MConnection behavior. Instead, I have broadened the `Transport` interface to expose much of the existing MConnection interface, preserved much of the existing MConnection logic and behavior in the transport implementation, and tried to make as few changes to the rest of the P2P stack as possible. We will instead reduce this interface gradually as we refactor other parts of the P2P stack. The low-level transport code and protocol (e.g. MConnection, SecretConnection and so on) has not been significantly changed, and refactoring this is not a priority until we come up with a plan for QUIC adoption, as we may end up discarding the MConnection code entirely. There are no tests of the new `MConnTransport`, as this code is likely to evolve as we proceed with the P2P refactor, but tests should be added before a final release. The E2E tests are sufficient for basic validation in the meanwhile.
4 years ago
p2p: implement new Transport interface (#5791) This implements a new `Transport` interface and related types for the P2P refactor in #5670. Previously, `conn.MConnection` was very tightly coupled to the `Peer` implementation -- in order to allow alternative non-multiplexed transports (e.g. QUIC), MConnection has now been moved below the `Transport` interface, as `MConnTransport`, and decoupled from the peer. Since the `p2p` package is not covered by our Go API stability, this is not considered a breaking change, and not listed in the changelog. The initial approach was to implement the new interface in its final form (which also involved possible protocol changes, see https://github.com/tendermint/spec/pull/227). However, it turned out that this would require a large amount of changes to existing P2P code because of the previous tight coupling between `Peer` and `MConnection` and the reliance on subtleties in the MConnection behavior. Instead, I have broadened the `Transport` interface to expose much of the existing MConnection interface, preserved much of the existing MConnection logic and behavior in the transport implementation, and tried to make as few changes to the rest of the P2P stack as possible. We will instead reduce this interface gradually as we refactor other parts of the P2P stack. The low-level transport code and protocol (e.g. MConnection, SecretConnection and so on) has not been significantly changed, and refactoring this is not a priority until we come up with a plan for QUIC adoption, as we may end up discarding the MConnection code entirely. There are no tests of the new `MConnTransport`, as this code is likely to evolve as we proceed with the P2P refactor, but tests should be added before a final release. The E2E tests are sufficient for basic validation in the meanwhile.
4 years ago
p2p: implement new Transport interface (#5791) This implements a new `Transport` interface and related types for the P2P refactor in #5670. Previously, `conn.MConnection` was very tightly coupled to the `Peer` implementation -- in order to allow alternative non-multiplexed transports (e.g. QUIC), MConnection has now been moved below the `Transport` interface, as `MConnTransport`, and decoupled from the peer. Since the `p2p` package is not covered by our Go API stability, this is not considered a breaking change, and not listed in the changelog. The initial approach was to implement the new interface in its final form (which also involved possible protocol changes, see https://github.com/tendermint/spec/pull/227). However, it turned out that this would require a large amount of changes to existing P2P code because of the previous tight coupling between `Peer` and `MConnection` and the reliance on subtleties in the MConnection behavior. Instead, I have broadened the `Transport` interface to expose much of the existing MConnection interface, preserved much of the existing MConnection logic and behavior in the transport implementation, and tried to make as few changes to the rest of the P2P stack as possible. We will instead reduce this interface gradually as we refactor other parts of the P2P stack. The low-level transport code and protocol (e.g. MConnection, SecretConnection and so on) has not been significantly changed, and refactoring this is not a priority until we come up with a plan for QUIC adoption, as we may end up discarding the MConnection code entirely. There are no tests of the new `MConnTransport`, as this code is likely to evolve as we proceed with the P2P refactor, but tests should be added before a final release. The E2E tests are sufficient for basic validation in the meanwhile.
4 years ago
p2p: implement new Transport interface (#5791) This implements a new `Transport` interface and related types for the P2P refactor in #5670. Previously, `conn.MConnection` was very tightly coupled to the `Peer` implementation -- in order to allow alternative non-multiplexed transports (e.g. QUIC), MConnection has now been moved below the `Transport` interface, as `MConnTransport`, and decoupled from the peer. Since the `p2p` package is not covered by our Go API stability, this is not considered a breaking change, and not listed in the changelog. The initial approach was to implement the new interface in its final form (which also involved possible protocol changes, see https://github.com/tendermint/spec/pull/227). However, it turned out that this would require a large amount of changes to existing P2P code because of the previous tight coupling between `Peer` and `MConnection` and the reliance on subtleties in the MConnection behavior. Instead, I have broadened the `Transport` interface to expose much of the existing MConnection interface, preserved much of the existing MConnection logic and behavior in the transport implementation, and tried to make as few changes to the rest of the P2P stack as possible. We will instead reduce this interface gradually as we refactor other parts of the P2P stack. The low-level transport code and protocol (e.g. MConnection, SecretConnection and so on) has not been significantly changed, and refactoring this is not a priority until we come up with a plan for QUIC adoption, as we may end up discarding the MConnection code entirely. There are no tests of the new `MConnTransport`, as this code is likely to evolve as we proceed with the P2P refactor, but tests should be added before a final release. The E2E tests are sufficient for basic validation in the meanwhile.
4 years ago
p2p: implement new Transport interface (#5791) This implements a new `Transport` interface and related types for the P2P refactor in #5670. Previously, `conn.MConnection` was very tightly coupled to the `Peer` implementation -- in order to allow alternative non-multiplexed transports (e.g. QUIC), MConnection has now been moved below the `Transport` interface, as `MConnTransport`, and decoupled from the peer. Since the `p2p` package is not covered by our Go API stability, this is not considered a breaking change, and not listed in the changelog. The initial approach was to implement the new interface in its final form (which also involved possible protocol changes, see https://github.com/tendermint/spec/pull/227). However, it turned out that this would require a large amount of changes to existing P2P code because of the previous tight coupling between `Peer` and `MConnection` and the reliance on subtleties in the MConnection behavior. Instead, I have broadened the `Transport` interface to expose much of the existing MConnection interface, preserved much of the existing MConnection logic and behavior in the transport implementation, and tried to make as few changes to the rest of the P2P stack as possible. We will instead reduce this interface gradually as we refactor other parts of the P2P stack. The low-level transport code and protocol (e.g. MConnection, SecretConnection and so on) has not been significantly changed, and refactoring this is not a priority until we come up with a plan for QUIC adoption, as we may end up discarding the MConnection code entirely. There are no tests of the new `MConnTransport`, as this code is likely to evolve as we proceed with the P2P refactor, but tests should be added before a final release. The E2E tests are sufficient for basic validation in the meanwhile.
4 years ago
p2p: implement new Transport interface (#5791) This implements a new `Transport` interface and related types for the P2P refactor in #5670. Previously, `conn.MConnection` was very tightly coupled to the `Peer` implementation -- in order to allow alternative non-multiplexed transports (e.g. QUIC), MConnection has now been moved below the `Transport` interface, as `MConnTransport`, and decoupled from the peer. Since the `p2p` package is not covered by our Go API stability, this is not considered a breaking change, and not listed in the changelog. The initial approach was to implement the new interface in its final form (which also involved possible protocol changes, see https://github.com/tendermint/spec/pull/227). However, it turned out that this would require a large amount of changes to existing P2P code because of the previous tight coupling between `Peer` and `MConnection` and the reliance on subtleties in the MConnection behavior. Instead, I have broadened the `Transport` interface to expose much of the existing MConnection interface, preserved much of the existing MConnection logic and behavior in the transport implementation, and tried to make as few changes to the rest of the P2P stack as possible. We will instead reduce this interface gradually as we refactor other parts of the P2P stack. The low-level transport code and protocol (e.g. MConnection, SecretConnection and so on) has not been significantly changed, and refactoring this is not a priority until we come up with a plan for QUIC adoption, as we may end up discarding the MConnection code entirely. There are no tests of the new `MConnTransport`, as this code is likely to evolve as we proceed with the P2P refactor, but tests should be added before a final release. The E2E tests are sufficient for basic validation in the meanwhile.
4 years ago
p2p: implement new Transport interface (#5791) This implements a new `Transport` interface and related types for the P2P refactor in #5670. Previously, `conn.MConnection` was very tightly coupled to the `Peer` implementation -- in order to allow alternative non-multiplexed transports (e.g. QUIC), MConnection has now been moved below the `Transport` interface, as `MConnTransport`, and decoupled from the peer. Since the `p2p` package is not covered by our Go API stability, this is not considered a breaking change, and not listed in the changelog. The initial approach was to implement the new interface in its final form (which also involved possible protocol changes, see https://github.com/tendermint/spec/pull/227). However, it turned out that this would require a large amount of changes to existing P2P code because of the previous tight coupling between `Peer` and `MConnection` and the reliance on subtleties in the MConnection behavior. Instead, I have broadened the `Transport` interface to expose much of the existing MConnection interface, preserved much of the existing MConnection logic and behavior in the transport implementation, and tried to make as few changes to the rest of the P2P stack as possible. We will instead reduce this interface gradually as we refactor other parts of the P2P stack. The low-level transport code and protocol (e.g. MConnection, SecretConnection and so on) has not been significantly changed, and refactoring this is not a priority until we come up with a plan for QUIC adoption, as we may end up discarding the MConnection code entirely. There are no tests of the new `MConnTransport`, as this code is likely to evolve as we proceed with the P2P refactor, but tests should be added before a final release. The E2E tests are sufficient for basic validation in the meanwhile.
4 years ago
p2p: implement new Transport interface (#5791) This implements a new `Transport` interface and related types for the P2P refactor in #5670. Previously, `conn.MConnection` was very tightly coupled to the `Peer` implementation -- in order to allow alternative non-multiplexed transports (e.g. QUIC), MConnection has now been moved below the `Transport` interface, as `MConnTransport`, and decoupled from the peer. Since the `p2p` package is not covered by our Go API stability, this is not considered a breaking change, and not listed in the changelog. The initial approach was to implement the new interface in its final form (which also involved possible protocol changes, see https://github.com/tendermint/spec/pull/227). However, it turned out that this would require a large amount of changes to existing P2P code because of the previous tight coupling between `Peer` and `MConnection` and the reliance on subtleties in the MConnection behavior. Instead, I have broadened the `Transport` interface to expose much of the existing MConnection interface, preserved much of the existing MConnection logic and behavior in the transport implementation, and tried to make as few changes to the rest of the P2P stack as possible. We will instead reduce this interface gradually as we refactor other parts of the P2P stack. The low-level transport code and protocol (e.g. MConnection, SecretConnection and so on) has not been significantly changed, and refactoring this is not a priority until we come up with a plan for QUIC adoption, as we may end up discarding the MConnection code entirely. There are no tests of the new `MConnTransport`, as this code is likely to evolve as we proceed with the P2P refactor, but tests should be added before a final release. The E2E tests are sufficient for basic validation in the meanwhile.
4 years ago
p2p: implement new Transport interface (#5791) This implements a new `Transport` interface and related types for the P2P refactor in #5670. Previously, `conn.MConnection` was very tightly coupled to the `Peer` implementation -- in order to allow alternative non-multiplexed transports (e.g. QUIC), MConnection has now been moved below the `Transport` interface, as `MConnTransport`, and decoupled from the peer. Since the `p2p` package is not covered by our Go API stability, this is not considered a breaking change, and not listed in the changelog. The initial approach was to implement the new interface in its final form (which also involved possible protocol changes, see https://github.com/tendermint/spec/pull/227). However, it turned out that this would require a large amount of changes to existing P2P code because of the previous tight coupling between `Peer` and `MConnection` and the reliance on subtleties in the MConnection behavior. Instead, I have broadened the `Transport` interface to expose much of the existing MConnection interface, preserved much of the existing MConnection logic and behavior in the transport implementation, and tried to make as few changes to the rest of the P2P stack as possible. We will instead reduce this interface gradually as we refactor other parts of the P2P stack. The low-level transport code and protocol (e.g. MConnection, SecretConnection and so on) has not been significantly changed, and refactoring this is not a priority until we come up with a plan for QUIC adoption, as we may end up discarding the MConnection code entirely. There are no tests of the new `MConnTransport`, as this code is likely to evolve as we proceed with the P2P refactor, but tests should be added before a final release. The E2E tests are sufficient for basic validation in the meanwhile.
4 years ago
p2p: implement new Transport interface (#5791) This implements a new `Transport` interface and related types for the P2P refactor in #5670. Previously, `conn.MConnection` was very tightly coupled to the `Peer` implementation -- in order to allow alternative non-multiplexed transports (e.g. QUIC), MConnection has now been moved below the `Transport` interface, as `MConnTransport`, and decoupled from the peer. Since the `p2p` package is not covered by our Go API stability, this is not considered a breaking change, and not listed in the changelog. The initial approach was to implement the new interface in its final form (which also involved possible protocol changes, see https://github.com/tendermint/spec/pull/227). However, it turned out that this would require a large amount of changes to existing P2P code because of the previous tight coupling between `Peer` and `MConnection` and the reliance on subtleties in the MConnection behavior. Instead, I have broadened the `Transport` interface to expose much of the existing MConnection interface, preserved much of the existing MConnection logic and behavior in the transport implementation, and tried to make as few changes to the rest of the P2P stack as possible. We will instead reduce this interface gradually as we refactor other parts of the P2P stack. The low-level transport code and protocol (e.g. MConnection, SecretConnection and so on) has not been significantly changed, and refactoring this is not a priority until we come up with a plan for QUIC adoption, as we may end up discarding the MConnection code entirely. There are no tests of the new `MConnTransport`, as this code is likely to evolve as we proceed with the P2P refactor, but tests should be added before a final release. The E2E tests are sufficient for basic validation in the meanwhile.
4 years ago
p2p: implement new Transport interface (#5791) This implements a new `Transport` interface and related types for the P2P refactor in #5670. Previously, `conn.MConnection` was very tightly coupled to the `Peer` implementation -- in order to allow alternative non-multiplexed transports (e.g. QUIC), MConnection has now been moved below the `Transport` interface, as `MConnTransport`, and decoupled from the peer. Since the `p2p` package is not covered by our Go API stability, this is not considered a breaking change, and not listed in the changelog. The initial approach was to implement the new interface in its final form (which also involved possible protocol changes, see https://github.com/tendermint/spec/pull/227). However, it turned out that this would require a large amount of changes to existing P2P code because of the previous tight coupling between `Peer` and `MConnection` and the reliance on subtleties in the MConnection behavior. Instead, I have broadened the `Transport` interface to expose much of the existing MConnection interface, preserved much of the existing MConnection logic and behavior in the transport implementation, and tried to make as few changes to the rest of the P2P stack as possible. We will instead reduce this interface gradually as we refactor other parts of the P2P stack. The low-level transport code and protocol (e.g. MConnection, SecretConnection and so on) has not been significantly changed, and refactoring this is not a priority until we come up with a plan for QUIC adoption, as we may end up discarding the MConnection code entirely. There are no tests of the new `MConnTransport`, as this code is likely to evolve as we proceed with the P2P refactor, but tests should be added before a final release. The E2E tests are sufficient for basic validation in the meanwhile.
4 years ago
p2p: implement new Transport interface (#5791) This implements a new `Transport` interface and related types for the P2P refactor in #5670. Previously, `conn.MConnection` was very tightly coupled to the `Peer` implementation -- in order to allow alternative non-multiplexed transports (e.g. QUIC), MConnection has now been moved below the `Transport` interface, as `MConnTransport`, and decoupled from the peer. Since the `p2p` package is not covered by our Go API stability, this is not considered a breaking change, and not listed in the changelog. The initial approach was to implement the new interface in its final form (which also involved possible protocol changes, see https://github.com/tendermint/spec/pull/227). However, it turned out that this would require a large amount of changes to existing P2P code because of the previous tight coupling between `Peer` and `MConnection` and the reliance on subtleties in the MConnection behavior. Instead, I have broadened the `Transport` interface to expose much of the existing MConnection interface, preserved much of the existing MConnection logic and behavior in the transport implementation, and tried to make as few changes to the rest of the P2P stack as possible. We will instead reduce this interface gradually as we refactor other parts of the P2P stack. The low-level transport code and protocol (e.g. MConnection, SecretConnection and so on) has not been significantly changed, and refactoring this is not a priority until we come up with a plan for QUIC adoption, as we may end up discarding the MConnection code entirely. There are no tests of the new `MConnTransport`, as this code is likely to evolve as we proceed with the P2P refactor, but tests should be added before a final release. The E2E tests are sufficient for basic validation in the meanwhile.
4 years ago
p2p: file descriptor leaks (#3150) * close peer's connection to avoid fd leak Fixes #2967 * rename peer#Addr to RemoteAddr * fix test * fixes after Ethan's review * bring back the check * changelog entry * write a test for switch#acceptRoutine * increase timeouts? :( * remove extra assertNPeersWithTimeout * simplify test * assert number of peers (just to be safe) * Cleanup in OnStop * run tests with verbose flag on CircleCI * spawn a reading routine to prevent connection from closing * get port from the listener random port is faster, but often results in ``` panic: listen tcp 127.0.0.1:44068: bind: address already in use [recovered] panic: listen tcp 127.0.0.1:44068: bind: address already in use goroutine 79 [running]: testing.tRunner.func1(0xc0001bd600) /usr/local/go/src/testing/testing.go:792 +0x387 panic(0x974d20, 0xc0001b0500) /usr/local/go/src/runtime/panic.go:513 +0x1b9 github.com/tendermint/tendermint/p2p.MakeSwitch(0xc0000f42a0, 0x0, 0x9fb9cc, 0x9, 0x9fc346, 0xb, 0xb42128, 0x0, 0x0, 0x0, ...) /home/vagrant/go/src/github.com/tendermint/tendermint/p2p/test_util.go:182 +0xa28 github.com/tendermint/tendermint/p2p.MakeConnectedSwitches(0xc0000f42a0, 0x2, 0xb42128, 0xb41eb8, 0x4f1205, 0xc0001bed80, 0x4f16ed) /home/vagrant/go/src/github.com/tendermint/tendermint/p2p/test_util.go:75 +0xf9 github.com/tendermint/tendermint/p2p.MakeSwitchPair(0xbb8d20, 0xc0001bd600, 0xb42128, 0x2f7, 0x4f16c0) /home/vagrant/go/src/github.com/tendermint/tendermint/p2p/switch_test.go:94 +0x4c github.com/tendermint/tendermint/p2p.TestSwitches(0xc0001bd600) /home/vagrant/go/src/github.com/tendermint/tendermint/p2p/switch_test.go:117 +0x58 testing.tRunner(0xc0001bd600, 0xb42038) /usr/local/go/src/testing/testing.go:827 +0xbf created by testing.(*T).Run /usr/local/go/src/testing/testing.go:878 +0x353 exit status 2 FAIL github.com/tendermint/tendermint/p2p 0.350s ```
6 years ago
package p2p

import (
	"context"
	"errors"
	"fmt"
	"io"
	"math"
	"math/rand"
	"net"
	"net/url"
	"runtime/debug"
	"sort"
	"strconv"
	"strings"
	"sync"
	"time"

	"github.com/gogo/protobuf/proto"
	"github.com/google/orderedcode"
	dbm "github.com/tendermint/tm-db"

	"github.com/tendermint/tendermint/libs/cmap"
	"github.com/tendermint/tendermint/libs/log"
	"github.com/tendermint/tendermint/libs/service"
	tmconn "github.com/tendermint/tendermint/p2p/conn"
	p2pproto "github.com/tendermint/tendermint/proto/tendermint/p2p"
)
// PeerAddress is a peer address URL. It differs from Endpoint in that the
// address hostname may be expanded into multiple IP addresses (thus multiple
// endpoints).
//
// If the URL is opaque, i.e. of the form "scheme:<opaque>", then the opaque
// part has to contain either the node ID or a node ID and path in the form
// "scheme:<nodeid>@<path>".
type PeerAddress struct {
	ID       NodeID
	Protocol Protocol
	Hostname string
	Port     uint16
	Path     string
}

// ParsePeerAddress parses a peer address URL into a PeerAddress,
// normalizing and validating it.
func ParsePeerAddress(urlString string) (PeerAddress, error) {
	url, err := url.Parse(urlString)
	if err != nil || url == nil {
		return PeerAddress{}, fmt.Errorf("invalid peer address %q: %w", urlString, err)
	}
	address := PeerAddress{}

	// If the URL is opaque, i.e. in the form "scheme:<opaque>", we expect the
	// opaque part to be either a node ID or a node ID and path in the form
	// "scheme:<nodeid>@<path>".
	if url.Opaque != "" {
		parts := strings.Split(url.Opaque, "@")
		if len(parts) > 2 {
			return PeerAddress{}, fmt.Errorf("invalid address format %q, unexpected @", urlString)
		}
		address.ID, err = NewNodeID(parts[0])
		if err != nil {
			return PeerAddress{}, fmt.Errorf("invalid peer ID %q: %w", parts[0], err)
		}
		if len(parts) == 2 {
			address.Path = parts[1]
		}
		return address, nil
	}

	// Otherwise, just parse a normal networked URL.
	address.ID, err = NewNodeID(url.User.Username())
	if err != nil {
		return PeerAddress{}, fmt.Errorf("invalid peer ID %q: %w", url.User.Username(), err)
	}
	if url.Scheme != "" {
		address.Protocol = Protocol(strings.ToLower(url.Scheme))
	} else {
		address.Protocol = defaultProtocol
	}
	address.Hostname = strings.ToLower(url.Hostname())
	if portString := url.Port(); portString != "" {
		port64, err := strconv.ParseUint(portString, 10, 16)
		if err != nil {
			return PeerAddress{}, fmt.Errorf("invalid port %q: %w", portString, err)
		}
		address.Port = uint16(port64)
	}

	// NOTE: URL paths are case-sensitive, so we don't lowercase them.
	address.Path = url.Path
	if url.RawPath != "" {
		address.Path = url.RawPath
	}
	if url.RawQuery != "" {
		address.Path += "?" + url.RawQuery
	}
	if url.RawFragment != "" {
		address.Path += "#" + url.RawFragment
	}
	if address.Path != "" && address.Path[0] != '/' && address.Path[0] != '#' {
		address.Path = "/" + address.Path
	}

	return address, address.Validate()
}
// Resolve resolves a PeerAddress into a set of Endpoints, by expanding
// out a DNS hostname to IP addresses.
func (a PeerAddress) Resolve(ctx context.Context) ([]Endpoint, error) {
	// If there is no hostname, this is an opaque URL in the form
	// "scheme:<opaque>".
	if a.Hostname == "" {
		return []Endpoint{{
			PeerID:   a.ID,
			Protocol: a.Protocol,
			Path:     a.Path,
		}}, nil
	}

	ips, err := net.DefaultResolver.LookupIP(ctx, "ip", a.Hostname)
	if err != nil {
		return nil, err
	}
	endpoints := make([]Endpoint, len(ips))
	for i, ip := range ips {
		endpoints[i] = Endpoint{
			PeerID:   a.ID,
			Protocol: a.Protocol,
			IP:       ip,
			Port:     a.Port,
			Path:     a.Path,
		}
	}
	return endpoints, nil
}
// Validate validates a PeerAddress.
func (a PeerAddress) Validate() error {
	if a.Protocol == "" {
		return errors.New("no protocol")
	}
	if a.ID == "" {
		return errors.New("no peer ID")
	} else if err := a.ID.Validate(); err != nil {
		return fmt.Errorf("invalid peer ID: %w", err)
	}
	if a.Port > 0 && a.Hostname == "" {
		return errors.New("cannot specify port without hostname")
	}
	return nil
}
// String formats the address as a URL string.
func (a PeerAddress) String() string {
	// Handle opaque URLs.
	if a.Hostname == "" {
		s := fmt.Sprintf("%s:%s", a.Protocol, a.ID)
		if a.Path != "" {
			s += "@" + a.Path
		}
		return s
	}

	s := fmt.Sprintf("%s://%s@%s", a.Protocol, a.ID, a.Hostname)
	if a.Port > 0 {
		s += ":" + strconv.Itoa(int(a.Port))
	}
	s += a.Path // We've already normalized the path with the appropriate prefix in ParsePeerAddress()
	return s
}
// PeerStatus specifies peer statuses.
type PeerStatus string

const (
	PeerStatusNew     = PeerStatus("new")     // New peer which we haven't tried to contact yet.
	PeerStatusUp      = PeerStatus("up")      // Peer which we have an active connection to.
	PeerStatusDown    = PeerStatus("down")    // Peer which we're temporarily disconnected from.
	PeerStatusRemoved = PeerStatus("removed") // Peer which has been removed.
	PeerStatusBanned  = PeerStatus("banned")  // Peer which is banned for misbehavior.
)

// PeerError is a peer error reported by a reactor via the Error channel. The
// severity may cause the peer to be disconnected or banned depending on policy.
type PeerError struct {
	PeerID   NodeID
	Err      error
	Severity PeerErrorSeverity
}

// PeerErrorSeverity determines the severity of a peer error.
type PeerErrorSeverity string

const (
	PeerErrorSeverityLow      PeerErrorSeverity = "low"      // Mostly ignored.
	PeerErrorSeverityHigh     PeerErrorSeverity = "high"     // May disconnect.
	PeerErrorSeverityCritical PeerErrorSeverity = "critical" // Ban.
)
// PeerUpdatesCh defines a wrapper around a PeerUpdate go channel that allows
// a reactor to listen for peer updates and safely close it when stopping.
type PeerUpdatesCh struct {
	closeOnce sync.Once

	// updatesCh defines the go channel in which the router sends peer updates to
	// reactors. Each reactor will have its own PeerUpdatesCh to listen for updates
	// from.
	updatesCh chan PeerUpdate

	// doneCh is used to signal that a PeerUpdatesCh is closed. It is the
	// reactor's responsibility to invoke Close.
	doneCh chan struct{}
}

// NewPeerUpdates returns a reference to a new PeerUpdatesCh.
func NewPeerUpdates(updatesCh chan PeerUpdate) *PeerUpdatesCh {
	return &PeerUpdatesCh{
		updatesCh: updatesCh,
		doneCh:    make(chan struct{}),
	}
}

// Updates returns a read-only go channel where a consuming reactor can listen
// for peer updates sent from the router.
func (puc *PeerUpdatesCh) Updates() <-chan PeerUpdate {
	return puc.updatesCh
}
// Close closes the PeerUpdatesCh. It should only be closed by the respective
// reactor when stopping, after ensuring nothing is listening for updates.
//
// NOTE: After a PeerUpdatesCh is closed, the router may safely assume it can no
// longer send on the internal updatesCh, however it should NEVER explicitly close
// it as that could result in panics by sending on a closed channel.
func (puc *PeerUpdatesCh) Close() {
	puc.closeOnce.Do(func() {
		close(puc.doneCh)
	})
}

// Done returns a read-only version of the PeerUpdatesCh's internal doneCh go
// channel, which a router should use to detect when it is safe to stop
// sending peer updates.
func (puc *PeerUpdatesCh) Done() <-chan struct{} {
	return puc.doneCh
}
// PeerUpdate is a peer status update for reactors.
type PeerUpdate struct {
	PeerID NodeID
	Status PeerStatus
}

// PeerScore is a numeric score assigned to a peer (higher is better).
type PeerScore uint16

const (
	// PeerScorePersistent is added for persistent peers.
	PeerScorePersistent PeerScore = 100
)
// PeerManager manages peer lifecycle information, using a peerStore for
// underlying storage. Its primary purpose is to determine which peers to
// connect to next, make sure a peer only has a single active connection (either
// inbound or outbound), and evict peers to make room for higher-scored peers.
// It does not manage actual connections (this is handled by the Router),
// only the peer lifecycle state.
//
// We track dialing and connected states independently. This allows us to accept
// an inbound connection from a peer while the router is also dialing an
// outbound connection to that same peer, which will cause the dialer to
// eventually error when attempting to mark the peer as connected. This also
// avoids race conditions where multiple goroutines may end up dialing a peer if
// an incoming connection was briefly accepted and disconnected while we were
// also dialing.
//
// For an outbound connection, the flow is as follows:
// - DialNext: returns a peer address to dial, marking the peer as dialing.
// - DialFailed: reports a dial failure, unmarking the peer as dialing.
// - Dialed: successfully dialed, unmarking as dialing and marking as connected
//   (or erroring if already connected).
// - Ready: routing is up, broadcasts a PeerStatusUp peer update to subscribers.
// - Disconnected: peer disconnects, unmarking as connected and broadcasting a
//   PeerStatusDown peer update.
//
// For an inbound connection, the flow is as follows:
// - Accepted: successfully accepted connection, marking as connected (or erroring
//   if already connected).
// - Ready: routing is up, broadcasts a PeerStatusUp peer update to subscribers.
// - Disconnected: peer disconnects, unmarking as connected and broadcasting a
//   PeerStatusDown peer update.
//
// When evicting peers, either because peers are explicitly scheduled for
// eviction or we are connected to too many peers, the flow is as follows:
// - EvictNext: if marked evict and connected, unmark evict and mark evicting.
//   If beyond MaxConnected, pick the lowest-scored peer and mark evicting.
// - Disconnected: unmark connected, evicting, evict, and broadcast a
//   PeerStatusDown peer update.
//
// If all connection slots are full (at MaxConnected), we can use up to
// MaxConnectedUpgrade additional connections to probe any higher-scored
// unconnected peers, and if we reach them (or they reach us) we allow the
// connection and evict a lower-scored peer. We mark the lower-scored peer as
// upgrading[from]=to to make sure no other higher-scored peers can claim the
// same one for an upgrade. The flow is as follows:
// - Accepted: if upgrade is possible, mark connected and add lower-scored to evict.
// - DialNext: if upgrade is possible, mark upgrading[from]=to and dialing.
// - DialFailed: unmark upgrading[from]=to and dialing.
// - Dialed: unmark upgrading[from]=to and dialing, mark as connected, add
//   lower-scored to evict.
// - EvictNext: pick peer from evict, mark as evicting.
// - Disconnected: unmark connected, upgrading[from]=to, evict, evicting.
type PeerManager struct {
	options     PeerManagerOptions
	wakeDialCh  chan struct{} // wakes up DialNext() on relevant peer changes
	wakeEvictCh chan struct{} // wakes up EvictNext() on relevant peer changes
	closeCh     chan struct{} // signal channel for Close()
	closeOnce   sync.Once

	mtx           sync.Mutex
	store         *peerStore
	dialing       map[NodeID]bool                   // peers being dialed (DialNext -> Dialed/DialFail)
	upgrading     map[NodeID]NodeID                 // peers claimed for upgrade (DialNext -> Dialed/DialFail)
	connected     map[NodeID]bool                   // connected peers (Dialed/Accepted -> Disconnected)
	evict         map[NodeID]bool                   // peers scheduled for eviction (Connected -> EvictNext)
	evicting      map[NodeID]bool                   // peers being evicted (EvictNext -> Disconnected)
	subscriptions map[*PeerUpdatesCh]*PeerUpdatesCh // keyed by struct identity (address)
}
// PeerManagerOptions specifies options for a PeerManager.
type PeerManagerOptions struct {
	// PersistentPeers are peers that we want to maintain persistent connections
	// to. These will be scored higher than other peers, and if
	// MaxConnectedUpgrade is non-zero any lower-scored peers will be evicted if
	// necessary to make room for these.
	PersistentPeers []NodeID

	// MaxPeers is the maximum number of peers to track information about, i.e.
	// store in the peer store. When exceeded, the lowest-scored unconnected peers
	// will be deleted. 0 means no limit.
	MaxPeers uint16

	// MaxConnected is the maximum number of connected peers (inbound and
	// outbound). 0 means no limit.
	MaxConnected uint16

	// MaxConnectedUpgrade is the maximum number of additional connections to
	// use for probing any better-scored peers to upgrade to when all connection
	// slots are full. 0 disables peer upgrading.
	//
	// For example, if we are already connected to MaxConnected peers, but we
	// know or learn about better-scored peers (e.g. configured persistent
	// peers) that we are not connected to, then we can probe these peers by
	// using up to MaxConnectedUpgrade connections, and once connected evict the
	// lowest-scored connected peers. This also works for inbound connections,
	// i.e. if a higher-scored peer attempts to connect to us, we can accept
	// the connection and evict a lower-scored peer.
	MaxConnectedUpgrade uint16

	// MinRetryTime is the minimum time to wait between retries. Retry times
	// double for each retry, up to MaxRetryTime. 0 disables retries.
	MinRetryTime time.Duration

	// MaxRetryTime is the maximum time to wait between retries. 0 means
	// no maximum, in which case the retry time will keep doubling.
	MaxRetryTime time.Duration

	// MaxRetryTimePersistent is the maximum time to wait between retries for
	// peers listed in PersistentPeers. 0 uses MaxRetryTime instead.
	MaxRetryTimePersistent time.Duration

	// RetryTimeJitter is the upper bound of a random interval added to
	// retry times, to avoid thundering herds. 0 disables jitter.
	RetryTimeJitter time.Duration
}
// NewPeerManager creates a new peer manager.
func NewPeerManager(peerDB dbm.DB, options PeerManagerOptions) (*PeerManager, error) {
	store, err := newPeerStore(peerDB)
	if err != nil {
		return nil, err
	}
	peerManager := &PeerManager{
		options: options,
		closeCh: make(chan struct{}),

		// We use a buffer of size 1 for these trigger channels, with
		// non-blocking sends. This ensures that if e.g. wakeDial() is called
		// multiple times before the initial trigger is picked up we only
		// process the trigger once.
		//
		// FIXME: This should maybe be a libs/sync type.
		wakeDialCh:  make(chan struct{}, 1),
		wakeEvictCh: make(chan struct{}, 1),

		store:         store,
		dialing:       map[NodeID]bool{},
		upgrading:     map[NodeID]NodeID{},
		connected:     map[NodeID]bool{},
		evict:         map[NodeID]bool{},
		evicting:      map[NodeID]bool{},
		subscriptions: map[*PeerUpdatesCh]*PeerUpdatesCh{},
	}
	if err = peerManager.configurePeers(); err != nil {
		return nil, err
	}
	if err = peerManager.prunePeers(); err != nil {
		return nil, err
	}
	return peerManager, nil
}
// configurePeers configures peers in the peer store with ephemeral runtime
// configuration, e.g. setting peerInfo.Persistent based on
// PeerManagerOptions.PersistentPeers. The caller must hold the mutex lock.
func (m *PeerManager) configurePeers() error {
	for _, peerID := range m.options.PersistentPeers {
		if peer, ok := m.store.Get(peerID); ok {
			peer.Persistent = true
			if err := m.store.Set(peer); err != nil {
				return err
			}
		}
	}
	return nil
}
// prunePeers removes peers from the peer store if it contains more than
// MaxPeers peers. The lowest-scored non-connected peers are removed.
// The caller must hold the mutex lock.
func (m *PeerManager) prunePeers() error {
	if m.options.MaxPeers == 0 || m.store.Size() <= int(m.options.MaxPeers) {
		return nil
	}
	ranked := m.store.Ranked()
	for i := len(ranked) - 1; i >= 0; i-- {
		peerID := ranked[i].ID
		switch {
		case m.store.Size() <= int(m.options.MaxPeers):
			return nil
		case m.dialing[peerID]:
		case m.connected[peerID]:
		case m.evicting[peerID]:
		default:
			if err := m.store.Delete(peerID); err != nil {
				return err
			}
		}
	}
	return nil
}
// Close closes the peer manager, releasing resources allocated with it
// (specifically any running goroutines).
func (m *PeerManager) Close() {
	m.closeOnce.Do(func() {
		close(m.closeCh)
	})
}
// Add adds a peer to the manager, given as an address. If the peer already
// exists, the address is added to it.
func (m *PeerManager) Add(address PeerAddress) error {
	if err := address.Validate(); err != nil {
		return err
	}
	m.mtx.Lock()
	defer m.mtx.Unlock()

	peer, ok := m.store.Get(address.ID)
	if !ok {
		peer = m.makePeerInfo(address.ID)
	}
	if _, ok := peer.AddressInfo[address.String()]; !ok {
		peer.AddressInfo[address.String()] = &peerAddressInfo{Address: address}
	}
	if err := m.store.Set(peer); err != nil {
		return err
	}
	if err := m.prunePeers(); err != nil {
		return err
	}
	m.wakeDial()
	return nil
}
// Advertise returns a list of peer addresses to advertise to a peer.
//
// FIXME: This is fairly naïve and only returns the addresses of the
// highest-ranked peers.
func (m *PeerManager) Advertise(peerID NodeID, limit uint16) []PeerAddress {
	m.mtx.Lock()
	defer m.mtx.Unlock()

	addresses := make([]PeerAddress, 0, limit)
	for _, peer := range m.store.Ranked() {
		if peer.ID == peerID {
			continue
		}
		for _, addressInfo := range peer.AddressInfo {
			if len(addresses) >= int(limit) {
				return addresses
			}
			addresses = append(addresses, addressInfo.Address)
		}
	}
	return addresses
}
// makePeerInfo creates a peerInfo for a new peer.
func (m *PeerManager) makePeerInfo(id NodeID) peerInfo {
	isPersistent := false
	for _, p := range m.options.PersistentPeers {
		if id == p {
			isPersistent = true
			break
		}
	}
	return peerInfo{
		ID:          id,
		Persistent:  isPersistent,
		AddressInfo: map[string]*peerAddressInfo{},
	}
}
// Subscribe subscribes to peer updates. The caller must consume the peer
// updates in a timely fashion and close the subscription when done, since
// delivery is guaranteed and will otherwise block peer
// connection/disconnection.
func (m *PeerManager) Subscribe() *PeerUpdatesCh {
	// FIXME: We may want to use a size 1 buffer here. When the router
	// broadcasts a peer update it has to loop over all of the
	// subscriptions, and we want to avoid blocking and waiting for a
	// context switch before continuing to the next subscription. This also
	// prevents tail latencies from compounding across updates. We also want
	// to make sure the subscribers are reasonably in sync, so it should be
	// kept at 1. However, this should be benchmarked first.
	peerUpdates := NewPeerUpdates(make(chan PeerUpdate))
	m.mtx.Lock()
	m.subscriptions[peerUpdates] = peerUpdates
	m.mtx.Unlock()

	go func() {
		<-peerUpdates.Done()
		m.mtx.Lock()
		delete(m.subscriptions, peerUpdates)
		m.mtx.Unlock()
	}()
	return peerUpdates
}
// broadcast broadcasts a peer update to all subscriptions. The caller must
// already hold the mutex lock. The mutex is held for the duration of the
// broadcast, which ensures that all subscriptions receive all updates in
// the same order.
//
// FIXME: Consider using more fine-grained mutexes here, and/or a channel to
// enforce ordering of updates.
func (m *PeerManager) broadcast(peerUpdate PeerUpdate) {
	for _, sub := range m.subscriptions {
		select {
		case sub.updatesCh <- peerUpdate:
		case <-sub.doneCh:
		}
	}
}
// DialNext finds an appropriate peer address to dial, and marks it as dialing.
// If no peer is found, or all connection slots are full, it blocks until one
// becomes available. The caller must call Dialed() or DialFailed() for the
// returned peer. The context can be used to cancel the call.
func (m *PeerManager) DialNext(ctx context.Context) (NodeID, PeerAddress, error) {
	for {
		id, address, err := m.TryDialNext()
		if err != nil || id != "" {
			return id, address, err
		}
		select {
		case <-m.wakeDialCh:
		case <-ctx.Done():
			return "", PeerAddress{}, ctx.Err()
		}
	}
}
// TryDialNext is equivalent to DialNext(), but immediately returns an empty
// peer ID if no peers or connection slots are available.
func (m *PeerManager) TryDialNext() (NodeID, PeerAddress, error) {
	m.mtx.Lock()
	defer m.mtx.Unlock()

	// We allow dialing MaxConnected+MaxConnectedUpgrade peers. Including
	// MaxConnectedUpgrade allows us to probe additional peers that have a
	// higher score than any existing peers, and if successful evict a
	// lower-scored peer.
	if m.options.MaxConnected > 0 &&
		len(m.connected)+len(m.dialing) >= int(m.options.MaxConnected)+int(m.options.MaxConnectedUpgrade) {
		return "", PeerAddress{}, nil
	}

	for _, peer := range m.store.Ranked() {
		if m.dialing[peer.ID] || m.connected[peer.ID] {
			continue
		}
		for _, addressInfo := range peer.AddressInfo {
			if time.Since(addressInfo.LastDialFailure) < m.retryDelay(addressInfo.DialFailures, peer.Persistent) {
				continue
			}

			// We now have an eligible address to dial. If we're full but have
			// upgrade capacity (as checked above), we find a lower-scored peer
			// we can replace and mark it as upgrading so no one else claims it.
			//
			// If we don't find one, there is no point in trying additional
			// peers, since they will all have the same or lower score than this
			// peer (since they're ordered by score via peerStore.Ranked).
			if m.options.MaxConnected > 0 && len(m.connected) >= int(m.options.MaxConnected) {
				upgradeFromPeer := m.findUpgradeCandidate(peer.ID, peer.Score())
				if upgradeFromPeer == "" {
					return "", PeerAddress{}, nil
				}
				m.upgrading[upgradeFromPeer] = peer.ID
			}
			m.dialing[peer.ID] = true
			return peer.ID, addressInfo.Address, nil
		}
	}
	return "", PeerAddress{}, nil
}
// wakeDial is used to notify DialNext about changes that *may* cause new
// peers to become eligible for dialing, such as peer disconnections and
// retry timeouts.
func (m *PeerManager) wakeDial() {
	// The channel has a 1-size buffer. A non-blocking send ensures
	// we only queue up at most 1 trigger between each DialNext().
	select {
	case m.wakeDialCh <- struct{}{}:
	default:
	}
}

// wakeEvict is used to notify EvictNext about changes that *may* cause
// peers to become eligible for eviction, such as peer upgrades.
func (m *PeerManager) wakeEvict() {
	// The channel has a 1-size buffer. A non-blocking send ensures
	// we only queue up at most 1 trigger between each EvictNext().
	select {
	case m.wakeEvictCh <- struct{}{}:
	default:
	}
}
// retryDelay calculates a dial retry delay using exponential backoff, based on
// retry settings in PeerManagerOptions. If MinRetryTime is 0, this returns
// MaxInt64 (i.e. an infinite retry delay, effectively disabling retries).
func (m *PeerManager) retryDelay(failures uint32, persistent bool) time.Duration {
	if failures == 0 {
		return 0
	}
	if m.options.MinRetryTime == 0 {
		return time.Duration(math.MaxInt64)
	}
	maxDelay := m.options.MaxRetryTime
	if persistent && m.options.MaxRetryTimePersistent > 0 {
		maxDelay = m.options.MaxRetryTimePersistent
	}

	delay := m.options.MinRetryTime * time.Duration(math.Pow(2, float64(failures)))
	if maxDelay > 0 && delay > maxDelay {
		delay = maxDelay
	}

	// rand.Int63n panics for n <= 0, so only apply jitter when it is enabled.
	// FIXME: This should use a PeerManager-scoped RNG.
	if m.options.RetryTimeJitter > 0 {
		delay += time.Duration(rand.Int63n(int64(m.options.RetryTimeJitter))) // nolint:gosec
	}
	return delay
}
// DialFailed reports a failed dial attempt. This will make the peer available
// for dialing again when appropriate.
//
// FIXME: This should probably delete or mark bad addresses/peers after some time.
func (m *PeerManager) DialFailed(peerID NodeID, address PeerAddress) error {
	m.mtx.Lock()
	defer m.mtx.Unlock()

	delete(m.dialing, peerID)
	for from, to := range m.upgrading {
		if to == peerID {
			delete(m.upgrading, from) // Unmark failed upgrade attempt.
		}
	}

	peer, ok := m.store.Get(peerID)
	if !ok { // Peer may have been removed while dialing, ignore.
		return nil
	}
	addressInfo, ok := peer.AddressInfo[address.String()]
	if !ok {
		return nil // Assume the address has been removed, ignore.
	}
	addressInfo.LastDialFailure = time.Now().UTC()
	addressInfo.DialFailures++
	if err := m.store.Set(peer); err != nil {
		return err
	}

	// We spawn a goroutine that notifies DialNext() again when the retry
	// timeout has elapsed, so that we can consider dialing it again.
	go func() {
		retryDelay := m.retryDelay(addressInfo.DialFailures, peer.Persistent)
		if retryDelay == time.Duration(math.MaxInt64) {
			return
		}
		// Use an explicit timer with deferred cleanup instead of
		// time.After(), to avoid leaking goroutines on PeerManager.Close().
		timer := time.NewTimer(retryDelay)
		defer timer.Stop()
		select {
		case <-timer.C:
			m.wakeDial()
		case <-m.closeCh:
		}
	}()

	m.wakeDial()
	return nil
}

// Dialed marks a peer as successfully dialed. Any further incoming connections
// will be rejected, and once disconnected the peer may be dialed again.
func (m *PeerManager) Dialed(peerID NodeID, address PeerAddress) error {
	m.mtx.Lock()
	defer m.mtx.Unlock()

	delete(m.dialing, peerID)

	var upgradeFromPeer NodeID
	for from, to := range m.upgrading {
		if to == peerID {
			delete(m.upgrading, from)
			upgradeFromPeer = from
			// Don't break, just in case this peer was marked as upgrading for
			// multiple lower-scored peers (shouldn't really happen).
		}
	}

	if m.connected[peerID] {
		return fmt.Errorf("peer %v is already connected", peerID)
	}
	if m.options.MaxConnected > 0 &&
		len(m.connected) >= int(m.options.MaxConnected)+int(m.options.MaxConnectedUpgrade) {
		return fmt.Errorf("already connected to maximum number of peers")
	}

	peer, ok := m.store.Get(peerID)
	if !ok {
		return fmt.Errorf("peer %q was removed while dialing", peerID)
	}
	now := time.Now().UTC()
	peer.LastConnected = now
	if addressInfo, ok := peer.AddressInfo[address.String()]; ok {
		addressInfo.DialFailures = 0
		addressInfo.LastDialSuccess = now
		// If not found, assume address has been removed.
	}
	if err := m.store.Set(peer); err != nil {
		return err
	}

	if upgradeFromPeer != "" && m.options.MaxConnected > 0 &&
		len(m.connected) >= int(m.options.MaxConnected) {
		// Look for an even lower-scored peer that may have appeared
		// since we started the upgrade.
		if p, ok := m.store.Get(upgradeFromPeer); ok {
			if u := m.findUpgradeCandidate(p.ID, p.Score()); u != "" {
				upgradeFromPeer = u
			}
		}
		m.evict[upgradeFromPeer] = true
	}

	m.connected[peerID] = true
	m.wakeEvict()
	return nil
}

// Accepted marks an incoming peer connection successfully accepted. If the peer
// is already connected or we don't allow additional connections then this will
// return an error.
//
// If full but MaxConnectedUpgrade is non-zero and the incoming peer is
// better-scored than any existing peers, then we accept it and evict a
// lower-scored peer.
//
// NOTE: We can't take an address here, since e.g. TCP uses a different port
// number for outbound traffic than inbound traffic, so the peer's endpoint
// wouldn't necessarily be an appropriate address to dial.
//
// FIXME: When we accept a connection from a peer, we should register that
// peer's address in the peer store so that we can dial it later. In order to do
// that, we'll need to get the remote address after all, but as noted above that
// can't be the remote endpoint since that will usually have the wrong port
// number.
func (m *PeerManager) Accepted(peerID NodeID) error {
	m.mtx.Lock()
	defer m.mtx.Unlock()

	if m.connected[peerID] {
		return fmt.Errorf("peer %q is already connected", peerID)
	}
	if m.options.MaxConnected > 0 &&
		len(m.connected) >= int(m.options.MaxConnected)+int(m.options.MaxConnectedUpgrade) {
		return fmt.Errorf("already connected to maximum number of peers")
	}

	peer, ok := m.store.Get(peerID)
	if !ok {
		peer = m.makePeerInfo(peerID)
	}

	// If all connection slots are full, but we allow upgrades (and we checked
	// above that we have upgrade capacity), then we can look for a lower-scored
	// peer to replace and if found accept the connection anyway and evict it.
	var upgradeFromPeer NodeID
	if m.options.MaxConnected > 0 && len(m.connected) >= int(m.options.MaxConnected) {
		upgradeFromPeer = m.findUpgradeCandidate(peer.ID, peer.Score())
		if upgradeFromPeer == "" {
			return fmt.Errorf("already connected to maximum number of peers")
		}
	}

	peer.LastConnected = time.Now().UTC()
	if err := m.store.Set(peer); err != nil {
		return err
	}

	m.connected[peerID] = true
	if upgradeFromPeer != "" {
		m.evict[upgradeFromPeer] = true
	}
	m.wakeEvict()
	return nil
}

// Ready marks a peer as ready, broadcasting status updates to subscribers. The
// peer must already be marked as connected. This is separate from Dialed() and
// Accepted() to allow the router to set up its internal queues before reactors
// start sending messages.
func (m *PeerManager) Ready(peerID NodeID) {
	m.mtx.Lock()
	defer m.mtx.Unlock()

	if m.connected[peerID] {
		m.broadcast(PeerUpdate{
			PeerID: peerID,
			Status: PeerStatusUp,
		})
	}
}

// Disconnected unmarks a peer as connected, allowing new connections to be
// established.
func (m *PeerManager) Disconnected(peerID NodeID) error {
	m.mtx.Lock()
	defer m.mtx.Unlock()

	delete(m.connected, peerID)
	delete(m.upgrading, peerID)
	delete(m.evict, peerID)
	delete(m.evicting, peerID)
	m.broadcast(PeerUpdate{
		PeerID: peerID,
		Status: PeerStatusDown,
	})
	m.wakeDial()
	return nil
}

// EvictNext returns the next peer to evict (i.e. disconnect). If no evictable
// peers are found, the call will block until one becomes available or the
// context is cancelled.
func (m *PeerManager) EvictNext(ctx context.Context) (NodeID, error) {
	for {
		id, err := m.TryEvictNext()
		if err != nil || id != "" {
			return id, err
		}
		select {
		case <-m.wakeEvictCh:
		case <-ctx.Done():
			return "", ctx.Err()
		}
	}
}
// TryEvictNext is equivalent to EvictNext, but immediately returns an empty
// node ID if no evictable peers are found.
func (m *PeerManager) TryEvictNext() (NodeID, error) {
	m.mtx.Lock()
	defer m.mtx.Unlock()

	// If any connected peers are explicitly scheduled for eviction, we return a
	// random one.
	for peerID := range m.evict {
		delete(m.evict, peerID)
		if m.connected[peerID] && !m.evicting[peerID] {
			m.evicting[peerID] = true
			return peerID, nil
		}
	}

	// If we're below capacity, we don't need to evict anything.
	if m.options.MaxConnected == 0 ||
		len(m.connected)-len(m.evicting) <= int(m.options.MaxConnected) {
		return "", nil
	}

	// If we're above capacity, just pick the lowest-ranked peer to evict.
	ranked := m.store.Ranked()
	for i := len(ranked) - 1; i >= 0; i-- {
		peer := ranked[i]
		if m.connected[peer.ID] && !m.evicting[peer.ID] {
			m.evicting[peer.ID] = true
			return peer.ID, nil
		}
	}

	return "", nil
}
// findUpgradeCandidate looks for a lower-scored peer that we could evict
// to make room for the given peer. Returns an empty ID if none is found.
// The caller must hold the mutex lock.
func (m *PeerManager) findUpgradeCandidate(id NodeID, score PeerScore) NodeID {
	ranked := m.store.Ranked()
	for i := len(ranked) - 1; i >= 0; i-- {
		candidate := ranked[i]
		switch {
		case candidate.Score() >= score:
			return "" // no further peers can be scored lower, due to sorting
		case !m.connected[candidate.ID]:
		case m.evict[candidate.ID]:
		case m.evicting[candidate.ID]:
		case m.upgrading[candidate.ID] != "":
		default:
			return candidate.ID
		}
	}
	return ""
}
// GetHeight returns a peer's height, as reported via SetHeight. If the peer
// or height is unknown, this returns 0.
//
// FIXME: This is a temporary workaround for the peer state stored via the
// legacy Peer.Set() and Peer.Get() APIs, used to share height state between the
// consensus and mempool reactors. These dependencies should be removed from the
// reactors, and instead query this information independently via new P2P
// protocol additions.
func (m *PeerManager) GetHeight(peerID NodeID) int64 {
	m.mtx.Lock()
	defer m.mtx.Unlock()

	peer, _ := m.store.Get(peerID)
	return peer.Height
}

// SetHeight stores a peer's height, making it available via GetHeight. If the
// peer is unknown, it is created.
//
// FIXME: This is a temporary workaround for the peer state stored via the
// legacy Peer.Set() and Peer.Get() APIs, used to share height state between the
// consensus and mempool reactors. These dependencies should be removed from the
// reactors, and instead query this information independently via new P2P
// protocol additions.
func (m *PeerManager) SetHeight(peerID NodeID, height int64) error {
	m.mtx.Lock()
	defer m.mtx.Unlock()

	peer, ok := m.store.Get(peerID)
	if !ok {
		peer = m.makePeerInfo(peerID)
	}
	peer.Height = height
	return m.store.Set(peer)
}
// peerStore stores information about peers. It is not thread-safe, assuming
// it is used only by PeerManager which handles concurrency control, allowing
// it to execute multiple operations atomically via its own mutex.
//
// The entire set of peers is kept in memory, for performance. It is loaded
// from disk on initialization, and any changes are written back to disk
// (without fsync, since we can afford to lose recent writes).
type peerStore struct {
	db     dbm.DB
	peers  map[NodeID]*peerInfo
	ranked []*peerInfo // cache for Ranked(), nil invalidates cache
}

// newPeerStore creates a new peer store, loading all persisted peers from the
// database into memory.
func newPeerStore(db dbm.DB) (*peerStore, error) {
	store := &peerStore{
		db: db,
	}
	if err := store.loadPeers(); err != nil {
		return nil, err
	}
	return store, nil
}

// loadPeers loads all peers from the database into memory.
func (s *peerStore) loadPeers() error {
	peers := make(map[NodeID]*peerInfo)

	start, end := keyPeerInfoRange()
	iter, err := s.db.Iterator(start, end)
	if err != nil {
		return err
	}
	defer iter.Close()
	for ; iter.Valid(); iter.Next() {
		// FIXME: We may want to tolerate failures here, by simply logging
		// the errors and ignoring the faulty peer entries.
		msg := new(p2pproto.PeerInfo)
		if err := proto.Unmarshal(iter.Value(), msg); err != nil {
			return fmt.Errorf("invalid peer Protobuf data: %w", err)
		}
		peer, err := peerInfoFromProto(msg)
		if err != nil {
			return fmt.Errorf("invalid peer data: %w", err)
		}
		peers[peer.ID] = peer
	}
	if iter.Error() != nil {
		return iter.Error()
	}
	s.peers = peers
	s.ranked = nil // invalidate cache if populated
	return nil
}

// Get fetches a peer. The boolean indicates whether the peer existed or not.
// The returned peer info is a copy, and can be mutated at will.
func (s *peerStore) Get(id NodeID) (peerInfo, bool) {
	peer, ok := s.peers[id]
	return peer.Copy(), ok
}

// Set stores peer data. The input data will be copied, and can safely be reused
// by the caller.
func (s *peerStore) Set(peer peerInfo) error {
	if err := peer.Validate(); err != nil {
		return err
	}
	peer = peer.Copy()

	// FIXME: We may want to optimize this by avoiding saving to the database
	// if there haven't been any changes to persisted fields.
	bz, err := peer.ToProto().Marshal()
	if err != nil {
		return err
	}
	if err = s.db.Set(keyPeerInfo(peer.ID), bz); err != nil {
		return err
	}

	if current, ok := s.peers[peer.ID]; !ok || current.Score() != peer.Score() {
		// If the peer is new, or its score changes, we invalidate the Ranked() cache.
		s.peers[peer.ID] = &peer
		s.ranked = nil
	} else {
		// Otherwise, since s.ranked contains pointers to the old data and we
		// want those pointers to remain valid with the new data, we have to
		// update the existing pointer address.
		*current = peer
	}
	return nil
}

// Delete deletes a peer, or does nothing if it does not exist.
func (s *peerStore) Delete(id NodeID) error {
	if _, ok := s.peers[id]; !ok {
		return nil
	}
	if err := s.db.Delete(keyPeerInfo(id)); err != nil {
		return err
	}
	delete(s.peers, id)
	s.ranked = nil
	return nil
}

// List retrieves all peers in an arbitrary order. The returned data is a copy,
// and can be mutated at will.
func (s *peerStore) List() []peerInfo {
	peers := make([]peerInfo, 0, len(s.peers))
	for _, peer := range s.peers {
		peers = append(peers, peer.Copy())
	}
	return peers
}
// Ranked returns a list of peers ordered by score (better peers first). Peers
// with equal scores are returned in an arbitrary order. The returned list must
// not be mutated or accessed concurrently by the caller, since it returns
// pointers to internal peerStore data for performance.
//
// Ranked is used to determine which peers to dial, which to evict, and which
// to delete completely.
//
// FIXME: For now, we simply maintain a cache in s.ranked which is invalidated
// by setting it to nil, but if necessary we should use a better data structure
// for this (e.g. a heap or ordered map).
//
// FIXME: The scoring logic is currently very naïve, see peerInfo.Score().
func (s *peerStore) Ranked() []*peerInfo {
	if s.ranked != nil {
		return s.ranked
	}
	s.ranked = make([]*peerInfo, 0, len(s.peers))
	for _, peer := range s.peers {
		s.ranked = append(s.ranked, peer)
	}
	sort.Slice(s.ranked, func(i, j int) bool {
		// FIXME: If necessary, consider precomputing scores before sorting,
		// to reduce the number of Score() calls.
		return s.ranked[i].Score() > s.ranked[j].Score()
	})
	return s.ranked
}
// Size returns the number of peers in the peer store.
func (s *peerStore) Size() int {
	return len(s.peers)
}

// peerInfo contains peer information stored in a peerStore.
type peerInfo struct {
	ID            NodeID
	AddressInfo   map[string]*peerAddressInfo
	LastConnected time.Time

	// These fields are ephemeral, i.e. not persisted to the database.
	Persistent bool
	Height     int64
}

// peerInfoFromProto converts a Protobuf PeerInfo message to a peerInfo,
// erroring if the data is invalid.
func peerInfoFromProto(msg *p2pproto.PeerInfo) (*peerInfo, error) {
	p := &peerInfo{
		ID:          NodeID(msg.ID),
		AddressInfo: map[string]*peerAddressInfo{},
	}
	if msg.LastConnected != nil {
		p.LastConnected = *msg.LastConnected
	}
	for _, addr := range msg.AddressInfo {
		addressInfo, err := peerAddressInfoFromProto(addr)
		if err != nil {
			return nil, err
		}
		p.AddressInfo[addressInfo.Address.String()] = addressInfo
	}
	return p, p.Validate()
}

// ToProto converts the peerInfo to p2pproto.PeerInfo for database storage. The
// Protobuf type only contains persisted fields, while ephemeral fields are
// discarded. The returned message may contain pointers to original data, since
// it is expected to be serialized immediately.
func (p *peerInfo) ToProto() *p2pproto.PeerInfo {
	msg := &p2pproto.PeerInfo{
		ID:            string(p.ID),
		LastConnected: &p.LastConnected,
	}
	for _, addressInfo := range p.AddressInfo {
		msg.AddressInfo = append(msg.AddressInfo, addressInfo.ToProto())
	}
	if msg.LastConnected.IsZero() {
		msg.LastConnected = nil
	}
	return msg
}
// Copy returns a deep copy of the peer info.
func (p *peerInfo) Copy() peerInfo {
	if p == nil {
		return peerInfo{}
	}
	c := *p
	// Allocate a fresh map so the copy doesn't share (or mutate) the
	// original's AddressInfo entries.
	c.AddressInfo = make(map[string]*peerAddressInfo, len(p.AddressInfo))
	for key, addressInfo := range p.AddressInfo {
		addressInfoCopy := addressInfo.Copy()
		c.AddressInfo[key] = &addressInfoCopy
	}
	return c
}
// Score calculates a score for the peer. Higher-scored peers will be
// preferred over lower scores.
func (p *peerInfo) Score() PeerScore {
	var score PeerScore
	if p.Persistent {
		score += PeerScorePersistent
	}
	return score
}

// Validate validates the peer info.
func (p *peerInfo) Validate() error {
	if p.ID == "" {
		return errors.New("no peer ID")
	}
	return nil
}

// peerAddressInfo contains information and statistics about a peer address.
type peerAddressInfo struct {
	Address         PeerAddress
	LastDialSuccess time.Time
	LastDialFailure time.Time
	DialFailures    uint32 // since last successful dial
}
// peerAddressInfoFromProto converts a Protobuf PeerAddressInfo message
// to a peerAddressInfo.
func peerAddressInfoFromProto(msg *p2pproto.PeerAddressInfo) (*peerAddressInfo, error) {
	address, err := ParsePeerAddress(msg.Address)
	if err != nil {
		return nil, fmt.Errorf("invalid address %q: %w", msg.Address, err)
	}
	addressInfo := &peerAddressInfo{
		Address:      address,
		DialFailures: msg.DialFailures,
	}
	if msg.LastDialSuccess != nil {
		addressInfo.LastDialSuccess = *msg.LastDialSuccess
	}
	if msg.LastDialFailure != nil {
		addressInfo.LastDialFailure = *msg.LastDialFailure
	}
	return addressInfo, addressInfo.Validate()
}
// ToProto converts the address info into a Protobuf message for serialization.
func (a *peerAddressInfo) ToProto() *p2pproto.PeerAddressInfo {
	msg := &p2pproto.PeerAddressInfo{
		Address:         a.Address.String(),
		LastDialSuccess: &a.LastDialSuccess,
		LastDialFailure: &a.LastDialFailure,
		DialFailures:    a.DialFailures,
	}
	if msg.LastDialSuccess.IsZero() {
		msg.LastDialSuccess = nil
	}
	if msg.LastDialFailure.IsZero() {
		msg.LastDialFailure = nil
	}
	return msg
}
// Copy returns a copy of the address info.
func (a *peerAddressInfo) Copy() peerAddressInfo {
	return *a
}

// Validate validates the address info.
func (a *peerAddressInfo) Validate() error {
	return a.Address.Validate()
}

// These are database key prefixes.
const (
	prefixPeerInfo int64 = 1
)

// keyPeerInfo generates a peerInfo database key.
func keyPeerInfo(id NodeID) []byte {
	key, err := orderedcode.Append(nil, prefixPeerInfo, string(id))
	if err != nil {
		panic(err)
	}
	return key
}
// keyPeerInfoRange generates start/end keys for the entire peerInfo key range.
func keyPeerInfoRange() ([]byte, []byte) {
	start, err := orderedcode.Append(nil, prefixPeerInfo, "")
	if err != nil {
		panic(err)
	}
	end, err := orderedcode.Append(nil, prefixPeerInfo, orderedcode.Infinity)
	if err != nil {
		panic(err)
	}
	return start, end
}
// ============================================================================
// Types and business logic below may be deprecated.
//
// TODO: Rename once legacy p2p types are removed.
// ref: https://github.com/tendermint/tendermint/issues/5670
// ============================================================================

//go:generate mockery --case underscore --name Peer

const metricsTickerDuration = 10 * time.Second

// Peer is an interface representing a peer connected on a reactor.
type Peer interface {
	service.Service
	FlushStop()

	ID() NodeID           // peer's cryptographic ID
	RemoteIP() net.IP     // remote IP of the connection
	RemoteAddr() net.Addr // remote address of the connection

	IsOutbound() bool   // did we dial the peer
	IsPersistent() bool // do we redial this peer when we disconnect

	CloseConn() error // close original connection

	NodeInfo() NodeInfo // peer's info
	Status() tmconn.ConnectionStatus
	SocketAddr() *NetAddress // actual address of the socket

	Send(byte, []byte) bool
	TrySend(byte, []byte) bool

	Set(string, interface{})
	Get(string) interface{}
}
//----------------------------------------------------------

// peerConn contains the raw connection and its config.
type peerConn struct {
	outbound   bool
	persistent bool
	conn       Connection

	ip net.IP // cached RemoteIP()
}

func newPeerConn(outbound, persistent bool, conn Connection) peerConn {
	return peerConn{
		outbound:   outbound,
		persistent: persistent,
		conn:       conn,
	}
}

// ID only exists for SecretConnection.
func (pc peerConn) ID() NodeID {
	return NodeIDFromPubKey(pc.conn.PubKey())
}
// RemoteIP returns the IP of the connection's remote endpoint,
// caching it on first use.
func (pc peerConn) RemoteIP() net.IP {
	if pc.ip == nil {
		pc.ip = pc.conn.RemoteEndpoint().IP
	}
	return pc.ip
}
// peer implements Peer.
//
// Before using a peer, you will need to perform a handshake on connection.
type peer struct {
	service.BaseService

	// raw peerConn and the multiplex connection
	peerConn

	// peer's node info and the channel it knows about
	// channels = nodeInfo.Channels
	// cached to avoid copying nodeInfo in hasChannel
	nodeInfo    NodeInfo
	channels    []byte
	reactors    map[byte]Reactor
	onPeerError func(Peer, interface{})

	// User data
	Data *cmap.CMap

	metrics       *Metrics
	metricsTicker *time.Ticker
}

type PeerOption func(*peer)

func newPeer(
	pc peerConn,
	reactorsByCh map[byte]Reactor,
	onPeerError func(Peer, interface{}),
	options ...PeerOption,
) *peer {
	nodeInfo := pc.conn.NodeInfo()
	p := &peer{
		peerConn:      pc,
		nodeInfo:      nodeInfo,
		channels:      nodeInfo.Channels, // TODO
		reactors:      reactorsByCh,
		onPeerError:   onPeerError,
		Data:          cmap.NewCMap(),
		metricsTicker: time.NewTicker(metricsTickerDuration),
		metrics:       NopMetrics(),
	}

	p.BaseService = *service.NewBaseService(nil, "Peer", p)
	for _, option := range options {
		option(p)
	}

	return p
}

// onError calls the peer error callback.
func (p *peer) onError(err interface{}) {
	p.onPeerError(p, err)
}

// String representation.
func (p *peer) String() string {
	if p.outbound {
		return fmt.Sprintf("Peer{%v %v out}", p.conn, p.ID())
	}
	return fmt.Sprintf("Peer{%v %v in}", p.conn, p.ID())
}

//---------------------------------------------------
// Implements service.Service

// SetLogger implements BaseService.
func (p *peer) SetLogger(l log.Logger) {
	p.Logger = l
}

// OnStart implements BaseService.
func (p *peer) OnStart() error {
	if err := p.BaseService.OnStart(); err != nil {
		return err
	}

	go p.processMessages()
	go p.metricsReporter()

	return nil
}

// processMessages processes messages received from the connection.
func (p *peer) processMessages() {
	defer func() {
		if r := recover(); r != nil {
			p.Logger.Error("peer message processing panic", "err", r, "stack", string(debug.Stack()))
			p.onError(fmt.Errorf("panic during peer message processing: %v", r))
		}
	}()

	for {
		chID, msg, err := p.conn.ReceiveMessage()
		if err != nil {
			p.onError(err)
			return
		}
		reactor, ok := p.reactors[chID]
		if !ok {
			p.onError(fmt.Errorf("unknown channel %v", chID))
			return
		}
		reactor.Receive(chID, p, msg)
	}
}

// FlushStop mimics OnStop but additionally ensures that all successful
// .Send() calls will get flushed before closing the connection.
// NOTE: it is not safe to call this method more than once.
func (p *peer) FlushStop() {
	p.metricsTicker.Stop()
	p.BaseService.OnStop()
	if err := p.conn.FlushClose(); err != nil {
		p.Logger.Debug("error while stopping peer", "err", err)
	}
}

// OnStop implements BaseService.
func (p *peer) OnStop() {
	p.metricsTicker.Stop()
	p.BaseService.OnStop()
	if err := p.conn.Close(); err != nil {
		p.Logger.Debug("error while stopping peer", "err", err)
	}
}

//---------------------------------------------------
// Implements Peer

// ID returns the peer's ID - the hex encoded hash of its pubkey.
func (p *peer) ID() NodeID {
	return p.nodeInfo.ID()
}

// IsOutbound returns true if the connection is outbound, false otherwise.
func (p *peer) IsOutbound() bool {
	return p.peerConn.outbound
}
// IsPersistent returns true if the peer is persistent, false otherwise.
func (p *peer) IsPersistent() bool {
	return p.peerConn.persistent
}
// NodeInfo returns a copy of the peer's NodeInfo.
func (p *peer) NodeInfo() NodeInfo {
	return p.nodeInfo
}

// SocketAddr returns the address of the socket.
// For outbound peers, it's the address dialed (after DNS resolution).
// For inbound peers, it's the address returned by the underlying connection
// (not what's reported in the peer's NodeInfo).
func (p *peer) SocketAddr() *NetAddress {
	return p.peerConn.conn.RemoteEndpoint().NetAddress()
}

// Status returns the peer's ConnectionStatus.
func (p *peer) Status() tmconn.ConnectionStatus {
	return p.conn.Status()
}

// Send msg bytes to the channel identified by chID byte. Returns false if the
// send queue is full after timeout, specified by MConnection.
func (p *peer) Send(chID byte, msgBytes []byte) bool {
	if !p.IsRunning() {
		// see Switch#Broadcast, where we fetch the list of peers and loop over
		// them - while we're looping, one peer may be removed and stopped.
		return false
	} else if !p.hasChannel(chID) {
		return false
	}
	res, err := p.conn.SendMessage(chID, msgBytes)
	if err == io.EOF {
		return false
	} else if err != nil {
		p.onError(err)
		return false
	}
	if res {
		labels := []string{
			"peer_id", string(p.ID()),
			"chID", fmt.Sprintf("%#x", chID),
		}
		p.metrics.PeerSendBytesTotal.With(labels...).Add(float64(len(msgBytes)))
	}
	return res
}

// TrySend msg bytes to the channel identified by chID byte. Immediately returns
// false if the send queue is full.
func (p *peer) TrySend(chID byte, msgBytes []byte) bool {
	if !p.IsRunning() {
		return false
	} else if !p.hasChannel(chID) {
		return false
	}
	res, err := p.conn.TrySendMessage(chID, msgBytes)
	if err == io.EOF {
		return false
	} else if err != nil {
		p.onError(err)
		return false
	}
	if res {
		labels := []string{
			"peer_id", string(p.ID()),
			"chID", fmt.Sprintf("%#x", chID),
		}
		p.metrics.PeerSendBytesTotal.With(labels...).Add(float64(len(msgBytes)))
	}
	return res
}

// Get the data for a given key.
func (p *peer) Get(key string) interface{} {
	return p.Data.Get(key)
}

// Set sets the data for the given key.
func (p *peer) Set(key string, data interface{}) {
	p.Data.Set(key, data)
}

// hasChannel returns true if the peer reported
// knowing about the given chID.
func (p *peer) hasChannel(chID byte) bool {
	for _, ch := range p.channels {
		if ch == chID {
			return true
		}
	}
	// NOTE: probably will want to remove this
	// but could be helpful while the feature is new
	p.Logger.Debug(
		"Unknown channel for peer",
		"channel",
		chID,
		"channels",
		p.channels,
	)
	return false
}

// CloseConn closes original connection. Used for cleaning up in cases where
// the peer had not been started at all.
func (p *peer) CloseConn() error {
	return p.peerConn.conn.Close()
}

//---------------------------------------------------
// methods only used for testing
// TODO: can we remove these?

// CloseConn closes the underlying connection
func (pc *peerConn) CloseConn() {
	pc.conn.Close()
}

// RemoteAddr returns peer's remote network address.
func (p *peer) RemoteAddr() net.Addr {
	endpoint := p.conn.RemoteEndpoint()
	return &net.TCPAddr{
		IP:   endpoint.IP,
		Port: int(endpoint.Port),
	}
}

//---------------------------------------------------

func PeerMetrics(metrics *Metrics) PeerOption {
	return func(p *peer) {
		p.metrics = metrics
	}
}

func (p *peer) metricsReporter() {
	for {
		select {
		case <-p.metricsTicker.C:
			status := p.conn.Status()
			var sendQueueSize float64
			for _, chStatus := range status.Channels {
				sendQueueSize += float64(chStatus.SendQueueSize)
			}

			p.metrics.PeerPendingSendBytes.With("peer_id", string(p.ID())).Set(sendQueueSize)
		case <-p.Quit():
			return
		}
	}
}