p2p: implement new Transport interface (#5791)

This implements a new `Transport` interface and related types for the P2P refactor in #5670. Previously, `conn.MConnection` was tightly coupled to the `Peer` implementation; to allow alternative non-multiplexed transports (e.g. QUIC), MConnection has now been moved below the `Transport` interface, as `MConnTransport`, and decoupled from the peer. Since the `p2p` package is not covered by our Go API stability guarantees, this is not considered a breaking change and is not listed in the changelog.

The initial approach was to implement the new interface in its final form (which also involved possible protocol changes, see https://github.com/tendermint/spec/pull/227). However, it turned out that this would require a large number of changes to existing P2P code because of the previous tight coupling between `Peer` and `MConnection` and the reliance on subtleties in MConnection behavior. Instead, I have broadened the `Transport` interface to expose much of the existing MConnection interface, preserved much of the existing MConnection logic and behavior in the transport implementation, and tried to make as few changes to the rest of the P2P stack as possible. We will instead reduce this interface gradually as we refactor other parts of the P2P stack.

The low-level transport code and protocol (e.g. MConnection, SecretConnection, and so on) have not been significantly changed, and refactoring them is not a priority until we have a plan for QUIC adoption, since we may end up discarding the MConnection code entirely. There are no tests of the new `MConnTransport`, as this code is likely to evolve as we proceed with the P2P refactor, but tests should be added before a final release. The E2E tests are sufficient for basic validation in the meantime.
4 years ago
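To make the decoupling concrete, here is a minimal sketch of what a transport abstraction in this style can look like. The method set and the `Connection` and `Endpoint` types are illustrative assumptions for this sketch, not the exact interface landed in #5791:

```go
package p2p

import "context"

// Transport is a sketch of a connection-oriented transport abstraction.
// A concrete implementation (e.g. one wrapping MConnection) lives below
// this interface, keeping the Peer logic transport-agnostic.
type Transport interface {
	// Accept waits for and returns the next inbound connection,
	// following the blocking net.Listener convention.
	Accept(ctx context.Context) (Connection, error)

	// Dial opens an outbound connection to the given endpoint.
	Dial(ctx context.Context, endpoint Endpoint) (Connection, error)

	// Close stops the transport, unblocking any pending Accept calls.
	Close() error
}

// Connection represents a single peer connection. A multiplexed
// transport such as MConnTransport carries channel IDs in its
// message framing; a QUIC transport could map them to streams.
type Connection interface {
	SendMessage(channelID byte, msg []byte) error
	ReceiveMessage() (channelID byte, msg []byte, err error)
	Close() error
}

// Endpoint identifies a remote transport address (illustrative).
type Endpoint struct {
	Protocol string // e.g. "mconn" or "quic"
	Address  string // host:port
}
```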
p2p: make PeerManager.DialNext() and EvictNext() block (#5947)

See #5936 and #5938 for background. The plan was initially to have `DialNext()` and `EvictNext()` return a channel. However, implementing this became unnecessarily complicated and error-prone. For example, the channel would be both consumed and populated (via method calls) by the same driving method (e.g. `Router.dialPeers()`), which could easily cause deadlocks where a method call blocked while sending on a channel that the caller itself was responsible for consuming (but couldn't, since it was busy making the method call). It would also require a set of goroutines in the peer manager that would interact with the goroutines in the router in non-obvious ways, and fully populating the channel on startup could cause deadlocks with other startup tasks. Several issues like these made the solution hard to reason about.

I therefore simply made `DialNext()` and `EvictNext()` block until the next peer is available, using internal triggers to wake these methods in a non-blocking fashion whenever a relevant state change occurs. This proved much simpler to reason about, since there are no goroutines in the peer manager (except for trivial retry timers) and no blocking channel sends; it instead relies entirely on the existing goroutine structure of the router for concurrency. This also happens to be the same pattern used by the `Transport.Accept()` API, following Go stdlib conventions, so all router goroutines end up using a consistent pattern.
4 years ago
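The block-until-available pattern with non-blocking internal wakeups can be sketched as follows. The `dialQueue` type and its fields are simplified stand-ins for illustration, not the actual `PeerManager` from #5947:

```go
package p2p

import "sync"

// dialQueue is a toy stand-in for the peer manager's dial candidates.
type dialQueue struct {
	mtx    sync.Mutex
	peers  []string      // pending dial candidates
	wakeCh chan struct{} // buffered, size-1 wakeup signal
}

func newDialQueue() *dialQueue {
	return &dialQueue{wakeCh: make(chan struct{}, 1)}
}

// Add registers a dial candidate and wakes a blocked DialNext caller.
// The non-blocking send means state changes never block the mutator,
// which is what avoids the deadlocks described above.
func (q *dialQueue) Add(peer string) {
	q.mtx.Lock()
	q.peers = append(q.peers, peer)
	q.mtx.Unlock()
	select {
	case q.wakeCh <- struct{}{}: // wake a blocked DialNext
	default: // a wakeup is already pending; dropping is safe
	}
}

// DialNext blocks until a peer is available, mirroring the blocking
// style of Transport.Accept() rather than returning a channel.
func (q *dialQueue) DialNext() string {
	for {
		q.mtx.Lock()
		if len(q.peers) > 0 {
			peer := q.peers[0]
			q.peers = q.peers[1:]
			q.mtx.Unlock()
			return peer
		}
		q.mtx.Unlock()
		<-q.wakeCh // sleep until a state change triggers a wakeup
	}
}
```

Because the wake channel is buffered with capacity one, a state change that races with the caller between the unlock and the receive is never lost: the signal stays buffered and the loop re-checks the queue.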
p2p: file descriptor leaks (#3150)

* close peer's connection to avoid fd leak (fixes #2967)
* rename peer#Addr to RemoteAddr
* fix test
* fixes after Ethan's review
* bring back the check
* changelog entry
* write a test for switch#acceptRoutine
* increase timeouts? :(
* remove extra assertNPeersWithTimeout
* simplify test
* assert number of peers (just to be safe)
* cleanup in OnStop
* run tests with verbose flag on CircleCI
* spawn a reading routine to prevent connection from closing
* get port from the listener — a random port is faster, but often results in:

```
panic: listen tcp 127.0.0.1:44068: bind: address already in use [recovered]
	panic: listen tcp 127.0.0.1:44068: bind: address already in use

goroutine 79 [running]:
testing.tRunner.func1(0xc0001bd600)
	/usr/local/go/src/testing/testing.go:792 +0x387
panic(0x974d20, 0xc0001b0500)
	/usr/local/go/src/runtime/panic.go:513 +0x1b9
github.com/tendermint/tendermint/p2p.MakeSwitch(0xc0000f42a0, 0x0, 0x9fb9cc, 0x9, 0x9fc346, 0xb, 0xb42128, 0x0, 0x0, 0x0, ...)
	/home/vagrant/go/src/github.com/tendermint/tendermint/p2p/test_util.go:182 +0xa28
github.com/tendermint/tendermint/p2p.MakeConnectedSwitches(0xc0000f42a0, 0x2, 0xb42128, 0xb41eb8, 0x4f1205, 0xc0001bed80, 0x4f16ed)
	/home/vagrant/go/src/github.com/tendermint/tendermint/p2p/test_util.go:75 +0xf9
github.com/tendermint/tendermint/p2p.MakeSwitchPair(0xbb8d20, 0xc0001bd600, 0xb42128, 0x2f7, 0x4f16c0)
	/home/vagrant/go/src/github.com/tendermint/tendermint/p2p/switch_test.go:94 +0x4c
github.com/tendermint/tendermint/p2p.TestSwitches(0xc0001bd600)
	/home/vagrant/go/src/github.com/tendermint/tendermint/p2p/switch_test.go:117 +0x58
testing.tRunner(0xc0001bd600, 0xb42038)
	/usr/local/go/src/testing/testing.go:827 +0xbf
created by testing.(*T).Run
	/usr/local/go/src/testing/testing.go:878 +0x353
exit status 2
FAIL	github.com/tendermint/tendermint/p2p	0.350s
```
6 years ago
package p2p

import (
	"context"
	"errors"
	"fmt"
	"io"
	"math"
	"math/rand"
	"net"
	"net/url"
	"runtime/debug"
	"sort"
	"strconv"
	"sync"
	"time"

	"github.com/tendermint/tendermint/libs/cmap"
	"github.com/tendermint/tendermint/libs/log"
	"github.com/tendermint/tendermint/libs/service"
	tmconn "github.com/tendermint/tendermint/p2p/conn"
)

// PeerAddress is a peer address URL.
type PeerAddress struct {
	*url.URL
}

// ParsePeerAddress parses a peer address URL into a PeerAddress.
func ParsePeerAddress(address string) (PeerAddress, error) {
	u, err := url.Parse(address)
	if err != nil || u == nil {
		return PeerAddress{}, fmt.Errorf("unable to parse peer address %q: %w", address, err)
	}
	if u.Scheme == "" {
		u.Scheme = string(defaultProtocol)
	}
	pa := PeerAddress{URL: u}
	if err = pa.Validate(); err != nil {
		return PeerAddress{}, err
	}
	return pa, nil
}
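// Editor's sketch (not part of the original file): typical ParsePeerAddress
// usage. The "mconn" scheme and the 40-hex-character node ID are assumptions
// for illustration; real IDs must pass NodeID.Validate.
func exampleParsePeerAddress() {
	addr, err := ParsePeerAddress("mconn://aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa@10.0.0.1:26657")
	if err != nil {
		panic(err)
	}
	fmt.Println(addr.NodeID()) // node ID taken from the URL user component
}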
// NodeID returns the address node ID.
func (a PeerAddress) NodeID() NodeID {
	return NodeID(a.User.Username())
}

// Resolve resolves a PeerAddress into a set of Endpoints, by expanding
// out a DNS name in Host to its IP addresses. Field mapping:
//
//	Scheme                     → Endpoint.Protocol
//	Host                       → Endpoint.IP
//	User                       → Endpoint.PeerID
//	Port                       → Endpoint.Port
//	Path+Query+Fragment,Opaque → Endpoint.Path
//
func (a PeerAddress) Resolve(ctx context.Context) ([]Endpoint, error) {
	ips, err := net.DefaultResolver.LookupIP(ctx, "ip", a.Host)
	if err != nil {
		return nil, err
	}
	port, err := a.parsePort()
	if err != nil {
		return nil, err
	}
	path := a.Path
	if a.RawPath != "" {
		path = a.RawPath
	}
	if a.Opaque != "" { // used for e.g. "about:blank" style URLs
		path = a.Opaque
	}
	if a.RawQuery != "" {
		path += "?" + a.RawQuery
	}
	if a.RawFragment != "" {
		path += "#" + a.RawFragment
	}
	endpoints := make([]Endpoint, len(ips))
	for i, ip := range ips {
		endpoints[i] = Endpoint{
			PeerID:   a.NodeID(),
			Protocol: Protocol(a.Scheme),
			IP:       ip,
			Port:     port,
			Path:     path,
		}
	}
	return endpoints, nil
}
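// Editor's sketch (not part of the original file): resolving an address into
// endpoints with a bounded DNS lookup, yielding one Endpoint per resolved IP.
// The timeout value is illustrative.
func exampleResolve(addr PeerAddress) ([]Endpoint, error) {
	ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
	defer cancel()
	return addr.Resolve(ctx) // one Endpoint per IP address of addr.Host
}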
// Validate validates a PeerAddress.
func (a PeerAddress) Validate() error {
	if a.Scheme == "" {
		return errors.New("no protocol")
	}
	if id := a.User.Username(); id == "" {
		return errors.New("no peer ID")
	} else if err := NodeID(id).Validate(); err != nil {
		return fmt.Errorf("invalid peer ID: %w", err)
	}
	if a.Hostname() == "" && len(a.Query()) == 0 && a.Opaque == "" {
		return errors.New("no host or path given")
	}
	if port, err := a.parsePort(); err != nil {
		return err
	} else if port > 0 && a.Hostname() == "" {
		return errors.New("cannot specify port without host")
	}
	return nil
}

// parsePort returns the port number as a uint16.
func (a PeerAddress) parsePort() (uint16, error) {
	if portString := a.Port(); portString != "" {
		port64, err := strconv.ParseUint(portString, 10, 16)
		if err != nil {
			return 0, fmt.Errorf("invalid port %q: %w", portString, err)
		}
		return uint16(port64), nil
	}
	return 0, nil
}
// PeerStatus specifies peer statuses.
type PeerStatus string

const (
	PeerStatusNew     = PeerStatus("new")     // New peer which we haven't tried to contact yet.
	PeerStatusUp      = PeerStatus("up")      // Peer which we have an active connection to.
	PeerStatusDown    = PeerStatus("down")    // Peer which we're temporarily disconnected from.
	PeerStatusRemoved = PeerStatus("removed") // Peer which has been removed.
	PeerStatusBanned  = PeerStatus("banned")  // Peer which is banned for misbehavior.
)

// PeerError is a peer error reported by a reactor via the Error channel. The
// severity may cause the peer to be disconnected or banned depending on policy.
type PeerError struct {
	PeerID   NodeID
	Err      error
	Severity PeerErrorSeverity
}

// PeerErrorSeverity determines the severity of a peer error.
type PeerErrorSeverity string

const (
	PeerErrorSeverityLow      PeerErrorSeverity = "low"      // Mostly ignored.
	PeerErrorSeverityHigh     PeerErrorSeverity = "high"     // May disconnect.
	PeerErrorSeverityCritical PeerErrorSeverity = "critical" // Ban.
)
// PeerUpdatesCh defines a wrapper around a PeerUpdate go channel that allows
// a reactor to listen for peer updates and safely close it when stopping.
type PeerUpdatesCh struct {
	closeOnce sync.Once

	// updatesCh defines the go channel in which the router sends peer updates to
	// reactors. Each reactor will have its own PeerUpdatesCh to listen for updates
	// from.
	updatesCh chan PeerUpdate

	// doneCh is used to signal that a PeerUpdatesCh is closed. It is the
	// reactor's responsibility to invoke Close.
	doneCh chan struct{}
}

// NewPeerUpdates returns a reference to a new PeerUpdatesCh.
func NewPeerUpdates(updatesCh chan PeerUpdate) *PeerUpdatesCh {
	return &PeerUpdatesCh{
		updatesCh: updatesCh,
		doneCh:    make(chan struct{}),
	}
}

// Updates returns a read-only go channel where a consuming reactor can listen
// for peer updates sent from the router.
func (puc *PeerUpdatesCh) Updates() <-chan PeerUpdate {
	return puc.updatesCh
}

// Close closes the PeerUpdatesCh channel. It should only be closed by the
// respective reactor when stopping, after ensuring nothing is listening for
// updates.
//
// NOTE: After a PeerUpdatesCh is closed, the router may safely assume it can no
// longer send on the internal updatesCh, however it should NEVER explicitly close
// it as that could result in panics by sending on a closed channel.
func (puc *PeerUpdatesCh) Close() {
	puc.closeOnce.Do(func() {
		close(puc.doneCh)
	})
}

// Done returns a read-only version of the PeerUpdatesCh's internal doneCh go
// channel, which the router uses to detect when the subscription has been
// closed and it should stop sending peer updates to it.
func (puc *PeerUpdatesCh) Done() <-chan struct{} {
	return puc.doneCh
}
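// Editor's sketch (not part of the original file): the consumption pattern the
// contract above implies for a reactor -- drain Updates() promptly, and call
// Close() exactly once when stopping. stopCh is a hypothetical reactor
// shutdown signal.
func examplePeerUpdatesLoop(peerUpdates *PeerUpdatesCh, stopCh <-chan struct{}) {
	defer peerUpdates.Close() // signals the router via Done(); never close updatesCh itself
	for {
		select {
		case update := <-peerUpdates.Updates():
			_ = update // react to PeerStatusUp, PeerStatusDown, etc.
		case <-stopCh:
			return
		}
	}
}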
// PeerUpdate is a peer status update for reactors.
type PeerUpdate struct {
	PeerID NodeID
	Status PeerStatus
}

// PeerScore is a numeric score assigned to a peer (higher is better).
type PeerScore uint16

const (
	// PeerScorePersistent is added for persistent peers.
	PeerScorePersistent PeerScore = 100
)
// PeerManager manages peer lifecycle information, using a peerStore for
// underlying storage. Its primary purpose is to determine which peers to
// connect to next, make sure a peer only has a single active connection (either
// inbound or outbound), and evict peers to make room for higher-scored peers.
// It does not manage actual connections (this is handled by the Router),
// only the peer lifecycle state.
//
// We track dialing and connected states independently. This allows us to accept
// an inbound connection from a peer while the router is also dialing an
// outbound connection to that same peer, which will cause the dialer to
// eventually error when attempting to mark the peer as connected. This also
// avoids race conditions where multiple goroutines may end up dialing a peer if
// an incoming connection was briefly accepted and disconnected while we were
// also dialing.
//
// For an outbound connection, the flow is as follows (see the dial loop sketch
// following the struct definition below):
// - DialNext: returns a peer address to dial, marking the peer as dialing.
// - DialFailed: reports a dial failure, unmarking the peer as dialing.
// - Dialed: successfully dialed, unmarking as dialing and marking as connected
//   (or erroring if already connected).
// - Ready: routing is up, broadcasts a PeerStatusUp peer update to subscribers.
// - Disconnected: peer disconnects, unmarking as connected and broadcasts a
//   PeerStatusDown peer update.
//
// For an inbound connection, the flow is as follows:
// - Accepted: successfully accepted connection, marking as connected (or erroring
//   if already connected).
// - Ready: routing is up, broadcasts a PeerStatusUp peer update to subscribers.
// - Disconnected: peer disconnects, unmarking as connected and broadcasts a
//   PeerStatusDown peer update.
//
// If we are connected to too many peers (more than MaxConnected), typically
// because we have upgraded to higher-scored peers and need to shed lower-scored
// ones, the flow is as follows:
// - EvictNext: returns a peer ID to evict, marking peer as evicting.
// - Disconnected: peer was disconnected, unmarking as connected and evicting,
//   and broadcasts a PeerStatusDown peer update.
//
// If all connection slots are full (at MaxConnected), we can use up to
// MaxConnectedUpgrade additional connections to probe any higher-scored
// unconnected peers, and if we reach them (or they reach us) we allow the
// connection and evict lower-scored peers. We mark the lower-scored peer as
// upgrading[from]=to to make sure no other higher-scored peers can claim the
// same one for an upgrade. The flow is as follows:
// - Accepted: if upgrade is possible, mark upgrading[from]=to and connected.
// - DialNext: if upgrade is possible, mark upgrading[from]=to and dialing.
// - DialFailed: unmark upgrading[from]=to and dialing.
// - Dialed: unmark dialing, mark as connected.
// - EvictNext: unmark upgrading[from]=to, then if over MaxConnected
//   either the upgraded peer or an even lower-scored one (if found)
//   is marked as evicting and returned.
// - Disconnected: unmark connected and evicting, and unmark upgrading[from]=to
//   for both from and to (in case either side disconnected before eviction).
type PeerManager struct {
	options PeerManagerOptions

	wakeDialCh  chan struct{} // wakes up DialNext() on relevant peer changes
	wakeEvictCh chan struct{} // wakes up EvictNext() on relevant peer changes
	closeCh     chan struct{} // signal channel for Close()
	closeOnce   sync.Once

	mtx           sync.Mutex
	store         *peerStore
	dialing       map[NodeID]bool                   // peers being dialed (DialNext -> Dialed/DialFail)
	connected     map[NodeID]bool                   // connected peers (Dialed/Accepted -> Disconnected)
	upgrading     map[NodeID]NodeID                 // peers claimed for upgrading (key is lower-scored peer)
	evicting      map[NodeID]bool                   // peers being evicted (EvictNext -> Disconnected)
	subscriptions map[*PeerUpdatesCh]*PeerUpdatesCh // keyed by struct identity (address)
}
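// Editor's sketch (not part of the original file): the outbound flow above as
// a router-style dial loop. The DialNext/DialFailed/Dialed signatures and the
// dialPeer helper are assumptions for illustration; the actual router code is
// not shown in this excerpt.
func exampleDialLoop(ctx context.Context, m *PeerManager, dialPeer func(PeerAddress) error) error {
	for {
		address, err := m.DialNext(ctx) // blocks until a dialable peer is available
		if err != nil {
			return err
		}
		if err := dialPeer(address); err != nil {
			// Unmark as dialing so the peer becomes eligible for retry.
			if err := m.DialFailed(address); err != nil {
				return err
			}
			continue
		}
		// Unmark as dialing and mark as connected (errors if already connected).
		if err := m.Dialed(address); err != nil {
			return err
		}
	}
}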
// PeerManagerOptions specifies options for a PeerManager.
type PeerManagerOptions struct {
	// PersistentPeers are peers that we want to maintain persistent connections
	// to. These will be scored higher than other peers, and if
	// MaxConnectedUpgrade is non-zero any lower-scored peers will be evicted if
	// necessary to make room for these.
	PersistentPeers []NodeID

	// MaxConnected is the maximum number of connected peers (inbound and
	// outbound). 0 means no limit.
	MaxConnected uint16

	// MaxConnectedUpgrade is the maximum number of additional connections to
	// use for probing any better-scored peers to upgrade to when all connection
	// slots are full. 0 disables peer upgrading.
	//
	// For example, if we are already connected to MaxConnected peers, but we
	// know or learn about better-scored peers (e.g. configured persistent
	// peers) that we are not connected to, then we can probe these peers by
	// using up to MaxConnectedUpgrade connections, and once connected evict the
	// lowest-scored connected peers. This also works for inbound connections,
	// i.e. if a higher-scored peer attempts to connect to us, we can accept
	// the connection and evict a lower-scored peer.
	MaxConnectedUpgrade uint16

	// MinRetryTime is the minimum time to wait between retries. Retry times
	// double for each retry, up to MaxRetryTime. 0 disables retries.
	MinRetryTime time.Duration

	// MaxRetryTime is the maximum time to wait between retries. 0 means
	// no maximum, in which case the retry time will keep doubling.
	MaxRetryTime time.Duration

	// MaxRetryTimePersistent is the maximum time to wait between retries for
	// peers listed in PersistentPeers. 0 uses MaxRetryTime instead.
	MaxRetryTimePersistent time.Duration

	// RetryTimeJitter is the upper bound of a random interval added to
	// retry times, to avoid thundering herds. 0 disables jitter.
	RetryTimeJitter time.Duration
}
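// Editor's sketch (not part of the original file): the retry schedule implied
// by the options above. Delays start at MinRetryTime, double per failure, are
// capped by MaxRetryTime (or MaxRetryTimePersistent for persistent peers),
// and get up to RetryTimeJitter of random noise. retryDelay is hypothetical,
// not the manager's actual implementation.
func retryDelay(o PeerManagerOptions, failures int, persistent bool) time.Duration {
	if o.MinRetryTime == 0 {
		return 0 // retries disabled
	}
	if failures > 16 {
		failures = 16 // clamp the exponent to keep this sketch overflow-safe
	}
	delay := o.MinRetryTime * time.Duration(math.Pow(2, float64(failures)))
	maxDelay := o.MaxRetryTime
	if persistent && o.MaxRetryTimePersistent > 0 {
		maxDelay = o.MaxRetryTimePersistent
	}
	if maxDelay > 0 && delay > maxDelay {
		delay = maxDelay
	}
	if o.RetryTimeJitter > 0 {
		delay += time.Duration(rand.Int63n(int64(o.RetryTimeJitter)))
	}
	return delay
}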
// isPersistent is a convenience function that checks if the given peer ID
// is contained in PersistentPeers. It just uses a linear search, since
// PersistentPeers is expected to be small.
func (o PeerManagerOptions) isPersistent(id NodeID) bool {
	for _, p := range o.PersistentPeers {
		if id == p {
			return true
		}
	}
	return false
}

// NewPeerManager creates a new peer manager.
func NewPeerManager(options PeerManagerOptions) *PeerManager {
	return &PeerManager{
		options: options,
		closeCh: make(chan struct{}),

		// We use a buffer of size 1 for these trigger channels, with
		// non-blocking sends. This ensures that if e.g. wakeDial() is called
		// multiple times before the initial trigger is picked up we only
		// process the trigger once.
		//
		// FIXME: This should maybe be a libs/sync type.
		wakeDialCh:  make(chan struct{}, 1),
		wakeEvictCh: make(chan struct{}, 1),

		// FIXME: Once the store persists data, we need to update existing
		// peers in the store with any new information, e.g. changes to
		// PersistentPeers configuration.
		store:         newPeerStore(),
		dialing:       map[NodeID]bool{},
		connected:     map[NodeID]bool{},
		upgrading:     map[NodeID]NodeID{},
		evicting:      map[NodeID]bool{},
		subscriptions: map[*PeerUpdatesCh]*PeerUpdatesCh{},
	}
}
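
// Illustrative sketch (not part of the original source): constructing and
// closing a PeerManager. All option values below are hypothetical, and the
// node ID is a truncated placeholder.
//
//	manager := NewPeerManager(PeerManagerOptions{
//		PersistentPeers:     []NodeID{"1f4a..."}, // hypothetical node ID
//		MaxConnected:        64,
//		MaxConnectedUpgrade: 4,
//		MinRetryTime:        time.Second,
//		MaxRetryTime:        10 * time.Minute,
//		RetryTimeJitter:     5 * time.Second,
//	})
//	defer manager.Close()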

// Close closes the peer manager, releasing resources allocated with it
// (specifically any running goroutines).
func (m *PeerManager) Close() {
	m.closeOnce.Do(func() {
		close(m.closeCh)
	})
}

// Add adds a peer to the manager, given as an address. If the peer already
// exists, the address is added to it.
func (m *PeerManager) Add(address PeerAddress) error {
	if err := address.Validate(); err != nil {
		return err
	}
	m.mtx.Lock()
	defer m.mtx.Unlock()

	peer, err := m.store.Get(address.NodeID())
	if err != nil {
		return err
	}
	if peer == nil {
		peer = &peerInfo{
			ID:         address.NodeID(),
			Persistent: m.options.isPersistent(address.NodeID()),
		}
	}
	peer.AddAddress(address)
	err = m.store.Set(peer)
	if err != nil {
		return err
	}
	m.wakeDial()
	return nil
}

// Subscribe subscribes to peer updates. The caller must consume the peer
// updates in a timely fashion and close the subscription when done, since
// delivery is guaranteed and will otherwise block peer
// connection/disconnection.
func (m *PeerManager) Subscribe() *PeerUpdatesCh {
	// FIXME: We may want to use a size 1 buffer here. When the router
	// broadcasts a peer update it has to loop over all of the
	// subscriptions, and we want to avoid blocking and waiting for a
	// context switch before continuing to the next subscription. This also
	// prevents tail latencies from compounding across updates. We also want
	// to make sure the subscribers are reasonably in sync, so it should be
	// kept at 1. However, this should be benchmarked first.
	peerUpdates := NewPeerUpdates(make(chan PeerUpdate))
	m.mtx.Lock()
	m.subscriptions[peerUpdates] = peerUpdates
	m.mtx.Unlock()

	go func() {
		<-peerUpdates.Done()
		m.mtx.Lock()
		delete(m.subscriptions, peerUpdates)
		m.mtx.Unlock()
	}()
	return peerUpdates
}
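
// Illustrative sketch (not part of the original source): a reactor consuming
// peer updates. This assumes the Updates() and Close() accessors on
// PeerUpdatesCh defined elsewhere in this file, and quitCh is a hypothetical
// shutdown channel.
//
//	peerUpdates := manager.Subscribe()
//	defer peerUpdates.Close()
//	for {
//		select {
//		case update := <-peerUpdates.Updates():
//			fmt.Printf("peer %v is now %v\n", update.PeerID, update.Status)
//		case <-quitCh:
//			return
//		}
//	}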

// broadcast broadcasts a peer update to all subscriptions. The caller must
// already hold the mutex lock. The mutex is thus held for the duration of the
// broadcast, which ensures that all subscriptions receive all updates in the
// same order.
//
// FIXME: Consider using more fine-grained mutexes here, and/or a channel to
// enforce ordering of updates.
func (m *PeerManager) broadcast(peerUpdate PeerUpdate) {
	for _, sub := range m.subscriptions {
		select {
		case sub.updatesCh <- peerUpdate:
		case <-sub.doneCh:
		}
	}
}

// DialNext finds an appropriate peer address to dial, and marks it as dialing.
// If no peer is found, or all connection slots are full, it blocks until one
// becomes available. The caller must call Dialed() or DialFailed() for the
// returned peer. The context can be used to cancel the call.
func (m *PeerManager) DialNext(ctx context.Context) (NodeID, PeerAddress, error) {
	for {
		id, address, err := m.TryDialNext()
		if err != nil || id != "" {
			return id, address, err
		}
		select {
		case <-m.wakeDialCh:
		case <-ctx.Done():
			return "", PeerAddress{}, ctx.Err()
		}
	}
}

// TryDialNext is equivalent to DialNext(), but immediately returns an empty
// peer ID if no peers or connection slots are available.
func (m *PeerManager) TryDialNext() (NodeID, PeerAddress, error) {
	m.mtx.Lock()
	defer m.mtx.Unlock()

	// We allow dialing MaxConnected+MaxConnectedUpgrade peers. Including
	// MaxConnectedUpgrade allows us to probe additional peers that have a
	// higher score than a connected peer, and if successful evict the
	// lower-scored peer via EvictNext().
	if m.options.MaxConnected > 0 &&
		len(m.connected)+len(m.dialing) >= int(m.options.MaxConnected)+int(m.options.MaxConnectedUpgrade) {
		return "", PeerAddress{}, nil
	}

	ranked, err := m.store.Ranked()
	if err != nil {
		return "", PeerAddress{}, err
	}
	for _, peer := range ranked {
		if m.dialing[peer.ID] || m.connected[peer.ID] {
			continue
		}

		for _, addressInfo := range peer.AddressInfo {
			if time.Since(addressInfo.LastDialFailure) < m.retryDelay(peer, addressInfo.DialFailures) {
				continue
			}

			// At this point we have an eligible address to dial. If we're full
			// but have peer upgrade capacity (as checked above), we need to
			// make sure there exists an evictable peer of a lower score that we
			// can replace. If so, we mark the lower-scored peer as upgrading so
			// no one else can claim it, and EvictNext() will evict it later.
			//
			// If we don't find one, there is no point in trying additional
			// peers, since they will all have the same or lower score than this
			// peer (since they're ordered by score via peerStore.Ranked).
			if m.options.MaxConnected > 0 && len(m.connected) >= int(m.options.MaxConnected) {
				upgradePeer := m.findUpgradeCandidate(peer, ranked)
				if upgradePeer == "" {
					return "", PeerAddress{}, nil
				}
				m.upgrading[upgradePeer] = peer.ID
			}

			m.dialing[peer.ID] = true
			return peer.ID, addressInfo.Address, nil
		}
	}
	return "", PeerAddress{}, nil
}

// wakeDial is used to notify DialNext about changes that *may* cause new
// peers to become eligible for dialing, such as peer disconnections and
// retry timeouts.
func (m *PeerManager) wakeDial() {
	// The channel has a 1-size buffer. A non-blocking send ensures
	// we only queue up at most 1 trigger between each DialNext().
	select {
	case m.wakeDialCh <- struct{}{}:
	default:
	}
}

// wakeEvict is used to notify EvictNext about changes that *may* cause
// peers to become eligible for eviction, such as peer upgrades.
func (m *PeerManager) wakeEvict() {
	// The channel has a 1-size buffer. A non-blocking send ensures
	// we only queue up at most 1 trigger between each EvictNext().
	select {
	case m.wakeEvictCh <- struct{}{}:
	default:
	}
}

// retryDelay calculates a dial retry delay using exponential backoff, based on
// retry settings in PeerManagerOptions. If MinRetryTime is 0, this returns
// MaxInt64 (i.e. an infinite retry delay, effectively disabling retries).
func (m *PeerManager) retryDelay(peer *peerInfo, failures uint32) time.Duration {
	if failures == 0 {
		return 0
	}
	if m.options.MinRetryTime == 0 {
		return time.Duration(math.MaxInt64)
	}
	maxDelay := m.options.MaxRetryTime
	if peer.Persistent && m.options.MaxRetryTimePersistent > 0 {
		maxDelay = m.options.MaxRetryTimePersistent
	}
	delay := m.options.MinRetryTime * time.Duration(math.Pow(2, float64(failures)))
	if maxDelay > 0 && delay > maxDelay {
		delay = maxDelay
	}
	// Only apply jitter when enabled, since rand.Int63n panics if given 0.
	//
	// FIXME: This should use a PeerManager-scoped RNG.
	if m.options.RetryTimeJitter > 0 {
		delay += time.Duration(rand.Int63n(int64(m.options.RetryTimeJitter))) // nolint:gosec
	}
	return delay
}
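
// Worked example (illustrative): with MinRetryTime=5s, MaxRetryTime=10m and
// RetryTimeJitter=5s, the delay is MinRetryTime * 2^failures, capped at
// MaxRetryTime, plus up to 5s of jitter:
//
//	failures=1: 5s * 2^1 = 10s  (+ jitter)
//	failures=2: 5s * 2^2 = 20s  (+ jitter)
//	failures=3: 5s * 2^3 = 40s  (+ jitter)
//	failures=7: 5s * 2^7 = 640s, capped to MaxRetryTime = 600s (+ jitter)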

// DialFailed reports a failed dial attempt. This will make the peer available
// for dialing again when appropriate.
//
// FIXME: This should probably delete or mark bad addresses/peers after some time.
func (m *PeerManager) DialFailed(peerID NodeID, address PeerAddress) error {
	m.mtx.Lock()
	defer m.mtx.Unlock()

	delete(m.dialing, peerID)
	for from, to := range m.upgrading {
		if to == peerID {
			// Unmark failed upgrade attempt.
			delete(m.upgrading, from)
		}
	}

	peer, err := m.store.Get(peerID)
	if err != nil || peer == nil { // Peer may have been removed while dialing, ignore.
		return err
	}
	addressInfo := peer.LookupAddressInfo(address)
	if addressInfo == nil {
		return nil // Assume the address has been removed, ignore.
	}
	addressInfo.LastDialFailure = time.Now().UTC()
	addressInfo.DialFailures++
	if err = m.store.Set(peer); err != nil {
		return err
	}

	// We spawn a goroutine that notifies DialNext() again when the retry
	// timeout has elapsed, so that we can consider dialing it again.
	//
	// FIXME: We need to calculate the retry delay outside of the goroutine,
	// since the arguments are currently pointers to structs shared in the
	// peerStore. The peerStore should probably return struct copies instead,
	// to avoid these sorts of issues.
	if retryDelay := m.retryDelay(peer, addressInfo.DialFailures); retryDelay != time.Duration(math.MaxInt64) {
		go func() {
			// Use an explicit timer with deferred cleanup instead of
			// time.After(), to avoid leaking goroutines on PeerManager.Close().
			timer := time.NewTimer(retryDelay)
			defer timer.Stop()
			select {
			case <-timer.C:
				m.wakeDial()
			case <-m.closeCh:
			}
		}()
	}

	m.wakeDial()
	return nil
}

// Dialed marks a peer as successfully dialed. Any further incoming connections
// will be rejected, and once disconnected the peer may be dialed again.
func (m *PeerManager) Dialed(peerID NodeID, address PeerAddress) error {
	m.mtx.Lock()
	defer m.mtx.Unlock()

	delete(m.dialing, peerID)
	if m.connected[peerID] {
		return fmt.Errorf("peer %v is already connected", peerID)
	}
	if m.options.MaxConnected > 0 &&
		len(m.connected) >= int(m.options.MaxConnected)+int(m.options.MaxConnectedUpgrade) {
		return fmt.Errorf("already connected to maximum number of peers")
	}
	peer, err := m.store.Get(peerID)
	if err != nil {
		return err
	} else if peer == nil {
		return fmt.Errorf("peer %q was removed while dialing", peerID)
	}

	now := time.Now().UTC()
	peer.LastConnected = now
	if addressInfo := peer.LookupAddressInfo(address); addressInfo != nil {
		addressInfo.DialFailures = 0
		addressInfo.LastDialSuccess = now
	}
	if err = m.store.Set(peer); err != nil {
		return err
	}

	m.connected[peerID] = true
	m.wakeEvict()
	return nil
}
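
// Illustrative sketch (not part of the original source): a dial loop as a
// router might run it, tying together DialNext, DialFailed, Dialed and
// Ready. The dial() call is a hypothetical stand-in for the transport layer.
//
//	func dialPeers(ctx context.Context, m *PeerManager) error {
//		for {
//			peerID, address, err := m.DialNext(ctx)
//			if err != nil {
//				return err // e.g. context cancelled
//			}
//			go func() {
//				if err := dial(address); err != nil { // hypothetical dial
//					_ = m.DialFailed(peerID, address)
//					return
//				}
//				if err := m.Dialed(peerID, address); err != nil {
//					// Slots may have filled while dialing; drop the connection.
//					return
//				}
//				m.Ready(peerID)
//			}()
//		}
//	}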

// Accepted marks an incoming peer connection successfully accepted. If the peer
// is already connected or we don't allow additional connections then this will
// return an error.
//
// If MaxConnectedUpgrade is non-zero, the accepted peer is better-scored than
// at least one connected peer, and the total number of connections does not
// exceed MaxConnected + MaxConnectedUpgrade, then we accept the connection and
// rely on EvictNext() to evict a lower-scored peer.
//
// NOTE: We can't take an address here, since e.g. TCP uses a different port
// number for outbound traffic than inbound traffic, so the peer's endpoint
// wouldn't necessarily be an appropriate address to dial.
func (m *PeerManager) Accepted(peerID NodeID) error {
	m.mtx.Lock()
	defer m.mtx.Unlock()

	if m.connected[peerID] {
		return fmt.Errorf("peer %q is already connected", peerID)
	}
	if m.options.MaxConnected > 0 &&
		len(m.connected) >= int(m.options.MaxConnected)+int(m.options.MaxConnectedUpgrade) {
		return fmt.Errorf("already connected to maximum number of peers")
	}

	peer, err := m.store.Get(peerID)
	if err != nil {
		return err
	}
	if peer == nil {
		peer = &peerInfo{
			ID:         peerID,
			Persistent: m.options.isPersistent(peerID),
		}
	}

	// If we're already full (i.e. at MaxConnected), but we allow upgrades (and
	// we know from the check above that we have upgrade capacity), then we can
	// look for any lower-scored evictable peer, and if found we can accept this
	// connection anyway and let EvictNext() evict a lower-scored peer for us.
	if m.options.MaxConnected > 0 && len(m.connected) >= int(m.options.MaxConnected) {
		ranked, err := m.store.Ranked()
		if err != nil {
			return err
		}
		upgradePeer := m.findUpgradeCandidate(peer, ranked)
		if upgradePeer == "" {
			return fmt.Errorf("already connected to maximum number of peers")
		}
		m.upgrading[upgradePeer] = peerID
	}

	peer.LastConnected = time.Now().UTC()
	if err = m.store.Set(peer); err != nil {
		return err
	}

	m.connected[peerID] = true
	m.wakeEvict()
	return nil
}

// Ready marks a peer as ready, broadcasting status updates to subscribers. The
// peer must already be marked as connected. This is separate from Dialed() and
// Accepted() to allow the router to set up its internal queues before reactors
// start sending messages.
func (m *PeerManager) Ready(peerID NodeID) {
	m.mtx.Lock()
	defer m.mtx.Unlock()

	if m.connected[peerID] {
		m.broadcast(PeerUpdate{
			PeerID: peerID,
			Status: PeerStatusUp,
		})
	}
}

// Disconnected unmarks a peer as connected, allowing new connections to be
// established.
func (m *PeerManager) Disconnected(peerID NodeID) error {
	m.mtx.Lock()
	defer m.mtx.Unlock()

	// After upgrading to a peer, it's possible for that peer to disconnect
	// before EvictNext() gets around to evicting the lower-scored peer. To
	// avoid stale upgrade markers, we remove it here.
	for from, to := range m.upgrading {
		if to == peerID {
			delete(m.upgrading, from)
		}
	}

	delete(m.connected, peerID)
	delete(m.upgrading, peerID)
	delete(m.evicting, peerID)
	m.broadcast(PeerUpdate{
		PeerID: peerID,
		Status: PeerStatusDown,
	})
	m.wakeDial()
	return nil
}
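
// Illustrative sketch (not part of the original source): handling an inbound
// connection. acceptConn and its PeerID method are hypothetical stand-ins for
// the transport layer.
//
//	conn, err := acceptConn()
//	if err != nil {
//		return err
//	}
//	peerID := conn.PeerID()
//	if err := m.Accepted(peerID); err != nil {
//		_ = conn.Close() // full, or already connected to this peer
//		return nil
//	}
//	m.Ready(peerID)
//	// ... route messages until the connection closes ...
//	return m.Disconnected(peerID)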

// EvictNext returns the next peer to evict (i.e. disconnect). If no evictable
// peers are found, the call will block until one becomes available or the
// context is cancelled.
func (m *PeerManager) EvictNext(ctx context.Context) (NodeID, error) {
	for {
		id, err := m.TryEvictNext()
		if err != nil || id != "" {
			return id, err
		}
		select {
		case <-m.wakeEvictCh:
		case <-ctx.Done():
			return "", ctx.Err()
		}
	}
}
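
// Illustrative sketch (not part of the original source): an eviction loop,
// assuming a hypothetical closePeer helper that tears down the connection and
// eventually leads to Disconnected() being called.
//
//	func evictPeers(ctx context.Context, m *PeerManager) error {
//		for {
//			peerID, err := m.EvictNext(ctx)
//			if err != nil {
//				return err
//			}
//			closePeer(peerID) // hypothetical; must result in m.Disconnected(peerID)
//		}
//	}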

// TryEvictNext is equivalent to EvictNext, but immediately returns an empty
// node ID if no evictable peers are found.
func (m *PeerManager) TryEvictNext() (NodeID, error) {
	m.mtx.Lock()
	defer m.mtx.Unlock()

	// We first prune the upgrade list. All connection slots were full when the
	// upgrades began, but we may have disconnected other peers in the meanwhile
	// and thus don't have to evict the upgraded peers after all.
	for from, to := range m.upgrading {
		// Stop pruning when the upgrade slots are only for connections
		// exceeding MaxConnected.
		if m.options.MaxConnected == 0 ||
			len(m.upgrading) <= len(m.connected)-len(m.evicting)-int(m.options.MaxConnected) {
			break
		}
		if m.connected[to] {
			delete(m.upgrading, from)
		}
	}

	// If we're below capacity, we don't need to evict anything.
	if m.options.MaxConnected == 0 ||
		len(m.connected)-len(m.evicting) <= int(m.options.MaxConnected) {
		return "", nil
	}

	ranked, err := m.store.Ranked()
	if err != nil {
		return "", err
	}

	// Look for any upgraded peers that we can evict.
	for from, to := range m.upgrading {
		if m.connected[to] {
			delete(m.upgrading, from)
			// We may have connected to even lower-scored peers that we can
			// evict since we started upgrading this one, in which case we can
			// evict one of those.
			fromPeer, err := m.store.Get(from)
			if err != nil {
				return "", err
			} else if fromPeer == nil {
				continue
			} else if evictPeer := m.findUpgradeCandidate(fromPeer, ranked); evictPeer != "" {
				m.evicting[evictPeer] = true
				return evictPeer, nil
			} else {
				m.evicting[from] = true
				return from, nil
			}
		}
	}

	// If we didn't find any upgraded peers to evict, we evict the
	// lowest-ranked connected peer instead.
	for i := len(ranked) - 1; i >= 0; i-- {
		peer := ranked[i]
		if m.connected[peer.ID] && !m.evicting[peer.ID] {
			m.evicting[peer.ID] = true
			return peer.ID, nil
		}
	}
	return "", nil
}

// findUpgradeCandidate looks for a lower-scored peer that we could evict
// to make room for the given peer. Returns an empty ID if none is found.
// The caller must hold the mutex lock.
func (m *PeerManager) findUpgradeCandidate(peer *peerInfo, ranked []*peerInfo) NodeID {
	// Check for any existing upgrade claims to this peer. It is important that
	// we return this, since we can get an inbound connection from a peer that
	// we're concurrently trying to dial for an upgrade, and we want the inbound
	// connection to be accepted in this case.
	for from, to := range m.upgrading {
		if to == peer.ID {
			return from
		}
	}
	for i := len(ranked) - 1; i >= 0; i-- {
		candidate := ranked[i]
		switch {
		case candidate.Score() >= peer.Score():
			return "" // no further peers can be scored lower, due to sorting
		case !m.connected[candidate.ID]:
		case m.evicting[candidate.ID]:
		case m.upgrading[candidate.ID] != "":
		default:
			return candidate.ID
		}
	}
	return ""
}
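
// Worked example (illustrative): with MaxConnected=2, MaxConnectedUpgrade=1,
// and connected peers A and B (both score 0), an inbound persistent peer P
// (score PeerScorePersistent > 0) exceeds MaxConnected but fits in the
// upgrade slot. findUpgradeCandidate(P, ranked) then returns A or B, which is
// marked as upgrading and later evicted via EvictNext().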

// GetHeight returns a peer's height, as reported via SetHeight. If the peer
// or height is unknown, this returns 0.
//
// FIXME: This is a temporary workaround for the peer state stored via the
// legacy Peer.Set() and Peer.Get() APIs, used to share height state between the
// consensus and mempool reactors. These dependencies should be removed from the
// reactors, and instead query this information independently via new P2P
// protocol additions.
func (m *PeerManager) GetHeight(peerID NodeID) (int64, error) {
	m.mtx.Lock()
	defer m.mtx.Unlock()

	peer, err := m.store.Get(peerID)
	if err != nil || peer == nil {
		return 0, err
	}
	return peer.Height, nil
}

// SetHeight stores a peer's height, making it available via GetHeight. If the
// peer is unknown, it is created.
//
// FIXME: This is a temporary workaround for the peer state stored via the
// legacy Peer.Set() and Peer.Get() APIs, used to share height state between the
// consensus and mempool reactors. These dependencies should be removed from the
// reactors, and instead query this information independently via new P2P
// protocol additions.
func (m *PeerManager) SetHeight(peerID NodeID, height int64) error {
	m.mtx.Lock()
	defer m.mtx.Unlock()

	peer, err := m.store.Get(peerID)
	if err != nil {
		return err
	}
	if peer == nil {
		peer = &peerInfo{
			ID:         peerID,
			Persistent: m.options.isPersistent(peerID),
		}
	}
	peer.Height = height
	return m.store.Set(peer)
}

// peerStore stores information about peers. It is currently a bare-bones
// in-memory store, and will be fleshed out later.
//
// peerStore is not thread-safe, since it assumes it is only used by PeerManager
// which handles concurrency control. This allows the manager to execute multiple
// operations atomically while it holds the mutex.
type peerStore struct {
	peers map[NodeID]peerInfo
}

// newPeerStore creates a new peer store.
func newPeerStore() *peerStore {
	return &peerStore{
		peers: map[NodeID]peerInfo{},
	}
}

// Get fetches a peer, returning nil if not found.
func (s *peerStore) Get(id NodeID) (*peerInfo, error) {
	peer, ok := s.peers[id]
	if !ok {
		return nil, nil
	}
	return &peer, nil
}

// Set stores peer data.
func (s *peerStore) Set(peer *peerInfo) error {
	if peer == nil {
		return errors.New("peer cannot be nil")
	}
	s.peers[peer.ID] = *peer
	return nil
}

// List retrieves all peers.
func (s *peerStore) List() ([]*peerInfo, error) {
	peers := []*peerInfo{}
	for _, peer := range s.peers {
		peer := peer
		peers = append(peers, &peer)
	}
	return peers, nil
}

// Ranked returns a list of peers ordered by score (better peers first).
// Peers with equal scores are returned in an arbitrary order.
//
// This is used to determine which peers to connect to and which peers to evict
// in order to make room for better peers.
//
// FIXME: For now, we simply generate the list on every call, but this can get
// expensive since it's called fairly frequently. We may want to either cache
// this, or store peers in a data structure that maintains order (e.g. a heap or
// ordered map).
func (s *peerStore) Ranked() ([]*peerInfo, error) {
	peers, err := s.List()
	if err != nil {
		return nil, err
	}
	sort.Slice(peers, func(i, j int) bool {
		// FIXME: If necessary, consider precomputing scores before sorting,
		// to reduce the number of Score() calls.
		return peers[i].Score() > peers[j].Score()
	})
	return peers, nil
}

// peerInfo contains peer information stored in a peerStore.
type peerInfo struct {
	ID            NodeID
	AddressInfo   []*addressInfo
	Persistent    bool
	Height        int64
	LastConnected time.Time
}

// AddAddress adds an address to a peer, unless it already exists. It does not
// validate the address. Returns true if the address was new.
func (p *peerInfo) AddAddress(address PeerAddress) bool {
	if p.LookupAddressInfo(address) != nil {
		return false
	}
	p.AddressInfo = append(p.AddressInfo, &addressInfo{Address: address})
	return true
}

// LookupAddressInfo returns address info for an address, or nil if unknown.
func (p *peerInfo) LookupAddressInfo(address PeerAddress) *addressInfo {
	// We just do a linear search for now.
	addressString := address.String()
	for _, info := range p.AddressInfo {
		if info.Address.String() == addressString {
			return info
		}
	}
	return nil
}

// Score calculates a score for the peer. Higher-scored peers will be
// preferred over lower scores.
func (p *peerInfo) Score() PeerScore {
	var score PeerScore
	if p.Persistent {
		score += PeerScorePersistent
	}
	return score
}

// addressInfo contains information and statistics about an address.
type addressInfo struct {
	Address         PeerAddress
	LastDialSuccess time.Time
	LastDialFailure time.Time
	DialFailures    uint32 // since last successful dial
}

// ============================================================================
// Types and business logic below may be deprecated.
//
// TODO: Rename once legacy p2p types are removed.
// ref: https://github.com/tendermint/tendermint/issues/5670
// ============================================================================

//go:generate mockery --case underscore --name Peer

const metricsTickerDuration = 10 * time.Second

// Peer is an interface representing a peer connected on a reactor.
type Peer interface {
	service.Service
	FlushStop()

	ID() NodeID           // peer's cryptographic ID
	RemoteIP() net.IP     // remote IP of the connection
	RemoteAddr() net.Addr // remote address of the connection

	IsOutbound() bool   // did we dial the peer
	IsPersistent() bool // do we redial this peer when we disconnect

	CloseConn() error // close original connection

	NodeInfo() NodeInfo // peer's info
	Status() tmconn.ConnectionStatus
	SocketAddr() *NetAddress // actual address of the socket

	Send(byte, []byte) bool
	TrySend(byte, []byte) bool

	Set(string, interface{})
	Get(string) interface{}
}

//----------------------------------------------------------

// peerConn contains the raw connection and its config.
type peerConn struct {
	outbound   bool
	persistent bool
	conn       Connection
	ip         net.IP // cached RemoteIP()
}

func newPeerConn(outbound, persistent bool, conn Connection) peerConn {
	return peerConn{
		outbound:   outbound,
		persistent: persistent,
		conn:       conn,
	}
}

// ID only exists for SecretConnection.
func (pc peerConn) ID() NodeID {
	return NodeIDFromPubKey(pc.conn.PubKey())
}

// RemoteIP returns the IP from the connection's remote endpoint.
func (pc peerConn) RemoteIP() net.IP {
	if pc.ip == nil {
		pc.ip = pc.conn.RemoteEndpoint().IP
	}
	return pc.ip
}

// peer implements Peer.
//
// Before using a peer, you will need to perform a handshake on connection.
type peer struct {
	service.BaseService

	// raw peerConn and the multiplex connection
	peerConn

	// peer's node info and the channel it knows about
	// channels = nodeInfo.Channels
	// cached to avoid copying nodeInfo in hasChannel
	nodeInfo    NodeInfo
	channels    []byte
	reactors    map[byte]Reactor
	onPeerError func(Peer, interface{})

	// User data
	Data *cmap.CMap

	metrics       *Metrics
	metricsTicker *time.Ticker
}

type PeerOption func(*peer)

func newPeer(
	pc peerConn,
	reactorsByCh map[byte]Reactor,
	onPeerError func(Peer, interface{}),
	options ...PeerOption,
) *peer {
	nodeInfo := pc.conn.NodeInfo()
	p := &peer{
		peerConn:      pc,
		nodeInfo:      nodeInfo,
		channels:      nodeInfo.Channels, // TODO
		reactors:      reactorsByCh,
		onPeerError:   onPeerError,
		Data:          cmap.NewCMap(),
		metricsTicker: time.NewTicker(metricsTickerDuration),
		metrics:       NopMetrics(),
	}
	p.BaseService = *service.NewBaseService(nil, "Peer", p)
	for _, option := range options {
		option(p)
	}
	return p
}

// onError calls the peer error callback.
func (p *peer) onError(err interface{}) {
	p.onPeerError(p, err)
}

// String representation.
func (p *peer) String() string {
	if p.outbound {
		return fmt.Sprintf("Peer{%v %v out}", p.conn, p.ID())
	}
	return fmt.Sprintf("Peer{%v %v in}", p.conn, p.ID())
}

//---------------------------------------------------
// Implements service.Service

// SetLogger implements BaseService.
func (p *peer) SetLogger(l log.Logger) {
	p.Logger = l
}

// OnStart implements BaseService.
func (p *peer) OnStart() error {
	if err := p.BaseService.OnStart(); err != nil {
		return err
	}
	go p.processMessages()
	go p.metricsReporter()
	return nil
}

// processMessages processes messages received from the connection.
func (p *peer) processMessages() {
	defer func() {
		if r := recover(); r != nil {
			p.Logger.Error("peer message processing panic", "err", r, "stack", string(debug.Stack()))
			p.onError(fmt.Errorf("panic during peer message processing: %v", r))
		}
	}()

	for {
		chID, msg, err := p.conn.ReceiveMessage()
		if err != nil {
			p.onError(err)
			return
		}
		reactor, ok := p.reactors[chID]
		if !ok {
			p.onError(fmt.Errorf("unknown channel %v", chID))
			return
		}
		reactor.Receive(chID, p, msg)
	}
}

// FlushStop mimics OnStop but additionally ensures that all successful
// .Send() calls will get flushed before closing the connection.
// NOTE: it is not safe to call this method more than once.
func (p *peer) FlushStop() {
	p.metricsTicker.Stop()
	p.BaseService.OnStop()
	if err := p.conn.FlushClose(); err != nil {
		p.Logger.Debug("error while stopping peer", "err", err)
	}
}

// OnStop implements BaseService.
func (p *peer) OnStop() {
	p.metricsTicker.Stop()
	p.BaseService.OnStop()
	if err := p.conn.Close(); err != nil {
		p.Logger.Debug("error while stopping peer", "err", err)
	}
}

//---------------------------------------------------
// Implements Peer

// ID returns the peer's ID - the hex encoded hash of its pubkey.
func (p *peer) ID() NodeID {
	return p.nodeInfo.ID()
}

// IsOutbound returns true if the connection is outbound, false otherwise.
func (p *peer) IsOutbound() bool {
	return p.peerConn.outbound
}

// IsPersistent returns true if the peer is persistent, false otherwise.
func (p *peer) IsPersistent() bool {
	return p.peerConn.persistent
}

// NodeInfo returns a copy of the peer's NodeInfo.
func (p *peer) NodeInfo() NodeInfo {
	return p.nodeInfo
}

// SocketAddr returns the address of the socket.
// For outbound peers, it's the address dialed (after DNS resolution).
// For inbound peers, it's the address returned by the underlying connection
// (not what's reported in the peer's NodeInfo).
func (p *peer) SocketAddr() *NetAddress {
	return p.peerConn.conn.RemoteEndpoint().NetAddress()
}

// Status returns the peer's ConnectionStatus.
func (p *peer) Status() tmconn.ConnectionStatus {
	return p.conn.Status()
}

// Send msg bytes to the channel identified by chID byte. Returns false if the
// send queue is full after timeout, specified by MConnection.
func (p *peer) Send(chID byte, msgBytes []byte) bool {
	if !p.IsRunning() {
		// see Switch#Broadcast, where we fetch the list of peers and loop over
		// them - while we're looping, one peer may be removed and stopped.
		return false
	} else if !p.hasChannel(chID) {
		return false
	}
	res, err := p.conn.SendMessage(chID, msgBytes)
	if err == io.EOF {
		return false
	} else if err != nil {
		p.onError(err)
		return false
	}
	if res {
		labels := []string{
			"peer_id", string(p.ID()),
			"chID", fmt.Sprintf("%#x", chID),
		}
		p.metrics.PeerSendBytesTotal.With(labels...).Add(float64(len(msgBytes)))
	}
	return res
}

// TrySend msg bytes to the channel identified by chID byte. Immediately returns
// false if the send queue is full.
func (p *peer) TrySend(chID byte, msgBytes []byte) bool {
	if !p.IsRunning() {
		return false
	} else if !p.hasChannel(chID) {
		return false
	}
	res, err := p.conn.TrySendMessage(chID, msgBytes)
	if err == io.EOF {
		return false
	} else if err != nil {
		p.onError(err)
		return false
	}
	if res {
		labels := []string{
			"peer_id", string(p.ID()),
			"chID", fmt.Sprintf("%#x", chID),
		}
		p.metrics.PeerSendBytesTotal.With(labels...).Add(float64(len(msgBytes)))
	}
	return res
}
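
// Illustrative sketch (not part of the original source): broadcasting a
// message to a set of peers, in the style of Switch#Broadcast. The channel ID
// 0x20 and msgBytes are hypothetical.
//
//	for _, peer := range peers {
//		if ok := peer.Send(0x20, msgBytes); !ok {
//			// Queue full, unknown channel, or peer stopped; message dropped.
//		}
//	}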

// Get the data for a given key.
func (p *peer) Get(key string) interface{} {
	return p.Data.Get(key)
}

// Set sets the data for the given key.
func (p *peer) Set(key string, data interface{}) {
	p.Data.Set(key, data)
}

// hasChannel returns true if the peer reported
// knowing about the given chID.
func (p *peer) hasChannel(chID byte) bool {
	for _, ch := range p.channels {
		if ch == chID {
			return true
		}
	}
	// NOTE: probably will want to remove this
	// but could be helpful while the feature is new
	p.Logger.Debug("Unknown channel for peer", "channel", chID, "channels", p.channels)
	return false
}

// CloseConn closes original connection. Used for cleaning up in cases where the peer had not been started at all.
func (p *peer) CloseConn() error {
	return p.peerConn.conn.Close()
}

//---------------------------------------------------
// methods only used for testing
// TODO: can we remove these?

// CloseConn closes the underlying connection
func (pc *peerConn) CloseConn() {
	pc.conn.Close()
}

// RemoteAddr returns peer's remote network address.
func (p *peer) RemoteAddr() net.Addr {
	endpoint := p.conn.RemoteEndpoint()
	return &net.TCPAddr{
		IP:   endpoint.IP,
		Port: int(endpoint.Port),
	}
}

//---------------------------------------------------

// PeerMetrics sets the peer's metrics collector as a PeerOption.
func PeerMetrics(metrics *Metrics) PeerOption {
	return func(p *peer) {
		p.metrics = metrics
	}
}

// metricsReporter periodically reports the peer's aggregate send-queue size
// until the peer is stopped.
func (p *peer) metricsReporter() {
	for {
		select {
		case <-p.metricsTicker.C:
			status := p.conn.Status()
			var sendQueueSize float64
			for _, chStatus := range status.Channels {
				sendQueueSize += float64(chStatus.SendQueueSize)
			}
			p.metrics.PeerPendingSendBytes.With("peer_id", string(p.ID())).Set(sendQueueSize)
		case <-p.Quit():
			return
		}
	}
}