fix non deterministic test failures and race in privval socket (#3258)

* node: decrease retry conn timeout in test

  Should fix #3256

  The retry timeout was set to the default, which is the same as the accept timeout, so it's no wonder this would fail. Here we decrease the retry timeout so we can try many times before the accept timeout.

* p2p: increase handshake timeout in test

  This fails sometimes, presumably because the handshake timeout is so low (only 50ms). So increase it to 1s. Should fix #3187

* privval: fix race with ping. closes #3237

  Pings happen in a go-routine and can happen concurrently with other messages. Since we use a request/response protocol, we expect to send a request and get back the corresponding response. But with pings happening concurrently, this assumption could be violated. We were using a mutex, but only a RWMutex, where the RLock was being held for sending messages - this was to allow the underlying connection to be replaced if it fails. Turns out we actually need to use a full lock (not just a read lock) to prevent multiple requests from happening concurrently.

* node: fix test name. DelayedStop -> DelayedStart

* autofile: Wait() method

  In the TestWALTruncate in consensus/wal_test.go we remove the WAL directory at the end of the test. However the wal.Stop() does not properly wait for the autofile group to finish shutting down. Hence it was possible that the group's go-routine is still running when the cleanup happens, which causes a panic since the directory disappeared. Here we add a Wait() method to properly wait until the go-routine exits so we can safely clean up. This fixes #2852.
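The privval note above says a read lock is not enough when every request expects its own response. A minimal sketch of that idea follows, assuming a toy in-memory connection; `echoConn`, `client`, `roundTrip`, and `mismatches` are hypothetical names for illustration and are not taken from the privval package.

```go
package main

import (
	"fmt"
	"sync"
)

// echoConn is a hypothetical stand-in for a request/response socket:
// Write queues a request, Read answers the oldest queued request.
type echoConn struct {
	mu      sync.Mutex
	pending []string
}

func (c *echoConn) Write(req string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.pending = append(c.pending, req)
}

func (c *echoConn) Read() string {
	c.mu.Lock()
	defer c.mu.Unlock()
	resp := "resp:" + c.pending[0]
	c.pending = c.pending[1:]
	return resp
}

// client serializes whole round trips with an exclusive lock.
type client struct {
	mtx  sync.Mutex
	conn *echoConn
}

// roundTrip holds the lock across both the write and the read, so no other
// request (for example a ping) can slip in between them.
func (c *client) roundTrip(req string) string {
	c.mtx.Lock()
	defer c.mtx.Unlock()
	c.conn.Write(req)
	return c.conn.Read()
}

// mismatches runs n concurrent round trips and counts the responses that
// do not belong to the request that was sent.
func mismatches(n int) int {
	c := &client{conn: &echoConn{}}
	var (
		wg    sync.WaitGroup
		badMu sync.Mutex
		bad   int
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			req := fmt.Sprintf("req-%d", i)
			if c.roundTrip(req) != "resp:"+req {
				badMu.Lock()
				bad++
				badMu.Unlock()
			}
		}(i)
	}
	wg.Wait()
	return bad
}

func main() {
	fmt.Println("mismatched responses:", mismatches(100))
}
```

If `roundTrip` took only a shared read lock, two goroutines could both write before either reads, and each could then receive the other's response. That is the kind of violation the commit message describes.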
6 years ago
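The autofile item in the commit message above describes adding a `Wait()` method so callers can block until a background go-routine has really exited before cleaning up. A minimal sketch of that pattern follows; the `group` type and its fields are hypothetical and do not reproduce the autofile implementation.

```go
package main

import (
	"fmt"
	"time"
)

// group is a hypothetical background worker with separate Stop and Wait.
type group struct {
	quit   chan struct{} // closed by Stop to request shutdown
	done   chan struct{} // closed by run when the go-routine exits
	exited bool          // written by run before done is closed
}

func newGroup() *group {
	g := &group{quit: make(chan struct{}), done: make(chan struct{})}
	go g.run()
	return g
}

func (g *group) run() {
	defer close(g.done)
	ticker := time.NewTicker(time.Millisecond)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			// Periodic work (for example, touching files) would go here.
		case <-g.quit:
			g.exited = true
			return
		}
	}
}

// Stop only requests shutdown; it does not block.
func (g *group) Stop() { close(g.quit) }

// Wait blocks until the go-routine has returned.
func (g *group) Wait() { <-g.done }

// stopAndWait reports whether the worker had exited by the time Wait returned.
func stopAndWait() bool {
	g := newGroup()
	g.Stop()
	g.Wait()
	// Safe to read: the write to exited happens before close(done).
	return g.exited
}

func main() {
	fmt.Println("worker exited before cleanup:", stopAndWait())
}
```

Only after `Wait` returns is it safe to remove resources the go-routine might still touch, such as a test directory.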
p2p: file descriptor leaks (#3150)

* close peer's connection to avoid fd leak

  Fixes #2967

* rename peer#Addr to RemoteAddr
* fix test
* fixes after Ethan's review
* bring back the check
* changelog entry
* write a test for switch#acceptRoutine
* increase timeouts? :(
* remove extra assertNPeersWithTimeout
* simplify test
* assert number of peers (just to be safe)
* Cleanup in OnStop
* run tests with verbose flag on CircleCI
* spawn a reading routine to prevent connection from closing
* get port from the listener

  random port is faster, but often results in

```
panic: listen tcp 127.0.0.1:44068: bind: address already in use [recovered]
	panic: listen tcp 127.0.0.1:44068: bind: address already in use

goroutine 79 [running]:
testing.tRunner.func1(0xc0001bd600)
	/usr/local/go/src/testing/testing.go:792 +0x387
panic(0x974d20, 0xc0001b0500)
	/usr/local/go/src/runtime/panic.go:513 +0x1b9
github.com/tendermint/tendermint/p2p.MakeSwitch(0xc0000f42a0, 0x0, 0x9fb9cc, 0x9, 0x9fc346, 0xb, 0xb42128, 0x0, 0x0, 0x0, ...)
	/home/vagrant/go/src/github.com/tendermint/tendermint/p2p/test_util.go:182 +0xa28
github.com/tendermint/tendermint/p2p.MakeConnectedSwitches(0xc0000f42a0, 0x2, 0xb42128, 0xb41eb8, 0x4f1205, 0xc0001bed80, 0x4f16ed)
	/home/vagrant/go/src/github.com/tendermint/tendermint/p2p/test_util.go:75 +0xf9
github.com/tendermint/tendermint/p2p.MakeSwitchPair(0xbb8d20, 0xc0001bd600, 0xb42128, 0x2f7, 0x4f16c0)
	/home/vagrant/go/src/github.com/tendermint/tendermint/p2p/switch_test.go:94 +0x4c
github.com/tendermint/tendermint/p2p.TestSwitches(0xc0001bd600)
	/home/vagrant/go/src/github.com/tendermint/tendermint/p2p/switch_test.go:117 +0x58
testing.tRunner(0xc0001bd600, 0xb42038)
	/usr/local/go/src/testing/testing.go:827 +0xbf
created by testing.(*T).Run
	/usr/local/go/src/testing/testing.go:878 +0x353
exit status 2
FAIL	github.com/tendermint/tendermint/p2p	0.350s
```
6 years ago
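The "get port from the listener" item above contrasts picking a random port, which can collide with a port already in use, with asking the listener which port it got. One common way to do that with the standard library is to bind to port 0 and read the assigned port back. This is a sketch under that assumption; `listenOnFreePort` is a hypothetical helper, and nothing here is meant to describe how `cmn.GetFreePort` is implemented.

```go
package main

import (
	"fmt"
	"net"
)

// listenOnFreePort binds to port 0 on loopback, which lets the operating
// system choose an unused port, and returns the listener with that port.
func listenOnFreePort() (net.Listener, int, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return nil, 0, err
	}
	// For a TCP listener the address is a *net.TCPAddr carrying the port.
	port := ln.Addr().(*net.TCPAddr).Port
	return ln, port, nil
}

func main() {
	ln, port, err := listenOnFreePort()
	if err != nil {
		panic(err)
	}
	defer ln.Close()
	fmt.Println("listening on port", port)
}
```

Keeping the listener open, rather than closing it and reusing the port number later, avoids the window in which another process could take the port.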
package p2p

import (
	"fmt"
	"net"
	"time"

	"github.com/pkg/errors"

	"github.com/tendermint/tendermint/crypto"
	"github.com/tendermint/tendermint/crypto/ed25519"
	cmn "github.com/tendermint/tendermint/libs/common"
	"github.com/tendermint/tendermint/libs/log"

	"github.com/tendermint/tendermint/config"
	"github.com/tendermint/tendermint/p2p/conn"
)

const testCh = 0x01

//------------------------------------------------

// mockNodeInfo is a stub node info backed by a single address.
// Validate and CompatibleWith always succeed.
type mockNodeInfo struct {
	addr *NetAddress
}

func (ni mockNodeInfo) ID() ID                              { return ni.addr.ID }
func (ni mockNodeInfo) NetAddress() (*NetAddress, error)    { return ni.addr, nil }
func (ni mockNodeInfo) Validate() error                     { return nil }
func (ni mockNodeInfo) CompatibleWith(other NodeInfo) error { return nil }

// AddPeerToSwitchPeerSet adds peer directly to the switch's peer set.
func AddPeerToSwitchPeerSet(sw *Switch, peer Peer) {
	sw.peers.Add(peer)
}

// CreateRandomPeer returns a peer with a random routable address and a
// mock node info.
func CreateRandomPeer(outbound bool) *peer {
	addr, netAddr := CreateRoutableAddr()
	p := &peer{
		peerConn: peerConn{
			outbound:   outbound,
			socketAddr: netAddr,
		},
		nodeInfo: mockNodeInfo{netAddr},
		mconn:    &conn.MConnection{},
		metrics:  NopMetrics(),
	}
	p.SetLogger(log.TestingLogger().With("peer", addr))
	return p
}

// CreateRoutableAddr generates random ID@IP:26656 addresses until it finds
// a routable one.
// NOTE: panics if a generated address fails to parse.
func CreateRoutableAddr() (addr string, netAddr *NetAddress) {
	for {
		var err error
		addr = fmt.Sprintf("%X@%v.%v.%v.%v:26656",
			cmn.RandBytes(20),
			cmn.RandInt()%256,
			cmn.RandInt()%256,
			cmn.RandInt()%256,
			cmn.RandInt()%256)
		netAddr, err = NewNetAddressString(addr)
		if err != nil {
			panic(err)
		}
		if netAddr.Routable() {
			break
		}
	}
	return
}

//------------------------------------------------------------------
// Connects switches via arbitrary net.Conn. Used for testing.

const TestHost = "localhost"

// MakeConnectedSwitches returns n switches, connected according to the connect func.
// If connect==Connect2Switches, the switches will be fully connected.
// initSwitch defines how the i'th switch should be initialized (i.e., with what reactors).
// NOTE: panics if any switch fails to start.
func MakeConnectedSwitches(cfg *config.P2PConfig,
	n int,
	initSwitch func(int, *Switch) *Switch,
	connect func([]*Switch, int, int),
) []*Switch {
	switches := make([]*Switch, n)
	for i := 0; i < n; i++ {
		switches[i] = MakeSwitch(cfg, i, TestHost, "123.123.123", initSwitch)
	}

	if err := StartSwitches(switches); err != nil {
		panic(err)
	}

	for i := 0; i < n; i++ {
		for j := i + 1; j < n; j++ {
			connect(switches, i, j)
		}
	}

	return switches
}

// Connect2Switches will connect switches i and j via net.Pipe().
// Blocks until a connection is established.
// NOTE: caller ensures i and j are within bounds.
func Connect2Switches(switches []*Switch, i, j int) {
	switchI := switches[i]
	switchJ := switches[j]

	c1, c2 := conn.NetPipe()

	doneCh := make(chan struct{})
	go func() {
		err := switchI.addPeerWithConnection(c1)
		if err != nil {
			panic(err)
		}
		doneCh <- struct{}{}
	}()
	go func() {
		err := switchJ.addPeerWithConnection(c2)
		if err != nil {
			panic(err)
		}
		doneCh <- struct{}{}
	}()
	<-doneCh
	<-doneCh
}

// addPeerWithConnection upgrades conn to an inbound peer connection, performs
// the node info handshake, and adds the resulting peer to the switch.
// The connection is closed if any step fails.
func (sw *Switch) addPeerWithConnection(conn net.Conn) error {
	pc, err := testInboundPeerConn(conn, sw.config, sw.nodeKey.PrivKey)
	if err != nil {
		if err := conn.Close(); err != nil {
			sw.Logger.Error("Error closing connection", "err", err)
		}
		return err
	}

	ni, err := handshake(conn, time.Second, sw.nodeInfo)
	if err != nil {
		if err := conn.Close(); err != nil {
			sw.Logger.Error("Error closing connection", "err", err)
		}
		return err
	}

	p := newPeer(
		pc,
		MConnConfig(sw.config),
		ni,
		sw.reactorsByCh,
		sw.chDescs,
		sw.StopPeerForError,
	)

	if err = sw.addPeer(p); err != nil {
		pc.CloseConn()
		return err
	}

	return nil
}

// StartSwitches calls sw.Start() for each given switch.
// It returns the first encountered error.
func StartSwitches(switches []*Switch) error {
	for _, s := range switches {
		err := s.Start() // start switch and reactors
		if err != nil {
			return err
		}
	}
	return nil
}

// MakeSwitch builds a switch for tests: it generates a node key, listens on
// the address from the test node info, applies initSwitch, and advertises the
// channels of the registered reactors. The network and version arguments are
// not used in the body.
// NOTE: panics if the address is invalid or the transport fails to listen.
func MakeSwitch(
	cfg *config.P2PConfig,
	i int,
	network, version string,
	initSwitch func(int, *Switch) *Switch,
	opts ...SwitchOption,
) *Switch {
	nodeKey := NodeKey{
		PrivKey: ed25519.GenPrivKey(),
	}
	nodeInfo := testNodeInfo(nodeKey.ID(), fmt.Sprintf("node%d", i))
	addr, err := NewNetAddressString(
		IDAddressString(nodeKey.ID(), nodeInfo.(DefaultNodeInfo).ListenAddr),
	)
	if err != nil {
		panic(err)
	}

	t := NewMultiplexTransport(nodeInfo, nodeKey, MConnConfig(cfg))

	if err := t.Listen(*addr); err != nil {
		panic(err)
	}

	// TODO: let the config be passed in?
	sw := initSwitch(i, NewSwitch(cfg, t, opts...))
	sw.SetLogger(log.TestingLogger().With("switch", i))
	sw.SetNodeKey(&nodeKey)

	ni := nodeInfo.(DefaultNodeInfo)
	for ch := range sw.reactorsByCh {
		ni.Channels = append(ni.Channels, ch)
	}
	nodeInfo = ni

	// TODO: We need to setup reactors ahead of time so the NodeInfo is properly
	// populated and we don't have to do those awkward overrides and setters.
	t.nodeInfo = nodeInfo
	sw.SetNodeInfo(nodeInfo)

	return sw
}

func testInboundPeerConn(
	conn net.Conn,
	config *config.P2PConfig,
	ourNodePrivKey crypto.PrivKey,
) (peerConn, error) {
	return testPeerConn(conn, config, false, false, ourNodePrivKey, nil)
}

func testPeerConn(
	rawConn net.Conn,
	cfg *config.P2PConfig,
	outbound, persistent bool,
	ourNodePrivKey crypto.PrivKey,
	socketAddr *NetAddress,
) (pc peerConn, err error) {
	conn := rawConn

	// Fuzz connection
	if cfg.TestFuzz {
		// so we have time to do peer handshakes and get set up
		conn = FuzzConnAfterFromConfig(conn, 10*time.Second, cfg.TestFuzzConfig)
	}

	// Encrypt connection
	conn, err = upgradeSecretConn(conn, cfg.HandshakeTimeout, ourNodePrivKey)
	if err != nil {
		return pc, errors.Wrap(err, "Error creating peer")
	}

	// Only the information we already have
	return newPeerConn(outbound, persistent, conn, socketAddr), nil
}

//----------------------------------------------------------------
// rand node info

func testNodeInfo(id ID, name string) NodeInfo {
	return testNodeInfoWithNetwork(id, name, "testing")
}

func testNodeInfoWithNetwork(id ID, name, network string) NodeInfo {
	return DefaultNodeInfo{
		ProtocolVersion: defaultProtocolVersion,
		ID_:             id,
		ListenAddr:      fmt.Sprintf("127.0.0.1:%d", getFreePort()),
		Network:         network,
		Version:         "1.2.3-rc0-deadbeef",
		Channels:        []byte{testCh},
		Moniker:         name,
		Other: DefaultNodeInfoOther{
			TxIndex:    "on",
			RPCAddress: fmt.Sprintf("127.0.0.1:%d", getFreePort()),
		},
	}
}

func getFreePort() int {
	port, err := cmn.GetFreePort()
	if err != nil {
		panic(err)
	}
	return port
}