* mempool: add a safety check, write tests for mempoolIDs
and document 65536 limit in the mempool reactor spec
follow-up to https://github.com/tendermint/tendermint/pull/2778
* rename the test
* fixes after Ismail's review
Closes#1798
This is done by making every mempool tx maintain a list of peers who its received the tx from. Instead of using the 20byte peer ID, it instead uses a local map from peerID to uint16 counter, so every peer adds 2 bytes. (Word aligned to probably make it 8 bytes)
This also required resetting the callback function on every CheckTx. This likely has performance ramifications for instruction caching. The actual setting operation isn't costly with the removal of defers in this PR.
* Make the mempool not gossip txs back to peers its received it from
* Fix adversarial memleak
* Don't break interface
* Update changelog
* Forgot to add a mtx
* forgot a mutex
* Update mempool/reactor.go
Co-Authored-By: ValarDragon <ValarDragon@users.noreply.github.com>
* Update mempool/mempool.go
Co-Authored-By: ValarDragon <ValarDragon@users.noreply.github.com>
* Use unknown peer ID
Co-Authored-By: ValarDragon <ValarDragon@users.noreply.github.com>
* fix compilation
* use next wait chan logic when skipping
* Minor fixes
* Add TxInfo
* Add reverse map
* Make activeID's auto-reserve 0
* 0 -> UnknownPeerID
Co-Authored-By: ValarDragon <ValarDragon@users.noreply.github.com>
* Switch to making the normal case set a callback on the reqres object
The recheck case is still done via the global callback, and stats
are also set via global callback
* fix merge conflict
* Addres comments
* Add cache tests
* add cache tests
* minor fixes
* update metrics in reqResCb and reformat code
* goimport -w mempool/reactor.go
* mempool: update memTx senders
I had to introduce txsMap for quick mempoolTx lookups.
* change senders type from []uint16 to sync.Map
Fixes DATA RACE:
```
Read at 0x00c0013fcd3a by goroutine 183:
github.com/tendermint/tendermint/mempool.(*MempoolReactor).broadcastTxRoutine()
/go/src/github.com/tendermint/tendermint/mempool/reactor.go:195 +0x3c7
Previous write at 0x00c0013fcd3a by D[2019-02-27|10:10:49.058] Read PacketMsg switch=3 peer=35bc1e3558c182927b31987eeff3feb3d58a0fc5@127.0.0.1
:46552 conn=MConn{pipe} packet="PacketMsg{30:2B06579D0A143EB78F3D3299DE8213A51D4E11FB05ACE4D6A14F T:1}"
goroutine 190:
github.com/tendermint/tendermint/mempool.(*Mempool).CheckTxWithInfo()
/go/src/github.com/tendermint/tendermint/mempool/mempool.go:387 +0xdc1
github.com/tendermint/tendermint/mempool.(*MempoolReactor).Receive()
/go/src/github.com/tendermint/tendermint/mempool/reactor.go:134 +0xb04
github.com/tendermint/tendermint/p2p.createMConnection.func1()
/go/src/github.com/tendermint/tendermint/p2p/peer.go:374 +0x25b
github.com/tendermint/tendermint/p2p/conn.(*MConnection).recvRoutine()
/go/src/github.com/tendermint/tendermint/p2p/conn/connection.go:599 +0xcce
Goroutine 183 (running) created at:
D[2019-02-27|10:10:49.058] Send switch=2 peer=1efafad5443abeea4b7a8155218e4369525d987e@127.0.0.1:46193 channel=48 conn=MConn{pipe} m
sgBytes=2B06579D0A146194480ADAE00C2836ED7125FEE65C1D9DD51049
github.com/tendermint/tendermint/mempool.(*MempoolReactor).AddPeer()
/go/src/github.com/tendermint/tendermint/mempool/reactor.go:105 +0x1b1
github.com/tendermint/tendermint/p2p.(*Switch).startInitPeer()
/go/src/github.com/tendermint/tendermint/p2p/switch.go:683 +0x13b
github.com/tendermint/tendermint/p2p.(*Switch).addPeer()
/go/src/github.com/tendermint/tendermint/p2p/switch.go:650 +0x585
github.com/tendermint/tendermint/p2p.(*Switch).addPeerWithConnection()
/go/src/github.com/tendermint/tendermint/p2p/test_util.go:145 +0x939
github.com/tendermint/tendermint/p2p.Connect2Switches.func2()
/go/src/github.com/tendermint/tendermint/p2p/test_util.go:109 +0x50
I[2019-02-27|10:10:49.058] Added good transaction validator=0 tx=43B4D1F0F03460BD262835C4AA560DB860CFBBE85BD02386D83DAC38C67B3AD7 res="&{CheckTx:gas_w
anted:1 }" height=0 total=375
Goroutine 190 (running) created at:
github.com/tendermint/tendermint/p2p/conn.(*MConnection).OnStart()
/go/src/github.com/tendermint/tendermint/p2p/conn/connection.go:210 +0x313
github.com/tendermint/tendermint/libs/common.(*BaseService).Start()
/go/src/github.com/tendermint/tendermint/libs/common/service.go:139 +0x4df
github.com/tendermint/tendermint/p2p.(*peer).OnStart()
/go/src/github.com/tendermint/tendermint/p2p/peer.go:179 +0x56
github.com/tendermint/tendermint/libs/common.(*BaseService).Start()
/go/src/github.com/tendermint/tendermint/libs/common/service.go:139 +0x4df
github.com/tendermint/tendermint/p2p.(*peer).Start()
<autogenerated>:1 +0x43
github.com/tendermint/tendermint/p2p.(*Switch).startInitPeer()
```
* explain the choice of a map DS for senders
* extract ids pool/mapper to a separate struct
* fix literal copies lock value from senders: sync.Map contains sync.Mutex
* use sync.Map#LoadOrStore instead of Load
* fixes after Ismail's review
* rename resCbNormal to resCbFirstTime
* improve ResetTestRootWithChainID() concurrency safety
Rely on ioutil.TempDir() to create test root directories and ensure
multiple same-chain id test cases can run in parallel.
* Update config/toml.go
Co-Authored-By: alessio <quadrispro@ubuntu.com>
* clean up test directories after completion
Closes: #1034
* Remove redundant EnsureDir call
* s/PanicSafety()/panic()/s
* Put create dir functionality back in ResetTestRootWithChainID
* Place test directories in OS's tempdir
In modern UNIX and UNIX-like systems /tmp is very often
mounted as tmpfs. This might speed test execution a bit.
* Set 0700 to a const
* rootsDirs -> configRootDirs
* Don't double remove directories
* Avoid global variables
* Fix consensus tests
* Reduce defer stack
* Address review comments
* Try to fix tests
* Update CHANGELOG_PENDING.md
Co-Authored-By: alessio <quadrispro@ubuntu.com>
* Update consensus/common_test.go
Co-Authored-By: alessio <quadrispro@ubuntu.com>
* Update consensus/common_test.go
Co-Authored-By: alessio <quadrispro@ubuntu.com>
* use READ lock/unlock in ConsensusState#GetLastHeight
Refs #2721
* do not use defers when there's no need
* fix peer formatting (output its address instead of the pointer)
```
[54310]: E[11-02|11:59:39.851] Connection failed @ sendRoutine module=p2p peer=0xb78f00 conn=MConn{74.207.236.148:26656} err="pong timeout"
```
https://github.com/tendermint/tendermint/issues/2721#issuecomment-435326581
* panic if peer has no state
https://github.com/tendermint/tendermint/issues/2721#issuecomment-435347165
It's confusing that sometimes we check if peer has a state, but most of
the times we expect it to be there
1. add79700b5/mempool/reactor.go (L138)
2. add79700b5/rpc/core/consensus.go (L196) (edited)
I will change everything to always assume peer has a state and panic
otherwise
that should help identify issues earlier
* abci/localclient: extend lock on app callback
App callback should be protected by lock as well (note this was already
done for InitChainAsync, why not for others???). Otherwise, when we
execute the block, tx might come in and call the callback in the same
time we're updating it in execBlockOnProxyApp => DATA RACE
Fixes#2721
Consensus state is locked
```
goroutine 113333 [semacquire, 309 minutes]:
sync.runtime_SemacquireMutex(0xc00180009c, 0xc0000c7e00)
/usr/local/go/src/runtime/sema.go:71 +0x3d
sync.(*RWMutex).RLock(0xc001800090)
/usr/local/go/src/sync/rwmutex.go:50 +0x4e
github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusState).GetRoundState(0xc001800000, 0x0)
/root/go/src/github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus/state.go:218 +0x46
github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusReactor).queryMaj23Routine(0xc0017def80, 0x11104a0, 0xc0072488f0, 0xc007248
9c0)
/root/go/src/github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus/reactor.go:735 +0x16d
created by github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusReactor).AddPeer
/root/go/src/github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus/reactor.go:172 +0x236
```
because localClient is locked
```
goroutine 1899 [semacquire, 309 minutes]:
sync.runtime_SemacquireMutex(0xc00003363c, 0xc0000cb500)
/usr/local/go/src/runtime/sema.go:71 +0x3d
sync.(*Mutex).Lock(0xc000033638)
/usr/local/go/src/sync/mutex.go:134 +0xff
github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/abci/client.(*localClient).SetResponseCallback(0xc0001fb560, 0xc007868540)
/root/go/src/github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/abci/client/local_client.go:32 +0x33
github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/proxy.(*appConnConsensus).SetResponseCallback(0xc00002f750, 0xc007868540)
/root/go/src/github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/proxy/app_conn.go:57 +0x40
github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/state.execBlockOnProxyApp(0x1104e20, 0xc002ca0ba0, 0x11092a0, 0xc00002f750, 0xc0001fe960, 0xc000bfc660, 0x110cfe0, 0xc000090330, 0xc9d12, 0xc000d9d5a0, ...)
/root/go/src/github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/state/execution.go:230 +0x1fd
github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/state.(*BlockExecutor).ApplyBlock(0xc002c2a230, 0x7, 0x0, 0xc000eae880, 0x6, 0xc002e52c60, 0x16, 0x1f927, 0xc9d12, 0xc000d9d5a0, ...)
/root/go/src/github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/state/execution.go:96 +0x142
github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusState).finalizeCommit(0xc001800000, 0x1f928)
/root/go/src/github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus/state.go:1339 +0xa3e
github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusState).tryFinalizeCommit(0xc001800000, 0x1f928)
/root/go/src/github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus/state.go:1270 +0x451
github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusState).enterCommit.func1(0xc001800000, 0x0, 0x1f928)
/root/go/src/github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus/state.go:1218 +0x90
github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusState).enterCommit(0xc001800000, 0x1f928, 0x0)
/root/go/src/github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus/state.go:1247 +0x6b8
github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusState).addVote(0xc001800000, 0xc003d8dea0, 0xc000cf4cc0, 0x28, 0xf1, 0xc003bc7ad0, 0xc003bc7b10)
/root/go/src/github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus/state.go:1659 +0xbad
github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusState).tryAddVote(0xc001800000, 0xc003d8dea0, 0xc000cf4cc0, 0x28, 0xf1, 0xf1, 0xf1)
/root/go/src/github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus/state.go:1517 +0x59
github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusState).handleMsg(0xc001800000, 0xd98200, 0xc0070dbed0, 0xc000cf4cc0, 0x28)
/root/go/src/github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus/state.go:660 +0x64b
github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusState).receiveRoutine(0xc001800000, 0x0)
/root/go/src/github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus/state.go:617 +0x670
created by github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusState).OnStart
/root/go/src/github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus/state.go:311 +0x132
```
tx comes in and CheckTx is executed right when we execute the block
```
goroutine 111044 [semacquire, 309 minutes]:
sync.runtime_SemacquireMutex(0xc00003363c, 0x0)
/usr/local/go/src/runtime/sema.go:71 +0x3d
sync.(*Mutex).Lock(0xc000033638)
/usr/local/go/src/sync/mutex.go:134 +0xff
github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/abci/client.(*localClient).CheckTxAsync(0xc0001fb0e0, 0xc002d94500, 0x13f, 0x280, 0x0)
/root/go/src/github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/abci/client/local_client.go:85 +0x47
github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/proxy.(*appConnMempool).CheckTxAsync(0xc00002f720, 0xc002d94500, 0x13f, 0x280, 0x1)
/root/go/src/github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/proxy/app_conn.go:114 +0x51
github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/mempool.(*Mempool).CheckTx(0xc002d3a320, 0xc002d94500, 0x13f, 0x280, 0xc0072355f0, 0x0, 0x0)
/root/go/src/github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/mempool/mempool.go:316 +0x17b
github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/rpc/core.BroadcastTxSync(0xc002d94500, 0x13f, 0x280, 0x0, 0x0, 0x0)
/root/go/src/github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/rpc/core/mempool.go:93 +0xb8
reflect.Value.call(0xd85560, 0x10326c0, 0x13, 0xec7b8b, 0x4, 0xc00663f180, 0x1, 0x1, 0xc00663f180, 0xc00663f188, ...)
/usr/local/go/src/reflect/value.go:447 +0x449
reflect.Value.Call(0xd85560, 0x10326c0, 0x13, 0xc00663f180, 0x1, 0x1, 0x0, 0x0, 0xc005cc9344)
/usr/local/go/src/reflect/value.go:308 +0xa4
github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/rpc/lib/server.makeHTTPHandler.func2(0x1102060, 0xc00663f100, 0xc0082d7900)
/root/go/src/github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/rpc/lib/server/handlers.go:269 +0x188
net/http.HandlerFunc.ServeHTTP(0xc002c81f20, 0x1102060, 0xc00663f100, 0xc0082d7900)
/usr/local/go/src/net/http/server.go:1964 +0x44
net/http.(*ServeMux).ServeHTTP(0xc002c81b60, 0x1102060, 0xc00663f100, 0xc0082d7900)
/usr/local/go/src/net/http/server.go:2361 +0x127
github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/rpc/lib/server.maxBytesHandler.ServeHTTP(0x10f8a40, 0xc002c81b60, 0xf4240, 0x1102060, 0xc00663f100, 0xc0082d7900)
/root/go/src/github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/rpc/lib/server/http_server.go:219 +0xcf
github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/rpc/lib/server.RecoverAndLogHandler.func1(0x1103220, 0xc00121e620, 0xc0082d7900)
/root/go/src/github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/rpc/lib/server/http_server.go:192 +0x394
net/http.HandlerFunc.ServeHTTP(0xc002c06ea0, 0x1103220, 0xc00121e620, 0xc0082d7900)
/usr/local/go/src/net/http/server.go:1964 +0x44
net/http.serverHandler.ServeHTTP(0xc001a1aa90, 0x1103220, 0xc00121e620, 0xc0082d7900)
/usr/local/go/src/net/http/server.go:2741 +0xab
net/http.(*conn).serve(0xc00785a3c0, 0x11041a0, 0xc000f844c0)
/usr/local/go/src/net/http/server.go:1847 +0x646
created by net/http.(*Server).Serve
/usr/local/go/src/net/http/server.go:2851 +0x2f5
```
* consensus: use read lock in Receive#VoteMessage
* use defer to unlock mutex because application might panic
* use defer in every method of the localClient
* add a changelog entry
* drain channels before Unsubscribe(All)
Read 55362ed766/libs/pubsub/pubsub.go (L13)
for the detailed explanation of the issue.
We'll need to fix it someday. Make sure to keep an eye on
https://github.com/tendermint/tendermint/blob/master/docs/architecture/adr-033-pubsub.md
* retry instead of panic when peer has no state in reactors other than consensus
in /dump_consensus_state RPC endpoint, skip a peer with no state
* rpc/core/mempool: simplify error messages
* rpc/core/mempool: use time.After instead of timer
also, do not log DeliverTx result (to be consistent with other memthods)
* unlock before calling the callback in reqRes#SetCallback
Leaking goroutine:
```
114 @ 0x42f2bc 0x42f3ae 0x440794 0x4403b9 0x468002 0x9fe32d 0x9ff78f 0xa025ed 0x45e571
```
Explanation:
it blocks on an empty clist forever. so unless theres txs coming in,
this go routine will just sit there, holding onto the peer too.
if we're constantly reconnecting to some peer, old instances are not
garbage collected, leading to memory leak.
Fixes https://github.com/cosmos/gaia/issues/108
Previous attempt https://github.com/tendermint/tendermint/pull/1156