* rpc: store validator info periodly
* increase ValidatorSetStoreInterval
also
- unexpose it
- add a comment
- refactor code
- add a benchmark, which shows that 100000 results in ~ 100ms to get 100
validators
* make the change non-breaking
* expand comment
* rename valSetStoreInterval to valSetCheckpointInterval
* change the panic msg
* add a test and changelog entry
* update changelog entry
* update changelog entry
* add a link to PR
* fix test
* Update CHANGELOG_PENDING.md
Co-Authored-By: melekes <anton.kalyaev@gmail.com>
* update comment
* use MaxInt64 func
* add actionable advice for ErrAddrBookNonRoutable err
Should replace https://github.com/tendermint/tendermint/pull/3463
* reorder checks in addrbook#addAddress so
ErrAddrBookPrivate is returned first
and do not log error in DialPeersAsync if the address is private
because it's not an error
ListOfKnownAddresses is removed
panic if addrbook size is less than zero
CrawlPeers does not attempt to connect to existing or peers we're currently dialing
various perf. fixes
improved tests (though not complete)
move IsDialingOrExistingAddress check into DialPeerWithAddress (Fixes#2716)
* addrbook: preallocate memory when saving addrbook to file
* addrbook: remove oldestFirst struct and check for ID
* oldestFirst replaced with sort.Slice
* ID is now mandatory, so no need to check
* addrbook: remove ListOfKnownAddresses
GetSelection is used instead in seed mode.
* addrbook: panic if size is less than 0
* rewrite addrbook#saveToFile to not use a counter
* test AttemptDisconnects func
* move IsDialingOrExistingAddress check into DialPeerWithAddress
* save and cleanup crawl peer data
* get rid of DefaultSeedDisconnectWaitPeriod
* make linter happy
* fix TestPEXReactorSeedMode
* fix comment
* add a changelog entry
* Apply suggestions from code review
Co-Authored-By: melekes <anton.kalyaev@gmail.com>
* rename ErrDialingOrExistingAddress to ErrCurrentlyDialingOrExistingAddress
* lowercase errors
* do not persist seed data
pros:
- no extra files
- less IO
cons:
- if the node crashes, seed might crawl a peer too soon
* fixes after Ethan's review
* add a changelog entry
* we should only consult Switch about peers
checking addrbook size does not make sense since only PEX reactor uses
it for dialing peers!
https://github.com/tendermint/tendermint/pull/3011#discussion_r270948875
* OriginalAddr -> SocketAddr
OriginalAddr records the originally dialed address for outbound peers,
rather than the peer's self reported address. For inbound peers, it was
nil. Here, we rename it to SocketAddr and for inbound peers, set it to
the RemoteAddr of the connection.
* use SocketAddr
Numerous places in the code call peer.NodeInfo().NetAddress().
However, this call to NetAddress() may perform a DNS lookup if the
reported NodeInfo.ListenAddr includes a name. Failure of this lookup
returns a nil address, which can lead to panics in the code.
Instead, call peer.SocketAddr() to return the static address of the
connection.
* remove nodeInfo.NetAddress()
Expose `transport.NetAddress()`, a static result determined
when the transport is created. Removing NetAddress() from the nodeInfo
prevents accidental DNS lookups.
* fixes from review
* linter
* fixes from review
* docs: fix broken links (#3482)
A bunch of links were broken in the documentation s they included the
`docs` prefix.
* Update CHANGELOG_PENDING
* docs: switch to relative links for github compatitibility (#3482)
* docs: fix broken links (#3482)
A bunch of links were broken in the documentation s they included the
`docs` prefix.
* Update CHANGELOG_PENDING
* docs: switch to relative links for github compatitibility (#3482)
* mempool: add a safety check, write tests for mempoolIDs
and document 65536 limit in the mempool reactor spec
follow-up to https://github.com/tendermint/tendermint/pull/2778
* rename the test
* fixes after Ismail's review
Why submit this pr:
we have suffered from infinite loop in addrbook bug which takes us a long time to find out why process become a zombie peer. It have been fixed in #3232. But the ADDRS_LOOP is still there, risk of infinite loop is still exist.
The algorithm that to random pick a bucket is not stable, which means the peer may unluckily always choose the wrong bucket for a long time, the time and cpu cost is meaningless.
A simple improvement:
shuffle bucketsNew and bucketsOld, and pick necessary number of address from them. A stable
algorithm.
I think it's nice when the Client interface has all the methods. If someone does not need a particular method/set of methods, she can use individual interfaces (e.g. NetworkClient, MempoolClient) or write her own interface.
technically breaking
Fixes#3458
Refs #3306, irisnet@fdbb676
I ran an irishub validator. After the validator node ran several days, I dump the whole goroutine stack. I found that there were hundreds of broadcastTxRoutine. However, the connected peer quantity was less than 30. So I belive that there must be broadcastTxRoutine leakage issue.
According to my analysis, I think the root cause of this issue locate in below code:
select {
case <-next.NextWaitChan():
// see the start of the for loop for nil check
next = next.Next()
case <-peer.Quit():
return
case <-memR.Quit():
return
}
As we know, if multiple paths are avaliable in the same time, then a random path will be selected. Suppose that next.NextWaitChan() and peer.Quit() are both avaliable, and next.NextWaitChan() is chosen.
// send memTx
msg := &TxMessage{Tx: memTx.tx}
success := peer.Send(MempoolChannel, cdc.MustMarshalBinaryBare(msg))
if !success {
time.Sleep(peerCatchupSleepIntervalMS * time.Millisecond)
continue
}
Then next will be non-empty and the peer send operation won't be success. As a result, this go routine will be track into infinite loop and won't be released.
My proposal is to check peer.Quit() and memR.Quit() in every loop no matter whether next is nil.
Closes#1798
This is done by making every mempool tx maintain a list of peers who its received the tx from. Instead of using the 20byte peer ID, it instead uses a local map from peerID to uint16 counter, so every peer adds 2 bytes. (Word aligned to probably make it 8 bytes)
This also required resetting the callback function on every CheckTx. This likely has performance ramifications for instruction caching. The actual setting operation isn't costly with the removal of defers in this PR.
* Make the mempool not gossip txs back to peers its received it from
* Fix adversarial memleak
* Don't break interface
* Update changelog
* Forgot to add a mtx
* forgot a mutex
* Update mempool/reactor.go
Co-Authored-By: ValarDragon <ValarDragon@users.noreply.github.com>
* Update mempool/mempool.go
Co-Authored-By: ValarDragon <ValarDragon@users.noreply.github.com>
* Use unknown peer ID
Co-Authored-By: ValarDragon <ValarDragon@users.noreply.github.com>
* fix compilation
* use next wait chan logic when skipping
* Minor fixes
* Add TxInfo
* Add reverse map
* Make activeID's auto-reserve 0
* 0 -> UnknownPeerID
Co-Authored-By: ValarDragon <ValarDragon@users.noreply.github.com>
* Switch to making the normal case set a callback on the reqres object
The recheck case is still done via the global callback, and stats
are also set via global callback
* fix merge conflict
* Addres comments
* Add cache tests
* add cache tests
* minor fixes
* update metrics in reqResCb and reformat code
* goimport -w mempool/reactor.go
* mempool: update memTx senders
I had to introduce txsMap for quick mempoolTx lookups.
* change senders type from []uint16 to sync.Map
Fixes DATA RACE:
```
Read at 0x00c0013fcd3a by goroutine 183:
github.com/tendermint/tendermint/mempool.(*MempoolReactor).broadcastTxRoutine()
/go/src/github.com/tendermint/tendermint/mempool/reactor.go:195 +0x3c7
Previous write at 0x00c0013fcd3a by D[2019-02-27|10:10:49.058] Read PacketMsg switch=3 peer=35bc1e3558c182927b31987eeff3feb3d58a0fc5@127.0.0.1
:46552 conn=MConn{pipe} packet="PacketMsg{30:2B06579D0A143EB78F3D3299DE8213A51D4E11FB05ACE4D6A14F T:1}"
goroutine 190:
github.com/tendermint/tendermint/mempool.(*Mempool).CheckTxWithInfo()
/go/src/github.com/tendermint/tendermint/mempool/mempool.go:387 +0xdc1
github.com/tendermint/tendermint/mempool.(*MempoolReactor).Receive()
/go/src/github.com/tendermint/tendermint/mempool/reactor.go:134 +0xb04
github.com/tendermint/tendermint/p2p.createMConnection.func1()
/go/src/github.com/tendermint/tendermint/p2p/peer.go:374 +0x25b
github.com/tendermint/tendermint/p2p/conn.(*MConnection).recvRoutine()
/go/src/github.com/tendermint/tendermint/p2p/conn/connection.go:599 +0xcce
Goroutine 183 (running) created at:
D[2019-02-27|10:10:49.058] Send switch=2 peer=1efafad5443abeea4b7a8155218e4369525d987e@127.0.0.1:46193 channel=48 conn=MConn{pipe} m
sgBytes=2B06579D0A146194480ADAE00C2836ED7125FEE65C1D9DD51049
github.com/tendermint/tendermint/mempool.(*MempoolReactor).AddPeer()
/go/src/github.com/tendermint/tendermint/mempool/reactor.go:105 +0x1b1
github.com/tendermint/tendermint/p2p.(*Switch).startInitPeer()
/go/src/github.com/tendermint/tendermint/p2p/switch.go:683 +0x13b
github.com/tendermint/tendermint/p2p.(*Switch).addPeer()
/go/src/github.com/tendermint/tendermint/p2p/switch.go:650 +0x585
github.com/tendermint/tendermint/p2p.(*Switch).addPeerWithConnection()
/go/src/github.com/tendermint/tendermint/p2p/test_util.go:145 +0x939
github.com/tendermint/tendermint/p2p.Connect2Switches.func2()
/go/src/github.com/tendermint/tendermint/p2p/test_util.go:109 +0x50
I[2019-02-27|10:10:49.058] Added good transaction validator=0 tx=43B4D1F0F03460BD262835C4AA560DB860CFBBE85BD02386D83DAC38C67B3AD7 res="&{CheckTx:gas_w
anted:1 }" height=0 total=375
Goroutine 190 (running) created at:
github.com/tendermint/tendermint/p2p/conn.(*MConnection).OnStart()
/go/src/github.com/tendermint/tendermint/p2p/conn/connection.go:210 +0x313
github.com/tendermint/tendermint/libs/common.(*BaseService).Start()
/go/src/github.com/tendermint/tendermint/libs/common/service.go:139 +0x4df
github.com/tendermint/tendermint/p2p.(*peer).OnStart()
/go/src/github.com/tendermint/tendermint/p2p/peer.go:179 +0x56
github.com/tendermint/tendermint/libs/common.(*BaseService).Start()
/go/src/github.com/tendermint/tendermint/libs/common/service.go:139 +0x4df
github.com/tendermint/tendermint/p2p.(*peer).Start()
<autogenerated>:1 +0x43
github.com/tendermint/tendermint/p2p.(*Switch).startInitPeer()
```
* explain the choice of a map DS for senders
* extract ids pool/mapper to a separate struct
* fix literal copies lock value from senders: sync.Map contains sync.Mutex
* use sync.Map#LoadOrStore instead of Load
* fixes after Ismail's review
* rename resCbNormal to resCbFirstTime
In order to re-enable the test harness for the KMS (see
tendermint/kms#227), we need some marginally more realistic proposals
and votes. This is because the KMS does some additional sanity checks
now to ensure the height and round are increasing over time.
* p2p: refactor Switch#Broadcast func
- call wg.Add only once
- do not call peers.List twice!
* bad for perfomance
* peers list can change in between calls!
Refs #3306
* p2p: use time.Ticker instead of RepeatTimer
no need in RepeatTimer since we don't Reset them
Refs #3306
* libs/common: remove RepeatTimer (also TimerMaker and Ticker interface)
"ancient code that’s caused no end of trouble" Ethan
I believe there's much simplier way to write a ticker than can be reset
https://medium.com/@arpith/resetting-a-ticker-in-go-63858a2c17ec