Calling ensurePeers outside of ensurePeersRoutine can lead to nodes
disconnecting from us due to "sent next PEX request too soon" error.
Solution is to just dial addrs we got from src instead of calling
ensurePeers.
Refs #2093Fixes#3338
* update changelog
* Update changelog
- less detail about internal stuff like the `mempool`
- focus BUG FIX entries more on what was fixed rather than how
* minor cleanup.
- remove entry for GenesisDocProvider, it's being undone
* minor fix
* Allow testnet hostnames to be overridden
This allows one to specify the `--hostname` flag multiple times, each
time providing an additional custom hostname for a respective peer
(validator or non-validator). This overrides any of the
`--hostname-prefix` or `--starting-ip-address` flags.
The string array approach is taken instead of the string slice approach
(see the pflag docs:
https://godoc.org/github.com/spf13/pflag#StringArray) because the string
slice approach (a comma-separated string) doesn't allow for cleaner
multi-line BASH scripts - where this feature is intended to be used.
* Reorder conditional for clarity with simpler earlier return
* Allow for specifying peer hostname suffix
* Quote values in help strings for greater clarity
* Fix command switch
* Add CHANGELOG_PENDING entry for PR
* Allow for unique monikers
The current approach to generating monikers for testnet nodes assigns
the local hostname of the machine on which the testnet config was
generated to all nodes. This results in the same moniker for each and
every node.
This commit makes use of the supplied `--hostname-prefix` and
`--hostname-suffix`, or `--hostname` parameters to generate unique
monikers for each node. Alternatively, another parameter
(`--random-monikers`) allows one to forcibly override all of the other
options with random hexadecimal strings.
* Update CHANGELOG_PENDING entry for new command line switch
* fix peer state init to late
Peer does not have a state yet. We set it in AddPeer.
We need an new interface before mconnection is started
* pex message to soon
fix reconnection pex send too fast,
error is caused lastReceivedRequests is still
not deleted when a peer reconnected
* add test case for initpeer
* add prove case
* remove potentially infinite loop
* Update consensus/reactor.go
Co-Authored-By: guagualvcha <baifudong@lancai.cn>
* Update consensus/reactor_test.go
Co-Authored-By: guagualvcha <baifudong@lancai.cn>
* document Reactor interface better
* refactor TestReactorReceiveDoesNotPanicIfAddPeerHasntBeenCalledYet
* fix merge conflicts
* blockchain: remove peer's ID from the pool in InitPeer
Refs #3338
* pex: resetPeersRequestsInfo both upon InitPeer and RemovePeer
* ensure RemovePeer is always called before InitPeer
by removing the peer from the switch last (after we've stopped it and
removed from all reactors)
* add some comments for ConsensusReactor#InitPeer
* fix pex reactor
* format code
* fix spelling
* update changelog
* remove unused methods
* do not clear lastReceivedRequests upon error
only in RemovePeer
* call InitPeer before we start the peer!
* add a comment to InitPeer
* write a test
* use waitUntilSwitchHasAtLeastNPeers func
* bring back timeouts
* Test to ensure Receive panics if InitPeer has not been called
* p2p: initial implementation of peer behaviour
* [p2p] re-use newMockPeer
* p2p: add inline docs for peer_behaviour interface
* [p2p] Align PeerBehaviour interface (#3558)
* [p2p] make switchedPeerHebaviour private
* [p2p] make storePeerHebaviour private
* [p2p] Add CHANGELOG_PENDING entry for PeerBehaviour
* [p2p] Adjustment naming for PeerBehaviour
* [p2p] Add coarse lock around storedPeerBehaviour
* [p2p]: Fix non-pointer methods in storedPeerBehaviour
+ Structs with embeded locks must specify all methods with pointer
receivers to avoid creating a copy of the embeded lock.
* [p2p] Thorough refactoring based on comments in #3552
+ Decouple PeerBehaviour interface from Peer by parametrizing
methods with `p2p.ID` instead of `p2p.Peer`
+ Setter methods wrapped in a write lock
+ Getter methods wrapped in a read lock
+ Getter methods on storedPeerBehaviour now take a `p2p.ID`
+ Getter methods return a copy of underlying stored behaviours
+ Add doc strings to public types and methods
* [p2p] make structs public
* [p2p] Test empty StoredPeerBehaviour
* [p2p] typo fix
* [p2p] add TestStoredPeerBehaviourConcurrency
+ Add a test which uses StoredPeerBehaviour in multiple goroutines
to ensure thread-safety.
* Update p2p/peer_behaviour.go
Co-Authored-By: brapse <brapse@gmail.com>
* Update p2p/peer_behaviour.go
Co-Authored-By: brapse <brapse@gmail.com>
* Update p2p/peer_behaviour.go
Co-Authored-By: brapse <brapse@gmail.com>
* Update p2p/peer_behaviour.go
Co-Authored-By: brapse <brapse@gmail.com>
* Update p2p/peer_behaviour.go
Co-Authored-By: brapse <brapse@gmail.com>
* Update p2p/peer_behaviour_test.go
Co-Authored-By: brapse <brapse@gmail.com>
* Update p2p/peer_behaviour.go
Co-Authored-By: brapse <brapse@gmail.com>
* Update p2p/peer_behaviour.go
Co-Authored-By: brapse <brapse@gmail.com>
* Update p2p/peer_behaviour.go
Co-Authored-By: brapse <brapse@gmail.com>
* Update p2p/peer_behaviour.go
Co-Authored-By: brapse <brapse@gmail.com>
* [p2p] field ordering convention
* p2p: peer behaviour refactor
+ Change naming of reporting behaviour to `Report`
+ Remove the responsibility of distinguishing between the categories
of good and bad behaviour and instead focus on reporting behaviour.
* p2p: rename PeerReporter -> PeerBehaviourReporter
* libs/db: conditional compilation
For cleveldb: go build -tags cleveldb
For boltdb: go build -tags boltdb
Fixes#3611
* document db_backend param better
* remove deprecated LevelDBBackend
* update changelog
* add missing lines
* add new line
* fix TestRemoteDB
* add a line about boltdb tag
* Revert "remove deprecated LevelDBBackend"
This reverts commit 1aa85453f7.
* make PR non breaking
* change DEPRECATED label format
https://stackoverflow.com/a/36360323/820520
* mempool: remove only valid (Code==0) txs on Update
so evil proposers can't drop valid txs in Commit stage.
Also remove invalid (Code!=0) txs from the cache so they can be
resubmitted.
Fixes#3322@rickyyangz:
In the end of commit stage, we will update mempool to remove all the txs
in current block.
// Update mempool.
err = blockExec.mempool.Update(
block.Height,
block.Txs,
TxPreCheck(state),
TxPostCheck(state),
)
Assum an account has 3 transactions in the mempool, the sequences are
100, 101 and 102 separately, So an evil proposal can only package the
101 and 102 transactions into its proposal block, and leave 100 still in
mempool, then the two txs will be removed from all validators' mempool
when commit. So the account lost the two valid txs.
@ebuchman:
In the longer term we may want to do something like #2639 so we can
validate txs before we commit the block. But even in this case we'd only
want to run the equivalent of CheckTx, which means the DeliverTx could
still fail even if the CheckTx passes depending on how the app handles
the ABCI Code semantics. So more work will be required around the ABCI
code. See also #2185
* add changelog entry and tests
* improve changelog message
* reformat code
## Description
Refs #2659
Breaking changes in the mempool package:
[mempool] #2659 Mempool now an interface
old Mempool renamed to CListMempool
NewMempool renamed to NewCListMempool
Option renamed to CListOption
MempoolReactor renamed to Reactor
NewMempoolReactor renamed to NewReactor
unexpose TxID method
TxInfo.PeerID renamed to SenderID
unexpose MempoolReactor.Mempool
Breaking changes in the state package:
[state] #2659 Mempool interface moved to mempool package
MockMempool moved to top-level mock package and renamed to Mempool
Non Breaking changes in the node package:
[node] #2659 Add Mempool method, which allows you to access mempool
## Commits
* move Mempool interface into mempool package
Refs #2659
Breaking changes in the mempool package:
- Mempool now an interface
- old Mempool renamed to CListMempool
Breaking changes to state package:
- MockMempool moved to mempool/mock package and renamed to Mempool
- Mempool interface moved to mempool package
* assert CListMempool impl Mempool
* gofmt code
* rename MempoolReactor to Reactor
- combine everything into one interface
- rename TxInfo.PeerID to TxInfo.SenderID
- unexpose MempoolReactor.Mempool
* move mempool mock into top-level mock package
* add a fixme
TxsFront should not be a part of the Mempool interface
because it leaks implementation details. Instead, we need to come up
with general interface for querying the mempool so the MempoolReactor
can fetch and broadcast txs to peers.
* change node#Mempool to return interface
* save commit = new reactor arch
* Revert "save commit = new reactor arch"
This reverts commit 1bfceacd9d.
* require CListMempool in mempool.Reactor
* add two changelog entries
* fixes after my own review
* quote interfaces, structs and functions
* fixes after Ismail's review
* make node's mempool an interface
* make InitWAL/CloseWAL methods a part of Mempool interface
* fix merge conflicts
* make node's mempool an interface
## Description
Previously only outbound peers can be persistent.
Now, even if the peer is inbound, if it's marked as persistent, when/if conn is lost,
Tendermint will try to reconnect. This part is actually optional and can be reverted.
Plus, seed won't disconnect from inbound peer if it's marked as
persistent. Fixes#3362
## Commits
* make persistent prop independent of conn direction
Previously only outbound peers can be persistent. Now, even if the peer
is inbound, if it's marked as persistent, when/if conn is lost,
Tendermint will try to reconnect.
Plus, seed won't disconnect from inbound peer if it's marked as
persistent. Fixes#3362
* fix TestPEXReactorDialPeer test
* add a changelog entry
* update changelog
* add two tests
* reformat code
* test UnsafeDialPeers and UnsafeDialSeeds
* add TestSwitchDialPeersAsync
* spec: update p2p/config spec
* fixes after Ismail's review
* Apply suggestions from code review
Co-Authored-By: melekes <anton.kalyaev@gmail.com>
* fix merge conflict
* remove sleep from TestPEXReactorDoesNotDisconnectFromPersistentPeerInSeedMode
We don't need it actually.
If we are low on addresses for peering, we need to discover more peers. The
previous behavior would query existing peers; however, if an existing peer
does not participate in peer exchange, then our node will not discover more peers.
This change consults both existing peers as well as seeds when there is a deficit
in address book addresses. This allows for discovering peers though existing channels
as well as via seeds if existing peers do not share addresses.
The node.NewNode method is pretty complex at the moment, an in order to address issues like #3156, we need to simplify the interface for partial node instantiation. In some places, we don't need to build up a full node (like in the node.TestCreateProposalBlock test), but the complexity of such partial instantiation needs to be reduced.
This PR aims to eventually make this easier/simpler.
See also this gist https://gist.github.com/thanethomson/56e1640d057a26186e38ad678a1d114c for some background work done when starting to refactor here.
## Commits:
* [WIP] Refactor node.NewNode to simplify
The `node.NewNode` method is pretty complex at the moment, an in order
to address issues like #3156, we need to simplify the interface for
partial node instantiation. In some places, we don't need to build up a
full node (like in the `node.TestCreateProposalBlock` test), but the
complexity of such partial instantiation needs to be reduced.
This PR aims to eventually make this easier/simpler.
* Refactor state loading and genesis doc provider into state package
* Refactor for clarity of return parameters
* Fix incorrect capitalization of error messages
* Simplify extracted functions' names
* Document optionally-prefixed functions
* Refactor optionallyFastSync for clarity of separation of concerns
* Restructure function for early return
* Restructure function for early return
* Remove dependence on deprecated panic functions
* refactor code a bit more
plus, expose PEXReactor on node
* align logger names
* add a changelog entry
* align logger names 2
* add a note about PEXReactor returning nil
* Remove deprecated PanicXXX functions from codebase
As per discussion over
[here](https://github.com/tendermint/tendermint/pull/3456#discussion_r278423492),
we need to remove these `PanicXXX` functions and eliminate our
dependence on them. In this PR, each and every `PanicXXX` function call
is replaced with a simple `panic` call.
* add a changelog entry
Should fix#3451, #2723 and #3317.
Test TestResetTimeoutPrecommitUponNewHeight is simplified so it reduces a risk of timeout failure. Furthermore, timeout we wait for TimeoutEvents is increased, and the timeout value is more precisely computed. This should hopefully decrease a chance of non-deterministic test failures.
This assertion is problematic to ensure consistently due to dependency on scheduler. On the other hand, if I am not wrong, order in which messages are read from the channel respects order in which messages are written. Therefore, process will receive 2f+1 precommits that are not all for v (one is for nil) so TriggeredTimeoutPrecommit would be set to true. So we don't need to assert it. I know that it would be better to still assert to it but I don't know how to do it without sleep and that is ugly and is causing us nondeterministic failures.
* check every block appHash
Fixes#3460
Not really needed, but it would detect if the app changed in a way it
shouldn't have.
* add a changelog entry
* no need to return an error if we panic
* rename methods
* rename methods once again
* add a test
* correct error msg
plus fix a few go-lint warnings
* better panic msg
Continues from #3280 in building support for batched requests/responses in the JSON RPC (as per issue #3213).
* Add JSON RPC batching for client and server
As per #3213, this adds support for [JSON RPC batch requests and
responses](https://www.jsonrpc.org/specification#batch).
* Add additional checks to ensure client responses are the same as results
* Fix case where a notification is sent and no response is expected
* Add test to check that JSON RPC notifications in a batch are left out in responses
* Update CHANGELOG_PENDING.md
* Update PR number now that PR has been created
* Make errors start with lowercase letter
* Refactor batch functionality to be standalone
This refactors the batching functionality to rather act in a standalone
way. In light of supporting concurrent goroutines making use of the same
client, it would make sense to have batching functionality where one
could create a batch of requests per goroutine and send that batch
without interfering with a batch from another goroutine.
* Add examples for simple and batch HTTP client usage
* Check errors from writer and remove nolinter directives
* Make error strings start with lowercase letter
* Refactor examples to make them testable
* Use safer deferred shutdown for example Tendermint test node
* Recompose rpcClient interface from pre-existing interface components
* Rename WaitGroup for brevity
* Replace empty ID string with request ID
* Remove extraneous test case
* Convert first letter of errors.Wrap() messages to lowercase
* Remove extraneous function parameter
* Make variable declaration terse
* Reorder WaitGroup.Done call to help prevent race conditions in the face of failure
* Swap mutex to value representation and remove initialization
* Restore empty JSONRPC string ID in response to prevent nil
* Make JSONRPCBufferedRequest private
* Revert PR hard link in CHANGELOG_PENDING
* Add client ID for JSONRPCClient
This adds code to automatically generate a randomized client ID for the
JSONRPCClient, and adds a check of the IDs in the responses (if one was
set in the requests).
* Extract response ID validation into separate function
* Remove extraneous comments
* Reorder fields to indicate clearly which are protected by the mutex
* Refactor for loop to remove indexing
* Restructure and combine loop
* Flatten conditional block for better readability
* Make multi-variable declaration slightly more readable
* Change for loop style
* Compress error check statements
* Make function description more generic to show that we support different protocols
* Preallocate memory for request and result objects
* use dialPeer function in a seed mode
Fixes#3532
by storing a number of attempts we've tried to connect in-memory and
removing the address from addrbook when number of attempts > 16
What happened:
New code was supposed to fall back to last height changed when/if it
failed to find validators at checkpoint height (to make release
non-breaking).
But because we did not check if validator set is empty, the fall back
logic was never executed => resulting in LoadValidators returning an
empty validator set for cases where `lastStoredHeight` is checkpoint
height (i.e. almost all heights if the application does not change
validator set often).
How it was found:
one of our users - @sunboshan reported a bug here
https://github.com/tendermint/tendermint/pull/3537#issuecomment-482711833
* use last height changed in validator set is empty
* add a changelog entry
* rpc: store validator info periodly
* increase ValidatorSetStoreInterval
also
- unexpose it
- add a comment
- refactor code
- add a benchmark, which shows that 100000 results in ~ 100ms to get 100
validators
* make the change non-breaking
* expand comment
* rename valSetStoreInterval to valSetCheckpointInterval
* change the panic msg
* add a test and changelog entry
* update changelog entry
* update changelog entry
* add a link to PR
* fix test
* Update CHANGELOG_PENDING.md
Co-Authored-By: melekes <anton.kalyaev@gmail.com>
* update comment
* use MaxInt64 func
ListOfKnownAddresses is removed
panic if addrbook size is less than zero
CrawlPeers does not attempt to connect to existing or peers we're currently dialing
various perf. fixes
improved tests (though not complete)
move IsDialingOrExistingAddress check into DialPeerWithAddress (Fixes#2716)
* addrbook: preallocate memory when saving addrbook to file
* addrbook: remove oldestFirst struct and check for ID
* oldestFirst replaced with sort.Slice
* ID is now mandatory, so no need to check
* addrbook: remove ListOfKnownAddresses
GetSelection is used instead in seed mode.
* addrbook: panic if size is less than 0
* rewrite addrbook#saveToFile to not use a counter
* test AttemptDisconnects func
* move IsDialingOrExistingAddress check into DialPeerWithAddress
* save and cleanup crawl peer data
* get rid of DefaultSeedDisconnectWaitPeriod
* make linter happy
* fix TestPEXReactorSeedMode
* fix comment
* add a changelog entry
* Apply suggestions from code review
Co-Authored-By: melekes <anton.kalyaev@gmail.com>
* rename ErrDialingOrExistingAddress to ErrCurrentlyDialingOrExistingAddress
* lowercase errors
* do not persist seed data
pros:
- no extra files
- less IO
cons:
- if the node crashes, seed might crawl a peer too soon
* fixes after Ethan's review
* add a changelog entry
* we should only consult Switch about peers
checking addrbook size does not make sense since only PEX reactor uses
it for dialing peers!
https://github.com/tendermint/tendermint/pull/3011#discussion_r270948875
* docs: fix broken links (#3482)
A bunch of links were broken in the documentation s they included the
`docs` prefix.
* Update CHANGELOG_PENDING
* docs: switch to relative links for github compatitibility (#3482)
I think it's nice when the Client interface has all the methods. If someone does not need a particular method/set of methods, she can use individual interfaces (e.g. NetworkClient, MempoolClient) or write her own interface.
technically breaking
Fixes#3458
Closes#1798
This is done by making every mempool tx maintain a list of peers who its received the tx from. Instead of using the 20byte peer ID, it instead uses a local map from peerID to uint16 counter, so every peer adds 2 bytes. (Word aligned to probably make it 8 bytes)
This also required resetting the callback function on every CheckTx. This likely has performance ramifications for instruction caching. The actual setting operation isn't costly with the removal of defers in this PR.
* Make the mempool not gossip txs back to peers its received it from
* Fix adversarial memleak
* Don't break interface
* Update changelog
* Forgot to add a mtx
* forgot a mutex
* Update mempool/reactor.go
Co-Authored-By: ValarDragon <ValarDragon@users.noreply.github.com>
* Update mempool/mempool.go
Co-Authored-By: ValarDragon <ValarDragon@users.noreply.github.com>
* Use unknown peer ID
Co-Authored-By: ValarDragon <ValarDragon@users.noreply.github.com>
* fix compilation
* use next wait chan logic when skipping
* Minor fixes
* Add TxInfo
* Add reverse map
* Make activeID's auto-reserve 0
* 0 -> UnknownPeerID
Co-Authored-By: ValarDragon <ValarDragon@users.noreply.github.com>
* Switch to making the normal case set a callback on the reqres object
The recheck case is still done via the global callback, and stats
are also set via global callback
* fix merge conflict
* Addres comments
* Add cache tests
* add cache tests
* minor fixes
* update metrics in reqResCb and reformat code
* goimport -w mempool/reactor.go
* mempool: update memTx senders
I had to introduce txsMap for quick mempoolTx lookups.
* change senders type from []uint16 to sync.Map
Fixes DATA RACE:
```
Read at 0x00c0013fcd3a by goroutine 183:
github.com/tendermint/tendermint/mempool.(*MempoolReactor).broadcastTxRoutine()
/go/src/github.com/tendermint/tendermint/mempool/reactor.go:195 +0x3c7
Previous write at 0x00c0013fcd3a by D[2019-02-27|10:10:49.058] Read PacketMsg switch=3 peer=35bc1e3558c182927b31987eeff3feb3d58a0fc5@127.0.0.1
:46552 conn=MConn{pipe} packet="PacketMsg{30:2B06579D0A143EB78F3D3299DE8213A51D4E11FB05ACE4D6A14F T:1}"
goroutine 190:
github.com/tendermint/tendermint/mempool.(*Mempool).CheckTxWithInfo()
/go/src/github.com/tendermint/tendermint/mempool/mempool.go:387 +0xdc1
github.com/tendermint/tendermint/mempool.(*MempoolReactor).Receive()
/go/src/github.com/tendermint/tendermint/mempool/reactor.go:134 +0xb04
github.com/tendermint/tendermint/p2p.createMConnection.func1()
/go/src/github.com/tendermint/tendermint/p2p/peer.go:374 +0x25b
github.com/tendermint/tendermint/p2p/conn.(*MConnection).recvRoutine()
/go/src/github.com/tendermint/tendermint/p2p/conn/connection.go:599 +0xcce
Goroutine 183 (running) created at:
D[2019-02-27|10:10:49.058] Send switch=2 peer=1efafad5443abeea4b7a8155218e4369525d987e@127.0.0.1:46193 channel=48 conn=MConn{pipe} m
sgBytes=2B06579D0A146194480ADAE00C2836ED7125FEE65C1D9DD51049
github.com/tendermint/tendermint/mempool.(*MempoolReactor).AddPeer()
/go/src/github.com/tendermint/tendermint/mempool/reactor.go:105 +0x1b1
github.com/tendermint/tendermint/p2p.(*Switch).startInitPeer()
/go/src/github.com/tendermint/tendermint/p2p/switch.go:683 +0x13b
github.com/tendermint/tendermint/p2p.(*Switch).addPeer()
/go/src/github.com/tendermint/tendermint/p2p/switch.go:650 +0x585
github.com/tendermint/tendermint/p2p.(*Switch).addPeerWithConnection()
/go/src/github.com/tendermint/tendermint/p2p/test_util.go:145 +0x939
github.com/tendermint/tendermint/p2p.Connect2Switches.func2()
/go/src/github.com/tendermint/tendermint/p2p/test_util.go:109 +0x50
I[2019-02-27|10:10:49.058] Added good transaction validator=0 tx=43B4D1F0F03460BD262835C4AA560DB860CFBBE85BD02386D83DAC38C67B3AD7 res="&{CheckTx:gas_w
anted:1 }" height=0 total=375
Goroutine 190 (running) created at:
github.com/tendermint/tendermint/p2p/conn.(*MConnection).OnStart()
/go/src/github.com/tendermint/tendermint/p2p/conn/connection.go:210 +0x313
github.com/tendermint/tendermint/libs/common.(*BaseService).Start()
/go/src/github.com/tendermint/tendermint/libs/common/service.go:139 +0x4df
github.com/tendermint/tendermint/p2p.(*peer).OnStart()
/go/src/github.com/tendermint/tendermint/p2p/peer.go:179 +0x56
github.com/tendermint/tendermint/libs/common.(*BaseService).Start()
/go/src/github.com/tendermint/tendermint/libs/common/service.go:139 +0x4df
github.com/tendermint/tendermint/p2p.(*peer).Start()
<autogenerated>:1 +0x43
github.com/tendermint/tendermint/p2p.(*Switch).startInitPeer()
```
* explain the choice of a map DS for senders
* extract ids pool/mapper to a separate struct
* fix literal copies lock value from senders: sync.Map contains sync.Mutex
* use sync.Map#LoadOrStore instead of Load
* fixes after Ismail's review
* rename resCbNormal to resCbFirstTime
* p2p: refactor Switch#Broadcast func
- call wg.Add only once
- do not call peers.List twice!
* bad for perfomance
* peers list can change in between calls!
Refs #3306
* p2p: use time.Ticker instead of RepeatTimer
no need in RepeatTimer since we don't Reset them
Refs #3306
* libs/common: remove RepeatTimer (also TimerMaker and Ticker interface)
"ancient code that’s caused no end of trouble" Ethan
I believe there's much simplier way to write a ticker than can be reset
https://medium.com/@arpith/resetting-a-ticker-in-go-63858a2c17ec