## Description
Previously only outbound peers can be persistent.
Now, even if the peer is inbound, if it's marked as persistent, when/if conn is lost,
Tendermint will try to reconnect. This part is actually optional and can be reverted.
Plus, seed won't disconnect from inbound peer if it's marked as
persistent. Fixes#3362
## Commits
* make persistent prop independent of conn direction
Previously only outbound peers can be persistent. Now, even if the peer
is inbound, if it's marked as persistent, when/if conn is lost,
Tendermint will try to reconnect.
Plus, seed won't disconnect from inbound peer if it's marked as
persistent. Fixes#3362
* fix TestPEXReactorDialPeer test
* add a changelog entry
* update changelog
* add two tests
* reformat code
* test UnsafeDialPeers and UnsafeDialSeeds
* add TestSwitchDialPeersAsync
* spec: update p2p/config spec
* fixes after Ismail's review
* Apply suggestions from code review
Co-Authored-By: melekes <anton.kalyaev@gmail.com>
* fix merge conflict
* remove sleep from TestPEXReactorDoesNotDisconnectFromPersistentPeerInSeedMode
We don't need it actually.
The node.NewNode method is pretty complex at the moment, an in order to address issues like #3156, we need to simplify the interface for partial node instantiation. In some places, we don't need to build up a full node (like in the node.TestCreateProposalBlock test), but the complexity of such partial instantiation needs to be reduced.
This PR aims to eventually make this easier/simpler.
See also this gist https://gist.github.com/thanethomson/56e1640d057a26186e38ad678a1d114c for some background work done when starting to refactor here.
## Commits:
* [WIP] Refactor node.NewNode to simplify
The `node.NewNode` method is pretty complex at the moment, an in order
to address issues like #3156, we need to simplify the interface for
partial node instantiation. In some places, we don't need to build up a
full node (like in the `node.TestCreateProposalBlock` test), but the
complexity of such partial instantiation needs to be reduced.
This PR aims to eventually make this easier/simpler.
* Refactor state loading and genesis doc provider into state package
* Refactor for clarity of return parameters
* Fix incorrect capitalization of error messages
* Simplify extracted functions' names
* Document optionally-prefixed functions
* Refactor optionallyFastSync for clarity of separation of concerns
* Restructure function for early return
* Restructure function for early return
* Remove dependence on deprecated panic functions
* refactor code a bit more
plus, expose PEXReactor on node
* align logger names
* add a changelog entry
* align logger names 2
* add a note about PEXReactor returning nil
* Remove deprecated PanicXXX functions from codebase
As per discussion over
[here](https://github.com/tendermint/tendermint/pull/3456#discussion_r278423492),
we need to remove these `PanicXXX` functions and eliminate our
dependence on them. In this PR, each and every `PanicXXX` function call
is replaced with a simple `panic` call.
* add a changelog entry
ListOfKnownAddresses is removed
panic if addrbook size is less than zero
CrawlPeers does not attempt to connect to existing or peers we're currently dialing
various perf. fixes
improved tests (though not complete)
move IsDialingOrExistingAddress check into DialPeerWithAddress (Fixes#2716)
* addrbook: preallocate memory when saving addrbook to file
* addrbook: remove oldestFirst struct and check for ID
* oldestFirst replaced with sort.Slice
* ID is now mandatory, so no need to check
* addrbook: remove ListOfKnownAddresses
GetSelection is used instead in seed mode.
* addrbook: panic if size is less than 0
* rewrite addrbook#saveToFile to not use a counter
* test AttemptDisconnects func
* move IsDialingOrExistingAddress check into DialPeerWithAddress
* save and cleanup crawl peer data
* get rid of DefaultSeedDisconnectWaitPeriod
* make linter happy
* fix TestPEXReactorSeedMode
* fix comment
* add a changelog entry
* Apply suggestions from code review
Co-Authored-By: melekes <anton.kalyaev@gmail.com>
* rename ErrDialingOrExistingAddress to ErrCurrentlyDialingOrExistingAddress
* lowercase errors
* do not persist seed data
pros:
- no extra files
- less IO
cons:
- if the node crashes, seed might crawl a peer too soon
* fixes after Ethan's review
* add a changelog entry
* we should only consult Switch about peers
checking addrbook size does not make sense since only PEX reactor uses
it for dialing peers!
https://github.com/tendermint/tendermint/pull/3011#discussion_r270948875
* OriginalAddr -> SocketAddr
OriginalAddr records the originally dialed address for outbound peers,
rather than the peer's self reported address. For inbound peers, it was
nil. Here, we rename it to SocketAddr and for inbound peers, set it to
the RemoteAddr of the connection.
* use SocketAddr
Numerous places in the code call peer.NodeInfo().NetAddress().
However, this call to NetAddress() may perform a DNS lookup if the
reported NodeInfo.ListenAddr includes a name. Failure of this lookup
returns a nil address, which can lead to panics in the code.
Instead, call peer.SocketAddr() to return the static address of the
connection.
* remove nodeInfo.NetAddress()
Expose `transport.NetAddress()`, a static result determined
when the transport is created. Removing NetAddress() from the nodeInfo
prevents accidental DNS lookups.
* fixes from review
* linter
* fixes from review
* Make sure config.TimeoutBroadcastTxCommit < rpcserver.WriteTimeout()
* remove redundant comment
* libs/rpc/http_server: move Read/WriteTimeout into Config
* increase defaults for read/write timeouts
Based on this article
https://www.digitalocean.com/community/tutorials/how-to-optimize-nginx-configuration
* WriteTimeout should be larger than TimeoutBroadcastTxCommit
* set a deadline for subscribing to txs
* extract duration into const
* add two changelog entries
* Update CHANGELOG_PENDING.md
Co-Authored-By: melekes <anton.kalyaev@gmail.com>
* Update CHANGELOG_PENDING.md
Co-Authored-By: melekes <anton.kalyaev@gmail.com>
* 12 -> 10
* changelog
* changelog
* limit number of /subscribe clients and queries per client
Add the following config variables (under [rpc] section):
* max_subscription_clients
* max_subscriptions_per_client
* timeout_broadcast_tx_commit
Fixes#2826
new HTTPClient interface for subscriptions
finalize HTTPClient events interface
remove EventSubscriber
fix data race
```
WARNING: DATA RACE
Read at 0x00c000a36060 by goroutine 129:
github.com/tendermint/tendermint/rpc/client.(*Local).Subscribe.func1()
/go/src/github.com/tendermint/tendermint/rpc/client/localclient.go:168 +0x1f0
Previous write at 0x00c000a36060 by goroutine 132:
github.com/tendermint/tendermint/rpc/client.(*Local).Subscribe()
/go/src/github.com/tendermint/tendermint/rpc/client/localclient.go:191 +0x4e0
github.com/tendermint/tendermint/rpc/client.WaitForOneEvent()
/go/src/github.com/tendermint/tendermint/rpc/client/helpers.go:64 +0x178
github.com/tendermint/tendermint/rpc/client_test.TestTxEventsSentWithBroadcastTxSync.func1()
/go/src/github.com/tendermint/tendermint/rpc/client/event_test.go:139 +0x298
testing.tRunner()
/usr/local/go/src/testing/testing.go:827 +0x162
Goroutine 129 (running) created at:
github.com/tendermint/tendermint/rpc/client.(*Local).Subscribe()
/go/src/github.com/tendermint/tendermint/rpc/client/localclient.go:164 +0x4b7
github.com/tendermint/tendermint/rpc/client.WaitForOneEvent()
/go/src/github.com/tendermint/tendermint/rpc/client/helpers.go:64 +0x178
github.com/tendermint/tendermint/rpc/client_test.TestTxEventsSentWithBroadcastTxSync.func1()
/go/src/github.com/tendermint/tendermint/rpc/client/event_test.go:139 +0x298
testing.tRunner()
/usr/local/go/src/testing/testing.go:827 +0x162
Goroutine 132 (running) created at:
testing.(*T).Run()
/usr/local/go/src/testing/testing.go:878 +0x659
github.com/tendermint/tendermint/rpc/client_test.TestTxEventsSentWithBroadcastTxSync()
/go/src/github.com/tendermint/tendermint/rpc/client/event_test.go:119 +0x186
testing.tRunner()
/usr/local/go/src/testing/testing.go:827 +0x162
==================
```
lite client works (tested manually)
godoc comments
httpclient: do not close the out channel
use TimeoutBroadcastTxCommit
no timeout for unsubscribe
but 1s Local (5s HTTP) timeout for resubscribe
format code
change Subscribe#out cap to 1
and replace config vars with RPCConfig
TimeoutBroadcastTxCommit can't be greater than rpcserver.WriteTimeout
rpc: Context as first parameter to all functions
reformat code
fixes after my own review
fixes after Ethan's review
add test stubs
fix config.toml
* fixes after manual testing
- rpc: do not recommend to use BroadcastTxCommit because it's slow and wastes
Tendermint resources (pubsub)
- rpc: better error in Subscribe and BroadcastTxCommit
- HTTPClient: do not resubscribe if err = ErrAlreadySubscribed
* fixes after Ismail's review
* Update rpc/grpc/grpc_test.go
Co-Authored-By: melekes <anton.kalyaev@gmail.com>
This issue is related to #3107
This is a first renaming/refactoring step before reworking and removing heartbeats.
As discussed with @Liamsi , we preferred to go for a couple of independent and separate PRs to simplify review work.
The changes:
Help to clarify the relation between the validator and remote signer endpoints
Differentiate between timeouts and deadlines
Prepare to encapsulate networking related code behind RemoteSigner in the next PR
My intention is to separate and encapsulate the "network related" code from the actual signer.
SignerRemote ---(uses/contains)--> SignerValidatorEndpoint <--(connects to)--> SignerServiceEndpoint ---> SignerService (future.. not here yet but would like to decouple too)
All reconnection/heartbeat/whatever code goes in the endpoints. Signer[Remote/Service] do not need to know about that.
I agree Endpoint may not be the perfect name. I tried to find something "Go-ish" enough. It is a common name in go-kit, kubernetes, etc.
Right now:
SignerValidatorEndpoint:
handles the listener
contains SignerRemote
Implements the PrivValidator interface
connects and sets a connection object in a contained SignerRemote
delegates PrivValidator some calls to SignerRemote which in turn uses the conn object that was set externally
SignerRemote:
Implements the PrivValidator interface
read/writes from a connection object directly
handles heartbeats
SignerServiceEndpoint:
Does most things in a single place
delegates to a PrivValidator IIRC.
* cleanup
* Refactoring step 1
* Refactoring step 2
* move messages to another file
* mark for future work / next steps
* mark deprecated classes in docs
* Fix linter problems
* additional linter fixes
* changelog: use issue number instead of PR number
* follow up to #3291
- rpc/test/helpers.go add StopTendermint(node) func
- remove ensureDir(filepath.Dir(walFile), 0700)
- mempool/mempool_test.go add type cleanupFunc func()
* cmd/show_validator: wrap err to make it more clear
* Close and recreate a RemoteSigner on err
* Update changelog
* Address Anton's comments / suggestions:
- update changelog
- restart TCPVal
- shut down on `ErrUnexpectedResponse`
* re-init remote signer client with fresh connection if Ping fails
- add/update TODOs in secret connection
- rename tcp.go -> tcp_client.go, same with ipc to clarify their purpose
* account for `conn returned by waitConnection can be `nil`
- also add TODO about RemoteSigner conn field
* Tests for retrying: IPC / TCP
- shorter info log on success
- set conn and use it in tests to close conn
* Tests for retrying: IPC / TCP
- shorter info log on success
- set conn and use it in tests to close conn
- add rwmutex for conn field in IPC
* comments and doc.go
* fix ipc tests. fixes#2677
* use constants for tests
* cleanup some error statements
* fixes#2784, race in tests
* remove print statement
* minor fixes from review
* update comment on sts spec
* cosmetics
* p2p/conn: add failing tests
* p2p/conn: make SecretConnection thread safe
* changelog
* IPCVal signer refactor
- use a .reset() method
- don't use embedded RemoteSignerClient
- guard RemoteSignerClient with mutex
- drop the .conn
- expose Close() on RemoteSignerClient
* apply IPCVal refactor to TCPVal
* remove mtx from RemoteSignerClient
* consolidate IPCVal and TCPVal, fixes#3104
- done in tcp_client.go
- now called SocketVal
- takes a listener in the constructor
- make tcpListener and unixListener contain all the differences
* delete ipc files
* introduce unix and tcp dialer for RemoteSigner
* rename files
- drop tcp_ prefix
- rename priv_validator.go to file.go
* bring back listener options
* fix node
* fix priv_val_server
* fix node test
* minor cleanup and comments
* split immutable and mutable parts of priv_validator.json
* fix bugs
* minor changes
* retrig test
* delete scripts/wire2amino.go
* fix test
* fixes from review
* privval: remove mtx
* rearrange priv_validator.go
* upgrade path
* write tests for the upgrade
* fix for unsafe_reset_all
* add test
* add reset test
* Decouple StartHTTP{,AndTLS}Server from Listen()
This should help solve cosmos/cosmos-sdk#2715
* Fix small mistake
* Update StartGRPCServer
* s/rpc/rpcserver/
* Start grpccore.StartGRPCServer in a goroutine
* Reinstate l.Close()
* Fix rpc/lib/test/main.go
* Update code comment
* update changelog and comments
* fix tm-monitor. more comments
* fix amino overhead computation for Tx:
- also count the fieldnum / typ3
- add method to compute overhead per Tx
- slightly clarify comment on MaxAminoOverheadForBlock
- add tests
* fix TestReapMaxBytesMaxGas according to amino overhead
* fix TestMempoolFilters according to amino overhead
* address review comments:
- add a note about fieldNum = 1
- add forgotten godoc comment
* fix and use sm.TxPreCheck
* fix test
* remove print statement
* p2p: add protocol Version to NodeInfo
* update node pkg. remove extraneous version files
* update changelog and docs
* fix test
* p2p: Version -> ProtocolVersion; more ValidateBasic and tests
* types: add Version to Header
* abci: add Version to Header
* state: add Version to State
* node: check software and state protocol versions match
* update changelog
* docs/spec: update for versions
* state: more tests
* remove TODOs
* remove empty test
Ref: #2563
I added IPC as an unencrypted alternative to SocketPV.
Besides I fixed the following aspects of SocketPV:
Added locking since we are operating on a single socket
The connection deadline is extended every time a successful packet exchange happens; otherwise the connection would always die permanently x seconds after the connection was established.
Added a ping/heartbeat mechanism to keep the connection alive; native TCP keepalives do not work in this use-case
* Extend the SecureConn socket to extend its deadline
* Add locking & ping/heartbeat packets to SocketPV
* Implement IPC PV and abstract socket signing
* Refactored IPC and SocketPV
* Implement @melekes comments
* Fixes to rebase
* p2p: NodeInfo is an interface
* (squash) fixes from review
* (squash) more fixes from review
* p2p: remove peerConn.HandshakeTimeout
* p2p: NodeInfo is two interfaces. Remove String()
* fixes from review
* remove test code from peer.RemoteIP()
* p2p: remove peer.OriginalAddr(). See #2618
* use a mockPeer in peer_set_test.go
* p2p: fix testNodeInfo naming
* p2p: remove unused var
* remove testRandNodeInfo
* fix linter
* fix retry dialing self
* fix rpc
* stop node upon receiving SIGTERM or CTRL-Ceven during genesis sleep by setting up interrupt before starting a node
Closes#2434
* call Start, not OnStart when starting a component to avoid:
```
E[09-24|10:13:15.805] Not stopping PubSub -- have not been started yet module=pubsub impl=PubSub
```
being printed on exit
This also refactors the prior mempool to filter to be known as
"precheck filter" and this new filter is called "postcheck filter"
This PR also fixes a bug where the precheck filter previously didn't
account for the amino overhead, which could a maliciously sized tx to
halt blocks from getting any txs in them.
* Move maxGas outside of function definition to avoid race condition
* Type filter funcs and make public
* Use helper method for post check
* Remove superfluous Filter suffix
* Move default pre/post checks into package
* Fix broken references
* Fix typos
* Expand on examples for checks
* follow up to removing some consensus params Refs #2382
* change args type to int64 in state#makeParams
* make valsCount and evidenceCount ints again
* MaxEvidenceBytesPerBlock: include magic number in godoc
* [spec] creating a proposal
* test state#TxFilter
* panic if MaxDataBytes is less than 0
* fixes after review
* use amino#UvarintSize to calculate overhead
0c74291f3b/encoder.go (L85-L90)
* avoid cyclic imports
* you can do better Go, come on
* remove testdouble package
Handshaker was removed from proxy package so it can be called
independently of starting the abci app connections and can return a
result to the caller.
We are swapping the exisiting listener implementation with the newly
introduced Transport and its default implementation MultiplexTransport,
removing a large chunk of old connection setup and handling scattered
over the Peer and Switch code. The Switch requires a Transport now and
handles externally passed Peer filters.
* remove ConsensusParams.TxSize and ConsensusParams.BlockGossip
Refs #2347
* block part size is now fixed
Refs #2347
* use max data size, not max bytes for tx limit
Refs #2347