This issue is related to #3107
This is a first renaming/refactoring step before reworking and removing heartbeats.
As discussed with @Liamsi , we preferred to go for a couple of independent and separate PRs to simplify review work.
The changes:
Help to clarify the relation between the validator and remote signer endpoints
Differentiate between timeouts and deadlines
Prepare to encapsulate networking related code behind RemoteSigner in the next PR
My intention is to separate and encapsulate the "network related" code from the actual signer.
SignerRemote ---(uses/contains)--> SignerValidatorEndpoint <--(connects to)--> SignerServiceEndpoint ---> SignerService (future.. not here yet but would like to decouple too)
All reconnection/heartbeat/whatever code goes in the endpoints. Signer[Remote/Service] do not need to know about that.
I agree Endpoint may not be the perfect name. I tried to find something "Go-ish" enough. It is a common name in go-kit, kubernetes, etc.
Right now:
SignerValidatorEndpoint:
handles the listener
contains SignerRemote
Implements the PrivValidator interface
connects and sets a connection object in a contained SignerRemote
delegates PrivValidator some calls to SignerRemote which in turn uses the conn object that was set externally
SignerRemote:
Implements the PrivValidator interface
read/writes from a connection object directly
handles heartbeats
SignerServiceEndpoint:
Does most things in a single place
delegates to a PrivValidator IIRC.
* cleanup
* Refactoring step 1
* Refactoring step 2
* move messages to another file
* mark for future work / next steps
* mark deprecated classes in docs
* Fix linter problems
* additional linter fixes
* green pubsub tests :OK:
* get rid of clientToQueryMap
* Subscribe and SubscribeUnbuffered
* start adapting other pkgs to new pubsub
* nope
* rename MsgAndTags to Message
* remove TagMap
it does not bring any additional benefits
* bring back EventSubscriber
* fix test
* fix data race in TestStartNextHeightCorrectly
```
Write at 0x00c0001c7418 by goroutine 796:
github.com/tendermint/tendermint/consensus.TestStartNextHeightCorrectly()
/go/src/github.com/tendermint/tendermint/consensus/state_test.go:1296 +0xad
testing.tRunner()
/usr/local/go/src/testing/testing.go:827 +0x162
Previous read at 0x00c0001c7418 by goroutine 858:
github.com/tendermint/tendermint/consensus.(*ConsensusState).addVote()
/go/src/github.com/tendermint/tendermint/consensus/state.go:1631 +0x1366
github.com/tendermint/tendermint/consensus.(*ConsensusState).tryAddVote()
/go/src/github.com/tendermint/tendermint/consensus/state.go:1476 +0x8f
github.com/tendermint/tendermint/consensus.(*ConsensusState).handleMsg()
/go/src/github.com/tendermint/tendermint/consensus/state.go:667 +0xa1e
github.com/tendermint/tendermint/consensus.(*ConsensusState).receiveRoutine()
/go/src/github.com/tendermint/tendermint/consensus/state.go:628 +0x794
Goroutine 796 (running) created at:
testing.(*T).Run()
/usr/local/go/src/testing/testing.go:878 +0x659
testing.runTests.func1()
/usr/local/go/src/testing/testing.go:1119 +0xa8
testing.tRunner()
/usr/local/go/src/testing/testing.go:827 +0x162
testing.runTests()
/usr/local/go/src/testing/testing.go:1117 +0x4ee
testing.(*M).Run()
/usr/local/go/src/testing/testing.go:1034 +0x2ee
main.main()
_testmain.go:214 +0x332
Goroutine 858 (running) created at:
github.com/tendermint/tendermint/consensus.(*ConsensusState).startRoutines()
/go/src/github.com/tendermint/tendermint/consensus/state.go:334 +0x221
github.com/tendermint/tendermint/consensus.startTestRound()
/go/src/github.com/tendermint/tendermint/consensus/common_test.go:122 +0x63
github.com/tendermint/tendermint/consensus.TestStateFullRound1()
/go/src/github.com/tendermint/tendermint/consensus/state_test.go:255 +0x397
testing.tRunner()
/usr/local/go/src/testing/testing.go:827 +0x162
```
* fixes after my own review
* fix formatting
* wait 100ms before kicking a subscriber out
+ a test for indexer_service
* fixes after my second review
* no timeout
* add changelog entries
* fix merge conflicts
* fix typos after Thane's review
Co-Authored-By: melekes <anton.kalyaev@gmail.com>
* reformat code
* rewrite indexer service in the attempt to fix failing test
https://github.com/tendermint/tendermint/pull/3227/#issuecomment-462316527
* Revert "rewrite indexer service in the attempt to fix failing test"
This reverts commit 0d9107a098.
* another attempt to fix indexer
* fixes after Ethan's review
* use unbuffered channel when indexing transactions
Refs https://github.com/tendermint/tendermint/pull/3227#discussion_r258786716
* add a comment for EventBus#SubscribeUnbuffered
* format code
* changelog: use issue number instead of PR number
* follow up to #3291
- rpc/test/helpers.go add StopTendermint(node) func
- remove ensureDir(filepath.Dir(walFile), 0700)
- mempool/mempool_test.go add type cleanupFunc func()
* cmd/show_validator: wrap err to make it more clear
* improve ResetTestRootWithChainID() concurrency safety
Rely on ioutil.TempDir() to create test root directories and ensure
multiple same-chain id test cases can run in parallel.
* Update config/toml.go
Co-Authored-By: alessio <quadrispro@ubuntu.com>
* clean up test directories after completion
Closes: #1034
* Remove redundant EnsureDir call
* s/PanicSafety()/panic()/s
* Put create dir functionality back in ResetTestRootWithChainID
* Place test directories in OS's tempdir
In modern UNIX and UNIX-like systems /tmp is very often
mounted as tmpfs. This might speed test execution a bit.
* Set 0700 to a const
* rootsDirs -> configRootDirs
* Don't double remove directories
* Avoid global variables
* Fix consensus tests
* Reduce defer stack
* Address review comments
* Try to fix tests
* Update CHANGELOG_PENDING.md
Co-Authored-By: alessio <quadrispro@ubuntu.com>
* Update consensus/common_test.go
Co-Authored-By: alessio <quadrispro@ubuntu.com>
* Update consensus/common_test.go
Co-Authored-By: alessio <quadrispro@ubuntu.com>
* types.NewCommit
* use types.NewCommit everywhere
* fix log in unsafe_reset
* memoize height and round in constructor
* notes about deprecating toVote
* bring back memoizeHeightRound
* node: decrease retry conn timeout in test
Should fix#3256
The retry timeout was set to the default, which is the same as the
accept timeout, so it's no wonder this would fail. Here we decrease the
retry timeout so we can try many times before the accept timeout.
* p2p: increase handshake timeout in test
This fails sometimes, presumably because the handshake timeout is so low
(only 50ms). So increase it to 1s. Should fix#3187
* privval: fix race with ping. closes#3237
Pings happen in a go-routine and can happen concurrently with other
messages. Since we use a request/response protocol, we expect to send a
request and get back the corresponding response. But with pings
happening concurrently, this assumption could be violated. We were using
a mutex, but only a RWMutex, where the RLock was being held for sending
messages - this was to allow the underlying connection to be replaced if
it fails. Turns out we actually need to use a full lock (not just a read
lock) to prevent multiple requests from happening concurrently.
* node: fix test name. DelayedStop -> DelayedStart
* autofile: Wait() method
In the TestWALTruncate in consensus/wal_test.go we remove the WAL
directory at the end of the test. However the wal.Stop() does not
properly wait for the autofile group to finish shutting down. Hence it
was possible that the group's go-routine is still running when the
cleanup happens, which causes a panic since the directory disappeared.
Here we add a Wait() method to properly wait until the go-routine exits
so we can safely clean up. This fixes#2852.
* consensus: createProposalBlock function
* blockExecutor.CreateProposalBlock
- factored out of consensus pkg into a method on blockExec
- new private interfaces for mempool ("txNotifier") and evpool with one function each
- consensus tests still require more mempool methods
* failing test for CreateProposalBlock
* Fix bug in include evidece into block
* evidence: change maxBytes to maxSize
* MaxEvidencePerBlock
- changed to return both the max number and the max bytes
- preparation for #2590
* changelog
* fix linter
* Fix from review
Co-Authored-By: ebuchman <ethan@coinculture.info>
* Close and recreate a RemoteSigner on err
* Update changelog
* Address Anton's comments / suggestions:
- update changelog
- restart TCPVal
- shut down on `ErrUnexpectedResponse`
* re-init remote signer client with fresh connection if Ping fails
- add/update TODOs in secret connection
- rename tcp.go -> tcp_client.go, same with ipc to clarify their purpose
* account for `conn returned by waitConnection can be `nil`
- also add TODO about RemoteSigner conn field
* Tests for retrying: IPC / TCP
- shorter info log on success
- set conn and use it in tests to close conn
* Tests for retrying: IPC / TCP
- shorter info log on success
- set conn and use it in tests to close conn
- add rwmutex for conn field in IPC
* comments and doc.go
* fix ipc tests. fixes#2677
* use constants for tests
* cleanup some error statements
* fixes#2784, race in tests
* remove print statement
* minor fixes from review
* update comment on sts spec
* cosmetics
* p2p/conn: add failing tests
* p2p/conn: make SecretConnection thread safe
* changelog
* IPCVal signer refactor
- use a .reset() method
- don't use embedded RemoteSignerClient
- guard RemoteSignerClient with mutex
- drop the .conn
- expose Close() on RemoteSignerClient
* apply IPCVal refactor to TCPVal
* remove mtx from RemoteSignerClient
* consolidate IPCVal and TCPVal, fixes#3104
- done in tcp_client.go
- now called SocketVal
- takes a listener in the constructor
- make tcpListener and unixListener contain all the differences
* delete ipc files
* introduce unix and tcp dialer for RemoteSigner
* rename files
- drop tcp_ prefix
- rename priv_validator.go to file.go
* bring back listener options
* fix node
* fix priv_val_server
* fix node test
* minor cleanup and comments
* split immutable and mutable parts of priv_validator.json
* fix bugs
* minor changes
* retrig test
* delete scripts/wire2amino.go
* fix test
* fixes from review
* privval: remove mtx
* rearrange priv_validator.go
* upgrade path
* write tests for the upgrade
* fix for unsafe_reset_all
* add test
* add reset test
* Decouple StartHTTP{,AndTLS}Server from Listen()
This should help solve cosmos/cosmos-sdk#2715
* Fix small mistake
* Update StartGRPCServer
* s/rpc/rpcserver/
* Start grpccore.StartGRPCServer in a goroutine
* Reinstate l.Close()
* Fix rpc/lib/test/main.go
* Update code comment
* update changelog and comments
* fix tm-monitor. more comments
* fix amino overhead computation for Tx:
- also count the fieldnum / typ3
- add method to compute overhead per Tx
- slightly clarify comment on MaxAminoOverheadForBlock
- add tests
* fix TestReapMaxBytesMaxGas according to amino overhead
* fix TestMempoolFilters according to amino overhead
* address review comments:
- add a note about fieldNum = 1
- add forgotten godoc comment
* fix and use sm.TxPreCheck
* fix test
* remove print statement
* p2p: add protocol Version to NodeInfo
* update node pkg. remove extraneous version files
* update changelog and docs
* fix test
* p2p: Version -> ProtocolVersion; more ValidateBasic and tests
* types: add Version to Header
* abci: add Version to Header
* state: add Version to State
* node: check software and state protocol versions match
* update changelog
* docs/spec: update for versions
* state: more tests
* remove TODOs
* remove empty test
Ref: #2563
I added IPC as an unencrypted alternative to SocketPV.
Besides I fixed the following aspects of SocketPV:
Added locking since we are operating on a single socket
The connection deadline is extended every time a successful packet exchange happens; otherwise the connection would always die permanently x seconds after the connection was established.
Added a ping/heartbeat mechanism to keep the connection alive; native TCP keepalives do not work in this use-case
* Extend the SecureConn socket to extend its deadline
* Add locking & ping/heartbeat packets to SocketPV
* Implement IPC PV and abstract socket signing
* Refactored IPC and SocketPV
* Implement @melekes comments
* Fixes to rebase
* p2p: NodeInfo is an interface
* (squash) fixes from review
* (squash) more fixes from review
* p2p: remove peerConn.HandshakeTimeout
* p2p: NodeInfo is two interfaces. Remove String()
* fixes from review
* remove test code from peer.RemoteIP()
* p2p: remove peer.OriginalAddr(). See #2618
* use a mockPeer in peer_set_test.go
* p2p: fix testNodeInfo naming
* p2p: remove unused var
* remove testRandNodeInfo
* fix linter
* fix retry dialing self
* fix rpc
* stop node upon receiving SIGTERM or CTRL-Ceven during genesis sleep by setting up interrupt before starting a node
Closes#2434
* call Start, not OnStart when starting a component to avoid:
```
E[09-24|10:13:15.805] Not stopping PubSub -- have not been started yet module=pubsub impl=PubSub
```
being printed on exit
This also refactors the prior mempool to filter to be known as
"precheck filter" and this new filter is called "postcheck filter"
This PR also fixes a bug where the precheck filter previously didn't
account for the amino overhead, which could a maliciously sized tx to
halt blocks from getting any txs in them.
* Move maxGas outside of function definition to avoid race condition
* Type filter funcs and make public
* Use helper method for post check
* Remove superfluous Filter suffix
* Move default pre/post checks into package
* Fix broken references
* Fix typos
* Expand on examples for checks
* follow up to removing some consensus params Refs #2382
* change args type to int64 in state#makeParams
* make valsCount and evidenceCount ints again
* MaxEvidenceBytesPerBlock: include magic number in godoc
* [spec] creating a proposal
* test state#TxFilter
* panic if MaxDataBytes is less than 0
* fixes after review
* use amino#UvarintSize to calculate overhead
0c74291f3b/encoder.go (L85-L90)
* avoid cyclic imports
* you can do better Go, come on
* remove testdouble package
Handshaker was removed from proxy package so it can be called
independently of starting the abci app connections and can return a
result to the caller.
We are swapping the exisiting listener implementation with the newly
introduced Transport and its default implementation MultiplexTransport,
removing a large chunk of old connection setup and handling scattered
over the Peer and Switch code. The Switch requires a Transport now and
handles externally passed Peer filters.
* remove ConsensusParams.TxSize and ConsensusParams.BlockGossip
Refs #2347
* block part size is now fixed
Refs #2347
* use max data size, not max bytes for tx limit
Refs #2347