This PR is related to #3107 and a continuation of #3351
It is important to emphasise that in the privval original design, client/server and listening/dialing roles are inverted and do not follow a conventional interaction.
Given two hosts A and B:
Host A is listener/client
Host B is dialer/server (contains the secret key)
When A requires a signature, it needs to wait for B to dial in before it can issue a request.
A only accepts a single connection and any failure leads to dropping the connection and waiting for B to reconnect.
The original rationale behind this design was based on security.
Host B only allows outbound connections to a list of whitelisted hosts.
It is not possible to reach B unless B dials in. There are no listening/open ports in B.
This PR results in the following changes:
Refactors ping/heartbeat to avoid previously existing race conditions.
Separates transport (dialer/listener) from signing (client/server) concerns to simplify workflow.
Unifies and abstracts away the differences between unix and tcp sockets.
A single signer endpoint implementation unifies connection handling code (read/write/close/connection obj)
The signer request handler (server side) is customizable to increase testability.
Updates and extends unit tests
A high level overview of the classes is as follows:
Transport (endpoints): The following classes take care of establishing a connection
SignerDialerEndpoint
SignerListeningEndpoint
SignerEndpoint groups common functionality (read/write/timeouts/etc.)
Signing (client/server): The following classes take care of exchanging request/responses
SignerClient
SignerServer
This PR also closes#3601
Commits:
* refactoring - work in progress
* reworking unit tests
* Encapsulating and fixing unit tests
* Improve tests
* Clean up
* Fix/improve unit tests
* clean up tests
* Improving service endpoint
* fixing unit test
* fix linter issues
* avoid invalid cache values (improve later?)
* complete implementation
* wip
* improved connection loop
* Improve reconnections + fixing unit tests
* addressing comments
* small formatting changes
* clean up
* Update node/node.go
Co-Authored-By: jleni <juan.leni@zondax.ch>
* Update privval/signer_client.go
Co-Authored-By: jleni <juan.leni@zondax.ch>
* Update privval/signer_client_test.go
Co-Authored-By: jleni <juan.leni@zondax.ch>
* check during initialization
* dropping connecting when writing fails
* removing break
* use t.log instead
* unifying and using cmn.GetFreePort()
* review fixes
* reordering and unifying drop connection
* closing instead of signalling
* refactored service loop
* removed superfluous brackets
* GetPubKey can return errors
* Revert "GetPubKey can return errors"
This reverts commit 68c06f19b4.
* adding entry to changelog
* Update CHANGELOG_PENDING.md
Co-Authored-By: jleni <juan.leni@zondax.ch>
* Update privval/signer_client.go
Co-Authored-By: jleni <juan.leni@zondax.ch>
* Update privval/signer_dialer_endpoint.go
Co-Authored-By: jleni <juan.leni@zondax.ch>
* Update privval/signer_dialer_endpoint.go
Co-Authored-By: jleni <juan.leni@zondax.ch>
* Update privval/signer_dialer_endpoint.go
Co-Authored-By: jleni <juan.leni@zondax.ch>
* Update privval/signer_dialer_endpoint.go
Co-Authored-By: jleni <juan.leni@zondax.ch>
* Update privval/signer_listener_endpoint_test.go
Co-Authored-By: jleni <juan.leni@zondax.ch>
* updating node.go
* review fixes
* fixes linter
* fixing unit test
* small fixes in comments
* addressing review comments
* addressing review comments 2
* reverting suggestion
* Update privval/signer_client_test.go
Co-Authored-By: Anton Kaliaev <anton.kalyaev@gmail.com>
* Update privval/signer_client_test.go
Co-Authored-By: Anton Kaliaev <anton.kalyaev@gmail.com>
* Update privval/signer_listener_endpoint_test.go
Co-Authored-By: Anton Kaliaev <anton.kalyaev@gmail.com>
* do not expose brokenSignerDialerEndpoint
* clean up logging
* unifying methods
shorten test time
signer also drops
* reenabling pings
* improving testability + unit test
* fixing go fmt + unit test
* remove unused code
* Addressing review comments
* simplifying connection workflow
* fix linter/go import issue
* using base service quit
* updating comment
* Simplifying design + adjusting names
* fixing linter issues
* refactoring test harness + fixes
* Addressing review comments
* cleaning up
* adding additional error check
This issue is related to #3107
This is a first renaming/refactoring step before reworking and removing heartbeats.
As discussed with @Liamsi , we preferred to go for a couple of independent and separate PRs to simplify review work.
The changes:
Help to clarify the relation between the validator and remote signer endpoints
Differentiate between timeouts and deadlines
Prepare to encapsulate networking related code behind RemoteSigner in the next PR
My intention is to separate and encapsulate the "network related" code from the actual signer.
SignerRemote ---(uses/contains)--> SignerValidatorEndpoint <--(connects to)--> SignerServiceEndpoint ---> SignerService (future.. not here yet but would like to decouple too)
All reconnection/heartbeat/whatever code goes in the endpoints. Signer[Remote/Service] do not need to know about that.
I agree Endpoint may not be the perfect name. I tried to find something "Go-ish" enough. It is a common name in go-kit, kubernetes, etc.
Right now:
SignerValidatorEndpoint:
handles the listener
contains SignerRemote
Implements the PrivValidator interface
connects and sets a connection object in a contained SignerRemote
delegates PrivValidator some calls to SignerRemote which in turn uses the conn object that was set externally
SignerRemote:
Implements the PrivValidator interface
read/writes from a connection object directly
handles heartbeats
SignerServiceEndpoint:
Does most things in a single place
delegates to a PrivValidator IIRC.
* cleanup
* Refactoring step 1
* Refactoring step 2
* move messages to another file
* mark for future work / next steps
* mark deprecated classes in docs
* Fix linter problems
* additional linter fixes
* Adds a random suffix to temporary Unix sockets during testing
* Adds Unix domain socket tests for client (FAILING)
This adds Unix domain socket tests for the privval client. Right now,
one of the tests (TestRemoteSignerRetry) fails, probably because the
Unix domain socket state is known instantaneously on both sides by the
OS. Committing this to collaborate on the error.
* Removes extraneous logging
* Completes testing of Unix sockets client version
This completes the testing of the client connecting via Unix sockets.
There are two specific tests (TestSocketPVDeadline and
TestRemoteSignerRetryTCPOnly) that are only relevant to TCP connections.
* Renames test to show TCP-specificity
* Adds testing into closures for consistency (forgot previously)
* Moves test specific to RemoteSigner into own file
As per discussion on #3132, `TestRemoteSignerRetryTCPOnly` doesn't
really belong with the client tests. This moves it into its own file
related to the `RemoteSigner` class.
* Close and recreate a RemoteSigner on err
* Update changelog
* Address Anton's comments / suggestions:
- update changelog
- restart TCPVal
- shut down on `ErrUnexpectedResponse`
* re-init remote signer client with fresh connection if Ping fails
- add/update TODOs in secret connection
- rename tcp.go -> tcp_client.go, same with ipc to clarify their purpose
* account for `conn returned by waitConnection can be `nil`
- also add TODO about RemoteSigner conn field
* Tests for retrying: IPC / TCP
- shorter info log on success
- set conn and use it in tests to close conn
* Tests for retrying: IPC / TCP
- shorter info log on success
- set conn and use it in tests to close conn
- add rwmutex for conn field in IPC
* comments and doc.go
* fix ipc tests. fixes#2677
* use constants for tests
* cleanup some error statements
* fixes#2784, race in tests
* remove print statement
* minor fixes from review
* update comment on sts spec
* cosmetics
* p2p/conn: add failing tests
* p2p/conn: make SecretConnection thread safe
* changelog
* IPCVal signer refactor
- use a .reset() method
- don't use embedded RemoteSignerClient
- guard RemoteSignerClient with mutex
- drop the .conn
- expose Close() on RemoteSignerClient
* apply IPCVal refactor to TCPVal
* remove mtx from RemoteSignerClient
* consolidate IPCVal and TCPVal, fixes#3104
- done in tcp_client.go
- now called SocketVal
- takes a listener in the constructor
- make tcpListener and unixListener contain all the differences
* delete ipc files
* introduce unix and tcp dialer for RemoteSigner
* rename files
- drop tcp_ prefix
- rename priv_validator.go to file.go
* bring back listener options
* fix node
* fix priv_val_server
* fix node test
* minor cleanup and comments
Ref: #2563
I added IPC as an unencrypted alternative to SocketPV.
Besides I fixed the following aspects of SocketPV:
Added locking since we are operating on a single socket
The connection deadline is extended every time a successful packet exchange happens; otherwise the connection would always die permanently x seconds after the connection was established.
Added a ping/heartbeat mechanism to keep the connection alive; native TCP keepalives do not work in this use-case
* Extend the SecureConn socket to extend its deadline
* Add locking & ping/heartbeat packets to SocketPV
* Implement IPC PV and abstract socket signing
* Refactored IPC and SocketPV
* Implement @melekes comments
* Fixes to rebase
* switch to amino for SignBytes and add Reply with error message
- currently only Vote is done
* switch Reply type in socket for other messages
- add error description on error
* add TODOs regarding error handling
* address comments from peer review session (thx @xla)
- contains all changes besides the test-coverage / error'ing branches
* increase test coverage:
- add tests for each newly introduced error'ing code path
* return error if received wrong response
* add test for wrong response branches (ErrUnexpectedResponse)
* update CHANGELOG_PENDING and related documentation (spec)
* fix typo: s/CanonicallockID/CanonicalBlockID
* fixes from review
Currently the top level directory contains basically all of the code
for the crypto package. This PR moves the crypto code into submodules
in a similar manner to what `golang/x/crypto` does. This improves code
organization.
Ref discussion: https://github.com/tendermint/tendermint/pull/1966Closes#1956
This is a maintenance change to move the private validator package out
of the types and to a top-level location. There is no good reason to
keep it under the types and it will more clearly coommunicate where
additions related to the privval belong. It leaves the interface and the
mock in types for now as it would introduce circular dependency between
privval and types, this should be resolved eventually.
* mv priv_validator to privval pkg
* use consistent `privval` as import
Follow-up to #1255
Follow-up to feedback from #1286, this change simplifies the connection
handling in the SocketClient and makes the communication via TCP more
robust. It introduces the tcpTimeoutListener to encapsulate accept and
i/o timeout handling as well as connection keep-alive, this type could
likely be upgraded to handle more fine-grained tuning of the tcp stack
(linger, nodelay, etc.) according to the properties we desire. The same
methods should be applied to the RemoteSigner which will be overhauled
when the priv_val_server is fleshed out.
* require private key
* simplify connect logic
* break out conn upgrades to tcpTimeoutListener
* extend test coverage and simplify component setup
Follow-up to #1255 aligning with the expectation that the external
signing process connects to the node. The SocketClient will block on
start until one connection has been established, support for multiple
signers connected simultaneously is a planned future extension.
* SocketClient accepts connection
* PrivValSocketServer renamed to RemoteSigner
* extend tests
To achieve faster feedback cycles for our feature PRs this change
reduces the average buildtime from 35 to ~6min by utilising their new
2.0 offering based on docker and nomad. We make use of parallel build
steps wherever possible so that the duration is determined by the
slowest test suite (p2p).
This is an intermediate step until we move our CI/CD completely
on-premise for more control and added security.
Following ADDR 008 the node will connect to an external
process to handle signing requests. Operation of the external process is
left to the user.
* introduce alias for PrivValidator interface on socket client
* integrate socket client in node
* structure tests
* remove unnecessary flag
As calls to the private validator can involve side-effects like network
communication it is desirable for all methods returning an error to not
break the control flow of the caller.
* adjust PrivValidator interface