Follow-up to feedback from #1286, this change simplifies the connection
handling in the SocketClient and makes the communication via TCP more
robust. It introduces the tcpTimeoutListener to encapsulate accept and
i/o timeout handling as well as connection keep-alive, this type could
likely be upgraded to handle more fine-grained tuning of the tcp stack
(linger, nodelay, etc.) according to the properties we desire. The same
methods should be applied to the RemoteSigner which will be overhauled
when the priv_val_server is fleshed out.
* require private key
* simplify connect logic
* break out conn upgrades to tcpTimeoutListener
* extend test coverage and simplify component setup
Follow-up to #1255 aligning with the expectation that the external
signing process connects to the node. The SocketClient will block on
start until one connection has been established, support for multiple
signers connected simultaneously is a planned future extension.
* SocketClient accepts connection
* PrivValSocketServer renamed to RemoteSigner
* extend tests
To achieve faster feedback cycles for our feature PRs this change
reduces the average buildtime from 35 to ~6min by utilising their new
2.0 offering based on docker and nomad. We make use of parallel build
steps wherever possible so that the duration is determined by the
slowest test suite (p2p).
This is an intermediate step until we move our CI/CD completely
on-premise for more control and added security.
* expose AuthEnc in the P2P config
if AuthEnc is true, dialed peers must have a node ID in the address and
it must match the persistent pubkey from the secret handshake.
Refs #1157
* fixes after my own review
* fix docs
* fix build failure
```
p2p/pex/pex_reactor_test.go:288:88: cannot use seed.NodeInfo().NetAddress() (type *p2p.NetAddress) as type string in array or slice literal
```
* p2p: introduce peerConn to simplify peer creation
* Introduce `peerConn` containing the known fields of `peer`
* `peer` only created in `sw.addPeer` once handshake is complete and NodeInfo is checked
* Eliminates some mutable variables and makes the code flow better
* Simplifies the `newXxxPeer` funcs
* Use ID instead of PubKey where possible.
* SetPubKeyFilter -> SetIDFilter
* nodeInfo.Validate takes ID
* remove peer.PubKey()
* persistent node ids
* fixes from review
* test: use ip_plus_id.sh more
* fix invalid memory panic during fast_sync test
```
2018-02-21T06:30:05Z box887.localdomain docker/local_testnet_4[14907]: panic: runtime error: invalid memory address or nil pointer dereference
2018-02-21T06:30:05Z box887.localdomain docker/local_testnet_4[14907]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x98dd3e]
2018-02-21T06:30:05Z box887.localdomain docker/local_testnet_4[14907]:
2018-02-21T06:30:05Z box887.localdomain docker/local_testnet_4[14907]: goroutine 3432 [running]:
2018-02-21T06:30:05Z box887.localdomain docker/local_testnet_4[14907]: github.com/tendermint/tendermint/p2p.newOutboundPeerConn(0xc423fd1380, 0xc420933e00, 0x1, 0x1239a60, 0
xc420128c40, 0x2, 0x42caf6, 0xc42001f300, 0xc422831d98, 0xc4227951c0, ...)
2018-02-21T06:30:05Z box887.localdomain docker/local_testnet_4[14907]: #011/go/src/github.com/tendermint/tendermint/p2p/peer.go:123 +0x31e
2018-02-21T06:30:05Z box887.localdomain docker/local_testnet_4[14907]: github.com/tendermint/tendermint/p2p.(*Switch).addOutboundPeerWithConfig(0xc4200ad040, 0xc423fd1380, 0
xc420933e00, 0xc423f48801, 0x28, 0x2)
2018-02-21T06:30:05Z box887.localdomain docker/local_testnet_4[14907]: #011/go/src/github.com/tendermint/tendermint/p2p/switch.go:455 +0x12b
2018-02-21T06:30:05Z box887.localdomain docker/local_testnet_4[14907]: github.com/tendermint/tendermint/p2p.(*Switch).DialPeerWithAddress(0xc4200ad040, 0xc423fd1380, 0x1, 0x
0, 0x0)
2018-02-21T06:30:05Z box887.localdomain docker/local_testnet_4[14907]: #011/go/src/github.com/tendermint/tendermint/p2p/switch.go:371 +0xdc
2018-02-21T06:30:05Z box887.localdomain docker/local_testnet_4[14907]: github.com/tendermint/tendermint/p2p.(*Switch).reconnectToPeer(0xc4200ad040, 0x123e000, 0xc42007bb00)
2018-02-21T06:30:05Z box887.localdomain docker/local_testnet_4[14907]: #011/go/src/github.com/tendermint/tendermint/p2p/switch.go:290 +0x25f
2018-02-21T06:30:05Z box887.localdomain docker/local_testnet_4[14907]: created by github.com/tendermint/tendermint/p2p.(*Switch).StopPeerForError
2018-02-21T06:30:05Z box887.localdomain docker/local_testnet_4[14907]: #011/go/src/github.com/tendermint/tendermint/p2p/switch.go:256 +0x1b7
```
Following ADDR 008 the node will connect to an external
process to handle signing requests. Operation of the external process is
left to the user.
* introduce alias for PrivValidator interface on socket client
* integrate socket client in node
* structure tests
* remove unnecessary flag
As calls to the private validator can involve side-effects like network
communication it is desirable for all methods returning an error to not
break the control flow of the caller.
* adjust PrivValidator interface
Fixes https://github.com/tendermint/tendermint/issues/1189
For every TxEventBuffer.Flush() invoking, we were invoking
a:
b.events = make([]EventDataTx, 0, b.capacity)
whose intention is to innocently clear the events slice but
maintain the underlying capacity.
However, unfortunately this is memory and garbage collection intensive
which is linear in the number of events added. If an attack had access
to our code somehow, invoking .Flush() in tight loops would be a sure
way to cause huge GC pressure, and say if they added about 1e9
events maliciously, every Flush() would take at least 3.2seconds
which is enough to now control our application.
The new using of the capacity preserving slice clearing idiom
takes a constant time regardless of the number of elements with zero
allocations so we are killing many birds with one stone i.e
b.events = b.events[:0]
For benchmarking results, please see
https://gist.github.com/odeke-em/532c14ab67d71c9c0b95518a7a526058
for a reference on how things can get out of hand easily.
if we call it after, we might receive a "fresh" transaction from
`broadcast_tx_sync` before old transactions (which were not
committed).
Refs #1091
```
Commit is called with a lock on the mempool, meaning no calls to CheckTx
can start. However, since CheckTx is called async in the mempool
connection, some CheckTx might have already "sailed", when the lock is
released in the mempool and Commit proceeds.
Then, that spurious CheckTx has not yet "begun" in the ABCI app (stuck
in transport?). Instead, ABCI app manages to start to process the
Commit. Next, the spurious, "sailed" CheckTx happens in the wrong place.
```
* Vulnerability in light client proxy
When calling GetCertifiedCommit the light client proxy would call
Certify and even on error return the Commit as if it had been correctly
certified.
Now it returns the error correctly and returns an empty Commit on error.
* Improve names for clarity
The lite package now contains StaticCertifier, DynamicCertifier and
InqueringCertifier. This also changes the method receivers from one
letter to two letter names, which will make future refactoring easier
and follows the coding standards.
* Fix test failures
* Rename files
* remove dead code