Earlier this week somebody posted this in GoS Riot chat:
```
E[2019-02-12|10:38:37.596] Corrupted entry. Skipping... module=consensus wal=/home/gaia/.gaiad/data/cs.wal/wal err="DataCorruptionError[length 878916964 exceeded maximum possible value of 1048576 bytes]"
E[2019-02-12|10:38:37.596] Corrupted entry. Skipping... module=consensus wal=/home/gaia/.gaiad/data/cs.wal/wal err="DataCorruptionError[length 825701731 exceeded maximum possible value of 1048576 bytes]"
E[2019-02-12|10:38:37.596] Corrupted entry. Skipping... module=consensus wal=/home/gaia/.gaiad/data/cs.wal/wal err="DataCorruptionError[length 1631073634 exceeded maximum possible value of 1048576 bytes]"
E[2019-02-12|10:38:37.596] Corrupted entry. Skipping... module=consensus wal=/home/gaia/.gaiad/data/cs.wal/wal err="DataCorruptionError[length 912418148 exceeded maximum possible value of 1048576 bytes]"
E[2019-02-12|10:38:37.600] Corrupted entry. Skipping... module=consensus wal=/home/gaia/.gaiad/data/cs.wal/wal err="DataCorruptionError[failed to read data: EOF]"
E[2019-02-12|10:38:37.600] Error on catchup replay. Proceeding to start ConsensusState anyway module=consensus err="Cannot replay height 7242. WAL does not contain #ENDHEIGHT for 7241"
E[2019-02-12|10:38:37.861] Error dialing peer module=p2p err="dial tcp 35.183.126.181:26656: i/o timeout
```
Note the length error messages. What has happened is the length field got corrupted probably. I've looked at the code and noticed that we don't check the msg size during encoding. This PR fixes that. It also improves a few error messages in WALDecoder.
* rpc/net_info: change RemoteIP type from net.IP to String
Before:
"AAAAAAAAAAAAAP//rB8ktw=="
which is amino-encoded net.IP byte slice
After:
"192.0.2.1"
Fixes#3251
* rpc/net_info: non-empty response in docs
* not related to linter: remove obsolete constants:
- `Insecure` and `Secure` and type `Security` are not used anywhere
* not related to linter: update example
- NewInsecure was deleted; change example to NewRemoteDB
* address: Binds to all network interfaces (gosec):
- bind to localhost instead of 0.0.0.0
- regenerate test key and cert for this purpose (was valid for ::) and
otherwise we would see:
transport: authentication handshake failed: x509: certificate is
valid for ::, not 127.0.0.1\"
(used https://github.com/google/keytransparency/blob/master/scripts/gen_server_keys.sh
to regenerate certs)
* use sha256 in tests instead of md5; time difference is negligible
* nolint usage of math/rand in test and add comment on its import
- crypto/rand is slower and we do not need sth more secure in tests
* enable linter in circle-ci
* another nolint math/rand in test
* replace another occurrence of md5
* consistent comment about importing math/rand
* types.NewCommit
* use types.NewCommit everywhere
* fix log in unsafe_reset
* memoize height and round in constructor
* notes about deprecating toVote
* bring back memoizeHeightRound
* evidence: NewEvidencePool takes evidenceDB
* evidence: failing TestStoreCommitDuplicate
tendermint/security#35
* GetEvidence -> GetEvidenceInfo
* fix TestStoreCommitDuplicate
* comment in VerifyEvidence
* add check if evidence was already seen
- modify EventPool interface (EventStore is not known in ApplyBlock):
- add IsCommitted method to iface
- add test
* update changelog
* fix TestStoreMark:
- priority in evidence info gets reset to zero after evidence gets committed
* review comments: simplify EvidencePool.IsCommitted
- delete obsolete EvidenceStore.IsCommitted
* add simple test for IsCommitted
* update changelog: this is actually breaking (PR number still missing)
* fix TestStoreMark:
- priority in evidence info gets reset to zero after evidence gets
committed
* review suggestion: simplify return
* Initial commit for 3181..still early
* unit test updates
* unit test updates
* fix check of dups accross updates and deletes
* simplify the processChange() func
* added overflow check utest
* Added checks for empty valset, new utest
* deepcopy changes in processUpdate()
* moved to new API, fixed tests
* test cleanup
* address review comments
* make sure votePower > 0
* gofmt fixes
* handle duplicates and invalid values
* more work on tests, review comments
* Renamed and explained K
* make TestVal private
* split verifyUpdatesAndComputeNewPriorities.., added check for deletes
* return error if validator set is empty after processing changes
* address review comments
* lint err
* Fixed the total voting power and added comments
* fix lint
* fix lint
* failing test
* fix infinite loop in addrbook
There are cases where we only have a small number of addresses marked
good ("old"), but the selection mechanism keeps trying to select more of these
addresses, and hence ends up in an infinite loop. Here we fix this to
only try and select such "old" addresses if we have enough of them. Note this
means, if we don't have enough of them, we may return more "new"
addresses than otherwise expected by the newSelectionBias.
This whole GetSelectionWithBias method probably needs to be rewritten,
but this is a quick fix for the issue.
* changelog
* fix infinite loop if not enough new addrs
* fix another potential infinite loop
if a.nNew == 0 -> pickFromOldBucket=true, but we don't have enough items
(a.nOld > len(oldBucketToAddrsMap) false)
* Revert "fix another potential infinite loop"
This reverts commit 146540c112.
* check num addresses instead of buckets, new test
* fixed the int division
* add slack to bias % in test, lint fixes
* Added checks for selection content in test
* test cleanup
* Apply suggestions from code review
Co-Authored-By: ebuchman <ethan@coinculture.info>
* address review comments
* change after Anton's review comments
* use the same docker image we use for testing
when building a binary for localnet
* switch back to circleci classic
* more review comments
* more review comments
* refactor addrbook_test
* build linux binary inside docker
in attempt to fix
```
--> Running dep
+ make build-linux
GOOS=linux GOARCH=amd64 make build
make[1]: Entering directory `/home/circleci/.go_workspace/src/github.com/tendermint/tendermint'
CGO_ENABLED=0 go build -ldflags "-X github.com/tendermint/tendermint/version.GitCommit=`git rev-parse --short=8 HEAD`" -tags 'tendermint' -o build/tendermint ./cmd/tendermint/
p2p/pex/addrbook.go:373:13: undefined: math.Round
```
* change dir from /usr to /go
* use concrete Go version for localnet binary
* check for nil addresses just to be sure
* WIP: Starts adding remote signer test harness
This commit adds a new command to Tendermint to allow for us to build a
standalone binary to test remote signers such as KMS
(https://github.com/tendermint/kms).
Right now, all it does is test that the local public key matches the
public key reported by the client, and fails at the point where it
attempts to get the client to sign a proposal.
* Fixes typo
* Fixes proposal validation test
This commit fixes the proposal validation test as per #3149. It also
moves the test harness into its own internal package to isolate its
exports from the `privval` package.
* Adds vote signing validation
* Applying recommendations from #3149
* Adds function descriptions for test harness
* Adds ability to ask remote signer to shut down
Prior to this commit, the remote signer needs to manually be shut down,
which is not ideal for automated testing. This commit allows us to send
a poison pill message to the KMS to let it shut down gracefully once
testing is done (whether the tests pass or fail).
* Adds tests for remote signer test harness
This commit makes some minor modifications to a few files to allow for
testing of the remote signer test harness. Two tests are added here:
checking for a fully successful (the ideal) case, and for the case where
the maximum number of retries has been reached when attempting to accept
incoming connections from the remote signer.
* Condenses serialization of proposals and votes using existing Tendermint functions
* Removes now-unnecessary amino import and codec
* Adds error message for vote signing failure
* Adds key extraction command for integration test
Took the code from here:
https://gist.github.com/Liamsi/a80993f24bff574bbfdbbfa9efa84bc7 to
create a simple utility command to extract a key from a local Tendermint
validator for use in KMS integration testing.
* Makes path expansion success non-compulsory
* Fixes segfault on SIGTERM
We need an additional variable to keep track of whether we're
successfully connected, otherwise hitting Ctrl+Break during execution
causes a segmentation fault. This now allows for a clean shutdown.
* Consolidates shutdown checks
* Adds comments indicating codes for easy lookup
* Adds Docker build for remote signer harness
Updates the `DOCKER/build.sh` and `DOCKER/push.sh` files to allow one to
override the image name and Dockerfile using environment variables.
Updates the primary `Makefile` as well as the `DOCKER/Makefile` to allow
for building the `remote_val_harness` Docker image.
* Adds build_remote_val_harness_docker_image to .PHONY
* Removes remote signer poison pill messaging functionality
* Reduces fluff code in command line parsing
As per
https://github.com/tendermint/tendermint/pull/3149#pullrequestreview-196171788,
this reduces the amount of fluff code in the PR down to the bare
minimum.
* Fixes ordering of error check and info log
* Moves remove_val_harness cmd into tools folder
It seems to make sense to rather keep the remote signer test harness in
its own tool folder (now rather named `tm-signer-harness` to keep with
the tool naming convention). It is actually a separate tool, not meant
to be one of the core binaries, but supplementary and supportive.
* Updates documentation for tm-signer-harness
* Refactors flag parsing to be more compact and less redundant
* Adds version sub-command help
* Removes extraneous flags parsing
* Adds CHANGELOG_PENDING entry for tm-signer-harness
* Improves test coverage
Adds a few extra parameters to the `MockPV` type to fake broken vote and
proposal signing. Also adds some more tests for the test harness so as
to increase coverage for failed cases.
* Fixes formatting for CHANGELOG_PENDING.md
* Fix formatting for documentation config
* Point users towards official Tendermint docs for tools documentation
* Point users towards official Tendermint docs for tm-signer-harness
* Remove extraneous constant
* Rename TestHarness.sc to TestHarness.spv for naming consistency
* Refactor to remove redundant goroutine
* Refactor conditional to cleaner switch statement and better error handling for listener protocol
* Remove extraneous goroutine
* Add note about installing tmkms via Cargo
* Fix typo in naming of output signing key
* Add note about where to find chain ID
* Replace /home/user with ~/ for brevity
* Fixes "signer.key" typo
* Minor edits for clarification for tm-signer-harness bulid/setup process
* p2p/conn: fix deadlock in FlushStop/OnStop
* makefile: set_with_deadlock
* close doneSendRoutine at end of sendRoutine
* conn: initialize channs in OnStart
* node: decrease retry conn timeout in test
Should fix#3256
The retry timeout was set to the default, which is the same as the
accept timeout, so it's no wonder this would fail. Here we decrease the
retry timeout so we can try many times before the accept timeout.
* p2p: increase handshake timeout in test
This fails sometimes, presumably because the handshake timeout is so low
(only 50ms). So increase it to 1s. Should fix#3187
* privval: fix race with ping. closes#3237
Pings happen in a go-routine and can happen concurrently with other
messages. Since we use a request/response protocol, we expect to send a
request and get back the corresponding response. But with pings
happening concurrently, this assumption could be violated. We were using
a mutex, but only a RWMutex, where the RLock was being held for sending
messages - this was to allow the underlying connection to be replaced if
it fails. Turns out we actually need to use a full lock (not just a read
lock) to prevent multiple requests from happening concurrently.
* node: fix test name. DelayedStop -> DelayedStart
* autofile: Wait() method
In the TestWALTruncate in consensus/wal_test.go we remove the WAL
directory at the end of the test. However the wal.Stop() does not
properly wait for the autofile group to finish shutting down. Hence it
was possible that the group's go-routine is still running when the
cleanup happens, which causes a panic since the directory disappeared.
Here we add a Wait() method to properly wait until the go-routine exits
so we can safely clean up. This fixes#2852.
* types: memoize height/round in commit instead of first vote
* types: commit.ValidateBasic in VerifyCommit
* types: new CommitSig alias for Vote
In preparation for reducing the redundancy in Commits, we introduce the
CommitSig as an alias for Vote. This is non-breaking on the protocol,
and minor breaking on the Go API, as Commit now contains a list of
CommitSig instead of Vote.
* remove dependence on ToVote
* update some comments
* fix tests
* fix tests
* fixes from review
* switch from fork (tendermint/btcd) to orig package (btcsuite/btcd); also
- remove obsolete check in test `size != -1` is always true
- WIP as the serialization still needs to be wrapped
* WIP: wrap signature & privkey, pubkey needs to be wrapped as well
* wrap pubkey too
* use "github.com/ethereum/go-ethereum/crypto/secp256k1" if cgo is
available, else use "github.com/btcsuite/btcd/btcec" and take care of
lower-S when verifying
Annoyingly, had to disable pruning when importing
github.com/ethereum/go-ethereum/ :-/
* update comment
* update comment
* emulate signature_nocgo.go for additional benchmarks:
592bf6a59c/crypto/signature_nocgo.go (L60-L76)
* use our format (r || s) in lower-s form when in the non-cgo case
* remove comment about using the C library directly
* vendor github.com/btcsuite/btcd too
* Add test for the !cgo case
* update changelog pending
Closes#3162#3163
Refs #1958, #2091, tendermint/btcd#1
> Implement a check for the blacklisted low order points, ala the X25519 has_small_order() function in libsodium
(#3010 (comment))
resolves first half of #3010