* New lint version upgrade
- linter was upgraded
Signed-off-by: Marko Baricevic <marbar3778@yahoo.com>
* enable-a;; is deprecated
* minor change
* another try
* some more changes
* some more changes
* reenable prealloc
* add version till bot is fixed
* Pin range scope vars
* Don't disable scopelint
This PR repairs linter errors seen when running the following commands:
golangci-lint run --no-config --disable-all=true --enable=scopelint
Contributes to #3262
* cs: check for SkipTimeoutCommit or wait timeout in handleTxsAvailable
Previously, if create_empty_blocks was set to false, TM sometimes could
create 2 consecutive blocks without actually waiting for timeoutCommit
(1 sec. by default).
```
I[2019-08-29|07:32:43.874] Executed block module=state height=47 validTxs=10 invalidTxs=0
I[2019-08-29|07:32:43.876] Committed state module=state height=47 txs=10 appHash=F88C010000000000
I[2019-08-29|07:32:44.885] Executed block module=state height=48 validTxs=2 invalidTxs=0
I[2019-08-29|07:32:44.887] Committed state module=state height=48 txs=2 appHash=FC8C010000000000
I[2019-08-29|07:32:44.908] Executed block module=state height=49 validTxs=8 invalidTxs=0
I[2019-08-29|07:32:44.909] Committed state module=state height=49 txs=8 appHash=8C8D010000000000
I[2019-08-29|07:32:45.886] Executed block module=state height=50 validTxs=2 invalidTxs=0
I[2019-08-29|07:32:45.895] Committed state module=state height=50 txs=2 appHash=908D010000000000
I[2019-08-29|07:32:45.908] Executed block module=state height=51 validTxs=8 invalidTxs=0
I[2019-08-29|07:32:45.909] Committed state module=state height=51 txs=8 appHash=A08D010000000000
```
This commit fixes that by adding a check to handleTxsAvailable.
Fixes#3908
* update changelog
* schedule timeoutCommit if StartTime is in the future
or SkipTimeoutCommit=true && we DON'T have all the votes
* fix TestReactorCreatesBlockWhenEmptyBlocksFalse
by checking if we have LastCommit or not
* address Ethan's comments
* Remove unnecessary type conversions
* Consolidate repeated strings into consts
* Clothe return statements
* Update blockchain/v1/reactor_fsm_test.go
Co-Authored-By: Anton Kaliaev <anton.kalyaev@gmail.com>
This PR repairs linter errors seen when running the following commands:
golangci-lint run --no-config --disable-all=true --enable=unconvert
golangci-lint run --no-config --disable-all=true --enable=goconst
golangci-lint run --no-config --disable-all=true --enable=nakedret
Contributes to #3262
* Add test for bcBlockRequestMessage ValidateBasic method
* Add test for bcNoBlockResponseMessage ValidateBasic method
* Add test for bcStatusRequestMessage ValidateBasic method
* Add test for bcStatusResponseMessage ValidateBasic method
* Add blockchain v1 reactor ValidateBasic tests
* Add test for NewRoundStepMessage ValidateBasic method
* Add test for NewValidBlockMessage ValidateBasic method
* Test BlockParts Size
* Import cmn package
* Add test for ProposalPOLMessage ValidateBasic method
* Add test for BlockPartMessage ValidateBasic method
* Add test for HasVoteMessage ValidateBasic method
* Add test for VoteSetMaj23Message ValidateBasic method
* Add test for VoteSetBitsMessage ValidateBasic method
* Fix linter errors
* Improve readability
* Add test for BaseConfig ValidateBasic method
* Add test for RPCConfig ValidateBasic method
* Add test for P2PConfig ValidateBasic method
* Add test for MempoolConfig ValidateBasic method
* Add test for FastSyncConfig ValidateBasic method
* Add test for ConsensusConfig ValidateBasic method
* Add test for InstrumentationConfig ValidateBasic method
* Add test for BlockID ValidateBasic method
* Add test for SignedHeader ValidateBasic method
* Add test for MockGoodEvidence and MockBadEvidence ValidateBasic methods
* Remove debug logging
Co-Authored-By: Marko <marbar3778@yahoo.com>
* Update MempoolConfig field
* Test a single struct field at a time, for maintainability
Fixes#2740
Add gocritic as a linter
The linting is not complete, but should i complete in this PR or in a following.
23 files have been touched so it may be better to do in a following PR
Commits:
* Add gocritic to linting
- Added gocritic to linting
Signed-off-by: Marko Baricevic <marbar3778@yahoo.com>
* gocritic
* pr comments
* remove switch in cmdBatch
cleanup to add linter
grpc change:
https://godoc.org/google.golang.org/grpc#WithContextDialerhttps://godoc.org/google.golang.org/grpc#WithDialer
grpc/grpc-go#2627
prometheous change:
due to UninstrumentedHandler, being deprecated in the future
empty branch = empty if or else statement
didn't delete them entirely but commented
couldn't find a reason to have them
could not replicate the issue #3406
but if want to keep it commented then we should comment out the if statement as well
* Renamed wire.go to codec.go
- Wire was the previous name of amino
- Codec describes the file better than `wire` & `amino`
Signed-off-by: Marko Baricevic <marbar3778@yahoo.com>
* ide error
* rename amino.go to codec.go
* go routines in blockchain reactor
* Added reference to the go routine diagram
* Initial commit
* cleanup
* Undo testing_logger change, committed by mistake
* Fix the test loggers
* pulled some fsm code into pool.go
* added pool tests
* changes to the design
added block requests under peer
moved the request trigger in the reactor poolRoutine, triggered now by a ticker
in general moved everything required for making block requests smarter in the poolRoutine
added a simple map of heights to keep track of what will need to be requested next
added a few more tests
* send errors to FSM in a different channel than blocks
send errors (RemovePeer) from switch on a different channel than the
one receiving blocks
renamed channels
added more pool tests
* more pool tests
* lint errors
* more tests
* more tests
* switch fast sync to new implementation
* fixed data race in tests
* cleanup
* finished fsm tests
* address golangci comments :)
* address golangci comments :)
* Added timeout on next block needed to advance
* updating docs and cleanup
* fix issue in test from previous cleanup
* cleanup
* Added termination scenarios, tests and more cleanup
* small fixes to adr, comments and cleanup
* Fix bug in sendRequest()
If we tried to send a request to a peer not present in the switch, a
missing continue statement caused the request to be blackholed in a peer
that was removed and never retried.
While this bug was manifesting, the reactor kept asking for other
blocks that would be stored and never consumed. Added the number of
unconsumed blocks in the math for requesting blocks ahead of current
processing height so eventually there will be no more blocks requested
until the already received ones are consumed.
* remove bpPeer's didTimeout field
* Use distinct err codes for peer timeout and FSM timeouts
* Don't allow peers to update with lower height
* review comments from Ethan and Zarko
* some cleanup, renaming, comments
* Move block execution in separate goroutine
* Remove pool's numPending
* review comments
* fix lint, remove old blockchain reactor and duplicates in fsm tests
* small reorg around peer after review comments
* add the reactor spec
* verify block only once
* review comments
* change to int for max number of pending requests
* cleanup and godoc
* Add configuration flag fast sync version
* golangci fixes
* fix config template
* move both reactor versions under blockchain
* cleanup, golint, renaming stuff
* updated documentation, fixed more golint warnings
* integrate with behavior package
* sync with master
* gofmt
* add changelog_pending entry
* move to improvments
* suggestion to changelog entry
* Remove db from tendemrint in favor of tendermint/tm-cmn
- remove db from `libs`
- update dependancy, there have been no breaking changes in the updated deps
- https://github.com/grpc/grpc-go/releases
- https://github.com/golang/protobuf/releases
Signed-off-by: Marko Baricevic <marbar3778@yahoo.com>
* changelog add
* gofmt
* more gofmt
As per #2127, this refactors the RequestCheckTx ProtoBuf struct to allow for a flag indicating whether a query is a recheck or not (and allows for possible future, more nuanced states).
In order to pass this extended information through to the ABCI app, the proxy.AppConnMempool (and, for consistency, the proxy.AppConnConsensus) interface seems to need to be refactored along with abcicli.Client.
And, as per this comment, I've made the following modification to the protobuf definition for the RequestCheckTx structure:
enum CheckTxType {
New = 0;
Recheck = 1;
}
message RequestCheckTx {
bytes tx = 1;
CheckTxType type = 2;
}
* Refactor ABCI CheckTx to notify of recheck
As per #2127, this refactors the `RequestCheckTx` ProtoBuf struct to allow for:
1. a flag indicating whether a query is a recheck or not (and allows for
possible future, more nuanced states)
2. an `additional_data` bytes array to provide information for those more
nuanced states.
In order to pass this extended information through to the ABCI app, the
`proxy.AppConnMempool` (and, for consistency, the
`proxy.AppConnConsensus`) interface seems to need to be refactored.
Commits:
* Fix linting issue
* Add CHANGELOG_PENDING entry
* Remove extraneous explicit initialization
* Update ABCI spec doc to include new CheckTx params
* Rename method param for consistency
* Rename CheckTxType enum values and remove additional_data param
* Refactor signature of Application.CheckTx
* Refactor signature of Application.DeliverTx
* Refactor example variable names for clarity and consistency
* Rename method variables for consistency
* Rename method variables for consistency
* add a changelog entry
* update docs
* fix peer state init to late
Peer does not have a state yet. We set it in AddPeer.
We need an new interface before mconnection is started
* pex message to soon
fix reconnection pex send too fast,
error is caused lastReceivedRequests is still
not deleted when a peer reconnected
* add test case for initpeer
* add prove case
* remove potentially infinite loop
* Update consensus/reactor.go
Co-Authored-By: guagualvcha <baifudong@lancai.cn>
* Update consensus/reactor_test.go
Co-Authored-By: guagualvcha <baifudong@lancai.cn>
* document Reactor interface better
* refactor TestReactorReceiveDoesNotPanicIfAddPeerHasntBeenCalledYet
* fix merge conflicts
* blockchain: remove peer's ID from the pool in InitPeer
Refs #3338
* pex: resetPeersRequestsInfo both upon InitPeer and RemovePeer
* ensure RemovePeer is always called before InitPeer
by removing the peer from the switch last (after we've stopped it and
removed from all reactors)
* add some comments for ConsensusReactor#InitPeer
* fix pex reactor
* format code
* fix spelling
* update changelog
* remove unused methods
* do not clear lastReceivedRequests upon error
only in RemovePeer
* call InitPeer before we start the peer!
* add a comment to InitPeer
* write a test
* use waitUntilSwitchHasAtLeastNPeers func
* bring back timeouts
* Test to ensure Receive panics if InitPeer has not been called
* VoteSignBytes builds CanonicalVote
* CommitVotes implements VoteSetReader
- new CommitVotes struct holds both the Commit and the ValidatorSet and
implements VoteSetReader
- ToVote takes a ValidatorSet
* fix TestCommit
* use CommitSig.BlockID
Commits may include votes for a different BlockID, could be nil,
or different altogether. This means we can't use `commit.BlockID`
for reconstructing the sign bytes, since up to -1/3 of the commits
might be for independent BlockIDs. This means CommitSig will need to
include an indicator for what BlockID it signed - if it's not the
committed one or nil, it will need to include it fully in order to be
verified. This is unfortunate but unavoidable so long as we include
votes for non-committed BlockIDs (which we do to track validator
liveness)
* fixes from review
* remove CommitVotes. CommitSig contains address
* remove commit.canonicalVote method
* toVote -> getVote, takes valIdx
* update adr-025
* commit.ToVoteSet -> CommitToVoteSet
* add test
* fix from review
## Description
Refs #2659
Breaking changes in the mempool package:
[mempool] #2659 Mempool now an interface
old Mempool renamed to CListMempool
NewMempool renamed to NewCListMempool
Option renamed to CListOption
MempoolReactor renamed to Reactor
NewMempoolReactor renamed to NewReactor
unexpose TxID method
TxInfo.PeerID renamed to SenderID
unexpose MempoolReactor.Mempool
Breaking changes in the state package:
[state] #2659 Mempool interface moved to mempool package
MockMempool moved to top-level mock package and renamed to Mempool
Non Breaking changes in the node package:
[node] #2659 Add Mempool method, which allows you to access mempool
## Commits
* move Mempool interface into mempool package
Refs #2659
Breaking changes in the mempool package:
- Mempool now an interface
- old Mempool renamed to CListMempool
Breaking changes to state package:
- MockMempool moved to mempool/mock package and renamed to Mempool
- Mempool interface moved to mempool package
* assert CListMempool impl Mempool
* gofmt code
* rename MempoolReactor to Reactor
- combine everything into one interface
- rename TxInfo.PeerID to TxInfo.SenderID
- unexpose MempoolReactor.Mempool
* move mempool mock into top-level mock package
* add a fixme
TxsFront should not be a part of the Mempool interface
because it leaks implementation details. Instead, we need to come up
with general interface for querying the mempool so the MempoolReactor
can fetch and broadcast txs to peers.
* change node#Mempool to return interface
* save commit = new reactor arch
* Revert "save commit = new reactor arch"
This reverts commit 1bfceacd9d.
* require CListMempool in mempool.Reactor
* add two changelog entries
* fixes after my own review
* quote interfaces, structs and functions
* fixes after Ismail's review
* make node's mempool an interface
* make InitWAL/CloseWAL methods a part of Mempool interface
* fix merge conflicts
* make node's mempool an interface
* Remove deprecated PanicXXX functions from codebase
As per discussion over
[here](https://github.com/tendermint/tendermint/pull/3456#discussion_r278423492),
we need to remove these `PanicXXX` functions and eliminate our
dependence on them. In this PR, each and every `PanicXXX` function call
is replaced with a simple `panic` call.
* add a changelog entry
Should fix#3451, #2723 and #3317.
Test TestResetTimeoutPrecommitUponNewHeight is simplified so it reduces a risk of timeout failure. Furthermore, timeout we wait for TimeoutEvents is increased, and the timeout value is more precisely computed. This should hopefully decrease a chance of non-deterministic test failures.
This assertion is problematic to ensure consistently due to dependency on scheduler. On the other hand, if I am not wrong, order in which messages are read from the channel respects order in which messages are written. Therefore, process will receive 2f+1 precommits that are not all for v (one is for nil) so TriggeredTimeoutPrecommit would be set to true. So we don't need to assert it. I know that it would be better to still assert to it but I don't know how to do it without sleep and that is ugly and is causing us nondeterministic failures.
* check every block appHash
Fixes#3460
Not really needed, but it would detect if the app changed in a way it
shouldn't have.
* add a changelog entry
* no need to return an error if we panic
* rename methods
* rename methods once again
* add a test
* correct error msg
plus fix a few go-lint warnings
* better panic msg
* mempool: add a safety check, write tests for mempoolIDs
and document 65536 limit in the mempool reactor spec
follow-up to https://github.com/tendermint/tendermint/pull/2778
* rename the test
* fixes after Ismail's review
See https://github.com/tendermint/tendermint/pull/3403/files#r266208947
In #3403 we unexposed BlockTimeIota from the ABCI, but it's still part
of the ConsensusParams struct, so we have to remember to add it back
after calling PB2TM.ConsensusParams. Instead, PB2TM.ConsensusParams
should take it as an argument
Fixes#3432
Also
- init substructures to avoid panic in pb2tm.ConsensusParams
Before: if csp.Block is nil and we later try to access/write to it,
we'll panic.
After: if csp.Block is nil and we later try to access/write to it,
there'll be no panic.
Refs #3262
This fixes two small bugs:
1) lite/dbprovider: return `ok` instead of true in parse* functions. It's weird that we're ignoring `ok` value before.
2) consensus/state: previously because of the shadowing we almost never output "Error with msg". Now we declare both `added` and `err` in the beginning of the function, so there's no shadowing.
* make BlockTimeIota a consensus parameter, not a locally configurable option
Refs #2920
* make TimeIota int64 ms
Refs #2920
* update Gopkg.toml
* fixes after Ethan's review
* fix TestRemoteSignerProposalSigningFailed
* update changelog
Before we're using it to get a round state in tests. Now it can be done
by calling csX.GetRoundState. We will need to rewrite
TestStateSlashingPrevotes and TestStateSlashingPrecommits, which are
commented right now, to not rely on EventDataRoundState#RoundState
field.
Refs #1527
* green pubsub tests :OK:
* get rid of clientToQueryMap
* Subscribe and SubscribeUnbuffered
* start adapting other pkgs to new pubsub
* nope
* rename MsgAndTags to Message
* remove TagMap
it does not bring any additional benefits
* bring back EventSubscriber
* fix test
* fix data race in TestStartNextHeightCorrectly
```
Write at 0x00c0001c7418 by goroutine 796:
github.com/tendermint/tendermint/consensus.TestStartNextHeightCorrectly()
/go/src/github.com/tendermint/tendermint/consensus/state_test.go:1296 +0xad
testing.tRunner()
/usr/local/go/src/testing/testing.go:827 +0x162
Previous read at 0x00c0001c7418 by goroutine 858:
github.com/tendermint/tendermint/consensus.(*ConsensusState).addVote()
/go/src/github.com/tendermint/tendermint/consensus/state.go:1631 +0x1366
github.com/tendermint/tendermint/consensus.(*ConsensusState).tryAddVote()
/go/src/github.com/tendermint/tendermint/consensus/state.go:1476 +0x8f
github.com/tendermint/tendermint/consensus.(*ConsensusState).handleMsg()
/go/src/github.com/tendermint/tendermint/consensus/state.go:667 +0xa1e
github.com/tendermint/tendermint/consensus.(*ConsensusState).receiveRoutine()
/go/src/github.com/tendermint/tendermint/consensus/state.go:628 +0x794
Goroutine 796 (running) created at:
testing.(*T).Run()
/usr/local/go/src/testing/testing.go:878 +0x659
testing.runTests.func1()
/usr/local/go/src/testing/testing.go:1119 +0xa8
testing.tRunner()
/usr/local/go/src/testing/testing.go:827 +0x162
testing.runTests()
/usr/local/go/src/testing/testing.go:1117 +0x4ee
testing.(*M).Run()
/usr/local/go/src/testing/testing.go:1034 +0x2ee
main.main()
_testmain.go:214 +0x332
Goroutine 858 (running) created at:
github.com/tendermint/tendermint/consensus.(*ConsensusState).startRoutines()
/go/src/github.com/tendermint/tendermint/consensus/state.go:334 +0x221
github.com/tendermint/tendermint/consensus.startTestRound()
/go/src/github.com/tendermint/tendermint/consensus/common_test.go:122 +0x63
github.com/tendermint/tendermint/consensus.TestStateFullRound1()
/go/src/github.com/tendermint/tendermint/consensus/state_test.go:255 +0x397
testing.tRunner()
/usr/local/go/src/testing/testing.go:827 +0x162
```
* fixes after my own review
* fix formatting
* wait 100ms before kicking a subscriber out
+ a test for indexer_service
* fixes after my second review
* no timeout
* add changelog entries
* fix merge conflicts
* fix typos after Thane's review
Co-Authored-By: melekes <anton.kalyaev@gmail.com>
* reformat code
* rewrite indexer service in the attempt to fix failing test
https://github.com/tendermint/tendermint/pull/3227/#issuecomment-462316527
* Revert "rewrite indexer service in the attempt to fix failing test"
This reverts commit 0d9107a098.
* another attempt to fix indexer
* fixes after Ethan's review
* use unbuffered channel when indexing transactions
Refs https://github.com/tendermint/tendermint/pull/3227#discussion_r258786716
* add a comment for EventBus#SubscribeUnbuffered
* format code
As per #3043, this adds a ticker to sync the WAL every 2s while the WAL is running.
* Flush WAL every 2s
This adds a ticker that flushes the WAL every 2s while the WAL is
running. This is related to #3043.
* Fix spelling
* Increase timeout to 2mins for slower build environments
* Make WAL sync interval configurable
* Add TODO to replace testChan with more comprehensive testBus
* Remove extraneous debug statement
* Remove testChan in favour of using system time
As per
https://github.com/tendermint/tendermint/pull/3300#discussion_r255886586,
this removes the `testChan` WAL member and replaces the approach with a
system time-oriented one. In this new approach, we keep track of the
system time at which each flush and periodic flush successfully
occurred.
The naming of the various functions is also updated here to be more
consistent with "flushing" as opposed to "sync'ing".
* Update naming convention and ensure lock for timestamp update
* Add Flush method as part of WAL interface
Adds a `Flush` method as part of the WAL interface to enforce the idea
that we can manually trigger a WAL flush from outside of the WAL. This
is employed in the consensus state management to flush the WAL prior to
signing votes/proposals, as per https://github.com/tendermint/tendermint/issues/3043#issuecomment-453853630
* Update CHANGELOG_PENDING
* Remove mutex approach and replace with DI
The dependency injection approach to dealing with testing concerns could
allow similar effects to some kind of "testing bus"-based approach. This
commit introduces an example of this, where instead of relying on
(potentially fragile) timing of things between the code and the test, we
inject code into the function under test that can signal the test
through a channel.
This allows us to avoid the `time.Sleep()`-based approach previously
employed.
* Update comment on WAL flushing during vote signing
Co-Authored-By: thanethomson <connect@thanethomson.com>
* Simplify flush interval definition
Co-Authored-By: thanethomson <connect@thanethomson.com>
* Expand commentary on WAL disk flushing
Co-Authored-By: thanethomson <connect@thanethomson.com>
* Add broken test to illustrate WAL sync test problem
Removes test-related state (dependency injection code) from the WAL data
structure and adds test code to illustrate the problem with using
`WALGenerateNBlocks` and `wal.SearchForEndHeight` to test periodic
sync'ing.
* Fix test error messages
* Use WAL group buffer size to check for flush
A function is added to `libs/autofile/group.go#Group` in order to return
the size of the buffered data (i.e. data that has not yet been flushed
to disk). The test now checks that, prior to a `time.Sleep`, the group
buffer has data in it. After the `time.Sleep` (during which time the
periodic flush should have been called), the buffer should be empty.
* Remove config root dir removal from #3291
* Add godoc for NewWAL mentioning periodic sync
* changelog: use issue number instead of PR number
* follow up to #3291
- rpc/test/helpers.go add StopTendermint(node) func
- remove ensureDir(filepath.Dir(walFile), 0700)
- mempool/mempool_test.go add type cleanupFunc func()
* cmd/show_validator: wrap err to make it more clear
* improve ResetTestRootWithChainID() concurrency safety
Rely on ioutil.TempDir() to create test root directories and ensure
multiple same-chain id test cases can run in parallel.
* Update config/toml.go
Co-Authored-By: alessio <quadrispro@ubuntu.com>
* clean up test directories after completion
Closes: #1034
* Remove redundant EnsureDir call
* s/PanicSafety()/panic()/s
* Put create dir functionality back in ResetTestRootWithChainID
* Place test directories in OS's tempdir
In modern UNIX and UNIX-like systems /tmp is very often
mounted as tmpfs. This might speed test execution a bit.
* Set 0700 to a const
* rootsDirs -> configRootDirs
* Don't double remove directories
* Avoid global variables
* Fix consensus tests
* Reduce defer stack
* Address review comments
* Try to fix tests
* Update CHANGELOG_PENDING.md
Co-Authored-By: alessio <quadrispro@ubuntu.com>
* Update consensus/common_test.go
Co-Authored-By: alessio <quadrispro@ubuntu.com>
* Update consensus/common_test.go
Co-Authored-By: alessio <quadrispro@ubuntu.com>
Earlier this week somebody posted this in GoS Riot chat:
```
E[2019-02-12|10:38:37.596] Corrupted entry. Skipping... module=consensus wal=/home/gaia/.gaiad/data/cs.wal/wal err="DataCorruptionError[length 878916964 exceeded maximum possible value of 1048576 bytes]"
E[2019-02-12|10:38:37.596] Corrupted entry. Skipping... module=consensus wal=/home/gaia/.gaiad/data/cs.wal/wal err="DataCorruptionError[length 825701731 exceeded maximum possible value of 1048576 bytes]"
E[2019-02-12|10:38:37.596] Corrupted entry. Skipping... module=consensus wal=/home/gaia/.gaiad/data/cs.wal/wal err="DataCorruptionError[length 1631073634 exceeded maximum possible value of 1048576 bytes]"
E[2019-02-12|10:38:37.596] Corrupted entry. Skipping... module=consensus wal=/home/gaia/.gaiad/data/cs.wal/wal err="DataCorruptionError[length 912418148 exceeded maximum possible value of 1048576 bytes]"
E[2019-02-12|10:38:37.600] Corrupted entry. Skipping... module=consensus wal=/home/gaia/.gaiad/data/cs.wal/wal err="DataCorruptionError[failed to read data: EOF]"
E[2019-02-12|10:38:37.600] Error on catchup replay. Proceeding to start ConsensusState anyway module=consensus err="Cannot replay height 7242. WAL does not contain #ENDHEIGHT for 7241"
E[2019-02-12|10:38:37.861] Error dialing peer module=p2p err="dial tcp 35.183.126.181:26656: i/o timeout
```
Note the length error messages. What has happened is the length field got corrupted probably. I've looked at the code and noticed that we don't check the msg size during encoding. This PR fixes that. It also improves a few error messages in WALDecoder.
* types.NewCommit
* use types.NewCommit everywhere
* fix log in unsafe_reset
* memoize height and round in constructor
* notes about deprecating toVote
* bring back memoizeHeightRound
* evidence: NewEvidencePool takes evidenceDB
* evidence: failing TestStoreCommitDuplicate
tendermint/security#35
* GetEvidence -> GetEvidenceInfo
* fix TestStoreCommitDuplicate
* comment in VerifyEvidence
* add check if evidence was already seen
- modify EventPool interface (EventStore is not known in ApplyBlock):
- add IsCommitted method to iface
- add test
* update changelog
* fix TestStoreMark:
- priority in evidence info gets reset to zero after evidence gets committed
* review comments: simplify EvidencePool.IsCommitted
- delete obsolete EvidenceStore.IsCommitted
* add simple test for IsCommitted
* update changelog: this is actually breaking (PR number still missing)
* fix TestStoreMark:
- priority in evidence info gets reset to zero after evidence gets
committed
* review suggestion: simplify return
* node: decrease retry conn timeout in test
Should fix#3256
The retry timeout was set to the default, which is the same as the
accept timeout, so it's no wonder this would fail. Here we decrease the
retry timeout so we can try many times before the accept timeout.
* p2p: increase handshake timeout in test
This fails sometimes, presumably because the handshake timeout is so low
(only 50ms). So increase it to 1s. Should fix#3187
* privval: fix race with ping. closes#3237
Pings happen in a go-routine and can happen concurrently with other
messages. Since we use a request/response protocol, we expect to send a
request and get back the corresponding response. But with pings
happening concurrently, this assumption could be violated. We were using
a mutex, but only a RWMutex, where the RLock was being held for sending
messages - this was to allow the underlying connection to be replaced if
it fails. Turns out we actually need to use a full lock (not just a read
lock) to prevent multiple requests from happening concurrently.
* node: fix test name. DelayedStop -> DelayedStart
* autofile: Wait() method
In the TestWALTruncate in consensus/wal_test.go we remove the WAL
directory at the end of the test. However the wal.Stop() does not
properly wait for the autofile group to finish shutting down. Hence it
was possible that the group's go-routine is still running when the
cleanup happens, which causes a panic since the directory disappeared.
Here we add a Wait() method to properly wait until the go-routine exits
so we can safely clean up. This fixes#2852.