* libs/common: TrapSignal accepts logger as a first parameter
and does not block anymore
* previously it was dumping "captured ..." msg to os.Stdout
* TrapSignal should not be responsible for blocking thread of execution
Refs #3238
* exit with zero (0) code upon receiving SIGTERM/SIGINT
Refs #3238
* fix formatting in docs/app-dev/abci-cli.md
Co-Authored-By: melekes <anton.kalyaev@gmail.com>
* fix formatting in docs/app-dev/abci-cli.md
Co-Authored-By: melekes <anton.kalyaev@gmail.com>
* green pubsub tests :OK:
* get rid of clientToQueryMap
* Subscribe and SubscribeUnbuffered
* start adapting other pkgs to new pubsub
* nope
* rename MsgAndTags to Message
* remove TagMap
it does not bring any additional benefits
* bring back EventSubscriber
* fix test
* fix data race in TestStartNextHeightCorrectly
```
Write at 0x00c0001c7418 by goroutine 796:
github.com/tendermint/tendermint/consensus.TestStartNextHeightCorrectly()
/go/src/github.com/tendermint/tendermint/consensus/state_test.go:1296 +0xad
testing.tRunner()
/usr/local/go/src/testing/testing.go:827 +0x162
Previous read at 0x00c0001c7418 by goroutine 858:
github.com/tendermint/tendermint/consensus.(*ConsensusState).addVote()
/go/src/github.com/tendermint/tendermint/consensus/state.go:1631 +0x1366
github.com/tendermint/tendermint/consensus.(*ConsensusState).tryAddVote()
/go/src/github.com/tendermint/tendermint/consensus/state.go:1476 +0x8f
github.com/tendermint/tendermint/consensus.(*ConsensusState).handleMsg()
/go/src/github.com/tendermint/tendermint/consensus/state.go:667 +0xa1e
github.com/tendermint/tendermint/consensus.(*ConsensusState).receiveRoutine()
/go/src/github.com/tendermint/tendermint/consensus/state.go:628 +0x794
Goroutine 796 (running) created at:
testing.(*T).Run()
/usr/local/go/src/testing/testing.go:878 +0x659
testing.runTests.func1()
/usr/local/go/src/testing/testing.go:1119 +0xa8
testing.tRunner()
/usr/local/go/src/testing/testing.go:827 +0x162
testing.runTests()
/usr/local/go/src/testing/testing.go:1117 +0x4ee
testing.(*M).Run()
/usr/local/go/src/testing/testing.go:1034 +0x2ee
main.main()
_testmain.go:214 +0x332
Goroutine 858 (running) created at:
github.com/tendermint/tendermint/consensus.(*ConsensusState).startRoutines()
/go/src/github.com/tendermint/tendermint/consensus/state.go:334 +0x221
github.com/tendermint/tendermint/consensus.startTestRound()
/go/src/github.com/tendermint/tendermint/consensus/common_test.go:122 +0x63
github.com/tendermint/tendermint/consensus.TestStateFullRound1()
/go/src/github.com/tendermint/tendermint/consensus/state_test.go:255 +0x397
testing.tRunner()
/usr/local/go/src/testing/testing.go:827 +0x162
```
* fixes after my own review
* fix formatting
* wait 100ms before kicking a subscriber out
+ a test for indexer_service
* fixes after my second review
* no timeout
* add changelog entries
* fix merge conflicts
* fix typos after Thane's review
Co-Authored-By: melekes <anton.kalyaev@gmail.com>
* reformat code
* rewrite indexer service in the attempt to fix failing test
https://github.com/tendermint/tendermint/pull/3227/#issuecomment-462316527
* Revert "rewrite indexer service in the attempt to fix failing test"
This reverts commit 0d9107a098.
* another attempt to fix indexer
* fixes after Ethan's review
* use unbuffered channel when indexing transactions
Refs https://github.com/tendermint/tendermint/pull/3227#discussion_r258786716
* add a comment for EventBus#SubscribeUnbuffered
* format code
As per #3043, this adds a ticker to sync the WAL every 2s while the WAL is running.
* Flush WAL every 2s
This adds a ticker that flushes the WAL every 2s while the WAL is
running. This is related to #3043.
* Fix spelling
* Increase timeout to 2mins for slower build environments
* Make WAL sync interval configurable
* Add TODO to replace testChan with more comprehensive testBus
* Remove extraneous debug statement
* Remove testChan in favour of using system time
As per
https://github.com/tendermint/tendermint/pull/3300#discussion_r255886586,
this removes the `testChan` WAL member and replaces the approach with a
system time-oriented one. In this new approach, we keep track of the
system time at which each flush and periodic flush successfully
occurred.
The naming of the various functions is also updated here to be more
consistent with "flushing" as opposed to "sync'ing".
* Update naming convention and ensure lock for timestamp update
* Add Flush method as part of WAL interface
Adds a `Flush` method as part of the WAL interface to enforce the idea
that we can manually trigger a WAL flush from outside of the WAL. This
is employed in the consensus state management to flush the WAL prior to
signing votes/proposals, as per https://github.com/tendermint/tendermint/issues/3043#issuecomment-453853630
* Update CHANGELOG_PENDING
* Remove mutex approach and replace with DI
The dependency injection approach to dealing with testing concerns could
allow similar effects to some kind of "testing bus"-based approach. This
commit introduces an example of this, where instead of relying on
(potentially fragile) timing of things between the code and the test, we
inject code into the function under test that can signal the test
through a channel.
This allows us to avoid the `time.Sleep()`-based approach previously
employed.
* Update comment on WAL flushing during vote signing
Co-Authored-By: thanethomson <connect@thanethomson.com>
* Simplify flush interval definition
Co-Authored-By: thanethomson <connect@thanethomson.com>
* Expand commentary on WAL disk flushing
Co-Authored-By: thanethomson <connect@thanethomson.com>
* Add broken test to illustrate WAL sync test problem
Removes test-related state (dependency injection code) from the WAL data
structure and adds test code to illustrate the problem with using
`WALGenerateNBlocks` and `wal.SearchForEndHeight` to test periodic
sync'ing.
* Fix test error messages
* Use WAL group buffer size to check for flush
A function is added to `libs/autofile/group.go#Group` in order to return
the size of the buffered data (i.e. data that has not yet been flushed
to disk). The test now checks that, prior to a `time.Sleep`, the group
buffer has data in it. After the `time.Sleep` (during which time the
periodic flush should have been called), the buffer should be empty.
* Remove config root dir removal from #3291
* Add godoc for NewWAL mentioning periodic sync
* not related to linter: remove obsolete constants:
- `Insecure` and `Secure` and type `Security` are not used anywhere
* not related to linter: update example
- NewInsecure was deleted; change example to NewRemoteDB
* address: Binds to all network interfaces (gosec):
- bind to localhost instead of 0.0.0.0
- regenerate test key and cert for this purpose (was valid for ::) and
otherwise we would see:
transport: authentication handshake failed: x509: certificate is
valid for ::, not 127.0.0.1\"
(used https://github.com/google/keytransparency/blob/master/scripts/gen_server_keys.sh
to regenerate certs)
* use sha256 in tests instead of md5; time difference is negligible
* nolint usage of math/rand in test and add comment on its import
- crypto/rand is slower and we do not need sth more secure in tests
* enable linter in circle-ci
* another nolint math/rand in test
* replace another occurrence of md5
* consistent comment about importing math/rand
* node: decrease retry conn timeout in test
Should fix#3256
The retry timeout was set to the default, which is the same as the
accept timeout, so it's no wonder this would fail. Here we decrease the
retry timeout so we can try many times before the accept timeout.
* p2p: increase handshake timeout in test
This fails sometimes, presumably because the handshake timeout is so low
(only 50ms). So increase it to 1s. Should fix#3187
* privval: fix race with ping. closes#3237
Pings happen in a go-routine and can happen concurrently with other
messages. Since we use a request/response protocol, we expect to send a
request and get back the corresponding response. But with pings
happening concurrently, this assumption could be violated. We were using
a mutex, but only a RWMutex, where the RLock was being held for sending
messages - this was to allow the underlying connection to be replaced if
it fails. Turns out we actually need to use a full lock (not just a read
lock) to prevent multiple requests from happening concurrently.
* node: fix test name. DelayedStop -> DelayedStart
* autofile: Wait() method
In the TestWALTruncate in consensus/wal_test.go we remove the WAL
directory at the end of the test. However the wal.Stop() does not
properly wait for the autofile group to finish shutting down. Hence it
was possible that the group's go-routine is still running when the
cleanup happens, which causes a panic since the directory disappeared.
Here we add a Wait() method to properly wait until the go-routine exits
so we can safely clean up. This fixes#2852.
* Enforce validators can only use the correct pubkey type
* adapt to variable renames
* Address comments from #2636
* separate updating and validation logic
* update spec
* Add test case for TestStringSliceEqual, clarify slice copying code
* Address @ebuchman's comments
* Split up testing validator update execution, and its validation
* use READ lock/unlock in ConsensusState#GetLastHeight
Refs #2721
* do not use defers when there's no need
* fix peer formatting (output its address instead of the pointer)
```
[54310]: E[11-02|11:59:39.851] Connection failed @ sendRoutine module=p2p peer=0xb78f00 conn=MConn{74.207.236.148:26656} err="pong timeout"
```
https://github.com/tendermint/tendermint/issues/2721#issuecomment-435326581
* panic if peer has no state
https://github.com/tendermint/tendermint/issues/2721#issuecomment-435347165
It's confusing that sometimes we check if peer has a state, but most of
the times we expect it to be there
1. add79700b5/mempool/reactor.go (L138)
2. add79700b5/rpc/core/consensus.go (L196) (edited)
I will change everything to always assume peer has a state and panic
otherwise
that should help identify issues earlier
* abci/localclient: extend lock on app callback
App callback should be protected by lock as well (note this was already
done for InitChainAsync, why not for others???). Otherwise, when we
execute the block, tx might come in and call the callback in the same
time we're updating it in execBlockOnProxyApp => DATA RACE
Fixes#2721
Consensus state is locked
```
goroutine 113333 [semacquire, 309 minutes]:
sync.runtime_SemacquireMutex(0xc00180009c, 0xc0000c7e00)
/usr/local/go/src/runtime/sema.go:71 +0x3d
sync.(*RWMutex).RLock(0xc001800090)
/usr/local/go/src/sync/rwmutex.go:50 +0x4e
github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusState).GetRoundState(0xc001800000, 0x0)
/root/go/src/github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus/state.go:218 +0x46
github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusReactor).queryMaj23Routine(0xc0017def80, 0x11104a0, 0xc0072488f0, 0xc007248
9c0)
/root/go/src/github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus/reactor.go:735 +0x16d
created by github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusReactor).AddPeer
/root/go/src/github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus/reactor.go:172 +0x236
```
because localClient is locked
```
goroutine 1899 [semacquire, 309 minutes]:
sync.runtime_SemacquireMutex(0xc00003363c, 0xc0000cb500)
/usr/local/go/src/runtime/sema.go:71 +0x3d
sync.(*Mutex).Lock(0xc000033638)
/usr/local/go/src/sync/mutex.go:134 +0xff
github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/abci/client.(*localClient).SetResponseCallback(0xc0001fb560, 0xc007868540)
/root/go/src/github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/abci/client/local_client.go:32 +0x33
github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/proxy.(*appConnConsensus).SetResponseCallback(0xc00002f750, 0xc007868540)
/root/go/src/github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/proxy/app_conn.go:57 +0x40
github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/state.execBlockOnProxyApp(0x1104e20, 0xc002ca0ba0, 0x11092a0, 0xc00002f750, 0xc0001fe960, 0xc000bfc660, 0x110cfe0, 0xc000090330, 0xc9d12, 0xc000d9d5a0, ...)
/root/go/src/github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/state/execution.go:230 +0x1fd
github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/state.(*BlockExecutor).ApplyBlock(0xc002c2a230, 0x7, 0x0, 0xc000eae880, 0x6, 0xc002e52c60, 0x16, 0x1f927, 0xc9d12, 0xc000d9d5a0, ...)
/root/go/src/github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/state/execution.go:96 +0x142
github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusState).finalizeCommit(0xc001800000, 0x1f928)
/root/go/src/github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus/state.go:1339 +0xa3e
github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusState).tryFinalizeCommit(0xc001800000, 0x1f928)
/root/go/src/github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus/state.go:1270 +0x451
github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusState).enterCommit.func1(0xc001800000, 0x0, 0x1f928)
/root/go/src/github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus/state.go:1218 +0x90
github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusState).enterCommit(0xc001800000, 0x1f928, 0x0)
/root/go/src/github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus/state.go:1247 +0x6b8
github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusState).addVote(0xc001800000, 0xc003d8dea0, 0xc000cf4cc0, 0x28, 0xf1, 0xc003bc7ad0, 0xc003bc7b10)
/root/go/src/github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus/state.go:1659 +0xbad
github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusState).tryAddVote(0xc001800000, 0xc003d8dea0, 0xc000cf4cc0, 0x28, 0xf1, 0xf1, 0xf1)
/root/go/src/github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus/state.go:1517 +0x59
github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusState).handleMsg(0xc001800000, 0xd98200, 0xc0070dbed0, 0xc000cf4cc0, 0x28)
/root/go/src/github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus/state.go:660 +0x64b
github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusState).receiveRoutine(0xc001800000, 0x0)
/root/go/src/github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus/state.go:617 +0x670
created by github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusState).OnStart
/root/go/src/github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/consensus/state.go:311 +0x132
```
tx comes in and CheckTx is executed right when we execute the block
```
goroutine 111044 [semacquire, 309 minutes]:
sync.runtime_SemacquireMutex(0xc00003363c, 0x0)
/usr/local/go/src/runtime/sema.go:71 +0x3d
sync.(*Mutex).Lock(0xc000033638)
/usr/local/go/src/sync/mutex.go:134 +0xff
github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/abci/client.(*localClient).CheckTxAsync(0xc0001fb0e0, 0xc002d94500, 0x13f, 0x280, 0x0)
/root/go/src/github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/abci/client/local_client.go:85 +0x47
github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/proxy.(*appConnMempool).CheckTxAsync(0xc00002f720, 0xc002d94500, 0x13f, 0x280, 0x1)
/root/go/src/github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/proxy/app_conn.go:114 +0x51
github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/mempool.(*Mempool).CheckTx(0xc002d3a320, 0xc002d94500, 0x13f, 0x280, 0xc0072355f0, 0x0, 0x0)
/root/go/src/github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/mempool/mempool.go:316 +0x17b
github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/rpc/core.BroadcastTxSync(0xc002d94500, 0x13f, 0x280, 0x0, 0x0, 0x0)
/root/go/src/github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/rpc/core/mempool.go:93 +0xb8
reflect.Value.call(0xd85560, 0x10326c0, 0x13, 0xec7b8b, 0x4, 0xc00663f180, 0x1, 0x1, 0xc00663f180, 0xc00663f188, ...)
/usr/local/go/src/reflect/value.go:447 +0x449
reflect.Value.Call(0xd85560, 0x10326c0, 0x13, 0xc00663f180, 0x1, 0x1, 0x0, 0x0, 0xc005cc9344)
/usr/local/go/src/reflect/value.go:308 +0xa4
github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/rpc/lib/server.makeHTTPHandler.func2(0x1102060, 0xc00663f100, 0xc0082d7900)
/root/go/src/github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/rpc/lib/server/handlers.go:269 +0x188
net/http.HandlerFunc.ServeHTTP(0xc002c81f20, 0x1102060, 0xc00663f100, 0xc0082d7900)
/usr/local/go/src/net/http/server.go:1964 +0x44
net/http.(*ServeMux).ServeHTTP(0xc002c81b60, 0x1102060, 0xc00663f100, 0xc0082d7900)
/usr/local/go/src/net/http/server.go:2361 +0x127
github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/rpc/lib/server.maxBytesHandler.ServeHTTP(0x10f8a40, 0xc002c81b60, 0xf4240, 0x1102060, 0xc00663f100, 0xc0082d7900)
/root/go/src/github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/rpc/lib/server/http_server.go:219 +0xcf
github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/rpc/lib/server.RecoverAndLogHandler.func1(0x1103220, 0xc00121e620, 0xc0082d7900)
/root/go/src/github.com/MinterTeam/minter-go-node/vendor/github.com/tendermint/tendermint/rpc/lib/server/http_server.go:192 +0x394
net/http.HandlerFunc.ServeHTTP(0xc002c06ea0, 0x1103220, 0xc00121e620, 0xc0082d7900)
/usr/local/go/src/net/http/server.go:1964 +0x44
net/http.serverHandler.ServeHTTP(0xc001a1aa90, 0x1103220, 0xc00121e620, 0xc0082d7900)
/usr/local/go/src/net/http/server.go:2741 +0xab
net/http.(*conn).serve(0xc00785a3c0, 0x11041a0, 0xc000f844c0)
/usr/local/go/src/net/http/server.go:1847 +0x646
created by net/http.(*Server).Serve
/usr/local/go/src/net/http/server.go:2851 +0x2f5
```
* consensus: use read lock in Receive#VoteMessage
* use defer to unlock mutex because application might panic
* use defer in every method of the localClient
* add a changelog entry
* drain channels before Unsubscribe(All)
Read 55362ed766/libs/pubsub/pubsub.go (L13)
for the detailed explanation of the issue.
We'll need to fix it someday. Make sure to keep an eye on
https://github.com/tendermint/tendermint/blob/master/docs/architecture/adr-033-pubsub.md
* retry instead of panic when peer has no state in reactors other than consensus
in /dump_consensus_state RPC endpoint, skip a peer with no state
* rpc/core/mempool: simplify error messages
* rpc/core/mempool: use time.After instead of timer
also, do not log DeliverTx result (to be consistent with other memthods)
* unlock before calling the callback in reqRes#SetCallback
* [libs/events] add more godoc comments
* [libs/events] refactor code
- improve var naming
- improve code structure
- do not use defers for unlocking mutexes (defer takes time)
* Fix random distribution in bitArray.PickRandom
Previously it was very biased. 63 "_" followed by a single "x" had
much greater odds of being chosen. Additionally, the last element was
skewed. This fixes that by first preproccessing the set of all true
indices, and then randomly selecting a single element from there.
This commit also makes the code here significantly simpler, and
improves test cases.
* unlock mtx right after we select true indices
There's a time window after we call RotateFile() where autofile#index+1
does not exist. It will be created during the next call to Write(). BUT
if somebody calls NewReader() before Write(), it will fail with "open
/tmp/wal#index+1/wal: no such file or directory"
We must create file (either by calling gr.Head.openFile() or directly)
during NewReader() to ensure read calls succeed.
Closes#2538
* fix Group.RotateFile need call Flush() before rename. #2428
* fix some review issue. #2428
refactor Group's config: replace setting member with initial option
* fix a handwriting mistake
* fix a time window error between rename and write.
* fix a syntax mistake.
* change option name Get_ to With_
* fix review issue
* fix review issue
These functions were not used anywhere within tendermint, or the
cosmos-sdk. (The functionality is already duplicated in the cosmos-sdk
types package)
* common: Delete unused functions within byteslice
* remove more unused code from strings.go and int.go
* Remove more unused code from int.go
* Fix testcase