This PR is related to #3107 and a continuation of #3351
It is important to emphasise that in the privval original design, client/server and listening/dialing roles are inverted and do not follow a conventional interaction.
Given two hosts A and B:
Host A is listener/client
Host B is dialer/server (contains the secret key)
When A requires a signature, it needs to wait for B to dial in before it can issue a request.
A only accepts a single connection and any failure leads to dropping the connection and waiting for B to reconnect.
The original rationale behind this design was based on security.
Host B only allows outbound connections to a list of whitelisted hosts.
It is not possible to reach B unless B dials in. There are no listening/open ports in B.
This PR results in the following changes:
Refactors ping/heartbeat to avoid previously existing race conditions.
Separates transport (dialer/listener) from signing (client/server) concerns to simplify workflow.
Unifies and abstracts away the differences between unix and tcp sockets.
A single signer endpoint implementation unifies connection handling code (read/write/close/connection obj)
The signer request handler (server side) is customizable to increase testability.
Updates and extends unit tests
A high level overview of the classes is as follows:
Transport (endpoints): The following classes take care of establishing a connection
SignerDialerEndpoint
SignerListeningEndpoint
SignerEndpoint groups common functionality (read/write/timeouts/etc.)
Signing (client/server): The following classes take care of exchanging request/responses
SignerClient
SignerServer
This PR also closes#3601
Commits:
* refactoring - work in progress
* reworking unit tests
* Encapsulating and fixing unit tests
* Improve tests
* Clean up
* Fix/improve unit tests
* clean up tests
* Improving service endpoint
* fixing unit test
* fix linter issues
* avoid invalid cache values (improve later?)
* complete implementation
* wip
* improved connection loop
* Improve reconnections + fixing unit tests
* addressing comments
* small formatting changes
* clean up
* Update node/node.go
Co-Authored-By: jleni <juan.leni@zondax.ch>
* Update privval/signer_client.go
Co-Authored-By: jleni <juan.leni@zondax.ch>
* Update privval/signer_client_test.go
Co-Authored-By: jleni <juan.leni@zondax.ch>
* check during initialization
* dropping connecting when writing fails
* removing break
* use t.log instead
* unifying and using cmn.GetFreePort()
* review fixes
* reordering and unifying drop connection
* closing instead of signalling
* refactored service loop
* removed superfluous brackets
* GetPubKey can return errors
* Revert "GetPubKey can return errors"
This reverts commit 68c06f19b4.
* adding entry to changelog
* Update CHANGELOG_PENDING.md
Co-Authored-By: jleni <juan.leni@zondax.ch>
* Update privval/signer_client.go
Co-Authored-By: jleni <juan.leni@zondax.ch>
* Update privval/signer_dialer_endpoint.go
Co-Authored-By: jleni <juan.leni@zondax.ch>
* Update privval/signer_dialer_endpoint.go
Co-Authored-By: jleni <juan.leni@zondax.ch>
* Update privval/signer_dialer_endpoint.go
Co-Authored-By: jleni <juan.leni@zondax.ch>
* Update privval/signer_dialer_endpoint.go
Co-Authored-By: jleni <juan.leni@zondax.ch>
* Update privval/signer_listener_endpoint_test.go
Co-Authored-By: jleni <juan.leni@zondax.ch>
* updating node.go
* review fixes
* fixes linter
* fixing unit test
* small fixes in comments
* addressing review comments
* addressing review comments 2
* reverting suggestion
* Update privval/signer_client_test.go
Co-Authored-By: Anton Kaliaev <anton.kalyaev@gmail.com>
* Update privval/signer_client_test.go
Co-Authored-By: Anton Kaliaev <anton.kalyaev@gmail.com>
* Update privval/signer_listener_endpoint_test.go
Co-Authored-By: Anton Kaliaev <anton.kalyaev@gmail.com>
* do not expose brokenSignerDialerEndpoint
* clean up logging
* unifying methods
shorten test time
signer also drops
* reenabling pings
* improving testability + unit test
* fixing go fmt + unit test
* remove unused code
* Addressing review comments
* simplifying connection workflow
* fix linter/go import issue
* using base service quit
* updating comment
* Simplifying design + adjusting names
* fixing linter issues
* refactoring test harness + fixes
* Addressing review comments
* cleaning up
* adding additional error check
* node: allow replacing existing p2p.Reactor(s)
using [`CustomReactors`
option](https://godoc.org/github.com/tendermint/tendermint/node#CustomReactors).
Warning: beware of accidental name clashes. Here is the list of existing
reactors: MEMPOOL, BLOCKCHAIN, CONSENSUS, EVIDENCE, PEX.
* check the absence of "CUSTOM" prefix
* merge 2 tests
* add doc.go to node package
* Renamed wire.go to codec.go
- Wire was the previous name of amino
- Codec describes the file better than `wire` & `amino`
Signed-off-by: Marko Baricevic <marbar3778@yahoo.com>
* ide error
* rename amino.go to codec.go
* go routines in blockchain reactor
* Added reference to the go routine diagram
* Initial commit
* cleanup
* Undo testing_logger change, committed by mistake
* Fix the test loggers
* pulled some fsm code into pool.go
* added pool tests
* changes to the design
added block requests under peer
moved the request trigger in the reactor poolRoutine, triggered now by a ticker
in general moved everything required for making block requests smarter in the poolRoutine
added a simple map of heights to keep track of what will need to be requested next
added a few more tests
* send errors to FSM in a different channel than blocks
send errors (RemovePeer) from switch on a different channel than the
one receiving blocks
renamed channels
added more pool tests
* more pool tests
* lint errors
* more tests
* more tests
* switch fast sync to new implementation
* fixed data race in tests
* cleanup
* finished fsm tests
* address golangci comments :)
* address golangci comments :)
* Added timeout on next block needed to advance
* updating docs and cleanup
* fix issue in test from previous cleanup
* cleanup
* Added termination scenarios, tests and more cleanup
* small fixes to adr, comments and cleanup
* Fix bug in sendRequest()
If we tried to send a request to a peer not present in the switch, a
missing continue statement caused the request to be blackholed in a peer
that was removed and never retried.
While this bug was manifesting, the reactor kept asking for other
blocks that would be stored and never consumed. Added the number of
unconsumed blocks in the math for requesting blocks ahead of current
processing height so eventually there will be no more blocks requested
until the already received ones are consumed.
* remove bpPeer's didTimeout field
* Use distinct err codes for peer timeout and FSM timeouts
* Don't allow peers to update with lower height
* review comments from Ethan and Zarko
* some cleanup, renaming, comments
* Move block execution in separate goroutine
* Remove pool's numPending
* review comments
* fix lint, remove old blockchain reactor and duplicates in fsm tests
* small reorg around peer after review comments
* add the reactor spec
* verify block only once
* review comments
* change to int for max number of pending requests
* cleanup and godoc
* Add configuration flag fast sync version
* golangci fixes
* fix config template
* move both reactor versions under blockchain
* cleanup, golint, renaming stuff
* updated documentation, fixed more golint warnings
* integrate with behavior package
* sync with master
* gofmt
* add changelog_pending entry
* move to improvments
* suggestion to changelog entry
* Remove db from tendemrint in favor of tendermint/tm-cmn
- remove db from `libs`
- update dependancy, there have been no breaking changes in the updated deps
- https://github.com/grpc/grpc-go/releases
- https://github.com/golang/protobuf/releases
Signed-off-by: Marko Baricevic <marbar3778@yahoo.com>
* changelog add
* gofmt
* more gofmt
* change invocation of NewNode across
* custom reactor name are prefixed with CUSTOM_
* upgate changelog pending
* improve comments
* node: refactor NewNode to use functional options
* node: fix a bug where `nil` is recorded as node's address
Solution
AddOurAddress when we know it
sw.NetAddress is nil in createAddrBookAndSetOnSwitch
it's set by n.transport.Listen function, which is called during start
Fixes#3716
* use addr instead of n.sw.NetAddress
* add both ExternalAddress and ListenAddress as our addresses
Fixes#3521
The function NewNetAddressStringWithOptionalID is from a time when peer
IDs were optional. They're not anymore. So this should be renamed to
NewNetAddressString and should ensure the ID is provided.
* update changelog
* use NewNetAddress in transport tests
* use NewNetAddress in TestTransportMultiplexAcceptMultiple
* Move GenesisDocProvider and DefaultGenesisDocProviderFunc
GenesisDocProvider, being a provider of *types.GenesisDoc, makes sense
to be part of the types package.
DefaultGenesisDocProviderFunc, which relies on *config.Config to produce
a types.GenesisDocProvider, makes sense being part of the config
package.
* Add aliases to avoid breaking node package API
* Revert to original structure
After discussion, it appears as though the best place for the relocated
structures is still in the node package. This means that for the v0.31.6
release and into the future, there will be no changes two these two
entities' APIs.
## Description
also
handle errors from DialPeersAsync
remove nil addr from log msg
fix TestPEXReactorDoesNotDisconnectFromPersistentPeerInSeedMode
This is a follow-up from
#3593 (review)
Fixes most of the #3617, except #3593 (comment)
## Commits
* rpc: /dial_peers: only mark peers as persistent if flag is on
also
- handle errors from DialPeersAsync
- remove nil addr from log msg
- fix TestPEXReactorDoesNotDisconnectFromPersistentPeerInSeedMode
This is a follow-up from
https://github.com/tendermint/tendermint/pull/3593#pullrequestreview-233556909
* remove a call to AddPersistentPeers
TestDialFail will trigger a reconnect
## Description
Refs #2659
Breaking changes in the mempool package:
[mempool] #2659 Mempool now an interface
old Mempool renamed to CListMempool
NewMempool renamed to NewCListMempool
Option renamed to CListOption
MempoolReactor renamed to Reactor
NewMempoolReactor renamed to NewReactor
unexpose TxID method
TxInfo.PeerID renamed to SenderID
unexpose MempoolReactor.Mempool
Breaking changes in the state package:
[state] #2659 Mempool interface moved to mempool package
MockMempool moved to top-level mock package and renamed to Mempool
Non Breaking changes in the node package:
[node] #2659 Add Mempool method, which allows you to access mempool
## Commits
* move Mempool interface into mempool package
Refs #2659
Breaking changes in the mempool package:
- Mempool now an interface
- old Mempool renamed to CListMempool
Breaking changes to state package:
- MockMempool moved to mempool/mock package and renamed to Mempool
- Mempool interface moved to mempool package
* assert CListMempool impl Mempool
* gofmt code
* rename MempoolReactor to Reactor
- combine everything into one interface
- rename TxInfo.PeerID to TxInfo.SenderID
- unexpose MempoolReactor.Mempool
* move mempool mock into top-level mock package
* add a fixme
TxsFront should not be a part of the Mempool interface
because it leaks implementation details. Instead, we need to come up
with general interface for querying the mempool so the MempoolReactor
can fetch and broadcast txs to peers.
* change node#Mempool to return interface
* save commit = new reactor arch
* Revert "save commit = new reactor arch"
This reverts commit 1bfceacd9d.
* require CListMempool in mempool.Reactor
* add two changelog entries
* fixes after my own review
* quote interfaces, structs and functions
* fixes after Ismail's review
* make node's mempool an interface
* make InitWAL/CloseWAL methods a part of Mempool interface
* fix merge conflicts
* make node's mempool an interface
## Description
Previously only outbound peers can be persistent.
Now, even if the peer is inbound, if it's marked as persistent, when/if conn is lost,
Tendermint will try to reconnect. This part is actually optional and can be reverted.
Plus, seed won't disconnect from inbound peer if it's marked as
persistent. Fixes#3362
## Commits
* make persistent prop independent of conn direction
Previously only outbound peers can be persistent. Now, even if the peer
is inbound, if it's marked as persistent, when/if conn is lost,
Tendermint will try to reconnect.
Plus, seed won't disconnect from inbound peer if it's marked as
persistent. Fixes#3362
* fix TestPEXReactorDialPeer test
* add a changelog entry
* update changelog
* add two tests
* reformat code
* test UnsafeDialPeers and UnsafeDialSeeds
* add TestSwitchDialPeersAsync
* spec: update p2p/config spec
* fixes after Ismail's review
* Apply suggestions from code review
Co-Authored-By: melekes <anton.kalyaev@gmail.com>
* fix merge conflict
* remove sleep from TestPEXReactorDoesNotDisconnectFromPersistentPeerInSeedMode
We don't need it actually.
The node.NewNode method is pretty complex at the moment, an in order to address issues like #3156, we need to simplify the interface for partial node instantiation. In some places, we don't need to build up a full node (like in the node.TestCreateProposalBlock test), but the complexity of such partial instantiation needs to be reduced.
This PR aims to eventually make this easier/simpler.
See also this gist https://gist.github.com/thanethomson/56e1640d057a26186e38ad678a1d114c for some background work done when starting to refactor here.
## Commits:
* [WIP] Refactor node.NewNode to simplify
The `node.NewNode` method is pretty complex at the moment, an in order
to address issues like #3156, we need to simplify the interface for
partial node instantiation. In some places, we don't need to build up a
full node (like in the `node.TestCreateProposalBlock` test), but the
complexity of such partial instantiation needs to be reduced.
This PR aims to eventually make this easier/simpler.
* Refactor state loading and genesis doc provider into state package
* Refactor for clarity of return parameters
* Fix incorrect capitalization of error messages
* Simplify extracted functions' names
* Document optionally-prefixed functions
* Refactor optionallyFastSync for clarity of separation of concerns
* Restructure function for early return
* Restructure function for early return
* Remove dependence on deprecated panic functions
* refactor code a bit more
plus, expose PEXReactor on node
* align logger names
* add a changelog entry
* align logger names 2
* add a note about PEXReactor returning nil
* Remove deprecated PanicXXX functions from codebase
As per discussion over
[here](https://github.com/tendermint/tendermint/pull/3456#discussion_r278423492),
we need to remove these `PanicXXX` functions and eliminate our
dependence on them. In this PR, each and every `PanicXXX` function call
is replaced with a simple `panic` call.
* add a changelog entry
ListOfKnownAddresses is removed
panic if addrbook size is less than zero
CrawlPeers does not attempt to connect to existing or peers we're currently dialing
various perf. fixes
improved tests (though not complete)
move IsDialingOrExistingAddress check into DialPeerWithAddress (Fixes#2716)
* addrbook: preallocate memory when saving addrbook to file
* addrbook: remove oldestFirst struct and check for ID
* oldestFirst replaced with sort.Slice
* ID is now mandatory, so no need to check
* addrbook: remove ListOfKnownAddresses
GetSelection is used instead in seed mode.
* addrbook: panic if size is less than 0
* rewrite addrbook#saveToFile to not use a counter
* test AttemptDisconnects func
* move IsDialingOrExistingAddress check into DialPeerWithAddress
* save and cleanup crawl peer data
* get rid of DefaultSeedDisconnectWaitPeriod
* make linter happy
* fix TestPEXReactorSeedMode
* fix comment
* add a changelog entry
* Apply suggestions from code review
Co-Authored-By: melekes <anton.kalyaev@gmail.com>
* rename ErrDialingOrExistingAddress to ErrCurrentlyDialingOrExistingAddress
* lowercase errors
* do not persist seed data
pros:
- no extra files
- less IO
cons:
- if the node crashes, seed might crawl a peer too soon
* fixes after Ethan's review
* add a changelog entry
* we should only consult Switch about peers
checking addrbook size does not make sense since only PEX reactor uses
it for dialing peers!
https://github.com/tendermint/tendermint/pull/3011#discussion_r270948875
* OriginalAddr -> SocketAddr
OriginalAddr records the originally dialed address for outbound peers,
rather than the peer's self reported address. For inbound peers, it was
nil. Here, we rename it to SocketAddr and for inbound peers, set it to
the RemoteAddr of the connection.
* use SocketAddr
Numerous places in the code call peer.NodeInfo().NetAddress().
However, this call to NetAddress() may perform a DNS lookup if the
reported NodeInfo.ListenAddr includes a name. Failure of this lookup
returns a nil address, which can lead to panics in the code.
Instead, call peer.SocketAddr() to return the static address of the
connection.
* remove nodeInfo.NetAddress()
Expose `transport.NetAddress()`, a static result determined
when the transport is created. Removing NetAddress() from the nodeInfo
prevents accidental DNS lookups.
* fixes from review
* linter
* fixes from review
* Make sure config.TimeoutBroadcastTxCommit < rpcserver.WriteTimeout()
* remove redundant comment
* libs/rpc/http_server: move Read/WriteTimeout into Config
* increase defaults for read/write timeouts
Based on this article
https://www.digitalocean.com/community/tutorials/how-to-optimize-nginx-configuration
* WriteTimeout should be larger than TimeoutBroadcastTxCommit
* set a deadline for subscribing to txs
* extract duration into const
* add two changelog entries
* Update CHANGELOG_PENDING.md
Co-Authored-By: melekes <anton.kalyaev@gmail.com>
* Update CHANGELOG_PENDING.md
Co-Authored-By: melekes <anton.kalyaev@gmail.com>
* 12 -> 10
* changelog
* changelog
* limit number of /subscribe clients and queries per client
Add the following config variables (under [rpc] section):
* max_subscription_clients
* max_subscriptions_per_client
* timeout_broadcast_tx_commit
Fixes#2826
new HTTPClient interface for subscriptions
finalize HTTPClient events interface
remove EventSubscriber
fix data race
```
WARNING: DATA RACE
Read at 0x00c000a36060 by goroutine 129:
github.com/tendermint/tendermint/rpc/client.(*Local).Subscribe.func1()
/go/src/github.com/tendermint/tendermint/rpc/client/localclient.go:168 +0x1f0
Previous write at 0x00c000a36060 by goroutine 132:
github.com/tendermint/tendermint/rpc/client.(*Local).Subscribe()
/go/src/github.com/tendermint/tendermint/rpc/client/localclient.go:191 +0x4e0
github.com/tendermint/tendermint/rpc/client.WaitForOneEvent()
/go/src/github.com/tendermint/tendermint/rpc/client/helpers.go:64 +0x178
github.com/tendermint/tendermint/rpc/client_test.TestTxEventsSentWithBroadcastTxSync.func1()
/go/src/github.com/tendermint/tendermint/rpc/client/event_test.go:139 +0x298
testing.tRunner()
/usr/local/go/src/testing/testing.go:827 +0x162
Goroutine 129 (running) created at:
github.com/tendermint/tendermint/rpc/client.(*Local).Subscribe()
/go/src/github.com/tendermint/tendermint/rpc/client/localclient.go:164 +0x4b7
github.com/tendermint/tendermint/rpc/client.WaitForOneEvent()
/go/src/github.com/tendermint/tendermint/rpc/client/helpers.go:64 +0x178
github.com/tendermint/tendermint/rpc/client_test.TestTxEventsSentWithBroadcastTxSync.func1()
/go/src/github.com/tendermint/tendermint/rpc/client/event_test.go:139 +0x298
testing.tRunner()
/usr/local/go/src/testing/testing.go:827 +0x162
Goroutine 132 (running) created at:
testing.(*T).Run()
/usr/local/go/src/testing/testing.go:878 +0x659
github.com/tendermint/tendermint/rpc/client_test.TestTxEventsSentWithBroadcastTxSync()
/go/src/github.com/tendermint/tendermint/rpc/client/event_test.go:119 +0x186
testing.tRunner()
/usr/local/go/src/testing/testing.go:827 +0x162
==================
```
lite client works (tested manually)
godoc comments
httpclient: do not close the out channel
use TimeoutBroadcastTxCommit
no timeout for unsubscribe
but 1s Local (5s HTTP) timeout for resubscribe
format code
change Subscribe#out cap to 1
and replace config vars with RPCConfig
TimeoutBroadcastTxCommit can't be greater than rpcserver.WriteTimeout
rpc: Context as first parameter to all functions
reformat code
fixes after my own review
fixes after Ethan's review
add test stubs
fix config.toml
* fixes after manual testing
- rpc: do not recommend to use BroadcastTxCommit because it's slow and wastes
Tendermint resources (pubsub)
- rpc: better error in Subscribe and BroadcastTxCommit
- HTTPClient: do not resubscribe if err = ErrAlreadySubscribed
* fixes after Ismail's review
* Update rpc/grpc/grpc_test.go
Co-Authored-By: melekes <anton.kalyaev@gmail.com>
* make BlockTimeIota a consensus parameter, not a locally configurable option
Refs #2920
* make TimeIota int64 ms
Refs #2920
* update Gopkg.toml
* fixes after Ethan's review
* fix TestRemoteSignerProposalSigningFailed
* update changelog
This issue is related to #3107
This is a first renaming/refactoring step before reworking and removing heartbeats.
As discussed with @Liamsi , we preferred to go for a couple of independent and separate PRs to simplify review work.
The changes:
Help to clarify the relation between the validator and remote signer endpoints
Differentiate between timeouts and deadlines
Prepare to encapsulate networking related code behind RemoteSigner in the next PR
My intention is to separate and encapsulate the "network related" code from the actual signer.
SignerRemote ---(uses/contains)--> SignerValidatorEndpoint <--(connects to)--> SignerServiceEndpoint ---> SignerService (future.. not here yet but would like to decouple too)
All reconnection/heartbeat/whatever code goes in the endpoints. Signer[Remote/Service] do not need to know about that.
I agree Endpoint may not be the perfect name. I tried to find something "Go-ish" enough. It is a common name in go-kit, kubernetes, etc.
Right now:
SignerValidatorEndpoint:
handles the listener
contains SignerRemote
Implements the PrivValidator interface
connects and sets a connection object in a contained SignerRemote
delegates PrivValidator some calls to SignerRemote which in turn uses the conn object that was set externally
SignerRemote:
Implements the PrivValidator interface
read/writes from a connection object directly
handles heartbeats
SignerServiceEndpoint:
Does most things in a single place
delegates to a PrivValidator IIRC.
* cleanup
* Refactoring step 1
* Refactoring step 2
* move messages to another file
* mark for future work / next steps
* mark deprecated classes in docs
* Fix linter problems
* additional linter fixes
* green pubsub tests :OK:
* get rid of clientToQueryMap
* Subscribe and SubscribeUnbuffered
* start adapting other pkgs to new pubsub
* nope
* rename MsgAndTags to Message
* remove TagMap
it does not bring any additional benefits
* bring back EventSubscriber
* fix test
* fix data race in TestStartNextHeightCorrectly
```
Write at 0x00c0001c7418 by goroutine 796:
github.com/tendermint/tendermint/consensus.TestStartNextHeightCorrectly()
/go/src/github.com/tendermint/tendermint/consensus/state_test.go:1296 +0xad
testing.tRunner()
/usr/local/go/src/testing/testing.go:827 +0x162
Previous read at 0x00c0001c7418 by goroutine 858:
github.com/tendermint/tendermint/consensus.(*ConsensusState).addVote()
/go/src/github.com/tendermint/tendermint/consensus/state.go:1631 +0x1366
github.com/tendermint/tendermint/consensus.(*ConsensusState).tryAddVote()
/go/src/github.com/tendermint/tendermint/consensus/state.go:1476 +0x8f
github.com/tendermint/tendermint/consensus.(*ConsensusState).handleMsg()
/go/src/github.com/tendermint/tendermint/consensus/state.go:667 +0xa1e
github.com/tendermint/tendermint/consensus.(*ConsensusState).receiveRoutine()
/go/src/github.com/tendermint/tendermint/consensus/state.go:628 +0x794
Goroutine 796 (running) created at:
testing.(*T).Run()
/usr/local/go/src/testing/testing.go:878 +0x659
testing.runTests.func1()
/usr/local/go/src/testing/testing.go:1119 +0xa8
testing.tRunner()
/usr/local/go/src/testing/testing.go:827 +0x162
testing.runTests()
/usr/local/go/src/testing/testing.go:1117 +0x4ee
testing.(*M).Run()
/usr/local/go/src/testing/testing.go:1034 +0x2ee
main.main()
_testmain.go:214 +0x332
Goroutine 858 (running) created at:
github.com/tendermint/tendermint/consensus.(*ConsensusState).startRoutines()
/go/src/github.com/tendermint/tendermint/consensus/state.go:334 +0x221
github.com/tendermint/tendermint/consensus.startTestRound()
/go/src/github.com/tendermint/tendermint/consensus/common_test.go:122 +0x63
github.com/tendermint/tendermint/consensus.TestStateFullRound1()
/go/src/github.com/tendermint/tendermint/consensus/state_test.go:255 +0x397
testing.tRunner()
/usr/local/go/src/testing/testing.go:827 +0x162
```
* fixes after my own review
* fix formatting
* wait 100ms before kicking a subscriber out
+ a test for indexer_service
* fixes after my second review
* no timeout
* add changelog entries
* fix merge conflicts
* fix typos after Thane's review
Co-Authored-By: melekes <anton.kalyaev@gmail.com>
* reformat code
* rewrite indexer service in the attempt to fix failing test
https://github.com/tendermint/tendermint/pull/3227/#issuecomment-462316527
* Revert "rewrite indexer service in the attempt to fix failing test"
This reverts commit 0d9107a098.
* another attempt to fix indexer
* fixes after Ethan's review
* use unbuffered channel when indexing transactions
Refs https://github.com/tendermint/tendermint/pull/3227#discussion_r258786716
* add a comment for EventBus#SubscribeUnbuffered
* format code
* changelog: use issue number instead of PR number
* follow up to #3291
- rpc/test/helpers.go add StopTendermint(node) func
- remove ensureDir(filepath.Dir(walFile), 0700)
- mempool/mempool_test.go add type cleanupFunc func()
* cmd/show_validator: wrap err to make it more clear
* improve ResetTestRootWithChainID() concurrency safety
Rely on ioutil.TempDir() to create test root directories and ensure
multiple same-chain id test cases can run in parallel.
* Update config/toml.go
Co-Authored-By: alessio <quadrispro@ubuntu.com>
* clean up test directories after completion
Closes: #1034
* Remove redundant EnsureDir call
* s/PanicSafety()/panic()/s
* Put create dir functionality back in ResetTestRootWithChainID
* Place test directories in OS's tempdir
In modern UNIX and UNIX-like systems /tmp is very often
mounted as tmpfs. This might speed test execution a bit.
* Set 0700 to a const
* rootsDirs -> configRootDirs
* Don't double remove directories
* Avoid global variables
* Fix consensus tests
* Reduce defer stack
* Address review comments
* Try to fix tests
* Update CHANGELOG_PENDING.md
Co-Authored-By: alessio <quadrispro@ubuntu.com>
* Update consensus/common_test.go
Co-Authored-By: alessio <quadrispro@ubuntu.com>
* Update consensus/common_test.go
Co-Authored-By: alessio <quadrispro@ubuntu.com>
* types.NewCommit
* use types.NewCommit everywhere
* fix log in unsafe_reset
* memoize height and round in constructor
* notes about deprecating toVote
* bring back memoizeHeightRound
* node: decrease retry conn timeout in test
Should fix#3256
The retry timeout was set to the default, which is the same as the
accept timeout, so it's no wonder this would fail. Here we decrease the
retry timeout so we can try many times before the accept timeout.
* p2p: increase handshake timeout in test
This fails sometimes, presumably because the handshake timeout is so low
(only 50ms). So increase it to 1s. Should fix#3187
* privval: fix race with ping. closes#3237
Pings happen in a go-routine and can happen concurrently with other
messages. Since we use a request/response protocol, we expect to send a
request and get back the corresponding response. But with pings
happening concurrently, this assumption could be violated. We were using
a mutex, but only a RWMutex, where the RLock was being held for sending
messages - this was to allow the underlying connection to be replaced if
it fails. Turns out we actually need to use a full lock (not just a read
lock) to prevent multiple requests from happening concurrently.
* node: fix test name. DelayedStop -> DelayedStart
* autofile: Wait() method
In the TestWALTruncate in consensus/wal_test.go we remove the WAL
directory at the end of the test. However the wal.Stop() does not
properly wait for the autofile group to finish shutting down. Hence it
was possible that the group's go-routine is still running when the
cleanup happens, which causes a panic since the directory disappeared.
Here we add a Wait() method to properly wait until the go-routine exits
so we can safely clean up. This fixes#2852.
* consensus: createProposalBlock function
* blockExecutor.CreateProposalBlock
- factored out of consensus pkg into a method on blockExec
- new private interfaces for mempool ("txNotifier") and evpool with one function each
- consensus tests still require more mempool methods
* failing test for CreateProposalBlock
* Fix bug in include evidece into block
* evidence: change maxBytes to maxSize
* MaxEvidencePerBlock
- changed to return both the max number and the max bytes
- preparation for #2590
* changelog
* fix linter
* Fix from review
Co-Authored-By: ebuchman <ethan@coinculture.info>
* Close and recreate a RemoteSigner on err
* Update changelog
* Address Anton's comments / suggestions:
- update changelog
- restart TCPVal
- shut down on `ErrUnexpectedResponse`
* re-init remote signer client with fresh connection if Ping fails
- add/update TODOs in secret connection
- rename tcp.go -> tcp_client.go, same with ipc to clarify their purpose
* account for `conn returned by waitConnection can be `nil`
- also add TODO about RemoteSigner conn field
* Tests for retrying: IPC / TCP
- shorter info log on success
- set conn and use it in tests to close conn
* Tests for retrying: IPC / TCP
- shorter info log on success
- set conn and use it in tests to close conn
- add rwmutex for conn field in IPC
* comments and doc.go
* fix ipc tests. fixes#2677
* use constants for tests
* cleanup some error statements
* fixes#2784, race in tests
* remove print statement
* minor fixes from review
* update comment on sts spec
* cosmetics
* p2p/conn: add failing tests
* p2p/conn: make SecretConnection thread safe
* changelog
* IPCVal signer refactor
- use a .reset() method
- don't use embedded RemoteSignerClient
- guard RemoteSignerClient with mutex
- drop the .conn
- expose Close() on RemoteSignerClient
* apply IPCVal refactor to TCPVal
* remove mtx from RemoteSignerClient
* consolidate IPCVal and TCPVal, fixes#3104
- done in tcp_client.go
- now called SocketVal
- takes a listener in the constructor
- make tcpListener and unixListener contain all the differences
* delete ipc files
* introduce unix and tcp dialer for RemoteSigner
* rename files
- drop tcp_ prefix
- rename priv_validator.go to file.go
* bring back listener options
* fix node
* fix priv_val_server
* fix node test
* minor cleanup and comments
* split immutable and mutable parts of priv_validator.json
* fix bugs
* minor changes
* retrig test
* delete scripts/wire2amino.go
* fix test
* fixes from review
* privval: remove mtx
* rearrange priv_validator.go
* upgrade path
* write tests for the upgrade
* fix for unsafe_reset_all
* add test
* add reset test
* Decouple StartHTTP{,AndTLS}Server from Listen()
This should help solve cosmos/cosmos-sdk#2715
* Fix small mistake
* Update StartGRPCServer
* s/rpc/rpcserver/
* Start grpccore.StartGRPCServer in a goroutine
* Reinstate l.Close()
* Fix rpc/lib/test/main.go
* Update code comment
* update changelog and comments
* fix tm-monitor. more comments