blockchain: Reorg reactor (#3561)
* go routines in blockchain reactor
* Added reference to the go routine diagram
* Initial commit
* cleanup
* Undo testing_logger change, committed by mistake
* Fix the test loggers
* pulled some fsm code into pool.go
* added pool tests
* changes to the design
added block requests under peer
moved the request trigger in the reactor poolRoutine, triggered now by a ticker
in general moved everything required for making block requests smarter in the poolRoutine
added a simple map of heights to keep track of what will need to be requested next
added a few more tests
* send errors to FSM in a different channel than blocks
send errors (RemovePeer) from switch on a different channel than the
one receiving blocks
renamed channels
added more pool tests
* more pool tests
* lint errors
* more tests
* more tests
* switch fast sync to new implementation
* fixed data race in tests
* cleanup
* finished fsm tests
* address golangci comments :)
* address golangci comments :)
* Added timeout on next block needed to advance
* updating docs and cleanup
* fix issue in test from previous cleanup
* cleanup
* Added termination scenarios, tests and more cleanup
* small fixes to adr, comments and cleanup
* Fix bug in sendRequest()
If we tried to send a request to a peer not present in the switch, a
missing continue statement caused the request to be blackholed in a peer
that was removed and never retried.
While this bug was manifesting, the reactor kept asking for other
blocks that would be stored and never consumed. Added the number of
unconsumed blocks in the math for requesting blocks ahead of current
processing height so eventually there will be no more blocks requested
until the already received ones are consumed.
* remove bpPeer's didTimeout field
* Use distinct err codes for peer timeout and FSM timeouts
* Don't allow peers to update with lower height
* review comments from Ethan and Zarko
* some cleanup, renaming, comments
* Move block execution in separate goroutine
* Remove pool's numPending
* review comments
* fix lint, remove old blockchain reactor and duplicates in fsm tests
* small reorg around peer after review comments
* add the reactor spec
* verify block only once
* review comments
* change to int for max number of pending requests
* cleanup and godoc
* Add configuration flag fast sync version
* golangci fixes
* fix config template
* move both reactor versions under blockchain
* cleanup, golint, renaming stuff
* updated documentation, fixed more golint warnings
* integrate with behavior package
* sync with master
* gofmt
* add changelog_pending entry
* move to improvments
* suggestion to changelog entry
- ---
- order: 3
- ---
-
- # Configuration
-
- Tendermint Core can be configured via a TOML file in
- `$TMHOME/config/config.toml`. Some of these parameters can be overridden by
- command-line flags. Most users only need to modify the options in the
- `Main Base Config Options` section; the options further below are intended
- for advanced power users.
-
- ## Options
-
- The default configuration file created by `tendermint init` has all
- the parameters set to their default values. It will look something
- like the file below. However, double check by inspecting the
- `config.toml` created by your installed version of `tendermint`:
-
- ```toml
- # This is a TOML config file.
- # For more information, see https://github.com/toml-lang/toml
-
- # NOTE: Any path below can be absolute (e.g. "/var/myawesomeapp/data") or
- # relative to the home directory (e.g. "data"). The home directory is
- # "$HOME/.tendermint" by default, but could be changed via $TMHOME env variable
- # or --home cmd flag.
-
- #######################################################################
- ### Main Base Config Options ###
- #######################################################################
-
- # TCP or UNIX socket address of the ABCI application,
- # or the name of an ABCI application compiled in with the Tendermint binary
- proxy_app = "tcp://127.0.0.1:26658"
-
- # A custom human readable name for this node
- moniker = "anonymous"
-
- # If this node is many blocks behind the tip of the chain, FastSync
- # allows it to catch up quickly by downloading blocks in parallel
- # and verifying their commits
- fast_sync = true
-
- # Database backend: goleveldb | cleveldb | boltdb | rocksdb | badgerdb
- # * goleveldb (github.com/syndtr/goleveldb - most popular implementation)
- # - pure go
- # - stable
- # * cleveldb (uses levigo wrapper)
- # - fast
- # - requires gcc
- # - use cleveldb build tag (go build -tags cleveldb)
- # * boltdb (uses etcd's fork of bolt - github.com/etcd-io/bbolt)
- # - EXPERIMENTAL
- # - may be faster in some use-cases (random reads - indexer)
- # - use boltdb build tag (go build -tags boltdb)
- # * rocksdb (uses github.com/tecbot/gorocksdb)
- # - EXPERIMENTAL
- # - requires gcc
- # - use rocksdb build tag (go build -tags rocksdb)
- # * badgerdb (uses github.com/dgraph-io/badger)
- # - EXPERIMENTAL
- # - use badgerdb build tag (go build -tags badgerdb)
- db_backend = "goleveldb"
-
- # Database directory
- db_dir = "data"
-
- # Output level for logging, including package level options
- log_level = "main:info,state:info,statesync:info,*:error"
-
- # Output format: 'plain' (colored text) or 'json'
- log_format = "plain"
-
- ##### additional base config options #####
-
- # Path to the JSON file containing the initial validator set and other meta data
- genesis_file = "config/genesis.json"
-
- # Path to the JSON file containing the private key to use as a validator in the consensus protocol
- priv_validator_key_file = "config/priv_validator_key.json"
-
- # Path to the JSON file containing the last sign state of a validator
- priv_validator_state_file = "data/priv_validator_state.json"
-
- # TCP or UNIX socket address for Tendermint to listen on for
- # connections from an external PrivValidator process
- priv_validator_laddr = ""
-
- # Path to the JSON file containing the private key to use for node authentication in the p2p protocol
- node_key_file = "config/node_key.json"
-
- # Mechanism to connect to the ABCI application: socket | grpc
- abci = "socket"
-
- # If true, query the ABCI app on connecting to a new peer
- # so the app can decide if we should keep the connection or not
- filter_peers = false
-
-
- #######################################################################
- ### Advanced Configuration Options ###
- #######################################################################
-
- #######################################################
- ### RPC Server Configuration Options ###
- #######################################################
- [rpc]
-
- # TCP or UNIX socket address for the RPC server to listen on
- laddr = "tcp://127.0.0.1:26657"
-
- # A list of origins a cross-domain request can be executed from
- # Default value '[]' disables cors support
- # Use '["*"]' to allow any origin
- cors_allowed_origins = []
-
- # A list of methods the client is allowed to use with cross-domain requests
- cors_allowed_methods = ["HEAD", "GET", "POST", ]
-
- # A list of non simple headers the client is allowed to use with cross-domain requests
- cors_allowed_headers = ["Origin", "Accept", "Content-Type", "X-Requested-With", "X-Server-Time", ]
-
- # TCP or UNIX socket address for the gRPC server to listen on
- # NOTE: This server only supports /broadcast_tx_commit
- grpc_laddr = ""
-
- # Maximum number of simultaneous connections.
- # Does not include RPC (HTTP&WebSocket) connections. See max_open_connections
- # If you want to accept a larger number than the default, make sure
- # you increase your OS limits.
- # 0 - unlimited.
- # Should be < {ulimit -Sn} - {MaxNumInboundPeers} - {MaxNumOutboundPeers} - {N of wal, db and other open files}
- # 1024 - 40 - 10 - 50 = 924 = ~900
- grpc_max_open_connections = 900
-
- # Activate unsafe RPC commands like /dial_seeds and /unsafe_flush_mempool
- unsafe = false
-
- # Maximum number of simultaneous connections (including WebSocket).
- # Does not include gRPC connections. See grpc_max_open_connections
- # If you want to accept a larger number than the default, make sure
- # you increase your OS limits.
- # 0 - unlimited.
- # Should be < {ulimit -Sn} - {MaxNumInboundPeers} - {MaxNumOutboundPeers} - {N of wal, db and other open files}
- # 1024 - 40 - 10 - 50 = 924 = ~900
- max_open_connections = 900
-
- # Maximum number of unique clientIDs that can /subscribe
- # If you're using /broadcast_tx_commit, set to the estimated maximum number
- # of broadcast_tx_commit calls per block.
- max_subscription_clients = 100
-
- # Maximum number of unique queries a given client can /subscribe to
- # If you're using GRPC (or Local RPC client) and /broadcast_tx_commit, set to
- # the estimated maximum number of broadcast_tx_commit calls per block.
- max_subscriptions_per_client = 5
-
- # How long to wait for a tx to be committed during /broadcast_tx_commit.
- # WARNING: Using a value larger than 10s will result in increasing the
- # global HTTP write timeout, which applies to all connections and endpoints.
- # See https://github.com/tendermint/tendermint/issues/3435
- timeout_broadcast_tx_commit = "10s"
-
- # Maximum size of request body, in bytes
- max_body_bytes = 1000000
-
- # Maximum size of request header, in bytes
- max_header_bytes = 1048576
-
- # The path to a file containing certificate that is used to create the HTTPS server.
- # Might be either an absolute path or a path relative to tendermint's config directory.
- # If the certificate is signed by a certificate authority,
- # the certFile should be the concatenation of the server's certificate, any intermediates,
- # and the CA's certificate.
- # NOTE: both tls_cert_file and tls_key_file must be present for Tendermint to create HTTPS server.
- # Otherwise, HTTP server is run.
- tls_cert_file = ""
-
- # The path to a file containing matching private key that is used to create the HTTPS server.
- # Might be either an absolute path or a path relative to tendermint's config directory.
- # NOTE: both tls_cert_file and tls_key_file must be present for Tendermint to create HTTPS server.
- # Otherwise, HTTP server is run.
- tls_key_file = ""
-
- # pprof listen address (https://golang.org/pkg/net/http/pprof)
- pprof_laddr = ""
-
- #######################################################
- ### P2P Configuration Options ###
- #######################################################
- [p2p]
-
- # Address to listen for incoming connections
- laddr = "tcp://0.0.0.0:26656"
-
- # Address to advertise to peers for them to dial
- # If empty, will use the same port as the laddr,
- # and will introspect on the listener or use UPnP
- # to figure out the address.
- external_address = ""
-
- # Comma separated list of seed nodes to connect to
- seeds = ""
-
- # Comma separated list of nodes to keep persistent connections to
- persistent_peers = ""
-
- # UPNP port forwarding
- upnp = false
-
- # Path to address book
- addr_book_file = "config/addrbook.json"
-
- # Set true for strict address routability rules
- # Set false for private or local networks
- addr_book_strict = true
-
- # Maximum number of inbound peers
- max_num_inbound_peers = 40
-
- # Maximum number of outbound peers to connect to, excluding persistent peers
- max_num_outbound_peers = 10
-
- # List of node IDs, to which a connection will be (re)established ignoring any existing limits
- unconditional_peer_ids = ""
-
- # Maximum pause when redialing a persistent peer (if zero, exponential backoff is used)
- persistent_peers_max_dial_period = "0s"
-
- # Time to wait before flushing messages out on the connection
- flush_throttle_timeout = "100ms"
-
- # Maximum size of a message packet payload, in bytes
- max_packet_msg_payload_size = 1024
-
- # Rate at which packets can be sent, in bytes/second
- send_rate = 5120000
-
- # Rate at which packets can be received, in bytes/second
- recv_rate = 5120000
-
- # Set true to enable the peer-exchange reactor
- pex = true
-
- # Seed mode, in which node constantly crawls the network and looks for
- # peers. If another node asks it for addresses, it responds and disconnects.
- #
- # Does not work if the peer-exchange reactor is disabled.
- seed_mode = false
-
- # Comma separated list of peer IDs to keep private (will not be gossiped to other peers)
- private_peer_ids = ""
-
- # Toggle to disable guard against peers connecting from the same ip.
- allow_duplicate_ip = false
-
- # Peer connection configuration.
- handshake_timeout = "20s"
- dial_timeout = "3s"
-
- #######################################################
- ### Mempool Configuration Options ###
- #######################################################
- [mempool]
-
- recheck = true
- broadcast = true
- wal_dir = ""
-
- # Maximum number of transactions in the mempool
- size = 5000
-
- # Limit the total size of all txs in the mempool.
- # This only accounts for raw transactions (e.g. given 1MB transactions and
- # max_txs_bytes=5MB, mempool will only accept 5 transactions).
- max_txs_bytes = 1073741824
-
- # Size of the cache (used to filter transactions we saw earlier) in transactions
- cache_size = 10000
-
- # Do not remove invalid transactions from the cache (default: false)
- # Set to true if it's not possible for any invalid transaction to become valid
- # again in the future.
- keep-invalid-txs-in-cache = false
-
- # Maximum size of a single transaction.
- # NOTE: the max size of a tx transmitted over the network is {max_tx_bytes}.
- max_tx_bytes = 1048576
-
- # Maximum size of a batch of transactions to send to a peer
- # Including space needed by encoding (one varint per transaction).
- # XXX: Unused due to https://github.com/tendermint/tendermint/issues/5796
- max_batch_bytes = 10485760
-
- #######################################################
- ### State Sync Configuration Options ###
- #######################################################
- [statesync]
- # State sync rapidly bootstraps a new node by discovering, fetching, and restoring a state machine
- # snapshot from peers instead of fetching and replaying historical blocks. Requires some peers in
- # the network to take and serve state machine snapshots. State sync is not attempted if the node
- # has any local state (LastBlockHeight > 0). The node will have a truncated block history,
- # starting from the height of the snapshot.
- enable = false
-
- # RPC servers (comma-separated) for light client verification of the synced state machine and
- # retrieval of state data for node bootstrapping. Also needs a trusted height and corresponding
- # header hash obtained from a trusted source, and a period during which validators can be trusted.
- #
- # For Cosmos SDK-based chains, trust_period should usually be about 2/3 of the unbonding time (~2
- # weeks) during which they can be financially punished (slashed) for misbehavior.
- rpc_servers = ""
- trust_height = 0
- trust_hash = ""
- trust_period = "0s"
-
- # Temporary directory for state sync snapshot chunks, defaults to the OS tempdir (typically /tmp).
- # Will create a new, randomly named directory within, and remove it when done.
- temp_dir = ""
-
- #######################################################
- ### Fast Sync Configuration Options ###
- #######################################################
- [fastsync]
-
- # Fast Sync version to use:
- # 1) "v0" (default) - the legacy fast sync implementation
- # 2) "v1" - refactor of v0 version for better testability
- # 3) "v2" - complete redesign of v0, optimized for testability & readability
- version = "v0"
-
- #######################################################
- ### Consensus Configuration Options ###
- #######################################################
- [consensus]
-
- wal_file = "data/cs.wal/wal"
-
- # How long we wait for a proposal block before prevoting nil
- timeout_propose = "3s"
- # How much timeout_propose increases with each round
- timeout_propose_delta = "500ms"
- # How long we wait after receiving +2/3 prevotes for “anything” (ie. not a single block or nil)
- timeout_prevote = "1s"
- # How much the timeout_prevote increases with each round
- timeout_prevote_delta = "500ms"
- # How long we wait after receiving +2/3 precommits for “anything” (ie. not a single block or nil)
- timeout_precommit = "1s"
- # How much the timeout_precommit increases with each round
- timeout_precommit_delta = "500ms"
- # How long we wait after committing a block, before starting on the new
- # height (this gives us a chance to receive some more precommits, even
- # though we already have +2/3).
- timeout_commit = "1s"
-
- # How many blocks to look back to check existence of the node's consensus votes before joining consensus
- # When non-zero, the node will panic upon restart
- # if the same consensus key was used to sign {double_sign_check_height} last blocks.
- # So, validators should stop the state machine, wait for some blocks, and then restart the state machine to avoid panic.
- double_sign_check_height = 0
-
- # Make progress as soon as we have all the precommits (as if TimeoutCommit = 0)
- skip_timeout_commit = false
-
- # EmptyBlocks mode and possible interval between empty blocks
- create_empty_blocks = true
- create_empty_blocks_interval = "0s"
-
- # Reactor sleep duration parameters
- peer_gossip_sleep_duration = "100ms"
- peer_query_maj23_sleep_duration = "2s"
-
- #######################################################
- ### Transaction Indexer Configuration Options ###
- #######################################################
- [tx_index]
-
- # What indexer to use for transactions
- #
- # The application will set which txs to index. In some cases a node operator will be able
- # to decide which txs to index based on configuration set in the application.
- #
- # Options:
- # 1) "null"
- # 2) "kv" (default) - the simplest possible indexer, backed by key-value storage (defaults to levelDB; see DBBackend).
- # - When "kv" is chosen "tx.height" and "tx.hash" will always be indexed.
- indexer = "kv"
-
- #######################################################
- ### Instrumentation Configuration Options ###
- #######################################################
- [instrumentation]
-
- # When true, Prometheus metrics are served under /metrics on
- # PrometheusListenAddr.
- # Check out the documentation for the list of available metrics.
- prometheus = false
-
- # Address to listen for Prometheus collector(s) connections
- prometheus_listen_addr = ":26660"
-
- # Maximum number of simultaneous connections.
- # If you want to accept a larger number than the default, make sure
- # you increase your OS limits.
- # 0 - unlimited.
- max_open_connections = 3
-
- # Instrumentation namespace
- namespace = "tendermint"
-
- ```
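The comments on `max_open_connections` and `grpc_max_open_connections` in the file above derive the default of 900 from the process's open-file limit. The following is a minimal sketch of that arithmetic, not Tendermint code; the function name `connectionBudget` and its parameter names are hypothetical and exist only for illustration.

```go
package main

import "fmt"

// connectionBudget returns the upper bound suggested by the config comments:
// the soft open-file limit (ulimit -Sn) minus inbound peers, outbound peers,
// and an estimate of other open files (WAL, DB, and so on).
// The configured value should stay below this bound.
func connectionBudget(softLimit, inboundPeers, outboundPeers, otherFiles int) int {
	return softLimit - inboundPeers - outboundPeers - otherFiles
}

func main() {
	// Values taken from the example in the config comments:
	// 1024 - 40 - 10 - 50 = 924, which the defaults round down to ~900.
	fmt.Println(connectionBudget(1024, 40, 10, 50))
}
```

If you raise `max_num_inbound_peers` or `max_num_outbound_peers`, rerun the same subtraction with your own limits before increasing either connection setting.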
-
- ## Empty blocks VS no empty blocks
-
- ### create_empty_blocks = true
-
- If `create_empty_blocks` is set to `true` in your config, blocks will be
- created ~ every second (with default consensus parameters). You can regulate
- the delay between blocks by changing the `timeout_commit`. E.g. `timeout_commit = "10s"` should result in ~ 10 second blocks.
-
- ### create_empty_blocks = false
-
- In this setting, blocks are created when transactions are received.
-
- Note that after block H, Tendermint creates block H+1, which we call a
- "proof block", but only if the application hash changed. This is to support
- proofs. If you have a transaction in block H that changes the state to X, the
- new application hash will only be included in block H+1. If after your
- transaction is committed, you want to get a light-client proof for the new state
- (X), you need the new block to be committed in order to do that because the new
- block has the new application hash for the state X. That's why we make a new
- (empty) block if the application hash changes. Otherwise, you won't be able to
- make a proof for the new state.
-
- In addition, if you set `create_empty_blocks_interval` to something other than
- the default (`0`), Tendermint will create an empty block every
- `create_empty_blocks_interval`, even in the absence of transactions. For instance, with
- `create_empty_blocks = false` and `create_empty_blocks_interval = "30s"`,
- Tendermint will only create blocks if there are transactions, or after waiting
- 30 seconds without receiving any transactions.
-
- ## Consensus timeouts explained
-
- There's a variety of information about timeouts in [Running in
- production](./running-in-production.md).
-
- You can also find a more detailed technical explanation in the spec: [The latest
- gossip on BFT consensus](https://arxiv.org/abs/1807.04938).
-
- ```toml
- [consensus]
- ...
-
- timeout_propose = "3s"
- timeout_propose_delta = "500ms"
- timeout_prevote = "1s"
- timeout_prevote_delta = "500ms"
- timeout_precommit = "1s"
- timeout_precommit_delta = "500ms"
- timeout_commit = "1s"
- ```
-
- Note that in a successful round, the only timeout we always wait out, no
- matter what, is `timeout_commit`.
-
- Here's a brief summary of the timeouts:
-
- - `timeout_propose` = how long we wait for a proposal block before prevoting
- nil
- - `timeout_propose_delta` = how much `timeout_propose` increases with each round
- - `timeout_prevote` = how long we wait after receiving +2/3 prevotes for
- anything (i.e. not a single block or nil)
- - `timeout_prevote_delta` = how much `timeout_prevote` increases with each
- round
- - `timeout_precommit` = how long we wait after receiving +2/3 precommits for
- anything (i.e. not a single block or nil)
- - `timeout_precommit_delta` = how much `timeout_precommit` increases with
- each round
- - `timeout_commit` = how long we wait after committing a block, before starting
- on the new height (this gives us a chance to receive some more precommits,
- even though we already have +2/3)