From 33ec8cb6092189e163fd9bae4b7e8439daa4b51f Mon Sep 17 00:00:00 2001 From: Anton Kaliaev Date: Tue, 22 May 2018 16:55:29 +0400 Subject: [PATCH 01/10] document logging Refs #1494 --- docs/running-in-production.rst | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) create mode 100644 docs/running-in-production.rst diff --git a/docs/running-in-production.rst b/docs/running-in-production.rst new file mode 100644 index 000000000..d9bbe33b9 --- /dev/null +++ b/docs/running-in-production.rst @@ -0,0 +1,20 @@ +Running in production +===================== + +Logging +------- + +Default logging level (``main:info,state:info,*:``) should suffice for normal +operation mode. Read `this post +__` +for details on how to configure ``log_level`` config variable. Some of the +modules can be found `here <./how-to-read-logs.html#list-of-modules>__`. + +If you're trying to debug Tendermint or asked to provide logs with debug +logging level, you can do so by running tendermint with +``--log_level="*:debug"``. + +Consensus WAL +------------- + +Consensus module writes every message to the WAL (write-ahead log). From 83c6f2864df90d118c9626ed5d5f1e9ae9b68538 Mon Sep 17 00:00:00 2001 From: Anton Kaliaev Date: Wed, 23 May 2018 10:29:44 +0400 Subject: [PATCH 02/10] document the consensus WAL Refs #1494 --- docs/running-in-production.rst | 30 ++++++++++++++++++++++++++++++ 1 file changed, 30 insertions(+) diff --git a/docs/running-in-production.rst b/docs/running-in-production.rst index d9bbe33b9..2e435f830 100644 --- a/docs/running-in-production.rst +++ b/docs/running-in-production.rst @@ -18,3 +18,33 @@ Consensus WAL ------------- Consensus module writes every message to the WAL (write-ahead log). + +It also issues fsync syscall through `File#Sync +__` for messages signed by this node (to +prevent double signing). + +Under the hood, it uses `autofile.Group +__`, which +rotates files when those get too big (> 10MB). + +The total maximum size is 1GB. 
We only need the latest block and the block before it, +but if the former is dragging on across many rounds, we want all those rounds. + +Replay +~~~~~~ + +Consensus module will replay all the messages of the last height written to WAL +before a crash (if such occurs). + +The private validator may try to sign messages during replay because it runs +somewhat autonomously and does not know about replay process. + +For example, if we got all the way to precommit in the WAL and then crash, +after we replay the proposal message, the private validator will try to sign a +prevote. But it will fail. That's ok because we’ll see the prevote later in the +WAL. Then it will go to precommit, and that time it will work because the +private validator contains the ``LastSignBytes`` and then we’ll replay the +precommit from the WAL. + +Make sure to read about `WAL corruption +<./specification/corruption.html#wal-corruption>__` and recovery strategies. From e0d4fe2dba27b517d9875224c227b0ac09f3323c Mon Sep 17 00:00:00 2001 From: Anton Kaliaev Date: Wed, 23 May 2018 12:41:47 +0400 Subject: [PATCH 03/10] document DOS exposure and mitigation Refs #1494 --- docs/running-in-production.rst | 84 ++++++++++++++++++++++++++++++++++ 1 file changed, 84 insertions(+) diff --git a/docs/running-in-production.rst b/docs/running-in-production.rst index 2e435f830..5fe4684b2 100644 --- a/docs/running-in-production.rst +++ b/docs/running-in-production.rst @@ -48,3 +48,87 @@ precommit from the WAL. Make sure to read about `WAL corruption <./specification/corruption.html#wal-corruption>__` and recovery strategies. + +DOS Exposure and Mitigation +--------------------------- + +Validators are supposed to setup `Sentry Node Architecture +__` +to prevent Denial-of-service attacks. You can read more about it `here +__`. 
+ +Blockchain Reactor +~~~~~~~~~~~~~~~~~~ + +Defines ``maxMsgSize`` for the maximum size of incoming messages, +``SendQueueCapacity`` and ``RecvBufferCapacity`` for maximum sending and +receiving buffers respectively. These are supposed to prevent amplification +attacks by setting up the upper limit on how much data we can receive & send to +a peer. + +Sending incorrectly encoded data will result in stopping the peer. + +Consensus Reactor +~~~~~~~~~~~~~~~~~ + +Defines 4 channels: state, data, vote and vote_set_bits. Each channel +has ``SendQueueCapacity`` and ``RecvBufferCapacity`` and +``RecvMessageCapacity`` set to ``maxMsgSize``. + +Sending incorrectly encoded data will result in stopping the peer. + +Evidence Reactor +~~~~~~~~~~~~~~~~ + +`#1503 __` + +Sending invalid evidence will result in stopping the peer. + +Sending incorrectly encoded data or data exceeding ``maxMsgSize`` will result +in stopping the peer. + +PEX Reactor +~~~~~~~~~~~ + +Defines only ``SendQueueCapacity``. `#1503 __` + +Implements rate-limiting by enforcing minimal time between two consecutive +``pexRequestMessage`` requests. If the peer sends us addresses we did not ask, +it is stopped. + +Sending incorrectly encoded data or data exceeding ``maxMsgSize`` will result +in stopping the peer. + +Mempool Reactor +~~~~~~~~~~~~~~~ + +`#1503 __` + +Mempool maintains a cache of the last 10000 transactions to prevent replaying +old transactions (plus transactions coming from other validators, who are +continually exchanging transactions). Read `Replay Protection +<./app-development.html#replay-protection>` for details. + +Sending incorrectly encoded data or data exceeding ``maxMsgSize`` will result +in stopping the peer. + +P2P +~~~ + +The core of the Tendermint peer-to-peer system is ``MConnection``. Each +connection has ``MaxPacketMsgPayloadSize``, which is the maximum packet size +and bounded send & receive queues. 
One can impose restrictions on send & +receive rate per connection (``SendRate``, ``RecvRate``). + +RPC +~~~ + +Endpoints returning multiple entries are limited by default to return 30 +elements (100 max). + +Rate-limiting and authentication are another key aspects to help protect +against DOS attacks. While in the future we may implement these features, for +now, validators are supposed to use external tools like `NGINX +__` or `traefik +__` to archive +the same things. From 82ded582f2e5e72f38ca9c85cd10e6a4150d95c8 Mon Sep 17 00:00:00 2001 From: Anton Kaliaev Date: Mon, 28 May 2018 15:51:59 +0400 Subject: [PATCH 04/10] [docs] debugging/monitoring sections, restart handling Refs #1494 --- docs/running-in-production.rst | 55 ++++++++++++++++++++++++++++++++++ 1 file changed, 55 insertions(+) diff --git a/docs/running-in-production.rst b/docs/running-in-production.rst index 5fe4684b2..ec3b46db6 100644 --- a/docs/running-in-production.rst +++ b/docs/running-in-production.rst @@ -132,3 +132,58 @@ now, validators are supposed to use external tools like `NGINX __` or `traefik __` to archive the same things. + +Debugging Tendermint +-------------------- + +If you ever have to debug Tendermint, the first thing you should probably do is +to check out the logs. See `"How to read logs" <./how-to-read-logs.html>__`, +where we explain what certain log statements mean. + +If, after skimming through the logs, things are not clear still, the second +step is to query the `/status` RPC endpoint. It provides the necessary info: +whether the node is syncing or not, what height it is on, etc. + +``` +$ curl http(s)://{ip}:{rpcPort}/status +``` + +`/dump_consensus_state` will give you a detailed overview of the consensus +state (proposer, latest validators, peer states). From it, you should be able +to figure out why, for example, the network had halted. 
+ +``` +$ curl http(s)://{ip}:{rpcPort}/dump_consensus_state +``` + +There is a reduced version of this endpoint - `/consensus_state`, which +returns just the votes seen at the current height. + +- `Github Issues __` +- `StackOverflow questions __` + +Monitoring Tendermint +--------------------- + +Each Tendermint instance has a standard `/health` RPC endpoint, which responds +with 200 (OK) if everything is fine and 500 (or no response) - if something is +wrong. + +Other useful endpoints include mentioned earlier `/status`, `/net_info` and +`/validators`. + +We have a small tool, called tm-monitor, which outputs information from the +endpoints above plus some statistics. The tool can be found `here +__`. + +What happens when my app die? +----------------------------- + +You are supposed to run Tendermint under a `process supervisor +__` (like systemd or runit). +It will ensure Tendermint is always running (despite possible errors). + +Getting back to the original question, if your application dies, Tendermint +will panic. After a process supervisor restarts your application, Tendermint +should be able to reconnect successfully. The order of restart does not matter +for it. From b542dce2e1cd4e0c98a6f88a66e69acb6d9fcedc Mon Sep 17 00:00:00 2001 From: Anton Kaliaev Date: Mon, 28 May 2018 16:20:15 +0400 Subject: [PATCH 05/10] [docs] signal handling Refs #1494 --- docs/running-in-production.rst | 49 +++++++++++++++++++--------------- 1 file changed, 27 insertions(+), 22 deletions(-) diff --git a/docs/running-in-production.rst b/docs/running-in-production.rst index ec3b46db6..c2af48cad 100644 --- a/docs/running-in-production.rst +++ b/docs/running-in-production.rst @@ -6,13 +6,11 @@ Logging Default logging level (``main:info,state:info,*:``) should suffice for normal operation mode. Read `this post -__` +`__ for details on how to configure ``log_level`` config variable. Some of the -modules can be found `here <./how-to-read-logs.html#list-of-modules>__`. 
- -If you're trying to debug Tendermint or asked to provide logs with debug -logging level, you can do so by running tendermint with -``--log_level="*:debug"``. +modules can be found `here <./how-to-read-logs.html#list-of-modules>`__. If +you're trying to debug Tendermint or asked to provide logs with debug logging +level, you can do so by running tendermint with ``--log_level="*:debug"``. Consensus WAL ------------- @@ -20,11 +18,11 @@ Consensus WAL Consensus module writes every message to the WAL (write-ahead log). It also issues fsync syscall through `File#Sync -__` for messages signed by this node (to +`__ for messages signed by this node (to prevent double signing). Under the hood, it uses `autofile.Group -__`, which +`__, which rotates files when those get too big (> 10MB). The total maximum size is 1GB. We only need the latest block and the block before it, @@ -47,15 +45,15 @@ private validator contains the ``LastSignBytes`` and then we’ll replay the precommit from the WAL. Make sure to read about `WAL corruption -<./specification/corruption.html#wal-corruption>__` and recovery strategies. +<./specification/corruption.html#wal-corruption>`__ and recovery strategies. DOS Exposure and Mitigation --------------------------- Validators are supposed to setup `Sentry Node Architecture -__` +`__ to prevent Denial-of-service attacks. You can read more about it `here -__`. +`__. Blockchain Reactor ~~~~~~~~~~~~~~~~~~ @@ -80,7 +78,7 @@ Sending incorrectly encoded data will result in stopping the peer. Evidence Reactor ~~~~~~~~~~~~~~~~ -`#1503 __` +`#1503 `__ Sending invalid evidence will result in stopping the peer. @@ -90,7 +88,7 @@ in stopping the peer. PEX Reactor ~~~~~~~~~~~ -Defines only ``SendQueueCapacity``. `#1503 __` +Defines only ``SendQueueCapacity``. `#1503 `__ Implements rate-limiting by enforcing minimal time between two consecutive ``pexRequestMessage`` requests. If the peer sends us addresses we did not ask, @@ -102,12 +100,12 @@ in stopping the peer. 
Mempool Reactor ~~~~~~~~~~~~~~~ -`#1503 __` +`#1503 `__ Mempool maintains a cache of the last 10000 transactions to prevent replaying old transactions (plus transactions coming from other validators, who are continually exchanging transactions). Read `Replay Protection -<./app-development.html#replay-protection>` for details. +<./app-development.html#replay-protection>`__ for details. Sending incorrectly encoded data or data exceeding ``maxMsgSize`` will result in stopping the peer. @@ -129,15 +127,15 @@ elements (100 max). Rate-limiting and authentication are another key aspects to help protect against DOS attacks. While in the future we may implement these features, for now, validators are supposed to use external tools like `NGINX -__` or `traefik -__` to archive +`__ or `traefik +`__ to archive the same things. Debugging Tendermint -------------------- If you ever have to debug Tendermint, the first thing you should probably do is -to check out the logs. See `"How to read logs" <./how-to-read-logs.html>__`, +to check out the logs. See `"How to read logs" <./how-to-read-logs.html>`__, where we explain what certain log statements mean. If, after skimming through the logs, things are not clear still, the second @@ -159,8 +157,8 @@ $ curl http(s)://{ip}:{rpcPort}/dump_consensus_state There is a reduced version of this endpoint - `/consensus_state`, which returns just the votes seen at the current height. -- `Github Issues __` -- `StackOverflow questions __` +- `Github Issues `__ +- `StackOverflow questions `__ Monitoring Tendermint --------------------- @@ -174,16 +172,23 @@ Other useful endpoints include mentioned earlier `/status`, `/net_info` and We have a small tool, called tm-monitor, which outputs information from the endpoints above plus some statistics. The tool can be found `here -__`. +`__. What happens when my app die? ----------------------------- You are supposed to run Tendermint under a `process supervisor -__` (like systemd or runit). 
+`__ (like systemd or runit). It will ensure Tendermint is always running (despite possible errors). Getting back to the original question, if your application dies, Tendermint will panic. After a process supervisor restarts your application, Tendermint should be able to reconnect successfully. The order of restart does not matter for it. + +Signal handling +--------------- + +We catch SIGINT and SIGTERM and try to clean up nicely. For other signals we +use the default behaviour in Go: `Default behavior of signals in Go programs +`__. From 2a517ac98c7556498aed5446235b438cfd1f5166 Mon Sep 17 00:00:00 2001 From: Anton Kaliaev Date: Tue, 29 May 2018 14:31:28 +0400 Subject: [PATCH 06/10] hardware specs and configuration params Refs #1494 --- docs/running-in-production.rst | 49 ++++++++++++++++++++++++++++++++++ 1 file changed, 49 insertions(+) diff --git a/docs/running-in-production.rst b/docs/running-in-production.rst index c2af48cad..04a8abffa 100644 --- a/docs/running-in-production.rst +++ b/docs/running-in-production.rst @@ -192,3 +192,52 @@ Signal handling We catch SIGINT and SIGTERM and try to clean up nicely. For other signals we use the default behaviour in Go: `Default behavior of signals in Go programs `__. + +Hardware +-------- + +Processor and Memory +~~~~~~~~~~~~~~~~~~~~ + +While actual specs vary depending on the load and validators count, minimal requirements are: + +- 1GB RAM +- 25GB of disk space +- 1.4 GHz CPU + +SSD disks are preffereble for applications with high transaction throughput. + +Recommended: + +- 2GB RAM +- 100GB SSD +- x64 2.0 GHz 2v CPU + +Operating Systems +~~~~~~~~~~~~~~~~~ + +Tendermint can be compiled for a wide range of operating systems thanks to the Go +language (the list of $OS/$ARCH pairs can be found `here +`__). + +While we do not favor any operating system, more secure and stable Linux server +distributions (like CentOS) should be preferred over desktop operating systems +(like Mac OS). + +Misc. 
+~~~~~ + +NOTE: if you are going to use Tendermint in a public domain, make sure you read +`hardware recommendations (see "4. Hardware") +`__ for a validator in the Cosmos network. + +Configuration parameters +------------------------ + +- ``skip_timeout_commit`` + +We want skip_timeout_commit=false when there is economics on the line because +proposers should wait to hear for more votes. But if you don't care about that +and want the fastest consensus, you can skip it. So we will keep it false for +the hub and as default, but for enterprise applications, no problem to set to +true. From f7106bfb3915514b09439e0911ae90bb873c0dcc Mon Sep 17 00:00:00 2001 From: Anton Kaliaev Date: Tue, 29 May 2018 16:24:52 +0400 Subject: [PATCH 07/10] more config variables Refs #1494 --- docs/running-in-production.rst | 46 +++++++++++++++++++++++++++++++++- 1 file changed, 45 insertions(+), 1 deletion(-) diff --git a/docs/running-in-production.rst b/docs/running-in-production.rst index 04a8abffa..432710831 100644 --- a/docs/running-in-production.rst +++ b/docs/running-in-production.rst @@ -234,10 +234,54 @@ NOTE: if you are going to use Tendermint in a public domain, make sure you read Configuration parameters ------------------------ -- ``skip_timeout_commit`` +- ``p2p.flush_throttle_timeout`` + ``p2p.max_packet_msg_payload_size`` + ``p2p.send_rate`` + ``p2p.recv_rate`` + +If you are going to use Tendermint in a private domain and you have a private +high-speed network among your peers, it makes sense to lower flush throttle +timeout and increase other params. + +:: + + [p2p] + + send_rate=20000000 # 2MB/s + recv_rate=20000000 # 2MB/s + flush_throttle_timeout=10 + max_packet_msg_payload_size=10240 # 10KB + +- ``mempool.recheck`` + +After every block, Tendermint rechecks every transaction left in the mempool to +see if transactions committed in that block affected the application state, so +some of the transactions left may become invalid. 
If that does not apply to +your application, you can disable it by setting ``mempool.recheck=false``. + +- ``mempool.broadcast`` + +Setting this to false will stop the mempool from relaying transactions to other +peers until they are included in a block. It means only the peer you send the +tx to will see it until it is included in a block. + +- ``consensus.skip_timeout_commit`` We want skip_timeout_commit=false when there is economics on the line because proposers should wait to hear for more votes. But if you don't care about that and want the fastest consensus, you can skip it. So we will keep it false for the hub and as default, but for enterprise applications, no problem to set to true. + +- ``consensus.peer_gossip_sleep_duration`` + +You can try to reduce the time node sleeps before checking if theres something to send its peers. + +- ``consensus.timeout_commit`` + +You can also try lowering ``timeout_commit`` (time we sleep before proposing the next block). + +- ``consensus.max_block_size_txs`` + +By default, the maximum number of transactions per block is 10_000. Feel free +to change it to suit your needs. 
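The ``[p2p]`` sample in this patch annotates ``send_rate=20000000`` and ``recv_rate=20000000`` as ``2MB/s``. Assuming both settings are expressed in bytes per second, the arithmetic can be checked with a short Python sketch. The helper name ``bytes_per_sec_to_mb`` is hypothetical and not part of Tendermint.

```python
def bytes_per_sec_to_mb(rate: int) -> float:
    """Convert a rate in bytes/s to decimal megabytes/s (1 MB = 1,000,000 bytes)."""
    return rate / 1_000_000


sample_rate = 20_000_000  # value used for send_rate and recv_rate in the sample
two_mb_rate = 2_000_000   # value that would match the "2MB/s" comment

print(bytes_per_sec_to_mb(sample_rate))  # 20.0
print(bytes_per_sec_to_mb(two_mb_rate))  # 2.0
```

Under that assumption the sample value works out to 20 MB/s, ten times the rate named in its comment. The patch does not say which of the two was intended, so it is worth confirming before copying the values into a production config.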
From 252a0a392b52e43d88d3bc53be888b97c6881d7c Mon Sep 17 00:00:00 2001 From: Anton Kaliaev Date: Tue, 29 May 2018 16:49:02 +0400 Subject: [PATCH 08/10] move reactor descriptions to relevant specs --- docs/running-in-production.rst | 90 ------------------- docs/spec/consensus/wal.md | 33 +++++++ docs/spec/reactors/block_sync/reactor.md | 10 +++ .../reactors/consensus/consensus-reactor.md | 8 ++ docs/spec/reactors/evidence/reactor.md | 10 +++ docs/spec/reactors/mempool/reactor.md | 14 +++ docs/spec/reactors/pex/reactor.md | 12 +++ 7 files changed, 87 insertions(+), 90 deletions(-) create mode 100644 docs/spec/consensus/wal.md create mode 100644 docs/spec/reactors/evidence/reactor.md create mode 100644 docs/spec/reactors/mempool/reactor.md create mode 100644 docs/spec/reactors/pex/reactor.md diff --git a/docs/running-in-production.rst b/docs/running-in-production.rst index 432710831..dedbd56d0 100644 --- a/docs/running-in-production.rst +++ b/docs/running-in-production.rst @@ -12,41 +12,6 @@ modules can be found `here <./how-to-read-logs.html#list-of-modules>`__. If you're trying to debug Tendermint or asked to provide logs with debug logging level, you can do so by running tendermint with ``--log_level="*:debug"``. -Consensus WAL -------------- - -Consensus module writes every message to the WAL (write-ahead log). - -It also issues fsync syscall through `File#Sync -`__ for messages signed by this node (to -prevent double signing). - -Under the hood, it uses `autofile.Group -`__, which -rotates files when those get too big (> 10MB). - -The total maximum size is 1GB. We only need the latest block and the block before it, -but if the former is dragging on across many rounds, we want all those rounds. - -Replay -~~~~~~ - -Consensus module will replay all the messages of the last height written to WAL -before a crash (if such occurs). 
- -The private validator may try to sign messages during replay because it runs -somewhat autonomously and does not know about replay process. - -For example, if we got all the way to precommit in the WAL and then crash, -after we replay the proposal message, the private validator will try to sign a -prevote. But it will fail. That's ok because we’ll see the prevote later in the -WAL. Then it will go to precommit, and that time it will work because the -private validator contains the ``LastSignBytes`` and then we’ll replay the -precommit from the WAL. - -Make sure to read about `WAL corruption -<./specification/corruption.html#wal-corruption>`__ and recovery strategies. - DOS Exposure and Mitigation --------------------------- @@ -55,61 +20,6 @@ Validators are supposed to setup `Sentry Node Architecture to prevent Denial-of-service attacks. You can read more about it `here `__. -Blockchain Reactor -~~~~~~~~~~~~~~~~~~ - -Defines ``maxMsgSize`` for the maximum size of incoming messages, -``SendQueueCapacity`` and ``RecvBufferCapacity`` for maximum sending and -receiving buffers respectively. These are supposed to prevent amplification -attacks by setting up the upper limit on how much data we can receive & send to -a peer. - -Sending incorrectly encoded data will result in stopping the peer. - -Consensus Reactor -~~~~~~~~~~~~~~~~~ - -Defines 4 channels: state, data, vote and vote_set_bits. Each channel -has ``SendQueueCapacity`` and ``RecvBufferCapacity`` and -``RecvMessageCapacity`` set to ``maxMsgSize``. - -Sending incorrectly encoded data will result in stopping the peer. - -Evidence Reactor -~~~~~~~~~~~~~~~~ - -`#1503 `__ - -Sending invalid evidence will result in stopping the peer. - -Sending incorrectly encoded data or data exceeding ``maxMsgSize`` will result -in stopping the peer. - -PEX Reactor -~~~~~~~~~~~ - -Defines only ``SendQueueCapacity``. 
`#1503 `__ - -Implements rate-limiting by enforcing minimal time between two consecutive -``pexRequestMessage`` requests. If the peer sends us addresses we did not ask, -it is stopped. - -Sending incorrectly encoded data or data exceeding ``maxMsgSize`` will result -in stopping the peer. - -Mempool Reactor -~~~~~~~~~~~~~~~ - -`#1503 `__ - -Mempool maintains a cache of the last 10000 transactions to prevent replaying -old transactions (plus transactions coming from other validators, who are -continually exchanging transactions). Read `Replay Protection -<./app-development.html#replay-protection>`__ for details. - -Sending incorrectly encoded data or data exceeding ``maxMsgSize`` will result -in stopping the peer. - P2P ~~~ diff --git a/docs/spec/consensus/wal.md b/docs/spec/consensus/wal.md new file mode 100644 index 000000000..a2e03137d --- /dev/null +++ b/docs/spec/consensus/wal.md @@ -0,0 +1,33 @@ +# WAL + +Consensus module writes every message to the WAL (write-ahead log). + +It also issues fsync syscall through +[File#Sync](https://golang.org/pkg/os/#File.Sync) for messages signed by this +node (to prevent double signing). + +Under the hood, it uses +[autofile.Group](https://godoc.org/github.com/tendermint/tmlibs/autofile#Group), +which rotates files when those get too big (> 10MB). + +The total maximum size is 1GB. We only need the latest block and the block before it, +but if the former is dragging on across many rounds, we want all those rounds. + +## Replay + +Consensus module will replay all the messages of the last height written to WAL +before a crash (if such occurs). + +The private validator may try to sign messages during replay because it runs +somewhat autonomously and does not know about replay process. + +For example, if we got all the way to precommit in the WAL and then crash, +after we replay the proposal message, the private validator will try to sign a +prevote. But it will fail. That's ok because we’ll see the prevote later in the +WAL. 
Then it will go to precommit, and that time it will work because the +private validator contains the `LastSignBytes` and then we’ll replay the +precommit from the WAL. + +Make sure to read about [WAL +corruption](https://tendermint.readthedocs.io/projects/tools/en/master/specification/corruption.html#wal-corruption) +and recovery strategies. diff --git a/docs/spec/reactors/block_sync/reactor.md b/docs/spec/reactors/block_sync/reactor.md index c00ea96f3..9a814bead 100644 --- a/docs/spec/reactors/block_sync/reactor.md +++ b/docs/spec/reactors/block_sync/reactor.md @@ -47,3 +47,13 @@ type bcStatusResponseMessage struct { ## Protocol TODO + +## Channels + +Defines `maxMsgSize` for the maximum size of incoming messages, +`SendQueueCapacity` and `RecvBufferCapacity` for maximum sending and +receiving buffers respectively. These are supposed to prevent amplification +attacks by setting up the upper limit on how much data we can receive & send to +a peer. + +Sending incorrectly encoded data will result in stopping the peer. diff --git a/docs/spec/reactors/consensus/consensus-reactor.md b/docs/spec/reactors/consensus/consensus-reactor.md index 21098dcac..0f03b44b7 100644 --- a/docs/spec/reactors/consensus/consensus-reactor.md +++ b/docs/spec/reactors/consensus/consensus-reactor.md @@ -342,3 +342,11 @@ It broadcasts `NewRoundStepMessage` or `CommitStepMessage` upon new round state broadcasting these messages does not depend on the PeerRoundState; it is sent on the StateChannel. Upon receiving VoteMessage it broadcasts `HasVoteMessage` message to its peers on the StateChannel. `ProposalHeartbeatMessage` is sent the same way on the StateChannel. + +## Channels + +Defines 4 channels: state, data, vote and vote_set_bits. Each channel +has `SendQueueCapacity` and `RecvBufferCapacity` and +`RecvMessageCapacity` set to `maxMsgSize`. + +Sending incorrectly encoded data will result in stopping the peer. 
diff --git a/docs/spec/reactors/evidence/reactor.md b/docs/spec/reactors/evidence/reactor.md new file mode 100644 index 000000000..efa63aa4c --- /dev/null +++ b/docs/spec/reactors/evidence/reactor.md @@ -0,0 +1,10 @@ +# Evidence Reactor + +## Channels + +[#1503](https://github.com/tendermint/tendermint/issues/1503) + +Sending invalid evidence will result in stopping the peer. + +Sending incorrectly encoded data or data exceeding `maxMsgSize` will result +in stopping the peer. diff --git a/docs/spec/reactors/mempool/reactor.md b/docs/spec/reactors/mempool/reactor.md new file mode 100644 index 000000000..2bdbd8951 --- /dev/null +++ b/docs/spec/reactors/mempool/reactor.md @@ -0,0 +1,14 @@ +# Mempool Reactor + +## Channels + +[#1503](https://github.com/tendermint/tendermint/issues/1503) + +Mempool maintains a cache of the last 10000 transactions to prevent +replaying old transactions (plus transactions coming from other +validators, who are continually exchanging transactions). Read [Replay +Protection](https://tendermint.readthedocs.io/projects/tools/en/master/app-development.html?#replay-protection) +for details. + +Sending incorrectly encoded data or data exceeding `maxMsgSize` will result +in stopping the peer. diff --git a/docs/spec/reactors/pex/reactor.md b/docs/spec/reactors/pex/reactor.md new file mode 100644 index 000000000..468f182cc --- /dev/null +++ b/docs/spec/reactors/pex/reactor.md @@ -0,0 +1,12 @@ +# PEX Reactor + +## Channels + +Defines only `SendQueueCapacity`. [#1503](https://github.com/tendermint/tendermint/issues/1503) + +Implements rate-limiting by enforcing minimal time between two consecutive +`pexRequestMessage` requests. If the peer sends us addresses we did not ask, +it is stopped. + +Sending incorrectly encoded data or data exceeding `maxMsgSize` will result +in stopping the peer. 
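The PEX spec above describes rate-limiting as enforcing a minimal time between two consecutive `pexRequestMessage` requests from a peer. The idea can be sketched in Python as follows. This is an illustration only, not the reactor's Go implementation: the class name `RequestRateLimiter`, the injectable clock, and the interval value are all assumptions.

```python
import time
from typing import Callable, Dict


class RequestRateLimiter:
    """Rejects a peer's request if it arrives too soon after the previous accepted one."""

    def __init__(self, min_interval: float,
                 clock: Callable[[], float] = time.monotonic):
        self.min_interval = min_interval          # seconds required between requests
        self.clock = clock                        # injectable so tests can fake time
        self.last_seen: Dict[str, float] = {}     # peer id -> time of last accepted request

    def allow(self, peer_id: str) -> bool:
        now = self.clock()
        last = self.last_seen.get(peer_id)
        if last is not None and now - last < self.min_interval:
            return False  # too soon: a caller could stop the peer at this point
        self.last_seen[peer_id] = now
        return True


# Usage with a fake clock; the 10-second interval is a made-up value.
fake_now = [0.0]
limiter = RequestRateLimiter(10.0, clock=lambda: fake_now[0])
first = limiter.allow("peer-a")    # first request from this peer is accepted
fake_now[0] = 5.0
second = limiter.allow("peer-a")   # only 5 seconds later, so it is rejected
```

Tracking the last accepted request per peer keeps the check constant-time and means one noisy peer does not affect the allowance of others.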
From 3da5198631702eb7afac037c3a6b221cb76899bb Mon Sep 17 00:00:00 2001 From: Anton Kaliaev Date: Wed, 30 May 2018 09:48:07 +0400 Subject: [PATCH 09/10] fixes after @zramsay's review --- docs/index.rst | 1 + docs/running-in-production.rst | 17 +++++++++-------- 2 files changed, 10 insertions(+), 8 deletions(-) diff --git a/docs/index.rst b/docs/index.rst index a9b207a12..f9d714296 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -58,6 +58,7 @@ Tendermint 102 subscribing-to-events-via-websocket.rst indexing-transactions.rst how-to-read-logs.rst + running-in-production.rst Tendermint 201 -------------- diff --git a/docs/running-in-production.rst b/docs/running-in-production.rst index dedbd56d0..d0b577681 100644 --- a/docs/running-in-production.rst +++ b/docs/running-in-production.rst @@ -38,7 +38,7 @@ Rate-limiting and authentication are another key aspects to help protect against DOS attacks. While in the future we may implement these features, for now, validators are supposed to use external tools like `NGINX `__ or `traefik -`__ to archive +`__ to achieve the same things. Debugging Tendermint @@ -84,8 +84,8 @@ We have a small tool, called tm-monitor, which outputs information from the endpoints above plus some statistics. The tool can be found `here `__. -What happens when my app die? ------------------------------ +What happens when my app dies? +------------------------------ You are supposed to run Tendermint under a `process supervisor `__ (like systemd or runit). @@ -115,7 +115,7 @@ While actual specs vary depending on the load and validators count, minimal requ - 25GB of disk space - 1.4 GHz CPU -SSD disks are preffereble for applications with high transaction throughput. +SSD disks are preferable for applications with high transaction throughput. Recommended: @@ -179,13 +179,14 @@ tx to will see it until it is included in a block. We want skip_timeout_commit=false when there is economics on the line because proposers should wait to hear for more votes. 
But if you don't care about that -and want the fastest consensus, you can skip it. So we will keep it false for -the hub and as default, but for enterprise applications, no problem to set to -true. +and want the fastest consensus, you can skip it. It will be kept false by +default for public deployments (e.g. `Cosmos Hub +`__) while for enterprise applications, +setting it to true is not a problem. - ``consensus.peer_gossip_sleep_duration`` -You can try to reduce the time node sleeps before checking if theres something to send its peers. +You can try to reduce the time your node sleeps before checking if there's something to send to its peers. - ``consensus.timeout_commit`` From f0ce8b3883d8b9d6bb76d8984019eea193e59434 Mon Sep 17 00:00:00 2001 From: Anton Kaliaev Date: Wed, 30 May 2018 16:55:39 +0400 Subject: [PATCH 10/10] note about state syncing --- docs/running-in-production.rst | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/docs/running-in-production.rst b/docs/running-in-production.rst index d0b577681..162dfdd86 100644 --- a/docs/running-in-production.rst +++ b/docs/running-in-production.rst @@ -123,6 +123,11 @@ Recommended: - 100GB SSD - x64 2.0 GHz 2v CPU +For now, Tendermint stores the entire history, which may require significant +disk space over time. We are planning to implement state syncing (see `#828 +`__), after which storing all +past blocks will no longer be necessary. + Operating Systems ~~~~~~~~~~~~~~~~~
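The "Monitoring Tendermint" section added earlier in this series says the `/health` RPC endpoint responds with 200 (OK) if everything is fine and with 500 or no response otherwise. A minimal polling sketch in Python, assuming the RPC server is reachable at a base URL you supply. The function name `is_healthy` and the default timeout are hypothetical choices, not part of Tendermint.

```python
import urllib.error
import urllib.request


def is_healthy(base_url: str, timeout: float = 2.0) -> bool:
    """Return True only if GET {base_url}/health answers with HTTP 200."""
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # HTTP errors such as 500, refused connections and timeouts
        # are all treated as "unhealthy".
        return False
```

Call it as `is_healthy("http://{ip}:{rpcPort}")` with your node's address filled in. A process supervisor or an external monitor could run such a check periodically and alert when it returns `False`.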