
docs: write about debug kill and dump (#4516)

* docs: write about debug kill and dump

Closes #4325

* wrap file tree in code blocks
pull/4519/head
Anton Kaliaev 5 years ago
committed by GitHub
parent
commit
a60d032b07
No known key found for this signature in database GPG Key ID: 4AEE18F83AFDEB23
3 changed files with 94 additions and 38 deletions
  1. +28
    -34
      docs/tendermint-core/running-in-production.md
  2. +9
    -4
      docs/tools/README.md
  3. +57
    -0
      docs/tools/debugging.md

+ 28
- 34
docs/tendermint-core/running-in-production.md

@@ -111,54 +111,44 @@ to achieve the same things.
## Debugging Tendermint
If you ever have to debug Tendermint, the first thing you should
probably do is to check out the logs. See [How to read
logs](./how-to-read-logs.md), where we explain what certain log
statements mean.
If you ever have to debug Tendermint, the first thing you should probably do is
check out the logs. See [How to read logs](./how-to-read-logs.md), where we
explain what certain log statements mean.
If, after skimming through the logs, things are not clear still, the
next thing to try is query the /status RPC endpoint. It provides the
necessary info: whenever the node is syncing or not, what height it is
on, etc.
If, after skimming through the logs, things are still not clear, the next thing
to try is querying the `/status` RPC endpoint. It provides the necessary info:
whether the node is syncing or not, what height it is on, etc.
```
```sh
curl http(s)://{ip}:{rpcPort}/status
```
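As a minimal sketch of how the `/status` output might be checked for syncing, the helper below pulls the `catching_up` flag out of a response read on stdin. It assumes the response carries a `"catching_up": true|false` field (under `sync_info`), and `RPC_ADDR` is a hypothetical placeholder for your node's RPC address; a proper JSON parser would be more robust than `grep`.

```sh
# Print the value of the "catching_up" flag found in a /status response on stdin.
# Assumption: the field appears as "catching_up": true|false in the JSON body.
catching_up() {
  grep -o '"catching_up"[[:space:]]*:[[:space:]]*[a-z]*' | sed 's/.*:[[:space:]]*//'
}

# Usage (RPC_ADDR is a placeholder, e.g. the {ip}:{rpcPort} pair used above):
# curl -s "$RPC_ADDR/status" | catching_up
```

If the helper prints `true`, the node is still syncing; `false` suggests it has caught up.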
`dump_consensus_state` will give you a detailed overview of the
consensus state (proposer, lastest validators, peers states). From it,
you should be able to figure out why, for example, the network had
halted.
`/dump_consensus_state` will give you a detailed overview of the consensus
state (proposer, latest validators, peer states). From it, you should be able
to figure out why, for example, the network has halted.
```
```sh
curl http(s)://{ip}:{rpcPort}/dump_consensus_state
```
There is a reduced version of this endpoint - `consensus_state`, which
returns just the votes seen at the current height.
- [Github Issues](https://github.com/tendermint/tendermint/issues)
- [StackOverflow
questions](https://stackoverflow.com/questions/tagged/tendermint)
There is a reduced version of this endpoint - `/consensus_state`, which returns
just the votes seen at the current height.
### Debug Utility
If, after consulting the logs and the above endpoints, you still have no idea
what's happening, consider using the `tendermint debug kill` sub-command. This
command will collect all the available info and kill the process. See
[Debugging](../tools/debugging.md) for the exact format.
Tendermint also ships with a `debug` sub-command that allows you to kill a live
Tendermint process while collecting useful information in a compressed archive
such as the configuration used, consensus state, network state, the node' status,
the WAL, and even the stacktrace of the process before exit. These files can be
useful to examine when debugging a faulty Tendermint process.
In addition, the `debug` sub-command also allows you to dump debugging data into
compressed archives at a regular interval. These archives contain the goroutine
and heap profiles in addition to the consensus state, network info, node status,
and even the WAL.
You can inspect the resulting archive yourself or create an issue on
[GitHub](https://github.com/tendermint/tendermint). Before opening an issue,
however, be sure to check whether there is already an [existing
issue](https://github.com/tendermint/tendermint/issues).
## Monitoring Tendermint
Each Tendermint instance has a standard `/health` RPC endpoint, which
responds with 200 (OK) if everything is fine and 500 (or no response) -
if something is wrong.
Each Tendermint instance has a standard `/health` RPC endpoint, which responds
with 200 (OK) if everything is fine and with 500 (or no response) if something
is wrong.
Other useful endpoints include the previously mentioned `/status`, as well as
`/net_info` and `/validators`.
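A monitoring script could turn the `/health` response into a verdict along these lines. This is only a sketch: `health_verdict` and `RPC_ADDR` are hypothetical names, and it relies on `curl` reporting `000` as the status code when no response arrives.

```sh
# Map the HTTP status code returned by /health to a short verdict.
# Anything other than 200 (including 500, an empty code, or curl's "000"
# for "no response") is treated as unhealthy.
health_verdict() {
  case "$1" in
    200) echo "healthy" ;;
    *)   echo "unhealthy" ;;
  esac
}

# Usage (RPC_ADDR is a placeholder for your node's RPC address):
# code=$(curl -s -o /dev/null -w '%{http_code}' "$RPC_ADDR/health")
# health_verdict "$code"
```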
@@ -166,6 +156,10 @@ Other useful endpoints include mentioned earlier `/status`, `/net_info` and
Tendermint also can report and serve Prometheus metrics. See
[Metrics](./metrics.md).
The `tendermint debug dump` sub-command can be used to periodically dump useful
information into an archive. See [Debugging](../tools/debugging.md) for more
information.
## What happens when my app dies?
You are supposed to run Tendermint under a [process


+ 9
- 4
docs/tools/README.md

@@ -9,16 +9,21 @@ parent:
Tendermint has some tools that are associated with it for:
- [Debugging](./debugging.md)
- [Benchmarking](#benchmarking)
- [Validation of remote signers](./remote-signer-validation.md)
- [Testnets](#testnets)
- [Validation of remote signers](./remote-signer-validation.md)
## Benchmarking
Benchmarking is done with tm-load-test, for information on how to use the tool please visit the docs: https://github.com/interchainio/tm-load-test
- https://github.com/interchainio/tm-load-test
`tm-load-test` is a distributed load testing tool (and framework) for load
testing Tendermint networks.
## Testnets
The testnets tool is aimed at testing Tendermint with different configurations. For more information please visit: https://github.com/interchainio/testnets.
- https://github.com/interchainio/testnets
This repository contains various different configurations of test networks for,
and relating to, Tendermint.

+ 57
- 0
docs/tools/debugging.md

@@ -0,0 +1,57 @@
# Debugging
## tendermint debug kill
Tendermint comes with a `debug` sub-command that allows you to kill a live
Tendermint process while collecting useful information in a compressed archive.
The information includes the configuration used, consensus state, network
state, the node's status, the WAL, and even the stack trace of the process
before exit. These files can be useful to examine when debugging a faulty
Tendermint process.
```sh
tendermint debug kill <pid> </path/to/out.zip> --home=</path/to/app.d>
```
This command writes debug info into a compressed archive. The archive will
contain the following:
```
├── config.toml
├── consensus_state.json
├── net_info.json
├── stacktrace.out
├── status.json
└── wal
```
Under the hood, `debug kill` fetches info from the `/status`, `/net_info`, and
`/dump_consensus_state` HTTP endpoints, and kills the process with `-6`
(`SIGABRT`), which triggers the goroutine dump that is captured.
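To illustrate what killing a process with `-6` means at the shell level, the sketch below sends signal 6 (`SIGABRT`) to a throwaway `sleep` process and inspects how it exited. It does not involve Tendermint at all; the `sleep` stand-in is an assumption for demonstration only.

```sh
# Start a throwaway background process to act as the target.
sleep 30 &
pid=$!

# Send signal 6 (SIGABRT), the same signal number `debug kill` is described as using.
kill -6 "$pid"

# Collect the exit status; a process killed by a signal reports a status above 128.
status=0
wait "$pid" || status=$?

echo "signal 6 is SIG$(kill -l 6); wait returned $status"
```

For a Go program, the runtime's default reaction to `SIGABRT` is to exit with a stack dump, which is why this signal is useful for capturing goroutine state.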
## tendermint debug dump
Also, the `debug dump` sub-command allows you to dump debugging data into
compressed archives at a regular interval. These archives contain the goroutine
and heap profiles in addition to the consensus state, network info, node
status, and even the WAL.
```sh
tendermint debug dump </path/to/out> --home=</path/to/app.d>
```
This command behaves similarly to `kill`, except that it only polls the node
and dumps debugging data every frequency seconds to a compressed archive under
a given destination directory. Each archive will contain:
```
├── consensus_state.json
├── goroutine.out
├── heap.out
├── net_info.json
├── status.json
└── wal
```
Note: `goroutine.out` and `heap.out` will only be written if a profile address
is provided and is operational. This command is blocking and will log any error.
