internal/proxy: add initial set of abci metrics backport (#7342)
* internal/proxy: add initial set of abci metrics (#7115)
This PR adds an initial set of metrics for ABCI. The initial metrics enable the calculation of timing histograms and call counts for each of the ABCI methods. The metrics are also labeled as either 'sync' or 'async' to indicate whether the method call was performed using ABCI's `*Async` methods.
An example of these metrics is included here for reference:
```
tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="0.0001"} 0
tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="0.0004"} 5
tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="0.002"} 12
tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="0.009"} 13
tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="0.02"} 13
tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="0.1"} 13
tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="0.65"} 13
tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="2"} 13
tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="6"} 13
tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="25"} 13
tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="+Inf"} 13
tendermint_abci_connection_method_timing_sum{chain_id="ci",method="commit",type="sync"} 0.007802058000000001
tendermint_abci_connection_method_timing_count{chain_id="ci",method="commit",type="sync"} 13
```
These metrics can easily be graphed using Prometheus's `histogram_quantile(...)` function to pick out a particular quantile to graph or examine. I chose the buckets as a rough estimate of the expected range of times for ABCI operations. They start at 0.0001 seconds and range up to 25 seconds. The hope is that this range captures enough possible times to be useful for us and for operators.
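To make the bucket discussion concrete, here is a minimal Python sketch of the kind of estimate `histogram_quantile(...)` produces: it finds the bucket containing the requested rank and interpolates linearly inside it. The function name `estimate_quantile` and the constant `COMMIT_BUCKETS` are illustrative and not part of Tendermint or Prometheus. Prometheus itself applies this to rates over a time window rather than to raw cumulative counts, so treat this as an approximation of the idea, not a reimplementation.

```python
import math

# Cumulative bucket counts copied from the sample output above:
# (upper bound in seconds, number of observations at or below that bound).
COMMIT_BUCKETS = [
    (0.0001, 0), (0.0004, 5), (0.002, 12), (0.009, 13), (0.02, 13),
    (0.1, 13), (0.65, 13), (2.0, 13), (6.0, 13), (25.0, 13),
    (math.inf, 13),
]


def estimate_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    Assumes buckets are sorted by upper bound and end with +Inf.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations recorded yet
    rank = q * total
    prev_bound, prev_count = 0.0, 0  # the lowest bucket is assumed to start at 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The +Inf bucket has no finite width, so fall back to the
                # highest finite upper bound.
                return prev_bound
            in_bucket = count - prev_count
            if in_bucket == 0:
                return bound  # only reachable for q == 0 with an empty first bucket
            # Linear interpolation between the bucket's lower and upper bound.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return prev_bound


if __name__ == "__main__":
    print(estimate_quantile(0.95, COMMIT_BUCKETS))
```

For the sample above, the 0.95 quantile lands in the 0.002 to 0.009 second bucket and the sketch returns roughly 0.00445 seconds. The result can never be more precise than the bucket boundaries, which is why the choice of bucket range matters.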
* lint++
* docs: add abci timing metrics to the metrics docs (#7311)
* cherry-pick fixup
- ---
- order: 4
- ---
-
- # Metrics
-
- Tendermint can report and serve Prometheus metrics, which can in turn be
- consumed by Prometheus collector(s).
-
- This functionality is disabled by default.
-
- To enable the Prometheus metrics, set `instrumentation.prometheus=true` in your
- config file. Metrics will be served under `/metrics` on port 26660 by default.
- The listen address can be changed in the config file (see
- `instrumentation.prometheus_listen_addr`).
-
- ## List of available metrics
-
- The following metrics are available:
-
- | **Name** | **Type** | **Tags** | **Description** |
- | -------------------------------------- | --------- | ------------- | ---------------------------------------------------------------------- |
- | abci_connection_method_timing | Histogram | method, type | Timings for each of the ABCI methods |
- | consensus_height | Gauge | | Height of the chain |
- | consensus_validators | Gauge | | Number of validators |
- | consensus_validators_power | Gauge | | Total voting power of all validators |
- | consensus_validator_power | Gauge | | Voting power of the node if in the validator set |
- | consensus_validator_last_signed_height | Gauge | | Last height the node signed a block, if the node is a validator |
- | consensus_validator_missed_blocks | Gauge | | Total amount of blocks missed for the node, if the node is a validator |
- | consensus_missing_validators | Gauge | | Number of validators who did not sign |
- | consensus_missing_validators_power | Gauge | | Total voting power of the missing validators |
- | consensus_byzantine_validators | Gauge | | Number of validators who tried to double sign |
- | consensus_byzantine_validators_power | Gauge | | Total voting power of the byzantine validators |
- | consensus_block_interval_seconds | Histogram | | Time between this and last block (Block.Header.Time) in seconds |
- | consensus_rounds | Gauge | | Number of rounds |
- | consensus_num_txs | Gauge | | Number of transactions |
- | consensus_total_txs | Gauge | | Total number of transactions committed |
- | consensus_block_parts | Counter | peer_id | Number of block parts transmitted by peer |
- | consensus_latest_block_height | Gauge | | Latest block height, as reported in /status sync_info |
- | consensus_fast_syncing | Gauge | | Either 0 (not fast syncing) or 1 (syncing) |
- | consensus_state_syncing | Gauge | | Either 0 (not state syncing) or 1 (syncing) |
- | consensus_block_size_bytes | Gauge | | Block size in bytes |
- | p2p_peers | Gauge | | Number of peers node's connected to |
- | p2p_peer_receive_bytes_total | Counter | peer_id, chID | Number of bytes per channel received from a given peer |
- | p2p_peer_send_bytes_total | Counter | peer_id, chID | Number of bytes per channel sent to a given peer |
- | p2p_peer_pending_send_bytes | Gauge | peer_id | Number of pending bytes to be sent to a given peer |
- | p2p_num_txs | Gauge | peer_id | Number of transactions submitted by each peer_id |
- | p2p_pending_send_bytes | Gauge | peer_id | Amount of data pending to be sent to peer |
- | mempool_size | Gauge | | Number of uncommitted transactions |
- | mempool_tx_size_bytes | Histogram | | Transaction sizes in bytes |
- | mempool_failed_txs | Counter | | Number of failed transactions |
- | mempool_recheck_times | Counter | | Number of transactions rechecked in the mempool |
- | state_block_processing_time | Histogram | | Time between BeginBlock and EndBlock in ms |
-
- ## Useful queries
-
- Percentage of missing + byzantine validators:
-
- ```prometheus
- ((consensus_byzantine_validators_power + consensus_missing_validators_power) / consensus_validators_power) * 100
- ```
-
- Rate at which the application is responding to each ABCI method call:
-
- ```prometheus
- sum(rate(tendermint_abci_connection_method_timing_count[5m])) by (method)
- ```
-
- The 95th percentile response time of the application for the `deliver_tx` ABCI method call:
-
- ```prometheus
- histogram_quantile(0.95, sum by(le) (rate(tendermint_abci_connection_method_timing_bucket{method="deliver_tx"}[5m])))
- ```
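
As a rough illustration of what the `sum(rate(..._count[5m])) by (method)` query computes, the Python sketch below takes two snapshots of the cumulative call counters, divides each increase by the elapsed time, and sums the result per method across the `type` label. The function name and the snapshot layout are hypothetical and the numbers in the usage example are made up. Prometheus's real `rate()` also handles counter resets and extrapolates to the window edges, which this sketch does not attempt.

```python
from collections import defaultdict


def calls_per_second_by_method(earlier, later, elapsed_seconds):
    """Approximate per-method call rates from two counter snapshots.

    `earlier` and `later` map (method, type) to the cumulative call count
    read from tendermint_abci_connection_method_timing_count.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    totals = defaultdict(float)
    for (method, call_type), count in later.items():
        increase = count - earlier.get((method, call_type), 0)
        # A negative increase means the counter was reset (for example by a
        # node restart); this sketch simply counts such an interval as zero.
        totals[method] += max(increase, 0) / elapsed_seconds
    return dict(totals)


if __name__ == "__main__":
    # Made-up snapshots taken 300 seconds (5 minutes) apart.
    before = {("deliver_tx", "sync"): 100, ("deliver_tx", "async"): 50, ("commit", "sync"): 13}
    after = {("deliver_tx", "sync"): 400, ("deliver_tx", "async"): 350, ("commit", "sync"): 43}
    print(calls_per_second_by_method(before, after, 300))
```

Summing over the `type` label mirrors the `by (method)` clause: sync and async calls to the same ABCI method are reported as one rate.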