
metrics ADR

Refs #986
pull/1737/head
Anton Kaliaev 7 years ago
commit
03079185d4
1 changed file with 103 additions and 0 deletions: docs/architecture/adr-010-monitoring.md

# ADR 010: Monitoring
## Changelog
08-06-2018: Initial draft
## Context
In order to bring more visibility into Tendermint, we would like it to report
metrics and, maybe later, traces of transactions and RPC queries. See
https://github.com/tendermint/tendermint/issues/986.
A few solutions were considered:
1. [Prometheus](https://prometheus.io)
   a) Prometheus API
   b) [go-kit metrics package](https://github.com/go-kit/kit/tree/master/metrics) as an interface plus Prometheus
   c) [telegraf](https://github.com/influxdata/telegraf)
   d) new service, which will listen to events emitted by pubsub and report metrics
2. [OpenCensus](https://opencensus.io/go/index.html)
### 1. Prometheus
Prometheus seems to be the most popular monitoring product out there. It has
a Go client library, a powerful query language, and alerting.
**a) Prometheus API**
We can commit to using Prometheus in Tendermint, but I think Tendermint users
should be free to choose whatever monitoring tool best suits their needs (if
they don't already have one). So we should try to abstract the interface
enough that people can switch between Prometheus and other similar tools.
**b) go-kit metrics package as an interface**
The go-kit metrics package provides a set of uniform interfaces for service
instrumentation and offers adapters to popular metrics packages:
https://godoc.org/github.com/go-kit/kit/metrics#pkg-subdirectories
Compared to the Prometheus API, we lose customisability and control, but gain
the freedom to choose any instrument from the above list, provided we extract
metrics creation into a separate function (see "providers" in node/node.go).
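
As a rough illustration of the provider idea, here is a minimal sketch. It is not the actual Tendermint or go-kit code: the `Counter` and `Gauge` interfaces are modeled on go-kit's but simplified (no labels), and `Metrics`, `MetricsProvider`, `InMemoryProvider`, `NopProvider`, and `commitBlock` are hypothetical names chosen for this example.

```go
package main

import (
	"fmt"
	"sync"
)

// Counter and Gauge are modeled on the go-kit metrics interfaces, simplified
// here (no label support) so that the sketch has no external dependencies.
type Counter interface {
	Add(delta float64)
}

type Gauge interface {
	Set(value float64)
}

// Metrics groups the instruments a component reports (hypothetical fields).
type Metrics struct {
	Height     Counter
	Validators Gauge
}

// MetricsProvider creates the Metrics for a node. Switching the monitoring
// backend then means switching the provider, not the instrumented code.
type MetricsProvider func() *Metrics

// memValue is a stand-in backend that keeps a single value in memory.
type memValue struct {
	mu sync.Mutex
	v  float64
}

func (m *memValue) Add(delta float64) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.v += delta
}

func (m *memValue) Set(value float64) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.v = value
}

func (m *memValue) Value() float64 {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.v
}

// discard is a no-op backend for nodes that run without monitoring.
type discard struct{}

func (discard) Add(float64) {}
func (discard) Set(float64) {}

// InMemoryProvider and NopProvider are two interchangeable providers.
func InMemoryProvider() *Metrics {
	return &Metrics{Height: &memValue{}, Validators: &memValue{}}
}

func NopProvider() *Metrics {
	return &Metrics{Height: discard{}, Validators: discard{}}
}

// commitBlock stands in for instrumented code; it only sees the interfaces.
func commitBlock(m *Metrics, signed int) {
	m.Height.Add(1)
	m.Validators.Set(float64(signed))
}

func main() {
	var provider MetricsProvider = InMemoryProvider
	m := provider()
	commitBlock(m, 4)
	commitBlock(m, 3)
	fmt.Println(m.Height.(*memValue).Value(), m.Validators.(*memValue).Value())
}
```

In a real setup the provider would presumably return adapters for Prometheus (or another backend) instead of `memValue`, while `commitBlock` would stay unchanged.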
**c) telegraf**
Unlike the options discussed above, telegraf does not require modifying
Tendermint source code. You create something called an input plugin, which
polls Tendermint RPC every second and calculates the metrics itself.
While this may sound good, some of the metrics we want to report are not
exposed via RPC or pubsub and therefore can't be accessed externally.
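
To make the polling approach concrete, here is a minimal sketch of what such an external poller might do. It is not a telegraf plugin and does not call the real Tendermint RPC: `Status`, `Poller`, `fakeFetcher`, and `demoSamples` are hypothetical, and the fetch function stands in for whatever RPC call would return the latest height and block time.

```go
package main

import (
	"fmt"
	"time"
)

// Status is the subset of node state this sketch assumes is reachable from
// outside the node. The type and its fields are hypothetical.
type Status struct {
	Height    int64
	BlockTime time.Time
}

// Poller derives metrics from successive Status samples, the way an
// external input plugin would have to.
type Poller struct {
	fetch func() (Status, error) // stand-in for an RPC call
	last  Status
	seen  bool
}

// Poll fetches one sample. When a new height is observed, it returns the
// time between that block and the previously observed one. If polling is
// slower than block production, this spans more than one block.
func (p *Poller) Poll() (interval time.Duration, ok bool, err error) {
	s, err := p.fetch()
	if err != nil {
		return 0, false, err
	}
	if p.seen && s.Height > p.last.Height {
		interval, ok = s.BlockTime.Sub(p.last.BlockTime), true
	}
	if !p.seen || s.Height > p.last.Height {
		p.last, p.seen = s, true
	}
	return interval, ok, nil
}

// fakeFetcher replays canned samples in place of a real node.
func fakeFetcher(samples []Status) func() (Status, error) {
	i := 0
	return func() (Status, error) {
		if i >= len(samples) {
			return Status{}, fmt.Errorf("no more samples")
		}
		s := samples[i]
		i++
		return s, nil
	}
}

// demoSamples returns three polls: the same block twice, then a new one.
func demoSamples() []Status {
	start := time.Date(2018, 6, 8, 0, 0, 0, 0, time.UTC)
	return []Status{
		{Height: 10, BlockTime: start},
		{Height: 10, BlockTime: start},
		{Height: 11, BlockTime: start.Add(3 * time.Second)},
	}
}

func main() {
	samples := demoSamples()
	p := &Poller{fetch: fakeFetcher(samples)}
	for range samples {
		if d, ok, err := p.Poll(); err == nil && ok {
			fmt.Println("block_interval:", d)
		}
	}
}
```

The sketch also shows the limitation: the poller can only compute what the fetched status exposes, so anything internal to consensus stays out of reach.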
**d) service, listening to pubsub**
This has the same issue as telegraf: some of the metrics we want to report
are not exposed via pubsub.
### 2. OpenCensus
OpenCensus provides both metrics and tracing, which may be important in the
future. Its API looks different from go-kit and Prometheus, but it appears to
cover everything we need.
Unfortunately, the OpenCensus Go client does not define any interfaces, so if
we want to abstract away metrics, we will need to write the interfaces
ourselves.
### List of metrics
| | Name | Type | Description |
| - | --------------------------------------- | ------- | ----------------------------------------------------------------------------- |
| A | height | Counter | |
| A | validators:<height> | Gauge | Number of validators who signed |
| A | missing_validators:<height> | Gauge | Number of validators who did not sign |
| A | byzantine_validators:<height> | Gauge | Number of validators who tried to double sign |
| A | block_interval | Timing | Time between this block and the last one (Block.Header.Time) |
| | block_time | Timing | Time to create a block (from creating a proposal to commit) |
| | time_between_blocks | Timing | Time between committing the last block and receiving or creating a proposal |
| A | rounds:<height> | Counter | Number of rounds |
| | prevotes:<height>:<round> | Counter | |
| | precommits:<height>:<round> | Counter | |
| | prevotes_total_power:<height>:<round> | Counter | |
| | precommits_total_power:<height>:<round> | Counter | |
| A | num_txs:<height> | Counter | |
| | total_txs | Counter | |
| | block_size:<height> | Gauge | In bytes |
| | peers | Gauge | Number of peers the node is connected to |
| | power | Gauge | |
`A` - will be implemented first.
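
As an illustration of how a few of the `A` metrics from the table could be derived for one block, here is a minimal sketch. The `Vote` and `BlockStats` types and the `statsFor` function are hypothetical, and the sketch assumes that a commit is a list of precommits in which a missing signature is represented by a nil entry.

```go
package main

import "fmt"

// Vote is a hypothetical, stripped-down precommit.
type Vote struct {
	ValidatorIndex int
}

// BlockStats holds per-height values corresponding to the validators,
// missing_validators, and num_txs rows of the table.
type BlockStats struct {
	Validators        int // validators who signed
	MissingValidators int // validators who did not sign
	NumTxs            int
}

// statsFor counts signed and missing precommits (a nil entry is assumed to
// mean "did not sign") and the number of transactions in the block.
func statsFor(precommits []*Vote, txs [][]byte) BlockStats {
	var s BlockStats
	for _, v := range precommits {
		if v == nil {
			s.MissingValidators++
		} else {
			s.Validators++
		}
	}
	s.NumTxs = len(txs)
	return s
}

func main() {
	precommits := []*Vote{{ValidatorIndex: 0}, nil, {ValidatorIndex: 2}, {ValidatorIndex: 3}}
	txs := [][]byte{[]byte("tx1"), []byte("tx2")}
	fmt.Printf("%+v\n", statsFor(precommits, txs))
}
```

The resulting values would then be reported through whichever gauge and counter instruments the chosen solution provides.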
**Proposed solution**
## Status
## Consequences
### Positive
### Negative
### Neutral
