consensus: calculate prevote message delay metric (backport #7551) (#7617)

* consensus: calculate prevote message delay metric (#7551)

## What does this pull request do?

This pull request adds two metrics intended for use in calculating an experimental value for `MessageDelay`. The metrics are as follows:

```
# HELP tendermint_consensus_complete_prevote_message_delay Difference in seconds between the proposal timestamp and the timestamp of the prevote that achieved 100% of the voting power in the prevote step.
# TYPE tendermint_consensus_complete_prevote_message_delay gauge
tendermint_consensus_complete_prevote_message_delay{chain_id="test-chain-aZbwF1"} 0.013025505
# HELP tendermint_consensus_quorum_prevote_message_delay Difference in seconds between the proposal timestamp and the timestamp of the prevote that achieved a quorum in the prevote step.
# TYPE tendermint_consensus_quorum_prevote_message_delay gauge
tendermint_consensus_quorum_prevote_message_delay{chain_id="test-chain-aZbwF1"} 0.013025505
```

## Why this change?

For more information on what these metrics are calculating, see #7202. The aim is to backport these metrics to v0.34, run nodes on a few popular chains with these metrics to determine experimental values for `MessageDelay` on those chains, and use the results to select our default `SynchronyParams.MessageDelay` value.

## Why Gauges for the metrics?

Gauges allow us to overwrite the metric on each successive observation. We can then capture these metrics over time to track the highest and lowest observed value.

(cherry picked from commit 0c82ceaa5f7964c13247af9b64d72477af9dc973)

# Conflicts:
# consensus/metrics.go
# consensus/state.go

* fix merge conflicts

Co-authored-by: William Banfield <4561443+williambanfield@users.noreply.github.com>
Co-authored-by: William Banfield <wbanfield@gmail.com>
3 years ago
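The commit message explains that gauges were chosen because each observation overwrites the previous one, leaving it to whatever captures the metric over time to track the highest and lowest values. The following is a minimal sketch of that idea only. `lastValueGauge` and `delayRange` are hypothetical stand-ins invented for this illustration; they are not part of go-kit, Prometheus, or this package.

```go
package main

import (
	"fmt"
	"math"
)

// lastValueGauge is a hypothetical stand-in for a gauge: Set overwrites the
// previous value instead of accumulating it.
type lastValueGauge struct{ value float64 }

func (g *lastValueGauge) Set(v float64) { g.value = v }

// delayRange is a hypothetical collector that remembers the lowest and highest
// values it has been shown, the way an external system could when it samples
// a gauge repeatedly.
type delayRange struct {
	min, max float64
	seen     bool
}

func (r *delayRange) observe(v float64) {
	if !r.seen {
		r.min, r.max, r.seen = v, v, true
		return
	}
	r.min = math.Min(r.min, v)
	r.max = math.Max(r.max, v)
}

func main() {
	g := &lastValueGauge{}
	r := &delayRange{}
	// Each round overwrites the gauge; the collector samples it after each Set.
	for _, delay := range []float64{0.013, 0.021, 0.009} {
		g.Set(delay)
		r.observe(g.value)
	}
	fmt.Printf("last=%.3f min=%.3f max=%.3f\n", g.value, r.min, r.max)
}
```

In this sketch the gauge itself only ever holds the most recent delay; the range is recovered entirely on the collecting side, which matches the reasoning given in the commit message.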
package consensus

import (
	"github.com/go-kit/kit/metrics"
	"github.com/go-kit/kit/metrics/discard"
	prometheus "github.com/go-kit/kit/metrics/prometheus"
	stdprometheus "github.com/prometheus/client_golang/prometheus"
)

const (
	// MetricsSubsystem is a subsystem shared by all metrics exposed by this
	// package.
	MetricsSubsystem = "consensus"
)

// Metrics contains metrics exposed by this package.
type Metrics struct {
	// Height of the chain.
	Height metrics.Gauge
	// ValidatorLastSignedHeight of a validator.
	ValidatorLastSignedHeight metrics.Gauge
	// Number of rounds.
	Rounds metrics.Gauge
	// Number of validators.
	Validators metrics.Gauge
	// Total power of all validators.
	ValidatorsPower metrics.Gauge
	// Power of a validator.
	ValidatorPower metrics.Gauge
	// Number of blocks missed by a validator.
	ValidatorMissedBlocks metrics.Gauge
	// Number of validators who did not sign.
	MissingValidators metrics.Gauge
	// Total power of the missing validators.
	MissingValidatorsPower metrics.Gauge
	// Number of validators who tried to double sign.
	ByzantineValidators metrics.Gauge
	// Total power of the byzantine validators.
	ByzantineValidatorsPower metrics.Gauge
	// Time between this and the last block.
	BlockIntervalSeconds metrics.Histogram
	// Number of transactions.
	NumTxs metrics.Gauge
	// Size of the block.
	BlockSizeBytes metrics.Gauge
	// Total number of transactions.
	TotalTxs metrics.Gauge
	// The latest block height.
	CommittedHeight metrics.Gauge
	// Whether or not a node is fast syncing. 1 if yes, 0 if no.
	FastSyncing metrics.Gauge
	// Whether or not a node is state syncing. 1 if yes, 0 if no.
	StateSyncing metrics.Gauge
	// Number of blockparts transmitted by peer.
	BlockParts metrics.Counter
	// QuorumPrevoteMessageDelay is the interval in seconds between the proposal
	// timestamp and the timestamp of the earliest prevote that achieved a quorum
	// during the prevote step.
	//
	// To compute it, sum the voting power over each prevote received, in increasing
	// order of timestamp. The timestamp of the first prevote to increase the sum to
	// be above 2/3 of the total voting power of the network defines the endpoint
	// of the interval. Subtract the proposal timestamp from this endpoint to
	// obtain the quorum delay.
	QuorumPrevoteMessageDelay metrics.Gauge
	// FullPrevoteMessageDelay is the interval in seconds between the proposal
	// timestamp and the timestamp of the latest prevote in a round where 100%
	// of the voting power on the network issued prevotes.
	FullPrevoteMessageDelay metrics.Gauge
}

// PrometheusMetrics returns Metrics built using the Prometheus client library.
// Optionally, labels can be provided along with their values ("foo",
// "fooValue").
func PrometheusMetrics(namespace string, labelsAndValues ...string) *Metrics {
	// labelsAndValues alternates label names and values; keep only the names.
	labels := []string{}
	for i := 0; i < len(labelsAndValues); i += 2 {
		labels = append(labels, labelsAndValues[i])
	}
	return &Metrics{
		Height: prometheus.NewGaugeFrom(stdprometheus.GaugeOpts{
			Namespace: namespace,
			Subsystem: MetricsSubsystem,
			Name:      "height",
			Help:      "Height of the chain.",
		}, labels).With(labelsAndValues...),
		Rounds: prometheus.NewGaugeFrom(stdprometheus.GaugeOpts{
			Namespace: namespace,
			Subsystem: MetricsSubsystem,
			Name:      "rounds",
			Help:      "Number of rounds.",
		}, labels).With(labelsAndValues...),
		Validators: prometheus.NewGaugeFrom(stdprometheus.GaugeOpts{
			Namespace: namespace,
			Subsystem: MetricsSubsystem,
			Name:      "validators",
			Help:      "Number of validators.",
		}, labels).With(labelsAndValues...),
		ValidatorLastSignedHeight: prometheus.NewGaugeFrom(stdprometheus.GaugeOpts{
			Namespace: namespace,
			Subsystem: MetricsSubsystem,
			Name:      "validator_last_signed_height",
			Help:      "Last signed height for a validator",
		}, append(labels, "validator_address")).With(labelsAndValues...),
		ValidatorMissedBlocks: prometheus.NewGaugeFrom(stdprometheus.GaugeOpts{
			Namespace: namespace,
			Subsystem: MetricsSubsystem,
			Name:      "validator_missed_blocks",
			Help:      "Total missed blocks for a validator",
		}, append(labels, "validator_address")).With(labelsAndValues...),
		ValidatorsPower: prometheus.NewGaugeFrom(stdprometheus.GaugeOpts{
			Namespace: namespace,
			Subsystem: MetricsSubsystem,
			Name:      "validators_power",
			Help:      "Total power of all validators.",
		}, labels).With(labelsAndValues...),
		ValidatorPower: prometheus.NewGaugeFrom(stdprometheus.GaugeOpts{
			Namespace: namespace,
			Subsystem: MetricsSubsystem,
			Name:      "validator_power",
			Help:      "Power of a validator",
		}, append(labels, "validator_address")).With(labelsAndValues...),
		MissingValidators: prometheus.NewGaugeFrom(stdprometheus.GaugeOpts{
			Namespace: namespace,
			Subsystem: MetricsSubsystem,
			Name:      "missing_validators",
			Help:      "Number of validators who did not sign.",
		}, labels).With(labelsAndValues...),
		MissingValidatorsPower: prometheus.NewGaugeFrom(stdprometheus.GaugeOpts{
			Namespace: namespace,
			Subsystem: MetricsSubsystem,
			Name:      "missing_validators_power",
			Help:      "Total power of the missing validators.",
		}, labels).With(labelsAndValues...),
		ByzantineValidators: prometheus.NewGaugeFrom(stdprometheus.GaugeOpts{
			Namespace: namespace,
			Subsystem: MetricsSubsystem,
			Name:      "byzantine_validators",
			Help:      "Number of validators who tried to double sign.",
		}, labels).With(labelsAndValues...),
		ByzantineValidatorsPower: prometheus.NewGaugeFrom(stdprometheus.GaugeOpts{
			Namespace: namespace,
			Subsystem: MetricsSubsystem,
			Name:      "byzantine_validators_power",
			Help:      "Total power of the byzantine validators.",
		}, labels).With(labelsAndValues...),
		BlockIntervalSeconds: prometheus.NewHistogramFrom(stdprometheus.HistogramOpts{
			Namespace: namespace,
			Subsystem: MetricsSubsystem,
			Name:      "block_interval_seconds",
			Help:      "Time between this and the last block.",
		}, labels).With(labelsAndValues...),
		NumTxs: prometheus.NewGaugeFrom(stdprometheus.GaugeOpts{
			Namespace: namespace,
			Subsystem: MetricsSubsystem,
			Name:      "num_txs",
			Help:      "Number of transactions.",
		}, labels).With(labelsAndValues...),
		BlockSizeBytes: prometheus.NewGaugeFrom(stdprometheus.GaugeOpts{
			Namespace: namespace,
			Subsystem: MetricsSubsystem,
			Name:      "block_size_bytes",
			Help:      "Size of the block.",
		}, labels).With(labelsAndValues...),
		TotalTxs: prometheus.NewGaugeFrom(stdprometheus.GaugeOpts{
			Namespace: namespace,
			Subsystem: MetricsSubsystem,
			Name:      "total_txs",
			Help:      "Total number of transactions.",
		}, labels).With(labelsAndValues...),
		CommittedHeight: prometheus.NewGaugeFrom(stdprometheus.GaugeOpts{
			Namespace: namespace,
			Subsystem: MetricsSubsystem,
			Name:      "latest_block_height",
			Help:      "The latest block height.",
		}, labels).With(labelsAndValues...),
		FastSyncing: prometheus.NewGaugeFrom(stdprometheus.GaugeOpts{
			Namespace: namespace,
			Subsystem: MetricsSubsystem,
			Name:      "fast_syncing",
			Help:      "Whether or not a node is fast syncing. 1 if yes, 0 if no.",
		}, labels).With(labelsAndValues...),
		StateSyncing: prometheus.NewGaugeFrom(stdprometheus.GaugeOpts{
			Namespace: namespace,
			Subsystem: MetricsSubsystem,
			Name:      "state_syncing",
			Help:      "Whether or not a node is state syncing. 1 if yes, 0 if no.",
		}, labels).With(labelsAndValues...),
		BlockParts: prometheus.NewCounterFrom(stdprometheus.CounterOpts{
			Namespace: namespace,
			Subsystem: MetricsSubsystem,
			Name:      "block_parts",
			Help:      "Number of blockparts transmitted by peer.",
		}, append(labels, "peer_id")).With(labelsAndValues...),
		QuorumPrevoteMessageDelay: prometheus.NewGaugeFrom(stdprometheus.GaugeOpts{
			Namespace: namespace,
			Subsystem: MetricsSubsystem,
			Name:      "quorum_prevote_message_delay",
			Help: "Difference in seconds between the proposal timestamp and the timestamp " +
				"of the latest prevote that achieved a quorum in the prevote step.",
		}, labels).With(labelsAndValues...),
		FullPrevoteMessageDelay: prometheus.NewGaugeFrom(stdprometheus.GaugeOpts{
			Namespace: namespace,
			Subsystem: MetricsSubsystem,
			Name:      "full_prevote_message_delay",
			Help: "Difference in seconds between the proposal timestamp and the timestamp " +
				"of the latest prevote that achieved 100% of the voting power in the prevote step.",
		}, labels).With(labelsAndValues...),
	}
}

// NopMetrics returns no-op Metrics.
func NopMetrics() *Metrics {
	return &Metrics{
		Height:                    discard.NewGauge(),
		ValidatorLastSignedHeight: discard.NewGauge(),
		Rounds:                    discard.NewGauge(),
		Validators:                discard.NewGauge(),
		ValidatorsPower:           discard.NewGauge(),
		ValidatorPower:            discard.NewGauge(),
		ValidatorMissedBlocks:     discard.NewGauge(),
		MissingValidators:         discard.NewGauge(),
		MissingValidatorsPower:    discard.NewGauge(),
		ByzantineValidators:       discard.NewGauge(),
		ByzantineValidatorsPower:  discard.NewGauge(),
		BlockIntervalSeconds:      discard.NewHistogram(),
		NumTxs:                    discard.NewGauge(),
		BlockSizeBytes:            discard.NewGauge(),
		TotalTxs:                  discard.NewGauge(),
		CommittedHeight:           discard.NewGauge(),
		FastSyncing:               discard.NewGauge(),
		StateSyncing:              discard.NewGauge(),
		BlockParts:                discard.NewCounter(),
		QuorumPrevoteMessageDelay: discard.NewGauge(),
		FullPrevoteMessageDelay:   discard.NewGauge(),
	}
}