Fork detector

This is an unfinished draft. Comments are welcome!

A fork detector (or detector for short) is a mechanism that expects as input a header with some height h, connects to different Tendermint full nodes, requests the header of height h from them, and then cross-checks the headers and the input header.

There are two foreseeable use cases:

to strengthen the light client: If a light client accepts a header hd (after performing skipping or sequential verification), it can use the detector to probe the system for conflicting headers and increase the trust in hd. Communicating with several full nodes instead of a single one increases the likelihood of becoming aware of a fork, if there is one (see [accountability] for a discussion of forks).
to support fork accountability: In the case when more than 1/3 of the voting power is held by faulty validators, faulty nodes may generate two conflicting headers for the same height. The goal of the detector is to learn about the conflicting headers by probing different full nodes. Once a detector has two conflicting headers, these headers are evidence of misbehavior. A natural extension is to use the detector within a monitor process (on a full node) that calls the detector on a sample of (or all) headers, in parallel. (If the sample is chosen at random, this adds a level of probabilistic reasoning.) If conflicting headers are found, they are evidence that can be used for punishing processes.

In this document we focus on strengthening the light client, and leave other uses of the detection mechanism (e.g., when run on a full node) to the future.

Context of this document

The light client verification specification [verification] is designed for the Tendermint failure model (1/3 assumption) [TMBC-FM-2THIRDS]. It is safe under this assumption, and live if it can communicate reliably (that is, messages are not lost, not duplicated, and eventually delivered) and in a timely manner with a correct full node. If this assumption is violated, the light client can be fooled into trusting a header that was not generated by Tendermint consensus.

This specification, the fork detector, is a "second line of defense" in case the 1/3 assumption is violated. Its goal is to detect forks (conflicting headers) and collect evidence. However, it is impractical to probe all full nodes. At this time we consider a simple scheme of maintaining an address book of known full nodes, from which a small subset (e.g., 4) is chosen initially to communicate with. More involved bookkeeping with probabilistic guarantees can be considered at later stages of the project.

The light client maintains a simple address book containing addresses of full nodes that it can pick as primary and secondaries. To obtain a new header, the light client first does verification with the primary, and then cross-checks the header with the secondaries using this specification.

Tendermint Consensus and Forks

[TMBC-GENESIS.1]

Let Genesis be the agreed-upon initial block (file).

[TMBC-FUNC.1]

TODO be more precise. +2/3 of b.NextV = c.Val signed c. For now the following will do:

Let b and c be two light blocks with b.Header.Height + 1 = c.Header.Height. We define signs(b,c) iff ValidAndVerified(b, c)

TODO be more precise. +1/3 of b.NextV signed c. For now the following will do:

Let b and c be two light blocks. We define supports(b,c,t) iff ValidAndVerified(b, c) at time t

The following formalizes that b was properly generated by Tendermint, that is, that b can be traced back to genesis.

[TMBC-SEQ-ROOTED.1]

Let b be a light block. We define sequ-rooted(b) iff for all i, 1 <= i < h = b.Header.Height, there exist light blocks a(i) s.t.

a(1) = Genesis and
a(h) = b and
signs( a(i) , a(i+1) ).
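As an illustration of [TMBC-SEQ-ROOTED.1], the following is a minimal sketch that checks whether a given sequence of light blocks is a witness for sequ-rooted(b). The reduced LightBlock type, its PrevHash field, and the toy signs predicate are assumptions made for this example only; in the specification, signs(b,c) is ValidAndVerified(b, c).

```go
package main

import "fmt"

// LightBlock is a reduced, hypothetical stand-in for the light block of the
// verification specification: only a height and two hashes are kept.
type LightBlock struct {
	Height   int64
	Hash     string
	PrevHash string // hypothetical link used by the toy signs predicate
}

// signs is a toy replacement for signs(b, c). In the specification this is
// ValidAndVerified(b, c); here it only checks adjacency and the hash link.
func signs(b, c LightBlock) bool {
	return b.Height+1 == c.Height && c.PrevHash == b.Hash
}

// sequRooted reports whether chain is a witness for sequ-rooted(b):
// chain[0] is genesis, the last element is b, and every adjacent pair
// satisfies signs.
func sequRooted(genesis, b LightBlock, chain []LightBlock) bool {
	if len(chain) == 0 || chain[0] != genesis || chain[len(chain)-1] != b {
		return false
	}
	for i := 0; i+1 < len(chain); i++ {
		if !signs(chain[i], chain[i+1]) {
			return false
		}
	}
	return true
}

func main() {
	g := LightBlock{Height: 1, Hash: "g"}
	b2 := LightBlock{Height: 2, Hash: "b2", PrevHash: "g"}
	b3 := LightBlock{Height: 3, Hash: "b3", PrevHash: "b2"}

	// Complete chain from genesis to b3.
	fmt.Println(sequRooted(g, b3, []LightBlock{g, b2, b3}))
	// Chain with a gap at height 2: signs fails for (g, b3).
	fmt.Println(sequRooted(g, b3, []LightBlock{g, b3}))
}
```

The sketch only validates a witness that is handed to it; the definition itself quantifies existentially over all such sequences.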

The following formalizes that c is trusted based on b in skipping verification. Observe that we do not require here (yet) that b was properly generated.

[TMBC-SKIP-ROOT.1]

Let b and c be light blocks. We define skip-root(b,c,t) if at time t there exists an h and a sequence a(1), ... a(h) s.t.

a(1) = b and
a(h) = c and
supports( a(i), a(i+1), t), for all i, 1 <= i < h.

TODO: In the above we might use a sequence of times t(i). Not sure. TODO: I believe the following definition corresponds to Slashable fork in forks. Please confirm!
Observe that sign-skip-match is defined even if there is a fork on the chain.

[TMBC-SIGN-SKIP-MATCH.1]

Let a, b, c, be light blocks and t a time, we define sign-skip-match(a,b,c,t) = true iff

sequ-rooted(a) and
b.Header.Height = c.Header.Height and
skip-root(a,b,t) and
skip-root(a,c,t)

implies b = c.

[TMBC-SIGN-FORK.1]

If there exist three light blocks a, b, and c, with sign-skip-match(a,b,c,t) = false then we have a slashable fork.
We call a a bifurcation block of the fork.
We say we have a fork at height b.Header.Height.

The lightblock a need not be unique, that is, there may be several blocks that satisfy the above requirement for blocks b and c. TODO: I think the following definition is the intuition behind main chain forks in the document on forks. However, main chain forks were defined more operationally: "forks that are observed by full nodes as part of normal Tendermint consensus protocol". Please confirm!

[TMBC-SIGN-UNIQUE.1]

Let b and c be light blocks, we define sign-unique(b,c) = true iff

b.Header.Height = c.Header.Height and
sequ-rooted(b) and
sequ-rooted(c)

implies b = c.

If there exist two light blocks b and c, with sign-unique(b,c) = false then we have a fork on the chain.

The following captures what I believe is called a light client fork in our discussions. There is no fork on the chain up to the height of block b. However, c is of that height, is different, and passes skipping verification.
Observe that a light client fork is a special case of a slashable fork.

[TMBC-LC-FORK.1]

Let a, b, c, be light blocks and t a time. We define light-client-fork(a,b,c,t) iff

sign-skip-match(a,b,c,t) = false and
sequ-rooted(b) and
b is "unique", that is, for all d, sequ-rooted(d) and d.Header.Height=b.Header.Height implies d = b

Finally, let's also define bogus blocks that have no support. Observe that bogus is defined even if there is a fork on the chain. Also, for the definition it would be sufficient to restrict a to a.height < b.height (which is implied by the definitions, which unfold until supports()).

[TMBC-BOGUS.1]

Let b be a light block and t a time. We define bogus(b,t) iff

sequ-rooted(b) = false and
for all a, sequ-rooted(a) implies skip-root(a,b,t) = false

Relation to fork accountability: F1, F2, and F3 (equivocation, amnesia, back to the past) can lead to a fork on the chain and to a light client fork.
F4 and F5 (phantom validators, lunatic) cannot lead to a fork on the chain, but can lead to a light client fork if t+1 < f < 2t+1.
F4 and F5 can also lead to bogus blocks.

Informal Problem statement

We put tags to informal problem statements as there is no sequential specification.

The following requirements are operational in that they describe how things should be done, rather than what should be done. However, they do not constitute temporal logic verification conditions. For those, see [LCD-DIST-*] below.

[LCD-IP-STATE.1]

The detector works on a LightStore that contains LightBlocks in one of the states StateUnverified, StateVerified, StateFailed, or StateTrusted.

[LCD-IP-Q.1]

Whenever the light client verifier performs VerifyToTarget with the primary and returns with (lightStore, ResultSuccess), the detector should query the secondaries by calling FetchLightBlock for height LightStore.LatestVerified().Height remotely.
Then, the detector returns the set of all headers h' downloaded from secondaries that satisfy

h' is different from LightStore.LatestVerified()
h' is a (light) fork.

[LCD-IP-PEERSET.1]

Whenever the detector observes detectable misbehavior of a full node from the set of Secondaries, that node should be replaced by a fresh full node (a full node that has not been primary or secondary before). Detectable misbehavior can be

a timeout
The case where h' is different from LightStore.LatestVerified() but h' is not a fork, that is, if h' is bogus. In other words, the secondary cannot provide a sequence of light blocks that constitutes proof of h'.

Assumptions/Incentives/Environment

It is not in the interest of faulty full nodes to talk to the detector as long as the detector is connected to at least one correct full node. This would only increase the likelihood of misbehavior being detected. Also we cannot punish them easily (cheaply). The absence of a response need not be the fault of the full node.

Correct full nodes have the incentive to respond, because the detector may help them to understand whether their header is a good one. We can thus base liveness arguments of the detector on the assumptions that correct full nodes reliably talk to the detector.

Assumptions

[LCD-A-CorrFull.1]

At all times there is at least one correct full node among the primary and the secondaries.

Remark: Check whether [LCD-A-CorrFull.1] is not needed in the end because the verification conditions [LCD-DIST-*] have preconditions on specific cases where primary and/or secondaries are faulty.

[LCD-A-RelComm.1]

Communication between the detector and a correct full node is reliable and bounded in time. Reliable communication means that messages are not lost, not duplicated, and eventually delivered. There is a (known) end-to-end delay Delta, such that if a message is sent at time t then it is received and processed by time t + Delta. This implies that we need a timeout of at least 2 Delta for remote procedure calls to ensure that the response of a correct peer arrives before the timeout expires.
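The 2 Delta bound in [LCD-A-RelComm.1] can be illustrated with a minimal sketch of a remote call guarded by a timeout. The value of delta, the function callWithTimeout, and the two simulated peers are assumptions made for this example; they are not part of the specification.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// delta is an assumed end-to-end delay bound; the value is illustrative only.
const delta = 50 * time.Millisecond

// callWithTimeout runs a remote call (modeled by the hypothetical function
// rpc) and gives up after 2*delta: one delta for the request to be received
// and processed, one delta for the response to travel back.
func callWithTimeout(rpc func(ctx context.Context) (string, error)) (string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), 2*delta)
	defer cancel()

	type reply struct {
		val string
		err error
	}
	ch := make(chan reply, 1) // buffered so the goroutine never blocks
	go func() {
		v, err := rpc(ctx)
		ch <- reply{v, err}
	}()

	select {
	case r := <-ch:
		return r.val, r.err
	case <-ctx.Done():
		return "", ctx.Err()
	}
}

// fastPeer models a correct full node that answers well within 2*delta.
func fastPeer(ctx context.Context) (string, error) {
	return "header", nil
}

// slowPeer models a peer that would answer only after 10*delta.
func slowPeer(ctx context.Context) (string, error) {
	select {
	case <-time.After(10 * delta):
		return "late header", nil
	case <-ctx.Done():
		return "", ctx.Err()
	}
}

// timedOut reports whether err is the deadline error of the timeout.
func timedOut(err error) bool {
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	v, err := callWithTimeout(fastPeer)
	fmt.Println(v, err)

	_, err = callWithTimeout(slowPeer)
	fmt.Println(timedOut(err))
}
```

Under the assumption, a timeout can only occur for a faulty or unreachable peer, which is what lets the detector treat a timeout as detectable misbehavior.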

(Distributed) Problem statement

The fork detector exists to reduce the impact of faulty nodes, and faulty nodes imply a distributed system. Hence there is no sequential specification.

The detector gets as input a lightstore lightStore. Let h-verified = lightStore.LatestVerified().Height and h-trust=lightStore.LatestTrusted().Height (see [LCV-DATA-LIGHTSTORE]). It queries the secondaries for headers at height h-verified. The detector returns a set PoFs of proofs of fork, and should satisfy the following temporal formulas:

[LCD-DIST-INV.1]

If there is no fork at height h-verified ([TMBC-SIGN-FORK.1]), then the detector should return the empty set.

If the empty set is returned, the supervisor will change the state of the header at height h-verified to StateTrusted.

[LCD-DIST-LIVE-FORK.1]

If there is a fork at height h-verified, and there are two correct full nodes i and j that are

on different branches, and
i is primary and
j is secondary,

then the detector eventually outputs the fork.

[LCD-DIST-LIVE-FORK-FAULTY.1]

If there is a fork at height h-verified, and there is a correct secondary that is on a different branch than the primary reported, then the detector eventually outputs the fork.

The above property is quite operational ("than the primary reported"), but it captures the requirement quite closely. As the fork detector only makes sense in a distributed setting, and does not have a sequential specification, less "pure" specifications are acceptable. These properties capture the following operational requirement:

[LCD-REQ-REP.1]
If the detector observes two conflicting headers for height h, it should try to verify both. If both are verified it should report evidence. If the primary reports header h and a secondary reports header h', and if h' can be verified based on a common root of trust, then evidence should be generated. By verifying we mean calling VerifyToTarget from the [verification] specification.

Definitions

A fixed set of full nodes is provided in the configuration upon initialization. Initially this set is partitioned into
- one full node that is the primary (singleton set),
- a set Secondaries (of fixed size, e.g., 3),
- a set FullNodes.

In addition, the detector uses
- a set FaultyNodes of nodes that the light client suspects of being faulty; it is initially empty,
- a Lightstore as defined in the verification specification.

[LCD-INV-NODES.1]:

The detector shall maintain the following invariants:

FullNodes \intersect Secondaries = {}
FullNodes \intersect FaultyNodes = {}
Secondaries \intersect FaultyNodes = {}

and the following transition invariant

FullNodes' \union Secondaries' \union FaultyNodes' = FullNodes \union Secondaries \union FaultyNodes
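As a minimal sketch, the invariants of [LCD-INV-NODES.1] can be checked mechanically on a pair of states. The NodeSet and PeerState types and all function names below are assumptions made for this example; addresses are plain strings for brevity.

```go
package main

import "fmt"

// NodeSet is a hypothetical set of peer addresses.
type NodeSet map[string]struct{}

// set builds a NodeSet from the given addresses.
func set(addrs ...string) NodeSet {
	s := NodeSet{}
	for _, a := range addrs {
		s[a] = struct{}{}
	}
	return s
}

// PeerState groups the three sets constrained by [LCD-INV-NODES.1].
type PeerState struct {
	FullNodes, Secondaries, FaultyNodes NodeSet
}

// disjoint reports whether a and b have no element in common.
func disjoint(a, b NodeSet) bool {
	for k := range a {
		if _, ok := b[k]; ok {
			return false
		}
	}
	return true
}

// all returns FullNodes \union Secondaries \union FaultyNodes.
func (s PeerState) all() NodeSet {
	u := NodeSet{}
	for _, part := range []NodeSet{s.FullNodes, s.Secondaries, s.FaultyNodes} {
		for k := range part {
			u[k] = struct{}{}
		}
	}
	return u
}

// StateInvariant checks the three pairwise-disjointness conditions.
func (s PeerState) StateInvariant() bool {
	return disjoint(s.FullNodes, s.Secondaries) &&
		disjoint(s.FullNodes, s.FaultyNodes) &&
		disjoint(s.Secondaries, s.FaultyNodes)
}

// TransitionInvariant checks that the union of the three sets is the same
// before and after a step.
func TransitionInvariant(before, after PeerState) bool {
	b, a := before.all(), after.all()
	if len(a) != len(b) {
		return false
	}
	for k := range b {
		if _, ok := a[k]; !ok {
			return false
		}
	}
	return true
}

func main() {
	// Secondary "a" is found faulty and replaced by "c" from FullNodes.
	before := PeerState{FullNodes: set("c", "d"), Secondaries: set("a", "b"), FaultyNodes: set()}
	after := PeerState{FullNodes: set("d"), Secondaries: set("b", "c"), FaultyNodes: set("a")}

	fmt.Println(before.StateInvariant(), after.StateInvariant())
	fmt.Println(TransitionInvariant(before, after))
}
```

Together the two checks say that nodes only move between the three sets; no address is ever duplicated, dropped, or introduced.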

The following invariant is very useful for reasoning, and underlies much of the intuition behind the protocol.

[LCD-INV-TRUSTED-AGREED.1]:

It is always the case that the light client has downloaded a lightblock for height lightStore.LatestTrusted().Height from each of the current primary and the secondaries, and that all of them reported the identical lightblock for that height.

In the above, I guess "the identical" might be replaced with "a matching" to cover commits that might be different. The above requires that, before we pick a new secondary, we query the candidate for the header of height lightStore.LatestTrusted().Height.

Solution

Data Structures

Lightblocks and LightStores are defined at [LCV-DATA-LIGHTBLOCK.1] and [LCV-DATA-LIGHTSTORE.1]. See the verification specification for details.

The following data structure [LCV-DATA-POF.1] defines a proof of fork. Following [TMBC-SIGN-FORK.1], we require two blocks b and c for the same height that can both be verified from a common root block a (using the skipping or the sequential method). [LCV-DATA-POF.1] mirrors the definition [TMBC-SIGN-FORK.1]: TrustedBlock corresponds to a, and PrimaryTrace and SecondaryTrace are traces to two blocks b and c. The traces establish that both skip-root(a,b,t) and skip-root(a,c,t) are satisfied.

[LCV-DATA-POF.1]

type LightNodeProofOfFork struct {
    TrustedBlock      LightBlock
    PrimaryTrace      []LightBlock
    SecondaryTrace    []LightBlock
}

[LCV-DATA-POFSTORE.1]

Proofs of Forks are stored in a structure which stores all proofs generated during detection.

type PoFStore struct {
 ...
}

In addition to the functions defined in the verification specification, the LightStore exposes the following function

[LCD-FUNC-SUBTRACE.1]:

func (ls LightStore) Subtrace(from int, to int) LightStore

Expected postcondition
- returns a lightstore that contains all lightblocks b from ls that satisfy: from < b.Header.Height <= to
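The postcondition of [LCD-FUNC-SUBTRACE.1] can be sketched as follows. The in-memory LightStore keyed by height, the reduced LightBlock type, and the helpers NewLightStore and Heights are assumptions made for this example; the actual data structures are defined in the verification specification.

```go
package main

import (
	"fmt"
	"sort"
)

// LightBlock is a reduced, hypothetical light block.
type LightBlock struct {
	Height int
	Hash   string
}

// LightStore is a hypothetical in-memory store keyed by height.
type LightStore struct {
	blocks map[int]LightBlock
}

// NewLightStore builds a store from the given blocks.
func NewLightStore(bs ...LightBlock) LightStore {
	ls := LightStore{blocks: make(map[int]LightBlock)}
	for _, b := range bs {
		ls.blocks[b.Height] = b
	}
	return ls
}

// Subtrace returns a store with all blocks b of ls that satisfy
// from < b.Height <= to. The lower bound is exclusive, the upper inclusive.
func (ls LightStore) Subtrace(from int, to int) LightStore {
	out := LightStore{blocks: make(map[int]LightBlock)}
	for h, b := range ls.blocks {
		if from < h && h <= to {
			out.blocks[h] = b
		}
	}
	return out
}

// Heights lists the stored heights in ascending order.
func (ls LightStore) Heights() []int {
	hs := make([]int, 0, len(ls.blocks))
	for h := range ls.blocks {
		hs = append(hs, h)
	}
	sort.Ints(hs)
	return hs
}

func main() {
	ls := NewLightStore(
		LightBlock{1, "a"}, LightBlock{4, "b"},
		LightBlock{7, "c"}, LightBlock{10, "d"},
	)
	// Height 1 is excluded (exclusive lower bound), height 7 is included.
	fmt.Println(ls.Subtrace(1, 7).Heights())
}
```

The exclusive lower bound means the block at height from, typically the trusted block, is not part of the returned trace.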

Inter Process Communication

func FetchLightBlock(peer PeerID, height Height) LightBlock

See the verification specification for details.

[LCD-FUNC-SUBMIT.1]:

func SubmitProofOfFork(pof LightNodeProofOfFork) Result

TODO: finalize what this should do, and what detail of specification we need.

Implementation remark
Expected precondition
- none
Expected postcondition
- submit evidence to primary and the secondary in pof, that is, to
  - pof.PrimaryTrace[1].Provider
  - pof.SecondaryTrace[1].Provider
- QUESTION minimize data? We could submit to the primary only the trace of the secondary, and vice versa. Do we need to spell that out here? (Also, by [LCD-INV-TRUSTED-AGREED.1], we do not need to send pof.TrustedBlock)
- FUTURE WORK: we might send pof to primary or all secondaries or broadcast to all full nodes. However, in evidence detection this might need that a full node has to check a pof where both traces are not theirs. This leads to more complicated logic at the full node, which we do not need right now.
Error condition
- none

Auxiliary Functions (Local)

[LCD-FUNC-CROSS-CHECK.1]:

func CrossCheck(peer PeerID, testedLB LightBlock) Result {
    // As the check below only needs the header, it is sufficient
    // to download the header rather than the full LightBlock.
    sh := FetchLightBlock(peer, testedLB.Height)
    if testedLB.Header == sh.Header {
        return OK
    }
    return DoesNotMatch
}

Implementation remark
- download block and compare to previously downloaded one.
Expected precondition
Expected postcondition
Error condition

[LCD-FUNC-REPLACE-PRIMARY.1]:

Replace_Primary()

TODO: formalize conditions

Implementation remark
- the primary is replaced by a secondary, and lightblocks above trusted blocks are removed
- to maintain a constant size of secondaries, at this point we might need to
  - pick a new secondary nsec
  - maintain [LCD-INV-TRUSTED-AGREED.1], that is,
    - call CrossCheck(nsec, lightStore.LatestTrusted()). If it matches we are OK, otherwise
      - we repeat with another full node as new secondary candidate
      - FUTURE: try to do fork detection from some possibly old lightblock in store. (Might be the approach for the light node that assumes to be connected to correct full nodes only from time to time)
Expected precondition
- FullNodes is nonempty
Expected postcondition
- primary is moved to FaultyNodes
- all lightblocks with height greater than lightStore.LatestTrusted().Height are removed from lightStore.
- a secondary s is moved from Secondaries to primary

this ensures that s agrees on the Last Trusted State

Error condition
- if precondition is violated

[LCD-FUNC-REPLACE-SECONDARY.1]:

Replace_Secondary(addr Address)

TODO: formalize conditions

Implementation remark
- maintain [LCD-INV-TRUSTED-AGREED.1], that is,
  - call CrossCheck(nsec, lightStore.LatestTrusted()). If it matches we are OK, otherwise
    - we might just repeat with another full node as new secondary
    - FUTURE: try to do fork detection from some possibly old lightblock in store. (Might be the approach for the light node that assumes to be connected to correct full nodes only from time to time)
Expected precondition
- FullNodes is nonempty
Expected postcondition
- addr is moved from Secondaries to FaultyNodes
- an address a is moved from FullNodes to Secondaries
Error condition
- if precondition is violated
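The following is a minimal sketch of the pre- and postconditions of Replace_Secondary. The Peers type, the agrees callback (a stand-in for CrossCheck(candidate, lightStore.LatestTrusted()) returning OK), and the error value are assumptions made for this example. Moving a candidate that fails the cross-check to FaultyNodes is also an assumption; the remarks above only say that we might repeat with another full node.

```go
package main

import (
	"errors"
	"fmt"
)

// Peers is a hypothetical container for the peer sets of the detector.
type Peers struct {
	FullNodes   []string
	Secondaries []string
	FaultyNodes []string
}

// ErrNoFullNodes signals that no replacement candidate is available.
var ErrNoFullNodes = errors.New("FullNodes is empty")

// ReplaceSecondary moves addr from Secondaries to FaultyNodes and promotes
// the first candidate from FullNodes for which agrees returns true.
// Assumption: candidates that fail the check are moved to FaultyNodes.
func (p *Peers) ReplaceSecondary(addr string, agrees func(candidate string) bool) error {
	if len(p.FullNodes) == 0 {
		return ErrNoFullNodes // precondition violated
	}
	idx := -1
	for i, s := range p.Secondaries {
		if s == addr {
			idx = i
			break
		}
	}
	if idx < 0 {
		return fmt.Errorf("%q is not a secondary", addr)
	}
	p.FaultyNodes = append(p.FaultyNodes, addr)

	for len(p.FullNodes) > 0 {
		cand := p.FullNodes[0]
		p.FullNodes = p.FullNodes[1:]
		if agrees(cand) {
			p.Secondaries[idx] = cand // keeps the size of Secondaries constant
			return nil
		}
		p.FaultyNodes = append(p.FaultyNodes, cand)
	}
	// No candidate agreed on the trusted block: addr is removed without
	// a replacement, so Secondaries shrinks by one.
	p.Secondaries = append(p.Secondaries[:idx], p.Secondaries[idx+1:]...)
	return ErrNoFullNodes
}

func main() {
	p := &Peers{FullNodes: []string{"c", "d"}, Secondaries: []string{"a", "b"}}
	// Candidate "c" disagrees on the trusted block, "d" agrees.
	err := p.ReplaceSecondary("a", func(cand string) bool { return cand == "d" })
	fmt.Println(err, p.Secondaries, p.FaultyNodes, p.FullNodes)
}
```

Every address stays in exactly one of the three sets after the call, so the sketch is compatible with [LCD-INV-NODES.1].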

From the verifier

func VerifyToTarget(primary PeerID, lightStore LightStore,
                    targetHeight Height) (LightStore, Result)

See the verification specification for details.

Protocol

Shared data of the light client

a pool of full nodes FullNodes that have not been contacted before
peer set called Secondaries
primary
lightStore

Outline

The problem laid out is solved by calling the function ForkDetector with a lightstore that contains a light block that has just been verified by the verifier.

TODO: We should clarify the expectations on VerifyToTarget, so that if it returns TimeoutError the primary can be assumed faulty. I guess that VerifyToTarget with a correct full node should never terminate with TimeoutError.
TODO: clarify EXPIRED case. Can we always punish? Can we give sufficient conditions?

Fork Detector

[LCD-FUNC-DETECTOR.1]:

func ForkDetector(ls LightStore, PoFs PoFStore) PoFStore {
    testedLB := ls.LatestVerified()
    for _, secondary := range Secondaries {
        if CrossCheck(secondary, testedLB) == OK {
            // header matches. we do nothing.
            continue
        }
        // [LCD-REQ-REP.1]
        // header does not match. there is a situation.
        // CrossCheck does not return the block it downloaded,
        // so we fetch the secondary's lightblock sh again and
        // try to verify it by querying the secondary.
        sh := FetchLightBlock(secondary, testedLB.Height)
        // we set up an auxiliary lightstore with the highest
        // trusted lightblock and the lightblock we want to verify
        auxLS := LightStore{}
        auxLS.Init()
        auxLS.Update(ls.LatestTrusted(), StateVerified)
        auxLS.Update(sh, StateUnverified)
        auxLS, result := VerifyToTarget(secondary, auxLS, sh.Header.Height)
        if result == ResultSuccess || result == EXPIRED {
            // we verified header sh which is conflicting to hd
            // there is a fork on the main blockchain.
            // If return code was EXPIRED it might be too late
            // to punish, we still report it.
            // Note: Subtrace returns a LightStore ([LCD-FUNC-SUBTRACE.1]);
            // how it is turned into []LightBlock is left open here.
            pof := LightNodeProofOfFork{}
            pof.TrustedBlock = ls.LatestTrusted()
            pof.PrimaryTrace =
                ls.Subtrace(ls.LatestTrusted().Height, testedLB.Height)
            pof.SecondaryTrace =
                auxLS.Subtrace(ls.LatestTrusted().Height, testedLB.Height)
            PoFs.Add(pof)
        } else {
            // secondary might be faulty or unreachable
            // it might fail to provide a trace that supports sh
            // or time out
            Replace_Secondary(secondary)
        }
    }
    return PoFs
}

TODO: formalize conditions

Expected precondition
- Secondaries initialized and non-empty
- PoFs initialized and empty
- lightStore.LatestTrusted().Height < lightStore.LatestVerified().Height
Expected postcondition
- satisfies [LCD-DIST-INV.1], [LCD-DIST-LIVE-FORK.1]
- removes faulty secondary if it reports wrong header
- TODO submit proof of fork
Error condition
- fails if precondition is violated
- fails if [LCV-INV-TP] is violated (no trusted header within trusting period)

Correctness arguments

Argument for [LCD-DIST-INV]

TODO

Argument for [LCD-DIST-LIVE-FORK]

TODO

References

links to other specifications/ADRs this document refers to

[verification] The specification of the light client verification.

[tendermintfork] Tendermint fork detection and accountability

[accountability] Fork accountability
