bft blockchain consensus consistency cryptocurrency cryptography db distributed-systems go

28 KiB

Raw Blame History

This an unfinished draft. Comments are welcome!

TODO: We will need to do small adaptations to the verification spec to reflect the semantics in the LightStore (verified, trusted, untrusted, etc. not needed anymore). In more detail:

The state of the Lightstore needs to go. Functions like LatestVerified can keep the name but will ignore state as it will not exist anymore.
verification spec should be adapted to the second parameter of VerifyToTarget being a lightblock; new version number of function tag;
We should clarify what is the expectation of VerifyToTarget so if it returns TimeoutError it can be assumed faulty. I guess that VerifyToTarget with correct full node should never terminate with TimeoutError.
We need to introduce a new version number for the new specification. So we should decide how to handle that.

Light Client Attack Detector

In this specification, we strengthen the light client to be resistant against so-called light client attacks. In a light client attack, all the correct Tendermint full nodes agree on the sequence of generated blocks (no fork), but a set of faulty full nodes attack a light client by generating (signing) a block that deviates from the block of the same height on the blockchain. In order to do so, some of these faulty full nodes must have been validators before and violate [TMBC-FM-2THIRDS], as otherwise, if [TMBC-FM-2THIRDS] would hold, verification would satisfy [LCV-SEQ-SAFE.1].

An attack detector (or detector for short) is a mechanism that is used by the light client supervisor after verification of a new light block with the primary, to cross-check the newly learned light block with other peers (secondaries). It expects as input a light block with some height root (that serves as a root of trust), and a verification trace (a sequence of lightblocks) that the primary provided.

In case the detector observes a light client attack, it computes evidence data that can be used by Tendermint full nodes to isolate a set of faulty full nodes that are still within the unbonding period (more than 1/3 of the voting power of the validator set at some block of the chain), and report them via ABCI to the application of a Tendermint blockchain in order to punish faulty nodes.

Context of this document

The light client verification specification is designed for the Tendermint failure model (1/3 assumption) [TMBC-FM-2THIRDS]. It is safe under this assumption, and live if it can reliably (that is, no message loss, no duplication, and eventually delivered) and timely communicate with a correct full node. If [TMBC-FM-2THIRDS] assumption is violated, the light client can be fooled to trust a light block that was not generated by Tendermint consensus.

This specification, the attack detector, is a "second line of defense", in case the 1/3 assumption is violated. Its goal is to detect a light client attack (conflicting light blocks) and collect evidence. However, it is impractical to probe all full nodes. At this time we consider a simple scheme of maintaining an address book of known full nodes from which a small subset (e.g., 4) are chosen initially to communicate with. More involved book keeping with probabilistic guarantees can be considered at later stages of the project.

The light client maintains a simple address book containing addresses of full nodes that it can pick as primary and secondaries. To obtain a new light block, the light client first does verification with the primary, and then cross-checks the light block (and the trace of light blocks that led to it) with the secondaries using this specification.

Tendermint Consensus and Light Client Attacks

In this section we will give some mathematical definitions of what we mean by light client attacks (that are considered in this specification) and how they differ from main-chain forks. To this end we start by defining some properties of the sequence of blocks that is decided upon by Tendermint consensus in normal operation (if the Tendermint failure model holds [TMBC-FM-2THIRDS]), and then define different deviations that correspond to attack scenarios.

[TMBC-GENESIS.1]

Let Genesis be the agreed-upon initial block (file).

[TMBC-FUNC-SIGN.1]

Let b and c be two light blocks with b.Header.Height + 1 = c.Header.Height. We define the predicate signs(b,c) to hold iff c.Header.LastCommit is in PossibleCommit(b). [TMBC-SOUND-DISTR-POSS-COMMIT.1].

The above encodes sequential verification, that is, intuitively, b.Header.NextValidators = c.Header.Validators and 2/3 of these Validators signed c?

[TMBC-FUNC-SUPPORT.1]

Let b and c be two light blocks. We define the predicate supports(b,c,t) to hold iff

t - trustingPeriod < b.Header.Time < t
the voting power in b.NextValidators of nodes in c.Commit is more than 1/3 of TotalVotingPower(b.Header.NextValidators)

That is, if the Tendermint failure model holds, then c has been signed by at least one correct full node, cf. [TMBC-VAL-CONTAINS-CORR.1]. The following formalizes that b was properly generated by Tendermint; b can be traced back to genesis

[TMBC-SEQ-ROOTED.1]

Let b be a light block. We define sequ-rooted(b) iff for all i, 1 <= i < h = b.Header.Height, there exist light blocks a(i) s.t.

a(1) = Genesis and
a(h) = b and
signs( a(i) , a(i+1) ).

The following formalizes that c is trusted based on b in skipping verification. Observe that we do not require here (yet) that b was properly generated.

[TMBC-SKIP-TRACE.1]

Let b and c be light blocks. We define skip-trace(b,c,t) if at time t there exists an h and a sequence a(1), ... a(h) s.t.

a(1) = b and
a(h) = c and
supports( a(i), a(i+1), t), for all i, 1 <= i < h.

We call such a sequence a(1), ... a(h) a verification trace.

The following formalizes that two light blocks of the same height should agree on the content of the header. Observe that b and c may disagree on the Commit. This is a special case if the canonical commit has not been decided on, that is, if b.Header.Height is the maximum height of all blocks decided upon by Tendermint at this moment.

[TMBC-SIGN-SKIP-MATCH.1]

Let a, b, c, be light blocks and t a time, we define sign-skip-match(a,b,c,t) = true iff the following implication evaluates to true:

sequ-rooted(a) and
b.Header.Height = c.Header.Height and
skip-trace(a,b,t)
skip-trace(a,c,t)

implies b.Header = c.Header.

Observe that sign-skip-match is defined via an implication. If it evaluates to false this means that the left-hand-side of the implication evaluates to true, and the right-hand-side evaluates to false. In particular, there are two different headers b and c that both can be verified from a common block a from the chain. Thus, the following describes an attack.

[TMBC-ATTACK.1]

If there exists three light blocks a, b, and c, with sign-skip-match(a,b,c,t) = false then we have an attack. We say we have an attack at height b.Header.Height and write attack(a,b,c,t).

The lightblock a need not be unique, that is, there may be several blocks that satisfy the above requirement for the same blocks b and c.

[TMBC-ATTACK.1] is a formalization of the violation of the agreement property based on the result of consensus, that is, the generated blocks.

Remark. Violation of agreement is only possible if more than 1/3 of the validators (or next validators) of some previous block deviated from the protocol. The upcoming "accountability" specification will describe how to compute a set of at least 1/3 faulty nodes from two conflicting blocks. []

There are different ways to characterize forks and attack scenarios. This specification uses the "node-based characterization of attacks" which focuses on what kinds of nodes are affected (light nodes vs. full nodes). For future reference and discussion we also provide a "block-based characterization of attacks" below.

Node-based characterization of attacks

[TMBC-MC-FORK.1]

We say there is a (main chain) fork at time t if

there are two correct full nodes i and j and
i is different from j and
i has decided on b and
j has decided on c and
there exist a such that attack(a,b,c,t).

[TMBC-LC-ATTACK.1]

We say there is a light client attack at time t, if

there is no (main chain) fork [TMBC-MC-FORK.1], and
there exist nodes that have computed light blocks b and c and
there exist a such that attack(a,b,c,t).

We say the attack is at height a.Header.Height.

In this specification we consider detection of light client attacks. Intuitively, the case we consider is that light block b is the one from the blockchain, and some attacker has computed c and tries to wrongly convince the light client that c is the block from the chain.

[TMBC-LC-ATTACK-EVIDENCE.1]

We consider the following case of a light client attack [TMBC-LC-ATTACK.1]:

attack(a,b,c,t)
there is a peer p1 that has a sequence chain of blocks from a to b
skip-trace(a,c,t): by [TMBC-SKIP-TRACE.1] there is a verification trace v of the form a = v(1), ... v(h) = c

Evidence for p1 (that proves an attack) consists for index i of v(i) and v(i+1) such that

E1(i). v(i) is equal to the block of chain at height v(i).Height, and
E2(i). v(i+1) that is different from the block of chain at height v(i+1).height

Observe p1 can

check that v(i+1) differs from its block at that height, and

verify v(i+1) in one step from v(i) as v is a verification trace.

Proposition. In the case of attack, evidence exists.
Proof. First observe that

(A). (NOT E2(i)) implies E1(i+1)

Now by contradiction assume there is no evidence. Thus

for all i, we have NOT E1(i) or NOT E2(i)
for i = 1 we have E1(1) and thus NOT E2(1) thus by induction on i, by (A) we have for all i that E1(i)
from attack we have E2(h-1), and as there is no evidence for i = h - 1 we get NOT E1(h-1). Contradiction. QED.

[TMBC-LC-EVIDENCE-DATA.1]

To prove the attack to p1, because of Point E1, it is sufficient to submit

v(i).Height (rather than v(i)).
v(i+1)

This information is evidence for height v(i).Height.

Block-based characterization of attacks

In this section we provide a different characterization of attacks. It is not defined on the nodes that are affected but purely on the content of the blocks. In that sense these definitions are less operational.

They might be relevant for a closer analysis of fork scenarios on the chain, which is out of the scope of this specification.

[TMBC-SIGN-UNIQUE.1]

Let b and c be light blocks, we define the predicate sign-unique(b,c) to evaluate to true iff the following implication evaluates to true:

b.Header.Height = c.Header.Height and
sequ-rooted(b) and
sequ-rooted(c)

implies b = c.

[TMBC-BLOCKS-MCFORK.1]

If there exists two light blocks b and c, with sign-unique(b,c) = false then we have a fork.

The difference of the above definition to [TMBC-MC-FORK.1] is subtle. The latter requires a full node being affected by a bad block while [TMBC-BLOCKS-MCFORK.1] just requires that a bad block exists, possibly in memory of an attacker. The following captures a light client fork. There is no fork up to the height of block b. However, c is of that height, is different, and passes skipping verification. It is a stricter property than [TMBC-LC-ATTACK.1], as [TMBC-LC-ATTACK.1] requires that no correct full node is affected.

[TMBC-BLOCKS-LCFORK.1]

Let a, b, c, be light blocks and t a time. We define light-client-fork(a,b,c,t) iff

sign-skip-match(a,b,c,t) = false and
sequ-rooted(b) and
b is "unique", that is, for all d, sequ-rooted(d) and d.Header.Height = b.Header.Height implies d = b

Finally, let us also define bogus blocks that have no support. Observe that bogus is even defined if there is a fork. Also, for the definition it would be sufficient to restrict a to a.height < b.height (which is implied by the definitions which unfold until supports()).

[TMBC-BOGUS.1]

Let b be a light block and t a time. We define bogus(b,t) iff

sequ-rooted(b) = false and
for all a, sequ-rooted(a) implies skip-trace(a,b,t) = false

Informal Problem statement

There is no sequential specification: the detector only makes sense in a distributed systems where some nodes misbehave.

We work under the assumption that full nodes and validators are responsible for detecting attacks on the main chain, and the evidence reactor takes care of broadcasting evidence to communicate misbehaving nodes via ABCI to the application, and halt the chain in case of a fork. The point of this specification is to shield a light clients against attacks that cannot be detected by full nodes, and are fully addressed at light clients (and consequently IBC relayers, which use the light client protocols to observe the state of a blockchain). In order to provide full nodes the incentive to follow the protocols when communicating with the light client, this specification also considers the generation of evidence that will also be processed by the Tendermint blockchain.

[LCD-IP-MODEL.1]

The detector is designed under the assumption that

[TMBC-FM-2THIRDS] may be violated
there is no fork on the main chain.

As a result some faulty full nodes may launch an attack on a light client.

The following requirements are operational in that they describe how things should be done, rather than what should be done. However, they do not constitute temporal logic verification conditions. For those, see [LCD-DIST-*] below.

The detector is called in the supervisor as follows

Evidences := AttackDetector(root_of_trust, verifiedLS);`

where

root-of-trust is a light block that is trusted (that is, except upon initialization, the primary and the secondaries agreed on in the past), and
verifiedLS is a lightstore that contains a verification trace that starts from a lightblock that can be verified with the root-of-trust in one step and ends with a lightblock of the height requested by the user
Evidences is a list of evidences for misbehavior

[LCD-IP-STATEMENT.1]

Whenever AttackDetector is called, the detector should for each secondary try to replay the verification trace verifiedLS with the secondary

in case replaying leads to detection of a light client attack (one of the lightblocks differ from the one in verifiedLS with the same height), we should return evidence
if the secondary cannot provide a verification trace, we have no proof for an attack. Block b may be bogus. In this case the secondary is faulty and it should be replaced.

Assumptions/Incentives/Environment

It is not in the interest of faulty full nodes to talk to the detector as long as the detector is connected to at least one correct full node. This would only increase the likelihood of misbehavior being detected. Also we cannot punish them easily (cheaply). The absence of a response need not be the fault of the full node.

Correct full nodes have the incentive to respond, because the detector may help them to understand whether their header is a good one. We can thus base liveness arguments of the detector on the assumptions that correct full nodes reliably talk to the detector.

Assumptions

[LCD-A-CorrFull.1]

At all times there is at least one correct full node among the primary and the secondaries.

For this version of the detection we take this assumption. It allows us to establish the invariant that the lightblock root-of-trust is always the one from the blockchain, and we can use it as starting point for the evidence computation. Moreover, it allows us to establish the invariant at the supervisor that any lightblock in the (top-level) lightstore is from the blockchain.
In the future we might design a lightclient based on the assumption that at least in regular intervals the lightclient is connected to a correct full node. This will require the detector to reconsider root-of-trust, and remove lightblocks from the top-level lightstore.

[LCD-A-RelComm.1]

Communication between the detector and a correct full node is reliable and bounded in time. Reliable communication means that messages are not lost, not duplicated, and eventually delivered. There is a (known) end-to-end delay Delta, such that if a message is sent at time t then it is received and processed by time t + Delta. This implies that we need a timeout of at least 2 Delta for remote procedure calls to ensure that the response of a correct peer arrives before the timeout expires.

Definitions

Evidence

Following the definition of [TMBC-LC-ATTACK-EVIDENCE.1], by evidence we refer to a variable of the following type

[LC-DATA-EVIDENCE.1]

type LightClientAttackEvidence struct {
    ConflictingBlock   LightBlock
    CommonHeight       int64
}

As the above data is computed for a specific peer, the following data structure wraps the evidence and adds the peerID.

[LC-DATA-EVIDENCE-INT.1]

type InternalEvidence struct {
    Evidence           LightClientAttackEvidence
    Peer               PeerID
}

[LC-SUMBIT-EVIDENCE.1]

func submitEvidence(Evidences []InternalEvidence)

Expected postcondition
- for each ev in Evidences: submit ev.Evidence to ev.Peer

LightStore

Lightblocks and LightStores are defined in the verification specification [LCV-DATA-LIGHTBLOCK.1] and [LCV-DATA-LIGHTSTORE.1]. See the verification specification for details.

(Distributed) Problem statement

As the attack detector is there to reduce the impact of faulty nodes, and faulty nodes imply that there is a distributed system, there is no sequential specification to which this distributed problem statement may refer to.

The detector gets as input a trusted lightblock called root and an auxiliary lightstore called primary_trace with lightblocks that have been verified before, and that were provided by the primary.

[LCD-DIST-INV-ATTACK.1]

If the detector returns evidence for height h [TMBC-LC-EVIDENCE-DATA.1], then there is an attack at height h. [TMBC-LC-ATTACK.1]

[LCD-DIST-INV-STORE.1]

If the detector does not return evidence, then primary_trace contains only blocks from the blockchain.

[LCD-DIST-LIVE.1]

The detector eventually terminates.

[LCD-DIST-TERM-NORMAL.1]

the primary_trace contains only blocks from the blockchain, and
there is no attack, and
Secondaries is always non-empty, and
the age of root is always less than the trusting period,

then the detector does not return evidence.

[LCD-DIST-TERM-ATTACK.1]

there is an attack, and
a secondary reports a block that conflicts with one of the blocks in primary_trace, and
Secondaries is always non-empty, and
the age of root is always less than the trusting period,

then the detector returns evidence.

Observe that above we require that "a secondary reports a block that conflicts". If there is an attack, but no secondary tries to launch it against the detector (or the message from the secondary is lost by the network), then there is nothing to detect for us.

[LCD-DIST-SAFE-SECONDARY.1]

No correct secondary is ever replaced.

[LCD-DIST-SAFE-BOGUS.1]

a secondary reports a bogus lightblock,
the age of root is always less than the trusting period,

then the secondary is replaced before the detector terminates.

The above property is quite operational ("reports"), but it captures quite closely the requirement. As the detector only makes sense in a distributed setting, and does not have a sequential specification, less "pure" specification are acceptable.

Protocol

Functions and Data defined in other Specifications

From the supervisor

Replace_Secondary(addr Address, root-of-trust LightBlock)

From the verifier

func VerifyToTarget(primary PeerID, root LightBlock,
                    targetHeight Height) (LightStore, Result)

Note: the above differs from the current version in the second parameter. verification will be revised.

Observe that VerifyToTarget does communication with the secondaries via the function FetchLightBlock.

Shared data of the light client

a pool of full nodes FullNodes that have not been contacted before
peer set called Secondaries
primary

Note that the lightStore is not needed to be shared.

Outline

The problem laid out is solved by calling the function AttackDetector with a lightstore that contains a light block that has just been verified by the verifier.

Then AttackDetector downloads headers from the secondaries. In case a conflicting header is downloaded from a secondary, CreateEvidenceForPeer which computes evidence in the case that indeed an attack is confirmed. It could be that the secondary reports a bogus block, which means that there need not be an attack, and the secondary is replaced.

Details of the functions

[LCD-FUNC-DETECTOR.1]:

func AttackDetector(root LightBlock, primary_trace []LightBlock)
                   ([]InternalEvidence) {

    Evidences := new []InternalEvidence;

    for each secondary in Secondaries {
        // we replay the primary trace with the secondary, in
        // order to generate evidence that we can submit to the
        // secodary. We return the evidence + the trace the
        // secondary told us that spans the evidence at its local store

        EvidenceForSecondary, newroot, secondary_trace, result :=
                CreateEvidenceForPeer(secondary,
                          root,
           primary_trace);
        if result == FaultyPeer {
            Replace_Secondary(root);
        }
        else if result == FoundEvidence {
            // the conflict is not bogus
            Evidences.Add(EvidenceForSecondary);
            // we replay the secondary trace with the primary, ...
            EvidenceForPrimary, _, result :=
        CreateEvidenceForPeer(primary,
                                          newroot,
                                          secondary_trace);
            if result == FoundEvidence {
                Evidences.Add(EvidenceForPrimary);
            }
            // At this point we do not care about the other error
            // codes. We already have generated evidence for an
            // attack and need to stop the lightclient. It does not
            // help to call replace_primary. Also we will use the
            // same primary to check with other secondaries in
            // later iterations of the loop
         }
         // In the case where the secondary reports NoEvidence
         // we do nothing
    }
    return Evidences;
}

Expected precondition
- root and primary trace are a verification trace
Expected postcondition
- solves the problem statement (if attack found, then evidence is reported)
Error condition
- ErrorTrustExpired: fails if root expires (outside trusting period) [LCV-INV-TP.1]
- ErrorNoPeers: if no peers are left to replace secondaries, and no evidence was found before that happened

func CreateEvidenceForPeer(peer PeerID, root LightBlock, trace LightStore)
                          (Evidence, LightBlock, LightStore, result) {

    common := root;

    for i in 1 .. len(trace) {
        auxLS, result := VerifyToTarget(peer, common, trace[i].Header.Height)
  
        if result != ResultSuccess {
            // something went wrong; peer did not provide a verifyable block
            return (nil, nil, nil, FaultyPeer)
        }
        else {
            if auxLS.LatestVerified().Header != trace[i].Header {
                // the header reported by the peer differs from the
                // reference header in trace but both could be
                // verified from common in one step.
                // we can create evidence for submission to the secondary
                ev := new InternalEvidence;
                ev.Evidence.ConflictingBlock := trace[i];
                ev.Evidence.CommonHeight := common.Height;
                ev.Peer := peer
                return (ev, common, auxLS, FoundEvidence)
            }
            else {
                // the peer agrees with the trace, we move common forward
                // we could delete auxLS as it will be overwritten in
                // the next iteration
                common := trace[i]
            }
        }
    }
    return (nil, nil, nil, NoEvidence)
}

Expected precondition
- root and trace are a verification trace
Expected postcondition
- finds evidence where trace and peer diverge
Error condition
- ErrorTrustExpired: fails if root expires (outside trusting period) [LCV-INV-TP.1]
- If VerifyToTarget returns error but root is not expired then return FaultyPeer

Correctness arguments

Argument for [LCD-DIST-INV-ATTACK.1]

Under the assumption that root and trace are a verification trace, when in CreateEvidenceForPeer the detector the detector creates evidence, then the lightclient has seen two different headers (one via trace and one via VerifyToTarget for the same height that can both be verified in one step.

Argument for [LCD-DIST-INV-STORE.1]

We assume that there is at least one correct peer, and there is no fork. As a result the correct peer has the correct sequence of blocks. Since the primary_trace is checked block-by-block also against each secondary, and at no point evidence was generated that means at no point there were conflicting blocks.

Argument for [LCD-DIST-LIVE.1]

At the latest when [LCV-INV-TP.1] is violated, AttackDetector terminates.

Argument for [LCD-DIST-TERM-NORMAL.1]

As there are finitely many peers, eventually the main loop terminates. As there is no attack no evidence can be generated.

Argument for [LCD-DIST-TERM-ATTACK.1]

Argument similar to [LCD-DIST-TERM-NORMAL.1]

Argument for [LCD-DIST-SAFE-SECONDARY.1]

Secondaries are only replaced if they time-out or if they report bogus blocks. The former is ruled out by the timing assumption, the latter by correct peers only reporting blocks from the chain.

Argument for [LCD-DIST-SAFE-BOGUS.1]

Once a bogus block is recognized as such the secondary is removed.

References

links to other specifications/ADRs this document refers to

[verification] The specification of the light client verification.

[supervisor] The specification of the light client supervisor.

28 KiB Raw Blame History