Updated lite client spec (#3841)

* updates lite cline spec based on Zarko's and my previous specs
* updates spec incorporating comments from Aug 8 meeting
* added check of validators
* Update docs/spec/consensus/light-client.md Co-Authored-By: Anca Zamfir <ancazamfir@users.noreply.github.com>
* Update docs/spec/consensus/light-client.md Co-Authored-By: Anca Zamfir <ancazamfir@users.noreply.github.com>
* Update docs/spec/consensus/light-client.md Co-Authored-By: Anca Zamfir <ancazamfir@users.noreply.github.com>
* Update docs/spec/consensus/light-client.md Co-Authored-By: Anca Zamfir <ancazamfir@users.noreply.github.com>
* fix typos & improve formatting Co-Authored-By: Anca Zamfir <ancazamfir@users.noreply.github.com>
* Update docs/spec/consensus/light-client.md Co-Authored-By: Anton Kaliaev <anton.kalyaev@gmail.com>
* incorporated some of Anca's and Anton's comments
* addressed all current comments
* highlight assumptions/verification conditions
* added remark on checks
* Update docs/spec/consensus/light-client.md Co-Authored-By: Ethan Buchman <ethan@coinculture.info>
* Update docs/spec/consensus/light-client.md Co-Authored-By: Ethan Buchman <ethan@coinculture.info>
* Update docs/spec/consensus/light-client.md Co-Authored-By: Ethan Buchman <ethan@coinculture.info>
* fixes
* addressed Ethan's comments from Aug 18
* trustlevel
* description of 3 docs fixed
* added comment on justification asked for by Anca
* bla
* removed trustlevel for adjacent headers. Added bisection order.
* added comment in bisection
* fixed comment in checksupport
* added comment on bisection timeout
5 years ago · f46e43eb38
--- a/docs/spec/consensus/light-client.md
+++ b/docs/spec/consensus/light-client.md
@@ -1,113 +1,329 @@
 # Light Client

 A light client is a process that connects to the Tendermint Full Node(s) and then tries to verify the Merkle proofs
 about the blockchain application. In this document we describe mechanisms that ensure that the Tendermint light client
 has the same level of security as Full Node processes (without being itself a Full Node).

 To be able to validate a Merkle proof, a light client needs to validate the blockchain header that contains the root app hash.
 Validating a blockchain header in Tendermint consists in verifying that the header is committed (signed) by >2/3 of the
 voting power of the corresponding validator set. As the validator set is dynamic (it changes over time), one of the
 core functions of the light client is updating the current validator set, which is then used to verify the
 blockchain header and, in turn, the corresponding Merkle proofs.

 For the purpose of this light client specification, we assume that the Tendermint Full Node exposes the following functions over
 Tendermint RPC:

 ```golang
 Header(height int64) (SignedHeader, error) // returns signed header for the given height
 Validators(height int64) (ResultValidators, error) // returns validator set for the given height
 LastHeader(valSetNumber int64) (SignedHeader, error)   // returns last header signed by the validator set with the given validator set number

 type SignedHeader struct {
    Header        Header
    Commit        Commit
    ValSetNumber  int64
 }
 # Lite client

 type ResultValidators struct {
    BlockHeight   int64
    Validators    []Validator
    // time the current validator set is initialised, i.e, time of the last validator change before header BlockHeight
    ValSetTime    int64
 }
 A lite client is a process that connects to Tendermint full nodes and then tries to verify application data using the Merkle proofs.

 ## Context of this document

 In order to make sure that full nodes have the incentive to follow the protocol, we have to address the following three issues:

 1) The lite client needs a method to verify headers it obtains from full nodes according to trust assumptions -- this document.

 2) The lite client must be able to connect to one correct full node to detect and report on failures in the trust assumptions (i.e., conflicting headers) -- a future document.

 3) In the event the trust assumption fails (i.e., a lite client is fooled by a conflicting header), the Tendermint fork accountability protocol must account for the evidence -- see #3840

 ## Problem statement


 We assume that the lite client knows a (base) header *inithead* it trusts (by social consensus or because the lite client has decided to trust the header before). The goal is to check whether another header *newhead* can be trusted based on the data in *inithead*.

 The correctness of the protocol is based on the assumption that *inithead* was generated by an instance of Tendermint consensus. The term "trusting" above indicates that the correctness of the protocol depends on this assumption. It is the responsibility of the user who runs the lite client to make sure that the risk of trusting a corrupted/forged *inithead* is negligible.


 ## Definitions

 ### Data structures

 In the following, only the details of the data structures needed for this specification are given.

  * header fields
    - *height*
    - *bfttime*: the chain time when the header (block) was generated
    - *V*: validator set containing validators for this block.
    - *NextV*: validator set for next block.
    - *commit*: evidence that block with height *height* - 1 was committed by a set of validators (canonical commit). We will use ```signers(commit)``` to refer to the set of validators that committed the block.

  * signed header fields: contains a header and a *commit* for the current header (a "seen commit"). In Tendermint consensus, the "canonical commit" is stored in the header of height *height* + 1.

  * For each header *h* it has locally stored, the lite client stores whether
    it trusts *h*. We write *trust(h) = true*, if this is the case.

  * Validator fields. We will write a validator as a tuple *(v,p)* such that
    + *v* is the identifier (we assume identifiers are unique in each validator set)
    + *p* is its voting power


 ### Functions

 For the purpose of this lite client specification, we assume that the Tendermint Full Node exposes the following function over Tendermint RPC:
 ```go
    func Commit(height int64) (SignedHeader, error)
      // returns a signed header: a header (with the fields from
      // above) together with a Commit that includes the signatures
      // of the validators that signed the header


    type SignedHeader struct {
      Header        Header
      Commit        Commit
    }
 ```

 We assume that Tendermint keeps track of the validator set changes and that each time a validator set is changed it is
 being assigned the next sequence number. We can call this number the validator set sequence number. Tendermint also remembers
 the Time from the header when the next validator set is initialised (starts to be in power), and we refer to this time
 as validator set init time.
 Furthermore, we assume that each validator set change is signed (committed) by the current validator set. More precisely,
 given a block `H` that contains transactions that are modifying the current validator set, the Merkle root hash of the next
 validator set (modified based on transactions from block H) will be in block `H+1` (and signed by the current validator
 set), and then starting from the block `H+2`, it will be signed by the next validator set.

 Note that the real Tendermint RPC API is slightly different (for example, response messages contain more data and function
 names are slightly different); we shortened (and modified) it for the purpose of this document to make the spec more
 clear and simple. Furthermore, note that in case of the third function, the returned header has `ValSetNumber` equal to
 `valSetNumber+1`.

 Locally, the light client manages the following state:

 ```golang
 valSet        []Validator    // current validator set (last known and verified validator set)
 valSetNumber  int64          // sequence number of the current validator set
 valSetHash    []byte         // hash of the current validator set
 valSetTime    int64          // time when the current validator set is initialised
 ### Definitions

 * *tp*: trusting period   
 * for realtime *t*, the predicate *correct(v,t)* is true if the validator *v*
  follows the protocol until time *t* (we will see about recovery later).




 ### Tendermint Failure Model

 If a block *h* is generated at time *bfttime* (and this time is stored in the block), then a set of validators that hold more than 2/3 of the voting power in h.Header.NextV is correct until time h.Header.bfttime + tp.

 Formally,
 \[
 \sum_{(v,p) \in h.Header.NextV \wedge correct(v,h.Header.bfttime + tp)} p >
 2/3 \sum_{(v,p) \in h.Header.NextV} p
 \]

 *Assumption*: "correct" is defined w.r.t. realtime (some Newtonian global notion of time, i.e., wall time), while *bfttime* corresponds to the reading of the local clock of a validator (how this time is computed may change when the Tendermint consensus is modified). In this note, we assume that all clocks are synchronized to realtime. We can make this more precise eventually (incorporating clock drift, accuracy, precision, etc.). Right now, we consider this assumption sufficient, as clock synchronization (under NTP) is in the order of milliseconds and *tp* is in the order of weeks.  

 *Remark*: This failure model might change to a hybrid version that takes heights into account in the future.

 The specification in this document considers an implementation of the lite client under this assumption. Issues like *counter-factual signing* and *fork accountability* and *evidence submission* are mechanisms that justify this assumption by incentivizing validators to follow the protocol.
 If they don't, and we have more than 1/3 faults, safety may be violated. Our approach then is to *detect* these cases (after the fact), and take suitable repair actions (automatic and social). This is discussed in an upcoming document on "Fork accountability". (These safety violations include the lite client wrongly trusting a header, a fork in the blockchain, etc.)


 ## Lite Client Trusting Spec

 The lite client communicates with a full node and learns new headers. The goal is to locally decide whether to trust a header. Our implementation needs to ensure the following two properties:

 - Lite Client Completeness: If header *h* was correctly generated by an instance of Tendermint consensus (and its age is less than the trusting period), then the lite client should eventually set *trust(h)* to true.

 - Lite Client Accuracy:  If header *h* was *not generated* by an instance of Tendermint consensus, then the lite client should never set *trust(h)* to true.

 *Remark*: If, in the course of the computation, the lite client obtains certainty that some headers were forged by adversaries (that is, were not generated by an instance of Tendermint consensus), it may submit (a subset of) the headers it has seen as evidence of misbehavior.

 *Remark*: In Completeness we use "eventually", while in practice *trust(h)* should be set to true before *h.Header.bfttime + tp*. If not, the block cannot be trusted because it is too old.

 *Remark*: If a header *h* is marked with *trust(h)*, but it is too old (its bfttime is more than *tp* ago), then the lite client should set *trust(h)* to false again.

 *Assumption*: Initially, the lite client has a header *inithead* that it trusts correctly, that is, *inithead* was correctly generated by the Tendermint consensus.

 To reason about the correctness, we may prove the following invariant.

 *Verification Condition: Lite Client Invariant.*
 For each lite client *l* and each header *h*:
 if *l* has set *trust(h) = true*,
  then validators that are correct until time *h.Header.bfttime + tp* have more than two thirds of the voting power in *h.Header.NextV*.

  Formally,
  \[
  \sum_{(v,p) \in h.Header.NextV \wedge correct(v,h.Header.bfttime + tp)} p >
  2/3 \sum_{(v,p) \in h.Header.NextV} p
  \]

 *Remark.* To prove the invariant, we will have to prove that the lite client only trusts headers that were correctly generated by Tendermint consensus; the formula above then follows from the Tendermint Failure Model.


 ## High Level Solution

 Upon initialization, the lite client is given a header *inithead* it trusts (by
 social consensus). It is assumed that *inithead* satisfies the lite client invariant. (If *inithead* has been correctly generated by Tendermint consensus, the invariant follows from the Tendermint Failure Model.)

 When a lite client sees a new signed header *snh*, it has to decide whether to trust the new
 header. Trust can be obtained by one of three methods, or by a combination of them.

 1. **Uninterrupted sequence of proof.** If a block is appended to the chain, where the last block
 is trusted (and properly committed by the old validator set in the next block),
 and the new block contains a new validator set, the new block is trusted if the lite client knows all headers in the prefix.
 Intuitively, a trusted validator set is assumed to only choose a new validator set that will obey the Tendermint Failure Model.

 2. **Trusting period.** Based on a trusted block *h*, and the lite client
 invariant, which ensures the fault assumption during the trusting period, we can check whether at least one validator that has been continuously correct from *h.Header.bfttime* until now has signed *snh*.
 If this is the case, similarly to above, the chosen validator set in *snh* does not violate the Tendermint Failure Model.

 3. **Bisection.** If a check according to the trusting period fails, the lite client can try to obtain a header *hp* whose height lies between *h* and *snh* in order to check whether *h* can be used to get trust for *hp*, and *hp* can be used to get trust for *snh*. If this is the case we can trust *snh*; if not, we may continue recursively.

 ## How to use it

 We consider the following use case:
 the lite client wants to verify a header for some given height *k*. Thus:
  - it requests the signed header for height *k* from a full node
  - it tries to verify this header with the methods described here.

 This can be used in several settings:
  - someone tells the lite client that application data that is relevant for it can be read in the block of height *k*.
  - the lite client wants the latest state. It asks a full node for the current height, and uses the response for *k*.


 ## Details

 *Assumptions*

 1. *tp < unbonding period*.
 2. *snh.Header.bfttime < now*
 3. *snh.Header.bfttime < h.Header.bfttime+tp*
 4. *trust(h)=true*


 **Observation 1.** If *h.Header.bfttime + tp > now*, we trust the old
 validator set *h.Header.NextV*.

 When we say we trust *h.Header.NextV* we do *not* trust that each individual validator in *h.Header.NextV* is correct, but we only trust the fact that at most 1/3 of them are faulty (more precisely, the faulty ones have at most 1/3 of the total voting power).



 ### Functions

 The function *Bisection* checks whether to trust header *h2* based on the trusted header *h1*. It does so by calling
 the function *CheckSupport* in the process of
 bisection/recursion. *CheckSupport* implements the trusting period method and, for two adjacent headers (in terms of heights), it checks the uninterrupted sequence of proof.

 *Assumption*: In the following, we assume that *h2.Header.height > h1.Header.height*. We will quickly discuss the other case in the next section.

 We consider the following set-up:
 - the lite client communicates with one full node
 - the lite client locally stores all the signed headers it obtained (trusted or not). In the pseudo code below we write *Store(header)* for this.
 - If *Bisection* returns *false*, then the lite client has seen a forged header.
  * However, it does not know which header(s) is/are the problematic one(s).
  * In this case, the lite client can submit (some of) the headers it has seen as evidence. As the lite client communicates with one full node only when executing Bisection, there are two cases
    - the full node is faulty
     - the full node is correct and there was a fork in Tendermint consensus. Header *h1* is from a different branch than the one taken by the full node. This case is not the focus of this document, but will be treated in the document on fork accountability.

 - the lite client must retry to retrieve correct headers from another full node
  * it picks a new full node
  * it restarts *Bisection*
  * there might be optimizations; a lite client may not need to call *Commit(k)*, for a height *k* for which it already has a signed header it trusts.
  * how to make sure that a lite client can communicate with a correct full node will be the focus of a separate document (recall Issue 2 from "Context of this document").

 **Auxiliary Functions.** We will use the  function ```votingpower_in(V1,V2)``` to compute the voting power the validators in set V1 have according to their voting power in set V2;
 we will write ```totalVotingPower(V)``` for ```votingpower_in(V,V)```, which returns the total voting power in V.
 We further use the function ```signers(Commit)``` that returns the set of validators that signed the Commit.

 **CheckSupport.** The following function checks whether we can trust the header h2 based on header h1 following the trusting period method.

 ```go
  func CheckSupport(h1,h2,trustlevel) bool {
    if h1.Header.bfttime + tp < now { // Observation 1
      return false // old header was once trusted but it is expired
    }  
    vp_all := totalVotingPower(h1.Header.NextV)
      // total voting power of the validators in h1.Header.NextV

    if h2.Header.height == h1.Header.height + 1 {
      // specific check for adjacent headers; everything must be
      // properly signed.
      // also check that h2.Header.V == h1.Header.NextV
      // Plus the following check that more than 2/3 of the voting
      // power in h1.Header.NextV signed h2
      return (votingpower_in(signers(h2.Commit),h1.Header.NextV) >
              2/3 * vp_all)
        // signing validators hold more than two thirds in h1.
    }

    return (votingpower_in(signers(h2.Commit),h1.Header.NextV) >
            max(1/3,trustlevel) * vp_all)
      // get validators in h1 that signed h2
      // sum of voting powers in h1 of
      // validators that signed h2
      // is more than a third in h1
  }
 ```

 The light client is initialised with the trusted validator set, for example based on the known validator set hash,
 validator set sequence number and the validator set init time.
 The core of the light client logic is captured by the VerifyAndUpdate function that is used to 1) verify if the given header is valid,
 and 2) update the validator set (when the given header is valid and it is more recent than the seen headers).
  *Remark*: Basic header verification must be done for *h2*. Similar checks are done in:  
  https://github.com/tendermint/tendermint/blob/master/types/validator_set.go#L591-L633

  *Remark*: There are some sanity checks which are not in the code:
  *h2.Header.height > h1.Header.height* and *h2.Header.bfttime > h1.Header.bfttime* and *h2.Header.bfttime < now*.

  *Remark*: ```return (votingpower_in(signers(h2.Commit),h1.Header.NextV) > max(1/3,trustlevel) * vp_all)``` may return false even if *h2* was properly generated by Tendermint consensus in the case of big changes in the validator sets. However, the check ```return (votingpower_in(signers(h2.Commit),h1.Header.NextV) >
          2/3 * vp_all)``` must return true if *h1* and *h2* were generated by Tendermint consensus.

 *Remark*: The 1/3 check differs from a previously proposed method that was based on intersecting validator sets and checking that the new validator set contains "enough" correct validators. We found that the old check is not suited for realistic changes in the validator sets. The new method is not only based on cardinalities, but also exploits that we can trust what is signed by a correct validator (i.e., signed by more than 1/3 of the voting power).
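
The threshold arithmetic of *CheckSupport* can be sketched in runnable form. This is a sketch under assumptions, not the reference implementation: the struct layouts are simplified, `Signers` stands in for `signers(Commit)`, times are plain integers, *trustlevel* is passed as a fraction `trustNum/trustDen` (with `trustDen > 0`) so that the comparison stays in integer arithmetic, and signature verification as well as the check *h2.Header.V == h1.Header.NextV* are left out.

```go
package main

import "fmt"

// Validator is the tuple (v, p): identifier and voting power.
type Validator struct {
	ID    string
	Power int64
}

// Header and SignedHeader are simplified for this sketch.
type Header struct {
	Height  int64
	BFTTime int64 // assumption: time as an integer timestamp
	NextV   []Validator
}

type SignedHeader struct {
	Header  Header
	Signers []string // stands in for signers(Commit)
}

// votingPowerIn sums the power that the validators listed in ids hold in v.
// Identifiers unknown to v contribute nothing; duplicates count once.
func votingPowerIn(ids []string, v []Validator) int64 {
	power := make(map[string]int64, len(v))
	for _, val := range v {
		power[val.ID] = val.Power
	}
	seen := make(map[string]bool, len(ids))
	var sum int64
	for _, id := range ids {
		if !seen[id] {
			seen[id] = true
			sum += power[id]
		}
	}
	return sum
}

// checkSupport follows the structure of CheckSupport above.
func checkSupport(h1, h2 SignedHeader, trustNum, trustDen, tp, now int64) bool {
	if h1.Header.BFTTime+tp < now {
		return false // h1 was once trusted but has expired
	}
	all := votingPowerIn(idsOf(h1.Header.NextV), h1.Header.NextV)
	signed := votingPowerIn(h2.Signers, h1.Header.NextV)
	if h2.Header.Height == h1.Header.Height+1 {
		return 3*signed > 2*all // adjacent headers: more than 2/3
	}
	// signed > max(1/3, trustlevel) * all, written without fractions
	return 3*signed > all && trustDen*signed > trustNum*all
}

// idsOf returns the identifiers of all validators in v.
func idsOf(v []Validator) []string {
	ids := make([]string, len(v))
	for i, val := range v {
		ids[i] = val.ID
	}
	return ids
}

func main() {
	nextV := []Validator{{ID: "a", Power: 40}, {ID: "b", Power: 30}, {ID: "c", Power: 30}}
	h1 := SignedHeader{Header: Header{Height: 1, BFTTime: 1000, NextV: nextV}}
	far := SignedHeader{Header: Header{Height: 5, BFTTime: 1500}, Signers: []string{"a"}}
	adj := SignedHeader{Header: Header{Height: 2, BFTTime: 1100}, Signers: []string{"a"}}
	const tp, now = 10000, 2000
	fmt.Println(checkSupport(h1, far, 1, 3, tp, now)) // 40 of 100 exceeds 1/3
	fmt.Println(checkSupport(h1, adj, 1, 3, tp, now)) // 40 of 100 does not exceed 2/3
}
```

With these numbers, validator `a` alone is enough to carry trust across a gap, but not to accept an adjacent header, which matches the two thresholds in the pseudocode.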

 ```golang
 VerifyAndUpdate(signedHeader SignedHeader):
  assertThat signedHeader.valSetNumber >= valSetNumber
  if isValid(signedHeader) and signedHeader.Header.Time <= valSetTime + UNBONDING_PERIOD then
    setValidatorSet(signedHeader)
 *Correctness arguments*

 Towards Lite Client Accuracy:
 - Assume by contradiction that *h2* was not generated correctly and the lite client sets trust to true because *CheckSupport* returns true.
 - h1 is trusted and sufficiently new
 - by the Tendermint Failure Model, less than 1/3 of the voting power is held by faulty validators => at least one correct validator *v* has signed *h2*.
 - as *v* is correct up to now, it followed the Tendermint consensus protocol at least up to signing *h2* => *h2* was correctly generated, we arrive at the required contradiction.


 Towards Lite Client Completeness:
 - The check is successful if sufficiently many validators of *h1* are still validators in *h2* and signed *h2*.
 - If *h2.Header.height = h1.Header.height + 1*, and both headers were generated correctly, the test passes

 *Verification Condition:* We may need a Tendermint invariant stating that if *h2.Header.height = h1.Header.height + 1* then *signers(h2.Commit) \subseteq h1.Header.NextV*.

 *Remark*: The variable *trustlevel* can be used if the user believes that relying on one correct validator is not sufficient. However, in case of (frequent) changes in the validator set, the higher the *trustlevel* is chosen, the more unlikely it becomes that CheckSupport returns true for non-adjacent headers.

 **Bisection.** The following function uses CheckSupport recursively to find intermediate headers that establish a sequence of trust.




 ```go
 func Bisection(h1,h2,trustlevel) bool{
  if CheckSupport(h1,h2,trustlevel) {
    return true
  else
    updateValidatorSet(signedHeader.ValSetNumber)
    return VerifyAndUpdate(signedHeader)

 isValid(signedHeader SignedHeader):
  valSetOfTheHeader = Validators(signedHeader.Header.Height)
  assertThat Hash(valSetOfTheHeader) == signedHeader.Header.ValSetHash
  assertThat signedHeader is passing basic validation
  if votingPower(signedHeader.Commit) > 2/3 * votingPower(valSetOfTheHeader) then return true
  else
  }
  if h2.Header.height == h1.Header.height + 1 {
    // we have adjacent headers that are not matching (failed
    // the CheckSupport)
    // we could submit evidence here
    return false
  }
  pivot := (h1.Header.height + h2.Header.height) / 2
  hp := Commit(pivot)
    // ask a full node for header of height pivot
  Store(hp)  
    // store header hp locally
  if Bisection(h1,hp,trustlevel) {
    // only check right branch if hp is trusted
    // (otherwise a lot of unnecessary computation may be done)
    return Bisection(hp,h2,trustlevel)
  } else {
    return false
  }
 }
 ```  

 setValidatorSet(signedHeader SignedHeader):
  nextValSet = Validators(signedHeader.Header.Height)
  assertThat Hash(nextValSet) == signedHeader.Header.ValidatorsHash
  valSet = nextValSet.Validators
  valSetHash = signedHeader.Header.ValidatorsHash
  valSetNumber = signedHeader.ValSetNumber
  valSetTime = nextValSet.ValSetTime

 votingPower(commit Commit):
  votingPower = 0
  for each precommit in commit.Precommits do:
    if precommit.ValidatorAddress is in valSet and signature of the precommit verifies then
      votingPower += valSet[precommit.ValidatorAddress].VotingPower
  return votingPower

 votingPower(validatorSet []Validator):
  for each validator in validatorSet do:
    votingPower += validator.VotingPower
  return votingPower

 updateValidatorSet(valSetNumberOfTheHeader):
  while valSetNumber != valSetNumberOfTheHeader do
    signedHeader = LastHeader(valSetNumber)
    if isValid(signedHeader) then
      setValidatorSet(signedHeader)
    else return error
  return
 ```

 Note that in the logic above we assume that the light client will always go upward with respect to header verifications,
 i.e., that it will always be used to verify more recent headers. In case a light client needs to be used to verify older
 headers (go backward) the same mechanisms and similar logic can be used. In case a call to the FullNode or subsequent
 checks fail, a light client needs to implement some recovery strategy, for example connecting to another FullNode.


 *Correctness arguments (sketch)*

 Lite Client Accuracy:
 - Assume by contradiction that *h2* was not generated correctly and the lite client sets trust to true because Bisection returns true.
 - Bisection returns true only if all calls to CheckSupport in the recursion return true.
 - Thus we have a sequence of headers that all satisfied the CheckSupport
 - again a contradiction

 Lite Client Completeness:

 This is only ensured if upon *Commit(pivot)* the lite client is always provided with a correctly generated header.

 *Stalling*

 With Bisection, a faulty full node could stall a lite client by creating a long sequence of headers that are queried one-by-one by the lite client and look OK, before the lite client eventually detects a problem. There are several ways to address this:
 * Each call to ```Commit``` could be issued to a different full node
 * Instead of querying header by header, the lite client tells a full node which header it trusts, and the height of the header it needs. The full node responds with the header along with a proof consisting of intermediate headers that the light client can use to verify. Roughly, Bisection would then be executed at the full node.
 * We may set a timeout on how long Bisection may take.


 ### The case *h2.Header.height < h1.Header.height*

 In the use case where someone tells the lite client that application data that is relevant for it can be read in the block of height *k* and the lite client trusts a more recent header, we can use the hashes to verify headers "down the chain." That is, we iterate down the heights and check the hashes in each step.

 *Remark.* For the case where the lite client trusts two headers *i* and *j* with *i < k < j*, we should discuss/experiment whether the forward or the backward method is more effective.

 ```go
 func Backwards(h1,h2) bool {
  assert (h2.Header.height < h1.Header.height)
  old := h1
  for i := h1.Header.height - 1; i > h2.Header.height; i-- {
    new := Commit(i)
    Store(new)
    if (hash(new) != old.Header.hash) {
      return false
    }
    old = new
  }
  return (hash(h2) == old.Header.hash)
 }
 ```
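
The backward check can be made concrete with a toy hash chain. This is a sketch under assumptions: the header layout, the `PrevHash` field (read here as what the pseudocode calls `old.Header.hash`, i.e. the hash of the preceding header), the `hash` function over height and previous hash, and the `fetch` callback standing in for *Commit(i)* are all hypothetical and chosen only to make the example self-contained.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// Header is a toy header; PrevHash is the hash of the header one height below.
type Header struct {
	Height   int64
	PrevHash []byte
}

// hash is a stand-in header hash over the height and the previous hash.
func hash(h Header) []byte {
	buf := make([]byte, 8)
	binary.BigEndian.PutUint64(buf, uint64(h.Height))
	sum := sha256.Sum256(append(buf, h.PrevHash...))
	return sum[:]
}

// backwards walks from the trusted header h1 down to h2 and checks the
// hash link at every step.
func backwards(h1, h2 Header, fetch func(int64) Header) bool {
	if h2.Height >= h1.Height {
		return false
	}
	old := h1
	for i := h1.Height - 1; i > h2.Height; i-- {
		cur := fetch(i)
		if !bytes.Equal(hash(cur), old.PrevHash) {
			return false
		}
		old = cur // assignment, so the next iteration sees the new header
	}
	return bytes.Equal(hash(h2), old.PrevHash)
}

// buildChain returns headers for heights 1..n linked via PrevHash
// (index 0 is unused).
func buildChain(n int64) []Header {
	chain := make([]Header, n+1)
	for i := int64(1); i <= n; i++ {
		chain[i] = Header{Height: i}
		if i > 1 {
			chain[i].PrevHash = hash(chain[i-1])
		}
	}
	return chain
}

func main() {
	chain := buildChain(6)
	fetch := func(h int64) Header { return chain[h] }

	fmt.Println(backwards(chain[6], chain[2], fetch)) // genuine header at height 2
	forged := Header{Height: 2, PrevHash: []byte("forged")}
	fmt.Println(backwards(chain[6], forged, fetch)) // forged header at height 2
}
```

Because each step only compares hashes, the backward direction needs no voting power computation at all; the trust in *h1* is what anchors the whole walk.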