# Encoding

## Protocol Buffers

Tendermint uses [Protocol Buffers](https://developers.google.com/protocol-buffers), specifically proto3, for all data structures.

Please see the [Proto3 language guide](https://developers.google.com/protocol-buffers/docs/proto3) for more details.

## Byte Arrays

The encoding of a byte array is simply the raw bytes prefixed with the length of
the array as a `UVarint` (what proto calls a `Varint`).

For details on varints, see the [protobuf
spec](https://developers.google.com/protocol-buffers/docs/encoding#varints).

For example, the byte array `[0xA, 0xB]` would be encoded as `0x020A0B`,
while a byte array containing 300 entries beginning with `[0xA, 0xB, ...]` would
be encoded as `0xAC020A0B...` where `0xAC02` is the UVarint encoding of 300.

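
The length-prefix rule can be sketched with Go's standard `encoding/binary` package. This is a minimal illustration, not Tendermint's own encoder; the helper name `encodeByteArray` is hypothetical.

```go
package main

import (
	"encoding/binary"
	"encoding/hex"
	"fmt"
)

// encodeByteArray is a hypothetical helper: it prefixes the raw bytes with
// their length encoded as an unsigned varint.
func encodeByteArray(bz []byte) []byte {
	prefix := make([]byte, binary.MaxVarintLen64)
	n := binary.PutUvarint(prefix, uint64(len(bz)))
	return append(prefix[:n], bz...)
}

func main() {
	short := encodeByteArray([]byte{0x0A, 0x0B})
	fmt.Println(hex.EncodeToString(short)) // 020a0b

	// A 300-byte array: the length prefix takes two bytes.
	long := make([]byte, 300)
	long[0], long[1] = 0x0A, 0x0B
	enc := encodeByteArray(long)
	fmt.Println(hex.EncodeToString(enc[:4])) // ac020a0b
}
```
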
## Hashing

Tendermint uses `SHA256` as its hash function.
Objects are always Protobuf encoded before being hashed.
So `SHA256(obj)` is short for `SHA256(ProtoEncoding(obj))`.

## Public Key Cryptography

Tendermint uses Protobuf [Oneof](https://developers.google.com/protocol-buffers/docs/proto3#oneof)
to distinguish between different types of public keys and signatures.
Additionally, for each public key, Tendermint
defines an Address function that can be used as a more compact identifier in
place of the public key. Here we list the concrete types, their names,
and prefix bytes for public keys and signatures, as well as the address schemes
for each PubKey. Note that, for brevity, we don't
include details of the private keys beyond their type and name.

### Key Types

Each type specifies its own pubkey, address, and signature format.

#### Ed25519

TODO: pubkey

The address is the first 20 bytes of the SHA256 hash of the raw 32-byte public key:

```go
address = SHA256(pubkey)[:20]
```

The signature is the raw 64-byte Ed25519 signature.

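
As an illustration of the sizes involved, the following sketch derives an address with the Go standard library (`crypto/ed25519`, `crypto/sha256`). The helper `addressFromPubKey` is a hypothetical name, and the standard-library key type stands in for Tendermint's own key types.

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"crypto/sha256"
	"fmt"
)

// addressFromPubKey is a hypothetical helper: the first 20 bytes of the
// SHA256 hash of the raw 32-byte public key.
func addressFromPubKey(pubkey ed25519.PublicKey) []byte {
	sum := sha256.Sum256(pubkey)
	return sum[:20]
}

func main() {
	pub, priv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		panic(err)
	}
	sig := ed25519.Sign(priv, []byte("example message"))

	fmt.Printf("pubkey:    %d bytes\n", len(pub))
	fmt.Printf("address:   %x\n", addressFromPubKey(pub))
	fmt.Printf("signature: %d bytes\n", len(sig))
}
```
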
## Other Common Types

### BitArray

The BitArray is used in some consensus messages to represent votes received from
validators, or parts received in a block. It is represented
with a struct containing the number of bits (`Bits`) and the bit-array itself
encoded in base64 (`Elems`).

```go
type BitArray struct {
	Bits  int64
	Elems []uint64
}
```

This type is easily encoded directly by Protobuf.

Note BitArray receives a special JSON encoding in the form of `x` and `_`
representing `1` and `0`. For example, the BitArray `10110` would be JSON encoded as
`"x_xx_"`.

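
The `x`/`_` form can be sketched as follows. The helper `bitString` is hypothetical, and the sketch assumes that bit `i` is stored in `Elems[i/64]` at bit position `i%64`; that layout is an assumption of this example, not something stated above.

```go
package main

import "fmt"

// BitArray mirrors the struct shown above.
type BitArray struct {
	Bits  int64
	Elems []uint64
}

// bitString is a hypothetical helper. It assumes bit i lives in
// Elems[i/64] at bit position i%64 (an assumption of this sketch).
func bitString(ba BitArray) string {
	out := make([]byte, ba.Bits)
	for i := int64(0); i < ba.Bits; i++ {
		if ba.Elems[i/64]&(uint64(1)<<uint(i%64)) != 0 {
			out[i] = 'x'
		} else {
			out[i] = '_'
		}
	}
	return string(out)
}

func main() {
	// Bits 0, 2 and 3 set: the BitArray 10110 from the text.
	ba := BitArray{Bits: 5, Elems: []uint64{1<<0 | 1<<2 | 1<<3}}
	fmt.Printf("%q\n", bitString(ba)) // "x_xx_"
}
```
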
### Part

Part is used to break up blocks into pieces that can be gossiped in parallel
and securely verified using a Merkle tree of the parts.

Part contains the index of the part (`Index`), the actual
underlying data of the part (`Bytes`), and a Merkle proof that the part is contained in
the set (`Proof`).

```go
type Part struct {
	Index uint32
	Bytes []byte
	Proof SimpleProof
}
```

See details of SimpleProof, below.

### MakeParts

Encode an object using Protobuf and slice it into parts.
Tendermint uses a part size of 65536 bytes, and allows a maximum of 1601 parts
(see `types.MaxBlockPartsCount`). This corresponds to the hard-coded block size
limit of 100MB.

```go
func MakeParts(block Block) []Part
```

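
The slicing step alone might look like the sketch below. It works on bytes that are assumed to be already Protobuf-encoded and leaves out the Merkle proofs that a real `Part` carries; `splitIntoParts` is a hypothetical name. The last line checks the arithmetic behind the limit: 1600 full parts hold exactly 100 × 1024 × 1024 bytes.

```go
package main

import "fmt"

const partSize = 65536 // part size in bytes, as stated above

// splitIntoParts is a hypothetical helper: it slices already-encoded bytes
// into chunks of at most partSize bytes. Merkle proofs are omitted.
func splitIntoParts(bz []byte) [][]byte {
	var parts [][]byte
	for start := 0; start < len(bz); start += partSize {
		end := start + partSize
		if end > len(bz) {
			end = len(bz)
		}
		parts = append(parts, bz[start:end])
	}
	return parts
}

func main() {
	parts := splitIntoParts(make([]byte, 150000))
	fmt.Println(len(parts), len(parts[2])) // 3 18928

	fmt.Println(1600*partSize == 100*1024*1024) // true
}
```
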
## Merkle Trees

For an overview of Merkle trees, see
[wikipedia](https://en.wikipedia.org/wiki/Merkle_tree)

We use the RFC 6962 specification of a Merkle tree, with SHA256 as the hash function.
Merkle trees are used throughout Tendermint to compute a cryptographic digest of a data structure.
The differences between RFC 6962 and the simplest form of a Merkle tree are that:

1. Leaf nodes and inner nodes have different hashes.
   This is for "second pre-image resistance", to prevent the proof to an inner node being valid as the proof of a leaf.
   The leaf nodes are `SHA256(0x00 || leaf_data)`, and inner nodes are `SHA256(0x01 || left_hash || right_hash)`.

2. When the number of items isn't a power of two, the left half of the tree is as big as it could be
   (the largest power of two less than the number of items). This allows new leaves to be added with less
   recomputation. For example:

```md
   Simple Tree with 6 items           Simple Tree with 7 items

              *                                  *
             / \                                / \
           /     \                            /     \
         /         \                        /         \
       /             \                    /             \
      *               *                  *               *
     / \             / \                / \             / \
    /   \           /   \              /   \           /   \
   /     \         /     \            /     \         /     \
  *       *       h4     h5          *       *       *       h6
 / \     / \                        / \     / \     / \
h0  h1  h2  h3                     h0  h1  h2  h3  h4  h5
```

### MerkleRoot

The function `MerkleRoot` is a simple recursive function defined as follows:

```go
// SHA256([]byte{})
func emptyHash() []byte {
	return tmhash.Sum([]byte{})
}

// SHA256(0x00 || leaf)
func leafHash(leaf []byte) []byte {
	return tmhash.Sum(append([]byte{0x00}, leaf...))
}

// SHA256(0x01 || left || right)
func innerHash(left []byte, right []byte) []byte {
	return tmhash.Sum(append(append([]byte{0x01}, left...), right...))
}

// largest power of 2 less than k
func getSplitPoint(k int) int { ... }

func MerkleRoot(items [][]byte) []byte {
	switch len(items) {
	case 0:
		return emptyHash()
	case 1:
		return leafHash(items[0])
	default:
		k := getSplitPoint(len(items))
		left := MerkleRoot(items[:k])
		right := MerkleRoot(items[k:])
		return innerHash(left, right)
	}
}
```

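
For experimentation, here is a self-contained sketch of the same recursion that compiles on its own. It substitutes `crypto/sha256` for `tmhash.Sum`, and the body of `getSplitPoint` is one possible way to fill in the elided function; the names `sum` and `merkleRoot` are specific to this example.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// sum stands in for tmhash.Sum in this sketch.
func sum(bz []byte) []byte {
	h := sha256.Sum256(bz)
	return h[:]
}

// SHA256(0x00 || leaf)
func leafHash(leaf []byte) []byte { return sum(append([]byte{0x00}, leaf...)) }

// SHA256(0x01 || left || right)
func innerHash(left, right []byte) []byte {
	return sum(append(append([]byte{0x01}, left...), right...))
}

// getSplitPoint returns the largest power of 2 strictly less than k (k > 1).
func getSplitPoint(k int) int {
	split := 1
	for split*2 < k {
		split *= 2
	}
	return split
}

func merkleRoot(items [][]byte) []byte {
	switch len(items) {
	case 0:
		return sum(nil)
	case 1:
		return leafHash(items[0])
	default:
		k := getSplitPoint(len(items))
		return innerHash(merkleRoot(items[:k]), merkleRoot(items[k:]))
	}
}

func main() {
	// Trees with 6, 7 and 8 items all split after the fourth item.
	fmt.Println(getSplitPoint(6), getSplitPoint(7), getSplitPoint(8)) // 4 4 4

	items := [][]byte{[]byte("a"), []byte("b"), []byte("c")}
	fmt.Printf("%x\n", merkleRoot(items))
}
```
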
Note: `MerkleRoot` operates on items which are arbitrary byte arrays, not
necessarily hashes. For items which need to be hashed first, we introduce the
`Hashes` function:

```go
// SHA256 of each item
func Hashes(items [][]byte) (hashes [][]byte) {
	for _, item := range items {
		hashes = append(hashes, tmhash.Sum(item))
	}
	return hashes
}
```

Note: we will abuse notation and invoke `MerkleRoot` with arguments of type `struct` or type `[]struct`.
For `struct` arguments, we compute a `[][]byte` containing the protobuf encoding of each
field in the struct, in the same order the fields appear in the struct.
For `[]struct` arguments, we compute a `[][]byte` by protobuf encoding the individual `struct` elements.

### Merkle Proof

Proof that a leaf is in a Merkle tree is composed as follows:

```golang
type Proof struct {
	Total    int
	Index    int
	LeafHash []byte
	Aunts    [][]byte
}
```

Which is verified as follows:

```golang
func (proof Proof) Verify(rootHash []byte, leaf []byte) bool {
	assert(bytes.Equal(proof.LeafHash, leafHash(leaf)))

	computedHash := computeHashFromAunts(proof.Index, proof.Total, proof.LeafHash, proof.Aunts)
	return bytes.Equal(computedHash, rootHash)
}

func computeHashFromAunts(index, total int, leafHash []byte, innerHashes [][]byte) []byte {
	assert(index < total && index >= 0 && total > 0)

	if total == 1 {
		assert(len(innerHashes) == 0)
		return leafHash
	}

	assert(len(innerHashes) > 0)

	numLeft := getSplitPoint(total) // largest power of 2 less than total
	if index < numLeft {
		leftHash := computeHashFromAunts(index, numLeft, leafHash, innerHashes[:len(innerHashes)-1])
		assert(leftHash != nil)
		return innerHash(leftHash, innerHashes[len(innerHashes)-1])
	}
	rightHash := computeHashFromAunts(index-numLeft, total-numLeft, leafHash, innerHashes[:len(innerHashes)-1])
	assert(rightHash != nil)
	return innerHash(innerHashes[len(innerHashes)-1], rightHash)
}
```

The number of aunts is limited to 100 (`MaxAunts`) to protect the node against DoS attacks.
This limits the tree size to 2^100 leaves, which should be sufficient for any
conceivable purpose.

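
To see the aunt ordering in action, the following self-contained sketch builds the aunts for each leaf of a 7-item tree and recomputes the root from them. It uses `crypto/sha256` in place of `tmhash.Sum`, returns `nil` where the pseudocode asserts, and introduces `auntsFor` and `merkleRoot` as hypothetical helpers; it is an illustration under those assumptions, not Tendermint's implementation.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

func sum(bz []byte) []byte            { h := sha256.Sum256(bz); return h[:] }
func leafHash(leaf []byte) []byte     { return sum(append([]byte{0x00}, leaf...)) }
func innerHash(l, r []byte) []byte    { return sum(append(append([]byte{0x01}, l...), r...)) }

// getSplitPoint returns the largest power of 2 strictly less than k (k > 1).
func getSplitPoint(k int) int {
	split := 1
	for split*2 < k {
		split *= 2
	}
	return split
}

func merkleRoot(items [][]byte) []byte {
	switch len(items) {
	case 0:
		return sum(nil)
	case 1:
		return leafHash(items[0])
	}
	k := getSplitPoint(len(items))
	return innerHash(merkleRoot(items[:k]), merkleRoot(items[k:]))
}

// auntsFor collects the sibling-subtree hashes on the path from the leaf at
// index up to the root, deepest first. items must be non-empty.
func auntsFor(items [][]byte, index int) [][]byte {
	if len(items) == 1 {
		return nil
	}
	k := getSplitPoint(len(items))
	if index < k {
		return append(auntsFor(items[:k], index), merkleRoot(items[k:]))
	}
	return append(auntsFor(items[k:], index-k), merkleRoot(items[:k]))
}

// computeHashFromAunts follows the pseudocode above, returning nil where
// the pseudocode asserts.
func computeHashFromAunts(index, total int, leaf []byte, aunts [][]byte) []byte {
	if index < 0 || index >= total {
		return nil
	}
	if total == 1 {
		if len(aunts) != 0 {
			return nil
		}
		return leaf
	}
	if len(aunts) == 0 {
		return nil
	}
	last := len(aunts) - 1
	numLeft := getSplitPoint(total)
	if index < numLeft {
		left := computeHashFromAunts(index, numLeft, leaf, aunts[:last])
		if left == nil {
			return nil
		}
		return innerHash(left, aunts[last])
	}
	right := computeHashFromAunts(index-numLeft, total-numLeft, leaf, aunts[:last])
	if right == nil {
		return nil
	}
	return innerHash(aunts[last], right)
}

func main() {
	var items [][]byte
	for _, s := range []string{"a", "b", "c", "d", "e", "f", "g"} {
		items = append(items, []byte(s))
	}
	root := merkleRoot(items)
	for i, item := range items {
		aunts := auntsFor(items, i)
		got := computeHashFromAunts(i, len(items), leafHash(item), aunts)
		fmt.Println(i, len(aunts), bytes.Equal(got, root))
	}
}
```

In the 7-item tree, the last leaf sits one level higher than the others, so its proof carries two aunts instead of three.
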
### IAVL+ Tree

Because Tendermint only uses a Simple Merkle Tree, application developers are expected to use their own Merkle tree in their applications. For example, the IAVL+ Tree, an immutable self-balancing binary tree for persisting application state, is used by the [Cosmos SDK](https://github.com/cosmos/cosmos-sdk/blob/ae77f0080a724b159233bd9b289b2e91c0de21b5/docs/interfaces/lite/specification.md).

## JSON

Tendermint has its own JSON encoding in order to keep backwards compatibility with the previous RPC layer.

Registered types are encoded as:

```json
{
  "type": "<type name>",
  "value": <JSON>
}
```

For instance, an Ed25519 PubKey would look like:

```json
{
  "type": "tendermint/PubKeyEd25519",
  "value": "uZ4h63OFWuQ36ZZ4Bd6NF+/w9fWUwrOncrQsackrsTk="
}
```

Where the `"value"` is the base64 encoding of the raw pubkey bytes, and the
`"type"` is the type name for Ed25519 pubkeys.

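
The envelope shape can be reproduced with Go's standard `encoding/json`, which already marshals a `[]byte` as a base64 string. The `typedValue` struct and `encodePubKeyEd25519` function are hypothetical names for this sketch and are not Tendermint's own JSON machinery.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// typedValue is a hypothetical envelope type for this sketch.
type typedValue struct {
	Type  string          `json:"type"`
	Value json.RawMessage `json:"value"`
}

// encodePubKeyEd25519 wraps raw pubkey bytes in the type/value envelope.
func encodePubKeyEd25519(raw []byte) ([]byte, error) {
	value, err := json.Marshal(raw) // a []byte marshals to a base64 string
	if err != nil {
		return nil, err
	}
	return json.Marshal(typedValue{Type: "tendermint/PubKeyEd25519", Value: value})
}

func main() {
	// The example value from the text decodes to 32 raw bytes.
	raw, err := base64.StdEncoding.DecodeString("uZ4h63OFWuQ36ZZ4Bd6NF+/w9fWUwrOncrQsackrsTk=")
	if err != nil {
		panic(err)
	}
	out, err := encodePubKeyEd25519(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(raw)) // 32
	fmt.Println(string(out))
}
```
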
### Signed Messages

Signed messages (e.g. votes, proposals) in the consensus are encoded using protobuf.

When signing, the elements of a message are re-ordered so the fixed-length fields
are first, making it easy to quickly check the type, height, and round.
The `ChainID` is also appended to the end.
We call this encoding the SignBytes. For instance, SignBytes for a vote is the protobuf encoding of the following struct:

```protobuf
message CanonicalVote {
  SignedMsgType             type      = 1;
  sfixed64                  height    = 2;  // canonicalization requires fixed size encoding here
  sfixed64                  round     = 3;  // canonicalization requires fixed size encoding here
  CanonicalBlockID          block_id  = 4;
  google.protobuf.Timestamp timestamp = 5;
  string                    chain_id  = 6;
}
```

The field ordering and the fixed-size encoding for the first three fields are optimized to ease parsing of SignBytes
in HSMs. They create fixed offsets for relevant fields that need to be read in this context.

> Note: All canonical messages are length prefixed.

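
To make the fixed offsets concrete, the sketch below hand-encodes the first three fields of `CanonicalVote` in the protobuf wire format and reads the height and round back from fixed positions. It rests on several assumptions: the helper `encodeVotePrefix` is hypothetical, all three values are non-zero (proto3 omits zero-valued scalar fields, which would shift the offsets), the type value fits in a single varint byte, and the offsets are counted from the start of the message body, after the length prefix.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeVotePrefix hand-encodes fields 1-3 of CanonicalVote in protobuf wire
// format. Hypothetical helper; assumes all three values are non-zero and
// msgType < 128, so each field is present and the type fits in one byte.
func encodeVotePrefix(msgType byte, height, round int64) []byte {
	buf := make([]byte, 20)
	buf[0], buf[1] = 0x08, msgType // field 1, wire type 0 (varint)
	buf[2] = 0x11                  // field 2, wire type 1 (64-bit)
	binary.LittleEndian.PutUint64(buf[3:11], uint64(height))
	buf[11] = 0x19 // field 3, wire type 1 (64-bit)
	binary.LittleEndian.PutUint64(buf[12:20], uint64(round))
	return buf
}

func main() {
	// 1 is an illustrative type value for this sketch.
	bz := encodeVotePrefix(1, 12345, 2)

	// Under the assumptions above, a reader can pick the fields out by
	// offset without a full protobuf parser.
	height := int64(binary.LittleEndian.Uint64(bz[3:11]))
	round := int64(binary.LittleEndian.Uint64(bz[12:20]))

	fmt.Printf("%x\n", bz)
	fmt.Println(bz[1], height, round) // 1 12345 2
}
```
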
For more details, see the [signing spec](../consensus/signing.md).

Also, see the motivating discussion in
[#1622](https://github.com/tendermint/tendermint/issues/1622).