# Tendermint Encoding

## Binary Serialization (TMBIN)

Tendermint aims to encode data structures in a manner similar to how the corresponding Go structs
are laid out in memory.
Variable length items are length-prefixed.
While the encoding was inspired by Go, it is easily implemented in other languages as well, given its intuitive design.

XXX: This is changing to use real varints and 4-byte-prefixes.
See https://github.com/tendermint/go-wire/tree/sdk2.

### Fixed Length Integers

Fixed length integers are encoded in Big-Endian using the specified number of bytes.
So `uint8` and `int8` use one byte, `uint16` and `int16` use two bytes,
`uint32` and `int32` use four bytes, and `uint64` and `int64` use eight bytes.

Negative integers are encoded via two's complement.

Examples:

```go
encode(uint8(6))    == [0x06]
encode(uint32(6))   == [0x00, 0x00, 0x00, 0x06]

encode(int8(-6))    == [0xFA]
encode(int32(-6))   == [0xFF, 0xFF, 0xFF, 0xFA]
```
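
As a rough illustration, the fixed-length rules map onto Go's standard `encoding/binary` package. The helper names `encodeUint32` and `encodeInt32` are made up for this sketch and are not part of go-wire.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeUint32 returns the 4-byte Big-Endian encoding of v.
// (Hypothetical helper name, chosen for this sketch.)
func encodeUint32(v uint32) []byte {
	buf := make([]byte, 4)
	binary.BigEndian.PutUint32(buf, v)
	return buf
}

// encodeInt32 reinterprets v as its two's complement bit pattern
// and encodes that pattern as 4 Big-Endian bytes.
func encodeInt32(v int32) []byte {
	return encodeUint32(uint32(v))
}

func main() {
	fmt.Printf("% X\n", encodeUint32(6)) // 00 00 00 06
	fmt.Printf("% X\n", encodeInt32(-6)) // FF FF FF FA
}
```

The other widths follow the same pattern with `PutUint16` and `PutUint64`.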

### Variable Length Integers

Variable length integers are encoded as length-prefixed Big-Endian integers.
The length-prefix consists of a single byte and corresponds to the length of the encoded integer.

Negative integers are encoded as their absolute value, with the sign marked in the length-prefix:
in the examples below the upper four bits of the prefix are set, so a one-byte negative integer has the prefix `0xF1`.

Zero is encoded as `0x00`. It is not length-prefixed.

Examples:

```go
encode(uint(6))     == [0x01, 0x06]
encode(uint(70000)) == [0x03, 0x01, 0x11, 0x70]

encode(int(-6))     == [0xF1, 0x06]
encode(int(-70000)) == [0xF3, 0x01, 0x11, 0x70]

encode(int(0))      == [0x00]
```
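
The rules above can be sketched in a few lines of Go. The function name `encodeVarint` is hypothetical, and the `0xF0` sign marker is an assumption read off the examples (`0xF1`, `0xF3`) rather than a confirmed detail of go-wire.

```go
package main

import "fmt"

// encodeVarint encodes v as a one-byte length-prefix followed by the
// Big-Endian bytes of |v| without leading zeros. Zero is the single byte 0x00.
// (Hypothetical helper; the 0xF0 sign marker is inferred from the examples.)
func encodeVarint(v int64) []byte {
	if v == 0 {
		return []byte{0x00}
	}
	negative := v < 0
	mag := uint64(v)
	if negative {
		mag = uint64(-v)
	}
	// Collect the magnitude bytes, most significant first.
	var body []byte
	for ; mag > 0; mag >>= 8 {
		body = append([]byte{byte(mag)}, body...)
	}
	prefix := byte(len(body))
	if negative {
		prefix |= 0xF0 // assumption: sign marker as seen in the examples
	}
	return append([]byte{prefix}, body...)
}

func main() {
	fmt.Printf("% X\n", encodeVarint(70000))  // 03 01 11 70
	fmt.Printf("% X\n", encodeVarint(-70000)) // F3 01 11 70
	fmt.Printf("% X\n", encodeVarint(0))      // 00
}
```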

### Strings

An encoded string is length-prefixed followed by the underlying bytes of the string.
The length-prefix is itself encoded as an `int`.

The empty string is encoded as `0x00`. It is not length-prefixed.

Examples:

```go
encode("")      == [0x00]
encode("a")     == [0x01, 0x01, 0x61]
encode("hello") == [0x01, 0x05, 0x68, 0x65, 0x6C, 0x6C, 0x6F]
encode("¥")     == [0x01, 0x02, 0xC2, 0xA5]
```
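
A minimal sketch of the string rule, assuming the length is written in the variable-length integer format described earlier. Both `encodeLength` and `encodeString` are hypothetical names used only for illustration. Note that the length counts bytes, not characters, which is why `"¥"` has length 2.

```go
package main

import "fmt"

// encodeLength writes a non-negative length as a variable length integer:
// one size byte, then the Big-Endian bytes of n. Zero is the single byte 0x00.
func encodeLength(n int) []byte {
	if n == 0 {
		return []byte{0x00}
	}
	var body []byte
	for v := uint64(n); v > 0; v >>= 8 {
		body = append([]byte{byte(v)}, body...)
	}
	return append([]byte{byte(len(body))}, body...)
}

// encodeString prefixes the raw bytes of s with their encoded length.
func encodeString(s string) []byte {
	return append(encodeLength(len(s)), s...)
}

func main() {
	fmt.Printf("% X\n", encodeString("hello")) // 01 05 68 65 6C 6C 6F
	fmt.Printf("% X\n", encodeString("¥"))     // 01 02 C2 A5
}
```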

### Arrays (fixed length)

An encoded fixed-length array is the concatenation of the encoding of its elements.
There is no length-prefix.

Examples:

```go
encode([4]int8{1, 2, 3, 4})     == [0x01, 0x02, 0x03, 0x04]
encode([4]int16{1, 2, 3, 4})    == [0x00, 0x01, 0x00, 0x02, 0x00, 0x03, 0x00, 0x04]
encode([4]int{1, 2, 3, 4})      == [0x01, 0x01, 0x01, 0x02, 0x01, 0x03, 0x01, 0x04]
encode([2]string{"abc", "efg"}) == [0x01, 0x03, 0x61, 0x62, 0x63, 0x01, 0x03, 0x65, 0x66, 0x67]
```

### Slices (variable length)

An encoded variable-length array is length-prefixed followed by the concatenation of the encoding of
its elements.
The length-prefix is the number of elements and is itself encoded as an `int`.

An empty slice is encoded as `0x00`. It is not length-prefixed.

Examples:

```go
encode([]int8{})                == [0x00]
encode([]int8{1, 2, 3, 4})      == [0x01, 0x04, 0x01, 0x02, 0x03, 0x04]
encode([]int16{1, 2, 3, 4})     == [0x01, 0x04, 0x00, 0x01, 0x00, 0x02, 0x00, 0x03, 0x00, 0x04]
encode([]int{1, 2, 3, 4})       == [0x01, 0x04, 0x01, 0x01, 0x01, 0x02, 0x01, 0x03, 0x01, 0x04]
encode([]string{"abc", "efg"})  == [0x01, 0x02, 0x01, 0x03, 0x61, 0x62, 0x63, 0x01, 0x03, 0x65, 0x66, 0x67]
```
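
To make the difference from fixed-length arrays concrete, here is a small sketch of slice encoding for two element types. The helper names are hypothetical; the only change relative to an array is the element count written up front.

```go
package main

import "fmt"

// encodeLength writes a non-negative count as a variable length integer:
// one size byte, then the Big-Endian bytes of n. Zero is the single byte 0x00.
func encodeLength(n int) []byte {
	if n == 0 {
		return []byte{0x00}
	}
	var body []byte
	for v := uint64(n); v > 0; v >>= 8 {
		body = append([]byte{byte(v)}, body...)
	}
	return append([]byte{byte(len(body))}, body...)
}

// encodeInt16Slice writes the element count, then each element as
// two Big-Endian bytes.
func encodeInt16Slice(xs []int16) []byte {
	out := encodeLength(len(xs))
	for _, x := range xs {
		u := uint16(x)
		out = append(out, byte(u>>8), byte(u))
	}
	return out
}

// encodeStringSlice writes the element count, then each string with its
// own length-prefix.
func encodeStringSlice(ss []string) []byte {
	out := encodeLength(len(ss))
	for _, s := range ss {
		out = append(out, encodeLength(len(s))...)
		out = append(out, s...)
	}
	return out
}

func main() {
	fmt.Printf("% X\n", encodeInt16Slice([]int16{1, 2, 3, 4}))
	fmt.Printf("% X\n", encodeStringSlice([]string{"abc", "efg"}))
}
```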

### BitArray

A BitArray is encoded as an `int` holding the number of bits, followed by a slice of `uint64`
holding the bit values.

```go
type BitArray struct {
	Bits  int
	Elems []uint64
}
```

### Time

Time is encoded as an `int64` of the number of nanoseconds since January 1, 1970,
rounded to the nearest millisecond.

Times before then are invalid.

Examples:

```go
encode(time.Time("Jan 1 00:00:00 UTC 1970"))            == [0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00]
encode(time.Time("Jan 1 00:00:01 UTC 1970"))            == [0x00, 0x00, 0x00, 0x00, 0x3B, 0x9A, 0xCA, 0x00] // 1,000,000,000 ns
encode(time.Time("Mon Jan 2 15:04:05 -0700 MST 2006"))  == [0x0F, 0xC4, 0xBB, 0xC1, 0x53, 0x03, 0x12, 0x00]
```
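
The `time.Time("...")` calls above are shorthand, not valid Go. A runnable sketch of the same rule might look like the following; `encodeUnixNanos` and `encodeTime` are hypothetical names, and rejecting negative timestamps with an error is one possible reading of "invalid".

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"time"
)

const nsPerMs = int64(time.Millisecond)

// encodeUnixNanos rounds ns to the nearest millisecond and encodes the
// result as an 8-byte Big-Endian integer. Negative inputs are rejected.
func encodeUnixNanos(ns int64) ([]byte, error) {
	if ns < 0 {
		return nil, errors.New("times before January 1, 1970 are invalid")
	}
	rounded := (ns + nsPerMs/2) / nsPerMs * nsPerMs
	buf := make([]byte, 8)
	binary.BigEndian.PutUint64(buf, uint64(rounded))
	return buf, nil
}

// encodeTime encodes t via its nanosecond Unix timestamp.
func encodeTime(t time.Time) ([]byte, error) {
	return encodeUnixNanos(t.UnixNano())
}

func main() {
	mst := time.FixedZone("MST", -7*60*60)
	t := time.Date(2006, time.January, 2, 15, 4, 5, 0, mst)
	b, err := encodeTime(t)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("% X\n", b) // 0F C4 BB C1 53 03 12 00
}
```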

### Structs

An encoded struct is the concatenation of the encoding of its fields, in the order they are declared.
There is no length-prefix.

Examples:

```go
type MyStruct struct {
	A int
	B string
	C time.Time
}
encode(MyStruct{4, "hello", time.Time("Mon Jan 2 15:04:05 -0700 MST 2006")}) ==
	[0x01, 0x04, 0x01, 0x05, 0x68, 0x65, 0x6C, 0x6C, 0x6F, 0x0F, 0xC4, 0xBB, 0xC1, 0x53, 0x03, 0x12, 0x00]
```

## Merkle Trees

Simple Merkle trees are used in numerous places in Tendermint to compute a cryptographic digest of a data structure.

RIPEMD160 is always used as the hashing function.

The function `SimpleMerkleRoot` is a simple recursive function defined as follows:

```go
// RIPEMD160 stands for the RIPEMD-160 hash of its argument.
func SimpleMerkleRoot(hashes [][]byte) []byte {
	switch len(hashes) {
	case 0:
		return nil
	case 1:
		return hashes[0]
	default:
		// The left subtree takes the extra element when the count is odd.
		left := SimpleMerkleRoot(hashes[:(len(hashes)+1)/2])
		right := SimpleMerkleRoot(hashes[(len(hashes)+1)/2:])
		return RIPEMD160(append(left, right...))
	}
}
```

Note: we abuse notation and call `SimpleMerkleRoot` with arguments of type `struct` or type `[]struct`.
For `struct` arguments, we compute a `[][]byte` by sorting elements of the `struct` according to
field name and then hashing them.
For `[]struct` arguments, we compute a `[][]byte` by hashing the individual `struct` elements.
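
To see how the recursion splits an uneven number of leaves, here is a runnable sketch with the hash function passed in as a parameter. RIPEMD160 is not in the Go standard library, so the demo in `main` swaps in SHA-256 purely as a stand-in; the resulting digests are therefore not Tendermint roots. The names `simpleMerkleRoot`, `hashFunc`, and `sha256Sum` are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// hashFunc is any function mapping bytes to a digest.
type hashFunc func([]byte) []byte

// simpleMerkleRoot follows the same recursion as SimpleMerkleRoot, but
// takes the hash as a parameter and copies left||right into a fresh
// buffer so the input slices are never modified.
func simpleMerkleRoot(hashes [][]byte, h hashFunc) []byte {
	switch len(hashes) {
	case 0:
		return nil
	case 1:
		return hashes[0]
	default:
		mid := (len(hashes) + 1) / 2
		left := simpleMerkleRoot(hashes[:mid], h)
		right := simpleMerkleRoot(hashes[mid:], h)
		buf := make([]byte, 0, len(left)+len(right))
		buf = append(buf, left...)
		buf = append(buf, right...)
		return h(buf)
	}
}

// sha256Sum is a stand-in for RIPEMD160 (assumption for this demo only).
func sha256Sum(b []byte) []byte {
	sum := sha256.Sum256(b)
	return sum[:]
}

func main() {
	leaves := [][]byte{[]byte("a"), []byte("b"), []byte("c")}
	fmt.Printf("%X\n", simpleMerkleRoot(leaves, sha256Sum))
}
```

With three leaves the split is two on the left and one on the right, so the root is `h(h(a || b) || c)`.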

## JSON (TMJSON)

Signed messages (e.g. votes, proposals) in the consensus are encoded in TMJSON, rather than TMBIN.
TMJSON is JSON where `[]byte` are encoded as uppercase hex, rather than base64.

When signing, the elements of a message are sorted by key and the sorted message is embedded in an
outer JSON that includes a `chain_id` field.
We call this encoding the CanonicalSignBytes. For instance, CanonicalSignBytes for a vote would look
like:

```json
{"chain_id":"my-chain-id","vote":{"block_id":{"hash":"DEADBEEF","parts":{"hash":"BEEFDEAD","total":3}},"height":3,"round":2,"timestamp":1234567890,"type":2}}
```

Note how the fields within each level are sorted.
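
One way to reproduce this layout in Go is to lean on `encoding/json`, which sorts map keys when marshaling. The sketch below is an illustration under that assumption, not Tendermint's actual implementation; `HexBytes` and `canonicalSignBytes` are hypothetical names.

```go
package main

import (
	"encoding/hex"
	"encoding/json"
	"fmt"
	"strings"
)

// HexBytes marshals a byte slice as an uppercase hex string instead of
// the base64 that encoding/json uses for []byte by default.
type HexBytes []byte

func (b HexBytes) MarshalJSON() ([]byte, error) {
	return json.Marshal(strings.ToUpper(hex.EncodeToString(b)))
}

// canonicalSignBytes embeds msg under msgKey in an outer object that also
// carries chain_id. json.Marshal emits map keys in sorted order at every level.
func canonicalSignBytes(chainID, msgKey string, msg map[string]interface{}) ([]byte, error) {
	return json.Marshal(map[string]interface{}{
		"chain_id": chainID,
		msgKey:     msg,
	})
}

func main() {
	vote := map[string]interface{}{
		"type":      2,
		"timestamp": 1234567890,
		"round":     2,
		"height":    3,
		"block_id": map[string]interface{}{
			"parts": map[string]interface{}{
				"total": 3,
				"hash":  HexBytes{0xBE, 0xEF, 0xDE, 0xAD},
			},
			"hash": HexBytes{0xDE, 0xAD, 0xBE, 0xEF},
		},
	}
	out, err := canonicalSignBytes("my-chain-id", "vote", vote)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(string(out))
}
```

The keys are deliberately written out of order in `main`; the marshaled output is sorted regardless.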

## Other

### MakeParts

Encode an object using TMBIN and slice it into parts.

```go
MakeParts(object, partSize)
```

### Part

```go
type Part struct {
	Index int
	Bytes []byte
	Proof []byte
}
```
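
The slicing step of `MakeParts` can be sketched as below. This version starts from bytes that are assumed to be TMBIN-encoded already, and it leaves `Proof` empty because the text does not say how proofs are computed. The function name `makeParts` and the choice to return `nil` for a non-positive `partSize` are assumptions of this sketch.

```go
package main

import "fmt"

// Part mirrors the struct above.
type Part struct {
	Index int
	Bytes []byte
	Proof []byte // left empty here; proof construction is not covered
}

// makeParts cuts encoded into consecutive chunks of at most partSize
// bytes. The last chunk may be shorter than partSize.
func makeParts(encoded []byte, partSize int) []Part {
	if partSize <= 0 {
		return nil
	}
	var parts []Part
	for start := 0; start < len(encoded); start += partSize {
		end := start + partSize
		if end > len(encoded) {
			end = len(encoded)
		}
		parts = append(parts, Part{Index: len(parts), Bytes: encoded[start:end]})
	}
	return parts
}

func main() {
	for _, p := range makeParts([]byte("0123456789"), 4) {
		fmt.Printf("part %d: %q\n", p.Index, p.Bytes)
	}
}
```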