WAL: better errors and new fail point (#3246)

* privval: more info in errors
* wal: change Debug logs to Info
* wal: log and return error on corrupted wal instead of panicking
* fail: Exit right away instead of sending interrupt
* consensus: FAIL before handling our own vote

This allows replicating #3089:
- run using `FAIL_TEST_INDEX=0`
- delete some bytes from the end of the WAL
- start normally

Results in logs like:

```
I[2019-02-03|18:12:58.225] Searching for height module=consensus wal=/Users/ethanbuchman/.tendermint/data/cs.wal/wal height=1 min=0 max=0
E[2019-02-03|18:12:58.225] Error on catchup replay. Proceeding to start ConsensusState anyway module=consensus err="failed to read data: EOF"
I[2019-02-03|18:12:58.225] Started node module=main nodeInfo="{ProtocolVersion:{P2P:6 Block:9 App:1} ID_:35e87e93f2e31f305b65a5517fd2102331b56002 ListenAddr:tcp://0.0.0.0:26656 Network:test-chain-J8JvJH Version:0.29.1 Channels:4020212223303800 Moniker:Ethans-MacBook-Pro.local Other:{TxIndex:on RPCAddress:tcp://0.0.0.0:26657}}"
E[2019-02-03|18:12:58.226] Couldn't connect to any seeds module=p2p
I[2019-02-03|18:12:59.229] Timed out module=consensus dur=998.568ms height=1 round=0 step=RoundStepNewHeight
I[2019-02-03|18:12:59.230] enterNewRound(1/0). Current: 1/0/RoundStepNewHeight module=consensus height=1 round=0
I[2019-02-03|18:12:59.230] enterPropose(1/0). Current: 1/0/RoundStepNewRound module=consensus height=1 round=0
I[2019-02-03|18:12:59.230] enterPropose: Our turn to propose module=consensus height=1 round=0 proposer=AD278B7767B05D7FBEB76207024C650988FA77D5 privValidator="PrivValidator{AD278B7767B05D7FBEB76207024C650988FA77D5 LH:1, LR:0, LS:2}"
E[2019-02-03|18:12:59.230] enterPropose: Error signing proposal module=consensus height=1 round=0 err="Error signing proposal: Step regression at height 1 round 0. Got 1, last step 2"
I[2019-02-03|18:13:02.233] Timed out module=consensus dur=3s height=1 round=0 step=RoundStepPropose
I[2019-02-03|18:13:02.233] enterPrevote(1/0). Current: 1/0/RoundStepPropose module=consensus
I[2019-02-03|18:13:02.233] enterPrevote: ProposalBlock is nil module=consensus height=1 round=0
E[2019-02-03|18:13:02.234] Error signing vote module=consensus height=1 round=0 vote="Vote{0:AD278B7767B0 1/00/1(Prevote) 000000000000 000000000000 @ 2019-02-04T02:13:02.233897Z}" err="Error signing vote: Conflicting data"
```

Notice the EOF, the step regression, and the conflicting data.

* wal: change errors to be DataCorruptionError
* exit on corrupt WAL
* fix log
* fix new line
package fail

import (
	"fmt"
	"math/rand"
	"os"
	"strconv"
)
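
// callIndexToFail is the call index at which to fail; a negative value
// disables all fail points.
var callIndexToFail int

// init reads FAIL_TEST_INDEX from the environment. An unset or non-integer
// value disables fail points.
func init() {
	callIndexToFailS := os.Getenv("FAIL_TEST_INDEX")
	if callIndexToFailS == "" {
		callIndexToFail = -1
	} else {
		var err error
		callIndexToFail, err = strconv.Atoi(callIndexToFailS)
		if err != nil {
			callIndexToFail = -1
		}
	}
}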

var (
	callIndex           int  // indexes Fail calls
	callRandIndex       int  // indexes a run of FailRand calls
	callRandIndexToFail = -1 // the callRandIndex to fail on in FailRand
)

// Fail exits the process when FAIL_TEST_INDEX == callIndex.
// It is a no-op when fail points are disabled.
func Fail() {
	if callIndexToFail < 0 {
		return
	}

	if callIndex == callIndexToFail {
		Exit()
	}

	callIndex++
}

// FailRand should be called n successive times.
// It will fail on a random one of those calls.
// n must be greater than 0.
func FailRand(n int) {
	if callIndexToFail < 0 {
		return
	}

	if callRandIndexToFail < 0 {
		// first call in the loop, pick a random index to fail at
		callRandIndexToFail = rand.Intn(n)
		callRandIndex = 0
	}

	if callIndex == callIndexToFail {
		if callRandIndex == callRandIndexToFail {
			Exit()
		}
	}

	callRandIndex++

	if callRandIndex == n {
		callIndex++
		// Reset so the next run of FailRand calls picks a fresh random index;
		// without this, later runs would never advance callIndex or fail.
		callRandIndexToFail = -1
	}
}
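
// Exit prints the current fail-test index and terminates the process
// immediately with exit status 1.
func Exit() {
	fmt.Printf("*** fail-test %d ***\n", callIndex)
	os.Exit(1)
	// Earlier alternatives, kept for reference:
	// proc, _ := os.FindProcess(os.Getpid())
	// proc.Signal(os.Interrupt)
	// panic(fmt.Sprintf("*** fail-test %d ***", callIndex))
}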