Corruption
==========

Important step
--------------

Make sure you have a backup of the Tendermint data directory.

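
One way to take such a backup is sketched below. ``backup_tm_data`` is a hypothetical helper, not something shipped with Tendermint, and it assumes the data lives in a ``data`` directory under the Tendermint home (as the ``$TMHOME/data/cs.wal/wal`` path used later in this document suggests). Ideally the node is stopped while the archive is created, so the copy is consistent.

.. code:: bash

    #!/usr/bin/env bash
    # Sketch only: archive the Tendermint data directory before any repair.
    # backup_tm_data is a hypothetical helper name, not part of Tendermint.
    backup_tm_data() {
        local home="$1"   # Tendermint home directory, e.g. "$TMHOME"
        local dest="$2"   # directory that will receive the archive

        if [ ! -d "$home/data" ]; then
            echo "no data directory under $home" >&2
            return 1
        fi
        mkdir -p "$dest" || return 1

        local archive
        archive="$dest/tendermint-data-$(date +%Y%m%d-%H%M%S).tar.gz"

        # -C keeps the paths inside the archive relative to the home directory.
        tar -czf "$archive" -C "$home" data || return 1

        # Print the archive path so the caller can record or verify it.
        echo "$archive"
    }

For example, ``backup_tm_data "$TMHOME" /var/backups`` would write a timestamped archive into ``/var/backups`` (an assumed destination) and print its path.
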
Possible causes
---------------

Remember that most corruption is caused by hardware issues:

- RAID controllers with faulty / worn out battery backup, and an unexpected power loss
- Hard disk drives with write-back cache enabled, and an unexpected power loss
- Cheap SSDs with insufficient power-loss protection, and an unexpected power loss
- Defective RAM
- Defective or overheating CPU(s)

Other causes can be:

- Database systems configured with fsync=off and an OS crash or power loss
- Filesystems configured to use write barriers plus a storage layer that ignores write barriers. LVM is a particular culprit.
- Tendermint bugs
- Operating system bugs
- Admin error

  - directly modifying Tendermint data-directory contents

(Source: https://wiki.postgresql.org/wiki/Corruption)

WAL Corruption
--------------

If the consensus WAL is corrupted at the latest height and you try to start
Tendermint, replay will fail with a panic.

Recovering from data corruption can be hard and time-consuming. Here are two approaches you can take:

1) Delete the WAL file and restart Tendermint. It will attempt to sync with other peers.
2) Try to repair the WAL file manually:

   1. Create a backup of the corrupted WAL file:

      .. code:: bash

         cp "$TMHOME/data/cs.wal/wal" /tmp/corrupted_wal_backup

   2. Use ./scripts/wal2json to create a human-readable version:

      .. code:: bash

         ./scripts/wal2json/wal2json "$TMHOME/data/cs.wal/wal" > /tmp/corrupted_wal

   3. Search for a "CORRUPTED MESSAGE" line.
   4. By looking at the message before and the message after the corrupted one,
      and at the logs, try to rebuild the message. If the subsequent
      messages are marked as corrupted too (this may happen if the length header
      got corrupted or some writes did not make it to the WAL, i.e. truncation),
      then remove all the lines starting from the corrupted one and restart
      Tendermint.

      .. code:: bash

         $EDITOR /tmp/corrupted_wal

   5. After editing, convert this file back into binary form by running:

      .. code:: bash

         ./scripts/json2wal/json2wal /tmp/corrupted_wal > "$TMHOME/data/cs.wal/wal"

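
For the case where every message after the first corrupted one is corrupted too, the "remove all the lines starting from the corrupted one" step can be done without an editor. The sketch below assumes the human-readable dump marks bad entries with the literal text ``CORRUPTED MESSAGE`` on the affected line, as the search step above implies. ``first_corrupted_line`` and ``truncate_at_corruption`` are hypothetical helper names, not Tendermint tools.

.. code:: bash

    #!/usr/bin/env bash
    # Sketch only: both helpers are hypothetical and assume that corrupted
    # entries appear as lines containing the text "CORRUPTED MESSAGE".

    # Print the line number of the first corrupted entry (nothing if none).
    first_corrupted_line() {
        grep -n -m 1 'CORRUPTED MESSAGE' "$1" | cut -d: -f1
    }

    # Copy the dump up to, but not including, the first corrupted line.
    # The input and output paths must be different files.
    truncate_at_corruption() {
        local in="$1" out="$2"
        # Print lines until the marker is seen, then stop reading.
        awk '/CORRUPTED MESSAGE/ { exit } { print }' "$in" > "$out"
    }

For example, ``truncate_at_corruption /tmp/corrupted_wal /tmp/truncated_wal`` writes the cleaned dump to ``/tmp/truncated_wal`` (an assumed path), which can then be converted back to binary form in place of the hand-edited file. Writing to a separate file leaves the original dump available for comparison.
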