
1477 lines
44 KiB

new pubsub package
comment out failing consensus tests for now
rewrite rpc httpclient to use new pubsub package
import pubsub as tmpubsub, query as tmquery
make event IDs constants
EventKey -> EventTypeKey
rename EventsPubsub to PubSub
mempool does not use pubsub
rename eventsSub to pubsub
new subscribe API
fix channel size issues and consensus tests bugs
refactor rpc client
add missing discardFromChan method
add mutex
rename pubsub to eventBus
remove IsRunning from WSRPCConnection interface (not needed)
add a comment in broadcastNewRoundStepsAndVotes
rename registerEventCallbacks to broadcastNewRoundStepsAndVotes
See https://dave.cheney.net/2014/03/19/channel-axioms
stop eventBuses after reactor tests
remove unnecessary Unsubscribe
return subscribe helper function
move discardFromChan to where it is used
subscribe now returns an err
this gives us ability to refuse to subscribe if pubsub is at its max capacity.
use context for control overflow
cache queries
handle err when subscribing in replay_test
rename testClientID to testSubscriber
extract var
set channel buffer capacity to 1 in replay_file
fix byzantine_test
unsubscribe from single event, not all events
refactor httpclient to return events to appropriate channels
return failing testReplayCrashBeforeWriteVote test
fix TestValidatorSetChanges
refactor code a bit
fix testReplayCrashBeforeWriteVote
add comment
fix TestValidatorSetChanges
fixes from Bucky's review
update comment [ci skip]
test TxEventBuffer
update changelog
fix TestValidatorSetChanges (2nd attempt)
only do wg.Done when no errors
benchmark event bus
create pubsub server inside NewEventBus
only expose config params (later if needed)
set buffer capacity to 0 so we are not testing cache
new tx event format: key = "Tx" plus a tag {"tx.hash": XYZ}
This should allow to subscribe to all transactions! or a specific one using a query: "tm.events.type = Tx and tx.hash = '013ABF99434...'"
use TimeoutCommit instead of afterPublishEventNewBlockTimeout
TimeoutCommit is the time a node waits after committing a block, before it goes into the next height. So it will finish everything from the last block, but then wait a bit. The idea is this gives it time to hear more votes from other validators, to strengthen the commit it includes in the next block. But it also gives it time to hear about new transactions.
waitForBlockWithUpdatedVals
rewrite WAL crash tests
Task: test that we can recover from any WAL crash.
Solution: the old tests were relying on event hub being run in the same thread (we were injecting the private validator's last signature). when considering a rewrite, we considered two possible solutions: write a "fuzzy" testing system where WAL is crashing upon receiving a new message, or inject failures and trigger them in tests using something like https://github.com/coreos/gofail.
remove sleep
no cs.Lock around wal.Save
test different cases (empty block, non-empty block, ...)
comments
add comments
test 4 cases: empty block, non-empty block, non-empty block with smaller part size, many blocks
fixes as per Bucky's last review
reset subscriptions on UnsubscribeAll
use a simple counter to track message for which we panicked
also, set a smaller part size for all test cases
7 years ago
new pubsub package comment out failing consensus tests for now rewrite rpc httpclient to use new pubsub package import pubsub as tmpubsub, query as tmquery make event IDs constants EventKey -> EventTypeKey rename EventsPubsub to PubSub mempool does not use pubsub rename eventsSub to pubsub new subscribe API fix channel size issues and consensus tests bugs refactor rpc client add missing discardFromChan method add mutex rename pubsub to eventBus remove IsRunning from WSRPCConnection interface (not needed) add a comment in broadcastNewRoundStepsAndVotes rename registerEventCallbacks to broadcastNewRoundStepsAndVotes See https://dave.cheney.net/2014/03/19/channel-axioms stop eventBuses after reactor tests remove unnecessary Unsubscribe return subscribe helper function move discardFromChan to where it is used subscribe now returns an err this gives us ability to refuse to subscribe if pubsub is at its max capacity. use context for control overflow cache queries handle err when subscribing in replay_test rename testClientID to testSubscriber extract var set channel buffer capacity to 1 in replay_file fix byzantine_test unsubscribe from single event, not all events refactor httpclient to return events to appropriate channels return failing testReplayCrashBeforeWriteVote test fix TestValidatorSetChanges refactor code a bit fix testReplayCrashBeforeWriteVote add comment fix TestValidatorSetChanges fixes from Bucky's review update comment [ci skip] test TxEventBuffer update changelog fix TestValidatorSetChanges (2nd attempt) only do wg.Done when no errors benchmark event bus create pubsub server inside NewEventBus only expose config params (later if needed) set buffer capacity to 0 so we are not testing cache new tx event format: key = "Tx" plus a tag {"tx.hash": XYZ} This should allow to subscribe to all transactions! 
or a specific one using a query: "tm.events.type = Tx and tx.hash = '013ABF99434...'" use TimeoutCommit instead of afterPublishEventNewBlockTimeout TimeoutCommit is the time a node waits after committing a block, before it goes into the next height. So it will finish everything from the last block, but then wait a bit. The idea is this gives it time to hear more votes from other validators, to strengthen the commit it includes in the next block. But it also gives it time to hear about new transactions. waitForBlockWithUpdatedVals rewrite WAL crash tests Task: test that we can recover from any WAL crash. Solution: the old tests were relying on event hub being run in the same thread (we were injecting the private validator's last signature). when considering a rewrite, we considered two possible solutions: write a "fuzzy" testing system where WAL is crashing upon receiving a new message, or inject failures and trigger them in tests using something like https://github.com/coreos/gofail. remove sleep no cs.Lock around wal.Save test different cases (empty block, non-empty block, ...) comments add comments test 4 cases: empty block, non-empty block, non-empty block with smaller part size, many blocks fixes as per Bucky's last review reset subscriptions on UnsubscribeAll use a simple counter to track message for which we panicked also, set a smaller part size for all test cases
7 years ago
new pubsub package comment out failing consensus tests for now rewrite rpc httpclient to use new pubsub package import pubsub as tmpubsub, query as tmquery make event IDs constants EventKey -> EventTypeKey rename EventsPubsub to PubSub mempool does not use pubsub rename eventsSub to pubsub new subscribe API fix channel size issues and consensus tests bugs refactor rpc client add missing discardFromChan method add mutex rename pubsub to eventBus remove IsRunning from WSRPCConnection interface (not needed) add a comment in broadcastNewRoundStepsAndVotes rename registerEventCallbacks to broadcastNewRoundStepsAndVotes See https://dave.cheney.net/2014/03/19/channel-axioms stop eventBuses after reactor tests remove unnecessary Unsubscribe return subscribe helper function move discardFromChan to where it is used subscribe now returns an err this gives us ability to refuse to subscribe if pubsub is at its max capacity. use context for control overflow cache queries handle err when subscribing in replay_test rename testClientID to testSubscriber extract var set channel buffer capacity to 1 in replay_file fix byzantine_test unsubscribe from single event, not all events refactor httpclient to return events to appropriate channels return failing testReplayCrashBeforeWriteVote test fix TestValidatorSetChanges refactor code a bit fix testReplayCrashBeforeWriteVote add comment fix TestValidatorSetChanges fixes from Bucky's review update comment [ci skip] test TxEventBuffer update changelog fix TestValidatorSetChanges (2nd attempt) only do wg.Done when no errors benchmark event bus create pubsub server inside NewEventBus only expose config params (later if needed) set buffer capacity to 0 so we are not testing cache new tx event format: key = "Tx" plus a tag {"tx.hash": XYZ} This should allow to subscribe to all transactions! 
or a specific one using a query: "tm.events.type = Tx and tx.hash = '013ABF99434...'" use TimeoutCommit instead of afterPublishEventNewBlockTimeout TimeoutCommit is the time a node waits after committing a block, before it goes into the next height. So it will finish everything from the last block, but then wait a bit. The idea is this gives it time to hear more votes from other validators, to strengthen the commit it includes in the next block. But it also gives it time to hear about new transactions. waitForBlockWithUpdatedVals rewrite WAL crash tests Task: test that we can recover from any WAL crash. Solution: the old tests were relying on event hub being run in the same thread (we were injecting the private validator's last signature). when considering a rewrite, we considered two possible solutions: write a "fuzzy" testing system where WAL is crashing upon receiving a new message, or inject failures and trigger them in tests using something like https://github.com/coreos/gofail. remove sleep no cs.Lock around wal.Save test different cases (empty block, non-empty block, ...) comments add comments test 4 cases: empty block, non-empty block, non-empty block with smaller part size, many blocks fixes as per Bucky's last review reset subscriptions on UnsubscribeAll use a simple counter to track message for which we panicked also, set a smaller part size for all test cases
7 years ago
new pubsub package comment out failing consensus tests for now rewrite rpc httpclient to use new pubsub package import pubsub as tmpubsub, query as tmquery make event IDs constants EventKey -> EventTypeKey rename EventsPubsub to PubSub mempool does not use pubsub rename eventsSub to pubsub new subscribe API fix channel size issues and consensus tests bugs refactor rpc client add missing discardFromChan method add mutex rename pubsub to eventBus remove IsRunning from WSRPCConnection interface (not needed) add a comment in broadcastNewRoundStepsAndVotes rename registerEventCallbacks to broadcastNewRoundStepsAndVotes See https://dave.cheney.net/2014/03/19/channel-axioms stop eventBuses after reactor tests remove unnecessary Unsubscribe return subscribe helper function move discardFromChan to where it is used subscribe now returns an err this gives us ability to refuse to subscribe if pubsub is at its max capacity. use context for control overflow cache queries handle err when subscribing in replay_test rename testClientID to testSubscriber extract var set channel buffer capacity to 1 in replay_file fix byzantine_test unsubscribe from single event, not all events refactor httpclient to return events to appropriate channels return failing testReplayCrashBeforeWriteVote test fix TestValidatorSetChanges refactor code a bit fix testReplayCrashBeforeWriteVote add comment fix TestValidatorSetChanges fixes from Bucky's review update comment [ci skip] test TxEventBuffer update changelog fix TestValidatorSetChanges (2nd attempt) only do wg.Done when no errors benchmark event bus create pubsub server inside NewEventBus only expose config params (later if needed) set buffer capacity to 0 so we are not testing cache new tx event format: key = "Tx" plus a tag {"tx.hash": XYZ} This should allow to subscribe to all transactions! 
or a specific one using a query: "tm.events.type = Tx and tx.hash = '013ABF99434...'" use TimeoutCommit instead of afterPublishEventNewBlockTimeout TimeoutCommit is the time a node waits after committing a block, before it goes into the next height. So it will finish everything from the last block, but then wait a bit. The idea is this gives it time to hear more votes from other validators, to strengthen the commit it includes in the next block. But it also gives it time to hear about new transactions. waitForBlockWithUpdatedVals rewrite WAL crash tests Task: test that we can recover from any WAL crash. Solution: the old tests were relying on event hub being run in the same thread (we were injecting the private validator's last signature). when considering a rewrite, we considered two possible solutions: write a "fuzzy" testing system where WAL is crashing upon receiving a new message, or inject failures and trigger them in tests using something like https://github.com/coreos/gofail. remove sleep no cs.Lock around wal.Save test different cases (empty block, non-empty block, ...) comments add comments test 4 cases: empty block, non-empty block, non-empty block with smaller part size, many blocks fixes as per Bucky's last review reset subscriptions on UnsubscribeAll use a simple counter to track message for which we panicked also, set a smaller part size for all test cases
7 years ago
new pubsub package comment out failing consensus tests for now rewrite rpc httpclient to use new pubsub package import pubsub as tmpubsub, query as tmquery make event IDs constants EventKey -> EventTypeKey rename EventsPubsub to PubSub mempool does not use pubsub rename eventsSub to pubsub new subscribe API fix channel size issues and consensus tests bugs refactor rpc client add missing discardFromChan method add mutex rename pubsub to eventBus remove IsRunning from WSRPCConnection interface (not needed) add a comment in broadcastNewRoundStepsAndVotes rename registerEventCallbacks to broadcastNewRoundStepsAndVotes See https://dave.cheney.net/2014/03/19/channel-axioms stop eventBuses after reactor tests remove unnecessary Unsubscribe return subscribe helper function move discardFromChan to where it is used subscribe now returns an err this gives us ability to refuse to subscribe if pubsub is at its max capacity. use context for control overflow cache queries handle err when subscribing in replay_test rename testClientID to testSubscriber extract var set channel buffer capacity to 1 in replay_file fix byzantine_test unsubscribe from single event, not all events refactor httpclient to return events to appropriate channels return failing testReplayCrashBeforeWriteVote test fix TestValidatorSetChanges refactor code a bit fix testReplayCrashBeforeWriteVote add comment fix TestValidatorSetChanges fixes from Bucky's review update comment [ci skip] test TxEventBuffer update changelog fix TestValidatorSetChanges (2nd attempt) only do wg.Done when no errors benchmark event bus create pubsub server inside NewEventBus only expose config params (later if needed) set buffer capacity to 0 so we are not testing cache new tx event format: key = "Tx" plus a tag {"tx.hash": XYZ} This should allow to subscribe to all transactions! 
or a specific one using a query: "tm.events.type = Tx and tx.hash = '013ABF99434...'" use TimeoutCommit instead of afterPublishEventNewBlockTimeout TimeoutCommit is the time a node waits after committing a block, before it goes into the next height. So it will finish everything from the last block, but then wait a bit. The idea is this gives it time to hear more votes from other validators, to strengthen the commit it includes in the next block. But it also gives it time to hear about new transactions. waitForBlockWithUpdatedVals rewrite WAL crash tests Task: test that we can recover from any WAL crash. Solution: the old tests were relying on event hub being run in the same thread (we were injecting the private validator's last signature). when considering a rewrite, we considered two possible solutions: write a "fuzzy" testing system where WAL is crashing upon receiving a new message, or inject failures and trigger them in tests using something like https://github.com/coreos/gofail. remove sleep no cs.Lock around wal.Save test different cases (empty block, non-empty block, ...) comments add comments test 4 cases: empty block, non-empty block, non-empty block with smaller part size, many blocks fixes as per Bucky's last review reset subscriptions on UnsubscribeAll use a simple counter to track message for which we panicked also, set a smaller part size for all test cases
7 years ago
7 years ago
7 years ago
7 years ago
8 years ago
10 years ago
10 years ago
8 years ago
10 years ago
10 years ago
10 years ago
8 years ago
10 years ago
10 years ago
10 years ago
10 years ago
7 years ago
8 years ago
10 years ago
8 years ago
10 years ago
10 years ago
8 years ago
9 years ago
10 years ago
10 years ago
10 years ago
10 years ago
8 years ago
7 years ago
8 years ago
8 years ago
7 years ago
7 years ago
10 years ago
7 years ago
7 years ago
7 years ago
10 years ago
10 years ago
10 years ago
10 years ago
protect Record* peerStateStats functions by mutex

Fixes #1414

DATA RACE:
```
Read at 0x00c4214ee940 by goroutine 146:
  github.com/tendermint/tendermint/consensus.(*peerStateStats).String()
      <autogenerated>:1 +0x57
  fmt.(*pp).handleMethods()
      /usr/local/go/src/fmt/print.go:596 +0x3f4
  fmt.(*pp).printArg()
      /usr/local/go/src/fmt/print.go:679 +0x11f
  fmt.(*pp).doPrintf()
      /usr/local/go/src/fmt/print.go:996 +0x319
  fmt.Sprintf()
      /usr/local/go/src/fmt/print.go:196 +0x73
  github.com/tendermint/tendermint/consensus.(*PeerState).StringIndented()
      github.com/tendermint/tendermint/consensus/_test/_obj_test/reactor.go:1426 +0x573
  github.com/tendermint/tendermint/consensus.(*PeerState).String()
      github.com/tendermint/tendermint/consensus/_test/_obj_test/reactor.go:1419 +0x66
  github.com/go-logfmt/logfmt.safeString()
      /home/ubuntu/go/src/github.com/go-logfmt/logfmt/encode.go:299 +0x9d
  github.com/go-logfmt/logfmt.writeValue()
      /home/ubuntu/go/src/github.com/go-logfmt/logfmt/encode.go:217 +0x5a0
  github.com/go-logfmt/logfmt.(*Encoder).EncodeKeyval()
      /home/ubuntu/go/src/github.com/go-logfmt/logfmt/encode.go:61 +0x1dd
  github.com/tendermint/tmlibs/log.tmfmtLogger.Log()
      /home/ubuntu/go/src/github.com/tendermint/tmlibs/log/tmfmt_logger.go:107 +0x1001
  github.com/tendermint/tmlibs/log.(*tmfmtLogger).Log()
      <autogenerated>:1 +0x93
  github.com/go-kit/kit/log.(*context).Log()
      /home/ubuntu/go/src/github.com/go-kit/kit/log/log.go:124 +0x248
  github.com/tendermint/tmlibs/log.(*tmLogger).Debug()
      /home/ubuntu/go/src/github.com/tendermint/tmlibs/log/tm_logger.go:64 +0x1d0
  github.com/tendermint/tendermint/consensus.(*PeerState).PickSendVote()
      github.com/tendermint/tendermint/consensus/_test/_obj_test/reactor.go:1059 +0x242
  github.com/tendermint/tendermint/consensus.(*ConsensusReactor).gossipVotesForHeight()
      github.com/tendermint/tendermint/consensus/_test/_obj_test/reactor.go:789 +0x6ef
  github.com/tendermint/tendermint/consensus.(*ConsensusReactor).gossipVotesRoutine()
      github.com/tendermint/tendermint/consensus/_test/_obj_test/reactor.go:723 +0x1039

Previous write at 0x00c4214ee940 by goroutine 21:
  github.com/tendermint/tendermint/consensus.(*PeerState).RecordVote()
      github.com/tendermint/tendermint/consensus/_test/_obj_test/reactor.go:1242 +0x15a
  github.com/tendermint/tendermint/consensus.(*ConsensusReactor).Receive()
      github.com/tendermint/tendermint/consensus/_test/_obj_test/reactor.go:309 +0x32e6
  github.com/tendermint/tendermint/p2p.createMConnection.func1()
      /home/ubuntu/go/src/github.com/tendermint/tendermint/p2p/peer.go:365 +0xea
  github.com/tendermint/tendermint/p2p/conn.(*MConnection).recvRoutine()
      /home/ubuntu/go/src/github.com/tendermint/tendermint/p2p/conn/connection.go:531 +0x779
```
package consensus

import (
	"bytes"
	"context"
	"fmt"
	"reflect"
	"sync"
	"time"

	"github.com/pkg/errors"

	wire "github.com/tendermint/go-wire"
	cmn "github.com/tendermint/tmlibs/common"
	"github.com/tendermint/tmlibs/log"

	cstypes "github.com/tendermint/tendermint/consensus/types"
	"github.com/tendermint/tendermint/p2p"
	sm "github.com/tendermint/tendermint/state"
	"github.com/tendermint/tendermint/types"
)

const (
	StateChannel       = byte(0x20)
	DataChannel        = byte(0x21)
	VoteChannel        = byte(0x22)
	VoteSetBitsChannel = byte(0x23)

	maxConsensusMessageSize = 1048576 // 1MB; NOTE/TODO: keep in sync with types.PartSet sizes.

	blocksToContributeToBecomeGoodPeer = 10000
)

//-----------------------------------------------------------------------------

// ConsensusReactor defines a reactor for the consensus service.
type ConsensusReactor struct {
	p2p.BaseReactor // BaseService + p2p.Switch

	conS *ConsensusState

	mtx      sync.RWMutex
	fastSync bool
	eventBus *types.EventBus
}

// NewConsensusReactor returns a new ConsensusReactor with the given consensusState.
func NewConsensusReactor(consensusState *ConsensusState, fastSync bool) *ConsensusReactor {
	conR := &ConsensusReactor{
		conS:     consensusState,
		fastSync: fastSync,
	}
	conR.BaseReactor = *p2p.NewBaseReactor("ConsensusReactor", conR)
	return conR
}

// OnStart implements BaseService.
func (conR *ConsensusReactor) OnStart() error {
	conR.Logger.Info("ConsensusReactor ", "fastSync", conR.FastSync())
	if err := conR.BaseReactor.OnStart(); err != nil {
		return err
	}

	err := conR.startBroadcastRoutine()
	if err != nil {
		return err
	}

	if !conR.FastSync() {
		err := conR.conS.Start()
		if err != nil {
			return err
		}
	}

	return nil
}

// OnStop implements BaseService.
func (conR *ConsensusReactor) OnStop() {
	conR.BaseReactor.OnStop()
	conR.conS.Stop()
}
  67. // SwitchToConsensus switches from fast_sync mode to consensus mode.
  68. // It resets the state, turns off fast_sync, and starts the consensus state machine.
  69. func (conR *ConsensusReactor) SwitchToConsensus(state sm.State, blocksSynced int) {
  70. conR.Logger.Info("SwitchToConsensus")
  71. conR.conS.reconstructLastCommit(state)
  72. // NOTE: The line below causes the broadcast routine (see startBroadcastRoutine) to
  73. // broadcast a NewRoundStepMessage.
  74. conR.conS.updateToState(state)
  75. conR.mtx.Lock()
  76. conR.fastSync = false
  77. conR.mtx.Unlock()
  78. if blocksSynced > 0 {
  79. // don't bother with the WAL if we fast synced
  80. conR.conS.doWALCatchup = false
  81. }
  82. err := conR.conS.Start()
  83. if err != nil {
  84. conR.Logger.Error("Error starting conS", "err", err)
  85. }
  86. }
  87. // GetChannels implements Reactor
  88. func (conR *ConsensusReactor) GetChannels() []*p2p.ChannelDescriptor {
  89. // TODO optimize
  90. return []*p2p.ChannelDescriptor{
  91. {
  92. ID: StateChannel,
  93. Priority: 5,
  94. SendQueueCapacity: 100,
  95. },
  96. {
  97. ID: DataChannel, // maybe split between gossiping current block and catchup stuff
  98. Priority: 10, // once we gossip the whole block there's nothing left to send until next height or round
  99. SendQueueCapacity: 100,
  100. RecvBufferCapacity: 50 * 4096,
  101. },
  102. {
  103. ID: VoteChannel,
  104. Priority: 5,
  105. SendQueueCapacity: 100,
  106. RecvBufferCapacity: 100 * 100,
  107. },
  108. {
  109. ID: VoteSetBitsChannel,
  110. Priority: 1,
  111. SendQueueCapacity: 2,
  112. RecvBufferCapacity: 1024,
  113. },
  114. }
  115. }
  116. // AddPeer implements Reactor
  117. func (conR *ConsensusReactor) AddPeer(peer p2p.Peer) {
  118. if !conR.IsRunning() {
  119. return
  120. }
  121. // Create peerState for peer
  122. peerState := NewPeerState(peer).SetLogger(conR.Logger)
  123. peer.Set(types.PeerStateKey, peerState)
  124. // Begin routines for this peer.
  125. go conR.gossipDataRoutine(peer, peerState)
  126. go conR.gossipVotesRoutine(peer, peerState)
  127. go conR.queryMaj23Routine(peer, peerState)
  128. // Send our state to peer.
  129. // If we're fast_syncing, broadcast a NewRoundStepMessage later upon SwitchToConsensus().
  130. if !conR.FastSync() {
  131. conR.sendNewRoundStepMessages(peer)
  132. }
  133. }
  134. // RemovePeer implements Reactor
  135. func (conR *ConsensusReactor) RemovePeer(peer p2p.Peer, reason interface{}) {
  136. if !conR.IsRunning() {
  137. return
  138. }
  139. // TODO
  140. //peer.Get(PeerStateKey).(*PeerState).Disconnect()
  141. }
  142. // Receive implements Reactor
  143. // NOTE: We process these messages even when we're fast_syncing.
  144. // Messages affect either a peer state or the consensus state.
  145. // Peer state updates can happen in parallel, but processing of
  146. // proposals, block parts, and votes is ordered by the receiveRoutine.
  147. // NOTE: blocks on consensus state for proposals, block parts, and votes.
  148. func (conR *ConsensusReactor) Receive(chID byte, src p2p.Peer, msgBytes []byte) {
  149. if !conR.IsRunning() {
  150. conR.Logger.Debug("Receive", "src", src, "chId", chID, "bytes", msgBytes)
  151. return
  152. }
  153. _, msg, err := DecodeMessage(msgBytes)
  154. if err != nil {
  155. conR.Logger.Error("Error decoding message", "src", src, "chId", chID, "msg", msg, "err", err, "bytes", msgBytes)
  156. conR.Switch.StopPeerForError(src, err)
  157. return
  158. }
  159. conR.Logger.Debug("Receive", "src", src, "chId", chID, "msg", msg)
  160. // Get peer states
  161. ps := src.Get(types.PeerStateKey).(*PeerState)
  162. switch chID {
  163. case StateChannel:
  164. switch msg := msg.(type) {
  165. case *NewRoundStepMessage:
  166. ps.ApplyNewRoundStepMessage(msg)
  167. case *CommitStepMessage:
  168. ps.ApplyCommitStepMessage(msg)
  169. case *HasVoteMessage:
  170. ps.ApplyHasVoteMessage(msg)
  171. case *VoteSetMaj23Message:
  172. cs := conR.conS
  173. cs.mtx.Lock()
  174. height, votes := cs.Height, cs.Votes
  175. cs.mtx.Unlock()
  176. if height != msg.Height {
  177. return
  178. }
  179. // Peer claims to have a maj23 for some BlockID at H,R,S,
  180. err := votes.SetPeerMaj23(msg.Round, msg.Type, ps.Peer.ID(), msg.BlockID)
  181. if err != nil {
  182. conR.Switch.StopPeerForError(src, err)
  183. return
  184. }
  185. // Respond with a VoteSetBitsMessage showing which votes we have.
  186. // (and consequently shows which we don't have)
  187. var ourVotes *cmn.BitArray
  188. switch msg.Type {
  189. case types.VoteTypePrevote:
  190. ourVotes = votes.Prevotes(msg.Round).BitArrayByBlockID(msg.BlockID)
  191. case types.VoteTypePrecommit:
  192. ourVotes = votes.Precommits(msg.Round).BitArrayByBlockID(msg.BlockID)
  193. default:
  194. conR.Logger.Error("Bad VoteSetMaj23Message field Type")
  195. return
  196. }
  197. src.TrySend(VoteSetBitsChannel, struct{ ConsensusMessage }{&VoteSetBitsMessage{
  198. Height: msg.Height,
  199. Round: msg.Round,
  200. Type: msg.Type,
  201. BlockID: msg.BlockID,
  202. Votes: ourVotes,
  203. }})
  204. case *ProposalHeartbeatMessage:
  205. hb := msg.Heartbeat
  206. conR.Logger.Debug("Received proposal heartbeat message",
  207. "height", hb.Height, "round", hb.Round, "sequence", hb.Sequence,
  208. "valIdx", hb.ValidatorIndex, "valAddr", hb.ValidatorAddress)
  209. default:
  210. conR.Logger.Error(cmn.Fmt("Unknown message type %v", reflect.TypeOf(msg)))
  211. }
  212. case DataChannel:
  213. if conR.FastSync() {
  214. conR.Logger.Info("Ignoring message received during fastSync", "msg", msg)
  215. return
  216. }
  217. switch msg := msg.(type) {
  218. case *ProposalMessage:
  219. ps.SetHasProposal(msg.Proposal)
  220. conR.conS.peerMsgQueue <- msgInfo{msg, src.ID()}
  221. case *ProposalPOLMessage:
  222. ps.ApplyProposalPOLMessage(msg)
  223. case *BlockPartMessage:
  224. ps.SetHasProposalBlockPart(msg.Height, msg.Round, msg.Part.Index)
  225. if numBlocks := ps.RecordBlockPart(msg); numBlocks%blocksToContributeToBecomeGoodPeer == 0 {
  226. conR.Switch.MarkPeerAsGood(src)
  227. }
  228. conR.conS.peerMsgQueue <- msgInfo{msg, src.ID()}
  229. default:
  230. conR.Logger.Error(cmn.Fmt("Unknown message type %v", reflect.TypeOf(msg)))
  231. }
  232. case VoteChannel:
  233. if conR.FastSync() {
  234. conR.Logger.Info("Ignoring message received during fastSync", "msg", msg)
  235. return
  236. }
  237. switch msg := msg.(type) {
  238. case *VoteMessage:
  239. cs := conR.conS
  240. cs.mtx.Lock()
  241. height, valSize, lastCommitSize := cs.Height, cs.Validators.Size(), cs.LastCommit.Size()
  242. cs.mtx.Unlock()
  243. ps.EnsureVoteBitArrays(height, valSize)
  244. ps.EnsureVoteBitArrays(height-1, lastCommitSize)
  245. ps.SetHasVote(msg.Vote)
  246. if blocks := ps.RecordVote(msg.Vote); blocks%blocksToContributeToBecomeGoodPeer == 0 {
  247. conR.Switch.MarkPeerAsGood(src)
  248. }
  249. cs.peerMsgQueue <- msgInfo{msg, src.ID()}
  250. default:
  251. // don't punish (leave room for soft upgrades)
  252. conR.Logger.Error(cmn.Fmt("Unknown message type %v", reflect.TypeOf(msg)))
  253. }
  254. case VoteSetBitsChannel:
  255. if conR.FastSync() {
  256. conR.Logger.Info("Ignoring message received during fastSync", "msg", msg)
  257. return
  258. }
  259. switch msg := msg.(type) {
  260. case *VoteSetBitsMessage:
  261. cs := conR.conS
  262. cs.mtx.Lock()
  263. height, votes := cs.Height, cs.Votes
  264. cs.mtx.Unlock()
  265. if height == msg.Height {
  266. var ourVotes *cmn.BitArray
  267. switch msg.Type {
  268. case types.VoteTypePrevote:
  269. ourVotes = votes.Prevotes(msg.Round).BitArrayByBlockID(msg.BlockID)
  270. case types.VoteTypePrecommit:
  271. ourVotes = votes.Precommits(msg.Round).BitArrayByBlockID(msg.BlockID)
  272. default:
  273. conR.Logger.Error("Bad VoteSetBitsMessage field Type")
  274. return
  275. }
  276. ps.ApplyVoteSetBitsMessage(msg, ourVotes)
  277. } else {
  278. ps.ApplyVoteSetBitsMessage(msg, nil)
  279. }
  280. default:
  281. // don't punish (leave room for soft upgrades)
  282. conR.Logger.Error(cmn.Fmt("Unknown message type %v", reflect.TypeOf(msg)))
  283. }
  284. default:
  285. conR.Logger.Error(cmn.Fmt("Unknown chId %X", chID))
  286. }
  287. if err != nil {
  288. conR.Logger.Error("Error in Receive()", "err", err)
  289. }
  290. }
  291. // SetEventBus sets event bus.
  292. func (conR *ConsensusReactor) SetEventBus(b *types.EventBus) {
  293. conR.eventBus = b
  294. conR.conS.SetEventBus(b)
  295. }
  296. // FastSync returns whether the consensus reactor is in fast-sync mode.
  297. func (conR *ConsensusReactor) FastSync() bool {
  298. conR.mtx.RLock()
  299. defer conR.mtx.RUnlock()
  300. return conR.fastSync
  301. }
  302. //--------------------------------------
  303. // startBroadcastRoutine subscribes to new round steps, votes and proposal
  304. // heartbeats using the event bus and starts a goroutine to broadcast events
  305. // to peers upon receiving them.
  306. func (conR *ConsensusReactor) startBroadcastRoutine() error {
  307. const subscriber = "consensus-reactor"
  308. ctx := context.Background()
  309. // new round steps
  310. stepsCh := make(chan interface{})
  311. err := conR.eventBus.Subscribe(ctx, subscriber, types.EventQueryNewRoundStep, stepsCh)
  312. if err != nil {
  313. return errors.Wrapf(err, "failed to subscribe %s to %s", subscriber, types.EventQueryNewRoundStep)
  314. }
  315. // votes
  316. votesCh := make(chan interface{})
  317. err = conR.eventBus.Subscribe(ctx, subscriber, types.EventQueryVote, votesCh)
  318. if err != nil {
  319. return errors.Wrapf(err, "failed to subscribe %s to %s", subscriber, types.EventQueryVote)
  320. }
  321. // proposal heartbeats
  322. heartbeatsCh := make(chan interface{})
  323. err = conR.eventBus.Subscribe(ctx, subscriber, types.EventQueryProposalHeartbeat, heartbeatsCh)
  324. if err != nil {
  325. return errors.Wrapf(err, "failed to subscribe %s to %s", subscriber, types.EventQueryProposalHeartbeat)
  326. }
  327. go func() {
  328. var data interface{}
  329. var ok bool
  330. for {
  331. select {
  332. case data, ok = <-stepsCh:
  333. if ok { // a receive from a closed channel returns the zero value immediately
  334. edrs := data.(types.TMEventData).Unwrap().(types.EventDataRoundState)
  335. conR.broadcastNewRoundStep(edrs.RoundState.(*cstypes.RoundState))
  336. }
  337. case data, ok = <-votesCh:
  338. if ok {
  339. edv := data.(types.TMEventData).Unwrap().(types.EventDataVote)
  340. conR.broadcastHasVoteMessage(edv.Vote)
  341. }
  342. case data, ok = <-heartbeatsCh:
  343. if ok {
  344. edph := data.(types.TMEventData).Unwrap().(types.EventDataProposalHeartbeat)
  345. conR.broadcastProposalHeartbeatMessage(edph)
  346. }
  347. case <-conR.Quit():
  348. conR.eventBus.UnsubscribeAll(ctx, subscriber)
  349. return
  350. }
  351. if !ok {
  352. conR.eventBus.UnsubscribeAll(ctx, subscriber)
  353. return
  354. }
  355. }
  356. }()
  357. return nil
  358. }
  359. func (conR *ConsensusReactor) broadcastProposalHeartbeatMessage(heartbeat types.EventDataProposalHeartbeat) {
  360. hb := heartbeat.Heartbeat
  361. conR.Logger.Debug("Broadcasting proposal heartbeat message",
  362. "height", hb.Height, "round", hb.Round, "sequence", hb.Sequence)
  363. msg := &ProposalHeartbeatMessage{hb}
  364. conR.Switch.Broadcast(StateChannel, struct{ ConsensusMessage }{msg})
  365. }
  366. func (conR *ConsensusReactor) broadcastNewRoundStep(rs *cstypes.RoundState) {
  367. nrsMsg, csMsg := makeRoundStepMessages(rs)
  368. if nrsMsg != nil {
  369. conR.Switch.Broadcast(StateChannel, struct{ ConsensusMessage }{nrsMsg})
  370. }
  371. if csMsg != nil {
  372. conR.Switch.Broadcast(StateChannel, struct{ ConsensusMessage }{csMsg})
  373. }
  374. }
  375. // Broadcasts HasVoteMessage to peers that care.
  376. func (conR *ConsensusReactor) broadcastHasVoteMessage(vote *types.Vote) {
  377. msg := &HasVoteMessage{
  378. Height: vote.Height,
  379. Round: vote.Round,
  380. Type: vote.Type,
  381. Index: vote.ValidatorIndex,
  382. }
  383. conR.Switch.Broadcast(StateChannel, struct{ ConsensusMessage }{msg})
  384. /*
  385. // TODO: Make this broadcast more selective.
  386. for _, peer := range conR.Switch.Peers().List() {
  387. ps := peer.Get(PeerStateKey).(*PeerState)
  388. prs := ps.GetRoundState()
  389. if prs.Height == vote.Height {
  390. // TODO: Also filter on round?
  391. peer.TrySend(StateChannel, struct{ ConsensusMessage }{msg})
  392. } else {
  393. // Height doesn't match
  394. // TODO: check a field, maybe CatchupCommitRound?
  395. // TODO: But that requires changing the struct field comment.
  396. }
  397. }
  398. */
  399. }
  400. func makeRoundStepMessages(rs *cstypes.RoundState) (nrsMsg *NewRoundStepMessage, csMsg *CommitStepMessage) {
  401. nrsMsg = &NewRoundStepMessage{
  402. Height: rs.Height,
  403. Round: rs.Round,
  404. Step: rs.Step,
  405. SecondsSinceStartTime: int(time.Since(rs.StartTime).Seconds()),
  406. LastCommitRound: rs.LastCommit.Round(),
  407. }
  408. if rs.Step == cstypes.RoundStepCommit {
  409. csMsg = &CommitStepMessage{
  410. Height: rs.Height,
  411. BlockPartsHeader: rs.ProposalBlockParts.Header(),
  412. BlockParts: rs.ProposalBlockParts.BitArray(),
  413. }
  414. }
  415. return
  416. }
  417. func (conR *ConsensusReactor) sendNewRoundStepMessages(peer p2p.Peer) {
  418. rs := conR.conS.GetRoundState()
  419. nrsMsg, csMsg := makeRoundStepMessages(rs)
  420. if nrsMsg != nil {
  421. peer.Send(StateChannel, struct{ ConsensusMessage }{nrsMsg})
  422. }
  423. if csMsg != nil {
  424. peer.Send(StateChannel, struct{ ConsensusMessage }{csMsg})
  425. }
  426. }
  427. func (conR *ConsensusReactor) gossipDataRoutine(peer p2p.Peer, ps *PeerState) {
  428. logger := conR.Logger.With("peer", peer)
  429. OUTER_LOOP:
  430. for {
  431. // Manage disconnects from self or peer.
  432. if !peer.IsRunning() || !conR.IsRunning() {
  433. logger.Info("Stopping gossipDataRoutine for peer")
  434. return
  435. }
  436. rs := conR.conS.GetRoundState()
  437. prs := ps.GetRoundState()
  438. // Send proposal Block parts?
  439. if rs.ProposalBlockParts.HasHeader(prs.ProposalBlockPartsHeader) {
  440. if index, ok := rs.ProposalBlockParts.BitArray().Sub(prs.ProposalBlockParts.Copy()).PickRandom(); ok {
  441. part := rs.ProposalBlockParts.GetPart(index)
  442. msg := &BlockPartMessage{
  443. Height: rs.Height, // This tells peer that this part applies to us.
  444. Round: rs.Round, // This tells peer that this part applies to us.
  445. Part: part,
  446. }
  447. logger.Debug("Sending block part", "height", prs.Height, "round", prs.Round)
  448. if peer.Send(DataChannel, struct{ ConsensusMessage }{msg}) {
  449. ps.SetHasProposalBlockPart(prs.Height, prs.Round, index)
  450. }
  451. continue OUTER_LOOP
  452. }
  453. }
  454. // If the peer is on a previous height, help catch up.
  455. if (0 < prs.Height) && (prs.Height < rs.Height) {
  456. heightLogger := logger.With("height", prs.Height)
  457. // if we never received the commit message from the peer, the block parts won't be initialized
  458. if prs.ProposalBlockParts == nil {
  459. blockMeta := conR.conS.blockStore.LoadBlockMeta(prs.Height)
  460. if blockMeta == nil {
  461. cmn.PanicCrisis(cmn.Fmt("Failed to load block %d when blockStore is at %d",
  462. prs.Height, conR.conS.blockStore.Height()))
  463. }
  464. ps.InitProposalBlockParts(blockMeta.BlockID.PartsHeader)
  465. // continue the loop since prs is a copy and not affected by this initialization
  466. continue OUTER_LOOP
  467. }
  468. conR.gossipDataForCatchup(heightLogger, rs, prs, ps, peer)
  469. continue OUTER_LOOP
  470. }
  471. // If height and round don't match, sleep.
  472. if (rs.Height != prs.Height) || (rs.Round != prs.Round) {
  473. //logger.Info("Peer Height|Round mismatch, sleeping", "peerHeight", prs.Height, "peerRound", prs.Round, "peer", peer)
  474. time.Sleep(conR.conS.config.PeerGossipSleep())
  475. continue OUTER_LOOP
  476. }
  477. // By here, height and round match.
  478. // Proposal block parts were already matched and sent if any were wanted.
  479. // (These can match on hash so the round doesn't matter)
  480. // Now consider sending other things, like the Proposal itself.
  481. // Send Proposal && ProposalPOL BitArray?
  482. if rs.Proposal != nil && !prs.Proposal {
  483. // Proposal: share the proposal metadata with peer.
  484. {
  485. msg := &ProposalMessage{Proposal: rs.Proposal}
  486. logger.Debug("Sending proposal", "height", prs.Height, "round", prs.Round)
  487. if peer.Send(DataChannel, struct{ ConsensusMessage }{msg}) {
  488. ps.SetHasProposal(rs.Proposal)
  489. }
  490. }
  491. // ProposalPOL: lets peer know which POL votes we have so far.
  492. // Peer must receive ProposalMessage first.
  493. // rs.Proposal was validated, so rs.Proposal.POLRound <= rs.Round,
  494. // so we definitely have rs.Votes.Prevotes(rs.Proposal.POLRound).
  495. if 0 <= rs.Proposal.POLRound {
  496. msg := &ProposalPOLMessage{
  497. Height: rs.Height,
  498. ProposalPOLRound: rs.Proposal.POLRound,
  499. ProposalPOL: rs.Votes.Prevotes(rs.Proposal.POLRound).BitArray(),
  500. }
  501. logger.Debug("Sending POL", "height", prs.Height, "round", prs.Round)
  502. peer.Send(DataChannel, struct{ ConsensusMessage }{msg})
  503. }
  504. continue OUTER_LOOP
  505. }
  506. // Nothing to do. Sleep.
  507. time.Sleep(conR.conS.config.PeerGossipSleep())
  508. continue OUTER_LOOP
  509. }
  510. }
  511. func (conR *ConsensusReactor) gossipDataForCatchup(logger log.Logger, rs *cstypes.RoundState,
  512. prs *cstypes.PeerRoundState, ps *PeerState, peer p2p.Peer) {
  513. if index, ok := prs.ProposalBlockParts.Not().PickRandom(); ok {
  514. // Ensure that the peer's PartSetHeader is correct
  515. blockMeta := conR.conS.blockStore.LoadBlockMeta(prs.Height)
  516. if blockMeta == nil {
  517. logger.Error("Failed to load block meta",
  518. "ourHeight", rs.Height, "blockstoreHeight", conR.conS.blockStore.Height())
  519. time.Sleep(conR.conS.config.PeerGossipSleep())
  520. return
  521. } else if !blockMeta.BlockID.PartsHeader.Equals(prs.ProposalBlockPartsHeader) {
  522. logger.Info("Peer ProposalBlockPartsHeader mismatch, sleeping",
  523. "blockPartsHeader", blockMeta.BlockID.PartsHeader, "peerBlockPartsHeader", prs.ProposalBlockPartsHeader)
  524. time.Sleep(conR.conS.config.PeerGossipSleep())
  525. return
  526. }
  527. // Load the part
  528. part := conR.conS.blockStore.LoadBlockPart(prs.Height, index)
  529. if part == nil {
  530. logger.Error("Could not load part", "index", index,
  531. "blockPartsHeader", blockMeta.BlockID.PartsHeader, "peerBlockPartsHeader", prs.ProposalBlockPartsHeader)
  532. time.Sleep(conR.conS.config.PeerGossipSleep())
  533. return
  534. }
  535. // Send the part
  536. msg := &BlockPartMessage{
  537. Height: prs.Height, // Not our height, so it doesn't matter.
  538. Round: prs.Round, // Not our height, so it doesn't matter.
  539. Part: part,
  540. }
  541. logger.Debug("Sending block part for catchup", "round", prs.Round, "index", index)
  542. if peer.Send(DataChannel, struct{ ConsensusMessage }{msg}) {
  543. ps.SetHasProposalBlockPart(prs.Height, prs.Round, index)
  544. } else {
  545. logger.Debug("Sending block part for catchup failed")
  546. }
  547. return
  548. }
  549. //logger.Info("No parts to send in catch-up, sleeping")
  550. time.Sleep(conR.conS.config.PeerGossipSleep())
  551. }
  552. func (conR *ConsensusReactor) gossipVotesRoutine(peer p2p.Peer, ps *PeerState) {
  553. logger := conR.Logger.With("peer", peer)
  554. // Simple hack to throttle logs upon sleep.
  555. var sleeping = 0
  556. OUTER_LOOP:
  557. for {
  558. // Manage disconnects from self or peer.
  559. if !peer.IsRunning() || !conR.IsRunning() {
  560. logger.Info("Stopping gossipVotesRoutine for peer")
  561. return
  562. }
  563. rs := conR.conS.GetRoundState()
  564. prs := ps.GetRoundState()
  565. switch sleeping {
  566. case 1: // First sleep
  567. sleeping = 2
  568. case 2: // No more sleep
  569. sleeping = 0
  570. }
  571. //logger.Debug("gossipVotesRoutine", "rsHeight", rs.Height, "rsRound", rs.Round,
  572. // "prsHeight", prs.Height, "prsRound", prs.Round, "prsStep", prs.Step)
  573. // If height matches, then send LastCommit, Prevotes, Precommits.
  574. if rs.Height == prs.Height {
  575. heightLogger := logger.With("height", prs.Height)
  576. if conR.gossipVotesForHeight(heightLogger, rs, prs, ps) {
  577. continue OUTER_LOOP
  578. }
  579. }
  580. // Special catchup logic.
  581. // If peer is lagging by height 1, send LastCommit.
  582. if prs.Height != 0 && rs.Height == prs.Height+1 {
  583. if ps.PickSendVote(rs.LastCommit) {
  584. logger.Debug("Picked rs.LastCommit to send", "height", prs.Height)
  585. continue OUTER_LOOP
  586. }
  587. }
  588. // Catchup logic
  589. // If peer is lagging by more than 1, send Commit.
  590. if prs.Height != 0 && rs.Height >= prs.Height+2 {
  591. // Load the block commit for prs.Height,
  592. // which contains precommit signatures for prs.Height.
  593. commit := conR.conS.blockStore.LoadBlockCommit(prs.Height)
  594. if ps.PickSendVote(commit) {
  595. logger.Debug("Picked Catchup commit to send", "height", prs.Height)
  596. continue OUTER_LOOP
  597. }
  598. }
  599. if sleeping == 0 {
  600. // We sent nothing. Sleep...
  601. sleeping = 1
  602. logger.Debug("No votes to send, sleeping", "rs.Height", rs.Height, "prs.Height", prs.Height,
  603. "localPV", rs.Votes.Prevotes(rs.Round).BitArray(), "peerPV", prs.Prevotes,
  604. "localPC", rs.Votes.Precommits(rs.Round).BitArray(), "peerPC", prs.Precommits)
  605. } else if sleeping == 2 {
  606. // Continued sleep...
  607. sleeping = 1
  608. }
  609. time.Sleep(conR.conS.config.PeerGossipSleep())
  610. continue OUTER_LOOP
  611. }
  612. }
  613. func (conR *ConsensusReactor) gossipVotesForHeight(logger log.Logger, rs *cstypes.RoundState, prs *cstypes.PeerRoundState, ps *PeerState) bool {
  614. // If there are lastCommits to send...
  615. if prs.Step == cstypes.RoundStepNewHeight {
  616. if ps.PickSendVote(rs.LastCommit) {
  617. logger.Debug("Picked rs.LastCommit to send")
  618. return true
  619. }
  620. }
  621. // If there are prevotes to send...
  622. if prs.Step <= cstypes.RoundStepPrevote && prs.Round != -1 && prs.Round <= rs.Round {
  623. if ps.PickSendVote(rs.Votes.Prevotes(prs.Round)) {
  624. logger.Debug("Picked rs.Prevotes(prs.Round) to send", "round", prs.Round)
  625. return true
  626. }
  627. }
  628. // If there are precommits to send...
  629. if prs.Step <= cstypes.RoundStepPrecommit && prs.Round != -1 && prs.Round <= rs.Round {
  630. if ps.PickSendVote(rs.Votes.Precommits(prs.Round)) {
  631. logger.Debug("Picked rs.Precommits(prs.Round) to send", "round", prs.Round)
  632. return true
  633. }
  634. }
  635. // If there are POLPrevotes to send...
  636. if prs.ProposalPOLRound != -1 {
  637. if polPrevotes := rs.Votes.Prevotes(prs.ProposalPOLRound); polPrevotes != nil {
  638. if ps.PickSendVote(polPrevotes) {
  639. logger.Debug("Picked rs.Prevotes(prs.ProposalPOLRound) to send",
  640. "round", prs.ProposalPOLRound)
  641. return true
  642. }
  643. }
  644. }
  645. return false
  646. }
  647. // NOTE: `queryMaj23Routine` has a simple crude design since it only comes
  648. // into play for liveness when there's a signature DDoS attack happening.
  649. func (conR *ConsensusReactor) queryMaj23Routine(peer p2p.Peer, ps *PeerState) {
  650. logger := conR.Logger.With("peer", peer)
  651. OUTER_LOOP:
  652. for {
  653. // Manage disconnects from self or peer.
  654. if !peer.IsRunning() || !conR.IsRunning() {
  655. logger.Info("Stopping queryMaj23Routine for peer")
  656. return
  657. }
  658. // Maybe send Height/Round/Prevotes
  659. {
  660. rs := conR.conS.GetRoundState()
  661. prs := ps.GetRoundState()
  662. if rs.Height == prs.Height {
  663. if maj23, ok := rs.Votes.Prevotes(prs.Round).TwoThirdsMajority(); ok {
  664. peer.TrySend(StateChannel, struct{ ConsensusMessage }{&VoteSetMaj23Message{
  665. Height: prs.Height,
  666. Round: prs.Round,
  667. Type: types.VoteTypePrevote,
  668. BlockID: maj23,
  669. }})
  670. time.Sleep(conR.conS.config.PeerQueryMaj23Sleep())
  671. }
  672. }
  673. }
  674. // Maybe send Height/Round/Precommits
  675. {
  676. rs := conR.conS.GetRoundState()
  677. prs := ps.GetRoundState()
  678. if rs.Height == prs.Height {
  679. if maj23, ok := rs.Votes.Precommits(prs.Round).TwoThirdsMajority(); ok {
  680. peer.TrySend(StateChannel, struct{ ConsensusMessage }{&VoteSetMaj23Message{
  681. Height: prs.Height,
  682. Round: prs.Round,
  683. Type: types.VoteTypePrecommit,
  684. BlockID: maj23,
  685. }})
  686. time.Sleep(conR.conS.config.PeerQueryMaj23Sleep())
  687. }
  688. }
  689. }
  690. // Maybe send Height/Round/ProposalPOL
  691. {
  692. rs := conR.conS.GetRoundState()
  693. prs := ps.GetRoundState()
  694. if rs.Height == prs.Height && prs.ProposalPOLRound >= 0 {
  695. if maj23, ok := rs.Votes.Prevotes(prs.ProposalPOLRound).TwoThirdsMajority(); ok {
  696. peer.TrySend(StateChannel, struct{ ConsensusMessage }{&VoteSetMaj23Message{
  697. Height: prs.Height,
  698. Round: prs.ProposalPOLRound,
  699. Type: types.VoteTypePrevote,
  700. BlockID: maj23,
  701. }})
  702. time.Sleep(conR.conS.config.PeerQueryMaj23Sleep())
  703. }
  704. }
  705. }
  706. // Little point sending LastCommitRound/LastCommit;
  707. // these are fleeting and non-blocking.
  708. // Maybe send Height/CatchupCommitRound/CatchupCommit.
  709. {
  710. prs := ps.GetRoundState()
  711. if prs.CatchupCommitRound != -1 && 0 < prs.Height && prs.Height <= conR.conS.blockStore.Height() {
  712. commit := conR.conS.LoadCommit(prs.Height)
  713. peer.TrySend(StateChannel, struct{ ConsensusMessage }{&VoteSetMaj23Message{
  714. Height: prs.Height,
  715. Round: commit.Round(),
  716. Type: types.VoteTypePrecommit,
  717. BlockID: commit.BlockID,
  718. }})
  719. time.Sleep(conR.conS.config.PeerQueryMaj23Sleep())
  720. }
  721. }
  722. time.Sleep(conR.conS.config.PeerQueryMaj23Sleep())
  723. continue OUTER_LOOP
  724. }
  725. }
  726. // String returns a string representation of the ConsensusReactor.
  727. // NOTE: For now, it is just a hard-coded string to avoid accessing unprotected shared variables.
  728. // TODO: improve!
  729. func (conR *ConsensusReactor) String() string {
  730. // better not to access shared variables
  731. return "ConsensusReactor" // conR.StringIndented("")
  732. }
  733. // StringIndented returns an indented string representation of the ConsensusReactor
  734. func (conR *ConsensusReactor) StringIndented(indent string) string {
  735. s := "ConsensusReactor{\n"
  736. s += indent + " " + conR.conS.StringIndented(indent+" ") + "\n"
  737. for _, peer := range conR.Switch.Peers().List() {
  738. ps := peer.Get(types.PeerStateKey).(*PeerState)
  739. s += indent + " " + ps.StringIndented(indent+" ") + "\n"
  740. }
  741. s += indent + "}"
  742. return s
  743. }
  744. //-----------------------------------------------------------------------------
  745. var (
  746. ErrPeerStateHeightRegression = errors.New("Error peer state height regression")
  747. ErrPeerStateInvalidStartTime = errors.New("Error peer state invalid startTime")
  748. )
  749. // PeerState contains the known state of a peer, including its connection
  750. // and threadsafe access to its PeerRoundState.
  751. type PeerState struct {
  752. Peer p2p.Peer
  753. logger log.Logger
  754. mtx sync.Mutex
  755. cstypes.PeerRoundState
  756. stats *peerStateStats
  757. }
  758. // peerStateStats holds internal statistics for a peer.
  759. type peerStateStats struct {
  760. lastVoteHeight int64
  761. votes int
  762. lastBlockPartHeight int64
  763. blockParts int
  764. }
  765. func (pss peerStateStats) String() string {
  766. return fmt.Sprintf("peerStateStats{votes: %d, blockParts: %d}", pss.votes, pss.blockParts)
  767. }
  768. // NewPeerState returns a new PeerState for the given Peer
  769. func NewPeerState(peer p2p.Peer) *PeerState {
  770. return &PeerState{
  771. Peer: peer,
  772. logger: log.NewNopLogger(),
  773. PeerRoundState: cstypes.PeerRoundState{
  774. Round: -1,
  775. ProposalPOLRound: -1,
  776. LastCommitRound: -1,
  777. CatchupCommitRound: -1,
  778. },
  779. stats: &peerStateStats{},
  780. }
  781. }
  782. func (ps *PeerState) SetLogger(logger log.Logger) *PeerState {
  783. ps.logger = logger
  784. return ps
  785. }
  786. // GetRoundState returns an atomic snapshot of the PeerRoundState.
  787. // There's no point in mutating it since it won't change PeerState.
  788. func (ps *PeerState) GetRoundState() *cstypes.PeerRoundState {
  789. ps.mtx.Lock()
  790. defer ps.mtx.Unlock()
  791. prs := ps.PeerRoundState // copy
  792. return &prs
  793. }
  794. // GetHeight returns an atomic snapshot of the PeerRoundState's height
  795. // used by the mempool to ensure peers are caught up before broadcasting new txs
  796. func (ps *PeerState) GetHeight() int64 {
  797. ps.mtx.Lock()
  798. defer ps.mtx.Unlock()
  799. return ps.PeerRoundState.Height
  800. }
  801. // SetHasProposal sets the given proposal as known for the peer.
  802. func (ps *PeerState) SetHasProposal(proposal *types.Proposal) {
  803. ps.mtx.Lock()
  804. defer ps.mtx.Unlock()
  805. if ps.Height != proposal.Height || ps.Round != proposal.Round {
  806. return
  807. }
  808. if ps.Proposal {
  809. return
  810. }
  811. ps.Proposal = true
  812. ps.ProposalBlockPartsHeader = proposal.BlockPartsHeader
  813. ps.ProposalBlockParts = cmn.NewBitArray(proposal.BlockPartsHeader.Total)
  814. ps.ProposalPOLRound = proposal.POLRound
  815. ps.ProposalPOL = nil // Nil until ProposalPOLMessage received.
  816. }
  817. // InitProposalBlockParts initializes the peer's proposal block parts header and bit array.
  818. func (ps *PeerState) InitProposalBlockParts(partsHeader types.PartSetHeader) {
  819. ps.mtx.Lock()
  820. defer ps.mtx.Unlock()
  821. if ps.ProposalBlockParts != nil {
  822. return
  823. }
  824. ps.ProposalBlockPartsHeader = partsHeader
  825. ps.ProposalBlockParts = cmn.NewBitArray(partsHeader.Total)
  826. }
  827. // SetHasProposalBlockPart sets the given block part index as known for the peer.
  828. func (ps *PeerState) SetHasProposalBlockPart(height int64, round int, index int) {
  829. ps.mtx.Lock()
  830. defer ps.mtx.Unlock()
  831. if ps.Height != height || ps.Round != round {
  832. return
  833. }
  834. ps.ProposalBlockParts.SetIndex(index, true)
  835. }
// PickSendVote picks a vote and sends it to the peer.
// Returns true if vote was sent.
func (ps *PeerState) PickSendVote(votes types.VoteSetReader) bool {
	if vote, ok := ps.PickVoteToSend(votes); ok {
		msg := &VoteMessage{vote}
		ps.logger.Debug("Sending vote message", "ps", ps, "vote", vote)
		return ps.Peer.Send(VoteChannel, struct{ ConsensusMessage }{msg})
	}
	return false
}

// PickVoteToSend picks a vote to send to the peer.
// Returns true if a vote was picked.
// NOTE: `votes` must be the correct Size() for the Height().
func (ps *PeerState) PickVoteToSend(votes types.VoteSetReader) (vote *types.Vote, ok bool) {
	ps.mtx.Lock()
	defer ps.mtx.Unlock()

	if votes.Size() == 0 {
		return nil, false
	}

	height, round, type_, size := votes.Height(), votes.Round(), votes.Type(), votes.Size()

	// Lazily set data using 'votes'.
	if votes.IsCommit() {
		ps.ensureCatchupCommitRound(height, round, size)
	}
	ps.ensureVoteBitArrays(height, size)

	psVotes := ps.getVoteBitArray(height, round, type_)
	if psVotes == nil {
		return nil, false // Not something worth sending
	}
	if index, ok := votes.BitArray().Sub(psVotes).PickRandom(); ok {
		ps.setHasVote(height, round, type_, index)
		return votes.GetByIndex(index), true
	}
	return nil, false
}

func (ps *PeerState) getVoteBitArray(height int64, round int, type_ byte) *cmn.BitArray {
	if !types.IsVoteTypeValid(type_) {
		return nil
	}

	if ps.Height == height {
		if ps.Round == round {
			switch type_ {
			case types.VoteTypePrevote:
				return ps.Prevotes
			case types.VoteTypePrecommit:
				return ps.Precommits
			}
		}
		if ps.CatchupCommitRound == round {
			switch type_ {
			case types.VoteTypePrevote:
				return nil
			case types.VoteTypePrecommit:
				return ps.CatchupCommit
			}
		}
		if ps.ProposalPOLRound == round {
			switch type_ {
			case types.VoteTypePrevote:
				return ps.ProposalPOL
			case types.VoteTypePrecommit:
				return nil
			}
		}
		return nil
	}
	if ps.Height == height+1 {
		if ps.LastCommitRound == round {
			switch type_ {
			case types.VoteTypePrevote:
				return nil
			case types.VoteTypePrecommit:
				return ps.LastCommit
			}
		}
		return nil
	}
	return nil
}

// 'round': A round for which we have a +2/3 commit.
func (ps *PeerState) ensureCatchupCommitRound(height int64, round int, numValidators int) {
	if ps.Height != height {
		return
	}
	/*
		NOTE: This is wrong, 'round' could change.
		e.g. if orig round is not the same as block LastCommit round.
		if ps.CatchupCommitRound != -1 && ps.CatchupCommitRound != round {
			cmn.PanicSanity(cmn.Fmt("Conflicting CatchupCommitRound. Height: %v, Orig: %v, New: %v", height, ps.CatchupCommitRound, round))
		}
	*/
	if ps.CatchupCommitRound == round {
		return // Nothing to do!
	}
	ps.CatchupCommitRound = round
	if round == ps.Round {
		ps.CatchupCommit = ps.Precommits
	} else {
		ps.CatchupCommit = cmn.NewBitArray(numValidators)
	}
}

// EnsureVoteBitArrays ensures the bit-arrays have been allocated for tracking
// what votes this peer has received.
// NOTE: It's important to make sure that numValidators actually matches
// what the node sees as the number of validators for height.
func (ps *PeerState) EnsureVoteBitArrays(height int64, numValidators int) {
	ps.mtx.Lock()
	defer ps.mtx.Unlock()
	ps.ensureVoteBitArrays(height, numValidators)
}

func (ps *PeerState) ensureVoteBitArrays(height int64, numValidators int) {
	if ps.Height == height {
		if ps.Prevotes == nil {
			ps.Prevotes = cmn.NewBitArray(numValidators)
		}
		if ps.Precommits == nil {
			ps.Precommits = cmn.NewBitArray(numValidators)
		}
		if ps.CatchupCommit == nil {
			ps.CatchupCommit = cmn.NewBitArray(numValidators)
		}
		if ps.ProposalPOL == nil {
			ps.ProposalPOL = cmn.NewBitArray(numValidators)
		}
	} else if ps.Height == height+1 {
		if ps.LastCommit == nil {
			ps.LastCommit = cmn.NewBitArray(numValidators)
		}
	}
}

// RecordVote updates internal statistics for this peer by recording the vote.
// It returns the total number of votes counted so far (at most 1 per block),
// i.e. the number of blocks for which the peer has been sending us votes.
func (ps *PeerState) RecordVote(vote *types.Vote) int {
	ps.mtx.Lock()
	defer ps.mtx.Unlock()

	if ps.stats.lastVoteHeight >= vote.Height {
		return ps.stats.votes
	}
	ps.stats.lastVoteHeight = vote.Height
	ps.stats.votes++
	return ps.stats.votes
}

// VotesSent returns the number of blocks for which the peer has been sending
// us votes.
func (ps *PeerState) VotesSent() int {
	ps.mtx.Lock()
	defer ps.mtx.Unlock()

	return ps.stats.votes
}

// RecordBlockPart updates internal statistics for this peer by recording the
// block part. It returns the total number of block parts counted so far (at
// most 1 per block), i.e. the number of blocks for which the peer has been
// sending us block parts.
func (ps *PeerState) RecordBlockPart(bp *BlockPartMessage) int {
	ps.mtx.Lock()
	defer ps.mtx.Unlock()

	if ps.stats.lastBlockPartHeight >= bp.Height {
		return ps.stats.blockParts
	}

	ps.stats.lastBlockPartHeight = bp.Height
	ps.stats.blockParts++
	return ps.stats.blockParts
}

// BlockPartsSent returns the number of blocks for which the peer has been
// sending us block parts.
func (ps *PeerState) BlockPartsSent() int {
	ps.mtx.Lock()
	defer ps.mtx.Unlock()

	return ps.stats.blockParts
}

// SetHasVote sets the given vote as known by the peer.
func (ps *PeerState) SetHasVote(vote *types.Vote) {
	ps.mtx.Lock()
	defer ps.mtx.Unlock()

	ps.setHasVote(vote.Height, vote.Round, vote.Type, vote.ValidatorIndex)
}

func (ps *PeerState) setHasVote(height int64, round int, type_ byte, index int) {
	logger := ps.logger.With("peerH/R", cmn.Fmt("%d/%d", ps.Height, ps.Round), "H/R", cmn.Fmt("%d/%d", height, round))
	logger.Debug("setHasVote", "type", type_, "index", index)

	// NOTE: some may be nil BitArrays -> no side effects.
	psVotes := ps.getVoteBitArray(height, round, type_)
	if psVotes != nil {
		psVotes.SetIndex(index, true)
	}
}

// ApplyNewRoundStepMessage updates the peer state for the new round.
func (ps *PeerState) ApplyNewRoundStepMessage(msg *NewRoundStepMessage) {
	ps.mtx.Lock()
	defer ps.mtx.Unlock()

	// Ignore duplicates or decreases
	if CompareHRS(msg.Height, msg.Round, msg.Step, ps.Height, ps.Round, ps.Step) <= 0 {
		return
	}

	// Just remember these values.
	psHeight := ps.Height
	psRound := ps.Round
	//psStep := ps.Step
	psCatchupCommitRound := ps.CatchupCommitRound
	psCatchupCommit := ps.CatchupCommit

	startTime := time.Now().Add(-1 * time.Duration(msg.SecondsSinceStartTime) * time.Second)
	ps.Height = msg.Height
	ps.Round = msg.Round
	ps.Step = msg.Step
	ps.StartTime = startTime
	if psHeight != msg.Height || psRound != msg.Round {
		ps.Proposal = false
		ps.ProposalBlockPartsHeader = types.PartSetHeader{}
		ps.ProposalBlockParts = nil
		ps.ProposalPOLRound = -1
		ps.ProposalPOL = nil
		// We'll update the BitArray capacity later.
		ps.Prevotes = nil
		ps.Precommits = nil
	}
	if psHeight == msg.Height && psRound != msg.Round && msg.Round == psCatchupCommitRound {
		// Peer caught up to CatchupCommitRound.
		// Preserve psCatchupCommit!
		// NOTE: We prefer to use ps.Precommits if
		// ps.Round matches ps.CatchupCommitRound.
		ps.Precommits = psCatchupCommit
	}
	if psHeight != msg.Height {
		// Shift Precommits to LastCommit.
		if psHeight+1 == msg.Height && psRound == msg.LastCommitRound {
			ps.LastCommitRound = msg.LastCommitRound
			ps.LastCommit = ps.Precommits
		} else {
			ps.LastCommitRound = msg.LastCommitRound
			ps.LastCommit = nil
		}
		// We'll update the BitArray capacity later.
		ps.CatchupCommitRound = -1
		ps.CatchupCommit = nil
	}
}

// ApplyCommitStepMessage updates the peer state for the new commit.
func (ps *PeerState) ApplyCommitStepMessage(msg *CommitStepMessage) {
	ps.mtx.Lock()
	defer ps.mtx.Unlock()

	if ps.Height != msg.Height {
		return
	}

	ps.ProposalBlockPartsHeader = msg.BlockPartsHeader
	ps.ProposalBlockParts = msg.BlockParts
}

// ApplyProposalPOLMessage updates the peer state for the new proposal POL.
func (ps *PeerState) ApplyProposalPOLMessage(msg *ProposalPOLMessage) {
	ps.mtx.Lock()
	defer ps.mtx.Unlock()

	if ps.Height != msg.Height {
		return
	}
	if ps.ProposalPOLRound != msg.ProposalPOLRound {
		return
	}

	// TODO: Merge onto existing ps.ProposalPOL?
	// We might have sent some prevotes in the meantime.
	ps.ProposalPOL = msg.ProposalPOL
}

// ApplyHasVoteMessage updates the peer state for the new vote.
func (ps *PeerState) ApplyHasVoteMessage(msg *HasVoteMessage) {
	ps.mtx.Lock()
	defer ps.mtx.Unlock()

	if ps.Height != msg.Height {
		return
	}

	ps.setHasVote(msg.Height, msg.Round, msg.Type, msg.Index)
}

// ApplyVoteSetBitsMessage updates the peer state for the bit-array of votes
// it claims to have for the corresponding BlockID.
// `ourVotes` is a BitArray of votes we have for msg.BlockID.
// NOTE: if ourVotes is nil (e.g. msg.Height < rs.Height),
// we conservatively overwrite ps's votes w/ msg.Votes.
func (ps *PeerState) ApplyVoteSetBitsMessage(msg *VoteSetBitsMessage, ourVotes *cmn.BitArray) {
	ps.mtx.Lock()
	defer ps.mtx.Unlock()

	votes := ps.getVoteBitArray(msg.Height, msg.Round, msg.Type)
	if votes != nil {
		if ourVotes == nil {
			votes.Update(msg.Votes)
		} else {
			otherVotes := votes.Sub(ourVotes)
			hasVotes := otherVotes.Or(msg.Votes)
			votes.Update(hasVotes)
		}
	}
}

// String returns a string representation of the PeerState.
func (ps *PeerState) String() string {
	return ps.StringIndented("")
}

// StringIndented returns a string representation of the PeerState.
func (ps *PeerState) StringIndented(indent string) string {
	ps.mtx.Lock()
	defer ps.mtx.Unlock()
	return fmt.Sprintf(`PeerState{
%s Key %v
%s PRS %v
%s Stats %v
%s}`,
		indent, ps.Peer.ID(),
		indent, ps.PeerRoundState.StringIndented(indent+" "),
		indent, ps.stats,
		indent)
}

//-----------------------------------------------------------------------------
// Messages

const (
	msgTypeNewRoundStep = byte(0x01)
	msgTypeCommitStep   = byte(0x02)
	msgTypeProposal     = byte(0x11)
	msgTypeProposalPOL  = byte(0x12)
	msgTypeBlockPart    = byte(0x13) // both block & POL
	msgTypeVote         = byte(0x14)
	msgTypeHasVote      = byte(0x15)
	msgTypeVoteSetMaj23 = byte(0x16)
	msgTypeVoteSetBits  = byte(0x17)

	msgTypeProposalHeartbeat = byte(0x20)
)

// ConsensusMessage is a message that can be sent and received on the ConsensusReactor.
type ConsensusMessage interface{}

var _ = wire.RegisterInterface(
	struct{ ConsensusMessage }{},
	wire.ConcreteType{&NewRoundStepMessage{}, msgTypeNewRoundStep},
	wire.ConcreteType{&CommitStepMessage{}, msgTypeCommitStep},
	wire.ConcreteType{&ProposalMessage{}, msgTypeProposal},
	wire.ConcreteType{&ProposalPOLMessage{}, msgTypeProposalPOL},
	wire.ConcreteType{&BlockPartMessage{}, msgTypeBlockPart},
	wire.ConcreteType{&VoteMessage{}, msgTypeVote},
	wire.ConcreteType{&HasVoteMessage{}, msgTypeHasVote},
	wire.ConcreteType{&VoteSetMaj23Message{}, msgTypeVoteSetMaj23},
	wire.ConcreteType{&VoteSetBitsMessage{}, msgTypeVoteSetBits},
	wire.ConcreteType{&ProposalHeartbeatMessage{}, msgTypeProposalHeartbeat},
)

// DecodeMessage decodes the given bytes into a ConsensusMessage.
// It returns an error, rather than panicking on bz[0], when bz is empty.
// TODO: check for unnecessary extra bytes at the end.
func DecodeMessage(bz []byte) (msgType byte, msg ConsensusMessage, err error) {
	if len(bz) == 0 {
		return 0, nil, fmt.Errorf("cannot decode empty consensus message")
	}
	msgType = bz[0]
	n := new(int)
	r := bytes.NewReader(bz)
	msgI := wire.ReadBinary(struct{ ConsensusMessage }{}, r, maxConsensusMessageSize, n, &err)
	msg = msgI.(struct{ ConsensusMessage }).ConsensusMessage
	return
}

//-------------------------------------

// NewRoundStepMessage is sent for every step taken in the ConsensusState,
// i.e. for every height/round/step transition.
type NewRoundStepMessage struct {
	Height                int64
	Round                 int
	Step                  cstypes.RoundStepType
	SecondsSinceStartTime int
	LastCommitRound       int
}

// String returns a string representation.
func (m *NewRoundStepMessage) String() string {
	return fmt.Sprintf("[NewRoundStep H:%v R:%v S:%v LCR:%v]",
		m.Height, m.Round, m.Step, m.LastCommitRound)
}

//-------------------------------------

// CommitStepMessage is sent when a block is committed.
type CommitStepMessage struct {
	Height           int64
	BlockPartsHeader types.PartSetHeader
	BlockParts       *cmn.BitArray
}

// String returns a string representation.
func (m *CommitStepMessage) String() string {
	return fmt.Sprintf("[CommitStep H:%v BP:%v BA:%v]", m.Height, m.BlockPartsHeader, m.BlockParts)
}

//-------------------------------------

// ProposalMessage is sent when a new block is proposed.
type ProposalMessage struct {
	Proposal *types.Proposal
}

// String returns a string representation.
func (m *ProposalMessage) String() string {
	return fmt.Sprintf("[Proposal %v]", m.Proposal)
}

//-------------------------------------

// ProposalPOLMessage is sent when a previous proposal is re-proposed.
type ProposalPOLMessage struct {
	Height           int64
	ProposalPOLRound int
	ProposalPOL      *cmn.BitArray
}

// String returns a string representation.
func (m *ProposalPOLMessage) String() string {
	return fmt.Sprintf("[ProposalPOL H:%v POLR:%v POL:%v]", m.Height, m.ProposalPOLRound, m.ProposalPOL)
}

//-------------------------------------

// BlockPartMessage is sent when gossiping a piece of the proposed block.
type BlockPartMessage struct {
	Height int64
	Round  int
	Part   *types.Part
}

// String returns a string representation.
func (m *BlockPartMessage) String() string {
	return fmt.Sprintf("[BlockPart H:%v R:%v P:%v]", m.Height, m.Round, m.Part)
}

//-------------------------------------

// VoteMessage is sent when voting for a proposal (or lack thereof).
type VoteMessage struct {
	Vote *types.Vote
}

// String returns a string representation.
func (m *VoteMessage) String() string {
	return fmt.Sprintf("[Vote %v]", m.Vote)
}

//-------------------------------------

// HasVoteMessage is sent to indicate that a particular vote has been received.
type HasVoteMessage struct {
	Height int64
	Round  int
	Type   byte
	Index  int
}

// String returns a string representation.
func (m *HasVoteMessage) String() string {
	return fmt.Sprintf("[HasVote VI:%v V:{%v/%02d/%v}]", m.Index, m.Height, m.Round, m.Type)
}

//-------------------------------------

// VoteSetMaj23Message is sent to indicate that a given BlockID has seen +2/3 votes.
type VoteSetMaj23Message struct {
	Height  int64
	Round   int
	Type    byte
	BlockID types.BlockID
}

// String returns a string representation.
func (m *VoteSetMaj23Message) String() string {
	return fmt.Sprintf("[VSM23 %v/%02d/%v %v]", m.Height, m.Round, m.Type, m.BlockID)
}

//-------------------------------------

// VoteSetBitsMessage is sent to communicate the bit-array of votes seen for the BlockID.
type VoteSetBitsMessage struct {
	Height  int64
	Round   int
	Type    byte
	BlockID types.BlockID
	Votes   *cmn.BitArray
}

// String returns a string representation.
func (m *VoteSetBitsMessage) String() string {
	return fmt.Sprintf("[VSB %v/%02d/%v %v %v]", m.Height, m.Round, m.Type, m.BlockID, m.Votes)
}

//-------------------------------------

// ProposalHeartbeatMessage is sent to signal that a node is alive and waiting for transactions for a proposal.
type ProposalHeartbeatMessage struct {
	Heartbeat *types.Heartbeat
}

// String returns a string representation.
func (m *ProposalHeartbeatMessage) String() string {
	return fmt.Sprintf("[HEARTBEAT %v]", m.Heartbeat)
}