diff --git a/docs/architecture/adr-040-blockchain-reactor-refactor.md b/docs/architecture/adr-040-blockchain-reactor-refactor.md new file mode 100644 index 000000000..520d55b5d --- /dev/null +++ b/docs/architecture/adr-040-blockchain-reactor-refactor.md @@ -0,0 +1,534 @@ +# ADR 040: Blockchain Reactor Refactor + +## Changelog + +19-03-2019: Initial draft + +## Context + +The Blockchain Reactor's high level responsibility is to enable peers who are far behind the current state of the +blockchain to quickly catch up by downloading many blocks in parallel from its peers, verifying block correctness, and +executing them against the ABCI application. We call the protocol executed by the Blockchain Reactor `fast-sync`. +The current architecture diagram of the blockchain reactor can be found here: + +![Blockchain Reactor Architecture Diagram](img/bc-reactor.png) + +The current architecture consists of dozens of routines and it is tightly depending on the `Switch`, making writing +unit tests almost impossible. Current tests require setting up complex dependency graphs and dealing with concurrency. +Note that having dozens of routines is in this case overkill as most of the time routines sits idle waiting for +something to happen (message to arrive or timeout to expire). Due to dependency on the `Switch`, testing relatively +complex network scenarios and failures (for example adding and removing peers) is very complex tasks and frequently lead +to complex tests with not deterministic behavior ([#3400]). Impossibility to write proper tests makes confidence in +the code low and this resulted in several issues (some are fixed in the meantime and some are still open): +[#3400], [#2897], [#2896], [#2699], [#2888], [#2457], [#2622], [#2026]. + +## Decision + +To remedy these issues we plan a major refactor of the blockchain reactor. The proposed architecture is largely inspired +by ADR-30 and is presented on the following diagram: +![Blockchain Reactor Refactor Diagram](img/bc-reactor-refactor.png) + +We suggest a concurrency architecture where the core algorithm (we call it `Controller`) is extracted into a finite +state machine. The active routine of the reactor is called `Executor` and is responsible for receiving and sending +messages from/to peers and triggering timeouts. What messages should be sent and timeouts triggered is determined mostly +by the `Controller`. The exception is `Peer Heartbeat` mechanism which is `Executor` responsibility. The heartbeat +mechanism is used to remove slow and unresponsive peers from the peer list. Writing of unit tests is simpler with +this architecture as most of the critical logic is part of the `Controller` function. We expect that simpler concurrency +architecture will not have significant negative effect on the performance of this reactor (to be confirmed by +experimental evaluation). + + +### Implementation changes + +We assume the following system model for "fast sync" protocol: + +* a node is connected to a random subset of all nodes that represents its peer set. Some nodes are correct and some + might be faulty. We don't make assumptions about ratio of faulty nodes, i.e., it is possible that all nodes in some + peer set are faulty. +* we assume that communication between correct nodes is synchronous, i.e., if a correct node `p` sends a message `m` to + a correct node `q` at time `t`, then `q` will receive message the latest at time `t+Delta` where `Delta` is a system + parameter that is known by network participants. `Delta` is normally chosen to be an order of magnitude higher than + the real communication delay (maximum) between correct nodes. Therefore if a correct node `p` sends a request message + to a correct node `q` at time `t` and there is no the corresponding reply at time `t + 2*Delta`, then `p` can assume + that `q` is faulty. Note that the network assumptions for the consensus reactor are different (we assume partially + synchronous model there). + +The requirements for the "fast sync" protocol are formally specified as follows: + +- `Correctness`: If a correct node `p` is connected to a correct node `q` for a long enough period of time, then `p` +- will eventually download all requested blocks from `q`. +- `Termination`: If a set of peers of a correct node `p` is stable (no new nodes are added to the peer set of `p`) for +- a long enough period of time, then protocol eventually terminates. +- `Fairness`: A correct node `p` sends requests for blocks to all peers from its peer set. + +As explained above, the `Executor` is responsible for sending and receiving messages that are part of the `fast-sync` +protocol. The following messages are exchanged as part of `fast-sync` protocol: + +``` go +type Message int +const ( + MessageUnknown Message = iota + MessageStatusRequest + MessageStatusResponse + MessageBlockRequest + MessageBlockResponse +) +``` +`MessageStatusRequest` is sent periodically to all peers as a request for a peer to provide its current height. It is +part of the `Peer Heartbeat` mechanism and a failure to respond timely to this message results in a peer being removed +from the peer set. Note that the `Peer Heartbeat` mechanism is used only while a peer is in `fast-sync` mode. We assume +here existence of a mechanism that gives node a possibility to inform its peers that it is in the `fast-sync` mode. + +``` go +type MessageStatusRequest struct { + SeqNum int64 // sequence number of the request +} +``` +`MessageStatusResponse` is sent as a response to `MessageStatusRequest` to inform requester about the peer current +height. + +``` go +type MessageStatusResponse struct { + SeqNum int64 // sequence number of the corresponding request + Height int64 // current peer height +} +``` + +`MessageBlockRequest` is used to make a request for a block and the corresponding commit certificate at a given height. + +``` go +type MessageBlockRequest struct { + Height int64 +} +``` + +`MessageBlockResponse` is a response for the corresponding block request. In addition to providing the block and the +corresponding commit certificate, it contains also a current peer height. + +``` go +type MessageBlockResponse struct { + Height int64 + Block Block + Commit Commit + PeerHeight int64 +} +``` + +In addition to sending and receiving messages, and `HeartBeat` mechanism, controller is also managing timeouts +that are triggered upon `Controller` request. `Controller` is then informed once a timeout expires. + +``` go +type TimeoutTrigger int +const ( + TimeoutUnknown TimeoutTrigger = iota + TimeoutResponseTrigger + TimeoutTerminationTrigger +) +``` + +The `Controller` can be modelled as a function with clearly defined inputs: + +* `State` - current state of the node. Contains data about connected peers and its behavior, pending requests, +* received blocks, etc. +* `Event` - significant events in the network. + +producing clear outputs: + +* `State` - updated state of the node, +* `MessageToSend` - signal what message to send and to which peer +* `TimeoutTrigger` - signal that timeout should be triggered. + + +We consider the following `Event` types: + +``` go +type Event int +const ( + EventUnknown Event = iota + EventStatusReport + EventBlockRequest + EventBlockResponse + EventRemovePeer + EventTimeoutResponse + EventTimeoutTermination +) +``` + +`EventStatusResponse` event is generated once `MessageStatusResponse` is received by the `Executor`. + +``` go +type EventStatusReport struct { + PeerID ID + Height int64 +} +``` + +`EventBlockRequest` event is generated once `MessageBlockRequest` is received by the `Executor`. + +``` go +type EventBlockRequest struct { + Height int64 + PeerID p2p.ID +} +``` +`EventBlockResponse` event is generated upon reception of `MessageBlockResponse` message by the `Executor`. + +``` go +type EventBlockResponse struct { + Height int64 + Block Block + Commit Commit + PeerID ID + PeerHeight int64 +} +``` +`EventRemovePeer` is generated by `Executor` to signal that the connection to a peer is closed due to peer misbehavior. + +``` go +type EventRemovePeer struct { + PeerID ID +} +``` +`EventTimeoutResponse` is generated by `Executor` to signal that a timeout triggered by `TimeoutResponseTrigger` has +expired. + +``` go +type EventTimeoutResponse struct { + PeerID ID + Height int64 +} +``` +`EventTimeoutTermination` is generated by `Executor` to signal that a timeout triggered by `TimeoutTerminationTrigger` +has expired. + +``` go +type EventTimeoutTermination struct { + Height int64 +} +``` + +`MessageToSend` is just a wrapper around `Message` type that contains id of the peer to which message should be sent. + +``` go +type MessageToSend struct { + PeerID ID + Message Message +} +``` + +The Controller state machine can be in two modes: `ModeFastSync` when +a node is trying to catch up with the network by downloading committed blocks, +and `ModeConsensus` in which it executes Tendermint consensus protocol. We +consider that `fast sync` mode terminates once the Controller switch to +`ModeConsensus`. + +``` go +type Mode int +const ( + ModeUnknown Mode = iota + ModeFastSync + ModeConsensus +) +``` +`Controller` is managing the following state: + +``` go +type ControllerState struct { + Height int64 // the first block that is not committed + Mode Mode // mode of operation + PeerMap map[ID]PeerStats // map of peer IDs to peer statistics + MaxRequestPending int64 // maximum height of the pending requests + FailedRequests []int64 // list of failed block requests + PendingRequestsNum int // total number of pending requests + Store []BlockInfo // contains list of downloaded blocks + Executor BlockExecutor // store, verify and executes blocks +} +``` + +`PeerStats` data structure keeps for every peer its current height and a list of pending requests for blocks. + +``` go +type PeerStats struct { + Height int64 + PendingRequest int64 // a request sent to this peer +} +``` + +`BlockInfo` data structure is used to store information (as part of block store) about downloaded blocks: from what peer + a block and the corresponding commit certificate are received. +``` go +type BlockInfo struct { + Block Block + Commit Commit + PeerID ID // a peer from which we received the corresponding Block and Commit +} +``` + +The `Controller` is initialized by providing an initial height (`startHeight`) from which it will start downloading +blocks from peers and the current state of the `BlockExecutor`. + +``` go +func NewControllerState(startHeight int64, executor BlockExecutor) ControllerState { + state = ControllerState {} + state.Height = startHeight + state.Mode = ModeFastSync + state.MaxRequestPending = startHeight - 1 + state.PendingRequestsNum = 0 + state.Executor = executor + initialize state.PeerMap, state.FailedRequests and state.Store to empty data structures + return state +} +``` + +The core protocol logic is given with the following function: + +``` go +func handleEvent(state ControllerState, event Event) (ControllerState, Message, TimeoutTrigger, Error) { + msg = nil + timeout = nil + error = nil + + switch state.Mode { + case ModeConsensus: + switch event := event.(type) { + case EventBlockRequest: + msg = createBlockResponseMessage(state, event) + return state, msg, timeout, error + default: + error = "Only respond to BlockRequests while in ModeConsensus!" + return state, msg, timeout, error + } + + case ModeFastSync: + switch event := event.(type) { + case EventBlockRequest: + msg = createBlockResponseMessage(state, event) + return state, msg, timeout, error + + case EventStatusResponse: + return handleEventStatusResponse(event, state) + + case EventRemovePeer: + return handleEventRemovePeer(event, state) + + case EventBlockResponse: + return handleEventBlockResponse(event, state) + + case EventResponseTimeout: + return handleEventResponseTimeout(event, state) + + case EventTerminationTimeout: + // Termination timeout is triggered in case of empty peer set and in case there are no pending requests. + // If this timeout expires and in the meantime no new peers are added or new pending requests are made + // then `fast-sync` mode terminates by switching to `ModeConsensus`. + // Note that termination timeout should be higher than the response timeout. + if state.Height == event.Height && state.PendingRequestsNum == 0 { state.State = ConsensusMode } + return state, msg, timeout, error + + default: + error = "Received unknown event type!" + return state, msg, timeout, error + } + } +} +``` + +``` go +func createBlockResponseMessage(state ControllerState, event BlockRequest) MessageToSend { + msgToSend = nil + if _, ok := state.PeerMap[event.PeerID]; !ok { peerStats = PeerStats{-1, -1} } + if state.Executor.ContainsBlockWithHeight(event.Height) && event.Height > peerStats.Height { + peerStats = event.Height + msg = BlockResponseMessage{ + Height: event.Height, + Block: state.Executor.getBlock(eventHeight), + Commit: state.Executor.getCommit(eventHeight), + PeerID: event.PeerID, + CurrentHeight: state.Height - 1, + } + msgToSend = MessageToSend { event.PeerID, msg } + } + state.PeerMap[event.PeerID] = peerStats + return msgToSend +} +``` + +``` go +func handleEventStatusResponse(event EventStatusResponse, state ControllerState) (ControllerState, MessageToSend, TimeoutTrigger, Error) { + if _, ok := state.PeerMap[event.PeerID]; !ok { + peerStats = PeerStats{ -1, -1 } + } else { + peerStats = state.PeerMap[event.PeerID] + } + + if event.Height > peerStats.Height { peerStats.Height = event.Height } + // if there are no pending requests for this peer, try to send him a request for block + if peerStats.PendingRequest == -1 { + msg = createBlockRequestMessages(state, event.PeerID, peerStats.Height) + // msg is nil if no request for block can be made to a peer at this point in time + if msg != nil { + peerStats.PendingRequests = msg.Height + state.PendingRequestsNum++ + // when a request for a block is sent to a peer, a response timeout is triggered. If no corresponding block is sent by the peer + // during response timeout period, then the peer is considered faulty and is removed from the peer set. + timeout = ResponseTimeoutTrigger{ msg.PeerID, msg.Height, PeerTimeout } + } else if state.PendingRequestsNum == 0 { + // if there are no pending requests and no new request can be placed to the peer, termination timeout is triggered. + // If termination timeout expires and we are still at the same height and there are no pending requests, the "fast-sync" + // mode is finished and we switch to `ModeConsensus`. + timeout = TerminationTimeoutTrigger{ state.Height, TerminationTimeout } + } + } + state.PeerMap[event.PeerID] = peerStats + return state, msg, timeout, error +} +``` + +``` go +func handleEventRemovePeer(event EventRemovePeer, state ControllerState) (ControllerState, MessageToSend, TimeoutTrigger, Error) { + if _, ok := state.PeerMap[event.PeerID]; ok { + pendingRequest = state.PeerMap[event.PeerID].PendingRequest + // if a peer is removed from the peer set, its pending request is declared failed and added to the `FailedRequests` list + // so it can be retried. + if pendingRequest != -1 { + add(state.FailedRequests, pendingRequest) + } + state.PendingRequestsNum-- + delete(state.PeerMap, event.PeerID) + // if the peer set is empty after removal of this peer then termination timeout is triggered. + if state.PeerMap.isEmpty() { + timeout = TerminationTimeoutTrigger{ state.Height, TerminationTimeout } + } + } else { error = "Removing unknown peer!" } + return state, msg, timeout, error +``` + +``` go +func handleEventBlockResponse(event EventBlockResponse, state ControllerState) (ControllerState, MessageToSend, TimeoutTrigger, Error) + if state.PeerMap[event.PeerID] { + peerStats = state.PeerMap[event.PeerID] + // when expected block arrives from a peer, it is added to the store so it can be verified and if correct executed after. + if peerStats.PendingRequest == event.Height { + peerStats.PendingRequest = -1 + state.PendingRequestsNum-- + if event.PeerHeight > peerStats.Height { peerStats.Height = event.PeerHeight } + state.Store[event.Height] = BlockInfo{ event.Block, event.Commit, event.PeerID } + // blocks are verified sequentially so adding a block to the store does not mean that it will be immediately verified + // as some of the previous blocks might be missing. + state = verifyBlocks(state) // it can lead to event.PeerID being removed from peer list + if _, ok := state.PeerMap[event.PeerID]; ok { + // we try to identify new request for a block that can be asked to the peer + msg = createBlockRequestMessage(state, event.PeerID, peerStats.Height) + if msg != nil { + peerStats.PendingRequests = msg.Height + state.PendingRequestsNum++ + // if request for block is made, response timeout is triggered + timeout = ResponseTimeoutTrigger{ msg.PeerID, msg.Height, PeerTimeout } + } else if state.PeerMap.isEmpty() || state.PendingRequestsNum == 0 { + // if the peer map is empty (the peer can be removed as block verification failed) or there are no pending requests + // termination timeout is triggered. + timeout = TerminationTimeoutTrigger{ state.Height, TerminationTimeout } + } + } + } else { error = "Received Block from wrong peer!" } + } else { error = "Received Block from unknown peer!" } + + state.PeerMap[event.PeerID] = peerStats + return state, msg, timeout, error +} +``` + +``` go +func handleEventResponseTimeout(event, state) { + if _, ok := state.PeerMap[event.PeerID]; ok { + peerStats = state.PeerMap[event.PeerID] + // if a response timeout expires and the peer hasn't delivered the block, the peer is removed from the peer list and + // the request is added to the `FailedRequests` so the block can be downloaded from other peer + if peerStats.PendingRequest == event.Height { + add(state.FailedRequests, pendingRequest) + delete(state.PeerMap, event.PeerID) + state.PendingRequestsNum-- + // if peer set is empty, then termination timeout is triggered + if state.PeerMap.isEmpty() { + timeout = TimeoutTrigger{ state.Height, TerminationTimeout } + } + } + } + return state, msg, timeout, error +} +``` + +``` go +func createBlockRequestMessage(state ControllerState, peerID ID, peerHeight int64) MessageToSend { + msg = nil + blockHeight = -1 + r = find request in state.FailedRequests such that r <= peerHeight // returns `nil` if there are no such request + // if there is a height in failed requests that can be downloaded from the peer send request to it + if r != nil { + blockNumber = r + delete(state.FailedRequests, r) + } else if state.MaxRequestPending < peerHeight { + // if height of the maximum pending request is smaller than peer height, then ask peer for next block + state.MaxRequestPending++ + blockHeight = state.MaxRequestPending // increment state.MaxRequestPending and then return the new value + } + + if blockHeight > -1 { msg = MessageToSend { peerID, MessageBlockRequest { blockHeight } } + return msg +} +``` + +``` go +func verifyBlocks(state State) State { + done = false + for !done { + block = state.Store[height] + if block != nil { + verified = verify block.Block using block.Commit // return `true` is verification succeed, 'false` otherwise + + if verified { + block.Execute() // executing block is costly operation so it might make sense executing asynchronously + state.Height++ + } else { + // if block verification failed, then it is added to `FailedRequests` and the peer is removed from the peer set + add(state.FailedRequests, height) + state.Store[height] = nil + if _, ok := state.PeerMap[block.PeerID]; ok { + pendingRequest = state.PeerMap[block.PeerID].PendingRequest + // if there is a pending request sent to the peer that is just to be removed from the peer set, add it to `FailedRequests` + if pendingRequest != -1 { + add(state.FailedRequests, pendingRequest) + state.PendingRequestsNum-- + } + delete(state.PeerMap, event.PeerID) + } + done = true + } + } else { done = true } + } + return state +} +``` + +In the proposed architecture `Controller` is not active task, i.e., it is being called by the `Executor`. Depending on +the return values returned by `Controller`,`Executor` will send a message to some peer (`msg` != nil), trigger a +timeout (`timeout` != nil) or deal with errors (`error` != nil). +In case a timeout is triggered, it will provide as an input to `Controller` the corresponding timeout event once +timeout expires. + + +## Status + +Draft. + +## Consequences + +### Positive + +- isolated implementation of the algorithm +- improved testability - simpler to prove correctness +- clearer separation of concerns - easier to reason + +### Negative + +### Neutral diff --git a/docs/architecture/img/bc-reactor-refactor.png b/docs/architecture/img/bc-reactor-refactor.png new file mode 100644 index 000000000..4cd84a02f Binary files /dev/null and b/docs/architecture/img/bc-reactor-refactor.png differ diff --git a/docs/architecture/img/bc-reactor.png b/docs/architecture/img/bc-reactor.png new file mode 100644 index 000000000..f7fe0f819 Binary files /dev/null and b/docs/architecture/img/bc-reactor.png differ diff --git a/docs/tendermint-core/configuration.md b/docs/tendermint-core/configuration.md index cdbdf1fe2..f24e76d6d 100644 --- a/docs/tendermint-core/configuration.md +++ b/docs/tendermint-core/configuration.md @@ -42,7 +42,7 @@ fast_sync = true # - EXPERIMENTAL # - may be faster is some use-cases (random reads - indexer) # - use boltdb build tag (go build -tags boltdb) -db_backend = "leveldb" +db_backend = "goleveldb" # Database directory db_dir = "data" diff --git a/docs/tendermint-core/running-in-production.md b/docs/tendermint-core/running-in-production.md index 1ec792831..9cb21fc54 100644 --- a/docs/tendermint-core/running-in-production.md +++ b/docs/tendermint-core/running-in-production.md @@ -8,7 +8,7 @@ key-value database. Unfortunately, this implementation of LevelDB seems to suffe install the real C-implementation of LevelDB and compile Tendermint to use that using `make build_c`. See the [install instructions](../introduction/install.md) for details. -Tendermint keeps multiple distinct LevelDB databases in the `$TMROOT/data`: +Tendermint keeps multiple distinct databases in the `$TMROOT/data`: - `blockstore.db`: Keeps the entire blockchain - stores blocks, block commits, and block meta data, each indexed by height. Used to sync new