# Overview

This is a markdown specification of the Tendermint blockchain.
It defines the base data structures, how they are validated,
and how they are communicated over the network.

If you find discrepancies between the spec and the code that
do not have an associated issue or pull request on github,
please submit them to our [bug bounty](https://tendermint.com/security)!

## Contents

- [Overview](#overview)
### Data Structures | |||
- [Encoding and Digests](./blockchain/encoding.md) | |||
- [Blockchain](./blockchain/blockchain.md) | |||
- [State](./blockchain/state.md) | |||
### Consensus Protocol | |||
- [Consensus Algorithm](./consensus/consensus.md) | |||
- [Creating a proposal](./consensus/creating-proposal.md) | |||
- [Time](./consensus/bft-time.md) | |||
- [Light-Client](./consensus/light-client.md) | |||
### P2P and Network Protocols | |||
- [The Base P2P Layer](./p2p/): multiplex the protocols ("reactors") on authenticated and encrypted TCP connections | |||
- [Peer Exchange (PEX)](./reactors/pex/): gossip known peer addresses so peers can find each other | |||
- [Block Sync](./reactors/block_sync/): gossip blocks so peers can catch up quickly | |||
- [Consensus](./reactors/consensus/): gossip votes and block parts so new blocks can be committed | |||
- [Mempool](./reactors/mempool/): gossip transactions so they get included in blocks | |||
- [Evidence](./reactors/evidence/): sending invalid evidence will stop the peer | |||
### Software | |||
- [ABCI](./software/abci.md): Details about interactions between the | |||
application and consensus engine over ABCI | |||
- [Write-Ahead Log](./software/wal.md): Details about how the consensus | |||
engine preserves data and recovers from crash failures | |||
## Overview | |||
Tendermint provides Byzantine Fault Tolerant State Machine Replication using | |||
hash-linked batches of transactions. Such transaction batches are called "blocks". | |||
Hence, Tendermint defines a "blockchain". | |||
Each block in Tendermint has a unique index - its Height. | |||
Heights in the blockchain are monotonic.
Each block is committed by a known set of weighted Validators. | |||
Membership and weighting within this validator set may change over time. | |||
Tendermint guarantees the safety and liveness of the blockchain | |||
so long as less than 1/3 of the total weight of the Validator set | |||
is malicious or faulty. | |||
A commit in Tendermint is a set of signed messages from more than 2/3 of | |||
the total weight of the current Validator set. Validators take turns proposing | |||
blocks and voting on them. Once enough votes are received, the block is considered | |||
committed. These votes are included in the _next_ block as proof that the previous block | |||
was committed - they cannot be included in the current block, as that block has already been | |||
created. | |||
Once a block is committed, it can be executed against an application. | |||
The application returns results for each of the transactions in the block. | |||
The application can also return changes to be made to the validator set, | |||
as well as a cryptographic digest of its latest state. | |||
Tendermint is designed to enable efficient verification and authentication | |||
of the latest state of the blockchain. To achieve this, it embeds | |||
cryptographic commitments to certain information in the block "header". | |||
This information includes the contents of the block (eg. the transactions), | |||
the validator set committing the block, as well as the various results returned by the application. | |||
Note, however, that block execution only occurs _after_ a block is committed. | |||
Thus, application results can only be included in the _next_ block. | |||
Also note that information like the transaction results and the validator set are never | |||
directly included in the block - only their cryptographic digests (Merkle roots) are. | |||
Hence, verification of a block requires a separate data structure to store this information. | |||
We call this the `State`. Block verification also requires access to the previous block. |
# P2P Config | |||
Here we describe configuration options around the Peer Exchange. | |||
These can be set using flags or via the `$TMHOME/config/config.toml` file. | |||
## Seed Mode | |||
`--p2p.seed_mode` | |||
The node operates in seed mode. In seed mode, a node continuously crawls the network for peers, | |||
and upon incoming connection shares some peers and disconnects. | |||
## Seeds | |||
`--p2p.seeds "id100000000000000000000000000000000@1.2.3.4:26656,id200000000000000000000000000000000@2.3.4.5:4444"`
Dials these seeds when we need more peers. They should return a list of peers and then disconnect. | |||
If we already have enough peers in the address book, we may never need to dial them. | |||
## Persistent Peers | |||
`--p2p.persistent_peers "id100000000000000000000000000000000@1.2.3.4:26656,id200000000000000000000000000000000@2.3.4.5:26656"`
Dial these peers and auto-redial them if the connection fails. | |||
These are intended to be trusted persistent peers that can help | |||
anchor us in the p2p network. The auto-redial uses exponential | |||
backoff and will give up after a day of trying to connect. | |||
**Note:** If `seeds` and `persistent_peers` intersect, | |||
the user will be warned that seeds may auto-close connections | |||
and that the node may not be able to keep the connection persistent. | |||
## Private Peers | |||
`--p2p.private_peer_ids "id100000000000000000000000000000000,id200000000000000000000000000000000"`
These are IDs of the peers that we do not add to the address book or gossip to | |||
other peers. They stay private to us. |
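
For reference, the same options can be set in the `[p2p]` section of `$TMHOME/config/config.toml`. A minimal sketch (values are placeholders; consult the config file generated by your Tendermint version for the authoritative key names and defaults):

```toml
[p2p]
# Act as a seed node: crawl the network for peers, serve addresses, then disconnect.
seed_mode = false

# Comma-separated list of seed nodes to ask for peer addresses.
seeds = "id100000000000000000000000000000000@1.2.3.4:26656"

# Comma-separated list of peers to maintain persistent connections to.
persistent_peers = "id200000000000000000000000000000000@2.3.4.5:26656"

# Comma-separated list of peer IDs to keep private (never added to the address book or gossiped).
private_peer_ids = ""
```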
# P2P Multiplex Connection | |||
## MConnection | |||
`MConnection` is a multiplex connection that supports multiple independent streams | |||
with distinct quality of service guarantees atop a single TCP connection. | |||
Each stream is known as a `Channel` and each `Channel` has a globally unique _byte id_. | |||
Each `Channel` also has a relative priority that determines the quality of service | |||
of the `Channel` compared to other `Channel`s. | |||
The _byte id_ and the relative priorities of each `Channel` are configured upon | |||
initialization of the connection. | |||
The `MConnection` supports three packet types: | |||
- Ping | |||
- Pong | |||
- Msg | |||
### Ping and Pong | |||
The ping and pong messages consist of writing a single byte to the connection; 0x1 and 0x2, respectively. | |||
When we haven't received any messages on an `MConnection` in time `pingTimeout`, we send a ping message. | |||
When a ping is received on the `MConnection`, a pong is sent in response only if there are no other messages | |||
to send and the peer has not sent us too many pings (TODO). | |||
If a pong or message is not received in sufficient time after a ping, the peer is disconnected.
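
As an illustration of the keepalive behaviour described above, a rough sketch follows (timeout values, channel wiring and function names are assumptions, not the actual `MConnection` code):

```go
package p2p

import (
    "errors"
    "net"
    "time"
)

// Assumed timeout values; the real MConnection configuration differs.
const (
    pingTimeout = 40 * time.Second
    pongTimeout = 45 * time.Second
)

// keepaliveLoop sketches the ping/pong behaviour described above: if nothing
// is received for pingTimeout, send a ping (0x1) and expect a pong (0x2)
// within pongTimeout, otherwise disconnect from the peer.
func keepaliveLoop(conn net.Conn, recvActivity, pongReceived <-chan struct{}) error {
    pingTimer := time.NewTimer(pingTimeout)
    defer pingTimer.Stop()
    for {
        select {
        case <-recvActivity:
            // Any received message counts as liveness; restart the ping timer.
            pingTimer.Reset(pingTimeout)
        case <-pingTimer.C:
            // Nothing received for pingTimeout: send a ping and wait for a pong.
            if _, err := conn.Write([]byte{0x01}); err != nil {
                return err
            }
            select {
            case <-pongReceived:
                pingTimer.Reset(pingTimeout)
            case <-time.After(pongTimeout):
                return errors.New("no pong or message received after ping; disconnecting peer")
            }
        }
    }
}
```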
### Msg | |||
Messages in channels are chopped into smaller `msgPacket`s for multiplexing. | |||
```go
type msgPacket struct { | |||
ChannelID byte | |||
EOF byte // 1 means message ends here. | |||
Bytes []byte | |||
} | |||
``` | |||
The `msgPacket` is serialized using [go-amino](https://github.com/tendermint/go-amino) and prefixed with 0x3. | |||
The received `Bytes` of a sequential set of packets are appended together | |||
until a packet with `EOF=1` is received, then the complete serialized message | |||
is returned for processing by the `onReceive` function of the corresponding channel. | |||
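
For illustration, the reassembly logic just described might be sketched as follows (a simplified sketch; the `channelState` type and `onReceive` wiring are assumptions, not the actual implementation):

```go
package p2p

// msgPacket mirrors the struct above.
type msgPacket struct {
    ChannelID byte
    EOF       byte // 1 means the message ends with this packet
    Bytes     []byte
}

// channelState accumulates packet payloads until a packet with EOF=1 arrives.
type channelState struct {
    recving   []byte
    onReceive func(msgBytes []byte)
}

// recvPacket appends the packet payload and, on EOF, hands the complete
// serialized message to the channel's onReceive callback.
func (ch *channelState) recvPacket(p msgPacket) {
    ch.recving = append(ch.recving, p.Bytes...)
    if p.EOF == 1 {
        msg := ch.recving
        ch.recving = nil
        ch.onReceive(msg)
    }
}
```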
### Multiplexing | |||
Messages are sent from a single `sendRoutine`, which loops over a select statement and results in the sending | |||
of a ping, a pong, or a batch of data messages. The batch of data messages may include messages from multiple channels. | |||
Message bytes are queued for sending in their respective channel, with each channel holding one unsent message at a time. | |||
Messages are chosen for a batch one at a time from the channel with the lowest ratio of recently sent bytes to channel priority. | |||
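
A sketch of this selection heuristic (the `channel` fields here are illustrative assumptions; only the lowest-ratio rule follows the description above):

```go
package p2p

// channel is an illustrative per-channel send state.
type channel struct {
    id            byte
    priority      int64 // configured relative priority (> 0)
    recentlySent  int64 // count of recently sent bytes (decayed over time)
    hasPendingMsg bool  // whether there is an unsent message queued
}

// selectChannel picks the channel with the lowest ratio of recently sent
// bytes to priority among channels that have something to send.
// It returns nil when no channel has a pending message.
func selectChannel(channels []*channel) *channel {
    var best *channel
    var bestRatio float64
    for _, ch := range channels {
        if !ch.hasPendingMsg {
            continue
        }
        ratio := float64(ch.recentlySent) / float64(ch.priority)
        if best == nil || ratio < bestRatio {
            best = ch
            bestRatio = ratio
        }
    }
    return best
}
```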
## Sending Messages | |||
There are two methods for sending messages: | |||
```go | |||
func (m MConnection) Send(chID byte, msg interface{}) bool {} | |||
func (m MConnection) TrySend(chID byte, msg interface{}) bool {} | |||
``` | |||
`Send(chID, msg)` is a blocking call that waits until `msg` is successfully queued | |||
for the channel with the given id byte `chID`. The message `msg` is serialized | |||
using the `tendermint/go-amino` submodule's `WriteBinary()` reflection routine. | |||
`TrySend(chID, msg)` is a nonblocking call that queues the message `msg` in the channel
with the given id byte `chID` if the queue is not full; otherwise it returns false immediately.
`Send()` and `TrySend()` are also exposed for each `Peer`. | |||
## Peer | |||
Each peer has one `MConnection` instance, and includes other information such as whether the connection | |||
was outbound, whether the connection should be recreated if it closes, various identity information about the node, | |||
and other higher level thread-safe data used by the reactors. | |||
## Switch/Reactor | |||
The `Switch` handles peer connections and exposes an API to receive incoming messages | |||
on `Reactors`. Each `Reactor` is responsible for handling incoming messages of one | |||
or more `Channels`. So while sending outgoing messages is typically performed on the peer, | |||
incoming messages are received on the reactor. | |||
```go | |||
// Declare a MyReactor reactor that handles messages on MyChannelID. | |||
type MyReactor struct{} | |||
func (reactor MyReactor) GetChannels() []*ChannelDescriptor { | |||
return []*ChannelDescriptor{ChannelDescriptor{ID:MyChannelID, Priority: 1}} | |||
} | |||
func (reactor MyReactor) Receive(chID byte, peer *Peer, msgBytes []byte) { | |||
r, n, err := bytes.NewBuffer(msgBytes), new(int64), new(error) | |||
msgString := ReadString(r, n, err) | |||
fmt.Println(msgString) | |||
} | |||
// Other Reactor methods omitted for brevity | |||
... | |||
sw := NewSwitch([]Reactor{MyReactor{}})
...
// Send a random message to all outbound connections
for _, peer := range sw.Peers().List() {
if peer.IsOutbound() { | |||
peer.Send(MyChannelID, "Here's a random message") | |||
} | |||
} | |||
``` |
# Peer Discovery | |||
A Tendermint P2P network has different kinds of nodes with different requirements for connectivity to one another. | |||
This document describes what kind of nodes Tendermint should enable and how they should work. | |||
## Seeds | |||
Seeds are the first point of contact for a new node. | |||
They return a list of known active peers and then disconnect. | |||
Seeds should operate full nodes with the PEX reactor in a "crawler" mode | |||
that continuously explores to validate the availability of peers. | |||
Seeds should only respond with some top percentile of the best peers they know about.
See [the peer-exchange docs](https://github.com/tendermint/tendermint/blob/master/docs/spec/reactors/pex/pex.md) for details on peer quality.
## New Full Node | |||
A new node needs a few things to connect to the network: | |||
- a list of seeds, which can be provided to Tendermint via config file or flags, | |||
or hardcoded into the software by in-process apps | |||
- a `ChainID`, also called `Network` at the p2p layer | |||
- a recent block height, `H`, and hash, `HASH`, for the blockchain.
The values `H` and `HASH` must be received and corroborated by means external to Tendermint, and specific to the user - ie. via the user's trusted social consensus. | |||
This requirement to validate `H` and `HASH` out-of-band and via social consensus | |||
is the essential difference in security models between Proof-of-Work and Proof-of-Stake blockchains. | |||
With the above, the node then queries some seeds for peers for its chain, | |||
dials those peers, and runs the Tendermint protocols with those it successfully connects to. | |||
When the node catches up to height `H`, it ensures the block hash matches `HASH`.
If not, Tendermint will exit, and the user must try again - either they are connected | |||
to bad peers or their social consensus is invalid. | |||
## Restarted Full Node | |||
A node checks its address book on startup and attempts to connect to peers from there. | |||
If it can't connect to any peers after some time, it falls back to the seeds to find more. | |||
Restarted full nodes can run the `blockchain` or `consensus` reactor protocols to sync up | |||
to the latest state of the blockchain from wherever they were last. | |||
In a Proof-of-Stake context, if they are sufficiently far behind (greater than the length | |||
of the unbonding period), they will need to validate a recent `H` and `HASH` out-of-band again | |||
so they know they have synced the correct chain. | |||
## Validator Node | |||
A validator node is a node that interfaces with a validator signing key. | |||
These nodes require the highest security, and should not accept incoming connections. | |||
They should maintain outgoing connections to a controlled set of "Sentry Nodes" that serve | |||
as their proxy shield to the rest of the network. | |||
Validators that know and trust each other can accept incoming connections from one another and maintain direct private connectivity via VPN. | |||
## Sentry Node | |||
Sentry nodes are guardians of a validator node and provide it access to the rest of the network. | |||
They should be well connected to other full nodes on the network. | |||
Sentry nodes may be dynamic, but should maintain persistent connections to some evolving random subset of each other. | |||
They should always expect to have direct incoming connections from the validator node and its backup(s). | |||
They do not report the validator node's address in the PEX and | |||
they may be more strict about the quality of peers they keep. | |||
Sentry nodes belonging to validators that trust each other may wish to maintain persistent connections via VPN with one another, but only report each other sparingly in the PEX. |
# Peers | |||
This document explains how Tendermint Peers are identified and how they connect to one another. | |||
For details on peer discovery, see the [peer exchange (PEX) reactor doc](https://github.com/tendermint/tendermint/blob/master/docs/spec/reactors/pex/pex.md). | |||
## Peer Identity | |||
Tendermint peers are expected to maintain long-term persistent identities in the form of a public key. | |||
Each peer has an ID defined as `peer.ID == peer.PubKey.Address()`, where `Address` uses the scheme defined in the `crypto` package.
A single peer ID can have multiple IP addresses associated with it, but a node | |||
will only ever connect to one at a time. | |||
When attempting to connect to a peer, we use the PeerURL: `<ID>@<IP>:<PORT>`. | |||
We will attempt to connect to the peer at IP:PORT, and verify, | |||
via authenticated encryption, that it is in possession of the private key | |||
corresponding to `<ID>`. This prevents man-in-the-middle attacks on the peer layer. | |||
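
As a rough illustration, parsing a peer URL of this form could look like the following (a hypothetical helper, not Tendermint's actual address-parsing code):

```go
package p2p

import (
    "errors"
    "net"
    "strings"
)

// PeerAddress is an illustrative representation of <ID>@<IP>:<PORT>.
type PeerAddress struct {
    ID   string // expected to equal PubKey.Address() of the peer
    Host string
    Port string
}

// ParsePeerURL splits "<ID>@<IP>:<PORT>" into its components.
func ParsePeerURL(raw string) (PeerAddress, error) {
    parts := strings.SplitN(raw, "@", 2)
    if len(parts) != 2 || parts[0] == "" {
        return PeerAddress{}, errors.New("peer URL must be of the form <ID>@<IP>:<PORT>")
    }
    host, port, err := net.SplitHostPort(parts[1])
    if err != nil {
        return PeerAddress{}, err
    }
    return PeerAddress{ID: parts[0], Host: host, Port: port}, nil
}
```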
## Connections | |||
All p2p connections use TCP. | |||
Upon establishing a successful TCP connection with a peer, | |||
two handshakes are performed: one for authenticated encryption, and one for Tendermint versioning.
Both handshakes have configurable timeouts (they should complete quickly). | |||
### Authenticated Encryption Handshake | |||
Tendermint implements the Station-to-Station protocol | |||
using X25519 keys for Diffie-Hellman key-exchange and chacha20poly1305 for encryption.
It goes as follows: | |||
- generate an ephemeral X25519 keypair | |||
- send the ephemeral public key to the peer | |||
- wait to receive the peer's ephemeral public key | |||
- compute the Diffie-Hellman shared secret using the peer's ephemeral public key and our ephemeral private key
- generate two keys to use for encryption (sending and receiving) and a challenge for authentication as follows (see the sketch after this list):
- create a hkdf-sha256 instance with the key being the diffie hellman shared secret, and info parameter as | |||
`TENDERMINT_SECRET_CONNECTION_KEY_AND_CHALLENGE_GEN` | |||
- get 96 bytes of output from hkdf-sha256 | |||
- if we had the smaller ephemeral pubkey, use the first 32 bytes for the key for receiving, the second 32 bytes for sending; else the opposite | |||
- use the last 32 bytes of output for the challenge | |||
- use a separate nonce for receiving and sending. Both nonces start at 0, and should support the full 96 bit nonce range | |||
- all communications from now on are encrypted in 1024 byte frames, | |||
using the respective secret and nonce. Each nonce is incremented by one after each use. | |||
- we now have an encrypted channel, but still need to authenticate | |||
- sign the common challenge obtained from the hkdf with our persistent private key | |||
- send the amino encoded persistent pubkey and signature to the peer | |||
- wait to receive the persistent public key and signature from the peer | |||
- verify the signature on the challenge using the peer's persistent public key | |||
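
The key and challenge derivation step above can be sketched with `golang.org/x/crypto/hkdf` (a simplified sketch; function and variable names are assumptions, only the derivation scheme follows the description above):

```go
package p2p

import (
    "bytes"
    "crypto/sha256"
    "io"

    "golang.org/x/crypto/hkdf"
)

// deriveSecrets derives the receive key, send key and authentication challenge
// from the Diffie-Hellman shared secret, as described above.
func deriveSecrets(dhSecret, ourEphPub, theirEphPub []byte) (recvKey, sendKey, challenge [32]byte) {
    info := []byte("TENDERMINT_SECRET_CONNECTION_KEY_AND_CHALLENGE_GEN")
    r := hkdf.New(sha256.New, dhSecret, nil, info)

    // Get 96 bytes of output from hkdf-sha256.
    out := make([]byte, 96)
    if _, err := io.ReadFull(r, out); err != nil {
        panic(err) // 96 bytes are always available from HKDF-SHA256
    }

    // The side with the smaller ephemeral pubkey uses the first 32 bytes for
    // receiving and the second 32 for sending; the other side does the
    // opposite. The last 32 bytes are the shared challenge.
    if bytes.Compare(ourEphPub, theirEphPub) < 0 {
        copy(recvKey[:], out[0:32])
        copy(sendKey[:], out[32:64])
    } else {
        copy(recvKey[:], out[32:64])
        copy(sendKey[:], out[0:32])
    }
    copy(challenge[:], out[64:96])
    return recvKey, sendKey, challenge
}
```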
If this is an outgoing connection (we dialed the peer) and we used a peer ID, | |||
then finally verify that the peer's persistent public key corresponds to the peer ID we dialed, | |||
ie. `peer.PubKey.Address() == <ID>`. | |||
The connection has now been authenticated. All traffic is encrypted. | |||
Note: only the dialer can authenticate the identity of the peer, | |||
but this is what we care about since when we join the network we wish to | |||
ensure we have reached the intended peer (and are not being MITMd). | |||
### Peer Filter | |||
Before continuing, we check if the new peer has the same ID as ourselves or | |||
an existing peer. If so, we disconnect. | |||
We also check the peer's address and public key against | |||
an optional whitelist which can be managed through the ABCI app - | |||
if the whitelist is enabled and the peer does not qualify, the connection is | |||
terminated. | |||
### Tendermint Version Handshake | |||
The Tendermint Version Handshake allows the peers to exchange their NodeInfo: | |||
```golang | |||
type NodeInfo struct { | |||
Version p2p.Version | |||
ID p2p.ID | |||
ListenAddr string | |||
Network string | |||
SoftwareVersion string | |||
Channels []int8 | |||
Moniker string | |||
Other NodeInfoOther | |||
} | |||
type Version struct { | |||
P2P uint64 | |||
Block uint64 | |||
App uint64 | |||
} | |||
type NodeInfoOther struct { | |||
TxIndex string | |||
RPCAddress string | |||
} | |||
``` | |||
The connection is disconnected if: | |||
- `peer.NodeInfo.ID` is not equal to `peerConn.ID`
- `peer.NodeInfo.Version.Block` does not match ours | |||
- `peer.NodeInfo.Network` is not the same as ours | |||
- `peer.Channels` does not intersect with our known Channels. | |||
- `peer.NodeInfo.ListenAddr` is malformed or is a DNS host that cannot be | |||
resolved | |||
At this point, if we have not disconnected, the peer is valid. | |||
It is added to the switch and hence all reactors via the `AddPeer` method. | |||
Note that each reactor may handle multiple channels. | |||
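
To make these conditions concrete, a version-handshake check might look roughly like the following (an illustrative sketch that assumes the `NodeInfo`, `Version`, and `ID` types shown above with package qualifiers dropped; the real validation differs in details, and DNS resolution of `ListenAddr` is not shown):

```go
package p2p

import (
    "errors"
    "fmt"
    "net"
)

// compatibleWith reports why a freshly handshaked peer must be disconnected,
// following the conditions listed above. Returning nil means the peer is kept.
func compatibleWith(ourInfo, peerInfo NodeInfo, dialedID ID) error {
    if peerInfo.ID != dialedID {
        return errors.New("peer reported an ID different from the connection ID")
    }
    if peerInfo.Version.Block != ourInfo.Version.Block {
        return fmt.Errorf("incompatible block version: %d vs %d",
            peerInfo.Version.Block, ourInfo.Version.Block)
    }
    if peerInfo.Network != ourInfo.Network {
        return fmt.Errorf("different network/ChainID: %q vs %q",
            peerInfo.Network, ourInfo.Network)
    }
    if !channelsIntersect(ourInfo.Channels, peerInfo.Channels) {
        return errors.New("no common channels")
    }
    if _, _, err := net.SplitHostPort(peerInfo.ListenAddr); err != nil {
        return fmt.Errorf("malformed ListenAddr: %w", err)
    }
    return nil
}

// channelsIntersect reports whether the two channel lists share an element.
func channelsIntersect(a, b []int8) bool {
    for _, x := range a {
        for _, y := range b {
            if x == y {
                return true
            }
        }
    }
    return false
}
```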
## Connection Activity | |||
Once a peer is added, incoming messages for a given reactor are handled through | |||
that reactor's `Receive` method, and output messages are sent directly by the Reactors | |||
on each peer. A typical reactor maintains per-peer go-routine(s) that handle this. |
# Blockchain Reactor v1 | |||
### Data Structures | |||
The data structures used are illustrated below. | |||
![Data Structures](img/bc-reactor-new-datastructs.png) | |||
#### BlockchainReactor | |||
- is a `p2p.BaseReactor`. | |||
- has a `store.BlockStore` for persistence. | |||
- executes blocks using an `sm.BlockExecutor`. | |||
- starts the FSM and the `poolRoutine()`. | |||
- relays the fast-sync responses and switch messages to the FSM. | |||
- handles errors from the FSM and, when necessary, reports them to the switch.
- implements the blockchain reactor interface used by the FSM to send requests, report errors to the switch, and reset state timers.
- registers all the concrete types and interfaces for serialisation. | |||
```go | |||
type BlockchainReactor struct { | |||
p2p.BaseReactor | |||
initialState sm.State // immutable | |||
state sm.State | |||
blockExec *sm.BlockExecutor | |||
store *store.BlockStore | |||
fastSync bool | |||
fsm *BcReactorFSM | |||
blocksSynced int | |||
// Receive goroutine forwards messages to this channel to be processed in the context of the poolRoutine. | |||
messagesForFSMCh chan bcReactorMessage | |||
// Switch goroutine may send RemovePeer to the blockchain reactor. This is an error message that is relayed | |||
// to this channel to be processed in the context of the poolRoutine. | |||
errorsForFSMCh chan bcReactorMessage | |||
// This channel is used by the FSM and indirectly the block pool to report errors to the blockchain reactor and | |||
// the switch. | |||
eventsFromFSMCh chan bcFsmMessage | |||
} | |||
``` | |||
#### BcReactorFSM | |||
- implements a simple finite state machine. | |||
- has a state and a state timer. | |||
- has a `BlockPool` to keep track of block requests sent to peers and blocks received from peers. | |||
- uses an interface to send status requests, block requests and reporting errors. The interface is implemented by the `BlockchainReactor` and tests. | |||
```go | |||
type BcReactorFSM struct { | |||
logger log.Logger | |||
mtx sync.Mutex | |||
startTime time.Time | |||
state *bcReactorFSMState | |||
stateTimer *time.Timer | |||
pool *BlockPool | |||
// interface used to call the Blockchain reactor to send StatusRequest, BlockRequest, reporting errors, etc. | |||
toBcR bcReactor | |||
} | |||
``` | |||
#### BlockPool | |||
- maintains a peer set, implemented as a map of peer ID to `BpPeer`. | |||
- maintains a set of requests made to peers, implemented as a map of block request heights to peer IDs. | |||
- maintains a list of future block requests needed to advance the fast-sync. This is a list of block heights. | |||
- keeps track of the maximum height of the peers in the set. | |||
- uses an interface to send requests and report errors to the reactor (via FSM). | |||
```go | |||
type BlockPool struct { | |||
logger log.Logger | |||
// Set of peers that have sent status responses, with height bigger than pool.Height | |||
peers map[p2p.ID]*BpPeer | |||
// Set of block heights and the corresponding peers from where a block response is expected or has been received. | |||
blocks map[int64]p2p.ID | |||
plannedRequests map[int64]struct{} // list of blocks to be assigned peers for blockRequest | |||
nextRequestHeight int64 // next height to be added to plannedRequests | |||
Height int64 // height of next block to execute | |||
MaxPeerHeight int64 // maximum height of all peers | |||
toBcR bcReactor | |||
} | |||
``` | |||
Some reasons for the `BlockPool` data structure content: | |||
1. If a peer is removed by the switch, fast access is required to the peer and the block requests made to that peer in order to redo them.
2. When block verification fails, fast access is required from the block height to the peer and the block requests made to that peer in order to redo them.
3. The `BlockchainReactor` main routine decides when the block pool is running low and asks the `BlockPool` (via FSM) to make more requests. The `BlockPool` creates a list of requests and triggers the sending of the block requests (via the interface). The reason it maintains a list of requests is the redo operations that may occur during error handling. These are redone when the `BlockchainReactor` requires more blocks. | |||
#### BpPeer | |||
- keeps track of a single peer, with height bigger than the initial height. | |||
- maintains the block requests made to the peer and the blocks received from the peer until they are executed. | |||
- monitors the peer speed when there are pending requests. | |||
- has an active timer when pending requests are present and reports an error on timeout.
```go | |||
type BpPeer struct { | |||
logger log.Logger | |||
ID p2p.ID | |||
Height int64 // the peer reported height | |||
NumPendingBlockRequests int // number of requests still waiting for block responses | |||
blocks map[int64]*types.Block // blocks received or expected to be received from this peer | |||
blockResponseTimer *time.Timer | |||
recvMonitor *flow.Monitor | |||
params *BpPeerParams // parameters for timer and monitor | |||
onErr func(err error, peerID p2p.ID) // function to call on error | |||
} | |||
``` | |||
### Concurrency Model | |||
The diagram below shows the goroutines (depicted by the gray blocks), timers (shown on the left with their values) and channels (colored rectangles). The FSM box shows some of the functionality and it is not a separate goroutine. | |||
The interface used by the FSM is shown in light red with the `IF` block. This is used to: | |||
- send block requests | |||
- report peer errors to the switch - this results in the reactor calling `switch.StopPeerForError()` and, if triggered by the peer timeout routine, a `removePeerEv` is sent to the FSM and action is taken from the context of the `poolRoutine()` | |||
- ask the reactor to reset the state timers. The timers are owned by the FSM while the timeout routine is defined by the reactor. This was done in order to avoid running timers in tests and will change in the next revision. | |||
There are two main goroutines implemented by the blockchain reactor. All I/O operations are performed from the `poolRoutine()` context while the CPU intensive operations related to the block execution are performed from the context of the `executeBlocksRoutine()`. All goroutines are detailed in the next sections. | |||
![Go Routines Diagram](img/bc-reactor-new-goroutines.png) | |||
#### Receive() | |||
Fast-sync messages from peers are received by this goroutine. It performs basic validation and: | |||
- in helper mode (i.e. for request message) it replies immediately. This is different than the proposal in adr-040 that specifies having the FSM handling these. | |||
- forwards response messages to the `poolRoutine()`. | |||
#### poolRoutine() | |||
(name kept as in the previous reactor).
It starts the `executeBlocksRoutine()` and the FSM. It then waits in a loop for events. These are received from the following channels: | |||
- `sendBlockRequestTicker.C` - every 10msec the reactor asks FSM to make more block requests up to a maximum. Note: currently this value is constant but could be changed based on low/ high watermark thresholds for the number of blocks received and waiting to be processed, the number of blockResponse messages waiting in messagesForFSMCh, etc. | |||
- `statusUpdateTicker.C` - every 10 seconds the reactor broadcasts status requests to peers. While adr-040 specifies this to run within the FSM, at this point this functionality is kept in the reactor. | |||
- `messagesForFSMCh` - the `Receive()` goroutine sends status and block response messages to this channel and the reactor calls FSM to handle them. | |||
- `errorsForFSMCh` - this channel receives the following events: | |||
- peer remove - when the switch removes a peer | |||
  - state timeout event - when FSM state timers trigger
  The reactor forwards these messages to the FSM.
- `eventsFromFSMCh` - there are two types of events sent over this channel:
  - `syncFinishedEv` - triggered when the FSM enters the `finished` state and calls the `switchToConsensus()` interface function.
  - `peerErrorEv` - the peer timer expiry goroutine sends this event over the channel for processing from the `poolRoutine()` context.
#### executeBlocksRoutine() | |||
Started by the `poolRoutine()`, it retrieves blocks from the pool and executes them: | |||
- `processReceivedBlockTicker.C` - a ticker event is received over the channel every 10msec and its handling results in a signal being sent to the `doProcessBlockCh` channel.
- `doProcessBlockCh` - events are received on this channel as described above; upon processing, blocks are retrieved from the pool and executed.
### FSM | |||
![fsm](img/bc-reactor-new-fsm.png) | |||
#### States | |||
##### init (aka unknown) | |||
The FSM is created in `unknown` state. When started, by the reactor (`startFSMEv`), it broadcasts Status requests and transitions to `waitForPeer` state. | |||
##### waitForPeer | |||
In this state, the FSM waits for a Status response from a "tall" peer. A timer is running in this state to allow the FSM to finish if there are no useful peers.
If the timer expires, it moves to `finished` state and calls the reactor to switch to consensus. | |||
If a Status response is received from a peer within the timeout, the FSM transitions to `waitForBlock` state. | |||
##### waitForBlock | |||
In this state the FSM makes Block requests (triggered by a ticker in reactor) and waits for Block responses. There is a timer running in this state to detect if a peer is not sending the block at current processing height. If the timer expires, the FSM removes the peer where the request was sent and all requests made to that peer are redone. | |||
As blocks are received they are stored by the pool. Block execution is independently performed by the reactor and the result reported to the FSM: | |||
- if there are no errors, the FSM increases the pool height and resets the state timer. | |||
- if there are errors, the peers that delivered the two blocks (at height and height+1) are removed and the requests redone. | |||
In this state the FSM may receive peer remove events in any of the following scenarios: | |||
- the switch is removing a peer | |||
- a peer is penalized because it has not responded to some block requests for a long time | |||
- a peer is penalized for being slow | |||
When processing of the last block (the one with height equal to the highest peer height minus one) is successful, the FSM transitions to `finished` state. | |||
If after a peer update or removal the pool height is same as maxPeerHeight, the FSM transitions to `finished` state. | |||
##### finished | |||
When entering this state, the FSM calls the reactor to switch to consensus and performs cleanup. | |||
#### Events | |||
The following events are handled by the FSM: | |||
```go | |||
const ( | |||
startFSMEv = iota + 1 | |||
statusResponseEv | |||
blockResponseEv | |||
processedBlockEv | |||
makeRequestsEv | |||
stopFSMEv | |||
peerRemoveEv = iota + 256 | |||
stateTimeoutEv | |||
) | |||
``` | |||
### Examples of Scenarios and Termination Handling | |||
A few scenarios are covered in this section together with the current/proposed handling.
In general, the scenarios involving faulty peers are made worse by the fact that they may quickly be re-added. | |||
#### 1. No Tall Peers | |||
S: In this scenario a node is started and while there are status responses received, none of the peers are at a height higher than this node. | |||
H: The FSM times out in `waitForPeer` state, moves to `finished` state where it calls the reactor to switch to consensus. | |||
#### 2. Typical Fast Sync | |||
S: A node fast syncs blocks from honest peers and eventually downloads and executes the penultimate block. | |||
H: The FSM in `waitForBlock` state will receive the processedBlockEv from the reactor and detect that the termination height is achieved. | |||
#### 3. Peer Claims Big Height but no Blocks | |||
S: In this scenario a faulty peer claims a big height (for which there are no blocks). | |||
H: The requests for the non-existing block will time out, the peer is removed and the pool's `MaxPeerHeight` is updated. The FSM checks if the termination height is achieved when peers are removed.
#### 4. Highest Peer Removed or Updated to Short | |||
S: The fast-sync node is caught up with all peers except one tall peer. The tall peer is removed or it sends a status response with a lower height.
H: FSM checks termination condition on peer removal and updates. | |||
#### 5. Block At Current Height Delayed | |||
S: A peer can block the progress of fast sync by delaying indefinitely the block response for the current processing height (h1). | |||
H: Currently, given two outstanding requests to the same peer for heights h1 < h2, there is no enforcement at the peer level that the response for h1 should be received before the response for h2. So a peer will time out only after delivering all blocks except h1. However, the `waitForBlock` state timer fires if the block for the current processing height is not received within a timeout. The peer is removed and the requests to that peer (including the one for the current height) are redone.
## Blockchain Reactor v0 Modules | |||
### Blockchain Reactor | |||
- coordinates the pool for syncing
- coordinates the store for persistence
- coordinates applying blocks to the app using an `sm.BlockExecutor`
- handles switching between fast-sync and consensus
- it is a `p2p.BaseReactor`
- starts the pool (`pool.Start()`) and its `poolRoutine()`
- registers all the concrete types and interfaces for serialisation
#### poolRoutine | |||
- listens to these channels:
  - the pool requests blocks from a specific peer by posting to `requestsCh`; the blockchain reactor then sends
    a `&bcBlockRequestMessage` for a specific height
  - the pool signals a timeout of a specific peer by posting to `timeoutsCh`
  - `switchToConsensusTicker` to periodically try and switch to consensus
  - `trySyncTicker` to periodically check if we have fallen behind and then catch-up sync
    - if there aren't any new blocks available on the pool it skips syncing
- tries to sync the app by taking downloaded blocks from the pool, giving them to the app and storing
  them on disk
- implements `Receive`, which is called by the switch/peer
  - calls `AddBlock` on the pool when it receives a new block from a peer
### Block Pool | |||
- responsible for downloading blocks from peers | |||
- makeRequestersRoutine() | |||
- removes timeout peers | |||
- starts new requesters by calling makeNextRequester() | |||
- requestRoutine(): | |||
- picks a peer and sends the request, then blocks until: | |||
- pool is stopped by listening to pool.Quit | |||
- requester is stopped by listening to Quit | |||
- request is redone | |||
- we receive a block | |||
- gotBlockCh is strange | |||
### Go Routines in Blockchain Reactor | |||
![Go Routines Diagram](img/bc-reactor-routines.png) |
# Blockchain Reactor | |||
The Blockchain Reactor's high level responsibility is to enable peers who are | |||
far behind the current state of the consensus to quickly catch up by downloading | |||
many blocks in parallel, verifying their commits, and executing them against the | |||
ABCI application. | |||
Tendermint full nodes run the Blockchain Reactor as a service to provide blocks | |||
to new nodes. New nodes run the Blockchain Reactor in "fast_sync" mode, | |||
where they actively make requests for more blocks until they sync up. | |||
Once caught up, "fast_sync" mode is disabled and the node switches to | |||
using (and turns on) the Consensus Reactor. | |||
## Message Types | |||
```go | |||
const ( | |||
msgTypeBlockRequest = byte(0x10) | |||
msgTypeBlockResponse = byte(0x11) | |||
msgTypeNoBlockResponse = byte(0x12) | |||
msgTypeStatusResponse = byte(0x20) | |||
msgTypeStatusRequest = byte(0x21) | |||
) | |||
``` | |||
```go | |||
type bcBlockRequestMessage struct { | |||
Height int64 | |||
} | |||
type bcNoBlockResponseMessage struct { | |||
Height int64 | |||
} | |||
type bcBlockResponseMessage struct { | |||
Block Block | |||
} | |||
type bcStatusRequestMessage struct {
    Height int64
}

type bcStatusResponseMessage struct {
    Height int64
}
``` | |||
## Architecture and algorithm | |||
The Blockchain reactor is organised as a set of concurrent tasks: | |||
- Receive routine of Blockchain Reactor | |||
- Task for creating Requesters | |||
- Set of Requesters tasks
- Controller task
![Blockchain Reactor Architecture Diagram](img/bc-reactor.png) | |||
### Data structures | |||
These are the core data structures necessary to provide the Blockchain Reactor logic.
The Requester data structure is used to track the assignment of a request for a `block` at position `height` to a peer with id `peerID`.
```go | |||
type Requester struct {
mtx Mutex | |||
block Block | |||
height int64 | |||
peerID p2p.ID | |||
    redoChannel chan p2p.ID // redo may be sent multiple times; peerID is used to identify a repeat
} | |||
``` | |||
The Pool is a core data structure that stores the last executed block (`height`), the assignment of requests to peers (`requesters`), the current height and number of pending requests for each peer (`peers`), the maximum peer height, etc.
```go | |||
type Pool struct {
mtx Mutex | |||
requesters map[int64]*Requester | |||
height int64 | |||
peers map[p2p.ID]*Peer | |||
maxPeerHeight int64 | |||
numPending int32 | |||
store BlockStore | |||
requestsChannel chan<- BlockRequest | |||
errorsChannel chan<- peerError | |||
} | |||
``` | |||
The Peer data structure stores, for each peer, the current `height` and the number of pending requests sent to the peer (`numPending`), etc.
```go | |||
type Peer struct { | |||
id p2p.ID | |||
height int64 | |||
numPending int32 | |||
timeout *time.Timer | |||
didTimeout bool | |||
} | |||
``` | |||
BlockRequest is an internal data structure used to denote the current mapping of a request for a block at some `height` to a peer (`PeerID`).
```go | |||
type BlockRequest struct {
Height int64 | |||
PeerID p2p.ID | |||
} | |||
``` | |||
### Receive routine of Blockchain Reactor | |||
It is executed upon message reception on the BlockchainChannel inside the p2p receive routine. There is a separate p2p receive routine (and therefore a separate receive routine of the Blockchain Reactor) executed for each peer. Note that "try to send" does not block (it returns immediately) if the outgoing buffer is full.
```go | |||
handleMsg(pool, m): | |||
upon receiving bcBlockRequestMessage m from peer p: | |||
block = load block for height m.Height from pool.store | |||
if block != nil then | |||
try to send BlockResponseMessage(block) to p | |||
else | |||
try to send bcNoBlockResponseMessage(m.Height) to p | |||
upon receiving bcBlockResponseMessage m from peer p: | |||
pool.mtx.Lock() | |||
requester = pool.requesters[m.Height] | |||
if requester == nil then | |||
error("peer sent us a block we didn't expect") | |||
continue | |||
if requester.block == nil and requester.peerID == p then | |||
requester.block = m | |||
pool.numPending -= 1 // atomic decrement | |||
peer = pool.peers[p] | |||
if peer != nil then | |||
peer.numPending-- | |||
if peer.numPending == 0 then | |||
peer.timeout.Stop() | |||
// NOTE: we don't send Quit signal to the corresponding requester task! | |||
else | |||
trigger peer timeout to expire after peerTimeout | |||
pool.mtx.Unlock() | |||
upon receiving bcStatusRequestMessage m from peer p: | |||
try to send bcStatusResponseMessage(pool.store.Height) | |||
upon receiving bcStatusResponseMessage m from peer p: | |||
pool.mtx.Lock() | |||
peer = pool.peers[p] | |||
if peer != nil then | |||
peer.height = m.height | |||
else | |||
peer = create new Peer data structure with id = p and height = m.Height | |||
pool.peers[p] = peer | |||
if m.Height > pool.maxPeerHeight then | |||
pool.maxPeerHeight = m.Height | |||
pool.mtx.Unlock() | |||
onTimeout(p): | |||
send error message to pool error channel | |||
peer = pool.peers[p] | |||
peer.didTimeout = true | |||
``` | |||
### Requester tasks | |||
Requester task is responsible for fetching a single block at position `height`. | |||
```go | |||
fetchBlock(height, pool): | |||
while true do { | |||
peerID = nil | |||
block = nil | |||
peer = pickAvailablePeer(height) | |||
peerID = peer.id | |||
enqueue BlockRequest(height, peerID) to pool.requestsChannel | |||
redo = false | |||
while !redo do | |||
select { | |||
upon receiving Quit message do | |||
return | |||
upon receiving redo message with id on redoChannel do | |||
if peerID == id { | |||
mtx.Lock() | |||
pool.numPending++ | |||
redo = true | |||
mtx.UnLock() | |||
} | |||
} | |||
} | |||
pickAvailablePeer(height): | |||
selectedPeer = nil | |||
  while selectedPeer == nil do
pool.mtx.Lock() | |||
for each peer in pool.peers do | |||
if !peer.didTimeout and peer.numPending < maxPendingRequestsPerPeer and peer.height >= height then | |||
peer.numPending++ | |||
selectedPeer = peer | |||
break | |||
pool.mtx.Unlock() | |||
    if selectedPeer == nil then
sleep requestIntervalMS | |||
return selectedPeer | |||
``` | |||
### Task for creating Requesters | |||
This task is responsible for continuously creating and starting Requester tasks. | |||
```go | |||
createRequesters(pool): | |||
while true do | |||
if !pool.isRunning then break | |||
if pool.numPending < maxPendingRequests or size(pool.requesters) < maxTotalRequesters then | |||
pool.mtx.Lock() | |||
nextHeight = pool.height + size(pool.requesters) | |||
requester = create new requester for height nextHeight | |||
pool.requesters[nextHeight] = requester | |||
pool.numPending += 1 // atomic increment | |||
start requester task | |||
pool.mtx.Unlock() | |||
else | |||
sleep requestIntervalMS | |||
pool.mtx.Lock() | |||
for each peer in pool.peers do | |||
if !peer.didTimeout && peer.numPending > 0 && peer.curRate < minRecvRate then | |||
send error on pool error channel | |||
peer.didTimeout = true | |||
if peer.didTimeout then | |||
for each requester in pool.requesters do | |||
if requester.getPeerID() == peer then | |||
enqueue msg on requestor's redoChannel | |||
delete(pool.peers, peerID) | |||
pool.mtx.Unlock() | |||
``` | |||
### Main blockchain reactor controller task | |||
```go | |||
main(pool): | |||
create trySyncTicker with interval trySyncIntervalMS | |||
create statusUpdateTicker with interval statusUpdateIntervalSeconds | |||
create switchToConsensusTicker with interval switchToConsensusIntervalSeconds | |||
while true do | |||
select { | |||
upon receiving BlockRequest(Height, Peer) on pool.requestsChannel: | |||
try to send bcBlockRequestMessage(Height) to Peer | |||
upon receiving error(peer) on errorsChannel: | |||
stop peer for error | |||
upon receiving message on statusUpdateTickerChannel: | |||
broadcast bcStatusRequestMessage(bcR.store.Height) // message sent in a separate routine | |||
upon receiving message on switchToConsensusTickerChannel: | |||
pool.mtx.Lock() | |||
receivedBlockOrTimedOut = pool.height > 0 || (time.Now() - pool.startTime) > 5 Seconds | |||
ourChainIsLongestAmongPeers = pool.maxPeerHeight == 0 || pool.height >= pool.maxPeerHeight | |||
haveSomePeers = size of pool.peers > 0 | |||
pool.mtx.Unlock() | |||
if haveSomePeers && receivedBlockOrTimedOut && ourChainIsLongestAmongPeers then | |||
switch to consensus mode | |||
upon receiving message on trySyncTickerChannel: | |||
for i = 0; i < 10; i++ do | |||
pool.mtx.Lock() | |||
firstBlock = pool.requesters[pool.height].block | |||
          secondBlock = pool.requesters[pool.height+1].block
if firstBlock == nil or secondBlock == nil then continue | |||
pool.mtx.Unlock() | |||
verify firstBlock using LastCommit from secondBlock | |||
if verification failed | |||
pool.mtx.Lock() | |||
peerID = pool.requesters[pool.height].peerID | |||
redoRequestsForPeer(peerId) | |||
delete(pool.peers, peerID) | |||
stop peer peerID for error | |||
pool.mtx.Unlock() | |||
else | |||
delete(pool.requesters, pool.height) | |||
save firstBlock to store | |||
pool.height++ | |||
execute firstBlock | |||
} | |||
redoRequestsForPeer(pool, peerId): | |||
for each requester in pool.requesters do | |||
if requester.getPeerID() == peerID | |||
enqueue msg on redoChannel for requester | |||
``` | |||
## Channels | |||
Defines `maxMsgSize` for the maximum size of incoming messages, and
`SendQueueCapacity` and `RecvBufferCapacity` for the maximum sending and
receiving buffers respectively. These are supposed to prevent amplification
attacks by setting an upper limit on how much data we can receive and send to
a peer.
Sending incorrectly encoded data will result in stopping the peer. |
# Consensus Reactor | |||
Consensus Reactor defines a reactor for the consensus service. It contains the ConsensusState service that | |||
manages the state of the Tendermint consensus internal state machine. | |||
When Consensus Reactor is started, it starts Broadcast Routine which starts ConsensusState service. | |||
Furthermore, for each peer that is added to the Consensus Reactor, it creates (and manages) the known peer state | |||
(that is used extensively in gossip routines) and starts the following three routines for the peer p: | |||
Gossip Data Routine, Gossip Votes Routine and QueryMaj23Routine. Finally, Consensus Reactor is responsible | |||
for decoding messages received from a peer and for adequate processing of the message depending on its type and content. | |||
The processing normally consists of updating the known peer state and for some messages | |||
(`ProposalMessage`, `BlockPartMessage` and `VoteMessage`) also forwarding message to ConsensusState module | |||
for further processing. In the following text we specify the core functionality of those separate units of execution
that are part of the Consensus Reactor. | |||
## ConsensusState service | |||
Consensus State handles execution of the Tendermint BFT consensus algorithm. It processes votes and proposals, | |||
and upon reaching agreement, commits blocks to the chain and executes them against the application. | |||
The internal state machine receives input from peers, the internal validator and from a timer. | |||
Inside Consensus State we have the following units of execution: Timeout Ticker and Receive Routine. | |||
Timeout Ticker is a timer that schedules timeouts conditional on the height/round/step that are processed | |||
by the Receive Routine. | |||
### Receive Routine of the ConsensusState service | |||
Receive Routine of the ConsensusState handles messages which may cause internal consensus state transitions. | |||
It is the only routine that updates RoundState that contains internal consensus state. | |||
Updates (state transitions) happen on timeouts, complete proposals, and 2/3 majorities. | |||
It receives messages from peers, internal validators and from Timeout Ticker | |||
and invokes the corresponding handlers, potentially updating the RoundState. | |||
The details of the protocol (together with formal proofs of correctness) implemented by the Receive Routine are
discussed in a separate document. For the purposes of this document,
it is sufficient to understand that the Receive Routine manages and updates the RoundState data structure that is
then extensively used by the gossip routines to determine what information should be sent to peer processes. | |||
## Round State | |||
RoundState defines the internal consensus state. It contains the height, round, round step, the current validator set,
a proposal and proposal block for the current round, the locked round and block (if some block is locked), the set of
received votes, the last commit and the last validator set.
```golang | |||
type RoundState struct { | |||
Height int64 | |||
Round int | |||
Step RoundStepType | |||
Validators ValidatorSet | |||
Proposal Proposal | |||
ProposalBlock Block | |||
ProposalBlockParts PartSet | |||
LockedRound int | |||
LockedBlock Block | |||
LockedBlockParts PartSet | |||
Votes HeightVoteSet | |||
LastCommit VoteSet | |||
LastValidators ValidatorSet | |||
} | |||
``` | |||
Internally, consensus will run as a state machine with the following states: | |||
- RoundStepNewHeight | |||
- RoundStepNewRound | |||
- RoundStepPropose | |||
- RoundStepProposeWait | |||
- RoundStepPrevote | |||
- RoundStepPrevoteWait | |||
- RoundStepPrecommit | |||
- RoundStepPrecommitWait | |||
- RoundStepCommit | |||
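
For reference, these steps form an ordered enumeration, which matters for comparisons such as `prs.Step <= RoundStepPrevote` used by the gossip routines below. A sketch (the actual constants are defined in the consensus types package; the concrete values are an assumption):

```go
package consensus

// RoundStepType enumerates the internal consensus steps in order.
type RoundStepType uint8

const (
    RoundStepNewHeight RoundStepType = iota + 1
    RoundStepNewRound
    RoundStepPropose
    RoundStepProposeWait
    RoundStepPrevote
    RoundStepPrevoteWait
    RoundStepPrecommit
    RoundStepPrecommitWait
    RoundStepCommit
)
```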
## Peer Round State | |||
Peer round state contains the known state of a peer. It is updated by the Receive routine of
Consensus Reactor and by the gossip routines upon sending a message to the peer. | |||
```golang | |||
type PeerRoundState struct { | |||
Height int64 // Height peer is at | |||
Round int // Round peer is at, -1 if unknown. | |||
Step RoundStepType // Step peer is at | |||
Proposal bool // True if peer has proposal for this round | |||
ProposalBlockPartsHeader PartSetHeader | |||
ProposalBlockParts BitArray | |||
ProposalPOLRound int // Proposal's POL round. -1 if none. | |||
ProposalPOL BitArray // nil until ProposalPOLMessage received. | |||
Prevotes BitArray // All votes peer has for this round | |||
Precommits BitArray // All precommits peer has for this round | |||
LastCommitRound int // Round of commit for last height. -1 if none. | |||
LastCommit BitArray // All commit precommits of commit for last height. | |||
CatchupCommitRound int // Round that we have commit for. Not necessarily unique. -1 if none. | |||
CatchupCommit BitArray // All commit precommits peer has for this height & CatchupCommitRound | |||
} | |||
``` | |||
## Receive method of Consensus reactor | |||
The entry point of the Consensus reactor is the receive method. When a message is received from a peer p,
normally the peer round state is updated correspondingly, and some messages
are passed for further processing, for example to the ConsensusState service. We now specify the processing of messages
in the receive method of the Consensus reactor for each message type. In the following message handlers, `rs` and `prs` denote
`RoundState` and `PeerRoundState`, respectively.
### NewRoundStepMessage handler | |||
``` | |||
handleMessage(msg): | |||
if msg is from smaller height/round/step then return | |||
// Just remember these values. | |||
prsHeight = prs.Height | |||
prsRound = prs.Round | |||
prsCatchupCommitRound = prs.CatchupCommitRound | |||
prsCatchupCommit = prs.CatchupCommit | |||
Update prs with values from msg | |||
if prs.Height or prs.Round has been updated then | |||
reset Proposal related fields of the peer state | |||
    if prs.Round has been updated and msg.Round == prsCatchupCommitRound then
        prs.Precommits = prsCatchupCommit
    if prs.Height has been updated then
        if prsHeight+1 == msg.Height && prsRound == msg.LastCommitRound then
            prs.LastCommitRound = msg.LastCommitRound
            prs.LastCommit = prs.Precommits
        else
            prs.LastCommitRound = msg.LastCommitRound
            prs.LastCommit = nil
        Reset prs.CatchupCommitRound and prs.CatchupCommit
``` | |||
### NewValidBlockMessage handler | |||
``` | |||
handleMessage(msg): | |||
if prs.Height != msg.Height then return | |||
if prs.Round != msg.Round && !msg.IsCommit then return | |||
prs.ProposalBlockPartsHeader = msg.BlockPartsHeader | |||
prs.ProposalBlockParts = msg.BlockParts | |||
``` | |||
### HasVoteMessage handler | |||
``` | |||
handleMessage(msg): | |||
if prs.Height == msg.Height then | |||
prs.setHasVote(msg.Height, msg.Round, msg.Type, msg.Index) | |||
``` | |||
### VoteSetMaj23Message handler | |||
``` | |||
handleMessage(msg): | |||
if prs.Height == msg.Height then | |||
        Record in rs that a peer claims to have a ⅔ majority for msg.BlockID
        Send VoteSetBitsMessage showing the votes the node has for that BlockID
``` | |||
### ProposalMessage handler | |||
``` | |||
handleMessage(msg): | |||
if prs.Height != msg.Height || prs.Round != msg.Round || prs.Proposal then return | |||
prs.Proposal = true | |||
if prs.ProposalBlockParts == empty set then // otherwise it is set in NewValidBlockMessage handler | |||
prs.ProposalBlockPartsHeader = msg.BlockPartsHeader | |||
prs.ProposalPOLRound = msg.POLRound | |||
prs.ProposalPOL = nil | |||
Send msg through internal peerMsgQueue to ConsensusState service | |||
``` | |||
### ProposalPOLMessage handler | |||
``` | |||
handleMessage(msg): | |||
if prs.Height != msg.Height or prs.ProposalPOLRound != msg.ProposalPOLRound then return | |||
prs.ProposalPOL = msg.ProposalPOL | |||
``` | |||
### BlockPartMessage handler | |||
``` | |||
handleMessage(msg): | |||
if prs.Height != msg.Height || prs.Round != msg.Round then return | |||
Record in prs that peer has block part msg.Part.Index | |||
    Send msg through internal peerMsgQueue to ConsensusState service
``` | |||
### VoteMessage handler | |||
``` | |||
handleMessage(msg): | |||
    Record in prs that a peer knows the vote with index msg.vote.ValidatorIndex for the particular height and round
    Send msg through internal peerMsgQueue to ConsensusState service
``` | |||
### VoteSetBitsMessage handler | |||
``` | |||
handleMessage(msg): | |||
Update prs for the bit-array of votes peer claims to have for the msg.BlockID | |||
``` | |||
## Gossip Data Routine | |||
It is used to send the following messages to the peer: `BlockPartMessage`, `ProposalMessage` and | |||
`ProposalPOLMessage` on the DataChannel. The gossip data routine is based on the local RoundState (`rs`) | |||
and the known PeerRoundState (`prs`). The routine repeats forever the logic shown below: | |||
``` | |||
1a) if rs.ProposalBlockPartsHeader == prs.ProposalBlockPartsHeader and the peer does not have all the proposal parts then | |||
Part = pick a random proposal block part the peer does not have | |||
Send BlockPartMessage(rs.Height, rs.Round, Part) to the peer on the DataChannel | |||
if send returns true, record that the peer knows the corresponding block Part | |||
Continue | |||
1b) if (0 < prs.Height) and (prs.Height < rs.Height) then | |||
help peer catch up using gossipDataForCatchup function | |||
Continue | |||
1c) if (rs.Height != prs.Height) or (rs.Round != prs.Round) then | |||
Sleep PeerGossipSleepDuration | |||
Continue | |||
// at this point rs.Height == prs.Height and rs.Round == prs.Round | |||
1d) if (rs.Proposal != nil and !prs.Proposal) then | |||
Send ProposalMessage(rs.Proposal) to the peer | |||
if send returns true, record that the peer knows Proposal | |||
if 0 <= rs.Proposal.POLRound then | |||
polRound = rs.Proposal.POLRound | |||
prevotesBitArray = rs.Votes.Prevotes(polRound).BitArray() | |||
Send ProposalPOLMessage(rs.Height, polRound, prevotesBitArray) | |||
Continue | |||
2) Sleep PeerGossipSleepDuration | |||
``` | |||
### Gossip Data For Catchup | |||
This function is responsible for helping the peer catch up if it is at a smaller height (prs.Height < rs.Height).
The function executes the following logic: | |||
if peer does not have all block parts for prs.ProposalBlockPart then | |||
blockMeta = Load Block Metadata for height prs.Height from blockStore | |||
if (!blockMeta.BlockID.PartsHeader == prs.ProposalBlockPartsHeader) then | |||
Sleep PeerGossipSleepDuration | |||
return | |||
Part = pick a random proposal block part the peer does not have | |||
Send BlockPartMessage(prs.Height, prs.Round, Part) to the peer on the DataChannel | |||
if send returns true, record that the peer knows the corresponding block Part | |||
return | |||
else Sleep PeerGossipSleepDuration | |||
## Gossip Votes Routine | |||
It is used to send the following message: `VoteMessage` on the VoteChannel. | |||
The gossip votes routine is based on the local RoundState (`rs`) | |||
and the known PeerRoundState (`prs`). The routine repeats forever the logic shown below: | |||
``` | |||
1a) if rs.Height == prs.Height then | |||
if prs.Step == RoundStepNewHeight then | |||
vote = random vote from rs.LastCommit the peer does not have | |||
Send VoteMessage(vote) to the peer | |||
if send returns true, continue | |||
if prs.Step <= RoundStepPrevote and prs.Round != -1 and prs.Round <= rs.Round then | |||
Prevotes = rs.Votes.Prevotes(prs.Round) | |||
vote = random vote from Prevotes the peer does not have | |||
Send VoteMessage(vote) to the peer | |||
if send returns true, continue | |||
if prs.Step <= RoundStepPrecommit and prs.Round != -1 and prs.Round <= rs.Round then | |||
Precommits = rs.Votes.Precommits(prs.Round) | |||
vote = random vote from Precommits the peer does not have | |||
Send VoteMessage(vote) to the peer | |||
if send returns true, continue | |||
if prs.ProposalPOLRound != -1 then | |||
PolPrevotes = rs.Votes.Prevotes(prs.ProposalPOLRound) | |||
vote = random vote from PolPrevotes the peer does not have | |||
Send VoteMessage(vote) to the peer | |||
if send returns true, continue | |||
1b) if prs.Height != 0 and rs.Height == prs.Height+1 then | |||
vote = random vote from rs.LastCommit the peer does not have
Send VoteMessage(vote) to the peer | |||
if send returns true, continue | |||
1c) if prs.Height != 0 and rs.Height >= prs.Height+2 then | |||
Commit = get commit from BlockStore for prs.Height | |||
vote = random vote from Commit the peer does not have | |||
Send VoteMessage(vote) to the peer | |||
if send returns true, continue | |||
2) Sleep PeerGossipSleepDuration | |||
``` | |||
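The sketch below condenses the step 1a cascade into a single decision: given a simplified peer state, it returns which vote set a random missing vote would be drawn from. The step constants and field names are stand-ins for the real ones, and the actual routine falls through to the next case whenever a send does not succeed.

```go
// Sketch of the step 1a decision order when rs.Height == prs.Height.
package main

import "fmt"

const (
	RoundStepNewHeight = iota
	RoundStepPrevote
	RoundStepPrecommit
)

type peerState struct {
	Step             int
	Round            int
	ProposalPOLRound int
}

// voteSetToGossip names the vote set a random missing vote is drawn from,
// following the order used by the gossip votes routine.
func voteSetToGossip(prs peerState, rsRound int) string {
	switch {
	case prs.Step == RoundStepNewHeight:
		return "LastCommit"
	case prs.Step <= RoundStepPrevote && prs.Round != -1 && prs.Round <= rsRound:
		return "Prevotes(prs.Round)"
	case prs.Step <= RoundStepPrecommit && prs.Round != -1 && prs.Round <= rsRound:
		return "Precommits(prs.Round)"
	case prs.ProposalPOLRound != -1:
		return "Prevotes(prs.ProposalPOLRound)"
	default:
		return "nothing to send"
	}
}

func main() {
	prs := peerState{Step: RoundStepPrevote, Round: 0, ProposalPOLRound: -1}
	fmt.Println(voteSetToGossip(prs, 0)) // Prevotes(prs.Round)
}
```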
## QueryMaj23Routine | |||
It is used to send the following message: `VoteSetMaj23Message`. `VoteSetMaj23Message` is sent to indicate that a given | |||
BlockID has seen +2/3 votes. This routine is based on the local RoundState (`rs`) and the known PeerRoundState | |||
(`prs`). The routine repeats forever the logic shown below. | |||
``` | |||
1a) if rs.Height == prs.Height then | |||
Prevotes = rs.Votes.Prevotes(prs.Round) | |||
if there is a ⅔ majority for some blockId in Prevotes then | |||
m = VoteSetMaj23Message(prs.Height, prs.Round, Prevote, blockId) | |||
Send m to peer | |||
Sleep PeerQueryMaj23SleepDuration | |||
1b) if rs.Height == prs.Height then | |||
Precommits = rs.Votes.Precommits(prs.Round) | |||
if there is a ⅔ majority for some blockId in Precommits then | |||
m = VoteSetMaj23Message(prs.Height,prs.Round,Precommit,blockId) | |||
Send m to peer | |||
Sleep PeerQueryMaj23SleepDuration | |||
1c) if rs.Height == prs.Height and prs.ProposalPOLRound >= 0 then | |||
Prevotes = rs.Votes.Prevotes(prs.ProposalPOLRound) | |||
if there is a ⅔ majority for some blockId in Prevotes then | |||
m = VoteSetMaj23Message(prs.Height,prs.ProposalPOLRound,Prevote,blockId)
Send m to peer | |||
Sleep PeerQueryMaj23SleepDuration | |||
1d) if prs.CatchupCommitRound != -1 and 0 < prs.Height and | |||
prs.Height <= blockStore.Height() then | |||
Commit = LoadCommit(prs.Height) | |||
m = VoteSetMaj23Message(prs.Height,Commit.Round,Precommit,Commit.blockId) | |||
Send m to peer | |||
Sleep PeerQueryMaj23SleepDuration | |||
2) Sleep PeerQueryMaj23SleepDuration | |||
``` | |||
## Broadcast routine | |||
The Broadcast routine subscribes to an internal event bus to receive new round step and vote events, and broadcasts messages to peers upon receiving those
events.
It broadcasts `NewRoundStepMessage` or `CommitStepMessage` upon a new round state event. Note that
broadcasting these messages does not depend on the PeerRoundState; they are sent on the StateChannel.
Upon receiving a `VoteMessage` it broadcasts a `HasVoteMessage` to its peers on the StateChannel.
## Channels | |||
Defines 4 channels: state, data, vote and vote_set_bits. Each channel
has a `SendQueueCapacity` and a `RecvBufferCapacity`, and a
`RecvMessageCapacity` set to `maxMsgSize`.
Sending incorrectly encoded data will result in stopping the peer. |
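A self-contained sketch of how such channel descriptors could look is shown below. The `ChannelDescriptor` struct is a local stand-in for the p2p package's type, and the channel IDs, priorities and capacities are illustrative placeholders rather than the values used by the implementation.

```go
// Sketch of declaring the four reactor channels with per-channel capacities.
package main

import "fmt"

type ChannelDescriptor struct {
	ID                  byte
	Priority            int
	SendQueueCapacity   int
	RecvBufferCapacity  int
	RecvMessageCapacity int
}

const maxMsgSize = 1048576 // placeholder bound on a decoded message

func channels() []ChannelDescriptor {
	return []ChannelDescriptor{
		{ID: 0x20, Priority: 5, SendQueueCapacity: 100, RecvBufferCapacity: 4096, RecvMessageCapacity: maxMsgSize},  // state
		{ID: 0x21, Priority: 10, SendQueueCapacity: 100, RecvBufferCapacity: 4096, RecvMessageCapacity: maxMsgSize}, // data
		{ID: 0x22, Priority: 5, SendQueueCapacity: 100, RecvBufferCapacity: 4096, RecvMessageCapacity: maxMsgSize},  // vote
		{ID: 0x23, Priority: 1, SendQueueCapacity: 2, RecvBufferCapacity: 1024, RecvMessageCapacity: maxMsgSize},    // vote_set_bits
	}
}

func main() {
	for _, ch := range channels() {
		fmt.Printf("channel 0x%x: recv message capacity %d\n", ch.ID, ch.RecvMessageCapacity)
	}
}
```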
@ -0,0 +1,184 @@ | |||
# Tendermint Consensus Reactor | |||
Tendermint Consensus is a distributed protocol executed by validator processes to agree on
the next block to be added to the Tendermint blockchain. The protocol proceeds in rounds, where
each round is an attempt to reach agreement on the next block. A round starts with a dedicated
process (called the proposer) suggesting the next block to the other processes with
the `ProposalMessage`.
The processes respond by voting for a block with a `VoteMessage` (there are two kinds of vote
messages: prevotes and precommits). Note that a proposal message is just a suggestion of what the
next block should be; a validator might vote with a `VoteMessage` for a different block. If, in some
round, enough processes vote for the same block, then this block is committed and later
added to the blockchain. `ProposalMessage` and `VoteMessage` are signed by the private key of the
validator. The internals of the protocol and how it ensures safety and liveness properties are
explained in a forthcoming document.
For efficiency reasons, validators in the Tendermint consensus protocol do not agree directly on the
block, as blocks can be large; that is, they don't embed the block inside `Proposal` and
`VoteMessage`. Instead, they reach agreement on the `BlockID` (see the `BlockID` definition in the
[Blockchain](https://github.com/tendermint/tendermint/blob/master/docs/spec/blockchain/blockchain.md#blockid) section) that uniquely identifies each block. The block itself is
disseminated to validator processes using a peer-to-peer gossiping protocol: the
proposer first splits the block into a number of block parts, which are then gossiped between
processes using `BlockPartMessage`.
Validators in Tendermint communicate by a peer-to-peer gossiping protocol. Each validator is connected
only to a subset of processes called peers. Through the gossiping protocol, a validator sends its peers
all the information (`ProposalMessage`, `VoteMessage` and `BlockPartMessage`) they need to
reach agreement on some block and to obtain the content of the chosen block (block parts). As
part of the gossiping protocol, processes also send auxiliary messages that inform peers about the
executed steps of the core consensus algorithm (`NewRoundStepMessage` and `NewValidBlockMessage`), as
well as messages that inform peers which votes the process has seen (`HasVoteMessage`,
`VoteSetMaj23Message` and `VoteSetBitsMessage`). These messages are then used in the gossiping
protocol to determine what messages a process should send to its peers.
We now describe the content of each message exchanged during the Tendermint consensus protocol.
## ProposalMessage | |||
ProposalMessage is sent when a new block is proposed. It is a suggestion of what the | |||
next block in the blockchain should be. | |||
```go | |||
type ProposalMessage struct { | |||
Proposal Proposal | |||
} | |||
``` | |||
### Proposal | |||
Proposal contains the height and round for which this proposal is made, the BlockID as a unique identifier
of the proposed block, a timestamp, and a POLRound (a so-called Proof-of-Lock (POL) round) that is needed for
termination of the consensus. If POLRound >= 0, then BlockID corresponds to the block that
was locked in POLRound. The message is signed by the validator's private key.
```go | |||
type Proposal struct { | |||
Height int64 | |||
Round int | |||
POLRound int | |||
BlockID BlockID | |||
Timestamp Time | |||
Signature Signature | |||
} | |||
``` | |||
## VoteMessage | |||
VoteMessage is sent to vote for some block (or to inform others that a process does not vote in the
current round). Vote is defined in the [Blockchain](https://github.com/tendermint/tendermint/blob/master/docs/spec/blockchain/blockchain.md#blockid) section and contains the validator's
information (validator address and index), the height and round for which the vote is sent, the vote type,
the BlockID if the process votes for some block (`nil` otherwise) and a timestamp for when the vote is sent. The
message is signed by the validator's private key.
```go | |||
type VoteMessage struct { | |||
Vote Vote | |||
} | |||
``` | |||
## BlockPartMessage | |||
BlockPartMessage is sent when gossiping a piece of the proposed block. It contains height, round
and the block part. | |||
```go | |||
type BlockPartMessage struct { | |||
Height int64 | |||
Round int | |||
Part Part | |||
} | |||
``` | |||
## NewRoundStepMessage | |||
NewRoundStepMessage is sent for every step transition during the core consensus algorithm execution. | |||
It is used in the gossip part of the Tendermint protocol to inform peers about the current
height/round/step a process is in.
```go | |||
type NewRoundStepMessage struct { | |||
Height int64 | |||
Round int | |||
Step RoundStepType | |||
SecondsSinceStartTime int | |||
LastCommitRound int | |||
} | |||
``` | |||
## NewValidBlockMessage | |||
NewValidBlockMessage is sent when a validator observes a valid block B in some round r,
i.e., there is a Proposal for block B and 2/3+ prevotes for block B in round r.
It contains the height and round in which the valid block was observed, the block parts header that describes
the valid block and is used to obtain all
block parts, and a bit array of the block parts the process currently has, so its peers know which
parts it is missing and can send them.
If the block is also committed, the IsCommit flag is set to true.
```go | |||
type NewValidBlockMessage struct { | |||
Height int64 | |||
Round int | |||
BlockPartsHeader PartSetHeader | |||
BlockParts BitArray | |||
IsCommit bool | |||
} | |||
``` | |||
## ProposalPOLMessage | |||
ProposalPOLMessage is sent when a previously proposed block is re-proposed.
It is used to inform peers in which round the process learned of this block (ProposalPOLRound),
and which prevotes for the re-proposed block the process has.
```go | |||
type ProposalPOLMessage struct { | |||
Height int64 | |||
ProposalPOLRound int | |||
ProposalPOL BitArray | |||
} | |||
``` | |||
## HasVoteMessage | |||
HasVoteMessage is sent to indicate that a particular vote has been received. It contains height, | |||
round, vote type and the index of the validator that is the originator of the corresponding vote. | |||
```go | |||
type HasVoteMessage struct { | |||
Height int64 | |||
Round int | |||
Type byte | |||
Index int | |||
} | |||
``` | |||
## VoteSetMaj23Message | |||
VoteSetMaj23Message is sent to indicate that a process has seen +2/3 votes for some BlockID. | |||
It contains height, round, vote type and the BlockID. | |||
```go | |||
type VoteSetMaj23Message struct { | |||
Height int64 | |||
Round int | |||
Type byte | |||
BlockID BlockID | |||
} | |||
``` | |||
## VoteSetBitsMessage | |||
VoteSetBitsMessage is sent to communicate the bit-array of votes a process has seen for a given | |||
BlockID. It contains height, round, vote type, BlockID and a bit array of | |||
the votes a process has. | |||
```go | |||
type VoteSetBitsMessage struct { | |||
Height int64 | |||
Round int | |||
Type byte | |||
BlockID BlockID | |||
Votes BitArray | |||
} | |||
``` |
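As a rough illustration, the bit-array carried by `VoteSetBitsMessage` can be thought of as one bit per validator index, set when a vote for the given BlockID has been seen. The sketch below shows that derivation using simplified stand-in types rather than the real Vote and BitArray types.

```go
// Sketch: derive a vote-set bit-array for a given BlockID.
package main

import "fmt"

type vote struct {
	ValidatorIndex int
	BlockIDHash    string
}

// voteSetBits returns a bit-array (as []bool) over numValidators entries;
// bit i is set when a vote for blockIDHash from validator i has been seen.
func voteSetBits(votes []vote, blockIDHash string, numValidators int) []bool {
	bits := make([]bool, numValidators)
	for _, v := range votes {
		if v.BlockIDHash == blockIDHash {
			bits[v.ValidatorIndex] = true
		}
	}
	return bits
}

func main() {
	seen := []vote{{0, "AB"}, {2, "AB"}, {3, "CD"}}
	fmt.Println(voteSetBits(seen, "AB", 4)) // [true false true false]
}
```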
@ -0,0 +1,291 @@ | |||
# Proposer selection procedure in Tendermint | |||
This document specifies the Proposer Selection Procedure that is used in Tendermint to choose a round proposer.
As Tendermint is a “leader-based” protocol, the proposer selection is critical for its correct functioning.
At a given block height, the proposer selection algorithm runs with the same validator set at each round.
Between heights, an updated validator set may be specified by the application as part of the ABCIResponses' EndBlock.
## Requirements for Proposer Selection | |||
This section covers the requirements, with Rx denoting mandatory requirements and Ox optional ones.
The following requirements must be met by the Proposer Selection procedure: | |||
#### R1: Determinism | |||
Given a validator set `V`, and two honest validators `p` and `q`, for each height `h` and each round `r` the following must hold: | |||
`proposer_p(h,r) = proposer_q(h,r)` | |||
where `proposer_p(h,r)` is the proposer returned by the Proposer Selection Procedure at process `p`, at height `h` and round `r`. | |||
#### R2: Fairness | |||
Given a validator set with total voting power P and a sequence S of elections, in any sub-sequence of S with length C*P a validator v must be elected as proposer C*VP(v) times, i.e. with frequency:
f(v) ~ VP(v) / P | |||
where C is a tolerance factor for validator set changes with the following values:
- C == 1 if there are no validator set changes | |||
- C ~ k when there are validator changes | |||
*[this needs more work]* | |||
### Basic Algorithm | |||
At its core, the proposer selection procedure uses a weighted round-robin algorithm. | |||
A model that gives a good intuition of how and why the selection algorithm works, and why it is fair, is that of a priority queue. The validators move ahead in this queue according to their voting power (the higher the voting power the faster a validator moves towards the head of the queue). When the algorithm runs the following happens:
- all validators move "ahead" according to their powers: for each validator, increase the priority by the voting power | |||
- first in the queue becomes the proposer: select the validator with highest priority | |||
- move the proposer back in the queue: decrease the proposer's priority by the total voting power | |||
Notation: | |||
- vset - the validator set | |||
- n - the number of validators | |||
- VP(i) - voting power of validator i | |||
- A(i) - accumulated priority for validator i | |||
- P - total voting power of set | |||
- avg - average of all validator priorities | |||
- prop - proposer | |||
A simplified view of the Selection Algorithm:
``` | |||
def ProposerSelection (vset): | |||
// compute priorities and elect proposer | |||
for each validator i in vset: | |||
A(i) += VP(i) | |||
prop = max(A) | |||
A(prop) -= P | |||
``` | |||
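The following Go sketch is a direct, runnable rendering of the algorithm above for a fixed validator set; the struct and function names are illustrative rather than the implementation's, and ties are broken by simply keeping the first validator in the slice.

```go
// Sketch of the basic weighted round-robin proposer selection.
package main

import "fmt"

type validator struct {
	Name     string
	VP       int64 // voting power
	Priority int64 // accumulated priority A(i)
}

// proposerSelection advances all priorities by the voting power, elects the
// validator with the highest priority, and moves it back by the total power P.
func proposerSelection(vset []*validator) *validator {
	var total int64
	for _, v := range vset {
		v.Priority += v.VP
		total += v.VP
	}
	prop := vset[0]
	for _, v := range vset[1:] {
		if v.Priority > prop.Priority {
			prop = v
		}
	}
	prop.Priority -= total
	return prop
}

func main() {
	vset := []*validator{{Name: "p1", VP: 1}, {Name: "p2", VP: 3}}
	for i := 0; i < 8; i++ {
		fmt.Print(proposerSelection(vset).Name, " ")
	}
	fmt.Println() // p2 p1 p2 p2 p2 p1 p2 p2 (matches the fairness claim later on)
}
```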
### Stable Set | |||
Consider the validator set: | |||
Validator | p1| p2 | |||
----------|---|--- | |||
VP | 1 | 3 | |||
Assuming no validator changes, the following table shows the proposer priority computation over a few runs. Four runs of the selection procedure are shown; starting with the 5th run, the same values are computed.
Each row shows the priority queue and each process's place in it. The proposer is the one closest to the head, i.e. the rightmost validator. As priorities are updated, the validators move right in the queue. The proposer moves left as its priority is reduced after election.
|Priority Run | -2| -1| 0 | 1| 2 | 3 | 4 | 5 | Alg step | |||
|--------------- |---|---|---- |---|---- |---|---|---|-------- | |||
| | | |p1,p2| | | | | |Initialized to 0 | |||
|run 1 | | | | p1| | p2| | |A(i)+=VP(i) | |||
| | | p2| | p1| | | | |A(p2)-= P | |||
|run 2 | | | | |p1,p2| | | |A(i)+=VP(i) | |||
| | p1| | | | p2| | | |A(p1)-= P | |||
|run 3 | | p1| | | | | | p2|A(i)+=VP(i) | |||
| | | p1| | p2| | | | |A(p2)-= P | |||
|run 4 | | | p1| | | | p2| |A(i)+=VP(i) | |||
| | | |p1,p2| | | | | |A(p2)-= P | |||
It can be shown that: | |||
- At the end of each run k+1 the sum of the priorities is the same as at the end of run k. If a new set's priorities are initialized to 0 then the sum of priorities will be 0 at each run while there are no changes.
- The max distance between priorities is (n-1) * P. *[formal proof not finished]*
### Validator Set Changes | |||
Between proposer selection runs the validator set may change. Some changes have implications on the proposer election. | |||
#### Voting Power Change | |||
Consider again the earlier example and assume that the voting power of p1 is changed to 4: | |||
Validator | p1| p2 | |||
----------|---| --- | |||
VP | 4 | 3 | |||
Let's also assume that before this change the proposer priorities were as shown in the first row (last run). As can be seen, the selection could run again, without changes, as before.
|Priority Run| -2 | -1 | 0 | 1 | 2 | Comment | |||
|--------------| ---|--- |------|--- |--- |-------- | |||
| last run | | p2 | | p1 | |__update VP(p1)__ | |||
| next run | | | | | p2 |A(i)+=VP(i) | |||
| | p1 | | | | p2 |A(p1)-= P | |||
However, when a validator changes power from a high to a low value, some other validators may remain far back in the queue for a long time. This scenario is considered again in the Proposer Priority Range section.
As before: | |||
- At the end of each run k+1 the sum of the priorities is the same as at the end of run k.
- The max distance between priorities is (n-1) * P.
#### Validator Removal | |||
Consider a new example with set: | |||
Validator | p1 | p2 | p3 | | |||
--------- |--- |--- |--- | | |||
VP | 1 | 2 | 3 | | |||
Let's assume that after the last run the proposer priorities were as shown in the first row with their sum being 0. After p2 is removed, at the end of the next proposer selection run (penultimate row) the sum of priorities is -2 (minus the priority of the removed process).
The procedure could continue without modifications. However, after a sufficiently large number of modifications to the validator set, the priority values would migrate towards the maximum or minimum allowed values, causing truncations due to overflow detection.
For this reason, the selection procedure adds another __new step__ that centers the current priority values such that the priority sum remains close to 0. | |||
|Priority Run |-3 | -2 | -1 | 0 | 1 | 2 | 4 |Comment | |||
|--------------- |--- | ---|--- |--- |--- |--- |---|-------- | |||
| last run |p3 | | | | p1 | p2 | |__remove p2__ | |||
| next run | | | | | | | |
| __new step__ | | p3 | | | | p1 | |A(i) -= avg, avg = -1 | |||
| | | | | | p3 | p1 | |A(i)+=VP(i) | |||
| | | | p1 | | p3 | | |A(p1)-= P | |||
The modified selection algorithm is:

```
def ProposerSelection (vset):

    // center priorities around zero
    avg = sum(A(i) for i in vset)/len(vset)
    for each validator i in vset:
        A(i) -= avg

    // compute priorities and elect proposer
    for each validator i in vset:
        A(i) += VP(i)
    prop = max(A)
    A(prop) -= P
```
Observations: | |||
- The sum of priorities is now close to 0. Due to integer division the sum is an integer in (-n, n), where n is the number of validators. | |||
#### New Validator | |||
When a new validator is added, the same problem as the one described for removal appears: the sum of priorities in the new set is not zero. This is fixed with the centering step introduced above.
One other issue that needs to be addressed is the following. A validator V that has just been elected is moved to the end of the queue. If the validator set is large and/or other validators have significantly higher power, V will have to wait many runs to be elected. If V removes and re-adds itself to the set, it would make a significant (albeit unfair) "jump" ahead in the queue.
In order to prevent this, when a new validator is added, its initial priority is set to: | |||
A(V) = -1.125 * P | |||
where P is the total voting power of the set including V. | |||
The current implementation uses a penalty factor of 1.125 because it provides a small punishment that is efficient to calculate. See [here](https://github.com/tendermint/tendermint/pull/2785#discussion_r235038971) for more details.
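A plausible integer-arithmetic rendering of this penalty is `-(P + (P >> 3))`, which equals `-1.125 * P` and is cheap to compute; the exact expression used by the implementation may differ, so treat the sketch below as illustrative. It is consistent with the note further down about penalty loss under integer division.

```go
// Sketch of the initial priority assigned to a newly added validator,
// assuming the 1.125 factor is applied as P + (P >> 3) in integer arithmetic.
package main

import "fmt"

// initialPriority returns A(V) = -1.125 * P computed with integers,
// where totalPower already includes the new validator's power.
func initialPriority(totalPower int64) int64 {
	return -(totalPower + (totalPower >> 3))
}

func main() {
	// Example from the text: p3 joins a set with total power 1 + 3 + 8 = 12.
	fmt.Println(initialPriority(12)) // -13
}
```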
If we consider the validator set where p3 has just been added: | |||
Validator | p1 | p2 | p3 | |||
----------|--- |--- |--- | |||
VP | 1 | 3 | 8 | |||
then p3 will start with proposer priority: | |||
A(p3) = -1.125 * (1 + 3 + 8) ~ -13 | |||
Note that since the current computation uses integer division, part of the penalty is lost when the sum of the voting power is less than 8.
In the next run, p3 will still be ahead in the queue, elected as proposer and moved back in the queue. | |||
|Priority Run |-13 | -9 | -5 | -2 | -1 | 0 | 1 | 2 | 5 | 6 | 7 |Alg step | |||
|---------------|--- |--- |--- |----|--- |--- |---|---|---|---|---|-------- | |||
|last run | | | | p2 | | | | p1| | | |__add p3__ | |||
| | p3 | | | p2 | | | | p1| | | |A(p3) = -13
|next run | | p3 | | | | | | p2| | p1| |A(i) -= avg, avg = -4 | |||
| | | | | | p3 | | | | p2| | p1|A(i)+=VP(i) | |||
| | | | p1 | | p3 | | | | p2| | |A(p1)-=P | |||
### Proposer Priority Range | |||
With the introduction of centering, some interesting cases occur. Low-power validators that join early in a set that includes high-power validator(s) benefit from subsequent additions to the set. This is because these early validators go through more right-shift operations during centering, operations that increase their priority.
As an example, consider the set where p2 is added after p1, with initial priority -1.125 * 80k = -90k. After the selection procedure runs once:
Validator | p1 | p2 | Comment | |||
----------|-----|---- |--- | |||
VP | 80k | 10 | | |||
A | 0 |-90k | __added p2__ | |||
A | 45k |-45k | __run selection__
Then execute the following steps: | |||
1. Add a new validator p3: | |||
Validator | p1 | p2 | p3 | |||
----------|-----|--- |---- | |||
VP | 80k | 10 | 10 | |||
2. Run selection once. The notation '..p'/'p..' means a very small deviation from the column priority.
|Priority Run | -90k..| -60k | -45k | -15k| 0 | 45k | 75k | 155k | Comment | |||
|--------------|------ |----- |------- |---- |---|---- |----- |------- |--------- | |||
| last run | p3 | | p2 | | | p1 | | | __added p3__ | |||
| next run | |||
| *right_shift*| | p3 | | p2 | | | p1 | | A(i) -= avg,avg=-30k | |||
| | | ..p3| | ..p2| | | | p1 | A(i)+=VP(i) | |||
| | | ..p3| | ..p2| | | p1.. | | A(p1)-=P, P=80k+20 | |||
3. Remove p1 and run selection once: | |||
Validator | p3 | p2 | Comment | |||
----------|----- |---- |-------- | |||
VP | 10 | 10 | | |||
A |-60k |-15k | | |||
A |-22.5k|22.5k| __run selection__ | |||
At this point, while the total voting power is 20, the distance between priorities is 45k. It will take 4500 runs for p3 to catch up with p2. | |||
In order to prevent these types of scenarios, the selection algorithm performs scaling of priorities such that the difference between min and max values is smaller than two times the total voting power. | |||
The modified selection algorithm is:

```
def ProposerSelection (vset):

    // scale the priority values
    diff = max(A)-min(A)
    threshold = 2 * P
    if diff > threshold:
        scale = diff/threshold
        for each validator i in vset:
            A(i) = A(i)/scale

    // center priorities around zero
    avg = sum(A(i) for i in vset)/len(vset)
    for each validator i in vset:
        A(i) -= avg

    // compute priorities and elect proposer
    for each validator i in vset:
        A(i) += VP(i)
    prop = max(A)
    A(prop) -= P
```
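Below is a runnable Go sketch of the full procedure (scaling, centering, election) using plain int64 arithmetic; it is illustrative only and omits the overflow clamping discussed in the Wrinkles section.

```go
// Sketch of proposer selection with scaling and centering.
package main

import "fmt"

type validator struct {
	Name     string
	VP       int64
	Priority int64
}

func totalPower(vset []*validator) int64 {
	var p int64
	for _, v := range vset {
		p += v.VP
	}
	return p
}

func proposerSelection(vset []*validator) *validator {
	p := totalPower(vset)

	// Scale the priority values so that max(A) - min(A) stays bounded by 2 * P.
	min, max := vset[0].Priority, vset[0].Priority
	for _, v := range vset[1:] {
		if v.Priority < min {
			min = v.Priority
		}
		if v.Priority > max {
			max = v.Priority
		}
	}
	if diff, threshold := max-min, 2*p; diff > threshold {
		scale := diff / threshold
		for _, v := range vset {
			v.Priority /= scale
		}
	}

	// Center priorities around zero.
	var sum int64
	for _, v := range vset {
		sum += v.Priority
	}
	avg := sum / int64(len(vset))
	for _, v := range vset {
		v.Priority -= avg
	}

	// Compute priorities and elect the proposer.
	var prop *validator
	for _, v := range vset {
		v.Priority += v.VP
		if prop == nil || v.Priority > prop.Priority {
			prop = v
		}
	}
	prop.Priority -= p
	return prop
}

func main() {
	// Example from the Proposer Priority Range section: p2 was just added
	// with priority ~ -90k next to a much more powerful p1.
	vset := []*validator{{Name: "p1", VP: 80000}, {Name: "p2", VP: 10, Priority: -90000}}
	fmt.Println(proposerSelection(vset).Name)       // p1
	fmt.Println(vset[0].Priority, vset[1].Priority) // ~45k and ~-45k
}
```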
Observations: | |||
- With this modification, the maximum distance between priorities becomes 2 * P.
Note also that even during steady state the priority range may increase beyond 2 * P. The scaling introduced here helps to keep the range bounded. | |||
### Wrinkles | |||
#### Validator Power Overflow Conditions | |||
The validator voting power is a positive number stored as an int64. When a validator is added, the `1.125 * P` computation must not overflow. As a consequence, the code handling validator updates (add and update) checks for overflow conditions, making sure the total voting power is never larger than the largest int64 `MAX` with the property that `1.125 * MAX` is still within the bounds of int64. A fatal error is returned when an overflow condition is detected.
#### Proposer Priority Overflow/ Underflow Handling | |||
The proposer priority is stored as an int64. The selection algorithm performs additions and subtractions on these values, and in the case of overflows and underflows it limits the values to:

```
MaxInt64 = 1 << 63 - 1
MinInt64 = -1 << 63
```
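A small sketch of such clamping is shown below: additions to a priority saturate at the int64 bounds instead of wrapping around. This is an illustrative helper, not the implementation's code.

```go
// Sketch of overflow/underflow-safe addition of proposer priorities.
package main

import (
	"fmt"
	"math"
)

// safeAdd adds b to a, clamping the result to [MinInt64, MaxInt64]
// on overflow or underflow.
func safeAdd(a, b int64) int64 {
	if b > 0 && a > math.MaxInt64-b {
		return math.MaxInt64
	}
	if b < 0 && a < math.MinInt64-b {
		return math.MinInt64
	}
	return a + b
}

func main() {
	fmt.Println(safeAdd(math.MaxInt64-1, 10))  // clamped to MaxInt64
	fmt.Println(safeAdd(math.MinInt64+1, -10)) // clamped to MinInt64
	fmt.Println(safeAdd(5, 7))                 // 12
}
```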
### Requirement Fulfillment Claims | |||
__[R1]__ | |||
The proposer algorithm is deterministic, giving consistent results across executions with the same transactions and validator set modifications.
[WIP - needs more detail] | |||
__[R2]__ | |||
Given a set of processes with the total voting power P, during a sequence of elections of length P, the number of times any process is selected as proposer is equal to its voting power. The sequence of the P proposers then repeats. If we consider the validator set: | |||
Validator | p1| p2 | |||
----------|---|--- | |||
VP | 1 | 3 | |||
With no other changes to the validator set, the current implementation of proposer selection generates the sequence: | |||
`p2, p1, p2, p2, p2, p1, p2, p2,...` or [`p2, p1, p2, p2`]* | |||
A sequence that starts with any circular permutation of the [`p2, p1, p2, p2`] sub-sequence would also provide the same degree of fairness. In fact these circular permutations show up in a sliding window (over the generated sequence) of size equal to the length of the sub-sequence.
Assigning priorities to each validator based on the voting power and updating them at each run ensures the fairness of the proposer selection. In addition, every time a validator is elected as proposer its priority is decreased by the total voting power.
Intuitively, a process v needs at most (max(A) - min(A))/VP(v) runs to move ahead in the queue, reach the head, and be elected. The frequency is then:

f(v) ~ VP(v)/(max(A)-min(A)) = 1/k * VP(v)/P

For the current implementation, this means v should be the proposer at least VP(v) times out of k * P runs, with scaling factor k=2.
@ -0,0 +1,10 @@ | |||
# Evidence Reactor | |||
## Channels | |||
[#1503](https://github.com/tendermint/tendermint/issues/1503) | |||
Sending invalid evidence will result in stopping the peer. | |||
Sending incorrectly encoded data or data exceeding `maxMsgSize` will result | |||
in stopping the peer. |
@ -0,0 +1,8 @@ | |||
# Mempool Concurrency | |||
Look at the concurrency model this uses... | |||
- Receiving CheckTx | |||
- Broadcasting new tx | |||
- Interfaces with consensus engine, reap/update while checking | |||
- Calling the ABCI app (ordering, callbacks, how the proxy works alongside the blockchain proxy which actually writes blocks)
@ -0,0 +1,54 @@ | |||
# Mempool Configuration | |||
Here we describe configuration options around the mempool.
For the purposes of this document, they are described
as command-line flags, but they can also be passed in as
environment variables or in the config.toml file. The
following are all equivalent:
Flag: `--mempool.recheck=false` | |||
Environment: `TM_MEMPOOL_RECHECK=false` | |||
Config: | |||
``` | |||
[mempool] | |||
recheck = false | |||
``` | |||
## Recheck | |||
`--mempool.recheck=false` (default: true) | |||
Recheck determines if the mempool rechecks all pending
transactions after a block is committed. Once a block
is committed, the mempool removes all valid transactions
that were successfully included in the block.
If `recheck` is true, then the mempool will rerun CheckTx on
all remaining transactions against the new block state.
## Broadcast | |||
`--mempool.broadcast=false` (default: true) | |||
Determines whether this node gossips any valid transactions
that arrive in the mempool. The default is to gossip anything that
passes CheckTx. If this is disabled, transactions are not
gossiped, but are instead stored locally and added to the next
block this node proposes.
## WalDir | |||
`--mempool.wal_dir=/tmp/gaia/mempool.wal` (default: $TM_HOME/data/mempool.wal) | |||
This defines the directory where the mempool writes its write-ahead
logs. These files can be used to reload unbroadcast
transactions if the node crashes.
If the directory passed in is an absolute path, the wal file is
created there. If the directory is a relative path, the path is
appended to the home directory of the tendermint process to
generate an absolute path to the wal directory
(default `$HOME/.tendermint`, or set via `TM_HOME` or `--home`).
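An illustrative sketch of that path resolution rule (not the actual config code) could look like this:

```go
// Sketch: resolve a relative wal_dir against the Tendermint home directory.
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// resolveWalDir returns walDir unchanged if it is absolute, and otherwise
// joins it onto the node's home directory.
func resolveWalDir(walDir, homeDir string) string {
	if filepath.IsAbs(walDir) {
		return walDir
	}
	return filepath.Join(homeDir, walDir)
}

func main() {
	home := os.Getenv("TM_HOME")
	if home == "" {
		home = filepath.Join(os.Getenv("HOME"), ".tendermint")
	}
	fmt.Println(resolveWalDir("data/mempool.wal", home))
	fmt.Println(resolveWalDir("/tmp/gaia/mempool.wal", home))
}
```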
@ -0,0 +1,43 @@ | |||
# Mempool Functionality | |||
The mempool maintains a list of potentially valid transactions, | |||
both to broadcast to other nodes, as well as to provide to the | |||
consensus reactor when it is selected as the block proposer. | |||
There are two sides to the mempool state: | |||
- External: get, check, and broadcast new transactions | |||
- Internal: return valid transaction, update list after block commit | |||
## External functionality | |||
External functionality is exposed via network interfaces | |||
to potentially untrusted actors. | |||
- CheckTx - triggered via RPC or P2P | |||
- Broadcast - gossip messages after a successful check | |||
## Internal functionality | |||
Internal functionality is exposed via method calls to other | |||
code compiled into the tendermint binary. | |||
- ReapMaxBytesMaxGas - get txs to propose in the next block. Guarantees that the | |||
size of the txs is less than MaxBytes, and gas is less than MaxGas | |||
- Update - remove txs that were included in the last block
- ABCI.CheckTx - call the ABCI app to validate the tx
What does it provide the consensus reactor? | |||
What guarantees does it need from the ABCI app? | |||
(talk about interleaving processes in concurrency) | |||
## Optimizations | |||
The implementation within this library also implements a tx cache. | |||
This is so that signatures don't have to be reverified if the tx has | |||
already been seen before. | |||
However, we only store valid txs in the cache, not invalid ones. | |||
This is because invalid txs could become valid later.
Txs that are included in a block aren't removed from the cache,
as they may still be getting received over the p2p network.
These txs are stored in the cache by their hash, to mitigate memory concerns. |
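A much simplified sketch of such a hash-keyed, bounded cache is shown below; the eviction policy and size here are illustrative, not the mempool's actual cache implementation.

```go
// Sketch of a bounded tx cache keyed by hash.
package main

import (
	"container/list"
	"crypto/sha256"
	"fmt"
)

type txCache struct {
	size  int
	seen  map[[sha256.Size]byte]*list.Element
	queue *list.List // oldest hashes at the front
}

func newTxCache(size int) *txCache {
	return &txCache{size: size, seen: make(map[[sha256.Size]byte]*list.Element), queue: list.New()}
}

// Push records tx and reports whether it was already cached.
func (c *txCache) Push(tx []byte) (alreadySeen bool) {
	h := sha256.Sum256(tx)
	if _, ok := c.seen[h]; ok {
		return true
	}
	if c.queue.Len() >= c.size { // evict the oldest entry to bound memory
		oldest := c.queue.Front()
		delete(c.seen, oldest.Value.([sha256.Size]byte))
		c.queue.Remove(oldest)
	}
	c.seen[h] = c.queue.PushBack(h)
	return false
}

func main() {
	cache := newTxCache(10000)
	fmt.Println(cache.Push([]byte("tx1"))) // false: first time seen
	fmt.Println(cache.Push([]byte("tx1"))) // true: cached, skip re-checking signatures
}
```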
@ -0,0 +1,61 @@ | |||
# Mempool Messages | |||
## P2P Messages | |||
There is currently only one message that Mempool broadcasts | |||
and receives over the p2p gossip network (via the reactor): | |||
`TxMessage` | |||
```go | |||
// TxMessage is a MempoolMessage containing a transaction. | |||
type TxMessage struct { | |||
Tx types.Tx | |||
} | |||
``` | |||
TxMessage is go-amino encoded and prepended with `0x1` as a | |||
"type byte". This is followed by a go-amino encoded byte-slice. | |||
The prefix of a 40 = 0x28 byte tx is: `0x010128...`, followed by
the actual 40-byte tx. The prefix of a 350 = 0x015e byte tx is:
`0x0102015e...`, followed by the actual 350-byte tx.
(Please see the [go-amino repo](https://github.com/tendermint/go-amino#an-interface-example) for more information) | |||
## RPC Messages | |||
Mempool exposes `CheckTx([]byte)` over the RPC interface. | |||
It can be posted via `broadcast_tx_commit`, `broadcast_tx_sync` or
`broadcast_tx_async`. They all parse a message with one argument,
`"tx": "HEX_ENCODED_BINARY"` and differ only in how long they
wait before returning (sync makes sure CheckTx passes, commit
makes sure it was included in a signed block).
Request (`POST http://gaia.zone:26657/`): | |||
```json | |||
{ | |||
"id": "", | |||
"jsonrpc": "2.0", | |||
"method": "broadcast_sync", | |||
"params": { | |||
"tx": "F012A4BC68..." | |||
} | |||
} | |||
``` | |||
Response: | |||
```json | |||
{ | |||
"error": "", | |||
"result": { | |||
"hash": "E39AAB7A537ABAA237831742DCE1117F187C3C52", | |||
"log": "", | |||
"data": "", | |||
"code": 0 | |||
}, | |||
"id": "", | |||
"jsonrpc": "2.0" | |||
} | |||
``` |
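For illustration, the same request could be sent from Go roughly as follows; the endpoint URL is the example address used above and the tx bytes are placeholders.

```go
// Sketch: posting a broadcast_tx_sync JSON-RPC request to a node.
package main

import (
	"bytes"
	"fmt"
	"io/ioutil"
	"net/http"
)

func main() {
	body := []byte(`{"jsonrpc":"2.0","id":"","method":"broadcast_tx_sync","params":{"tx":"F012A4BC68"}}`)

	resp, err := http.Post("http://gaia.zone:26657/", "application/json", bytes.NewReader(body))
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()

	out, _ := ioutil.ReadAll(resp.Body)
	fmt.Println(string(out)) // JSON-RPC response with hash, log, data and code
}
```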
@ -0,0 +1,22 @@ | |||
# Mempool Reactor | |||
## Channels | |||
See [this issue](https://github.com/tendermint/tendermint/issues/1503) | |||
Mempool maintains a cache of the last 10000 transactions to prevent | |||
replaying old transactions (plus transactions coming from other | |||
validators, who are continually exchanging transactions). Read [Replay | |||
Protection](../../../app-dev/app-development.md#replay-protection) | |||
for details. | |||
Sending incorrectly encoded data or data exceeding `maxMsgSize` will result | |||
in stopping the peer. | |||
The mempool will not send a tx back to any peer from which it received it.
The reactor assigns a `uint16` number to each peer and maintains a map from
p2p.ID to `uint16`. Each mempool transaction carries a list of all its senders
(`[]uint16`). The list is updated every time the mempool receives a transaction
it has already seen. Using `uint16` assumes that a node will never have more than 65535 active
peers (0 is reserved for an unknown source - e.g. RPC).
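A minimal sketch of this sender-tracking scheme is shown below; the types are simplified stand-ins for the implementation's, and id reuse and the 65535 cap are only hinted at in comments.

```go
// Sketch: per-tx sender tracking with compact uint16 peer ids.
package main

import "fmt"

const unknownSource uint16 = 0 // reserved, e.g. txs arriving over RPC

type mempoolIDs struct {
	next   uint16
	byPeer map[string]uint16 // p2p ID -> compact uint16 id
}

func newMempoolIDs() *mempoolIDs {
	return &mempoolIDs{next: unknownSource + 1, byPeer: make(map[string]uint16)}
}

func (m *mempoolIDs) idFor(peer string) uint16 {
	if id, ok := m.byPeer[peer]; ok {
		return id
	}
	id := m.next
	m.next++ // a real implementation must also reuse ids and cap them at 65535
	m.byPeer[peer] = id
	return id
}

type memTx struct {
	tx      []byte
	senders map[uint16]bool
}

func main() {
	ids := newMempoolIDs()
	tx := &memTx{tx: []byte("tx1"), senders: map[uint16]bool{}}

	// The same tx arrives twice, from a peer and then via RPC.
	tx.senders[ids.idFor("peerA")] = true
	tx.senders[unknownSource] = true

	peerA := ids.idFor("peerA")
	fmt.Println(tx.senders[peerA]) // true -> do not send this tx back to peerA
}
```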
@ -0,0 +1,132 @@ | |||
# Peer Strategy and Exchange | |||
Here we outline the design of the AddressBook | |||
and how it is used by the Peer Exchange Reactor (PEX) to ensure we are connected
to good peers and to gossip peers to others. | |||
## Peer Types | |||
Certain peers are special in that they are specified by the user as `persistent`, | |||
which means we auto-redial them if the connection fails, or if we fail to dial | |||
them. | |||
Some peers can be marked as `private`, which means | |||
we will not put them in the address book or gossip them to others. | |||
All peers except private peers and peers coming from them are tracked using the | |||
address book. | |||
The rest of our peers are only distinguished by being either | |||
inbound (they dialed our public address) or outbound (we dialed them). | |||
## Discovery | |||
Peer discovery begins with a list of seeds. | |||
When we don't have enough peers, we | |||
1. ask existing peers | |||
2. dial seeds if we're not dialing anyone currently | |||
On startup, we will also immediately dial the given list of `persistent_peers`, | |||
and will attempt to maintain persistent connections with them. If the | |||
connections die, or we fail to dial, we will redial every 5s for a few minutes, | |||
then switch to an exponential backoff schedule, and after about a day of | |||
trying, stop dialing the peer. | |||
As long as we have fewer than `MaxNumOutboundPeers`, we periodically request
additional peers from each of our own peers, and also try the seeds.
## Listening | |||
Peers listen on a configurable ListenAddr that they self-report in their | |||
NodeInfo during handshakes with other peers. Peers accept up to | |||
`MaxNumInboundPeers` incoming peers. | |||
## Address Book | |||
Peers are tracked via their ID (their PubKey.Address()). | |||
Peers are added to the address book from the PEX when they first connect to us or | |||
when we hear about them from other peers. | |||
The address book is arranged in sets of buckets, and distinguishes between | |||
vetted (old) and unvetted (new) peers. It keeps different sets of buckets for vetted and | |||
unvetted peers. Buckets provide randomization over peer selection. Peers are put | |||
in buckets according to their IP groups. | |||
A vetted peer can only be in one bucket. An unvetted peer can be in multiple buckets, and | |||
each instance of the peer can have a different IP:PORT. | |||
If we're trying to add a new peer but there's no space in its bucket, we'll | |||
remove the worst peer from that bucket to make room. | |||
## Vetting | |||
When a peer is first added, it is unvetted. | |||
Marking a peer as vetted is outside the scope of the `p2p` package. | |||
For Tendermint, a Peer becomes vetted once it has contributed sufficiently | |||
at the consensus layer; i.e. once it has sent us valid and not-yet-known
votes and/or block parts for `NumBlocksForVetted` blocks. | |||
Other users of the p2p package can determine their own conditions for when a peer is marked vetted. | |||
If a peer becomes vetted but there are already too many vetted peers, | |||
a randomly selected one of the vetted peers becomes unvetted. | |||
If a peer becomes unvetted (either a new peer, or one that was previously vetted), | |||
a randomly selected one of the unvetted peers is removed from the address book. | |||
More fine-grained tracking of peer behaviour can be done using | |||
a trust metric (see below), but it's best to start with something simple. | |||
## Select Peers to Dial | |||
When we need more peers, we pick addresses randomly from the addrbook with some | |||
configurable bias for unvetted peers. The bias should be lower when we have | |||
fewer peers and can increase as we obtain more, ensuring that our first peers | |||
are more trustworthy, but always giving us the chance to discover new good | |||
peers. | |||
We track the last time we dialed a peer and the number of unsuccessful attempts | |||
we've made. If too many attempts are made, we mark the peer as bad. | |||
Connection attempts are made with exponential backoff (plus jitter). Because | |||
the selection process happens every `ensurePeersPeriod`, we might not end up | |||
dialing a peer for much longer than the backoff duration. | |||
If we fail to connect to the peer after 16 tries (with exponential backoff), we
remove it from the address book completely.
## Select Peers to Exchange | |||
When we’re asked for peers, we select them as follows: | |||
- select at most `maxGetSelection` peers | |||
- try to select at least `minGetSelection` peers - if we have fewer than that, select them all.
- select a random, unbiased `getSelectionPercent` of the peers | |||
Send the selected peers. Note we select peers for sending without bias for vetted/unvetted. | |||
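An illustrative sketch of this selection rule is shown below, assuming placeholder values for `maxGetSelection`, `minGetSelection` and `getSelectionPercent`; it is not the address book's code.

```go
// Sketch: choose a random, unbiased subset of known addresses to send a peer.
package main

import (
	"fmt"
	"math/rand"
)

const (
	maxGetSelection     = 250 // placeholder upper bound
	minGetSelection     = 32  // placeholder lower bound
	getSelectionPercent = 23  // placeholder percentage
)

// selectPeersToSend returns a random selection of addresses within the bounds.
func selectPeersToSend(addrs []string) []string {
	n := len(addrs) * getSelectionPercent / 100
	if n < minGetSelection {
		n = minGetSelection
	}
	if n > maxGetSelection {
		n = maxGetSelection
	}
	if n > len(addrs) { // fewer peers than the minimum: select them all
		n = len(addrs)
	}
	rand.Shuffle(len(addrs), func(i, j int) { addrs[i], addrs[j] = addrs[j], addrs[i] })
	return addrs[:n]
}

func main() {
	addrs := []string{"1.2.3.4:26656", "5.6.7.8:26656", "9.9.9.9:26656"}
	fmt.Println(selectPeersToSend(addrs)) // all 3, since 3 < minGetSelection
}
```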
## Preventing Spam | |||
There are various cases where we decide a peer has misbehaved and we disconnect from them. | |||
When this happens, the peer is removed from the address book and blacklisted for
some amount of time. We call this "Disconnect and Mark". | |||
Note that the bad behaviour may be detected outside the PEX reactor itself | |||
(for instance, in the mconnection, or another reactor), but it must be communicated to the PEX reactor | |||
so it can remove and mark the peer. | |||
In the PEX, if a peer sends us an unsolicited list of peers, | |||
or if the peer sends a request too soon after another one, | |||
we Disconnect and MarkBad. | |||
## Trust Metric | |||
The quality of peers can be tracked in more fine-grained detail using a | |||
Proportional-Integral-Derivative (PID) controller that incorporates | |||
current, past, and rate-of-change data to inform peer quality. | |||
While a PID trust metric has been implemented, it remains for future work | |||
to use it in the PEX. | |||
See the [trust metric](https://github.com/tendermint/tendermint/blob/master/docs/architecture/adr-006-trust-metric.md)
and [trust metric usage](https://github.com/tendermint/tendermint/blob/master/docs/architecture/adr-007-trust-metric-usage.md)
architecture docs for more details. |
@ -0,0 +1,12 @@ | |||
# PEX Reactor | |||
## Channels | |||
Defines only `SendQueueCapacity`. [#1503](https://github.com/tendermint/tendermint/issues/1503) | |||
Implements rate-limiting by enforcing minimal time between two consecutive | |||
`pexRequestMessage` requests. If the peer sends us addresses we did not ask for,
it is stopped. | |||
Sending incorrectly encoded data or data exceeding `maxMsgSize` will result | |||
in stopping the peer. |