From 6f54fee4dbfa268705265de5345ab8ea8932d0ee Mon Sep 17 00:00:00 2001
From: Marko
Date: Tue, 25 Aug 2020 09:59:00 +0200
Subject: [PATCH] docs: remove interview transcript (#5282)

## Description

Remove interview transcript.

Closes: #XXX
---
 docs/interviews/readme.md         |   4 -
 docs/interviews/tendermint-bft.md | 250 ------------------------------
 2 files changed, 254 deletions(-)
 delete mode 100644 docs/interviews/readme.md
 delete mode 100644 docs/interviews/tendermint-bft.md

diff --git a/docs/interviews/readme.md b/docs/interviews/readme.md
deleted file mode 100644
index cc9223496..000000000
--- a/docs/interviews/readme.md
+++ /dev/null
@@ -1,4 +0,0 @@
----
-parent:
- order: false
----
diff --git a/docs/interviews/tendermint-bft.md b/docs/interviews/tendermint-bft.md
deleted file mode 100644
index 9a28ac853..000000000
--- a/docs/interviews/tendermint-bft.md
+++ /dev/null
@@ -1,250 +0,0 @@
-# Interview Transcript with Tendermint core researcher, Zarko Milosevic, by Chjango
-
-**ZM**: Regarding leader election, it's round robin, but a weighted one. You
-take into account the amount of bonded tokens. Depending on how much weight
-they have of voting power, they would be elected more frequently. So we do
-rotate, but just the guys who are having more voting power would be elected
-more frequently. We are having 4 validators, and 1 of them have 2 times more
-voting power, they have 2 times more elected as a leader.
-
-**CC**: 2x more absolute voting power or probabilistic voting power?
-
-**ZM**: It's actually very deterministic. It's not probabilistic at all. See
-[Tendermint proposal election specification][1]. In Tendermint, there is no
-pseudorandom leader election. It's a deterministic protocol. So leader election
-is a built-in function in the code, so you know exactly—depending on the voting
-power in the validator set, you'd know who exactly would be the leader in round
-x, x + 1, and so on. There is nothing random there; we are not trying to hide
-who would be the leader. It's really well known. It's just that there is a
-function, it's a mathematical function, and it's just basically—it's kind of an
-implementation detail—it starts from the voting power, and when you are
-elected, you get decreased some number, and in each round you keep increasing
-depending on your voting power, so that you are elected after k rounds again.
-But knowing the validator set and the voting power, it's very simple function,
-you can calculate yourself to know exactly who would be next. For each round,
-this function will return you the leader for that round. In every round, we do
-this computation. It's all part of the same flow. It enforces the properties
-which are: proportional to your voting power, you will be elected, and we keep
-changing the leaders. So it can't happen to have one guy being more elected
-than other guys, if they have the same voting power. So one time it will be guy
-B, and next time it will be guy B1. So it's not random.
-
-**CC**: Assuming the validator set remains unchanged for a month, then if you
-run this function, are you able to know exactly who is going to go for that
-entire month?
-
-**ZM**: Yes.
-
-**CC**: What're the attack scenarios for this?
-
-**ZM**: This is something which is easily attacked by people who argue that
-Tendermint is not decentralized enough. They say that by knowing the leader,
-you can DDoS the leader. And by DDoSing the leader, you are able to stop the
-progress. Because it's true. If you would be able to DDoS the leader, the
-leader would not be able to propose and then effectively will not be making
-progress. How we are addressing this thing is Sentry Architecture. So the
-validator—or at least a proper validator—will never be available. You don't
-know the ip address of the validator. You are never able to open the connection
-to the validator. So validator is spawning sentry nodes and this is the single
-administration domain and there is only connection from validator in the sense
-of sentry nodes. And ip address of validator is not shared in the p2p network.
-It’s completely private. This is our answer to DDoS attack. By playing clever
-at this sentry node architecture and spawning additional sentry nodes in case,
-for ex your sentry nodes are being DDoS’d, bc your sentry nodes are public,
-then you will be able to connect to sentry nodes. this is where we will expect
-the validator to be clever enough that so that in case they are DDoS’d at the
-sentry level, they will spawn a different sentry node and then you communicate
-through them. We are in a sense pushing the responsibility on the validator.
-
-**CC**: So if I understand this correctly, the public identity of the validator
-doesn’t even matter because that entity can obfuscate where their real full
-nodes reside via a proxy through this sentry architecture.
-
-**ZM**: Exactly. So you do know what is the address or identity of the validator
-but you don’t know the network address of it; you’re not able to attack it
-because you don’t know where they are. They are completely obfuscated by the
-sentry nodes. There is now, if you really want to figure out….There is the
-Tendermint protocol, the structure of the protocol is not fully decentralized
-in the sense that the flow of information is going from the round proposer, or
-the round coordinator, to other nodes, and then after they receive this it’s
-basically like [inaudible: “O to 1”]. So by tracking where this information is
-coming from, you might be able to identify who are the sentry nodes behind it.
-So if you are doing some network analysis, you might be able to deduce
-something. If the thing would be completely stuck, where the validator would
-never change their sentry nodes or ip addresses of sentry nodes, it could be
-possible to deduce something. This is where economic game comes into play. We
-are doing an economics game there. We say that it’s a validator business. If
-they are not able to hide themselves well enough, they’ll be DDoS’d and they
-will be kicked out of the active validator set. So it’s in their interest.
-
-[Proposer Selection Procedure in Tendermint][1]. This is how it should work no
-matter what implementation.
-
-**CC**: Going back to the proposer, lets say the validator does get DDoS’d, then
-the proposer goes down. What happens?
-
-**ZM**: How the proposal mechanism works—there’s nothing special there—it goes
-through a sequence of rounds. Normal execution of Tendermint is that for each
-height, we are going through a sequence of rounds, starting from round 0, and
-then we are incrementing through the rounds. The nodes are moving through the
-rounds as part of normal procedure until they decide to commit. In case you
-have one proposer—the proposer of a single round—being DDoS’d, we will probably
-not decide in that round, because he will not be able to send his proposal. So
-we will go to the next round, and hopefully the next proposer will be able to
-communicate with the validators and then we’ll decide in the next round.
-
-**CC**: Are there timeouts between one round to another, if a round gets
-skipped?
-
-**ZM**: There are timeouts. It’s a bit more complex. I think we have 5 timeouts.
-We may be able to simplify this a bit. What is important to understand is: The
-only condition which needs to be satisfied so we can go to the next round is
-that your validator is able to communicate with more than 2/3rds of voting
-power. To be able to move to the next round, you need to receive more than
-2/3rd of voting power equivalent of pre-commit messages.
-
-We have two kinds of messages: 1) Proposal: Where the current round proposer is
-suggesting how the next block should look like. This is first one. Every round
-starts with proposer sending a proposal. And then there are two more rounds of
-voting, where the validator is trying to agree whether they will commit the
-proposal or not. And the first of such vote messages is called `pre-vote` and
-the second one is `pre-commit`. Now, to be able to move between steps, between
-a `pre-vote` and `pre-commit` step, you need to receive enough number of
-messages where if message is sent by validator A, then also this message has a
-weight, or voting power which is equal to the voting power of the validator who
-sent this message. Before you receive more than 2/3 of voting power messages, you are not
-able to move to the higher round. Only when you receive more than 2/3 of
-messages, you actually start the timeout. The timeout is happening only after
-you receive enough messages. And it happens because of the asynchrony of the
-message communication so you give more time to guys with this timeout to
-receive some messages which are maybe delayed.
-
-**CC**: In this way that you just described via the whole network gossiping
-before we commit a block, that is what makes Tendermint BFT deterministic in a
-partially synchronous setting vs Bitcoin which has synchrony assumptions
-whereby blocks are first mined and then gossiped to the network.
-
-**ZM**: It's true that in Bitcoin, this is where the synchrony assumption comes
-to play because if they're not able to communicate timely, they are not able to
-converge to a single longest chain. Why are they not able to decrease timeout
-in Bitcoin? Because if they would decrease, there would be so many forks that
-they won't be able to converge to a single chain. By increasing this
-complexity and the block time, they're able to have not so many forks. This is
-effectively the timing assumption—the block duration in a sense because it's
-enough time so that the decided block is propagated through the network before
-someone else start deciding on the same block and creating forks. It's very
-different from the consensus algorithms in a distributed computing setup where
-Tendermint fits. In Tendermint, where we talk about the timing dependency, they
-are really part of this 3-communication step protocol I just explained. We have
-the following assumption: If the good guys are not able to communicate timely
-and reliably without having message loss within a round, the Tendermint will
-not make progress—it will not be making blocks. So if you are in a completely
-asynchronous network where messages get lost or delayed unpredictably,
-Tendermint will not make progress, it will not create forks, but it will not
-decide, it will not tell you what is the next block. For termination, it's a
-liveness property of consensus. It's a guarantee to decide. We do need timing
-assumptions. Within a round, correct validators are able to communicate to each
-other the consensus messages, not the transactions, but consensus messages.
-They need to communicate in a timely and reliable fashion. But this doesn't
-need to hold forever. It's just that what we are assuming when we say it's a
-partially synchronous system, we assume that the system will be going through a
-period of asynchrony, where we don't have this guarantee; the messages will be
-delayed or some will be lost and then will not make progress for some period of
-time, or we're not guaranteed to make progress. And the period of synchrony
-where these guarantees hold. And if we think about internet, internet is best
-described using such a model. Sometimes when we send a message to SF to
-Belgrade, it takes 100 ms, sometimes it takes 300 ms, sometimes it takes 1 s.
-But in most cases, it takes 100 ms or less than this.
-
-There is one thing which would be really nice if you understand it. In a global
-wide area network, we can't make assumption on the communication unless we are
-very conservative about this. If you want to be very fast, then we can't make
-assumption and say we'll be for sure communicating with 1 ms communication
-delay. Because of the complexity and various congestion issues on the network,
-it might happen that during a short period of time, this doesn't hold. If this
-doesn't hold and you depend on this for correctness of your protocol, you will
-have a fork. So the partially synchronous protocol, most of them like
-Tendermint, they don't depend on the timing assumption from the internet for
-correctness. This is where we state: safety always. So we never make a fork no
-matter how bad our estimates about the internet communication delays are. We'll
-never make a fork, but we do make some assumptions, and these assumptions are
-built-in our timeouts in our protocol which are actually adaptive. So we are
-adapting to the current condition and this is where we're saying...We do assume
-some properties, or some communication delays, to eventually hold on the
-network. During this period, we guarantee that we will be deciding and
-committing blocks. And we will be doing this very fast. We will be basically on
-the speed of the current network.
-
-**CC**: We make liveness assumptions based on the integrity of the validator
-businesses, assuming they're up and running fine.
-
-**ZM**: This is where we are saying, the protocol will be live if we have at
-most 1/3, or a bit less than 1/3, of faulty validators. Which means that all
-other guys should be online and available. This is also for liveness. This is
-related to the condition that we are not able to make progress in rounds if we
-don't receive enough messages. If half of our voting power, or half of our
-validators are down, we don't have enough messages, so the protocol is
-completely blocked. It doesn't make progress in a round, which means it's not
-able to be signed. So it's completely critical for Tendermint that we make
-progress in rounds. It's like breathing. Tendermint is breathing. If there is
-no progress, it's dead; it's blocked, we're not able to breathe, that's why
-we're not able to make progress.
-
-**CC**: How does Tendermint compare to other consensus algos?
-
-**ZM**: Tendermint is a very interesting protocol. From an academic point of
-view, I'm convinced that there is value there. Hopefully, we prove it by
-publishing it on some good conference. What is novel is, if we compare first
-Tendermint to this existing BFT problem, it's a continuation of academic
-research on BFT consensus. What is novel in Tendermint is that it somehow
-merges consensus protocol with gossip. This is completely novel idea.
-Originally, in BFT, people were assuming the single administration domain,
-small number of nodes, local area network, 4-7 nodes max. If you look at the
-research paper, 99% of them have this kind of setup. Wide area was studied but
-there is significantly less work in wide area networks. No one studied how to
-scale those protocols to hundreds or thousands of nodes before blockchain. It
-was always a single administration domain. So in Tendermint now, you are able
-to reach consensus among different administration domains which are potentially
-hundreds of them in wide area network. The system model is potentially harder
-because we have more nodes and wide area network. The second thing is that:
-normally, in bft protocols, the protocol itself are normally designed in a way
-that has two phases, or two parts. The one which is called normal case, which
-is normally quite simple, in this normal case. In spite of some failures, which
-are part of the normal execution of the protocol, like for example leader
-crashes or leader being DDoS'd, they need to go through a quite complex
-protocol, which is like being called view change or leader election or
-whatever. These two parts of the same protocol are having quite different
-complexity. And most of the people only understand this normal case. In
-Tendermint, there is no this difference. We have only one protocol, there are
-not two protocols. It's always the same steps and they are much closer to the
-normal case than this complex view change protocol.
-
-_This is a bit too technical but this is on a high level things to remember,
-that: The system it addresses it's harder than the others and the algorithm
-complexity in Tendermint is simpler._ The initial goal of Jae and Bucky which
-is inspired by Raft, is that it's simpler so normal engineers could understand.
-
-**CC**: Can you expand on the termination requirement?
-
-> **Important point about Liveness in Tendermint**
-
-**ZM**: In Tendermint, we are saying, for termination, we are making assumption
-that the system is partially synchronous. And in a partially synchronous system
-model, we are able to mathematically prove that the protocol will make
-decisions; it will decide.
-
-**CC**: What is a persistent peer?
-
-**ZM**: It's a list of peer identities, which you will try to establish
-connection to them, in case connection is broken, Tendermint will automatically
-try to reestablish connection. These are important peers, you will really try
-persistently to establish connection to them. For other peers, you just drop it
-and try from your address book to connect to someone else. The address book is a
-list of peers which you discover that they exist, because we are talking about a
-very dynamic network—so the nodes are coming and going away—and the gossiping
-protocol is discovering new nodes and gossiping them around. So every node will
-keep the list of new nodes it discovers, and when you need to establish
-connection to a peer, you'll look to address book and get some addresses from
-there. There's categorization/ranking of nodes there.
-
-[1]: https://docs.tendermint.com/master/spec/reactors/consensus/proposer-selection.html