diff --git a/PHILOSOPHY.md b/PHILOSOPHY.md new file mode 100644 index 000000000..cf79710fd --- /dev/null +++ b/PHILOSOPHY.md @@ -0,0 +1,158 @@ +## Design goals + +The design goals for Tendermint (and the SDK and related libraries) are: + + * Simplicity and Legibility + * Parallel performance, namely ability to utilize multicore architecture + * Ability to evolve the codebase bug-free + * Debuggability + * Complete correctness that considers all edge cases, esp in concurrency + * Future-proof modular architecture, message protocol, APIs, and encapsulation + + +### Justification + +Legibility is key to maintaining bug-free software as it evolves toward more +optimizations, more ease of debugging, and additional features. + +It is too easy to introduce bugs over time by replacing lines of code with +those that may panic, which means ideally locks are unlocked by defer +statements. + +For example, + +```go +func (obj *MyObj) something() { + mtx.Lock() + obj.something = other + mtx.Unlock() +} +``` + +It is too easy to refactor the codebase in the future to replace `other` with +`other.String()` for example, and this may introduce a bug that causes a +deadlock. So as much as reasonably possible, we need to be using defer +statements, even though it introduces additional overhead. + +If it is necessary to optimize the unlocking of mutex locks, the solution is +more modularity via smaller functions, so that defer'd unlocks are scoped +within a smaller function. + +Similarly, idiomatic for-loops should always be preferred over those that use +custom counters, because it is too easy to evolve the body of a for-loop to +become more complicated over time, and it becomes more and more difficult to +assess the correctness of such a for-loop by visual inspection. + + +### On performance + +It doesn't matter whether there are alternative implementations that are 2x or +3x more performant, when the software doesn't work, deadlocks, or if bugs +cannot be debugged. By taking advantage of multicore concurrency, the +Tendermint implementation will at least be an order of magnitude within the +range of what is theoretically possible. The design philosophy of Tendermint, +and the choice of Go as implementation language, is designed to make Tendermint +implementation the standard specification for concurrent BFT software. + +By focusing on the message protocols (e.g. ABCI, p2p messages), and +encapsulation e.g. IAVL module, (relatively) independent reactors, we are both +implementing a standard implementation to be used as the specification for +future implementations in more optimizable languages like Rust, Java, and C++; +as well as creating sufficiently performant software. Tendermint Core will +never be as fast as future implementations of the Tendermint Spec, because Go +isn't designed to be as fast as possible. The advantage of using Go is that we +can develop the whole stack of modular components **faster** than in other +languages. + +Furthermore, the real bottleneck is in the application layer, and it isn't +necessary to support more than a sufficiently decentralized set of validators +(e.g. 100 ~ 300 validators is sufficient, with delegated bonded PoS). + +Instead of optimizing Tendermint performance down to the metal, lets focus on +optimizing on other matters, namely ability to push feature complete software +that works well enough, can be debugged and maintained, and can serve as a spec +for future implementations. + + +### On encapsulation + +In order to create maintainable, forward-optimizable software, it is critical +to develop well-encapsulated objects that have well understood properties, and +to re-use these easy-to-use-correctly components as building blocks for further +encapsulated meta-objects. + +For example, mutexes are cheap enough for Tendermint's design goals when there +isn't goroutine contention, so it is encouraged to create concurrency safe +structures with struct-level mutexes. If they are used in the context of +non-concurrent logic, then the performance is good enough. If they are used in +the context of concurrent logic, then it will still perform correctly. + +Examples of this design principle can be seen in the types.ValidatorSet struct, +and the cmn.Rand struct. It's one single struct declaration that can be used +in both concurrent and non-concurrent logic, and due to its well encapsulation, +it's easy to get the usage of the mutex right. + +#### example: cmn.Rand: + +`The default Source is safe for concurrent use by multiple goroutines, but +Sources created by NewSource are not`. The reason why the default +package-level source is safe for concurrent use is because it is protected (see +`lockedSource` in https://golang.org/src/math/rand/rand.go). + +But we shouldn't rely on the global source, we should be creating our own +Rand/Source instances and using them, especially for determinism in testing. +So it is reasonable to have cmn.Rand be protected by a mutex. Whether we want +our own implementation of Rand is another question, but the answer there is +also in the affirmative. Sometimes you want to know where Rand is being used +in your code, so it becomes a simple matter of dropping in a log statement to +inject inspectability into Rand usage. Also, it is nice to be able to extend +the functionality of Rand with custom methods. For these reasons, and for the +reasons which is outlined in this design philosophy document, we should +continue to use the cmn.Rand object, with mutex protection. + +Another key aspect of good encapsulation is the choice of exposed vs unexposed +methods. It should be clear to the reader of the code, which methods are +intended to be used in what context, and what safe usage is. Part of this is +solved by hiding methods via unexported methods. Another part of this is +naming conventions on the methods (e.g. underscores) with good documentation, +and code organization. If there are too many exposed methods and it isn't +clear what methods have what side effects, then there is something wrong about +the design of abstractions that should be revisited. + + +### On concurrency + +In order for Tendermint to remain relevant in the years to come, it is vital +for Tendermint to take advantage of multicore architectures. Due to the nature +of the problem, namely consensus across a concurrent p2p gossip network, and to +handle RPC requests for a large number of consuming subscribers, it is +unavoidable for Tendermint development to require expertise in concurrency +design, especially when it comes to the reactor design, and also for RPC +request handling. + + +## Guidelines + +Here are some guidelines for designing for (sufficient) performance and concurrency: + + * Mutex locks are cheap enough when there isn't contention. + * Do not optimize code without analytical or observed proof that it is in a hot path. + * Don't over-use channels when mutex locks w/ encapsulation are sufficient. + * The need to drain channels are often a hint of unconsidered edge cases. + * The creation of O(N) one-off goroutines is generally technical debt that + needs to get addressed sooner than later. Avoid creating too many +goroutines as a patch around incomplete concurrency design, or at least be +aware of the debt and do not invest in the debt. On the other hand, Tendermint +is designed to have a limited number of peers (e.g. 10 or 20), so the creation +of O(C) goroutines per O(P) peers is still O(C\*P=constant). + * Use defer statements to unlock as much as possible. If you want to unlock sooner, + try to create more modular functions that do make use of defer statements. + +## Matras + +* Premature optimization kills +* Readability is paramount +* Beautiful is better than fast. +* In the face of ambiguity, refuse the temptation to guess. +* In the face of bugs, refuse the temptation to cover the bug. +* There should be one-- and preferably only one --obvious way to do it.