## Design goals

The design goals for Tendermint (and the SDK and related libraries) are:

* Simplicity and Legibility
* Parallel performance, namely the ability to utilize multicore architectures
* Ability to evolve the codebase bug-free
* Debuggability
* Complete correctness that considers all edge cases, especially in concurrency
* Future-proof modular architecture, message protocol, APIs, and encapsulation

### Justification

Legibility is key to maintaining bug-free software as it evolves toward more
optimizations, easier debugging, and additional features.

It is too easy to introduce bugs over time by replacing lines of code with
lines that may panic, which means that, ideally, locks should be unlocked by
defer statements.

For example:
```go
func (obj *MyObj) something() {
	mtx.Lock()
	obj.something = other // if a future edit makes this line panic, mtx is never unlocked
	mtx.Unlock()
}
```

It is too easy to refactor the codebase in the future to replace `other` with
`other.String()` for example, and this may introduce a bug that causes a
deadlock. So, as much as reasonably possible, we need to be using defer
statements, even though they introduce additional overhead.
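
A minimal sketch of the safer form, reusing the names from the example above:
the deferred unlock runs even if a later edit makes the body panic.

```go
func (obj *MyObj) something() {
	mtx.Lock()
	defer mtx.Unlock() // runs on every return path, including panics

	obj.something = other
}
```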

If it is necessary to optimize the unlocking of mutex locks, the solution is
more modularity via smaller functions, so that defer'd unlocks are scoped
within a smaller function.
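
For instance (the `process`/`setSomething` split and the `Other` type here are
hypothetical, not an existing API), the deferred unlock can be scoped to a
small setter so the caller never holds the lock longer than necessary:

```go
func (obj *MyObj) process() {
	obj.setSomething(other)
	// ... continue with expensive work without holding the lock ...
}

// setSomething holds mtx only for the duration of the assignment.
func (obj *MyObj) setSomething(val Other) {
	mtx.Lock()
	defer mtx.Unlock()

	obj.something = val
}
```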

Similarly, idiomatic for-loops should always be preferred over those that use
custom counters, because it is too easy for the body of a for-loop to become
more complicated over time, and it becomes increasingly difficult to assess
the correctness of such a loop by visual inspection.
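
As an illustration (the function names and the slice here are hypothetical),
the `range` form keeps the iteration bookkeeping out of the loop body, so the
body can grow without risking an off-by-one or a skipped increment:

```go
// Idiomatic: the range header cannot drift out of sync with the body.
func sumIdiomatic(xs []int) int {
	total := 0
	for _, x := range xs {
		total += x
	}
	return total
}

// Avoid: a hand-maintained counter that must be kept correct as the body evolves.
func sumManual(xs []int) int {
	total := 0
	i := 0
	for i < len(xs) {
		total += xs[i]
		i++
	}
	return total
}
```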

### On performance

It doesn't matter whether there are alternative implementations that are 2x or
3x more performant when the software doesn't work, deadlocks, or has bugs that
cannot be debugged. By taking advantage of multicore concurrency, the
Tendermint implementation will be within an order of magnitude of what is
theoretically possible. The design philosophy of Tendermint, and the choice of
Go as the implementation language, is intended to make the Tendermint
implementation the standard specification for concurrent BFT software.

By focusing on the message protocols (e.g. ABCI, p2p messages) and on
encapsulation (e.g. the IAVL module, (relatively) independent reactors), we are
both implementing a standard implementation to be used as the specification for
future implementations in more optimizable languages like Rust, Java, and C++,
and creating sufficiently performant software. Tendermint Core will never be as
fast as future implementations of the Tendermint Spec, because Go isn't
designed to be as fast as possible. The advantage of using Go is that we can
develop the whole stack of modular components **faster** than in other
languages.

Furthermore, the real bottleneck is in the application layer, and it isn't
necessary to support more than a sufficiently decentralized set of validators
(e.g. 100 ~ 300 validators is sufficient, with delegated bonded PoS).

Instead of optimizing Tendermint performance down to the metal, let's focus on
optimizing other matters, namely the ability to push feature-complete software
that works well enough, can be debugged and maintained, and can serve as a spec
for future implementations.

### On encapsulation

In order to create maintainable, forward-optimizable software, it is critical
to develop well-encapsulated objects that have well-understood properties, and
to re-use these easy-to-use-correctly components as building blocks for further
encapsulated meta-objects.

For example, mutexes are cheap enough for Tendermint's design goals when there
isn't goroutine contention, so it is encouraged to create concurrency-safe
structures with struct-level mutexes. If they are used in the context of
non-concurrent logic, then the performance is good enough. If they are used in
the context of concurrent logic, then they will still perform correctly.
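
A sketch of that pattern (the `Counter` type here is hypothetical, not a
Tendermint type): a struct-level mutex makes every method safe to call from
both concurrent and non-concurrent code, at the cost of only an uncontended
lock in the single-goroutine case.

```go
import "sync"

// Counter is safe for concurrent use by multiple goroutines.
type Counter struct {
	mtx   sync.Mutex
	count int64
}

// Incr increments the counter.
func (c *Counter) Incr() {
	c.mtx.Lock()
	defer c.mtx.Unlock()
	c.count++
}

// Count returns the current value.
func (c *Counter) Count() int64 {
	c.mtx.Lock()
	defer c.mtx.Unlock()
	return c.count
}
```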

Examples of this design principle can be seen in the types.ValidatorSet struct
and the cmn.Rand struct. Each is a single struct declaration that can be used
in both concurrent and non-concurrent logic, and because it is well
encapsulated, it's easy to get the usage of the mutex right.

#### Example: cmn.Rand

From the math/rand documentation: "The default Source is safe for concurrent
use by multiple goroutines, but Sources created by NewSource are not." The
reason the default package-level source is safe for concurrent use is that it
is protected by a mutex (see `lockedSource` in
https://golang.org/src/math/rand/rand.go).

But we shouldn't rely on the global source; we should be creating our own
Rand/Source instances and using them, especially for determinism in testing.
So it is reasonable to have cmn.Rand be protected by a mutex. Whether we want
our own implementation of Rand is another question, but the answer there is
also in the affirmative. Sometimes you want to know where Rand is being used
in your code, so it becomes a simple matter of dropping in a log statement to
inject inspectability into Rand usage. Also, it is nice to be able to extend
the functionality of Rand with custom methods. For these reasons, and for the
reasons outlined in this design philosophy document, we should continue to use
the cmn.Rand object, with mutex protection.
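
A sketch of the idea (not the actual cmn.Rand implementation): wrap an
unsynchronized math/rand source behind a struct-level mutex, so callers get an
explicit seed for determinism plus concurrency safety.

```go
import (
	"math/rand"
	"sync"
)

// Rand wraps an unsynchronized *rand.Rand behind a mutex; sources created by
// rand.NewSource are not safe for concurrent use on their own.
type Rand struct {
	mtx sync.Mutex
	rnd *rand.Rand
}

// NewRand returns a deterministically seeded, concurrency-safe Rand.
func NewRand(seed int64) *Rand {
	return &Rand{rnd: rand.New(rand.NewSource(seed))}
}

// Int63 returns a non-negative pseudo-random int64.
func (r *Rand) Int63() int64 {
	r.mtx.Lock()
	defer r.mtx.Unlock()
	return r.rnd.Int63()
}
```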

Another key aspect of good encapsulation is the choice of exposed vs unexposed
methods. It should be clear to the reader of the code which methods are
intended to be used in which contexts, and what safe usage looks like. Part of
this is solved by hiding methods via unexported names. Another part is naming
conventions on the methods (e.g. underscores) combined with good documentation
and code organization. If there are too many exposed methods and it isn't
clear which methods have what side effects, then there is something wrong with
the design of abstractions that should be revisited.
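
As a sketch of such a convention (the `PeerSet` type here is illustrative, not
a reference to the real p2p implementation): the exported method takes the
lock, while the unexported variant documents that the caller must already hold
it.

```go
// Size is safe for concurrent use by multiple goroutines.
func (ps *PeerSet) Size() int {
	ps.mtx.Lock()
	defer ps.mtx.Unlock()
	return ps.size()
}

// size assumes the caller holds ps.mtx.
func (ps *PeerSet) size() int {
	return len(ps.peers)
}
```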

### On concurrency

In order for Tendermint to remain relevant in the years to come, it is vital
for Tendermint to take advantage of multicore architectures. Due to the nature
of the problem, namely consensus across a concurrent p2p gossip network, and
the need to handle RPC requests for a large number of consuming subscribers,
it is unavoidable for Tendermint development to require expertise in
concurrency design, especially when it comes to the reactor design, and also
for RPC request handling.

## Guidelines

Here are some guidelines for designing for (sufficient) performance and concurrency:

* Mutex locks are cheap enough when there isn't contention.
* Do not optimize code without analytical or observed proof that it is in a hot path.
* Don't over-use channels when mutex locks w/ encapsulation are sufficient.
* The need to drain channels is often a hint of unconsidered edge cases.
* The creation of O(N) one-off goroutines is generally technical debt that
  needs to get addressed sooner rather than later. Avoid creating too many
  goroutines as a patch around incomplete concurrency design, or at least be
  aware of the debt and do not invest in the debt. On the other hand, Tendermint
  is designed to have a limited number of peers (e.g. 10 or 20), so the creation
  of O(C) goroutines per O(P) peers is still O(C\*P) = constant.
* Use defer statements to unlock as much as possible. If you want to unlock sooner,
  try to create more modular functions that make use of defer statements.

## Mantras

* Premature optimization kills
* Readability is paramount
* Beautiful is better than fast.
* In the face of ambiguity, refuse the temptation to guess.
* In the face of bugs, refuse the temptation to cover the bug.
* There should be one-- and preferably only one --obvious way to do it.