Distributed Systems
A grab-bag of notes on building reliable systems out of unreliable parts.
The fundamental problem of distributed computing: you have many machines, each of which can fail independently, and you need them to agree on something.
Eight fallacies
Peter Deutsch’s classic list of things you must never assume about a distributed system:
- The network is reliable.
- Latency is zero.
- Bandwidth is infinite.
- The network is secure.
- Topology doesn’t change.
- There is one administrator.
- Transport cost is zero.
- The network is homogeneous.
Nearly every distributed-systems failure mode is a corollary of assuming that one of these holds. See Consistency Models for the most important consequence: you can’t have everything.
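As a concrete illustration of the first two fallacies, here is a minimal Python sketch of a caller that expects a remote call to fail or stall and retries with capped, jittered exponential backoff. Everything named here (TransientNetworkError, call_with_retries, make_flaky, and the default delays) is hypothetical and chosen for the example; the "remote call" is simulated in-process rather than sent over a real network.

```python
import random
import time


class TransientNetworkError(Exception):
    """Stand-in for a timeout, dropped connection, or similar failure."""


def call_with_retries(operation, max_attempts=5, base_delay=0.01,
                      max_delay=1.0, sleep=time.sleep, rng=random.random):
    """Run operation(), retrying on TransientNetworkError with backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientNetworkError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to the caller
            # Exponential backoff capped at max_delay, scaled by a random
            # factor so that many clients do not retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay * rng())


def make_flaky(failures_before_success):
    """Return a fake remote call that fails a fixed number of times first."""
    state = {"calls": 0}

    def operation():
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise TransientNetworkError("simulated dropped packet")
        return "ok"

    return operation, state


if __name__ == "__main__":
    op, state = make_flaky(failures_before_success=2)
    print(call_with_retries(op), "after", state["calls"], "calls")
```

Retrying like this is only safe when the operation is idempotent: a request whose reply was lost may already have taken effect on the other side.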
Consensus
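The canonical problem is consensus: all non-faulty nodes agree on a single value even though some of the others have failed. Paxos and Raft are the standard solutions; both make progress as long as a majority of nodes is up and able to communicate. The cost is at least one network round trip to a majority per decision, and that is the best case with a stable leader; electing a leader costs extra rounds.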
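Majority-quorum protocols such as Paxos and Raft rest on a counting argument: any two majorities of the same cluster share at least one node, so two conflicting decisions can never both gather a quorum. The Python sketch below shows only that arithmetic, not either protocol; the function names are hypothetical and it assumes crash failures (no Byzantine nodes).

```python
from itertools import combinations


def majority(n):
    """Smallest number of nodes that forms a majority of n."""
    return n // 2 + 1


def tolerated_failures(n):
    """Crash failures a majority-quorum cluster of n nodes can survive."""
    return (n - 1) // 2


def leader_commits(acks_from_followers, cluster_size):
    """True if the leader plus acknowledging followers form a majority.

    The leader counts itself, so it needs majority - 1 follower acks.
    """
    return 1 + acks_from_followers >= majority(cluster_size)


def quorums_always_overlap(n):
    """Brute-force check that every pair of majorities shares a node."""
    q = majority(n)
    return all(set(a) & set(b)
               for a in combinations(range(n), q)
               for b in combinations(range(n), q))


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(n, "nodes: quorum", majority(n),
              "tolerates", tolerated_failures(n), "failures")
```

Note that a four-node cluster tolerates no more failures than a three-node one, which is why such clusters are usually sized with an odd number of nodes.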
The Zettelkasten note is unrelated content-wise, but the shape is the same: a graph of nodes that must converge by exchanging messages.