Distributed Systems

A grab-bag of notes on building reliable systems out of unreliable parts.

March 8, 2024 · computing, theory

The fundamental problem of distributed computing: you have many machines, each of which fails independently, and you need them to agree on something.

Eight fallacies

L Peter Deutsch’s classic list of things you must never assume about a distributed system (James Gosling later added the eighth):

The network is reliable.
Latency is zero.
Bandwidth is infinite.
The network is secure.
Topology doesn’t change.
There is one administrator.
Transport cost is zero.
The network is homogeneous.
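The first two fallacies are why every remote call needs a timeout and a retry policy. Below is a minimal sketch of retrying with exponential backoff and "full jitter". It assumes the hypothetical `request` callable enforces its own timeout and raises the hypothetical `TransientNetworkError` on a lost or slow reply. `flaky_endpoint` is a stand-in for a real remote service, not a library API.

```python
import random
import time


class TransientNetworkError(Exception):
    """Hypothetical error standing in for a dropped or timed-out request."""


def call_with_retries(request, *, attempts=5, base_delay=0.05, max_delay=1.0,
                      rng=None, sleep=time.sleep):
    """Call `request()` up to `attempts` times, backing off between failures.

    Before retry k the caller sleeps a random time in
    [0, min(max_delay, base_delay * 2**k)] ("full jitter"), so many clients
    retrying at once do not hammer the server in lockstep.
    """
    if rng is None:
        rng = random.Random()
    for attempt in range(attempts):
        try:
            return request()
        except TransientNetworkError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the failure to the caller
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(rng.uniform(0, cap))


def flaky_endpoint(failures_before_success):
    """Hypothetical remote call that fails a fixed number of times, then succeeds."""
    remaining = [failures_before_success]

    def request():
        if remaining[0] > 0:
            remaining[0] -= 1
            raise TransientNetworkError("request lost")
        return "ok"

    return request
```

For example, `call_with_retries(flaky_endpoint(3), sleep=lambda s: None)` returns `"ok"` after three failed attempts. Retrying is only safe when the request is idempotent: a "lost" request may in fact have been executed, and only its reply was dropped.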

Most distributed-systems failures trace back to code that quietly assumed one of these. See Consistency Models for the most important consequence: you can’t have everything (when the network partitions, a system must give up either consistency or availability).
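A toy sketch of the "can’t have everything" trade-off, assuming a hypothetical two-replica store in which a single `partitioned` flag simulates replica a losing contact with replica b. In "CP" mode it refuses writes it cannot replicate; in "AP" mode it accepts them and lets b go stale. This is an illustration, not a real replication protocol, and every name in it is hypothetical.

```python
class Unavailable(Exception):
    """Raised when a CP store refuses a write it cannot replicate."""


class TwoReplicaStore:
    """Toy key-value store with replicas "a" and "b" (all names hypothetical)."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas = {"a": {}, "b": {}}
        self.partitioned = False  # True while a cannot reach b

    def write(self, key, value):
        """Accept a write at replica a and copy it to b if b is reachable."""
        if self.partitioned and self.mode == "CP":
            # Consistency first: refuse rather than let the replicas diverge.
            raise Unavailable("replica b unreachable")
        self.replicas["a"][key] = value
        if not self.partitioned:
            self.replicas["b"][key] = value
        # In AP mode during a partition, b now holds a stale value
        # until some repair step (not modeled here) copies the write over.

    def read(self, replica, key):
        return self.replicas[replica].get(key)
```

During a partition the AP store keeps answering, but a read from b returns the old value; the CP store keeps both replicas identical, at the price of rejecting the write.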

Consensus

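The canonical problem is consensus: all non-faulty nodes agree on a single value even though some of them may crash. Paxos and Raft are the standard solutions for crash failures; with 2f + 1 nodes they tolerate f crashes, because any two majorities share at least one node. The cost is at least one network round trip to a majority of nodes per decision (Basic Paxos needs two; a stable leader in Multi-Paxos or Raft gets it down to one).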
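To make the quorum argument concrete, here is a minimal single-decree (Basic) Paxos sketch run in one process, with method calls standing in for messages. It assumes each proposer uses unique integer ballots, models a crash as `alive = False`, and leaves out message loss, learners, and durable storage. `Acceptor` and `propose` are hypothetical names, not a library API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Acceptor:
    promised: int = -1                   # highest ballot this acceptor has promised
    accepted_ballot: int = -1            # ballot of the last accepted proposal
    accepted_value: Optional[str] = None
    alive: bool = True                   # a crashed acceptor ignores every message

    def prepare(self, ballot):
        """Phase 1b: promise to ignore lower ballots; report any accepted value."""
        if not self.alive or ballot <= self.promised:
            return None
        self.promised = ballot
        return (self.accepted_ballot, self.accepted_value)

    def accept(self, ballot, value):
        """Phase 2b: accept unless a higher ballot has been promised."""
        if not self.alive or ballot < self.promised:
            return False
        self.promised = ballot
        self.accepted_ballot, self.accepted_value = ballot, value
        return True


def propose(acceptors, ballot, value):
    """Run one Basic Paxos round; return the chosen value, or None on failure."""
    majority = len(acceptors) // 2 + 1
    # Round trip 1 (prepare/promise): gather promises from a majority.
    promises = [p for p in (a.prepare(ballot) for a in acceptors) if p is not None]
    if len(promises) < majority:
        return None
    # If any acceptor already accepted a value, re-propose the one with the
    # highest ballot instead of our own: this is what preserves agreement.
    prior_ballot, prior_value = max(promises, key=lambda p: p[0])
    if prior_ballot >= 0:
        value = prior_value
    # Round trip 2 (accept/accepted): the value is chosen once a majority accepts.
    acks = sum(a.accept(ballot, value) for a in acceptors)
    return value if acks >= majority else None
```

With five acceptors and two of them crashed, `propose(acceptors, ballot=1, value="x")` still returns `"x"`; a later `propose(acceptors, ballot=2, value="y")` also returns `"x"`, because the surviving majority reports the earlier acceptance. Crash a third acceptor and `propose` returns `None`: the system stops making progress, but it never disagrees.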

The Zettelkasten note is unrelated in content, but it has the same shape: a graph of nodes that must converge by exchanging messages.