Impossibility of Distributed Consensus with One Faulty Process

Reference

Fischer, M. J., Lynch, N. A., & Paterson, M. S. (1985). “Impossibility of Distributed Consensus with One Faulty Process.” Journal of the ACM, 32(2), 374-382.

Summary

The FLP result is the canonical impossibility theorem of asynchronous distributed computing. Its statement is sharp: no deterministic consensus protocol can guarantee termination in an asynchronous message-passing system if even a single process may crash. Unlike earlier results that required Byzantine faults or lossy networks, FLP assumes reliable messaging and only one benign crash failure — yet still derives impossibility.

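The proof proceeds in two steps. First, every consensus protocol admits a bivalent initial configuration, one from which either decision value is still reachable (Lemma 2). Second, for any bivalent configuration C and any event e applicable to C, some schedule from C that applies e last still ends in a bivalent configuration (Lemma 3). An adversary scheduler can therefore postpone e until delivering it keeps the system bivalent. Doing this while serving processes and their oldest pending messages in round-robin order yields an admissible run, in which every message is eventually delivered, yet no process ever decides. The core technical tool is the commutativity of schedules that involve disjoint sets of processes (Lemma 1), combined with a careful analysis of “critical” configurations in which a single process’s next step would force the decision.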
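The commutativity property behind Lemma 1 can be checked on a toy model. The sketch below is a minimal illustration, assuming a hypothetical three-process protocol whose `transition` function is invented for the purpose. A configuration is a tuple of local states plus a message buffer (a multiset of `(destination, message)` events), and `deliver` applies one event. Lemma 1 is stated for whole schedules over disjoint sets of processes; the single-event case shown here is its base case. Two events at the same process generally do not commute.

```python
from collections import Counter

N = 3  # hypothetical number of processes

def transition(pid, state, msg):
    """Hypothetical deterministic step: append the message to the local history
    and send one message to the next process in a ring."""
    new_state = state + (msg,)
    return new_state, [((pid + 1) % N, (pid, len(new_state)))]

def deliver(config, event):
    """Deliver event = (pid, msg) from the buffer to process pid; returns a new configuration."""
    states, buffer = config
    pid, msg = event
    if buffer[event] == 0:
        raise ValueError("event not applicable: message not in buffer")
    buffer = Counter(buffer)              # copy: configurations are treated as immutable
    buffer[event] -= 1
    if buffer[event] == 0:
        del buffer[event]
    new_state, sent = transition(pid, states[pid], msg)
    buffer.update(sent)                   # messages sent in this step join the buffer
    states = states[:pid] + (new_state,) + states[pid + 1:]
    return states, buffer

# One pending message for p0 and one for p1: steps at different processes commute.
C0 = (((), (), ()), Counter({(0, "a"): 1, (1, "b"): 1}))
print(deliver(deliver(C0, (0, "a")), (1, "b")) == deliver(deliver(C0, (1, "b")), (0, "a")))  # True

# Two pending messages for p0: the order changes p0's local history.
C1 = (((), (), ()), Counter({(0, "a"): 1, (0, "b"): 1}))
print(deliver(deliver(C1, (0, "a")), (0, "b")) == deliver(deliver(C1, (0, "b")), (0, "a")))  # False
```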

The result separates what is solvable from what is not under different synchrony assumptions. Real-world protocols respond by weakening one axis: Paxos and Raft stay safe in every execution but guarantee liveness only once the system behaves synchronously for long enough (partial synchrony); randomized consensus (Ben-Or, Rabin) achieves termination with probability 1; failure detectors (Chandra-Toueg ◊S) encapsulate the needed synchrony in an oracle. FLP remains the bedrock boundary against which all consensus engineering is measured.

Key Ideas

Consensus problem: N processes, binary inputs; non-faulty processes must all decide the same value; some initial configuration must admit each decision.
Asynchronous model: no bound on message delay or relative process speed; no synchronized clocks, so no timeout can tell a crashed process from a slow one.
One crash failure: the weakest possible fault assumption that still breaks consensus.
Bivalent configurations: states from which both 0 and 1 outcomes are still reachable.
Adversary scheduler: by reordering message deliveries, keeps the system in a bivalent configuration forever.
Safety vs. liveness: FLP shows safety + liveness + fault-tolerance cannot coexist in pure async.
Escape hatches: partial synchrony, randomization, failure detectors, or accepting non-termination in corner cases.
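Bivalence can be made concrete by exhaustively exploring a small protocol. The sketch below uses a hypothetical toy protocol of our own (not from the paper): every process sends its input to a coordinator p0, p0 decides the first value it receives and forwards that decision, and the others adopt what p0 forwards. `reachable_decisions` collects every decision value reachable under any delivery order. A mixed-input initial configuration is bivalent, and delivering one particular message to p0 is a decision-forcing step. If p0 crashes, nobody decides. This toy protocol fails trivially; FLP’s point is that every deterministic protocol, even one designed to tolerate a crash, has some admissible non-deciding run.

```python
from functools import lru_cache

def initial(inputs):
    """Configuration = (sorted tuple of pending (dest, value) messages, decisions)."""
    buffer = tuple(sorted((0, v) for v in inputs))
    return buffer, tuple(None for _ in inputs)

def successors(config, crashed=frozenset()):
    """Yield every configuration reachable by delivering one pending message."""
    buffer, decisions = config
    for k, (dest, value) in enumerate(buffer):
        if dest in crashed:
            continue                          # a crashed process takes no further steps
        rest = buffer[:k] + buffer[k + 1:]
        dec = list(decisions)
        if dec[dest] is None:
            dec[dest] = value
            if dest == 0:                     # coordinator forwards its decision
                rest += tuple((p, value) for p in range(1, len(dec)))
        yield tuple(sorted(rest)), tuple(dec)

@lru_cache(maxsize=None)
def reachable_decisions(config, crashed=frozenset()):
    """All decision values appearing in some configuration reachable from config."""
    _, decisions = config
    values = {d for d in decisions if d is not None}
    for nxt in successors(config, crashed):
        values |= reachable_decisions(nxt, crashed)
    return frozenset(values)

def valency(config):
    vals = reachable_decisions(config)
    if not vals:
        return "no decision reachable"
    return "bivalent" if len(vals) == 2 else f"{min(vals)}-valent"

start = initial((0, 1, 1))
print(valency(start))                                  # bivalent
print(valency(next(successors(start))))                # 0-valent: p0 received input 0 first
print(reachable_decisions(start, frozenset({0})))      # frozenset(): p0 crashed, nobody decides
```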

Connections

CAP Theorem — a close relative: in a partition-prone system, atomic (linearizable) read/write storage cannot also guarantee availability.
CALM Theorem — monotonic programs sidestep consensus because they never need it.
Keeping CALM - When Distributed Consistency is Easy
Coordination Avoidance — the design pattern motivated by FLP.
Gossip Protocols — probabilistic convergence as an alternative to deterministic agreement.
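Time Clocks and the Ordering of Events in a Distributed System — Lamport’s causal ordering is the intuition behind the proof’s commutation argument: steps at different processes that are not causally linked can be applied in either order.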
Knowledge and Common Knowledge in a Distributed Environment — common knowledge likewise unattainable in async systems.

Conceptual Contribution

In an asynchronous system where even a single process may fail silently, no deterministic protocol can guarantee that every non-faulty process eventually decides. The obstacle is not a lack of cleverness: a slow process is indistinguishable from a crashed one, and the scheduler can exploit that indistinguishability. Consensus therefore requires timing assumptions, randomness, or failure oracles; these are not optional design choices but mathematical necessities.


Linked Pages

Knowledge and Common Knowledge in a Distributed Environment

Reference

Halpern, J. Y., & Moses, Y. (1990). “Knowledge and Common Knowledge in a Distributed Environment.” Journal of the ACM, 37(3), 549-587.

Summary

Halpern and Moses recast distributed protocol design as knowledge transformation: sending a message changes the knowledge state of the system, and correctness specifications can be stated in terms of what individual processes, groups, or “the system” know at various points. The paper develops a formal epistemic logic for distributed systems with operators K_i (agent i knows), E (everyone knows), and C (common knowledge — everyone knows that everyone knows, to infinite depth).
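The operators K_i, E, and C can be evaluated directly on a finite possible-worlds model. The sketch below simplifies the paper’s runs-and-systems semantics to a finite set of worlds and a `view` function giving each agent’s local state. The model itself is a hypothetical coordinated-attack setup: generals A and B alternate messengers (message 1 goes A to B), and world m means exactly m messages arrived before one was lost. The names `K`, `E`, `E_k`, `C`, and `view` are our own. Each extra delivered message adds one level of E, but the chain of indistinguishable worlds always reaches m = 0, so C never holds.

```python
def K(i, phi, w, worlds, view):
    """Agent i knows phi at w iff phi holds in every world i cannot tell apart from w."""
    return all(phi(v) for v in worlds if view(i, v) == view(i, w))

def E(phi, w, worlds, view, agents):
    """Everyone knows phi."""
    return all(K(i, phi, w, worlds, view) for i in agents)

def E_k(k, phi, w, worlds, view, agents):
    """E^k phi: 'everyone knows' nested k times; E^0 phi is phi itself."""
    if k == 0:
        return phi(w)
    return E(lambda v: E_k(k - 1, phi, v, worlds, view, agents), w, worlds, view, agents)

def C(phi, w, worlds, view, agents):
    """Common knowledge: phi holds in every world reachable from w through a chain
    of 'some agent cannot tell these two worlds apart' steps."""
    seen, frontier = {w}, [w]
    while frontier:
        u = frontier.pop()
        for v in worlds:
            if v not in seen and any(view(i, v) == view(i, u) for i in agents):
                seen.add(v)
                frontier.append(v)
    return all(phi(v) for v in seen)

M = 6                        # hypothetical bound on messenger trips
worlds = range(M + 1)
agents = ("A", "B")

def view(agent, m):
    # A receives messages 2, 4, 6, ...; B receives messages 1, 3, 5, ...
    return m // 2 if agent == "A" else (m + 1) // 2

def phi(m):
    return m >= 1            # "B has received the attack plan"

print([m for m in worlds if E_k(2, phi, m, worlds, view, agents)])  # [3, 4, 5, 6]
print(any(C(phi, m, worlds, view, agents) for m in worlds))         # False
```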

The central technical result is striking: in a truly asynchronous distributed system, common knowledge is unattainable. The classical “coordinated attack” problem (two generals must attack simultaneously but communicate only via lossy messengers) is unsolvable because simultaneous action requires common knowledge of the time to attack, and no finite message chain ever reaches the fixpoint. The muddy children puzzle — a beloved epistemic set piece walked through in the opening — shows how a public announcement can create common knowledge that sequential private reasoning cannot, illuminating why synchronous broadcast matters.
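The muddy children argument can be replayed as a possible-worlds simulation. In the sketch below, a world is a tuple of 0/1 foreheads, and child i considers possible every remaining world that differs from the actual one at most in position i. The father’s announcement discards all worlds with no muddy child. After each round of public “I don’t know” answers, worlds where some child would have known are discarded too. With k muddy children, the muddy ones first know in round k. When k ≥ 2, every child already knew someone was muddy before the announcement; what the announcement adds is common knowledge. The encoding and the names `knows` and `muddy_children` are our own.

```python
from itertools import product

def knows(world, i, worlds):
    """Child i sees every forehead but its own, so it considers possible every
    remaining world that differs from `world` at most in position i."""
    alternatives = [v for v in worlds
                    if all(v[j] == world[j] for j in range(len(world)) if j != i)]
    return all(v[i] == world[i] for v in alternatives)

def muddy_children(actual):
    """Return (round, children who first know their own state) for a 0/1 tuple."""
    assert any(actual), "the father's announcement must be true"
    n = len(actual)
    # Public announcement "at least one of you is muddy": every world with no
    # muddy child is discarded by everyone at once.
    worlds = {w for w in product((0, 1), repeat=n) if any(w)}
    rnd = 0
    while True:
        rnd += 1
        answers = tuple(knows(actual, i, worlds) for i in range(n))
        if any(answers):
            return rnd, [i for i in range(n) if answers[i]]
        # Everyone publicly says "I don't know": discard worlds where answers would differ.
        worlds = {w for w in worlds
                  if tuple(knows(w, i, worlds) for i in range(n)) == answers}

print(muddy_children((1, 1, 0)))  # (2, [0, 1])
print(muddy_children((1, 1, 1)))  # (3, [0, 1, 2])
```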

Because strict common knowledge is unattainable, the paper introduces and characterises weaker variants: eventual common knowledge, ε-common knowledge (everyone comes to know within ε time units of each other), and timestamped common knowledge. Concurrent common knowledge, a later variant proposed by Panangaden and Taylor, extends the same programme to asynchronous systems. Each variant corresponds to a different system model (synchronous, partially synchronous, etc.) and a different class of solvable coordination tasks. This framework turned “what does a protocol achieve?” into a precise epistemic question and became a cornerstone of the research community around the TARK (Theoretical Aspects of Reasoning about Knowledge) conferences.

Key Ideas

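Epistemic hierarchy, from weakest to strongest: distributed knowledge (D) ⊑ someone knows (S) ⊑ everyone knows (E) ⊑ E^k (everyone knows that everyone knows, k levels deep) ⊑ common knowledge (C); each level implies the ones below it (Cφ ⊃ E^kφ ⊃ Eφ ⊃ Sφ ⊃ Dφ).
Common knowledge (C): the infinite conjunction Eφ ∧ E²φ ∧ E³φ ∧ …, equivalently the greatest fixpoint of X ⇔ E(φ ∧ X); needed for any simultaneous coordinated action.
Coordinated attack / muddy children: canonical illustrations; coordinated attack shows that unreliable messages can never build C, while muddy children shows a public announcement creating it at once.
Impossibility: when message delivery is unreliable, or guaranteed but with uncertain delivery time, C cannot be attained by message exchange.
Weakenings: ε-C (bounded-time C), eventual C, and timestamped C, each aligned with a synchrony assumption; concurrent C is a later variant due to Panangaden and Taylor.
Protocols as knowledge transformers: specifications are claims about K_i, E, C; message sends are knowledge updates.
Runs-and-systems semantics: each global history is a “run”; agent i knows φ at a point (r, t) iff φ holds at every point where i has the same local state, i.e., every point i cannot distinguish from (r, t).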

Connections

Epistemic Logic
Epistemic S5
Knowing What vs Knowing That
The Synthesis of Digital Machines with Provable Epistemic Properties — Rosenschein-Kaelbling operationalise this framework.
Time Clocks and the Ordering of Events in a Distributed System — the “local history” underpinning indistinguishable runs is Lamport’s past-cone.
Impossibility of Distributed Consensus with One Faulty Process — consensus impossibility has an epistemic reading: common knowledge of a decision cannot be attained.
CAP Theorem

Conceptual Contribution


Time, Clocks, and the Ordering of Events in a Distributed System

Reference

Lamport, L. (1978). “Time, Clocks, and the Ordering of Events in a Distributed System.” Communications of the ACM, 21(7), 558-565.

Summary

Lamport’s seminal paper reframes time in distributed systems. Rather than depending on physical clocks that cannot be perfectly synchronized, he defines a partial order on events — the “happened-before” relation (→) — using only causality derived from local process order and message exchange. Two events are concurrent iff neither happened before the other. This reorientation showed that a distributed system’s truth about “when” is inherently relational, not absolute, mirroring the space-time view of special relativity.
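The happened-before relation follows mechanically from a space-time diagram: local order, plus send-to-receive edges, closed under transitivity. The sketch below computes it for a small hypothetical three-process diagram. The function names and the event labels (`p1`, `q2`, …) are our own. Two events are concurrent when no causal path connects them in either direction.

```python
def happened_before(events, messages):
    """Return Lamport's happened-before relation as a set of (a, b) pairs with a -> b.

    events:   dict mapping each process to its events in local order
    messages: list of (send_event, receive_event) pairs
    """
    hb = set(messages)                                   # send -> matching receive
    for evs in events.values():                          # local order within a process
        hb |= {(evs[k], evs[k + 1]) for k in range(len(evs) - 1)}
    nodes = [e for evs in events.values() for e in evs]
    for k in nodes:                                      # transitive closure (Warshall)
        for i in nodes:
            if (i, k) in hb:
                for j in nodes:
                    if (k, j) in hb:
                        hb.add((i, j))
    return hb

def concurrent(a, b, hb):
    return a != b and (a, b) not in hb and (b, a) not in hb

events = {"P": ["p1", "p2"], "Q": ["q1", "q2", "q3"], "R": ["r1", "r2"]}
messages = [("p1", "q2"), ("q3", "r2")]                  # P -> Q, then Q -> R
hb = happened_before(events, messages)

print(("p1", "r2") in hb)            # True: p1 -> q2 -> q3 -> r2
print(concurrent("p2", "r2", hb))    # True: no causal path either way
```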

Lamport then introduces logical clocks: counters per process that assign numbers to events such that a → b implies C(a) < C(b) (the Clock Condition). The converse does not hold: C(a) < C(b) rules out b → a but does not establish a → b. A simple algorithm provides this: increment the clock between successive local events, piggyback the current timestamp on every message, and on receipt advance the local clock past the message’s timestamp. Extending the partial order to a total order by breaking timestamp ties with process IDs enables the classic distributed mutual exclusion algorithm, which uses only message passing and no central coordinator.
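The clock rules fit in a few lines. The sketch below is a minimal single-process logical clock in the spirit of Lamport’s implementation rules, using the common “max plus one” form on receipt. The class and function names are our own. The usage shows the Clock Condition holding along a send-receive chain, and ties between concurrent events being broken by process ID.

```python
class LamportClock:
    """Logical clock for one process."""

    def __init__(self, pid):
        self.pid = pid
        self.time = 0

    def tick(self):
        # Advance the clock between successive events (local events and sends).
        self.time += 1
        return self.time

    def send(self):
        # A send is an event; its timestamp is piggybacked on the message.
        return self.tick()

    def receive(self, msg_time):
        # Move past both the current value and the message's timestamp,
        # which also counts the receive event itself.
        self.time = max(self.time, msg_time) + 1
        return self.time

def total_order_key(timestamp, pid):
    """Extend the partial order to a total order: timestamp first, process id breaks ties."""
    return (timestamp, pid)

p, q = LamportClock(1), LamportClock(2)
a = p.tick()        # 1: local event on p
m = p.send()        # 2: p sends, piggybacking timestamp 2
b = q.tick()        # 1: local event on q, concurrent with a
c = q.receive(m)    # 3: q receives, so send -> receive gives 2 < 3
print(a < m < c)    # True, as the Clock Condition requires
print(sorted([total_order_key(b, q.pid), total_order_key(a, p.pid)]))  # [(1, 1), (1, 2)]
```

Note that C(b) = 1 < C(m) = 2 although b does not happen before the send: the timestamps respect causality without capturing it.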

The final portion generalises to physical clocks, deriving synchronization bounds that tolerate clock drift and message delay. The paper is foundational: it underlies vector clocks, causal broadcast, state-machine replication, Paxos, version vectors in Dynamo/Riak, CRDT causality tracking, and distributed snapshotting.

Key Ideas

Happened-before (→): irreflexive partial order capturing causal precedence.
Concurrency: a ∥ b when neither a → b nor b → a; concurrency cannot be eliminated in a distributed system.
Logical clocks: monotone counters satisfying the Clock Condition; do not measure real time but respect causality.
Total ordering: arbitrary tie-break on process IDs yields a global total order usable for coordination.
Distributed mutual exclusion: a worked example of using total ordering of timestamped requests to implement a resource without a central arbiter.
Space-time diagrams: process lines + message arrows — a durable visual vocabulary for distributed reasoning.
Physical clock synchronization: bounded drift + bounded message delay yield clock-sync with provable error.
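The heart of the mutual exclusion example is the test a process runs before entering the critical section. The sketch below shows only that entry test, assuming FIFO channels and that the caller already maintains the request queue and the latest timestamp heard from every other process; `may_enter`, `queue`, and `last_heard` are hypothetical names. It compares timestamps in the same (timestamp, pid) total order used for the queue; a stricter reading of “timestamped later” compares timestamps alone.

```python
def may_enter(pid, queue, last_heard):
    """Entry test of Lamport's mutual-exclusion algorithm, from process pid's view.

    queue:      set of outstanding requests (timestamp, requester_pid) known to pid,
                including pid's own request
    last_heard: dict other_pid -> timestamp of the latest message received from it
    """
    mine = [r for r in queue if r[1] == pid]
    if not mine:
        return False                      # pid has no outstanding request
    request = min(mine)
    # (i) pid's request precedes every other request in the total order (timestamp, pid).
    precedes_all = request == min(queue)
    # (ii) every other process has sent pid a message ordered after that request, so
    # (with FIFO channels) no earlier request from it can still be in transit.
    heard_from_all = all((ts, other) > request for other, ts in last_heard.items())
    return precedes_all and heard_from_all

print(may_enter(1, {(3, 1), (5, 2)}, {2: 6, 3: 4}))  # True
print(may_enter(1, {(3, 1), (5, 2)}, {2: 6, 3: 2}))  # False: P3 may still have an earlier request in flight
print(may_enter(1, {(3, 1), (2, 2)}, {2: 6, 3: 4}))  # False: P2's request comes first
```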

Connections

CAP Theorem — partitions and concurrent writes are grounded in Lamport’s concurrent events.
CALM Theorem — monotonic computation avoids the coordination that Lamport’s algorithms provide.
Keeping CALM - When Distributed Consistency is Easy
Coordination Avoidance — exploiting the cases where the partial order → need not be extended to a total order.
Gossip Protocols — epidemic dissemination implicitly maintains a causal partial order.
Impossibility of Distributed Consensus with One Faulty Process — FLP shows the limits, once a single process may crash, of the coordination Lamport makes feasible in fault-free settings.
Knowledge and Common Knowledge in a Distributed Environment — epistemic reading of causal pasts.

Conceptual Contribution

Gossip Protocols

Epidemic information dissemination and aggregation in distributed systems.

Sources

Coordination Avoidance

The design discipline of eliminating synchronisation (locks, quorums, consensus rounds) wherever a computation can be proved correct without it. The CALM Theorem makes this precise: coordination is avoidable exactly for monotonic programs.

Practical techniques:

Build from monotonic primitives (CRDTs, set unions, grow-only counters)
Use Immutable Data Structures and Tombstones to convert mutation into monotonic history
Isolate the non-monotonic boundary (e.g. a shopping-cart checkout) and coordinate there only
Program in a language like Bloom Language that makes monotonicity syntactically checkable
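The first two techniques can be combined in a small replicated type. The sketch below is a two-phase set (2P-Set), a standard state-based CRDT, written here in Python as an illustration; the class and method names are our own. Both components only grow, deletions are recorded as tombstones rather than by mutation, and merge is a component-wise union, so replicas converge whatever order their states are merged in. Only `value()` applies a set difference: that read is the non-monotonic boundary, the place where coordination would go if a final answer were needed. A known limitation of this design is that a removed element can never be re-added.

```python
from dataclasses import dataclass
from itertools import permutations

@dataclass(frozen=True)
class TwoPhaseSet:
    """Two-phase set: an add-set plus a tombstone set; both only ever grow."""
    added: frozenset = frozenset()
    tombstones: frozenset = frozenset()

    def add(self, x):
        return TwoPhaseSet(self.added | {x}, self.tombstones)

    def remove(self, x):
        # Deletion is recorded as a new fact (a tombstone) rather than by mutation.
        return TwoPhaseSet(self.added, self.tombstones | {x})

    def merge(self, other):
        # Join = component-wise union: commutative, associative, idempotent.
        return TwoPhaseSet(self.added | other.added, self.tombstones | other.tombstones)

    def value(self):
        # Non-monotonic read over monotonic state.
        return self.added - self.tombstones

a = TwoPhaseSet().add("milk")
b = TwoPhaseSet().add("eggs")
c = TwoPhaseSet().add("milk").remove("milk")
results = {x.merge(y).merge(z).value() for x, y, z in permutations([a, b, c])}
print(results)  # {frozenset({'eggs'})}
```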

Contrasts with storage-consistency work (linearizability, serializability), which enforces coordination inside the storage layer regardless of program shape.

In this vault

Keeping CALM - When Distributed Consistency is Easy

Keeping CALM: When Distributed Consistency is Easy

Reference: Joseph M. Hellerstein & Peter Alvaro (2019). arXiv:1901.01930v2. Source file: ../1901.01930v2.pdf (in parent directory).

Summary

An accessible, updated statement of the CALM Theorem — Consistency As Logical Monotonicity: a program has a consistent, coordination-free distributed implementation if and only if it is monotonic in a formal logical sense. Monotonic programs are “safe” under missing information and can proceed without waiting: once something has been concluded, it stays concluded. Non-monotonic programs (“change their mind” in the face of new inputs) are necessarily order-sensitive and therefore require coordination to produce a single deterministic outcome.
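Confluence can be observed directly by replaying the same messages in every order. The sketch below simulates a node that never waits: after each arriving message it re-runs a query and irrevocably emits whatever the query outputs. The functions `eager_outputs`, `monotone_query`, and `non_monotone_query` are hypothetical illustrations, not Bloom or Dedalus code. The monotonic query emits the same outputs under every ordering. The set-difference query sometimes emits an item that a later deletion retracts, so its observable behaviour depends on delivery order.

```python
from itertools import permutations

def eager_outputs(query, messages):
    """Simulate a node that never waits: after each arriving message it re-runs
    the query and irrevocably emits every output fact produced so far."""
    seen, emitted = set(), set()
    for m in messages:
        seen.add(m)
        emitted |= query(seen)
    return frozenset(emitted)

def monotone_query(facts):
    # Items ever added: new facts can only add outputs.
    return {x for op, x in facts if op == "add"}

def non_monotone_query(facts):
    # Items added and not deleted: set difference can retract earlier outputs.
    added = {x for op, x in facts if op == "add"}
    deleted = {x for op, x in facts if op == "del"}
    return added - deleted

messages = [("add", "a"), ("add", "b"), ("del", "a")]
for q in (monotone_query, non_monotone_query):
    outcomes = {eager_outputs(q, order) for order in permutations(messages)}
    print(q.__name__, sorted(sorted(o) for o in outcomes))
# monotone_query [['a', 'b']]
# non_monotone_query [['a', 'b'], ['b']]
```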

CALM is a positive counterpart to the negative results of the CAP Theorem and the FLP impossibility. Where CAP says “you can’t always have it all”, CALM delineates the class of programs that can in fact satisfy all of C, A, and P simultaneously: exactly the monotonic ones. The theorem shifts attention away from storage consistency (linearizability, serializability) toward program consistency — Confluence: deterministic outcomes despite non-deterministic message delivery and ordering.

The paper traces the theorem’s implications for Bloom Language, Dedalus, CRDTs, coordination-free design patterns (monotonic accumulation, tombstones, immutable data), distributed garbage collection, and Amazon’s shopping-cart example. It closes with open questions: expressiveness of monotonic languages, program synthesis targeting monotonicity, repair of non-monotonic code, and a possible Stochastic CALM for near-optimum stochastic algorithms.

Key Ideas

Monotonicity ⇔ coordination-freeness ⇔ consistency (CALM)
Program consistency = confluence: same outputs regardless of message order / batching
Non-monotonic operations retract earlier conclusions, hence must wait for “all the news”, which requires coordination to know when all of it has arrived
Formalisation via Relational Transducer networks (Ameloot, Neven, Van den Bussche)
CAP is a special case: linearizable storage is one non-monotonic operation; CALM asks the question across all programs
Practical playbook: immutable data structures, tombstones, set-monotonic accumulation, CRDTs as object-oriented monotonic types
Bloom gives syntactic monotonicity checking — a programmer writing monotonic Bloom is guaranteed coordination-free correctness

Connections

CALM Theorem
CAP Theorem
Confluence
Monotonic Logic
Coordination Avoidance
Bloom Language
Dedalus
CRDTs
Relational Transducer
Non-monotonic Reasoning
Gossip-Based Computation of Aggregate Information
Gossip-based Aggregation in Large Dynamic Networks
Extensible Distributed Coordination
Are Multiagent Systems Resilient to Communication Failures
Foundations of Logic Programming - Lloyd
Logic and Lattices for Distributed Programming — Conway et al. 2012 (BloomL; lattice-typed lifting of CALM)
Relational Transducers for Declarative Networking — Ameloot, Neven & Van den Bussche 2011 (the formal proof of CALM)
A Comprehensive Study of Convergent and Commutative Replicated Data Types — Shapiro et al. 2011 (the CRDT canonical reference)

Conceptual Contribution

Claim: A distributed program is consistent and coordination-free iff it is expressible in monotonic logic — a tight, provable characterisation that converts “distributed consistency is hard” into a precise question about a program’s logical shape.
Mechanism: Formalise programs as networks of Relational Transducers (Ameloot et al.); define Confluence as program-level determinism under non-deterministic delivery; prove a biconditional between confluence and monotonic-logic expressibility.
Concepts introduced/used: Monotonic Logic, Confluence, Coordination Avoidance, Relational Transducer, Bloom Language, Dedalus, CRDTs, CAP Theorem, Immutable Data Structures, Tombstones, Stochastic CALM
Stance: foundational / survey
Relates to: Provides the theoretical companion to the Gossip Protocols cluster (Gossip-Based Computation of Aggregate Information, Gossip-based Aggregation in Large Dynamic Networks) — gossip aggregation with mass conservation is exactly a monotonic accumulation, hence coordination-free. Frames why Extensible Distributed Coordination must reach for stored-procedure coordination precisely where programs are non-monotonic. Contrasts with Are Multiagent Systems Resilient to Communication Failures: that paper shows game-theoretic MAS are brittle under message loss — CALM identifies the logical class of programs immune to such loss. The monotonic/non-monotonic split echoes Foundations of Logic Programming - Lloyd’s distinction between Horn-clause logic and Negation as Failure.

CALM Theorem

Consistency As Logical Monotonicity (Hellerstein 2010 conjecture; Ameloot, Neven, Van den Bussche 2013 proof). A distributed program has a consistent, coordination-free implementation if and only if it is expressible in monotonic logic.

Monotonic programs produce a set of outputs that only grows as inputs arrive (S ⊆ T ⟹ P(S) ⊆ P(T)). Such programs are safe under missing information: any conclusion reached on a subset of inputs remains valid when more arrive. Non-monotonic programs may retract earlier conclusions, so they must wait for “all the news” — which is what distributed coordination enforces.
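The monotonicity condition S ⊆ T ⟹ P(S) ⊆ P(T) can be tested by brute force on a small universe of inputs. The sketch below is only a sanity check over the finite universe it is given, not a proof; CALM itself concerns programs expressible in monotonic logic. The helpers `is_monotone`, `reachable`, and `unreachable` are hypothetical. Transitive closure, the classic Datalog query, passes the check. “Pairs not connected” needs negation and fails it: adding an edge removes outputs.

```python
from itertools import chain, combinations

def subsets(universe):
    items = list(universe)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))]

def is_monotone(program, universe):
    """Brute-force check of S ⊆ T ⟹ P(S) ⊆ P(T) over all subsets of a small universe."""
    subs = subsets(universe)
    return all(program(s) <= program(t) for s in subs for t in subs if s <= t)

def reachable(edges):
    # Transitive closure of an edge set: a classic monotone query.
    closure = set(edges)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c} - closure
        if not new:
            return frozenset(closure)
        closure |= new

def unreachable(edges, nodes=("x", "y", "z")):
    # Pairs NOT connected: requires negation, hence non-monotone.
    r = reachable(edges)
    return frozenset((a, b) for a in nodes for b in nodes if a != b and (a, b) not in r)

U = {("x", "y"), ("y", "z"), ("z", "x")}
print(is_monotone(reachable, U))    # True
print(is_monotone(unreachable, U))  # False
```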

CALM is the positive counterpart to the CAP Theorem: it names the exact class of programs that can satisfy Consistency, Availability, and Partition-tolerance simultaneously.

Why it matters

Moves consistency from a property of storage (linearizability, serializability) to a property of programs — Confluence.
Tells programmers which features are free (monotonic: set union, counting up, set containment) and which must be paid for (non-monotonic: set difference, aggregates that must see every input, exact counts, deletions without tombstones).
Gives a design rule for coordination-free systems: build from monotonic primitives — CRDTs, Immutable Data Structures, Tombstones, monotonic accumulators — and only add coordination at non-monotonic boundaries.

Formal statement

Using Relational Transducer networks as the execution model and Confluence as the consistency criterion, Ameloot et al. prove: a query Q has a coordination-free distributed evaluation plan iff Q is monotone.

In this vault

Keeping CALM - When Distributed Consistency is Easy — the definitive survey
CAP Theorem
Confluence
Monotonic Logic
Coordination Avoidance
Relational Transducer
Bloom Language
Dedalus
CRDTs
Non-monotonic Reasoning

CAP Theorem

Conjectured by Brewer (2000); proved by Gilbert & Lynch (2002). A replicated storage system cannot simultaneously provide all three of: Consistency (linearizable reads and writes), Availability (every request to a non-failed node gets a response), Partition tolerance (continues operating under network partitions). Under a partition, the system must choose between C and A.

CAP is a negative result about storage. The CALM Theorem is its positive counterpart about programs: the subclass of programs achieving all three is exactly the monotonic ones.
