Lesson 1 — The BFT problem from scratch
Question
Distributed systems pick one of two failure models. Crash-fault tolerance assumes nodes only fail by stopping; Byzantine fault tolerance assumes nodes can lie, equivocate, and collude. Ethereum, Hyperliquid, and every L1 are in the Byzantine model. Why does that single decision determine the entire consensus design?
Principle (minimum model)
- The Byzantine model. Up to f of 3f+1 nodes can behave arbitrarily — lying, equivocating, sending conflicting messages. The system must still agree on a single value.
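- The FLP impossibility (and its cousin, CAP). No deterministic protocol can guarantee both safety and liveness in a fully asynchronous network once even one node may fail. Real consensus protocols keep two of the three (safety, liveness, tolerance of asynchrony); the third is degraded under stress.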
- Safety vs liveness. Safety = "no two nodes commit different values". Liveness = "eventually a value is committed". BFT chains generally favour safety (halt over fork).
- The 2/3 majority threshold. Comes from 3f+1: a quorum needs more than 2/3 of the nodes so that any two quorums intersect in at least one honest node.
- Why this matters for engineering. The chain spec's timeout / vote-collection / fork-choice rules are not arbitrary; they fall out of these constraints. Reading consensus code without this framing makes everything look magical.
Worked example + steps
The BFT problem from scratch
It's 3 a.m. One validator in your 30-node chain just signed two conflicting blocks. Other validators are voting both ways. The chain's split-brained. What do you reach for? If you don't have a mental model for why this is even possible — for what failure modes consensus is supposed to survive — you'll spend the next eight hours guessing.
This lesson builds the model: failure modes, the safety/liveness split, and why FLP (the 1985 impossibility theorem) makes "perfect" consensus mathematically off the table.
1. The problem statement
Multiple nodes, one decision. They must all decide:
- The same value — if A decides "yes" and B decides "no," the system has split-brained
- A value that was proposed — they can't just agree on "42" if nobody asked about 42
- Eventually — at some point, not "later if we feel like it"
These three are safety, validity, liveness. They aren't free — you cannot get all three under arbitrary network conditions. The whole field of consensus research is figuring out which trade-offs are tolerable for which use case.
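The three properties can be written down as checks on the outcome of a single run. This is a minimal sketch, not part of any real client: `check_consensus_properties`, its arguments, and the node names are all hypothetical, and "liveness" here only means "every node had decided by the time the trace ended," which is a finite stand-in for the real "eventually."

```python
from typing import Dict, Optional, Set


def check_consensus_properties(
    proposed: Set[str],
    decisions: Dict[str, Optional[str]],
) -> Dict[str, bool]:
    """Check one finished run against safety, validity, and liveness.

    `decisions` maps an honest node's name to the value it decided,
    or None if it never decided before the trace ended.
    """
    decided = [v for v in decisions.values() if v is not None]
    return {
        # Safety: no two nodes decided different values.
        "safety": len(set(decided)) <= 1,
        # Validity: every decided value was actually proposed.
        "validity": all(v in proposed for v in decided),
        # Liveness (finite approximation): every node decided something.
        "liveness": len(decided) == len(decisions),
    }
```

A run where A decides "yes" and B decides "no" fails only the safety check; a run where everyone agrees on "42" when nobody proposed it fails only validity.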
2. The failure modes you must plan for
| Failure | What it means | Example |
|---|---|---|
| Crash | A node stops responding. Period. | The validator's machine power-cycles. |
| Omission | A node selectively drops messages. | Packet loss; intentional censorship. |
| Network partition | A subset of nodes can't reach another subset. | Submarine cable cut; AWS region outage. |
| Byzantine | A node lies. Sends contradictory messages. Signs both A and ¬A. | Compromised validator key; bug. |
The classical literature calls all of these "faults." The Byzantine case (named after the Byzantine Generals problem — a node that behaves arbitrarily, including lying) is the hardest because the faulty node is actively adversarial. Crash + omission are comparatively easy.
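The "signs both A and ¬A" case (equivocation) is the one Byzantine behaviour that leaves hard evidence, so it is worth seeing how it could be detected. The sketch below assumes a simplified, hypothetical `Vote` record with no rounds and no signature verification; real protocols carry both.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Set, Tuple


class Vote(NamedTuple):
    # Hypothetical, simplified vote: real votes also carry a round and a signature.
    validator: str
    height: int
    block_hash: str


def find_equivocators(votes: Iterable[Vote]) -> Dict[str, List[int]]:
    """Return {validator: [heights]} where it voted for more than one block."""
    seen: Dict[Tuple[str, int], Set[str]] = defaultdict(set)
    for vote in votes:
        seen[(vote.validator, vote.height)].add(vote.block_hash)

    offenders: Dict[str, List[int]] = defaultdict(list)
    for (validator, height), hashes in seen.items():
        if len(hashes) > 1:  # two distinct hashes at one height = equivocation
            offenders[validator].append(height)
    return {validator: sorted(heights) for validator, heights in offenders.items()}
```

Repeating the same vote is harmless (a retransmission); only two different hashes at the same height count as evidence.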
How many Byzantine nodes can a network of n nodes survive? The answer is f < n/3. With 4 nodes you tolerate 1 Byzantine node. With 7 you tolerate 2. You always need at least 3f+1 nodes to tolerate f Byzantine nodes. This is a hard math result, not a design choice, proven by Lamport, Shostak, and Pease in 1982.
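The f < n/3 bound is easy to turn into arithmetic. The two helper names below are made up for illustration:

```python
def max_byzantine(n: int) -> int:
    """Largest f such that n >= 3f + 1, i.e. f < n/3."""
    if n < 1:
        raise ValueError("need at least one node")
    return (n - 1) // 3


def min_nodes(f: int) -> int:
    """Smallest network that tolerates f Byzantine nodes."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return 3 * f + 1
```

Note that going from 4 to 6 nodes buys nothing: `max_byzantine` stays at 1 until the seventh node arrives.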
3. Safety vs liveness
The two properties consensus protocols promise:
- Safety: "We never disagree." If A decides x and B decides y, then x = y. This is correctness.
- Liveness: "We eventually decide." Some valid decision will get made in finite time. This is progress.
You cannot guarantee both unconditionally with a deterministic protocol in a fully asynchronous network, even if the only fault is a single crashed node, let alone a Byzantine one. This is the heart of the FLP impossibility theorem (1985).
The choice every protocol makes:
| Protocol family | Sacrifice when partitioned |
|---|---|
| Classical BFT (Tendermint, HotStuff, HyperBFT) | Liveness: halts during partition, never forks |
| Nakamoto (Bitcoin, ETH 1.0 PoW) | Safety: keeps producing, can fork temporarily |
| Ethereum PoS (Casper FFG + LMD-GHOST) | Hybrid: liveness for tips, finality for older blocks |
There is no protocol that gets safety AND liveness AND fault tolerance AND no synchrony assumption. One of those four must give.
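The first two rows of the table can be made concrete with a toy partition model. This is a sketch under strong assumptions: every validator has equal weight, the BFT quorum is "strictly more than 2/3 of all validators," and a longest-chain side keeps producing as long as it has any block producer at all. The function names are hypothetical.

```python
from typing import List


def bft_can_commit(partition_sizes: List[int]) -> List[bool]:
    """For each side of a partition: can it still gather a BFT quorum?"""
    n = sum(partition_sizes)
    quorum = (2 * n) // 3 + 1  # strictly more than 2/3 of all validators
    return [size >= quorum for size in partition_sizes]


def nakamoto_can_extend(partition_sizes: List[int]) -> List[bool]:
    """For each side: does a longest-chain protocol keep extending its chain?"""
    return [size > 0 for size in partition_sizes]
```

With 30 validators split 15/15, `bft_can_commit` is false on both sides (the chain halts), while `nakamoto_can_extend` is true on both sides (two chains grow until the partition heals). Because two quorums always overlap, at most one side can ever be true in the BFT case.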
4. FLP impossibility — the founding result
Fischer, Lynch, Paterson (1985): In a fully asynchronous network with at least one crash failure, no deterministic protocol can guarantee both safety and liveness for all executions.
Note three things about this result:
- Asynchronous — no upper bound on message delay. Real networks aren't fully asynchronous; we can sometimes assume eventually-bounded delays.
- One crash — not Byzantine. Even with crash-only faults, FLP applies.
- Deterministic — randomized protocols can sidestep FLP probabilistically.
How real protocols escape FLP:
- Timeouts (synchrony assumption): if a message doesn't arrive in T seconds, assume the sender crashed. Now you have synchronous fallback.
- Randomness (probabilistic finality): Bitcoin's proof-of-work uses randomness to elect leaders; finality is probabilistic, not absolute.
- View changes (give up liveness temporarily): BFT protocols pause progress during leader failure, then resume.
Every real protocol uses at least one of these escapes.
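The timeout escape usually comes with a growing timeout: if the real network delay is bounded but unknown, a timeout that keeps increasing will eventually exceed it, and from then on rounds stop failing spuriously. The sketch below assumes a doubling schedule and a fixed true delay; both are simplifications, and `rounds_until_progress` is a hypothetical helper rather than any protocol's actual rule.

```python
def rounds_until_progress(
    initial_timeout: float,
    actual_delay: float,
    backoff: float = 2.0,
) -> int:
    """Count the rounds that time out before the timeout covers the real delay."""
    if initial_timeout <= 0 or backoff <= 1 or actual_delay < 0:
        raise ValueError("need initial_timeout > 0, backoff > 1, actual_delay >= 0")

    timeout = initial_timeout
    failed_rounds = 0
    while timeout < actual_delay:  # message arrives too late: round fails
        failed_rounds += 1
        timeout *= backoff         # next round waits longer
    return failed_rounds
```

Starting at 1 second against a 5-second delay, the rounds with timeouts 1, 2, and 4 fail and the round with timeout 8 succeeds. Safety never depended on the guess; only liveness waits for the timeout to catch up.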
5. The 3f+1 rule, intuited
Why exactly 3f+1? Here's the intuition before the algebra.
You ask for votes. f nodes might lie. You need a quorum (a vote count that "counts as agreement") where:
- The quorum is large enough that f Byzantines can't form a majority in it
- Two quorums must overlap in at least one honest node (otherwise you could decide both x and ¬x)
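If quorums are size q out of n, then:
- Any two quorums overlap in at least 2q - n nodes
- Safety: the overlap must contain at least one honest node even if f of its members are Byzantine: 2q - n > f, i.e. 2q - n ≥ f+1
- Liveness: the f Byzantine nodes may simply stay silent, so a quorum must be reachable from the n - f honest nodes alone: q ≤ n - f
- Combining: 2(n - f) - n ≥ 2q - n ≥ f+1, so n - 2f ≥ f+1, so n ≥ 3f+1
- At n = 3f+1 both constraints force q = 2f+1, and the honest voters in any quorum (at least q - f = f+1) outnumber the Byzantine ones (at most f)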
This is the algebra behind every BFT system you'll ever use. CometBFT, HotStuff, HyperBFT, Casper FFG: all assume n ≥ 3f+1 and lose their guarantees if you go below.
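One way to convince yourself the bound is tight is to brute-force it. The sketch below enumerates every quorum size q for given n and f and keeps those meeting two constraints: any two quorums overlap in at least f+1 nodes (so at least one honest node), and a quorum can still be formed when f nodes stay silent (q ≤ n - f). `feasible_quorums` is a hypothetical name.

```python
from typing import List


def feasible_quorums(n: int, f: int) -> List[int]:
    """All quorum sizes q that work for n nodes with up to f Byzantine."""
    return [
        q
        for q in range(1, n + 1)
        if 2 * q - n >= f + 1  # overlap of two quorums holds an honest node
        and q <= n - f         # quorum reachable with f nodes silent
    ]
```

For n = 4, f = 1 the only answer is q = 3; for n = 3, f = 1 the list is empty, which is the "you can't do it with 3f" result in executable form.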
6. What this means for your L1
You're building Tempo. You pick a consensus protocol. You face four questions immediately:
| Question | Answer driver |
|---|---|
| How many validators? | Decentralization goal vs latency budget |
| Sync or async? | Sub-second finality requires a synchronous timeout assumption |
| What to sacrifice on partition? | Payments → halt (BFT). Mainnet ETH → fork (Nakamoto-ish). |
| Slashing or pure economic? | Slashing requires fork detection; pure economic = staking + nothing else |
Tempo (likely) picks: ~30 validators, synchronous, BFT (halt on partition), slashing. Hyperliquid: ~20 validators, synchronous, HotStuff family, slashing.
Your protocol choice flows from your business model. Get the trade-offs straight before writing code.
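The "how many validators" question has a direct operational consequence: the validator count fixes how many Byzantine nodes you survive and how many offline nodes halt the chain. The sketch below assumes equal-weight validators and a "more than 2/3" quorum; `ValidatorSetSizing` is a hypothetical helper, and the counts of 30 and 20 are just the approximate figures quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidatorSetSizing:
    validators: int

    @property
    def max_byzantine(self) -> int:
        # Largest f with validators >= 3f + 1.
        return (self.validators - 1) // 3

    @property
    def quorum(self) -> int:
        # Strictly more than 2/3 of the validator set.
        return (2 * self.validators) // 3 + 1

    @property
    def offline_to_halt(self) -> int:
        # Smallest number of unreachable validators that makes a quorum impossible.
        return self.validators - self.quorum + 1


for n in (30, 20):
    s = ValidatorSetSizing(n)
    print(n, s.max_byzantine, s.quorum, s.offline_to_halt)
```

Under these assumptions a 30-validator set tolerates 9 Byzantine nodes and halts once 10 are unreachable; a 20-validator set tolerates 6 and halts at 7.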
7. Practice
- Read the original Byzantine Generals paper (Lamport, Shostak, and Pease, 1982): Sections 1 and 2 only, the rest is dense
- Sketch on paper why 3f+1 is tight (you can't do it with 3f)
- List 3 places in Ethereum PoS where FLP escapes are used
Final check: in one sentence, why is "Bitcoin doesn't need 3f+1" not a counterexample to the 3f+1 rule? What does Bitcoin sacrifice that lets it use chain weight instead of voting? If your answer doesn't include "probabilistic finality," re-read §3.
Summary (3 lines)
- BFT model = up to f of 3f+1 can be arbitrarily malicious. The 2/3 threshold guarantees honest-quorum intersection.
- Safety vs liveness is a real choice; BFT chains favour safety (halt over fork). FLP says no deterministic protocol can guarantee all three of safety / liveness / tolerance of full asynchrony.
- Every consensus parameter (timeout / vote collection / fork choice) falls out of these constraints. Next: three consensus families.