Lesson 1 — The BFT problem from scratch

Question

Distributed systems pick one of two failure models. Crash-fault tolerance assumes nodes only fail by stopping; Byzantine fault tolerance assumes nodes can lie, equivocate, and collude. Ethereum, Hyperliquid, and every L1 are in the Byzantine model. Why does that single decision determine the entire consensus design?

Principle (minimum model)

The Byzantine model. Up to f of 3f+1 nodes can behave arbitrarily — lying, equivocating, sending conflicting messages. The system must still agree on a single value.
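The CAP / FLP impossibility. No deterministic protocol can be safe + live + fully asynchronous all at once, even if only a single node may crash (FLP); CAP states the analogous trade-off for partitioned data stores. Real consensus picks two; the third is degraded under stress.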
Safety vs liveness. Safety = "no two nodes commit different values". Liveness = "eventually a value is committed". BFT chains generally favour safety (halt over fork).
The 2/3 majority threshold. Comes from 3f+1: when a quorum is more than 2/3 of the nodes (2f+1 of 3f+1), any two quorums intersect in at least f+1 nodes, so the intersection always includes at least one honest node.
Why this matters for engineering. The chain spec's timeout / vote-collection / fork-choice rules are not arbitrary — they fall out of these constraints. Reading consensus code without this framing makes everything look magical.

Worked example + steps

It's 3 a.m. One validator in your 30-node chain just signed two conflicting blocks. Other validators are voting both ways. The chain's split-brained. What do you reach for? If you don't have a mental model for why this is even possible — for what failure modes consensus is supposed to survive — you'll spend the next eight hours guessing.

This lesson builds the model: failure modes, the safety/liveness split, and why FLP (the 1985 impossibility theorem) makes "perfect" consensus mathematically off the table.

1. The problem statement

Multiple nodes, one decision. They must all decide:

The same value — if A decides "yes" and B decides "no," the system has split-brained
A value that was proposed — they can't just agree on "42" if nobody asked about 42
Eventually — at some point, not "later if we feel like it"

These three are safety, validity, liveness. They aren't free — you cannot get all three under arbitrary network conditions. The whole field of consensus research is figuring out which trade-offs are tolerable for which use case.
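
The three properties can be made concrete by checking the outcome of a single finished run. This is a minimal sketch: the function name check_outcome and the trace format (a dict from node name to decided value, with None for "never decided") are assumptions made for illustration. Liveness is really a statement about unbounded time, so checking a finished trace only approximates it.

```python
from typing import Dict, Optional, Set


def check_outcome(proposed: Set[str],
                  decisions: Dict[str, Optional[str]]) -> Dict[str, bool]:
    """Check one finished run against the three properties.

    `decisions` maps each honest node to the value it decided,
    or None if it never decided (hypothetical trace format).
    """
    decided = [value for value in decisions.values() if value is not None]
    return {
        # Safety: no two nodes decided different values.
        "safety": len(set(decided)) <= 1,
        # Validity: every decided value was actually proposed.
        "validity": all(value in proposed for value in decided),
        # Liveness (approximation): every node decided something.
        "liveness": len(decided) == len(decisions),
    }


# A and B disagree: split-brain, a safety violation.
split = check_outcome({"yes", "no"}, {"A": "yes", "B": "no", "C": "yes"})
# B never decides: the run is safe but not live.
stalled = check_outcome({"yes", "no"}, {"A": "yes", "B": None, "C": "yes"})
print(split)
print(stalled)
```

The two traces show the trade-off in miniature: the first run made progress but disagreed, the second agreed but stalled.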

2. The failure modes you must plan for

Failure	What it means	Example
Crash	A node stops responding. Period.	The validator's machine power-cycles.
Omission	A node selectively drops messages.	Packet loss; intentional censorship.
Network partition	A subset of nodes can't reach another subset.	Submarine cable cut; AWS region outage.
Byzantine	A node lies. Sends contradictory messages. Signs both A and ¬A.	Compromised validator key; bug.

The classical literature calls all of these "faults." The Byzantine case (named after the Byzantine Generals problem — a node that behaves arbitrarily, including lying) is the hardest because the faulty node is actively adversarial. Crash + omission are comparatively easy.
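
Equivocation is the one Byzantine behavior that leaves hard evidence: two signed votes from the same validator for the same slot. The sketch below shows one way to spot it. The Vote type, its field names, and find_equivocators are hypothetical, and it assumes signatures were already verified before the votes reach this function.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Set, Tuple


class Vote(NamedTuple):
    """A signature-checked vote (hypothetical shape, for illustration)."""
    validator: str
    height: int
    block_hash: str


def find_equivocators(votes: List[Vote]) -> Dict[Tuple[str, int], Set[str]]:
    """Return every (validator, height) that voted for more than one block."""
    seen: Dict[Tuple[str, int], Set[str]] = defaultdict(set)
    for vote in votes:
        seen[(vote.validator, vote.height)].add(vote.block_hash)
    # Voting for different blocks at different heights is normal;
    # only conflicting hashes at the same height count as evidence.
    return {key: hashes for key, hashes in seen.items() if len(hashes) > 1}


votes = [
    Vote("v1", 10, "0xaaa"),
    Vote("v2", 10, "0xaaa"),
    Vote("v3", 10, "0xaaa"),
    Vote("v3", 10, "0xbbb"),  # v3 signs a second, conflicting block
    Vote("v3", 11, "0xccc"),
]
evidence = find_equivocators(votes)
print(evidence)
```

Crash, omission, and partition faults leave no comparable proof, which is why they cannot be punished the same way.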

How many Byzantine nodes can a system of n nodes survive? The answer is f < n/3. With 4 nodes you tolerate 1 Byzantine. With 7 you tolerate 2. You always need 3f+1 nodes to tolerate f Byzantines. This is a hard math result, not a design choice. Proven by Lamport, Shostak, and Pease in 1982.
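
The bound is easy to turn into arithmetic. A minimal sketch, assuming one node equals one vote (the helper names max_byzantine and min_nodes are made up for illustration):

```python
def max_byzantine(n: int) -> int:
    """Largest f such that n >= 3f + 1."""
    if n < 1:
        raise ValueError("need at least one node")
    return (n - 1) // 3


def min_nodes(f: int) -> int:
    """Smallest n that tolerates f Byzantine nodes."""
    if f < 0:
        raise ValueError("f cannot be negative")
    return 3 * f + 1


for n in (4, 7, 10, 30):
    print(f"n={n}: tolerates f={max_byzantine(n)}")
```

Note that going from 4 to 6 nodes buys nothing: the tolerated f only increases at n = 7.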

3. Safety vs liveness

The two properties consensus protocols promise:

Safety: "We never disagree." If A decides x and B decides y, then x = y. This is correctness.
Liveness: "We eventually decide." Some valid decision will get made in finite time. This is progress.

No deterministic protocol can guarantee both unconditionally in an asynchronous network, even if the only fault is a single crash, let alone Byzantine behavior. This is the heart of the FLP impossibility theorem (1985).

The choice every protocol makes:

Protocol family	Sacrifice when partitioned
Classical BFT (Tendermint, HotStuff, HyperBFT)	Liveness: halts during partition, never forks
Nakamoto (Bitcoin, ETH 1.0 PoW)	Safety: keeps producing, can fork temporarily
Ethereum PoS (Casper FFG + LMD-GHOST)	Hybrid: liveness for tips, finality for older blocks

There is no deterministic protocol that gets safety AND liveness AND fault tolerance AND no synchrony assumption. One of those four must give.
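
To see why a quorum-based chain halts instead of forking, count which side of a partition can still gather a quorum. This is a sketch under simplifying assumptions: every node is honest, one node equals one vote, and the quorum is "strictly more than 2/3 of all nodes". The function names are hypothetical.

```python
from typing import List


def bft_quorum(n: int) -> int:
    """Smallest vote count strictly greater than 2/3 of n."""
    return (2 * n) // 3 + 1


def sides_that_can_commit(partition_sizes: List[int]) -> List[bool]:
    """For each side of a partition, can it collect a quorum alone?"""
    n = sum(partition_sizes)
    q = bft_quorum(n)
    return [size >= q for size in partition_sizes]


print(sides_that_can_commit([15, 15]))  # even split of 30 nodes
print(sides_that_can_commit([21, 9]))   # one side holds more than 2/3
print(sides_that_can_commit([20, 10]))  # 20 of 30 is exactly 2/3
```

Because two quorums together need more than n votes, at most one side can ever commit. A longest-chain protocol has no such gate, so both sides keep producing blocks and reconcile only after the partition heals.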

4. FLP impossibility — the founding result

Fischer, Lynch, Paterson (1985): In a fully asynchronous network with at least one crash failure, no deterministic protocol can guarantee both safety and liveness for all executions.

Note three things about this result:

Asynchronous — no upper bound on message delay. Real networks aren't fully asynchronous; we can sometimes assume eventually-bounded delays.
One crash — not Byzantine. Even with crash-only faults, FLP applies.
Deterministic — randomized protocols can sidestep FLP probabilistically.

How real protocols escape FLP:

Timeouts (synchrony assumption): if a message doesn't arrive in T seconds, assume the sender crashed. Now you have synchronous fallback.
Randomness (probabilistic finality): Bitcoin's proof-of-work uses randomness to elect leaders; finality is probabilistic, not absolute.
View changes (give up liveness temporarily): BFT protocols pause progress during leader failure, then resume.

Every real protocol uses at least one of these escapes.
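
The timeout escape usually comes with a growing timeout: if the real network delay is finite but unknown, a timeout that keeps increasing will eventually exceed it. The sketch below assumes a doubling schedule and hypothetical function names; real protocols choose their own growth rule. Only liveness depends on the timeout here, never safety.

```python
from typing import Optional


def round_timeout(base_seconds: float, round_number: int,
                  factor: float = 2.0) -> float:
    """Timeout for a given round; grows geometrically (an assumption)."""
    return base_seconds * factor ** round_number


def first_successful_round(actual_delay: float, base_seconds: float,
                           max_rounds: int = 32) -> Optional[int]:
    """First round whose timeout is long enough for messages to arrive.

    `actual_delay` stands in for the unknown network delay; a real node
    never knows it, which is exactly why the timeout has to grow.
    """
    for round_number in range(max_rounds):
        if round_timeout(base_seconds, round_number) >= actual_delay:
            return round_number
    return None  # delay exceeded every timeout we were willing to try


# Timeouts are 0.5s, 1s, 2s, 4s, ... and messages need 3s to arrive.
print(first_successful_round(actual_delay=3.0, base_seconds=0.5))
```

Rounds 0 to 2 time out and trigger a view change; round 3 is the first with enough time to collect votes.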

5. The 3f+1 rule, intuited

Why exactly 3f+1? Here's the intuition before the algebra.

You ask for votes. Up to f nodes might lie or stay silent. You need a quorum (a vote count that "counts as agreement") where:

The quorum can still be reached when all f faulty nodes stay silent (otherwise they can stall you forever)
Two quorums must overlap in at least one honest node (otherwise you could decide both x and ¬x)

If quorums are size q out of n, then:

You can only count on n - f votes arriving: q ≤ n - f
Two quorums overlap in at least 2q - n nodes
Overlap must contain at least one honest node: 2q - n > f
Substituting the largest allowed quorum, q = n - f: 2(n - f) - n > f, so n > 3f, so n ≥ 3f+1
At n = 3f+1 this gives q = 2f+1, so every quorum holds at least f+1 honest nodes and the f Byzantines can never form a majority in it
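
The bound can also be checked by brute force. The sketch below assumes two requirements on a quorum of size q: it must be reachable when all f faulty nodes stay silent (q ≤ n - f), and any two quorums must share more than f nodes (2q - n > f). The function name valid_quorum_sizes is made up for illustration.

```python
from typing import List


def valid_quorum_sizes(n: int, f: int) -> List[int]:
    """Quorum sizes that are both reachable and safe for n nodes, f faulty."""
    return [
        q for q in range(1, n + 1)
        if q <= n - f         # liveness: f silent nodes cannot block a quorum
        and 2 * q - n > f     # safety: any two quorums share an honest node
    ]


print(valid_quorum_sizes(4, 1))   # n = 3f+1: exactly one size works
print(valid_quorum_sizes(3, 1))   # n = 3f: nothing works
print(valid_quorum_sizes(7, 2))
print(valid_quorum_sizes(30, 9))
```

With n = 3f there is no quorum size that satisfies both constraints, which is the sense in which 3f+1 is tight.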

This is the algebra behind every BFT system you'll ever use. CometBFT, HotStuff, HyperBFT, Casper FFG: all assume n ≥ 3f+1 and lose their guarantees once more than f validators are Byzantine.

6. What this means for your L1

You're building Tempo. You pick a consensus. You face four questions immediately:

Question	Answer driver
How many validators?	Decentralization goal vs latency budget
Sync or async?	Sub-second finality requires a synchronous timeout assumption
What to sacrifice on partition?	Payments → halt (BFT). Mainnet ETH → fork (Nakamoto-ish).
Slashing or pure economic?	Slashing requires fork detection; pure economic = staking + nothing else

Tempo (likely) picks: ~30 validators, synchronous, BFT (halt on partition), slashing. Hyperliquid: ~20 validators, synchronous, HotStuff family, slashing.

Your protocol choice flows from your business model. Get the trade-offs straight before writing code.

7. Practice

Read Lamport's original paper (1982): Sections 1 and 2 only, the rest is dense
Sketch on paper why 3f+1 is tight (you can't do it with 3f)
List 3 places in Ethereum PoS where FLP escapes are used

Final check: in one sentence, why is "Bitcoin doesn't need 3f+1" not a counterexample to the 3f+1 rule? What does Bitcoin sacrifice that lets it use chain weight instead of voting? If your answer doesn't include "probabilistic finality," re-read §3.
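
To put a number on "probabilistic finality", the sketch below follows the attacker catch-up calculation from the Calculations section of the Bitcoin whitepaper. The function and parameter names are chosen here for illustration, and the model carries the whitepaper's own assumptions (a fixed attacker hash-power share and Poisson-distributed attacker progress).

```python
from math import exp, factorial


def attacker_success_probability(attacker_share: float,
                                 confirmations: int) -> float:
    """Chance that an attacker ever overtakes the honest chain.

    attacker_share: fraction of total hash power the attacker controls.
    confirmations: blocks already built on top of the transaction.
    """
    q = attacker_share
    p = 1.0 - q
    if q >= p:
        return 1.0  # a majority attacker always catches up eventually
    lam = confirmations * (q / p)  # expected attacker progress
    probability = 1.0
    for k in range(confirmations + 1):
        poisson = exp(-lam) * lam ** k / factorial(k)
        probability -= poisson * (1.0 - (q / p) ** (confirmations - k))
    return probability


for z in (0, 1, 6):
    print(z, attacker_success_probability(0.10, z))
```

The probability drops quickly with each confirmation but never reaches zero, whereas a block committed by a 2f+1 quorum cannot be reverted unless more than f validators misbehave.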

Summary (3 lines)

BFT model = up to f of 3f+1 can be arbitrarily malicious. The 2/3 threshold guarantees honest-quorum intersection.
Safety vs liveness is a real choice; BFT chains favour safety (halt over fork). FLP says you can't get all three of safety / liveness / asynchrony.
Every consensus parameter (timeout / vote collection / fork choice) falls out of these constraints. Next: three consensus families.