Lesson 2 — Reth's pipeline: 10 stages, in order

Question

Reth runs 10 stages in sequence. Each is a Stage impl; the pipeline runs them top-down for sync. What does each stage do, and where are the parallelisation opportunities?

Principle (minimum model)

1. HeaderStage. Fetches block headers from peers; validates the parent chain.
2. BodyStage. Fetches block bodies (txs + uncles); pairs them with headers.
3. SenderRecoveryStage. Recovers tx sender addresses from signatures. Embarrassingly parallel; rayon par_iter for speed.
4. ExecutionStage. Executes every tx through revm; accumulates state changes. The expensive stage; ~70 % of sync time.
5. AccountHashingStage + 6. StorageHashingStage. Re-key account and storage changes by hashed key, sorted, ready for the trie.
7. MerkleStage. Updates the state Merkle Patricia Trie; computes the state root.
8. TransactionLookupStage. Indexes tx hashes for fast lookup.
9. IndexAccountHistoryStage + IndexStorageHistoryStage. Indexes for fast historical queries.
10. FinishStage. Bookkeeping; finalizes the run.
Pipelining. Stages run one after another over a block range; each commits a checkpoint before the next starts. Parallelism lives inside a stage, not between stages.
Why this many. Each is a known bottleneck; isolating them allows optimisation. SenderRecovery uses rayon; Execution stays sequential in revm; hashing and indexing work in sorted batches.

Worked example + steps

Picture your node a minute after it joins the network. It has just pulled down 10 million blocks' worth of headers from peers. Now it has to turn that pile of headers into a fully validated chain: execute every transaction, recompute every state root, build every index. Done block by block, the way older clients did it, that takes weeks. Reth does it in hours.

The trick: don't process one block at a time. Process one operation at a time, across thousands of blocks — recover all the senders, then execute all the transactions, then hash all the accounts, and so on. That's "staged sync." There are 10 stages, they run in a fixed order, and the order is not arbitrary — every constraint between stages is encoded in which one runs when.
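The difference between the two approaches is just which loop is on the outside. The sketch below is a toy model, not Reth code: the `Op` enum, the three-operation subset, and the function names are all made up for illustration. It builds the work schedule both ways and counts how often the node has to switch from one kind of operation to another.

```rust
/// A simplified subset of the per-block operations, in the order they must happen.
/// (Hypothetical names for illustration; not Reth types.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Op {
    RecoverSenders,
    Execute,
    Hash,
}

const OPS: [Op; 3] = [Op::RecoverSenders, Op::Execute, Op::Hash];

/// Block-by-block: the outer loop is the block, so the node switches
/// operation on every step.
pub fn block_by_block(blocks: std::ops::RangeInclusive<u64>) -> Vec<(Op, u64)> {
    let mut schedule = Vec::new();
    for block in blocks {
        for op in OPS {
            schedule.push((op, block));
        }
    }
    schedule
}

/// Staged: the outer loop is the operation, so each operation sees the
/// whole block range as one batch.
pub fn staged(blocks: std::ops::RangeInclusive<u64>) -> Vec<(Op, u64)> {
    let mut schedule = Vec::new();
    for op in OPS {
        for block in blocks.clone() {
            schedule.push((op, block));
        }
    }
    schedule
}

/// Counts how often two consecutive work items use a different operation.
pub fn op_switches(schedule: &[(Op, u64)]) -> usize {
    schedule.windows(2).filter(|pair| pair[0].0 != pair[1].0).count()
}
```

Both schedules contain exactly the same work items. For three blocks, the block-by-block schedule switches operation 8 times and the staged one only twice; the gap grows linearly with the size of the block range, which is what makes batching possible.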

(Last lesson you built up the Stage trait. The 10 stages below are the actual implementations.)

flowchart LR
    H[HeaderStage] --> B[BodyStage]
    B --> S[SenderRecoveryStage]
    S --> E[ExecutionStage]
    E --> AH[AccountHashingStage]
    AH --> SH[StorageHashingStage]
    SH --> M[MerkleStage]
    M --> T[TransactionLookupStage]
    T --> I[IndexHistoryStages]
    I --> F[FinishStage]

Open crates/stages/stages/src/stages/ in the Reth repo as you read.

The 10 stages

| # | Stage | What it does | Hot loop |
|---|---|---|---|
| 1 | `HeaderStage` | Download block headers | network I/O |
| 2 | `BodyStage` | Download tx bodies + uncles | network I/O |
| 3 | `SenderRecoveryStage` | ECDSA-recover the sender of each tx (turn the signature back into the address that signed it) | CPU (parallel) |
| 4 | `ExecutionStage` | Run Revm; accumulate state diffs | CPU (Revm) |
| 5 | `AccountHashingStage` | Sort account changes by hashed key | sort + write |
| 6 | `StorageHashingStage` | Sort storage changes by hashed key | sort + write |
| 7 | `MerkleStage` | Update Merkle Patricia Trie roots (Ethereum's state-tree hash structure) | tree compute |
| 8 | `TransactionLookupStage` | Build `tx_hash → (block, index)` index | sort + write |
| 9 | `IndexAccountHistoryStage` + `IndexStorageHistoryStage` | Historical access indices | sort + write |
| 10 | `FinishStage` | Bookkeep, finalize | none |

Order matters: three constraints

The pipeline's shape comes down to three points. The first is a hard ordering constraint; the other two explain where Reth does and does not parallelize.

Constraint 1 — `MerkleStage` must come after hashing

The Merkle stage consumes sorted hashed keys. Account hashing and storage hashing must finish — and commit their sort — before Merkle starts. Interleaving doesn't work: the Merkle stage needs the whole sorted set for the block range it's processing.

That's why AccountHashingStage (5) and StorageHashingStage (6) come before MerkleStage (7). In this specific order, with full commits between them.
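The hash, sort, then Merkle dependency can be sketched in a few lines. This is a toy model under loud assumptions: `toy_hash` uses the standard library's `DefaultHasher` as a stand-in for keccak256, and `toy_root` is a simple fold, not a Merkle Patricia Trie. The function names are hypothetical. The point is only that the root step consumes the complete, sorted, hashed key set.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Stand-in for keccak256. Assumption: any deterministic hash is enough to
/// show the ordering; real Ethereum state keys are keccak256 hashes.
fn toy_hash<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

/// "Hashing stage": re-key account changes by hashed address.
/// A BTreeMap keeps its keys sorted, which is the property the next step needs.
pub fn hash_accounts(changes: &[(&str, u64)]) -> BTreeMap<u64, u64> {
    changes
        .iter()
        .map(|(address, balance)| (toy_hash(address), *balance))
        .collect()
}

/// "Merkle stage" stand-in: fold the complete sorted set into one digest.
/// A real trie is far more involved; this only shows that the input is
/// consumed in hashed-key order, after hashing has finished.
pub fn toy_root(hashed: &BTreeMap<u64, u64>) -> u64 {
    hashed
        .iter()
        .fold(0u64, |acc, (key, value)| toy_hash(&(acc, *key, *value)))
}
```

Because the digest is computed over the sorted set, the order in which changes arrived from execution does not matter, but the set does have to be complete before the fold starts.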

Constraint 2 — `AccountHashingStage` and `StorageHashingStage` could run in parallel

Both consume ExecutionStage's output. They produce independent sorted change sets (account-keyed vs storage-keyed). So why does the pipeline run them sequentially?

🔍 Open account_hashing.rs and storage_hashing.rs. Read the first 30 lines of each. What do they share? What's the practical reason Reth doesn't fork off two threads here?

Two reasons:

Disk write contention. Both stages write to MDBX (Reth's embedded key-value store), which allows only one write transaction at a time. Running them in parallel would serialize on that writer lock with no compute benefit.
Pipeline simplicity. Sequential execution lets the orchestrator's scheduler stay a flat list. Parallel branches would require a DAG scheduler (one that can run independent stages concurrently) — more complexity, marginal gain.
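To see how little a flat-list scheduler needs, here is a minimal sketch. The `Stage` trait below is a deliberately simplified stand-in, and `Recording` and `run_pipeline` are hypothetical names; Reth's real trait and pipeline carry a database provider, unwind logic, and error handling that are omitted here.

```rust
/// Simplified stand-in for a pipeline stage. Assumption: Reth's real `Stage`
/// trait has a different, richer signature (provider, unwind, errors).
pub trait Stage {
    fn id(&self) -> &'static str;
    /// Process blocks `from..=to` and return the new checkpoint.
    fn execute(&mut self, from: u64, to: u64) -> u64;
}

/// A toy stage that only remembers how far it got.
pub struct Recording {
    pub name: &'static str,
    pub checkpoint: u64,
}

impl Stage for Recording {
    fn id(&self) -> &'static str {
        self.name
    }

    fn execute(&mut self, _from: u64, to: u64) -> u64 {
        self.checkpoint = to;
        to
    }
}

/// The whole scheduler: walk a flat list, top to bottom, one stage at a time.
/// Returns the order in which stages finished.
pub fn run_pipeline(stages: &mut [Box<dyn Stage>], target: u64) -> Vec<&'static str> {
    let mut finished = Vec::new();
    for stage in stages.iter_mut() {
        let reached = stage.execute(1, target);
        assert_eq!(reached, target, "stage {} stopped early", stage.id());
        finished.push(stage.id());
    }
    finished
}
```

The scheduler is a single `for` loop. Running the two hashing stages concurrently would mean replacing that loop with dependency tracking and a join point before `MerkleStage`, which is the extra complexity the text refers to.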

The Frontiers 2025 talk (linked at the bottom) discusses exactly this trade-off — what did get parallelized and what stayed sequential, and why.

Constraint 3 — `SenderRecoveryStage` is the parallelism win

SenderRecoveryStage is where parallelism pays off most. ECDSA recovery (turning a transaction signature back into the signer's address) is pure CPU with no shared state, so it is embarrassingly parallel. Reth uses Rayon (Rust's data-parallelism library) to fan it out across all CPU cores.

What makes this stage the standout:

Massive batch size. Each block has 100–300 transactions; a stage call processes 100K+ blocks at a time = 10–30M signatures per call.
No data dependencies. Each recovery is independent — no waiting on previous results.
Pure compute. No I/O between recoveries.
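Those three properties are what make a fan-out this simple. The sketch below illustrates the pattern with `std::thread::scope` so that it has no dependencies; Reth itself uses Rayon, as noted above. `recover_sender` is a placeholder arithmetic function, not ECDSA, and all names here are hypothetical.

```rust
use std::thread;

/// Placeholder for ECDSA recovery. Assumption: the real stage recovers a
/// 20-byte address from a secp256k1 signature; here a "signature" is a u64
/// and "recovery" is a pure function of it, which is all the example needs.
fn recover_sender(signature: u64) -> u64 {
    signature.wrapping_mul(0x9E37_79B9_7F4A_7C15).rotate_left(17)
}

/// Sequential baseline.
pub fn recover_all(signatures: &[u64]) -> Vec<u64> {
    signatures.iter().map(|s| recover_sender(*s)).collect()
}

/// Fan the batch out over `workers` threads, one contiguous chunk each.
/// Results come back in the original order because chunks are joined in order.
pub fn recover_all_parallel(signatures: &[u64], workers: usize) -> Vec<u64> {
    let workers = workers.max(1);
    // Round up so that at most `workers` chunks are produced; never zero.
    let chunk_len = ((signatures.len() + workers - 1) / workers).max(1);
    thread::scope(|scope| {
        // Spawn every worker first, then join, so the chunks run concurrently.
        let handles: Vec<_> = signatures
            .chunks(chunk_len)
            .map(|chunk| scope.spawn(move || recover_all(chunk)))
            .collect();
        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("worker panicked"))
            .collect()
    })
}
```

No locks, channels, or shared state appear anywhere: each worker reads its own slice and returns its own vector. That is only possible because each recovery is independent of every other.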

ExecutionStage (4) is also CPU-bound but has sequential state dependencies: block N's storage writes affect block N+1's reads, and within a block an earlier transaction's writes affect later ones. You can't trivially parallelize it without optimistic execution (Block-STM and similar schemes, which speculatively run transactions in parallel and re-run the ones that conflict), which brings its own consensus complications.
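A toy model makes the state dependency concrete. Everything below is hypothetical and far simpler than revm: the "state" is a map of balances and a "transaction" is a transfer that is skipped if it would overdraw. It is enough to show that the result of a block depends on what earlier blocks wrote.

```rust
use std::collections::HashMap;

/// One toy "transaction": move `amount` from one account to another.
#[derive(Clone, Copy)]
pub struct Transfer {
    pub from: &'static str,
    pub to: &'static str,
    pub amount: u64,
}

/// Toy world state: account name to balance.
pub type State = HashMap<&'static str, u64>;

/// Apply one block on top of `state`. A transfer that would overdraw is
/// skipped, so the outcome depends on everything earlier blocks wrote.
pub fn execute_block(state: &mut State, block: &[Transfer]) {
    for tx in block {
        let balance = state.get(tx.from).copied().unwrap_or(0);
        if balance >= tx.amount {
            state.insert(tx.from, balance - tx.amount);
            *state.entry(tx.to).or_insert(0) += tx.amount;
        }
    }
}

/// Blocks must be folded in order: block N+1 starts from block N's state.
pub fn execute_chain(genesis: State, blocks: &[Vec<Transfer>]) -> State {
    let mut state = genesis;
    for block in blocks {
        execute_block(&mut state, block);
    }
    state
}
```

If block 1 pays Bob and block 2 has Bob pay Carol, swapping the two blocks changes the final state: Bob's transfer fails because his balance has not arrived yet. Sender recovery has no equivalent of this, which is why it parallelizes trivially and execution does not.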

Why staged sync wins

Block-by-block sync (geth's old default) does ~50–100 blocks/sec at full tilt. Staged sync does 10K+ blocks/sec. The 100× comes from three compounding factors:

Batching. Sender recovery, hashing, Merkle root computation — all amortized across thousands of blocks per call.
Stage-level parallelism. Within a stage (especially SenderRecoveryStage), Rayon fans work across all cores.
I/O amortization. Disk writes happen in big sorted batches at stage boundaries, not after every block.

Roughly: ECDSA recovery batched + parallelized contributes ~10×; Merkle root computed once per range instead of per block contributes another ~10×; sorted batched writes that MDBX handles efficiently contribute ~3×. Multiplied: ~300× theoretical; the practical number lands around 100–200× depending on hardware.
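Spelled out, using only the rough figures quoted in this section (they are estimates, not measurements), the theoretical product and the observed throughput ratio are:

```latex
\[
\underbrace{10}_{\text{sender recovery}}
\times \underbrace{10}_{\text{Merkle root per range}}
\times \underbrace{3}_{\text{batched writes}}
= 300
\]
\[
\frac{10\,000\ \text{blocks/s}}{100\ \text{blocks/s}} = 100,
\qquad
\frac{10\,000\ \text{blocks/s}}{50\ \text{blocks/s}} = 200
\]
```

The second line is where the 100–200× practical range comes from: 10K blocks/sec for staged sync divided by the 50–100 blocks/sec quoted for block-by-block sync.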

Recall before the quiz

Without scrolling:

Why is MerkleStage after hashing, not interleaved?
Why don't AccountHashingStage and StorageHashingStage run in parallel, even though they could?
Of the 10 stages, which has the biggest parallelism win and why?
Three reasons staged sync is faster than block-by-block?

The next lesson is a quiz. Engage with these recalls now if any answer is shaky.

📺 Further watching

zntRpCKHyDc | Georgios Konstantopoulos — Reth: A New Rust Ethereum Client (architecture intro)

z3tj8Lk_Ydo | Alexey Shekhirin & Dan Cline — Hyperoptimizing Reth (Frontiers 2025, pipeline perf)

Summary (3 lines)

10 stages: HeaderStage → BodyStage → SenderRecoveryStage (rayon parallel) → ExecutionStage (~70 % time, revm) → AccountHashingStage → StorageHashingStage → MerkleStage (state root) → TransactionLookupStage → history index stages → FinishStage.
Each is independently optimisable. Stages run sequentially and hand off via checkpoints; parallelism lives inside a stage.
Why 10: every known bottleneck is isolated. Next: quiz.