Lesson 6 — Building the ExEx API step by step

Question

ExEx = "Execution Extension" = Reth's plug-in for indexing / observing chain state. Tempo's tidx is built on it. Build the API from scratch.

Principle (minimum model)

Step 0 — naive subscriber. subscribe_blocks(|block| index(block)). Doesn't handle reorgs; can't see state diffs.
Step 1 — three notification variants. enum ExExNotification { ChainCommitted(...) / ChainReorged(...) / ChainReverted(...) }. Production needs all three.
Step 2 — ExExContext. Per-subscriber context with init, run, lifecycle hooks. Lets ExEx do setup before subscribing.
Step 3 — install_exex(...). NodeBuilder method to register ExExs. Multiple ExExs share the same node.
Step 4 — backfill API. When ExEx restarts, replay from a known block. Reth provides this.
Step 5 — pause/resume. Mid-run management. Used by Tempo's tidx for schema migrations.

Worked example + steps

Building the ExEx API step by step

ExEx (Execution Extension) is Reth's mechanism for injecting Rust code into the execution loop. With it you build node-speed indexers, MEV bots, and live risk engines — directly in the same process as the chain itself.

But the API has 4 parts that look weighty: an init/run split, a notification enum with 3 variants, an event channel for pruning hints, and an install method on the node builder. Walk it cold and you get four ideas at once.

This lesson builds the API up from the simplest possible "block listener." By the end you'll have built every piece of the real minimal ExEx — which the next lesson reads in detail.

📂 Open paradigmxyz/reth-exex-examples/minimal in another tab. That's the file we're building toward.

Step 0 — The naive indexer: separate process polling RPC

Without thinking, you'd index Ethereum like this:

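use std::{thread::sleep, time::Duration};

// `HttpProvider` and `index` stand in for your RPC client and indexing logic.
fn main() {
    let rpc = HttpProvider::new("http://localhost:8545");
    let mut last_block = 0;
    loop {
        // Ask the node for its current head, then fetch every block not yet seen.
        let head = rpc.get_block_number().unwrap();
        for n in (last_block + 1)..=head {
            let block = rpc.get_block(n).unwrap();
            index(block);
        }
        last_block = head;
        sleep(Duration::from_secs(1)); // poll interval
    }
}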

A separate process. Polls the RPC every second. Indexes any new blocks.

Three problems:

Latency. The RPC poll has request/response overhead. The indexer is always seconds behind the tip — useless for MEV, risk, real-time UX.
Atomicity. Reth commits a new block to disk before your indexer sees it. There's a window where Reth has a block your code hasn't processed. If your code is the source of truth for a derived view, that window is a race condition.
Reorgs. Polling sees head = N, then later head = N (different block). Your indexer has to detect and handle reorgs from outside, with weaker information than Reth itself has.
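
To make the reorg problem concrete, here is a minimal sketch of the bookkeeping an out-of-process poller has to do on its own: remember the hash it indexed at each height and compare every new block against it. SeenBlock, PollOutcome and Poller are hypothetical names invented for this illustration, and hashes are shortened to u64. Nothing here is part of Reth or of any RPC client.

```rust
use std::collections::BTreeMap;

/// What a poller learns about one block over RPC (simplified, hypothetical).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct SeenBlock {
    pub number: u64,
    pub hash: u64,        // stand-in for a 32-byte block hash
    pub parent_hash: u64, // stand-in for the parent's hash
}

#[derive(Debug, PartialEq)]
pub enum PollOutcome {
    /// The block extends what we already indexed.
    Extended,
    /// Everything indexed at `rewind_to` or above came from a replaced chain.
    ReorgDetected { rewind_to: u64 },
}

#[derive(Default)]
pub struct Poller {
    seen: BTreeMap<u64, u64>, // height -> hash we indexed at that height
}

impl Poller {
    pub fn observe(&mut self, block: SeenBlock) -> PollOutcome {
        // Case 1: we already indexed a different block at this height.
        let replaced = self
            .seen
            .get(&block.number)
            .map_or(false, |h| *h != block.hash);
        // Case 2: the parent is not the block we indexed one height below.
        let parent_mismatch = block.number > 0
            && self
                .seen
                .get(&(block.number - 1))
                .map_or(false, |h| *h != block.parent_hash);

        if parent_mismatch {
            // Our record of the parent height is stale: forget it and everything above.
            let rewind_to = block.number - 1;
            let _stale = self.seen.split_off(&rewind_to);
            return PollOutcome::ReorgDetected { rewind_to };
        }
        if replaced {
            // Drop this height and everything above, then record the replacement.
            let _stale = self.seen.split_off(&block.number);
            self.seen.insert(block.number, block.hash);
            return PollOutcome::ReorgDetected { rewind_to: block.number };
        }
        self.seen.insert(block.number, block.hash);
        PollOutcome::Extended
    }
}
```

Even with this bookkeeping, the poller only learns that something changed. It still has to re-fetch the replacement blocks and work out what to undo, which is exactly the information Reth already has in hand.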

The fix: run in the same process as Reth. Get notified the moment a block commits, with full chain context.

Step 1 — First stab: a callback per block

Naive in-process API:

fn on_new_block<F: Fn(&Block)>(reth: &mut Reth, callback: F) {
    reth.add_listener(callback);
}

Reth calls your closure for every new block. Simple.

Two problems:

Reorgs aren't append-only. A callback that only fires on "new block added" can't represent "block N at hash X was replaced by block N at hash Y." Your indexer's derived state silently corrupts on every reorg.
No way to tell Reth you're done. If your indexer is processing block N, Reth doesn't know whether it can prune block N-100,000's data. Without this signal, Reth keeps everything forever.

Step 2 — A richer notification: the three chain events

Replace the bare callback with an enum that captures all three things that can happen to the chain:

enum ExExNotification {
    ChainCommitted { new: Chain },             // canonical blocks added
    ChainReorged   { old: Chain, new: Chain }, // old replaced by new
    ChainReverted  { old: Chain },             // removed (no replacement yet)
}

Each variant carries enough information for the indexer to undo or redo derived state:

ChainCommitted { new } — append the new blocks' state to your index.
ChainReorged { old, new } — undo old's state, apply new's state. Atomic swap.
ChainReverted { old } — undo old's state, wait. Reth will follow up with a future ChainCommitted once it picks a new tip.

Suppose your indexer stores transactions in a HashMap and handles only ChainCommitted, ignoring ChainReorged. After a reorg, the HashMap still contains the old chain's transactions, but the canonical chain is now the new chain. Any later read off your index will return transactions that no longer exist on the canonical chain: a phantom-data bug. Worse, the new chain's transactions never got indexed, because they arrived in the ChainReorged you ignored rather than in a ChainCommitted.

This is the #1 ExEx bug. The three-variant enum exists specifically to prevent it.
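
The bug is easy to reproduce in a small model. The sketch below is an illustration only: Block, Chain, Notification and TxIndex are simplified, hypothetical stand-ins (transactions are plain string labels), not the real reth types. It puts a handler that covers all three variants next to one that only handles commits.

```rust
use std::collections::HashMap;

#[derive(Clone, Debug)]
pub struct Block {
    pub number: u64,
    pub txs: Vec<&'static str>, // transaction labels stand in for real transactions
}

#[derive(Clone, Debug)]
pub struct Chain {
    pub blocks: Vec<Block>,
}

/// Simplified mirror of the three-variant enum from this step.
pub enum Notification {
    ChainCommitted { new: Chain },
    ChainReorged { old: Chain, new: Chain },
    ChainReverted { old: Chain },
}

/// Derived state: which block each transaction was included in.
#[derive(Default)]
pub struct TxIndex {
    pub by_tx: HashMap<&'static str, u64>,
}

impl TxIndex {
    fn apply(&mut self, chain: &Chain) {
        for block in &chain.blocks {
            for tx in &block.txs {
                self.by_tx.insert(*tx, block.number);
            }
        }
    }

    fn undo(&mut self, chain: &Chain) {
        for block in &chain.blocks {
            for tx in &block.txs {
                self.by_tx.remove(tx);
            }
        }
    }

    /// Correct handler: every variant updates the derived state.
    pub fn handle(&mut self, notification: &Notification) {
        match notification {
            Notification::ChainCommitted { new } => self.apply(new),
            Notification::ChainReorged { old, new } => {
                // Undo first, so a transaction present in both chains is re-applied.
                self.undo(old);
                self.apply(new);
            }
            Notification::ChainReverted { old } => self.undo(old),
        }
    }

    /// Buggy handler: reorgs and reverts are silently dropped.
    pub fn handle_commits_only(&mut self, notification: &Notification) {
        if let Notification::ChainCommitted { new } = notification {
            self.apply(new);
        }
    }
}
```

Feeding both handlers a commit followed by a reorg of the same block leaves the commits-only index holding the replaced transactions and missing the new ones, while the full handler ends up with exactly the canonical set.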

Step 3 — Tell Reth what you've finished: `FinishedHeight`

If your indexer has processed block N, Reth needs to know. Otherwise it can't safely prune anything below N.

ctx.events.send(ExExEvent::FinishedHeight(block_number_hash))?;

ctx.events is a write-only channel back to Reth. Whenever your handler finishes a block (or chain), you send FinishedHeight(N). Reth aggregates the minimum across all installed ExExes and prunes below that.

If your ExEx never sends FinishedHeight, the node retains all historical state Reth would otherwise prune, because it can't safely prune anything your ExEx might want to read later. Disk usage compounds, and an "innocuous indexer" turns Reth into a full-archive node by accident.
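
The acknowledgement pattern can be sketched with a plain std channel standing in for ctx.events. Everything named here (NumHash, Event, process_and_ack, last_acknowledged) is hypothetical and exists only for this illustration. The one design point it tries to show is ordering: do your own work for a block first, and acknowledge it afterwards.

```rust
use std::sync::mpsc::{Receiver, Sender};

/// Hypothetical stand-in for a (block number, block hash) pair.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct NumHash {
    pub number: u64,
    pub hash: u64, // stand-in for a 32-byte hash
}

/// Hypothetical mirror of the event sent back to the node.
#[derive(Debug, PartialEq)]
pub enum Event {
    FinishedHeight(NumHash),
}

/// ExEx side: handle each tip, then tell the node it is done with it.
pub fn process_and_ack(tips: &[NumHash], events: &Sender<Event>) -> Result<(), String> {
    for tip in tips {
        // ... derive and store your own state for `tip` here, then acknowledge ...
        events
            .send(Event::FinishedHeight(*tip))
            .map_err(|e| e.to_string())?;
    }
    Ok(())
}

/// Node side: drain pending acknowledgements and keep the most recent height.
/// `None` means this ExEx has not acknowledged anything yet.
pub fn last_acknowledged(events: &Receiver<Event>) -> Option<u64> {
    let mut last = None;
    while let Ok(Event::FinishedHeight(tip)) = events.try_recv() {
        last = Some(tip.number);
    }
    last
}
```

In this model an ExEx that never calls send leaves last_acknowledged at None forever, which is the "keeps everything" case described above.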

Step 4 — The notification stream: async pull, not callback

A callback would force Reth to wait for your slow code on every block. Better: Reth pushes notifications into a stream, and your handler pulls from it when ready:

while let Some(notification) = ctx.notifications.try_next().await? {
    // process at your pace
}

ctx.notifications is a Stream of ExExNotification. try_next is async — your handler runs in the same async runtime as Reth. Reth's progress isn't gated on your indexing speed, but your handler still observes every event in order.

When Reth shuts down or the channel closes, the loop exits cleanly with Ok(()).
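
The same push/pull shape can be tried out without an async runtime. The sketch below is a synchronous analogue under stated assumptions: a std::sync::mpsc channel stands in for the notification stream, a thread stands in for the node, and block numbers stand in for notifications. consume and demo are hypothetical names. The real ctx.notifications is an async stream, so treat this as a model of the control flow only.

```rust
use std::sync::mpsc::{channel, Receiver};
use std::thread;

/// Consumer: pulls at its own pace and stops when the channel closes.
pub fn consume(notifications: Receiver<u64>) -> Vec<u64> {
    let mut seen = Vec::new();
    // `recv` blocks until an item arrives and returns Err once every sender is dropped.
    while let Ok(block_number) = notifications.recv() {
        seen.push(block_number);
    }
    seen
}

/// Producer and consumer wired together.
pub fn demo() -> Vec<u64> {
    let (tx, rx) = channel();
    let producer = thread::spawn(move || {
        for n in 1..=5u64 {
            // Sending into this unbounded channel never waits for the consumer.
            tx.send(n).expect("consumer is still alive");
        }
        // `tx` is dropped here, which closes the channel.
    });
    let seen = consume(rx);
    producer.join().expect("producer thread panicked");
    seen
}
```

The producer finishes without ever waiting on the consumer, yet the consumer still sees every item in order and its loop ends on its own once the sending side goes away.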

Step 5 — The init/run split

A user wants to do one-time setup (open files, init a database, allocate a buffer) before the notification loop starts. A single long-running future would bury that setup inside the loop itself, where a setup failure looks the same as a crash in the middle of a run. So the API splits the work in two:

async fn exex_init<Node: FullNodeComponents>(
    ctx: ExExContext<Node>,
) -> eyre::Result<impl Future<Output = eyre::Result<()>>> {
    // synchronous setup goes here
    Ok(exex(ctx))  // return the long-running future
}

Two functions:

exex_init — runs once at node startup. Synchronous setup. Returns a future.
exex (the future) — runs forever (or until shutdown). Polls the notification stream.

Why not do the setup inside exex itself? Reth needs to know the ExEx is alive before it starts pushing notifications. If you put File::open(...) inside exex, the file open happens after Reth has already started buffering notifications, and if it fails (permissions, missing path), notifications pile up while Reth thinks the ExEx is healthy. The init/run split lets Reth distinguish "ExEx couldn't start" from "ExEx ran for a while and crashed."
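
The two failure modes can be told apart in a small model of the split. This is a sketch under assumptions: a closure stands in for the long-running future so the example needs no async runtime, and Config, init and run are hypothetical names that are not part of the ExEx API.

```rust
pub struct Config {
    pub db_path: String,
}

/// Stage 1: one-time, fallible setup.
/// If this returns Err, nothing long-running has been started yet.
pub fn init(cfg: Config) -> Result<impl FnOnce(&[u64]) -> Result<u64, String>, String> {
    if cfg.db_path.is_empty() {
        return Err("db_path must not be empty".to_string());
    }
    let db_path = cfg.db_path; // imagine opening the database here

    // Stage 2 is returned, not run: the caller decides when to start it.
    Ok(move |blocks: &[u64]| run(&db_path, blocks))
}

/// Stage 2: the processing loop, shortened to a finite slice of block numbers.
fn run(db_path: &str, blocks: &[u64]) -> Result<u64, String> {
    let mut processed = 0;
    for number in blocks {
        if *number == 0 {
            // A failure here happens after a successful start.
            return Err(format!("{}: unexpected block 0", db_path));
        }
        processed += 1;
    }
    Ok(processed)
}
```

A caller sees an Err from init before any processing exists, and an Err from the returned closure only after a successful start, so the two cases never get confused.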

Step 6 — `install_exex`: multiple extensions

Your main wires the ExEx into the node builder:

.install_exex("MyIndexer", exex_init)
.install_exex("MevWatcher", mev_init)
.install_exex("RiskEngine", risk_init)

The first arg is a name (used in metrics and logs); the second is the init function. You can chain multiple .install_exex(...) calls — each ExEx gets its own notification stream and its own FinishedHeight channel.

Practical implications:

They don't interfere. If the indexer falls behind, the MEV watcher keeps processing (each stream is independently buffered).
The pruner aligns to the slowest one. The pruner only prunes below the minimum FinishedHeight across all installed ExExes: if the indexer stalls at block 100, Reth keeps everything from block 100 onward, even if the MEV watcher is at 1000.
Crashes are independent. If one ExEx panics, the others and Reth itself keep running (unless init fails, which fails node startup).
Metrics are per-ExEx. The first-arg name labels the reth_exex_<name>_* metrics.
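
The "own stream per ExEx" idea can be modelled with one channel per subscriber. The sketch below is a toy fan-out, not Reth's manager: FanOut, install and notify are hypothetical names, block numbers stand in for notifications, and the channels are unbounded for simplicity.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Toy fan-out: one independent channel per installed subscriber.
#[derive(Default)]
pub struct FanOut {
    subscribers: Vec<(String, Sender<u64>)>,
}

impl FanOut {
    /// Register a named subscriber and hand back its private receiving end.
    pub fn install(&mut self, name: &str) -> Receiver<u64> {
        let (tx, rx) = channel();
        self.subscribers.push((name.to_string(), tx));
        rx
    }

    /// Deliver one notification to every subscriber's own buffer.
    pub fn notify(&self, block_number: u64) {
        for (_name, tx) in &self.subscribers {
            // A subscriber that has gone away is ignored in this sketch.
            let _ = tx.send(block_number);
        }
    }
}
```

Because each subscriber drains its own buffer, a slow reader only delays itself: the fast reader can consume every notification while the slow one still has its backlog waiting.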

Now suppose the indexer dies or stalls while the other two keep running. The pruner freezes at the indexer's last FinishedHeight: every block after that is treated as "the indexer might still want to read this," so nothing prunes. Until you restart the indexer and let it advance FinishedHeight, Reth accumulates history for the entire window the indexer was down. This is the canonical production failure mode of "an unmonitored dead ExEx eats your disk."
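
The "minimum across all ExExes" rule is small enough to write down. This is a sketch of the rule as described in this lesson, not Reth's implementation. prune_ceiling is a hypothetical helper, and the map of names to last acknowledged heights is an assumed representation.

```rust
use std::collections::HashMap;

/// Height below which pruning is allowed: the lowest height acknowledged by
/// any installed ExEx. Returns `None` when there is nothing to align to,
/// including when some ExEx has never acknowledged a height.
pub fn prune_ceiling(finished: &HashMap<&str, Option<u64>>) -> Option<u64> {
    let mut lowest: Option<u64> = None;
    for height in finished.values() {
        // One ExEx without an acknowledgement blocks pruning entirely.
        let h = (*height)?;
        lowest = Some(lowest.map_or(h, |l| l.min(h)));
    }
    lowest
}
```

With the indexer stuck at 100, the MEV watcher at 1000 and the risk engine at 990, the ceiling is 100. It stays there however far the other two advance, which is why a stalled ExEx is worth alerting on.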

What you've built

Every piece earned its keep:

ExExNotification enum with 3 variants (Step 2) — handles append, reorg, revert
FinishedHeight event (Step 3) — opt-in pruning, prevents disk bloat
Stream-pulled notifications (Step 4) — Reth doesn't block on your handler
init/run split (Step 5) — synchronous setup before async loop
install_exex (Step 6) — multiple extensions, each with its own stream

The next lesson reads the minimal ExEx — ~40 lines of main.rs — and shows how all five pieces fit together in real code.

Recall before moving on

Without scrolling:

Why does the API push notifications via a stream instead of calling your code directly?
What are the three ExExNotification variants, and why do you need all three?
What does FinishedHeight tell Reth, and what's the disk consequence of forgetting it?
Why is the API split into exex_init (sync) and exex (async future)?

If any answer is shaky, scroll back. The next lesson reads the real minimal ExEx in detail.

🧭 Where you are now in the stack: you've built the database layer's pub-sub + prune protocol — 3-variant ExExNotification, FinishedHeight backpressure, stream pull, init/run split, install_exex injection. Same shape as Kafka consumer offset management and Postgres logical replication slots, applied to EVM chain sync. Next lesson reads the minimal ExEx's real source against this model.

Summary (3 lines)

7-step buildup: RPC polling → per-block callback → three-variant ExExNotification → FinishedHeight → stream pull → init/run split → install_exex registration.
Production-grade indexers (Tempo tidx) use the full pattern.
Next: walk through the minimal ExEx.