Lesson 13 — How Revm tests itself — state tests, EOF tests, and execution-spec compliance
Question
How does revm verify it matches Ethereum? State tests + EOF tests + execution-spec compliance. Each is a different layer of assurance.
Principle (minimum model)
- Ethereum state tests. 1000+ JSON files describing pre-state + tx + expected post-state. Revm runs each; asserts the post-state matches.
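- Where state tests live. tests/EthereumTests/GeneralStateTests/, with subdirectories per test category (e.g. stArgsZeroOneBalance/), some of them grouped by fork (e.g. Cancun/).
- EOF tests. Tests for the EVM Object Format (EIP-3540 family, originally targeted at Prague). They check that the validator rejects malformed EOF bytecode and accepts well-formed containers.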
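- Execution-spec tests. Compliance against the Ethereum execution spec (execution-specs/). Format-agnostic: one test definition can be filled into several fixture formats (e.g. state tests and blockchain tests).
- Why three layers. State tests catch wrong opcode behaviour. EOF tests catch wrong bytecode validation. Execution-spec tests catch divergence from the spec's semantics, including for EIPs that have not shipped yet. Defence in depth.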
- Test runner. In revm the runner is the statetest subcommand of the revme binary (bins/revme/). It iterates the JSON files, runs revm on each test, and compares the results.
- CI integration. Every PR runs the full suite. Failures and slowdowns gate the merge.
- Why this matters. Revm is embedded in Reth + Foundry + Hyperliquid + Tempo. A bug here breaks everything downstream. Testing is non-optional.
- Production parallel. All Ethereum clients (geth / nethermind / besu / reth) run the same state tests. Cross-client compatibility.
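The runner loop in the list above (iterate the JSON files, run each test, compare, fail on divergence) can be sketched in Python. This is a minimal sketch under stated assumptions, not revm's runner, which is written in Rust. `run_suite` and the `run_case` callback are hypothetical names. `run_case` stands in for "build the pre-state, execute with revm, compare the post-state".

```python
import json
from pathlib import Path
from typing import Callable, Dict, List, Tuple

# Hypothetical executor: (test_name, test_body) -> True if the computed
# post-state matched the expected one. In revm this job is done in Rust;
# here it is a plain callback so the loop itself can be shown.
RunCase = Callable[[str, dict], bool]


def run_suite(root: Path, run_case: RunCase) -> Tuple[int, List[str]]:
    """Walk every *.json file under `root`, run each test, collect failures."""
    passed = 0
    failures: List[str] = []
    for path in sorted(root.rglob("*.json")):
        with path.open() as f:
            tests: Dict[str, dict] = json.load(f)
        # One file can hold several named tests (its top-level keys).
        for name, body in tests.items():
            if run_case(name, body):
                passed += 1
            else:
                failures.append(f"{path.name}::{name}")
    return passed, failures
```

Usage: `run_suite(Path("GeneralStateTests/stArgsZeroOneBalance"), my_executor)`, where `my_executor` is whatever wraps your EVM. A CI job would turn a non-empty failure list into a failed check.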
Worked example + steps
You walked the interpreter, the instruction table, and the Database trait. Now: how does the Revm team prove that Revm actually executes the EVM correctly? The answer is not "we read it and nodded." A consensus-critical engine — one bug ships a chain split — is held to a different bar. This lesson reads the test infrastructure that bar requires.
The three test surfaces every EVM implementation has to pass
| Test surface | Where it lives | What it proves |
|---|---|---|
| State tests | ethereum/tests (the canonical, multi-client test corpus) | a single transaction takes a pre-state to the right post-state with the right gas cost |
| EOF tests | ethereum/tests/EOFTests (EVM Object Format conformance) | the new bytecode container format (validation, sub-containers) is accepted or rejected exactly as the spec says |
| execution-spec-tests | ethereum/execution-spec-tests (spec-derived test generator) | expected results are generated from the spec, so passing them means matching the spec by construction on every generated case |
Revm runs all three. Passing them is what makes Revm safe to ship as the engine inside Reth, Hyperliquid's HyperEVM, Foundry, and Tempo. Without this discipline, every downstream consumer would be exposed to consensus bugs.
1. State tests — the standard format
A state test is a JSON file. Open any one in ethereum/tests/GeneralStateTests. The shape:
{
  "TestName": {
    "env": { "currentNumber": "...", "currentTimestamp": "...", "currentGasLimit": "..." },
    "pre": {
      "0xAlice": { "balance": "0x..", "nonce": "0x..", "code": "0x..", "storage": {} }
    },
    "transaction": {
      "data": ["0x..."],
      "gasLimit": ["0x..."],
      "gasPrice": "0x..",
      "nonce": "0x..",
      "secretKey": "0x...",
      "to": "0xBob",
      "value": ["0x.."]
    },
    "post": {
      "Cancun": [{
        "hash": "0x...post-state-trie-root...",
        "logs": "0x...hash-of-rlp-encoded-logs...",
        "indexes": { "data": 0, "gas": 0, "value": 0 }
      }]
    }
  }
}
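Four sections: env (block context), pre (account state before), transaction (what to apply), and post (the state root and logs hash that should result, per fork). The data, gasLimit, and value fields are arrays. Each post entry's indexes pick one element from each array, so a single test can encode several transactions. For every entry in post.Cancun, the runner builds the pre-state, executes the selected transaction, hashes the post-state, and compares the result against that entry's hash and logs. Match → pass. Diverge → bug.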
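The index bookkeeping is easier to follow with code. Below is a minimal Python sketch of expanding one state test into the concrete transactions to run for one fork, assuming the JSON shape shown above. The names `expand_cases` and `Case` are illustrative and are not part of revm.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Case:
    data: str           # calldata hex selected by indexes.data
    gas_limit: str      # gas limit hex selected by indexes.gas
    value: str          # wei value hex selected by indexes.value
    expected_hash: str  # post-state root the execution must reproduce
    expected_logs: str  # logs hash the execution must reproduce


def expand_cases(test: dict, fork: str) -> List[Case]:
    """Turn one state test into the concrete transactions to run for `fork`."""
    tx = test["transaction"]
    cases = []
    # A fork with no entry under `post` simply has no cases to run.
    for entry in test["post"].get(fork, []):
        idx = entry["indexes"]
        cases.append(Case(
            data=tx["data"][idx["data"]],
            gas_limit=tx["gasLimit"][idx["gas"]],
            value=tx["value"][idx["value"]],
            expected_hash=entry["hash"],
            expected_logs=entry["logs"],
        ))
    return cases
```

Each `Case` is one execution. The runner compares the post-state root it computes with `expected_hash`, and the logs hash with `expected_logs`.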
🔍 Find in repo. In bluealloy/revm, search for statetest (likely under bins/revme/ or a similar runner crate). The runner is a separate binary, so you can run it locally against the upstream test suite: the same suite Geth, Erigon, Nethermind, and Besu run.
2. EOF tests — validation conformance
EOF (EVM Object Format, EIP-3540 family) introduces a new bytecode container with sections, type signatures, and structural validation. Legacy bytecode is different: almost any byte sequence can be deployed. EOF, by contrast, must be parsed and validated before execution. The validator is consensus-critical. If clients disagree on whether a container is valid (one accepts a malformed container, another rejects a valid one), the chain splits.
EOF tests are JSON like state tests, but the assertion is just "this bytecode validates" or "this bytecode is rejected with this error code":
{
  "EmptyContainer": {
    "code": "0x",
    "results": { "Prague": { "exception": "EOFException.MISSING_HEADER", "result": false } }
  }
}
Hundreds of these cover edge cases: misaligned section sizes, invalid type sections, unreachable code, jump tables that escape sub-containers. The Revm validator is run against all of them on every CI run.
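To make the "validates or rejected" assertion concrete, here is a minimal Python sketch of only the first step of EOF validation: checking the EIP-3540 magic (0xEF00) and the version byte (0x01). Everything after the prefix is omitted, including section headers, the type section, and code checks. The error labels are illustrative, not the names the EOF test files use. `run_vector` mirrors the shape of an EOF test by comparing the validator's verdict with the expected one.

```python
from typing import Optional

EOF_MAGIC = bytes([0xEF, 0x00])  # EIP-3540 container magic
EOF_VERSION = 0x01               # the version EIP-3540 defines


def check_prefix(code: bytes) -> Optional[str]:
    """Return None if the magic+version prefix is valid, else an error label.

    Only the first step of EOF validation; labels are illustrative.
    """
    if len(code) < 2 or code[:2] != EOF_MAGIC:
        return "INVALID_MAGIC"
    if len(code) < 3 or code[2] != EOF_VERSION:
        return "INVALID_VERSION"
    return None


def run_vector(code_hex: str, expected_error: Optional[str]) -> bool:
    """EOF-test style check: None means the vector is expected to be valid."""
    code = bytes.fromhex(code_hex.removeprefix("0x"))
    return check_prefix(code) == expected_error
```

A prefix that passes this check is not yet a valid container. A real validator goes on to parse the section headers and every code section before accepting it.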
3. execution-spec-tests — tests generated from the spec
The most powerful tier. ethereum/execution-spec-tests is a Python framework in which you write test scenarios in a spec-aware DSL. The framework then generates concrete state tests for every fork the scenario is valid for:
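import pytest  # import paths follow recent execution-spec-tests releases; they may differ by version
from ethereum_test_tools import Account, Address, Environment, StateTestFiller, Transaction
from ethereum_test_tools.vm.opcode import Opcodes as Op

@pytest.mark.valid_from("Cancun")
def test_my_opcode(state_test: StateTestFiller, fork):
    # Op.MY_NEW_OPCODE is a placeholder for the opcode under test, not a real Opcodes member
    pre = { Address(0x1000): Account(code=Op.MY_NEW_OPCODE + Op.STOP) }
    tx = Transaction(to=Address(0x1000), gas_limit=100_000)
    post = { Address(0x1000): Account(storage={0: 1}) }  # expect: opcode wrote 1 to slot 0
    state_test(env=Environment(), pre=pre, post=post, tx=tx)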
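The valid_from("Cancun") marker makes the framework fill this test for Cancun and every later fork. The author writes only the pre-state, the transaction, and the expected post-state conditions. The framework executes the scenario against the spec's reference implementation and records the resulting gas usage, state root, and logs hash as concrete fixtures. The expected values come from the spec, not from a hand-written guess. Passing execution-spec-tests therefore means matching the spec by construction on every case the tests cover, not "we wrote some tests and they happen to agree."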
This is how new EIPs get coverage before they ship. Every new opcode, every new precompile, every gas-rule change comes with execution-spec-tests; client implementations (Geth, Erigon, Revm-based clients) run them and report compatibility before mainnet activation.
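The "write once, generate per fork from the spec" idea can be illustrated without the real framework. Below is a toy Python sketch, assuming a hypothetical, simplified fork list and a hypothetical spec rule for one opcode's gas cost. Neither comes from execution-spec-tests, and `forks_from`, `generate`, and `toy_spec_gas` are illustrative names. The test author states the scenario once, and the expected value for each fork is computed from the spec instead of being typed in by hand.

```python
from typing import Callable, Dict, List

# Hypothetical, simplified fork ordering (oldest to newest).
FORKS: List[str] = ["Berlin", "London", "Paris", "Shanghai", "Cancun", "Prague"]


def forks_from(first: str) -> List[str]:
    """Forks a valid_from(first)-style test is generated for: first and all later."""
    return FORKS[FORKS.index(first):]


def generate(valid_from: str, spec_gas: Callable[[str], int]) -> Dict[str, int]:
    """Emit one expected-gas fixture per fork, with values taken from the spec."""
    return {fork: spec_gas(fork) for fork in forks_from(valid_from)}


def toy_spec_gas(fork: str) -> int:
    """Hypothetical spec rule: the opcode costs 3 gas, raised to 5 from Prague on."""
    return 5 if FORKS.index(fork) >= FORKS.index("Prague") else 3
```

With these definitions, `generate("Cancun", toy_spec_gas)` yields one fixture per fork from Cancun on. If the spec's rule changes, regenerating updates every fixture and nobody edits expected values by hand, which is the property "by construction" refers to.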
What this teaches the Revm consumer
You will not (usually) write state tests yourself — they're written upstream and you consume them. But the patterns matter for any code you write that re-implements EVM behavior:
- Pre-state → tx → post-state is the universal shape of "I claim to execute the EVM correctly." Use it whenever you build something that processes transactions (Foundry cheatcode, custom precompile, ExEx that re-executes).
- Differential against a non-Revm reference is the "I'm not the spec, but I match it" pattern. The Building tier's Validate Your Revm Simulation Against a Production Provider lesson is exactly this discipline applied at the application layer.
- Generated tests ≥ hand-written when the spec is authoritative. If you build something with formal semantics (a custom CFMM, a sponsorship policy), generating tests from the semantics catches the bugs hand-written tests miss.
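The differential pattern from the second bullet fits in a short harness. Below is a minimal Python sketch with two toy implementations of the EVM memory-expansion cost, using the Yellow Paper formula 3·a + ⌊a²/512⌋ for a memory size of a 32-byte words. The function names are illustrative, neither function is revm code, and the "buggy" version is deliberately wrong so the harness has something to find.

```python
import random
from typing import Callable, Iterable, Optional


def mem_cost_reference(size_bytes: int) -> int:
    """Memory cost 3*a + floor(a^2 / 512), a = size in 32-byte words."""
    words = (size_bytes + 31) // 32  # round UP to whole words
    return 3 * words + words * words // 512


def mem_cost_buggy(size_bytes: int) -> int:
    """Same formula with a deliberate bug: rounds the word count DOWN."""
    words = size_bytes // 32
    return 3 * words + words * words // 512


def first_divergence(impl: Callable[[int], int], ref: Callable[[int], int],
                     inputs: Iterable[int]) -> Optional[int]:
    """Return the first input on which `impl` disagrees with `ref`, else None."""
    for x in inputs:
        if impl(x) != ref(x):
            return x
    return None


# Hand-picked edge cases around a word boundary, then seeded random sizes.
edge_cases = [0, 1, 31, 32, 33]
rng = random.Random(0)
random_cases = [rng.randrange(0, 1 << 20) for _ in range(1_000)]
```

Running `first_divergence(mem_cost_buggy, mem_cost_reference, edge_cases)` stops at size 1, because the buggy version rounds one byte down to zero words. The seeded random inputs then widen coverage. In a real setup, `ref` would be a non-Revm client or the executable spec, and `impl` would be your Revm-based code.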
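Consider a concrete case: an EIP changes the gas cost of an existing opcode, and Revm implements the new cost incorrectly. Which test surface catches it? Two of the three would, at different latencies: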
- execution-spec-tests would catch it first — the spec change generates new tests automatically; CI runs them on the EIP draft branch before activation. Issue filed before the EIP merges.
- State tests catch it next — once the spec is finalized, the Ethereum tests team produces canonical state tests for the new behavior. CI catches divergence before mainnet activation. Issue filed during fork rollout.
- EOF tests would not catch this specific bug (gas cost is opcode behavior, not container validation). EOF tests catch a different class of bug — structural validation, not execution semantics.
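The lesson: the three test surfaces are not redundant. Each targets a different class of bug (execution semantics vs. container validation). Where two of them overlap, they catch the same bug at different points in the release cycle.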
Drill
- Run the state-test runner locally. Clone bluealloy/revm and find the statetest runner (often under bins/revme/). Clone ethereum/tests. Run the runner against a small subset (e.g., GeneralStateTests/stArgsZeroOneBalance/). Verify all pass. 30 minutes.
- Read one state test JSON end-to-end. Pick a single test (e.g., one of the stArgsZeroOneBalance files). Map every field in the JSON to where the Revm runner uses it. Trace one full execution mentally. 30 minutes.
- Read one execution-spec-test source. Pick any test in execution-spec-tests/tests/. Identify the DSL operators (Op.X, Account(...), Transaction(...)) and read the docs to understand how they generate concrete state tests. 45 minutes.
- Find a Revm-resolved consensus issue. Search bluealloy/revm closed issues for one that was caught by state tests. Read the bug, the fix, and the regression test added. This is what consensus correctness costs in practice. 45 minutes.
After drill 4 you've seen the full feedback loop: spec changes → tests generated → Revm fails → Revm fixed → regression test added → consensus protected.
📺 Further reading
- Ethereum tests README — the canonical state-test corpus
- execution-spec-tests docs — the test-generation framework
- EIP-3540 — EOF v1 — the format whose validator the EOF tests cover
Expert continuation
This lesson covers state-test discipline as a property of revm itself. Two Expert lessons extend it at the systems level:
- Differential fuzzing & execution-spec-tests — random-sequence diff testing across multiple EVM implementations
- Systems-code auditing — how to read a Reth/revm patch the way an auditor does
Summary (3 lines)
- 3-layer testing: state tests (1000+ JSON pre/post-state) + EOF tests (bytecode validation) + execution-spec (semantics).
- Run via the revme binary's statetest subcommand. CI runs the full suite on every PR.
- Cross-client compatible (geth / nethermind / besu / reth all run the same state tests). Non-optional given revm's downstream reach.