Lesson 15 — Systems-code auditing — finding bugs in Reth / Revm / consensus impls
Question
Auditing Rust systems code = reading for bugs. Specific patterns: race conditions, type-system bypasses, gas misaccounting, consensus divergence.
Principle (minimum model)
- Read for patterns, not lines. Bug-prone patterns: unsafe blocks, locks held across await, arithmetic that can overflow without checked or saturating operations, panics in hot paths.
- Type-system bypasses. `transmute`, `from_raw_parts`, manual `Send`/`Sync` impls. Each is a known bug source.
- Concurrency bugs. Holding a `std::sync::Mutex` guard across `.await` risks deadlock; also watch for missed wakeups and race conditions in shared state.
- Consensus bugs. Diverging behaviour between Reth and geth is a consensus bug. Differential fuzzing catches most.
- Gas bugs. Misaccounting → DoS or chain-fork. Audit every gas computation path.
- Documentation as audit signal. Code without "why" comments is suspect; assumptions hidden = future bugs.
- Tools. rust-analyzer + cargo clippy + cargo audit + miri (UB detection). Each catches different bug classes.
- Production audit process. External auditor + bug bounty + continuous internal review. Layered.
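As a sketch of the "arithmetic" and "gas bugs" points above: the EVM memory-expansion cost is 3 gas per 32-byte word plus words²/512. The checked version below turns an overflow into an explicit failure; the wrapping version shows the shape to flag in review. Both function names and the choice to return `Option` are illustrative assumptions, not Revm's API.

```rust
/// Linear term: gas per 32-byte word of memory.
const MEMORY_GAS_PER_WORD: u64 = 3;
/// Divisor of the quadratic term.
const QUADRATIC_DIVISOR: u64 = 512;

/// Hypothetical helper: total memory cost for `words` 32-byte words.
/// Returns `None` on overflow so the caller can treat it as out-of-gas.
pub fn memory_gas_checked(words: u64) -> Option<u64> {
    let linear = words.checked_mul(MEMORY_GAS_PER_WORD)?;
    let quadratic = words.checked_mul(words)? / QUADRATIC_DIVISOR;
    linear.checked_add(quadratic)
}

/// The pattern to flag: wrapping arithmetic (what plain `*` and `+` do in
/// release builds by default) silently produces a cost that is far too low.
pub fn memory_gas_wrapping(words: u64) -> u64 {
    let linear = words.wrapping_mul(MEMORY_GAS_PER_WORD);
    let quadratic = words.wrapping_mul(words) / QUADRATIC_DIVISOR;
    linear.wrapping_add(quadratic)
}
```

For `words = 1 << 32` the wrapping version loses the entire quadratic term (the square wraps to zero), which is exactly the kind of undercharge that becomes a DoS vector. A saturating variant would also be defensible, as long as the saturated value is guaranteed to exceed any real gas limit.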
Worked example + steps
📌 Scope honesty. This is systems-code auditing — Reth / Revm / Rust consensus impls. Not smart-contract auditing (Solidity bugs, EVM exploits at the contract layer). The latter is well-covered elsewhere (Trail of Bits, Code4rena, Spearbit material); RethLab has no unique angle on contract auditing. Systems-code auditing is the angle nobody else covers, and where RethLab's source-first thesis pays off.
You've shipped the differential fuzz harness. You've shipped chaos drills. The code passes both. Are you done?
No. Both disciplines exercise the code by running it. Audit catches what running doesn't surface — code paths that haven't been exercised yet, invariants that work today but break under future modifications, trust assumptions that aren't validated. Reading is its own discipline.
1. Why systems-code auditing is different from smart-contract auditing
Smart-contract auditing has a well-known taxonomy: integer overflow, reentrancy, access-control bugs, oracle manipulation, flash-loan exploits. The bug-class library is finite and well-cataloged. The unit of bug is at the Solidity / EVM contract level.
Systems-code auditing has a different taxonomy:
- Race conditions (Tokio task interleavings that produce wrong state)
- State corruption windows (non-atomic writes that interrupt in the wrong place)
- Consensus invariant violations (safety/liveness assumptions silently broken)
- `unsafe` block correctness (every `unsafe` is a soundness boundary)
- Trust-boundary leaks (P2P peer trust, RPC auth bypass, signer trust)
Different bug shapes, different mental models, different tools. The auditor of a Solidity contract and the auditor of a Reth fork are doing different jobs, even though both are called "auditing."
This lesson covers the systems-code side. It's the audit that matters for a Tempo / OP / Hyperliquid fork.
2. The 5 bug classes in the Rust EVM stack
2.1 State corruption windows
A "state corruption window" is a code path where state is partially updated when an unexpected interruption happens (process crash, MDBX write failure, panic in a downstream call). If the partial update isn't rolled back, the on-disk state becomes inconsistent.
The audit question: for every state mutation, what happens if execution stops mid-way?
Common patterns to look for:
- Multi-step writes without a transaction wrapper
- "Save then return" sequences where the save can fail silently
- Caches updated before the underlying store is committed
- Indexes updated separately from the data they index
Reth's stage commit logic is the canonical example to audit. Every stage's execute should be paired with an unwind that perfectly undoes whatever execute did. The auditor reads both and asks: is there any state mutation in execute that unwind doesn't undo? If yes, that's a corruption window.
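The execute/unwind symmetry can be phrased as a property: unwinding right after executing must restore the starting state exactly. The sketch below uses a toy in-memory map and hypothetical `execute` / `unwind` functions; it is not Reth's `Stage` trait, only the shape of the check an auditor would want a test for.

```rust
use std::collections::BTreeMap;

/// Toy key-value state standing in for the database.
pub type State = BTreeMap<String, u64>;

/// Undo log: the previous value of each key touched (`None` = key was absent).
pub type ChangeSet = Vec<(String, Option<u64>)>;

/// Hypothetical stage step: apply `writes` and record what each one overwrote.
pub fn execute(state: &mut State, writes: &[(&str, u64)]) -> ChangeSet {
    let mut undo = ChangeSet::new();
    for &(key, value) in writes {
        let prev = state.insert(key.to_string(), value);
        undo.push((key.to_string(), prev));
    }
    undo
}

/// Replay the undo log in reverse so repeated writes to one key restore correctly.
pub fn unwind(state: &mut State, undo: ChangeSet) {
    for (key, prev) in undo.into_iter().rev() {
        match prev {
            Some(v) => {
                state.insert(key, v);
            }
            None => {
                state.remove(&key);
            }
        }
    }
}

/// The audit question as a property: does unwind undo everything execute did?
pub fn roundtrip_restores(initial: &State, writes: &[(&str, u64)]) -> bool {
    let mut state = initial.clone();
    let undo = execute(&mut state, writes);
    unwind(&mut state, undo);
    state == *initial
}
```

Any mutation `execute` performs without pushing onto the undo log makes `roundtrip_restores` return `false`; that missing entry is the corruption window.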
2.2 Concurrency bugs
Safe Rust's type system prevents data races at compile time. It does not prevent logic races: situations where multiple Tokio tasks interleave in ways that produce wrong outputs.
The audit question: for every shared state, what's the contract on who modifies it and when?
Common patterns to look for:
- `Arc<Mutex<T>>` held across `await` (deadlock risk; sometimes correctness risk)
- Multiple tasks reading-then-writing the same `Arc<AtomicU64>` (TOCTOU)
- Channel receivers that assume sender ordering preserves causality
- `tokio::spawn` of a task that captures a stale snapshot of state
Tools that help: loom (concurrency permutation testing), miri (UB detection under multi-threaded execution).
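A minimal sketch of the read-then-write (TOCTOU) pattern on a shared atomic, using plain OS threads instead of Tokio tasks so it runs with the standard library alone. The "budget" scenario and all function names are hypothetical.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Racy shape to flag in review: the load and the store are separate steps,
/// so two threads can observe the same value and both believe they succeeded.
pub fn reserve_racy(remaining: &AtomicU64) -> bool {
    let current = remaining.load(Ordering::SeqCst);
    if current == 0 {
        return false;
    }
    remaining.store(current - 1, Ordering::SeqCst);
    true
}

/// Check and update as one atomic read-modify-write.
pub fn reserve_atomic(remaining: &AtomicU64) -> bool {
    remaining
        .fetch_update(Ordering::SeqCst, Ordering::SeqCst, |cur| cur.checked_sub(1))
        .is_ok()
}

/// Spawn `threads` workers that each try to reserve once; return how many succeeded.
pub fn count_successes(budget: u64, threads: usize) -> usize {
    let remaining = Arc::new(AtomicU64::new(budget));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let remaining = Arc::clone(&remaining);
            thread::spawn(move || reserve_atomic(&remaining))
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().unwrap())
        .filter(|ok| *ok)
        .count()
}
```

With `reserve_atomic`, the number of successes can never exceed the budget regardless of interleaving. `reserve_racy` compiles without any warning and is still wrong under contention, which is why this class of bug has to be found by reading.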
2.3 Consensus invariant violations
Every consensus protocol has explicit invariants (no two finalized blocks at the same height) and implicit ones (proposer rotation produces fair distribution, validator votes can't be replayed). When you fork or customize a consensus impl, these invariants are easy to silently violate.
The audit question: for every consensus-relevant code path, which invariant does it touch, and does this code path preserve it?
Common patterns to look for:
- Vote processing that doesn't deduplicate by `(validator, slot)`: replay attack
- Fork-choice code that assumes monotonic timestamps: fails on clock drift
- Finality logic that doesn't check the 2f+1 quorum strictly: accepts under-quorum
- Slashing-evidence handling that doesn't verify the signed-by-validator condition: false-positive slashing
Auditing a HotStuff or Tendermint impl is heavy work — you need the protocol paper open in one window, the code in another, and a spreadsheet of "invariant X is preserved by code path Y."
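Two of the patterns above (deduplication by `(validator, slot)` and a strict 2f+1 quorum) fit in a few lines. This is a sketch under stated assumptions: the validator set has exactly n = 3f + 1 members, and signature and membership checks happen elsewhere. `VoteTracker` and its methods are hypothetical names.

```rust
use std::collections::HashSet;

pub type ValidatorId = u32;
pub type Slot = u64;

/// Hypothetical vote collector for a validator set of size n = 3f + 1.
pub struct VoteTracker {
    /// Maximum number of faulty validators tolerated.
    f: usize,
    /// Every (validator, slot) pair already counted.
    seen: HashSet<(ValidatorId, Slot)>,
}

impl VoteTracker {
    pub fn new(f: usize) -> Self {
        Self { f, seen: HashSet::new() }
    }

    /// Quorum size under the n = 3f + 1 assumption.
    pub fn quorum(&self) -> usize {
        2 * self.f + 1
    }

    /// Record a vote. Returns false if this (validator, slot) pair was
    /// already counted, so a replayed vote never increases the tally.
    pub fn record(&mut self, validator: ValidatorId, slot: Slot) -> bool {
        self.seen.insert((validator, slot))
    }

    /// Strict check: distinct validators for `slot` must reach 2f + 1.
    pub fn has_quorum(&self, slot: Slot) -> bool {
        let votes = self.seen.iter().filter(|(_, s)| *s == slot).count();
        votes >= self.quorum()
    }
}
```

The review question for real code is whether the tally is derived from the deduplicated set, as here, or from a counter incremented on every message received.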
2.4 unsafe block correctness
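Every unsafe block in Rust is a soundness boundary, and therefore a security boundary. It does not switch off the borrow checker; it unlocks a small set of extra operations (dereferencing raw pointers, calling unsafe functions, and a few others) that the compiler cannot check, and asserts that the programmer has manually upheld the invariants those operations require.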
The audit question (3-part, for every unsafe):
- What invariant is this `unsafe` block relying on?
- What conditions, present or future, could violate that invariant?
- How is the invariant verified: tests, types, code-comment proof?
If any answer is "I'm not sure," the unsafe block is an audit finding.
Real example: Revm's stack operations sometimes use unsafe get_unchecked for performance. The invariant relied on is "stack depth was verified before this call." The condition that could violate it: a refactor that splits stack-depth verification from stack-access. The verification: tests that exercise underflow scenarios.
Tools: cargo geiger counts unsafe blocks per crate. The audit isn't about reducing the count to zero — it's about ensuring each one has a clear, documented invariant.
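What a "clear, documented invariant" can look like in practice: the toy stack below keeps the bounds check and the unchecked access in the same function and states the invariant in a `SAFETY` comment. It is a stand-in for illustration, not Revm's actual stack implementation.

```rust
/// Toy stack of 64-bit words (hypothetical; real EVM words are 256-bit).
pub struct Stack {
    data: Vec<u64>,
}

impl Stack {
    pub fn new() -> Self {
        Self { data: Vec::new() }
    }

    pub fn push(&mut self, value: u64) {
        self.data.push(value);
    }

    /// Read the item `depth` positions below the top (0 = top of stack).
    pub fn peek(&self, depth: usize) -> Option<u64> {
        let len = self.data.len();
        if depth >= len {
            return None; // underflow is reported, never read
        }
        let idx = len - 1 - depth;
        // SAFETY: `depth < len` was checked just above, so `idx` is in `0..len`.
        // The check and the access live in one function on purpose: a refactor
        // that separates them must re-establish this invariant at the new call site.
        Some(unsafe { *self.data.get_unchecked(idx) })
    }
}
```

The three audit questions map directly onto this block: the invariant is `idx < len`, the threat is a refactor that moves the check, and the verification is an underflow test such as `peek` on an empty stack.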
2.5 Trust-boundary leaks
Every external input to a node is a trust boundary. The auditor maps every boundary and asks: what's validated at the boundary, and what's silently trusted?
The four major trust boundaries in a Reth node:
- RPC — clients can submit arbitrary requests. Auth, rate-limiting, payload validation should all be at this boundary.
- P2P — peers send blocks, transactions, state-trie nodes. Each must be validated before being trusted.
- CLI / config — node operator's config file. Less adversarial but still a boundary (typos in genesis hash, etc.).
- Engine API — consensus client supplies block-execution requests. Should be validated against the consensus rules.
Common bugs at each boundary:
- RPC: missing auth on a privileged method (e.g., `admin_addPeer`)
- P2P: accepting a peer's claim about state without verifying against the state root
- CLI: not validating that a chainspec is internally consistent
- Engine API: accepting a block that violates a hardfork rule the EL doesn't yet know about
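For the RPC case, one way to make the boundary auditable is to route every request through a single deny-by-default check that returns a distinct "authorized" type. The method lists and the `authorize` function below are hypothetical, not Reth's RPC layer.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum RpcError {
    UnknownMethod,
    Unauthorized,
}

/// Proof that a method name has passed the boundary check.
#[derive(Debug, PartialEq, Eq)]
pub struct AuthorizedCall<'a> {
    pub method: &'a str,
}

// Illustrative allowlists only; a real node would have many more entries.
const PUBLIC_METHODS: &[&str] = &["eth_blockNumber", "eth_getBalance"];
const PRIVILEGED_METHODS: &[&str] = &["admin_addPeer", "admin_removePeer"];

/// Deny by default: a method that is on neither list is rejected,
/// so forgetting to classify a new method fails closed.
pub fn authorize<'a>(
    method: &'a str,
    authenticated: bool,
) -> Result<AuthorizedCall<'a>, RpcError> {
    if PUBLIC_METHODS.contains(&method) {
        return Ok(AuthorizedCall { method });
    }
    if PRIVILEGED_METHODS.contains(&method) {
        return if authenticated {
            Ok(AuthorizedCall { method })
        } else {
            Err(RpcError::Unauthorized)
        };
    }
    Err(RpcError::UnknownMethod)
}
```

If handlers only accept `AuthorizedCall`, the auditor can confirm the boundary by checking the one place that constructs it, instead of hunting for auth checks in every handler.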
3. Reading a Reth PR — a worked example
Auditing isn't always a separate review session. Reading every PR that touches consensus-affecting code IS auditing, if you read it with the right questions in mind.
Worked example: imagine a Reth PR that refactors stage commit logic to add a new optimization — batch commit groups of stages together rather than one at a time.
The questions to ask while reading:
- What was the old `execute → commit → unwind` flow? What's the new one?
- For each new "batch commit," what state does it touch?
- What happens if the batch commit fails halfway? Specifically: did the new code introduce a state corruption window?
- Did `unwind` get updated to handle the batched case? If not, that's the bug.
- Are there tests for "execute partial-batch then crash then restart"? If not, the test gap is itself a finding.
This is what reading a PR for bugs looks like. The 5 questions are the same shape every time — just applied to whatever code area the PR changes.
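The "fails halfway" question can be turned into a test. In the sketch below, a `fail` flag simulates a crash or write error in the middle of a batch, and copy-then-swap stands in for a real database transaction. All names are hypothetical and nothing here is taken from an actual Reth PR.

```rust
use std::collections::BTreeMap;

pub type Store = BTreeMap<String, u64>;

/// One stage's pending write. `fail` simulates a crash or write error.
pub struct StageWrite {
    pub key: &'static str,
    pub value: u64,
    pub fail: bool,
}

/// Corruption-window shape: writes land directly in the live store,
/// so a failure at item k leaves items 0..k applied.
pub fn commit_batch_in_place(store: &mut Store, batch: &[StageWrite]) -> Result<(), String> {
    for w in batch {
        if w.fail {
            return Err(format!("write failed at {}", w.key));
        }
        store.insert(w.key.to_string(), w.value);
    }
    Ok(())
}

/// All-or-nothing shape: build the new state on a copy, swap only on success.
pub fn commit_batch_atomic(store: &mut Store, batch: &[StageWrite]) -> Result<(), String> {
    let mut staged = store.clone();
    commit_batch_in_place(&mut staged, batch)?;
    *store = staged;
    Ok(())
}
```

Running the same failing batch through both functions shows the difference: the in-place version leaves one stage's write behind, while the atomic version leaves the store untouched. A PR that adds batching without a test of this shape has the test gap described above.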
4. Auditing a consensus impl against its invariants
A consensus implementation is auditable in a structured way: list the protocol's invariants, then for each invariant trace through code paths that affect it.
Example: HotStuff has the safety invariant "a correct replica will not vote for two conflicting blocks at the same height." The audit:
- Find every code path where the replica produces a vote. Usually 1–3 places in the codebase.
- For each, check what state is consulted before voting. Specifically: is the validator's local "last voted block at height N" checked?
- What persists this state across restarts? If it's in-memory only, a restart-induced double-vote is a bug.
- Is the check atomic with the vote emission? If there's a window between "check" and "send vote," a concurrent code path could vote twice.
Repeat for every invariant the protocol specifies. Tedious but mechanical.
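The four checks for the voting invariant can be sketched as a single guard. Assumptions, stated loudly: the guard enforces a stricter monotonic rule (never vote at a height at or below the last voted height), persistence is reduced to a value passed in at startup, and `VoteGuard` is a hypothetical type, not part of any real HotStuff implementation.

```rust
use std::sync::Mutex;

/// Hypothetical guard for the "never vote twice at one height" rule.
pub struct VoteGuard {
    /// Highest height this replica has voted at (`None` = never voted).
    last_voted: Mutex<Option<u64>>,
}

impl VoteGuard {
    /// `restored` is the value read back from durable storage at startup.
    /// Passing `None` after a restart is the restart-induced double-vote bug.
    pub fn new(restored: Option<u64>) -> Self {
        Self { last_voted: Mutex::new(restored) }
    }

    /// Returns true if the caller may emit a vote at `height`.
    /// The check and the update happen under one lock acquisition, so no
    /// other thread can slip in between "check" and "record".
    pub fn try_vote(&self, height: u64) -> bool {
        let mut last = self.last_voted.lock().unwrap();
        if matches!(*last, Some(h) if height <= h) {
            return false;
        }
        // A real implementation would write `height` to durable storage here,
        // before the vote leaves the process.
        *last = Some(height);
        true
    }
}
```

When several threads race to vote at the same height, exactly one `try_vote` call returns `true`. In review, look for the opposite shape: a read of the last-voted height in one place and the write in another, with a send in between.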
5. Tools for the systems-code auditor
A small toolbox carries you a long way:
| Tool | What it does | When to use |
|---|---|---|
| `cargo audit` | Checks for known CVEs in dependencies | Run on every CI; baseline hygiene |
| `cargo geiger` | Counts unsafe blocks per crate | Use to scope your audit: which crates need the most unsafe review? |
| `kani` | Model checker for Rust | Use on small unsafe blocks or critical functions; doesn't scale to whole programs |
| `loom` | Concurrency permutation testing | Use on `Arc<Mutex<T>>`-heavy code paths; finds race conditions deterministically |
| `miri` | UB detection at runtime | Run a subset of your tests under miri to detect undefined behavior in unsafe |
| `cargo clippy -- -W clippy::all` | Lint-based bug finding | Baseline; catches common mistakes |
| Manual review checklist | Apply the 5 bug classes above systematically | Always |
None of these tools find bugs that the reviewer doesn't think to look for. Tools amplify a careful auditor, they don't replace one.
6. The auditor's deliverable
A systems-code audit produces a report. The industry-standard structure (used by Trail of Bits, Sigma Prime, OpenZeppelin, ConsenSys Diligence, Spearbit):
For each finding:
- Severity (Critical / High / Medium / Low / Informational)
- Title (one-line summary)
- Location (file and line numbers)
- Description (what the bug is, in 2–3 sentences)
- Exploit / consequence (what could happen if this bug is triggered)
- Recommendation (what to change, specifically)
- Status (Open / Acknowledged / Fixed)
For the report as a whole:
- Executive summary (1 page; what was audited, scope, top-line findings)
- Methodology (how the audit was conducted)
- Findings (the list above)
- Out-of-scope items (what was deliberately not audited and why)
Good audit reports are public. Read existing audits of Reth / Revm / Foundry / other Rust EVM components from Sigma Prime, OpenZeppelin, Spearbit — they're freely available, and the format is the same across firms.
7. The reliability triangle, revisited
Three reliability disciplines, each catching a different bug class:
| Discipline | Catches | Misses |
|---|---|---|
| Differential fuzzing | Wrong answer under valid input | Failure modes; latent design bugs |
| Chaos engineering | Correct behavior that breaks under perturbed conditions | Bugs in code paths never injected; latent bugs |
| Systems-code auditing | Latent design bugs in code paths not yet exercised | Bugs that need specific runtime triggers; unknown unknowns |
The three together are the reliability bar a serious L1 team holds itself to. Ship none, and you ship known-broken code. Ship one or two, you ship code with known gaps. Ship all three, and you've earned the right to call your fork "production-grade."
Recall
Without scrolling:
- Smart-contract auditing and systems-code auditing are different jobs. Name three bug classes each catches that the other doesn't.
- For every `unsafe` block, you should ask 3 questions. What are they?
- Reth's stage `execute` and `unwind` should be perfectly symmetric. What's the audit question this symmetry implies?
- Loom and Miri serve different purposes. When do you reach for each?
- Why is "differential fuzzing + chaos engineering + auditing" the reliability bar, rather than any one or two of them?
If any answer is shaky, re-read the section.
📂 Reference audits worth reading
- Sigma Prime — public audits — the closest industry reference for systems-code audit format
- Trail of Bits — publications — many Rust audits; good examples
- OpenZeppelin — security audits — strong on consensus invariant analysis
- ConsenSys Diligence — audit reports — broad coverage including infrastructure
- Spearbit — audit portfolio — Rust / Solidity / consensus
🧭 Where you are now in the stack: the reliability triangle is complete. You have differential fuzzing (correctness), chaos engineering (resilience), and systems-code auditing (latent bugs). These three together — paired with the SE substrate (DB / VM / network / concurrency) and the 4 forces (adversarial / verifiable / ordered / live-migrating) — are the skill set that distinguishes "I can write Rust EVM code" from "I can ship Rust EVM code at a Hyperliquid / Tempo / OP-stack quality bar." The Building tier is where you apply all of this to real apps.
Summary (3 lines)
- Systems-code auditing = reading for bugs. Patterns: unsafe blocks, lock-across-await, arithmetic-without-saturating, panic-in-hotpath.
- Type-system bypasses (transmute, manual Send/Sync), concurrency (deadlock, race), consensus (divergence), gas (misaccounting).
- Tools: rust-analyzer + clippy + cargo audit + miri. Production: external auditor + bug bounty + continuous internal review.