Lesson 9 — Building the Database trait — read API

Question

Database is revm's state-providing abstraction. Revm doesn't hold state; it asks via the Database trait. Build it step by step.

Principle (minimum model)

Step 0 — naive. Hardcoded HashMap<Address, Account>. Works for tests; useless in production (no MDBX integration).
Step 1 — Database trait. 4 read methods: basic(addr) / code_by_hash(hash) / storage(addr, key) / block_hash(number). Revm calls these during execution.
Step 2 — error association. type Error: DBErrorMarker. Different backends have different error types, so the trait leaves the concrete type open.
Step 3 — &mut self (not &self). Revm may mutate the DB during execution; explicit mut. Trade-off: harder to share via Arc.
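Step 4 — Send + Sync bounds. Needed for parallel execution and cross-task DB access. The trait as built below does not declare them itself, so treat them as bounds added wherever a DB is shared across threads.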
Step 5 — auto_impl(&mut, Box). Two wrappers, not 5 like Provider: the methods take &mut self, so shared-reference wrappers such as &T and Arc<T> cannot forward them.

Worked example + steps

Building the `Database` trait — read API

When the EVM hits an SLOAD, where does the value come from? Not from Revm — Revm is the execution engine and doesn't own state. The answer comes through a trait called Database, and implementing that trait is how you connect Revm to anything: an in-memory map for tests, a remote JSON-RPC node to fork mainnet, MDBX for a real Reth client, a network of shards for an exotic L1. Same four-method shape, four wildly different backends.

This lesson builds that trait up from the simplest possible sketch. By the end you'll have built every piece of:

#[auto_impl(&mut, Box)]
pub trait Database {
    type Error: DBErrorMarker;
    fn basic(&mut self, address: Address) -> Result<Option<AccountInfo>, Self::Error>;
    fn code_by_hash(&mut self, code_hash: B256) -> Result<Bytecode, Self::Error>;
    fn storage(&mut self, address: Address, index: StorageKey)
        -> Result<StorageValue, Self::Error>;
    fn block_hash(&mut self, number: u64) -> Result<B256, Self::Error>;
}

📂 Open bluealloy/revm in another tab. Cross-check at every step.

Step 0 — The naive Revm: state baked in

Without thinking, you'd write Revm with state owned internally:

pub struct Revm {
    stack: Vec<U256>,
    storage: HashMap<(Address, U256), U256>,
    accounts: HashMap<Address, AccountInfo>,
    // ...
}

The interpreter calls self.storage.get(...) directly. Simple. Works for a toy.

Three real-world cases break it:

Forked mainnet. State lives on a remote RPC, not in your HashMap.
MDBX-backed production. A real Reth node uses on-disk MDBX, not in-memory maps.
Custom schemas. Your app-chain might want a sparse Merkle store, a network of remote shards, or anything else.

Each requires different code to fetch state. You don't want to fork Revm three ways.

Step 1 — Push state behind a trait

The fix is the classic dependency inversion move (instead of Revm depending on a concrete storage implementation, both Revm and any storage backend depend on an abstract trait — the dependency arrow gets inverted). Don't make Revm own storage; make it ask for what it needs through an interface. Define a trait that describes what Revm needs from state, without owning the storage:

pub trait Database {
    fn storage(&mut self, address: Address, key: U256) -> U256;
    fn balance(&mut self, address: Address) -> U256;
    fn code(&mut self, address: Address) -> Vec<u8>;
    fn block_hash(&mut self, number: u64) -> B256;
}

Now the interpreter takes db: &mut dyn Database instead of owning storage. Anyone can implement the trait — your forked-mainnet impl, your MDBX impl, your in-memory impl all fit the same socket.
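To make the "same socket" idea concrete, here is a minimal sketch, not revm's code. It trims the naive trait to a single method and uses stand-in types (Address as a byte array, U256 as u128). InMemoryDb, ConstantDb, sload, and example are hypothetical names invented for illustration.

```rust
use std::collections::HashMap;

// Stand-in types for illustration only; revm uses richer primitives.
type Address = [u8; 20];
type U256 = u128;

// The naive trait from this step, reduced to one method to keep the sketch short.
pub trait Database {
    fn storage(&mut self, address: Address, key: U256) -> U256;
}

/// Backend 1: an in-memory map, the kind you would use in tests.
#[derive(Default)]
pub struct InMemoryDb {
    pub slots: HashMap<(Address, U256), U256>,
}

impl Database for InMemoryDb {
    fn storage(&mut self, address: Address, key: U256) -> U256 {
        // Unset slots read as zero.
        self.slots.get(&(address, key)).copied().unwrap_or(0)
    }
}

/// Backend 2: a hypothetical backend that answers every read with a constant.
pub struct ConstantDb(pub U256);

impl Database for ConstantDb {
    fn storage(&mut self, _address: Address, _key: U256) -> U256 {
        self.0
    }
}

/// A toy SLOAD: the interpreter asks the database and never owns state.
pub fn sload(db: &mut dyn Database, address: Address, key: U256) -> U256 {
    db.storage(address, key)
}

/// Runs the same `sload` against both backends.
pub fn example() -> (U256, U256, U256) {
    let addr = [0x11; 20];
    let mut mem = InMemoryDb::default();
    mem.slots.insert((addr, 1), 42);
    let mut constant = ConstantDb(7);
    (
        sload(&mut mem, addr, 1),      // slot that was set
        sload(&mut mem, addr, 2),      // slot that was never set
        sload(&mut constant, addr, 1), // different backend, same call site
    )
}
```

Calling example() returns (42, 0, 7): sload never changes, only the value plugged into it does.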

Why &mut self rather than &self? Caching. A real implementation (forked mainnet, RPC-backed) wants to cache reads: the first call to storage(addr, key) hits the network; subsequent calls return from a local cache. Filling that cache mutates the impl, which &mut self allows directly. &self would force every impl to wrap its cache in RwLock or RefCell, which is fine sometimes but a tax overall. Default to &mut. (The next lesson covers the &self case via a companion trait.)
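A small sketch of the caching argument, under loud assumptions: SlowBackend is a hypothetical stand-in for a network client that only counts simulated round-trips, and the value it returns is a placeholder. The part to look at is that storage fills the cache through &mut self with no RefCell or RwLock in sight.

```rust
use std::collections::HashMap;

// Stand-in types for illustration only.
type Address = [u8; 20];
type U256 = u128;

pub trait Database {
    fn storage(&mut self, address: Address, key: U256) -> U256;
}

/// Hypothetical slow backend; `fetches` counts simulated round-trips.
#[derive(Default)]
pub struct SlowBackend {
    pub fetches: usize,
}

impl SlowBackend {
    fn fetch_slot(&mut self, _address: Address, key: U256) -> U256 {
        self.fetches += 1;
        key * 10 // placeholder value standing in for a network response
    }
}

/// Wraps the slow backend with a plain HashMap cache.
#[derive(Default)]
pub struct CachingDb {
    backend: SlowBackend,
    cache: HashMap<(Address, U256), U256>,
}

impl CachingDb {
    pub fn fetches(&self) -> usize {
        self.backend.fetches
    }
}

impl Database for CachingDb {
    fn storage(&mut self, address: Address, key: U256) -> U256 {
        // Cache hit: no round-trip.
        if let Some(value) = self.cache.get(&(address, key)) {
            return *value;
        }
        // Cache miss: fetch once, then remember. `&mut self` makes the
        // insert a plain field mutation.
        let value = self.backend.fetch_slot(address, key);
        self.cache.insert((address, key), value);
        value
    }
}

/// Reads the same slot twice and reports how many fetches happened.
pub fn example() -> (U256, U256, usize) {
    let mut db = CachingDb::default();
    let addr = [0xAA; 20];
    let first = db.storage(addr, 3);
    let second = db.storage(addr, 3);
    (first, second, db.fetches())
}
```

Here example() returns (30, 30, 1): two reads, one simulated round-trip.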

Step 2 — Group the methods correctly

Look at the naive trait: balance and code both ask about an account, but they're separate methods. Are they really independent?

In practice, you almost always want both. Networked impls especially — you don't want two RPC round-trips for the same account. Better: one method that returns both, and let the impl decide how to fetch them.

fn basic(&mut self, address: Address) -> Result<Option<AccountInfo>, Self::Error>;

AccountInfo bundles balance, nonce, and code hash. One round-trip, three pieces of data. The Option lets the impl signal "no such account" cleanly — useful for EXTCODEHASH, which has special semantics for unknown accounts.

Code stays separate, addressed by hash:

fn code_by_hash(&mut self, code_hash: B256) -> Result<Bytecode, Self::Error>;

Why by hash? Because contract code is content-addressed. A given bytecode (a popular DEX router, say) is shared across many addresses, so caching by hash dedupes automatically. basic returns just the hash; code_by_hash materializes the bytes only when you actually need to execute them. Lazy loading with content addressing.
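The split between basic and code_by_hash can be sketched as follows. This is an illustration with stand-in types, not revm's implementation. Errors are left out because Result only arrives in Step 3. The deploy helper and the placeholder hash are assumptions; a real backend would compute keccak256 of the code rather than accept a hash from the caller.

```rust
use std::collections::HashMap;

// Stand-in types for illustration only.
type Address = [u8; 20];
type B256 = [u8; 32];

#[derive(Clone, Debug, PartialEq)]
pub struct AccountInfo {
    pub balance: u128,
    pub nonce: u64,
    pub code_hash: B256,
}

#[derive(Default)]
pub struct InMemoryDb {
    accounts: HashMap<Address, AccountInfo>,
    code: HashMap<B256, Vec<u8>>, // keyed by hash, not by address
}

impl InMemoryDb {
    /// Hypothetical helper: the caller supplies `code_hash` here, whereas a
    /// real implementation would derive it from `code`.
    pub fn deploy(&mut self, address: Address, code_hash: B256, code: Vec<u8>) {
        self.accounts
            .insert(address, AccountInfo { balance: 0, nonce: 1, code_hash });
        // Identical code is stored once, no matter how many accounts use it.
        self.code.entry(code_hash).or_insert(code);
    }

    /// One lookup returns balance, nonce, and code hash together.
    pub fn basic(&mut self, address: Address) -> Option<AccountInfo> {
        self.accounts.get(&address).cloned()
    }

    /// Bytes are materialized only when someone asks for them.
    pub fn code_by_hash(&mut self, code_hash: B256) -> Option<Vec<u8>> {
        self.code.get(&code_hash).cloned()
    }

    pub fn stored_code_blobs(&self) -> usize {
        self.code.len()
    }
}

/// Deploys the same code at two addresses and inspects the result.
pub fn example() -> (usize, bool, bool) {
    let mut db = InMemoryDb::default();
    let router_hash = [0xCD; 32]; // placeholder, not a real keccak256 digest
    let router_code = vec![0x60, 0x00, 0x60, 0x00, 0xF3]; // arbitrary bytes
    db.deploy([0x01; 20], router_hash, router_code.clone());
    db.deploy([0x02; 20], router_hash, router_code);

    let a = db.basic([0x01; 20]).unwrap();
    let b = db.basic([0x02; 20]).unwrap();
    let same_hash = a.code_hash == b.code_hash;
    let unknown_is_none = db.basic([0x03; 20]).is_none();
    (db.stored_code_blobs(), same_hash, unknown_is_none)
}
```

example() returns (1, true, true): two accounts, one stored code blob, and None for an address that was never touched.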

Step 3 — Add `Result` and an associated `Error` type

Networked impls fail. RPC times out, MDBX returns a stale lock, a Mutex gets poisoned. Every method must be allowed to fail.

fn basic(&mut self, ...) -> Result<Option<AccountInfo>, Self::Error>;

But Self::Error — why an associated type instead of a fixed enum?

Because revm cannot know what your errors look like. RPC errors, disk I/O errors, lock poisoning — all different shapes. A fixed enum would either be too narrow (and force every impl to flatten its real errors) or too wide (and force revm to handle 50 variants).

type Error: DBErrorMarker;

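DBErrorMarker is a marker trait: it declares no methods, and an error type opts in with an explicit impl rather than getting it automatically. Its purpose: document intent ("this is the kind of error a database can produce") and give revm one named bound it can use to recognize database errors, for example when converting them into its own error type. Cross-check the definition in the revm source, since the bounds it carries depend on the revm version.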

Picture an RPC-backed impl under a closed enum. You'd have to flatten reqwest::Error, serde_json::Error, network timeouts, and parse errors into the enum's variants, and every new failure mode would require a PR against revm. The associated type lets your error stay yours.
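Here is a hedged sketch of what the associated type buys you. The trait is cut down to one method, and a plain Debug bound stands in for revm's marker bound. FlakyRpcDb, RpcError, and nonce_of are hypothetical names. Two backends keep their own error types, and one generic function propagates whichever it gets.

```rust
use std::collections::HashMap;
use std::convert::Infallible;
use std::fmt::Debug;

// Stand-in types for illustration only.
type Address = [u8; 20];

#[derive(Clone, Debug, PartialEq)]
pub struct AccountInfo {
    pub nonce: u64,
}

// Reduced stand-in for the trait; `Debug` is a placeholder for the marker bound.
pub trait Database {
    type Error: Debug;
    fn basic(&mut self, address: Address) -> Result<Option<AccountInfo>, Self::Error>;
}

/// Backend 1: map lookups cannot fail, so the error type is `Infallible`.
#[derive(Default)]
pub struct InMemoryDb {
    pub accounts: HashMap<Address, AccountInfo>,
}

impl Database for InMemoryDb {
    type Error = Infallible;
    fn basic(&mut self, address: Address) -> Result<Option<AccountInfo>, Infallible> {
        Ok(self.accounts.get(&address).cloned())
    }
}

/// Backend 2: a hypothetical RPC client with its own error enum.
#[derive(Debug, PartialEq)]
pub enum RpcError {
    Timeout,
}

pub struct FlakyRpcDb {
    pub online: bool,
}

impl Database for FlakyRpcDb {
    type Error = RpcError;
    fn basic(&mut self, _address: Address) -> Result<Option<AccountInfo>, RpcError> {
        if self.online {
            Ok(Some(AccountInfo { nonce: 5 }))
        } else {
            Err(RpcError::Timeout)
        }
    }
}

/// Generic over any backend; `?` forwards the backend's own error untouched.
pub fn nonce_of<DB: Database>(db: &mut DB, address: Address) -> Result<u64, DB::Error> {
    let info = db.basic(address)?;
    Ok(info.map(|account| account.nonce).unwrap_or(0))
}

pub fn example() -> (u64, Result<u64, RpcError>) {
    let mut mem = InMemoryDb::default();
    mem.accounts.insert([0x01; 20], AccountInfo { nonce: 9 });
    let from_mem = match nonce_of(&mut mem, [0x01; 20]) {
        Ok(nonce) => nonce,
        Err(never) => match never {}, // `Infallible` has no values
    };
    let mut rpc = FlakyRpcDb { online: false };
    (from_mem, nonce_of(&mut rpc, [0x01; 20]))
}
```

example() returns (9, Err(RpcError::Timeout)). nonce_of never names either error type, which is exactly what a closed enum would have prevented.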

Step 4 — `#[auto_impl(&mut, Box)]`

Without this attribute, you'd write the same forwarding code by hand:

impl<T: Database> Database for &mut T {
    type Error = T::Error;
    fn basic(&mut self, addr: Address) -> Result<Option<AccountInfo>, T::Error> {
        (**self).basic(addr)
    }
    // ... 3 more methods, all the same pattern
}
impl<T: Database> Database for Box<T> { /* ... same 4 methods ... */ }

Eight method bodies of identical forwarding boilerplate just for Database (4 methods × &mut and Box) — and the same pattern repeats across DatabaseRef and DatabaseCommit.

auto_impl is a procedural macro that generates these forwarding impls automatically. With #[auto_impl(&mut, Box)], both &mut MyDb and Box<MyDb> automatically implement Database if MyDb does. No user-written boilerplate.

What about Arc<MyDb>? It can't implement Database, at least not directly. Arc<T> only gives you &T, not &mut T, and Database's methods take &mut self. This forces a design split that the next lesson resolves: revm has a companion read-only trait (DatabaseRef) for exactly the Arc case.
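To see what the forwarding impls buy a caller, here is a self-contained sketch that writes them by hand instead of pulling in the auto_impl crate. The one-method trait, CountingDb, and read_twice are hypothetical and exist only for this illustration.

```rust
use std::convert::Infallible;

// Stand-in type for illustration only.
type Address = [u8; 20];

// One-method stand-in for the trait.
pub trait Database {
    type Error;
    fn nonce(&mut self, address: Address) -> Result<u64, Self::Error>;
}

// Hand-written versions of the two forwarding impls discussed above.
impl<T: Database + ?Sized> Database for &mut T {
    type Error = T::Error;
    fn nonce(&mut self, address: Address) -> Result<u64, T::Error> {
        (**self).nonce(address)
    }
}

impl<T: Database + ?Sized> Database for Box<T> {
    type Error = T::Error;
    fn nonce(&mut self, address: Address) -> Result<u64, T::Error> {
        (**self).nonce(address)
    }
}

/// Toy backend: every read bumps a counter and returns it.
pub struct CountingDb {
    pub reads: u64,
}

impl Database for CountingDb {
    type Error = Infallible;
    fn nonce(&mut self, _address: Address) -> Result<u64, Infallible> {
        self.reads += 1;
        Ok(self.reads)
    }
}

/// Takes the database by value, so the caller decides how to hand it over.
pub fn read_twice<DB: Database>(mut db: DB) -> Result<u64, DB::Error> {
    db.nonce([0; 20])?;
    db.nonce([0; 20])
}

pub fn example() -> (u64, u64, u64) {
    let mut db = CountingDb { reads: 0 };
    // Lend the database: `&mut CountingDb` implements `Database` too.
    let via_ref = read_twice(&mut db).unwrap();
    // The original is still ours, and it saw both reads.
    let reads_after = db.reads;
    // Or give away an owned, boxed database.
    let via_box = read_twice(Box::new(CountingDb { reads: 10 })).unwrap();
    // An `Arc<CountingDb>` would not compile here: Arc only hands out `&T`.
    (via_ref, reads_after, via_box)
}
```

example() returns (2, 2, 12). The same read_twice accepts a borrowed database and a boxed one because both wrappers forward to the inner impl.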

What you've built

#[auto_impl(&mut, Box)]
pub trait Database {
    type Error: DBErrorMarker;
    fn basic(&mut self, address: Address) -> Result<Option<AccountInfo>, Self::Error>;
    fn code_by_hash(&mut self, code_hash: B256) -> Result<Bytecode, Self::Error>;
    fn storage(&mut self, address: Address, index: StorageKey)
        -> Result<StorageValue, Self::Error>;
    fn block_hash(&mut self, number: u64) -> Result<B256, Self::Error>;
}

Every piece earned its keep:

&mut self (Step 1) — caching without RefCell/RwLock overhead
basic returning AccountInfo (Step 2) — one round-trip per account
code_by_hash (Step 2) — content-addressed, deduped across contracts
type Error: DBErrorMarker (Step 3) — open error taxonomy, marker bound
#[auto_impl(&mut, Box)] (Step 4) — automatic forwarding
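Putting the pieces together, this is a minimal in-memory implementation of the trait's final shape. It is a sketch under stated assumptions: every type is a local stand-in so the block compiles without revm (StorageKey and StorageValue as u128, Bytecode as Vec<u8>, DBErrorMarker as an empty local trait), and the fallback values for unknown hashes and block numbers are choices made for this sketch, not revm's behavior.

```rust
use std::collections::HashMap;
use std::convert::Infallible;

// Stand-in types so the sketch compiles without revm; the real ones are richer.
type Address = [u8; 20];
type B256 = [u8; 32];
type StorageKey = u128;
type StorageValue = u128;
type Bytecode = Vec<u8>;

#[derive(Clone, Debug, Default, PartialEq)]
pub struct AccountInfo {
    pub balance: u128,
    pub nonce: u64,
    pub code_hash: B256,
}

// Local stand-in for revm's marker trait.
pub trait DBErrorMarker {}
impl DBErrorMarker for Infallible {}

pub trait Database {
    type Error: DBErrorMarker;
    fn basic(&mut self, address: Address) -> Result<Option<AccountInfo>, Self::Error>;
    fn code_by_hash(&mut self, code_hash: B256) -> Result<Bytecode, Self::Error>;
    fn storage(&mut self, address: Address, index: StorageKey)
        -> Result<StorageValue, Self::Error>;
    fn block_hash(&mut self, number: u64) -> Result<B256, Self::Error>;
}

#[derive(Default)]
pub struct InMemoryDb {
    pub accounts: HashMap<Address, AccountInfo>,
    pub code: HashMap<B256, Bytecode>,
    pub slots: HashMap<(Address, StorageKey), StorageValue>,
    pub block_hashes: HashMap<u64, B256>,
}

impl Database for InMemoryDb {
    type Error = Infallible; // map lookups cannot fail

    fn basic(&mut self, address: Address) -> Result<Option<AccountInfo>, Self::Error> {
        Ok(self.accounts.get(&address).cloned())
    }

    fn code_by_hash(&mut self, code_hash: B256) -> Result<Bytecode, Self::Error> {
        // Sketch choice: unknown hashes yield empty bytecode.
        Ok(self.code.get(&code_hash).cloned().unwrap_or_default())
    }

    fn storage(&mut self, address: Address, index: StorageKey)
        -> Result<StorageValue, Self::Error> {
        // Unset slots read as zero.
        Ok(self.slots.get(&(address, index)).copied().unwrap_or(0))
    }

    fn block_hash(&mut self, number: u64) -> Result<B256, Self::Error> {
        // Sketch choice: unknown block numbers yield the zero hash.
        Ok(self.block_hashes.get(&number).copied().unwrap_or([0; 32]))
    }
}

pub fn example() -> (Option<u64>, StorageValue, StorageValue, usize) {
    let addr = [0x42; 20];
    let mut db = InMemoryDb::default();
    db.accounts
        .insert(addr, AccountInfo { balance: 100, nonce: 3, code_hash: [0; 32] });
    db.slots.insert((addr, 1), 99);

    let nonce = db.basic(addr).unwrap().map(|account| account.nonce);
    let set_slot = db.storage(addr, 1).unwrap();
    let unset_slot = db.storage(addr, 2).unwrap();
    let code_len = db.code_by_hash([0xEE; 32]).unwrap().len();
    (nonce, set_slot, unset_slot, code_len)
}
```

example() returns (Some(3), 99, 0, 0). Swapping InMemoryDb for a networked backend would change the impl block and the Error type, and nothing on the calling side.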

The next lesson covers what auto_impl can't do (Arc), how revm splits read from write, and the three real impls that show how the same trait scales from 50 lines to thousands.

Recall before moving on

Without scrolling:

Why does Database use &mut self and not &self?
What's the difference between basic and code_by_hash, and why split them?
Why is Error an associated type instead of a fixed enum?
What does #[auto_impl(&mut, Box)] save you from writing?

If any answer is shaky, scroll back. Next lesson: the read/write split.

🛣️ The road not taken (Solana): Solana's Database-equivalent has a different shape. Solana state is a flat map of accounts, each storing its own data blob — not a trie of slots within contracts. So Solana's "database trait" is account-keyed, with no storage(address, key) method: storage isn't an indirection layer, just an account field. The four-method shape you just built — basic / code_by_hash / storage / block_hash — is the trait-level fingerprint of EVM's "trie + per-contract storage" design choice. Different state models lead to different decoupling seams.

🧭 Where you are now in the stack: you've built the VM–DB seam (read API) — 4 methods + associated Error + auto_impl. The same Revm now runs against an in-memory map, a remote JSON-RPC, MDBX, or a shard network without any of them touching the VM. Next lesson opens the read/write split — the design decision that lets Arc work and decouples eager vs lazy fetch.

Summary (3 lines)

6-step buildup: hardcoded HashMap → 4 read methods → error type → &mut self → Send + Sync → auto_impl(&mut, Box).
4 methods: basic + code_by_hash + storage + block_hash. Each answers one kind of state query.
auto_impl(&mut, Box) — 2 wrappers (not 5 like Provider). Next: companion traits.