Lesson 8 — 3 proptest invariants: 768 random scenarios

Question

Three proptest invariants — conservation, FIFO, no-crossed-book — each run on 256 random scenarios = 768 total. Universalises the unit tests to "holds for all valid inputs".

Principle (minimum model)

Invariant 1: conservation. Σ fill_sizes + remaining == initial_size. Generate random initial book + random taker; submit; assert.
Invariant 2: FIFO. At each price level, oldest order matches first. Generate orders with explicit sequence numbers; submit; assert oldest filled before newer.
Invariant 3: no-crossed-book. After every operation, best_bid < best_ask. Generate random sequence of submits + cancels; after each step, assert.
Proptest input generation. prop_assume!(size > 0) + vec(any_order(), 1..50) + price range. Filters invalid inputs.
256 iterations per invariant. Production CIs run 1000-10000 per invariant.
Shrinking on failure. When proptest finds a failure, it shrinks to the minimal counterexample. Critical for debugging — shows the smallest input that triggers the bug.
Why these three. Conservation = no money leaks; FIFO = fair matching; no-crossed-book = sane state. Together they cover the main classes of bug.

Worked example + steps

Lesson 8 — 3 proptest invariants: 768 random scenarios

Goal

Concepts you'll grasp in this lesson:

Determinism is the load-bearing property for consensus chains — a correct-but-non-deterministic matching engine breaks consensus (validators replay the same actions, see different fills, fail to agree). A deterministic-but-incorrect engine is repairable; a non-deterministic one is not. The determinism invariant protects the chain's safety.
Property tests find bugs in scenarios you didn't think of — 9 hand-traced tests cover what you anticipated. 256 cases × 3 properties = 768 random sequences cover the long tail (e.g. "submit 17 limits, then market against an empty side"). Shrinking automatically reduces a failing 25-action sequence to the minimal counterexample.
Conservation, safety, replayability as the three orthogonal invariants — qty_conservation (no quantity invented or lost), no_crossed_book (best_bid < best_ask always), determinism (same input → same output). These are the universal CLOB invariants any matching engine must satisfy.
proptest belongs in [dev-dependencies], not [dependencies] — property tests run during cargo test, never in production. Putting it in [dependencies] would force every openhl-clob consumer to compile proptest.
The Action enum is a simplified intermediate for generators — proptest combinators work most easily with primitive types; the strategies emit raw u64s, and the test body wraps them in newtypes before calling submit. Newtype discipline holds at the API boundary, not inside the generator.

Verification:

cargo test -p openhl-clob

…passes 12 tests (9 unit + 3 proptest invariants), with each proptest running 256 cases each = 768 random scenarios.

Specific changes:

One new dev-dep — proptest = { workspace = true } in crates/clob/Cargo.toml.
A new #[cfg(test)] mod prop_tests block at the bottom of book.rs, containing:
- Action enum — a simplified action representation for property generators.
- 3 generator strategies — arb_side, arb_action, arb_actions — that produce random valid action sequences.
- 3 proptest! blocks — qty_conservation, no_crossed_book, determinism.

After Lesson 8, the matching engine has property-level proof that its invariants hold across many random orderings — not just the 9 hand-traced scenarios from Lesson 7.

Recap

After Lesson 7:

9 hand-traced unit tests pass.
Each tests a specific invariant on a specific scenario.

What Lesson 7 doesn't test: random sequences. If a bug exists that only triggers when (e.g.) you submit 17 limits, cancel 3 of them, and then submit a Market, Lesson 7's 9 tests will likely miss it. You'd need to either think of that specific scenario yourself (hard — bugs hide in places you don't think to test) or test many scenarios automatically. Lesson 8 does the latter.

Plan

Three things:

Add proptest as a dev-dependency to crates/clob/Cargo.toml. proptest is already a workspace dep (used by consensus for proposer-election tests in the existing rethlab L1 Architect tier); we just need to declare we use it.
Add a new mod prop_tests block below the existing mod tests in book.rs. The new module contains:
- Action enum (subset of operations the property tests will exercise — for now, just SubmitLimit + SubmitMarket; cancel will arrive in a follow-up).
- Generator strategies for random Action sequences.
- 3 proptest! blocks — one per invariant.
Run cargo test -p openhl-clob — 12 tests pass (9 unit + 3 props).

The 3 invariants:

qty_conservation: total quantity entering the book equals total filled + total resting (the "money math is conserved" property).
no_crossed_book: best_bid < best_ask always holds — the safety property test 9 hand-traced, now random-tested.
determinism: the same action sequence produces the same fills + same book state, every time. This is the replayability property the chain's safety relies on.

If proptest finds a counterexample for any of them, it automatically shrinks the failing input to the smallest sequence that still fails. That's the load-bearing benefit of properties over examples.

(Answer: qty_conservation would catch it indirectly — over enough cases, the wrong walk order produces wrong totals when the matched price differs from what the hand math expects. no_crossed_book would catch it directly: a buy that doesn't take the cheapest ask first leaves a cheaper ask on the book, and the next bid that's above that ask creates a cross. determinism would catch it on every run because each run picks a different "random" walk order, so two runs of the same input produce different fills. determinism is the load-bearing property for consensus chains — without it, validators won't agree.)

Walk-through

Step 1: Add `proptest` to `crates/clob/Cargo.toml`

Open crates/clob/Cargo.toml. The current state is:

[package]
name         = "openhl-clob"
# ... shared package fields ...

[dependencies]

[lints]
workspace = true

Add a [dev-dependencies] section:

[package]
name         = "openhl-clob"
# ... shared package fields ...

[dependencies]

[dev-dependencies]
proptest = { workspace = true }

[lints]
workspace = true

proptest is already declared in the workspace Cargo.toml (you don't need to add it there — it's been a workspace dep since L1 Architect's first courses). The [dev-dependencies] block makes it available only when building tests, not when building production code.

Step 2: Set up `mod prop_tests` with the `Action` enum

In crates/clob/src/book.rs, after the existing mod tests { ... } block (still at module scope), add:

#[cfg(test)]
mod prop_tests {
    use super::*;
    use proptest::prelude::*;

    /// A simplified action enum for property-based testing.
    #[derive(Clone, Debug)]
    enum Action {
        SubmitLimit {
            id: u64,
            account: u64,
            side: Side,
            price: u64,
            qty: u64,
        },
        SubmitMarket {
            id: u64,
            account: u64,
            side: Side,
            qty: u64,
        },
    }

The Action enum is a simplified representation of what proptest will randomly generate. Each variant carries the raw u64s that a real call to Book::submit would need (wrapped in newtypes later). Two variants for now — Limit and Market submits. Cancel actions aren't here yet; we add them in a follow-up stage of openhl.

Why model actions as an enum? Because property tests need to generate sequences of actions, and each action can be one of N kinds. The enum captures that variability. proptest's strategy combinators (prop_oneof!, prop::collection::vec, etc.) work well with enums.

Step 3: Write the strategies

Continue inside mod prop_tests:

    fn arb_side() -> impl Strategy<Value = Side> {
        prop_oneof![Just(Side::Buy), Just(Side::Sell)]
    }

    fn arb_action(id: u64) -> impl Strategy<Value = Action> {
        let limit_action = (1u64..=200, 1u64..=20, arb_side(), 50u64..=150)
            .prop_map(move |(account, qty, side, price)| Action::SubmitLimit {
                id,
                account,
                side,
                price,
                qty,
            });
        let market_action = (1u64..=200, 1u64..=20, arb_side()).prop_map(
            move |(account, qty, side)| Action::SubmitMarket {
                id,
                account,
                side,
                qty,
            },
        );
        prop_oneof![3 => limit_action, 1 => market_action]
    }

    fn arb_actions() -> impl Strategy<Value = Vec<Action>> {
        prop::collection::vec(0u64..1000, 1..30)
            .prop_flat_map(|ids| {
                ids.into_iter()
                    .enumerate()
                    .map(|(i, _)| arb_action(i as u64 + 1))
                    .collect::<Vec<_>>()
            })
    }

Three strategies, building up:

arb_side() — uniformly picks Buy or Sell. prop_oneof![Just(...), Just(...)] is proptest's "one of these literals" combinator.
arb_action(id) — generates a random Action with a fixed id. The Limit branch generates (account, qty, side, price) in ranges; the Market branch generates (account, qty, side). Weights: 3 => limit_action, 1 => market_action — Limit actions happen 3× as often as Market, reflecting realistic order-book usage.
arb_actions() — generates a random Vec<Action> of length 1..30. The .prop_flat_map pattern is a bit unusual: it first calls prop::collection::vec(0u64..1000, 1..30) to produce a vec of u64s, of length 1..30, with values in 0..1000. The u64 values themselves are dummies — the .enumerate() inside .prop_flat_map overwrites them with indices immediately. The 0u64..1000 range is there only to give the vector a shape; the actual values are not used. Then each position is mapped to an arb_action(i+1) so order IDs are assigned in increasing order. The trick is that arb_actions produces sequences with strictly-increasing order IDs (avoiding collisions in the book).
Here enumerate() yields i: usize, so i as u64 + 1 is an explicit type bridge into arb_action's u64 ID parameter. Conceptually, it maps generator position to deterministic OrderId(1..) numbering.

Why use ranges (1..=200 for account, 50..=150 for price)? To bias proptest toward generating plausible scenarios. With 0..=u64::MAX ranges, proptest would mostly generate extreme outliers (account_id = 18_446_744_073_709_551_614). Realistic ranges produce scenarios that look like real trading: accounts 1-200, prices 50-150, quantities 1-20. The matching engine's bugs are most likely to hide in normal-looking sequences.

Step 4: The first invariant — `qty_conservation`

Append below the strategies:

    proptest! {
        #![proptest_config(ProptestConfig {
            cases: 256,
            ..ProptestConfig::default()
        })]

        /// Quantity is conserved: every fill_qty came from a resting maker;
        /// total qty in/out balances.
        #[test]
        fn qty_conservation(actions in arb_actions()) {
            let mut book = Book::new();
            let mut total_in = 0u64;
            let mut total_filled = 0u64;
            let mut total_market_unfilled = 0u64;

            for action in actions {
                match action {
                    Action::SubmitLimit { id, account, side, price, qty } => {
                        total_in += qty;
                        let r = book.submit(Order {
                            id: OrderId(id),
                            account: AccountId(account),
                            side,
                            qty: Qty(qty),
                            order_type: OrderType::Limit { price: Price(price) },
                        });
                        total_filled += r.total_filled().0;
                    }
                    Action::SubmitMarket { id, account, side, qty } => {
                        total_in += qty;
                        let r = book.submit(Order {
                            id: OrderId(id),
                            account: AccountId(account),
                            side,
                            qty: Qty(qty),
                            order_type: OrderType::Market,
                        });
                        total_filled += r.total_filled().0;
                        total_market_unfilled += r.remaining_qty.0;
                    }
                }
            }

            // Resting quantity = total_in - 2*total_filled - total_market_unfilled.
            // (Each fill consumes one unit from a maker AND one unit from a taker,
            // so total_filled counts qty, but the qty appeared in total_in twice
            // — once when the maker was submitted, once when the taker arrived.)
            let resting: u64 = book.bids.values()
                .flat_map(|q| q.iter())
                .chain(book.asks.values().flat_map(|q| q.iter()))
                .map(|o| o.qty.0)
                .sum();
            prop_assert_eq!(total_in, 2 * total_filled + total_market_unfilled + resting);
        }

This is the "quantity is conserved" invariant. The body has three counters:

total_in: sum of all qty values from submitted orders.
total_filled: sum of fill_qty across all Fills produced.
total_market_unfilled: sum of remaining_qty from Market orders (the leftover discarded).

The invariant is a conservation law:

total_in = 2 × total_filled + total_market_unfilled + resting_qty

Why 2 ×? Because a fill consumes 1 unit from the maker AND 1 unit from the taker, so 1 unit of fill_qty appears in total_in twice — once when the maker was originally submitted, once when the taker arrived. The math:

Action	`total_in`	What's left at the end
Submit Limit 10 units that fully rest	+10	10 units resting
Submit Market 10 units, no liquidity	+10	10 units discarded (no fill)
Submit Limit 10 units that match a 5-unit ask	+10	5 units filled (one from each), 5 units left over to rest

If 5 units fill, that means: maker offered 5 (already in total_in), taker took 5 (also in total_in). The 5 filled units appear in total_in as 10 — once from each side. That's why 2 * total_filled.

Timeline view of the third row:

[t=0]  Sell Limit(qty=5) arrives → fully rests
         total_in += 5         → total_in = 5
         (no fill; rests on the book)
         Check: total_in(5) = 2×total_filled(0) + market_unfilled(0) + resting(5) ✓

[t=1]  Buy Limit(qty=10) arrives → 5 units cross the maker, 5 units rest
         total_in += 10        → total_in = 15
         total_filled += 5     → total_filled = 5
         resting = 5 (taker remainder; the maker's 5 units are consumed)
         Check: total_in(15) = 2×total_filled(10) + market_unfilled(0) + resting(5) ✓

total_in is just a running counter — "sum the qty of every submitted order in arrival order." The seemingly mysterious 2 * encodes a real accounting fact: one fill contributes to total_in twice — once when the maker was originally submitted, once when the taker arrived.

The #![proptest_config(ProptestConfig { cases: 256, .. })] line at the top of the proptest! block sets each test to run 256 times. With 3 invariants × 256 cases = 768 random scenarios.

The prop_assert_eq! (not assert_eq!) is important — proptest needs to distinguish "test failed" from "test panicked due to a system error." prop_assert_eq! reports the failure to proptest's shrinking machinery, which then tries to find a minimal counterexample.

Step 5: The second invariant — `no_crossed_book`

Below the first proptest, still inside the same proptest! { ... } block:

        /// Book invariant: best bid is strictly less than best ask. The book
        /// should never be crossed after submit() completes.
        #[test]
        fn no_crossed_book(actions in arb_actions()) {
            let mut book = Book::new();
            for action in actions {
                match action {
                    Action::SubmitLimit { id, account, side, price, qty } => {
                        book.submit(Order {
                            id: OrderId(id),
                            account: AccountId(account),
                            side,
                            qty: Qty(qty),
                            order_type: OrderType::Limit { price: Price(price) },
                        });
                    }
                    Action::SubmitMarket { id, account, side, qty } => {
                        book.submit(Order {
                            id: OrderId(id),
                            account: AccountId(account),
                            side,
                            qty: Qty(qty),
                            order_type: OrderType::Market,
                        });
                    }
                }
                if let (Some(b), Some(a)) = (book.best_bid(), book.best_ask()) {
                    prop_assert!(b < a, "book crossed: bid={} ask={}", b.0, a.0);
                }
            }
        }

The body:

For each action, submit the order.
After each submission, check that book.best_bid() < book.best_ask() (if both exist).
If at any point best_bid >= best_ask, the test fails — the book is crossed.

This is the same invariant as Lesson 7's book_does_not_cross_after_match, but tested against random sequences. Lesson 7 proves the invariant holds on one scenario; Lesson 8 proves it holds on 256 randomized scenarios.

The prop_assert!(b < a, "...") macro includes a format string — when proptest fails, the error message shows the actual bid/ask values that crossed. This is more informative than the plain assert!(b < a).

Step 6: The third invariant — `determinism`

The most important one:

        /// Determinism: applying the same action sequence produces the same
        /// book + fill history every time. (The "replayability" property
        /// from the architecture doc — required for consensus determinism.)
        #[test]
        fn determinism(actions in arb_actions()) {
            let run = |actions: &[Action]| {
                let mut book = Book::new();
                let mut all_fills: Vec<Fill> = Vec::new();
                for action in actions {
                    let order = match action {
                        Action::SubmitLimit { id, account, side, price, qty } => Order {
                            id: OrderId(*id),
                            account: AccountId(*account),
                            side: *side,
                            qty: Qty(*qty),
                            order_type: OrderType::Limit { price: Price(*price) },
                        },
                        Action::SubmitMarket { id, account, side, qty } => Order {
                            id: OrderId(*id),
                            account: AccountId(*account),
                            side: *side,
                            qty: Qty(*qty),
                            order_type: OrderType::Market,
                        },
                    };
                    all_fills.extend(book.submit(order).fills);
                }
                (book.best_bid(), book.best_ask(), book.depth_bid(), book.depth_ask(), all_fills)
            };
            prop_assert_eq!(run(&actions), run(&actions));
        }
    }
}

This invariant defines a helper closure run that applies a sequence of actions to a fresh Book and returns a 5-tuple summarizing the end state: (best_bid, best_ask, depth_bid, depth_ask, all_fills_in_order).

Then: prop_assert_eq!(run(&actions), run(&actions)).

Two runs of the same input must produce the same output. If the matching engine has any non-determinism — randomness, HashMap iteration order, threading races — this test will catch it.

Why this is the most important property: a consensus chain relies on every validator computing the same fills from the same inputs. If one validator's matching engine produces different fills than another's, validators can't agree on the block, and the chain forks. Determinism is the load-bearing property — no_crossed_book is about correctness, but determinism is about agreement. A correct-but-nondeterministic engine breaks consensus; a deterministic-but-incorrect engine is at least repairable.

The Action::SubmitLimit { id, account, side, price, qty } destructuring uses *id, *account, etc. because actions is borrowed as &[Action], so each field is a borrowed &u64. The * dereferences to get the value.

Test

cargo test -p openhl-clob

Expected (12 tests):

running 12 tests
test prop_tests::determinism ... ok
test prop_tests::no_crossed_book ... ok
test prop_tests::qty_conservation ... ok
test tests::book_does_not_cross_after_match ... ok
test tests::buy_market_takes_best_ask ... ok
test tests::cancel_removes_resting_order ... ok
test tests::cancel_unknown_returns_false ... ok
test tests::empty_book_has_no_best_prices ... ok
test tests::limit_buy_walks_asks_within_price ... ok
test tests::market_with_insufficient_liquidity_returns_remaining ... ok
test tests::price_time_priority_within_level ... ok
test tests::resting_limit_creates_bid_or_ask ... ok

test result: ok. 12 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

Total runtime: a few seconds. Proptest runs 256 cases per test, each case is a small in-memory matching simulation, so the total cost is well under 10 seconds.

If any prop test fails, you'll see:

proptest: Saving this and future failures in /Users/.../proptest-regressions/...
proptest: If this test was expected to be flaky, ...

Proptest caches the failing input in a file under proptest-regressions/. Subsequent runs will re-test the cached input first, so once you find a bug, fixing it is verified against the same minimal counterexample every time. Add the regressions file to git (it's tiny).

Common errors and fixes:

error: cannot find macro 'proptest' in this scope — your mod prop_tests is missing use proptest::prelude::*;. Re-check Step 2.
error: trait 'Strategy' not satisfied — your generator function's return type isn't impl Strategy<Value = T>. The prop_oneof![Just(...)] returns impl Strategy<Value = T> for the type inside Just; chaining .prop_map(...) may change the value type. Make sure your Strategy<Value = ...> type matches what you actually generate.
prop_assert_eq fails with totals not matching — your total_in accumulator is wrong. Each submit adds the order's qty to total_in, not the fill quantity. Re-check Step 4 — only sum at submit, not at fill.
Determinism fails — you likely introduced a HashMap somewhere, or a time::Instant, or some non-deterministic primitive. Check the recent diff against Lessons 1–7's code; the bug is wherever a non-deterministic primitive was added.

Design reflection

Three load-bearing decisions encoded here:

Proptest is dev-dep, not runtime dep. Property tests run during cargo test, not in production. Putting proptest in [dependencies] would force every consumer of openhl-clob to compile and link proptest. The [dev-dependencies] discipline keeps the production dependency graph clean.
The Action enum is a simplified intermediate. Each variant carries raw u64s, not the OrderId(u64) / AccountId(u64) newtype-wrapped versions. The proptest strategies generate the raw values; the test body wraps them in newtypes before calling submit. This is on purpose — proptest's combinators work most easily with primitive types, and the as u64 ergonomics save us boilerplate. Newtype enforcement happens at the API boundary (calling submit), not inside the test generator.
determinism is the load-bearing property for consensus. A correct-but-non-deterministic matching engine breaks consensus; a deterministic-but-incorrect one is repairable. The test that catches non-determinism is the one that protects the chain's safety. Naming and prioritizing properties by what they protect — not by what they test — is the discipline.

Answer key

cd ~/code/openhl-reference
git checkout 55a9dff
diff -u ~/code/my-openhl/crates/clob/src/book.rs ./crates/clob/src/book.rs
diff -u ~/code/my-openhl/crates/clob/Cargo.toml ./crates/clob/Cargo.toml

After Lesson 8, your book.rs mirrors the reference at 55a9dff (modulo doc comments). Cargo.toml has the [dev-dependencies] proptest line.

Return:

git checkout main

Common questions

Q: Why cases: 256 and not 1024 or 100? A balance. 256 cases × 3 properties × ~10ms per case ≈ 8 seconds total — fast enough to run on every cargo test. 1024 cases would push it to 30+ seconds, becoming a friction in dev iteration. 100 cases would risk missing rare bugs. Pick a case count that's small enough to run cheaply but large enough to catch common bugs.

Q: Why no cancel in the proptest actions? Because cancel actions complicate the determinism + conservation properties. On the conservation side it looks easy — just add + total_canceled_qty to the right-hand side: total_in = 2 * total_filled + total_market_unfilled + resting_qty + total_canceled_qty. The real difficulty is on the generator side: to avoid producing "double-cancel of an already-filled-or-canceled order id" sequences, the strategy needs to statefully track which order IDs are still live — that's exactly the boundary Lesson 7's Test 8 (cancel_unknown_returns_false) covers, but in proptest it becomes the generator's responsibility. Stateful strategies break out of proptest's pure-combinator pattern, so the generator code grows quickly. The simplification "submit-only sequences" makes the 3 invariants tractable. Adding cancel-aware properties is a follow-up — get the foundation clean first, then layer on top. The existing 3 invariants are the highest-value ones to get right first.

Q: What happens when proptest finds a failing input? It enters the shrinking phase. Starting from the failing input, proptest tries to find the smallest subset / smallest values that still fail. For our test case generators (which produce Vec<Action>), shrinking might reduce a 25-action sequence to a 3-action sequence that still reproduces the bug. The minimal sequence is what you debug against — much easier than the original input.

Q: Can I make arb_actions produce only Limit orders? Yes — change arb_action's prop_oneof![3 => limit_action, 1 => market_action] to prop_oneof![1 => limit_action] (or just return limit_action directly without the prop_oneof). For the invariants we have, Market orders are useful (they exercise the discard-remainder path), but if you want to focus testing on Limit-only flows, you can. proptest strategies are composable.

Next lesson (Lesson 9)

The matching engine is fully tested. It's not yet integrated with consensus. Lesson 9 starts Module 4 (Bridge integration): adding Book + pending_fills fields to LiveRethEvmBridge, and a submit_order method that routes orders into the CLOB and accumulates resulting Fills in a buffer. After Lesson 9, the bridge owns a matching engine; Lesson 10 will drain the buffer at build_payload time.

Summary (3 lines)

Three proptests: conservation (Σ fills + remaining == initial) + FIFO (oldest first at same level) + no-crossed-book (best_bid < best_ask).
256 iterations each = 768 total. Shrinking finds minimal counterexamples on failure.
Together cover money leaks + fairness + sane state. Production CI 1000-10000 per invariant. Next: bridge integration.

Lesson 8 — 3 proptest invariants: 768 random scenarios

Question

Principle (minimum model)

Worked example + steps

Lesson 8 — 3 proptest invariants: 768 random scenarios

Goal

Recap

Plan

Walk-through

Step 1: Add proptest to crates/clob/Cargo.toml

Step 2: Set up mod prop_tests with the Action enum

Step 3: Write the strategies

Step 4: The first invariant — qty_conservation

Step 5: The second invariant — no_crossed_book

Step 6: The third invariant — determinism

Test

Design reflection

Answer key

Common questions

Next lesson (Lesson 9)

Summary (3 lines)

Step 1: Add `proptest` to `crates/clob/Cargo.toml`

Step 2: Set up `mod prop_tests` with the `Action` enum

Step 4: The first invariant — `qty_conservation`

Step 5: The second invariant — `no_crossed_book`

Step 6: The third invariant — `determinism`