Lesson 8 — 3 proptest invariants: 768 random scenarios
Question
Three proptest invariants — conservation, FIFO, no-crossed-book — each run on 256 random scenarios = 768 total. Universalises the unit tests to "holds for all valid inputs".
Principle (minimum model)
- Invariant 1: conservation.
Σ fill_sizes + remaining == initial_size. Generate random initial book + random taker; submit; assert. - Invariant 2: FIFO. At each price level, oldest order matches first. Generate orders with explicit sequence numbers; submit; assert oldest filled before newer.
- Invariant 3: no-crossed-book. After every operation, best_bid < best_ask. Generate random sequence of submits + cancels; after each step, assert.
- Proptest input generation.
prop_assume!(size > 0)+vec(any_order(), 1..50)+ price range. Filters invalid inputs. - 256 iterations per invariant. Production CIs run 1000-10000 per invariant.
- Shrinking on failure. When proptest finds a failure, it shrinks to the minimal counterexample. Critical for debugging — shows the smallest input that triggers the bug.
- Why these three. Conservation = no money leaks; FIFO = fair matching; no-crossed-book = sane state. Together they cover the main classes of bug.
Worked example + steps
Lesson 8 — 3 proptest invariants: 768 random scenarios
Goal
Concepts you'll grasp in this lesson:
- Determinism is the load-bearing property for consensus chains — a correct-but-non-deterministic matching engine breaks consensus (validators replay the same actions, see different fills, fail to agree). A deterministic-but-incorrect engine is repairable; a non-deterministic one is not. The
determinisminvariant protects the chain's safety. - Property tests find bugs in scenarios you didn't think of — 9 hand-traced tests cover what you anticipated. 256 cases × 3 properties = 768 random sequences cover the long tail (e.g. "submit 17 limits, then market against an empty side"). Shrinking automatically reduces a failing 25-action sequence to the minimal counterexample.
- Conservation, safety, replayability as the three orthogonal invariants —
qty_conservation(no quantity invented or lost),no_crossed_book(best_bid < best_ask always),determinism(same input → same output). These are the universal CLOB invariants any matching engine must satisfy. proptestbelongs in[dev-dependencies], not[dependencies]— property tests run duringcargo test, never in production. Putting it in[dependencies]would force everyopenhl-clobconsumer to compile proptest.- The
Actionenum is a simplified intermediate for generators — proptest combinators work most easily with primitive types; the strategies emit rawu64s, and the test body wraps them in newtypes before callingsubmit. Newtype discipline holds at the API boundary, not inside the generator.
Verification:
cargo test -p openhl-clob
…passes 12 tests (9 unit + 3 proptest invariants), with each proptest running 256 cases each = 768 random scenarios.
Specific changes:
- One new dev-dep —
proptest = { workspace = true }incrates/clob/Cargo.toml. - A new
#[cfg(test)] mod prop_testsblock at the bottom ofbook.rs, containing:Actionenum — a simplified action representation for property generators.- 3 generator strategies —
arb_side,arb_action,arb_actions— that produce random valid action sequences. - 3
proptest!blocks —qty_conservation,no_crossed_book,determinism.
After Lesson 8, the matching engine has property-level proof that its invariants hold across many random orderings — not just the 9 hand-traced scenarios from Lesson 7.
Recap
After Lesson 7:
- 9 hand-traced unit tests pass.
- Each tests a specific invariant on a specific scenario.
What Lesson 7 doesn't test: random sequences. If a bug exists that only triggers when (e.g.) you submit 17 limits, cancel 3 of them, and then submit a Market, Lesson 7's 9 tests will likely miss it. You'd need to either think of that specific scenario yourself (hard — bugs hide in places you don't think to test) or test many scenarios automatically. Lesson 8 does the latter.
Plan
Three things:
- Add
proptestas a dev-dependency tocrates/clob/Cargo.toml.proptestis already a workspace dep (used byconsensusfor proposer-election tests in the existing rethlab L1 Architect tier); we just need to declare we use it. - Add a new
mod prop_testsblock below the existingmod testsinbook.rs. The new module contains:Actionenum (subset of operations the property tests will exercise — for now, just SubmitLimit + SubmitMarket; cancel will arrive in a follow-up).- Generator strategies for random
Actionsequences. - 3
proptest!blocks — one per invariant.
- Run
cargo test -p openhl-clob— 12 tests pass (9 unit + 3 props).
The 3 invariants:
qty_conservation: total quantity entering the book equals total filled + total resting (the "money math is conserved" property).no_crossed_book:best_bid < best_askalways holds — the safety property test 9 hand-traced, now random-tested.determinism: the same action sequence produces the same fills + same book state, every time. This is the replayability property the chain's safety relies on.
If proptest finds a counterexample for any of them, it automatically shrinks the failing input to the smallest sequence that still fails. That's the load-bearing benefit of properties over examples.
(Answer: qty_conservation would catch it indirectly — over enough cases, the wrong walk order produces wrong totals when the matched price differs from what the hand math expects. no_crossed_book would catch it directly: a buy that doesn't take the cheapest ask first leaves a cheaper ask on the book, and the next bid that's above that ask creates a cross. determinism would catch it on every run because each run picks a different "random" walk order, so two runs of the same input produce different fills. determinism is the load-bearing property for consensus chains — without it, validators won't agree.)
Walk-through
Step 1: Add proptest to crates/clob/Cargo.toml
Open crates/clob/Cargo.toml. The current state is:
[package]
name = "openhl-clob"
# ... shared package fields ...
[dependencies]
[lints]
workspace = true
Add a [dev-dependencies] section:
[package]
name = "openhl-clob"
# ... shared package fields ...
[dependencies]
[dev-dependencies]
proptest = { workspace = true }
[lints]
workspace = true
proptest is already declared in the workspace Cargo.toml (you don't need to add it there — it's been a workspace dep since L1 Architect's first courses). The [dev-dependencies] block makes it available only when building tests, not when building production code.
Step 2: Set up mod prop_tests with the Action enum
In crates/clob/src/book.rs, after the existing mod tests { ... } block (still at module scope), add:
#[cfg(test)]
mod prop_tests {
use super::*;
use proptest::prelude::*;
/// A simplified action enum for property-based testing.
#[derive(Clone, Debug)]
enum Action {
SubmitLimit {
id: u64,
account: u64,
side: Side,
price: u64,
qty: u64,
},
SubmitMarket {
id: u64,
account: u64,
side: Side,
qty: u64,
},
}
The Action enum is a simplified representation of what proptest will randomly generate. Each variant carries the raw u64s that a real call to Book::submit would need (wrapped in newtypes later). Two variants for now — Limit and Market submits. Cancel actions aren't here yet; we add them in a follow-up stage of openhl.
Why model actions as an enum? Because property tests need to generate sequences of actions, and each action can be one of N kinds. The enum captures that variability. proptest's strategy combinators (prop_oneof!, prop::collection::vec, etc.) work well with enums.
Step 3: Write the strategies
Continue inside mod prop_tests:
fn arb_side() -> impl Strategy<Value = Side> {
prop_oneof![Just(Side::Buy), Just(Side::Sell)]
}
fn arb_action(id: u64) -> impl Strategy<Value = Action> {
let limit_action = (1u64..=200, 1u64..=20, arb_side(), 50u64..=150)
.prop_map(move |(account, qty, side, price)| Action::SubmitLimit {
id,
account,
side,
price,
qty,
});
let market_action = (1u64..=200, 1u64..=20, arb_side()).prop_map(
move |(account, qty, side)| Action::SubmitMarket {
id,
account,
side,
qty,
},
);
prop_oneof![3 => limit_action, 1 => market_action]
}
fn arb_actions() -> impl Strategy<Value = Vec<Action>> {
prop::collection::vec(0u64..1000, 1..30)
.prop_flat_map(|ids| {
ids.into_iter()
.enumerate()
.map(|(i, _)| arb_action(i as u64 + 1))
.collect::<Vec<_>>()
})
}
Three strategies, building up:
arb_side()— uniformly picks Buy or Sell.prop_oneof![Just(...), Just(...)]is proptest's "one of these literals" combinator.arb_action(id)— generates a randomActionwith a fixedid. The Limit branch generates(account, qty, side, price)in ranges; the Market branch generates(account, qty, side). Weights:3 => limit_action, 1 => market_action— Limit actions happen 3× as often as Market, reflecting realistic order-book usage.arb_actions()— generates a randomVec<Action>of length 1..30. The.prop_flat_mappattern is a bit unusual: it first callsprop::collection::vec(0u64..1000, 1..30)to produce a vec of u64s, of length 1..30, with values in0..1000. The u64 values themselves are dummies — the.enumerate()inside.prop_flat_mapoverwrites them with indices immediately. The0u64..1000range is there only to give the vector a shape; the actual values are not used. Then each position is mapped to anarb_action(i+1)so order IDs are assigned in increasing order. The trick is thatarb_actionsproduces sequences with strictly-increasing order IDs (avoiding collisions in the book).- Here
enumerate()yieldsi: usize, soi as u64 + 1is an explicit type bridge intoarb_action'su64ID parameter. Conceptually, it maps generator position to deterministicOrderId(1..)numbering.
Why use ranges (1..=200 for account, 50..=150 for price)? To bias proptest toward generating plausible scenarios. With 0..=u64::MAX ranges, proptest would mostly generate extreme outliers (account_id = 18_446_744_073_709_551_614). Realistic ranges produce scenarios that look like real trading: accounts 1-200, prices 50-150, quantities 1-20. The matching engine's bugs are most likely to hide in normal-looking sequences.
Step 4: The first invariant — qty_conservation
Append below the strategies:
proptest! {
#![proptest_config(ProptestConfig {
cases: 256,
..ProptestConfig::default()
})]
/// Quantity is conserved: every fill_qty came from a resting maker;
/// total qty in/out balances.
#[test]
fn qty_conservation(actions in arb_actions()) {
let mut book = Book::new();
let mut total_in = 0u64;
let mut total_filled = 0u64;
let mut total_market_unfilled = 0u64;
for action in actions {
match action {
Action::SubmitLimit { id, account, side, price, qty } => {
total_in += qty;
let r = book.submit(Order {
id: OrderId(id),
account: AccountId(account),
side,
qty: Qty(qty),
order_type: OrderType::Limit { price: Price(price) },
});
total_filled += r.total_filled().0;
}
Action::SubmitMarket { id, account, side, qty } => {
total_in += qty;
let r = book.submit(Order {
id: OrderId(id),
account: AccountId(account),
side,
qty: Qty(qty),
order_type: OrderType::Market,
});
total_filled += r.total_filled().0;
total_market_unfilled += r.remaining_qty.0;
}
}
}
// Resting quantity = total_in - 2*total_filled - total_market_unfilled.
// (Each fill consumes one unit from a maker AND one unit from a taker,
// so total_filled counts qty, but the qty appeared in total_in twice
// — once when the maker was submitted, once when the taker arrived.)
let resting: u64 = book.bids.values()
.flat_map(|q| q.iter())
.chain(book.asks.values().flat_map(|q| q.iter()))
.map(|o| o.qty.0)
.sum();
prop_assert_eq!(total_in, 2 * total_filled + total_market_unfilled + resting);
}
This is the "quantity is conserved" invariant. The body has three counters:
total_in: sum of allqtyvalues from submitted orders.total_filled: sum offill_qtyacross allFills produced.total_market_unfilled: sum ofremaining_qtyfrom Market orders (the leftover discarded).
The invariant is a conservation law:
total_in = 2 × total_filled + total_market_unfilled + resting_qty
Why 2 ×? Because a fill consumes 1 unit from the maker AND 1 unit from the taker, so 1 unit of fill_qty appears in total_in twice — once when the maker was originally submitted, once when the taker arrived. The math:
| Action | total_in | What's left at the end |
|---|---|---|
| Submit Limit 10 units that fully rest | +10 | 10 units resting |
| Submit Market 10 units, no liquidity | +10 | 10 units discarded (no fill) |
| Submit Limit 10 units that match a 5-unit ask | +10 | 5 units filled (one from each), 5 units left over to rest |
If 5 units fill, that means: maker offered 5 (already in total_in), taker took 5 (also in total_in). The 5 filled units appear in total_in as 10 — once from each side. That's why 2 * total_filled.
Timeline view of the third row:
[t=0] Sell Limit(qty=5) arrives → fully rests
total_in += 5 → total_in = 5
(no fill; rests on the book)
Check: total_in(5) = 2×total_filled(0) + market_unfilled(0) + resting(5) ✓
[t=1] Buy Limit(qty=10) arrives → 5 units cross the maker, 5 units rest
total_in += 10 → total_in = 15
total_filled += 5 → total_filled = 5
resting = 5 (taker remainder; the maker's 5 units are consumed)
Check: total_in(15) = 2×total_filled(10) + market_unfilled(0) + resting(5) ✓
total_in is just a running counter — "sum the qty of every submitted order in arrival order." The seemingly mysterious 2 * encodes a real accounting fact: one fill contributes to total_in twice — once when the maker was originally submitted, once when the taker arrived.
The #![proptest_config(ProptestConfig { cases: 256, .. })] line at the top of the proptest! block sets each test to run 256 times. With 3 invariants × 256 cases = 768 random scenarios.
The prop_assert_eq! (not assert_eq!) is important — proptest needs to distinguish "test failed" from "test panicked due to a system error." prop_assert_eq! reports the failure to proptest's shrinking machinery, which then tries to find a minimal counterexample.
Step 5: The second invariant — no_crossed_book
Below the first proptest, still inside the same proptest! { ... } block:
/// Book invariant: best bid is strictly less than best ask. The book
/// should never be crossed after submit() completes.
#[test]
fn no_crossed_book(actions in arb_actions()) {
let mut book = Book::new();
for action in actions {
match action {
Action::SubmitLimit { id, account, side, price, qty } => {
book.submit(Order {
id: OrderId(id),
account: AccountId(account),
side,
qty: Qty(qty),
order_type: OrderType::Limit { price: Price(price) },
});
}
Action::SubmitMarket { id, account, side, qty } => {
book.submit(Order {
id: OrderId(id),
account: AccountId(account),
side,
qty: Qty(qty),
order_type: OrderType::Market,
});
}
}
if let (Some(b), Some(a)) = (book.best_bid(), book.best_ask()) {
prop_assert!(b < a, "book crossed: bid={} ask={}", b.0, a.0);
}
}
}
The body:
- For each action, submit the order.
- After each submission, check that
book.best_bid() < book.best_ask()(if both exist). - If at any point
best_bid >= best_ask, the test fails — the book is crossed.
This is the same invariant as Lesson 7's book_does_not_cross_after_match, but tested against random sequences. Lesson 7 proves the invariant holds on one scenario; Lesson 8 proves it holds on 256 randomized scenarios.
The prop_assert!(b < a, "...") macro includes a format string — when proptest fails, the error message shows the actual bid/ask values that crossed. This is more informative than the plain assert!(b < a).
Step 6: The third invariant — determinism
The most important one:
/// Determinism: applying the same action sequence produces the same
/// book + fill history every time. (The "replayability" property
/// from the architecture doc — required for consensus determinism.)
#[test]
fn determinism(actions in arb_actions()) {
let run = |actions: &[Action]| {
let mut book = Book::new();
let mut all_fills: Vec<Fill> = Vec::new();
for action in actions {
let order = match action {
Action::SubmitLimit { id, account, side, price, qty } => Order {
id: OrderId(*id),
account: AccountId(*account),
side: *side,
qty: Qty(*qty),
order_type: OrderType::Limit { price: Price(*price) },
},
Action::SubmitMarket { id, account, side, qty } => Order {
id: OrderId(*id),
account: AccountId(*account),
side: *side,
qty: Qty(*qty),
order_type: OrderType::Market,
},
};
all_fills.extend(book.submit(order).fills);
}
(book.best_bid(), book.best_ask(), book.depth_bid(), book.depth_ask(), all_fills)
};
prop_assert_eq!(run(&actions), run(&actions));
}
}
}
This invariant defines a helper closure run that applies a sequence of actions to a fresh Book and returns a 5-tuple summarizing the end state: (best_bid, best_ask, depth_bid, depth_ask, all_fills_in_order).
Then: prop_assert_eq!(run(&actions), run(&actions)).
Two runs of the same input must produce the same output. If the matching engine has any non-determinism — randomness, HashMap iteration order, threading races — this test will catch it.
Why this is the most important property: a consensus chain relies on every validator computing the same fills from the same inputs. If one validator's matching engine produces different fills than another's, validators can't agree on the block, and the chain forks. Determinism is the load-bearing property — no_crossed_book is about correctness, but determinism is about agreement. A correct-but-nondeterministic engine breaks consensus; a deterministic-but-incorrect engine is at least repairable.
The Action::SubmitLimit { id, account, side, price, qty } destructuring uses *id, *account, etc. because actions is borrowed as &[Action], so each field is a borrowed &u64. The * dereferences to get the value.
Test
cargo test -p openhl-clob
Expected (12 tests):
running 12 tests
test prop_tests::determinism ... ok
test prop_tests::no_crossed_book ... ok
test prop_tests::qty_conservation ... ok
test tests::book_does_not_cross_after_match ... ok
test tests::buy_market_takes_best_ask ... ok
test tests::cancel_removes_resting_order ... ok
test tests::cancel_unknown_returns_false ... ok
test tests::empty_book_has_no_best_prices ... ok
test tests::limit_buy_walks_asks_within_price ... ok
test tests::market_with_insufficient_liquidity_returns_remaining ... ok
test tests::price_time_priority_within_level ... ok
test tests::resting_limit_creates_bid_or_ask ... ok
test result: ok. 12 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
Total runtime: a few seconds. Proptest runs 256 cases per test, each case is a small in-memory matching simulation, so the total cost is well under 10 seconds.
If any prop test fails, you'll see:
proptest: Saving this and future failures in /Users/.../proptest-regressions/...
proptest: If this test was expected to be flaky, ...
Proptest caches the failing input in a file under proptest-regressions/. Subsequent runs will re-test the cached input first, so once you find a bug, fixing it is verified against the same minimal counterexample every time. Add the regressions file to git (it's tiny).
Common errors and fixes:
error: cannot find macro 'proptest' in this scope— yourmod prop_testsis missinguse proptest::prelude::*;. Re-check Step 2.error: trait 'Strategy' not satisfied— your generator function's return type isn'timpl Strategy<Value = T>. Theprop_oneof![Just(...)]returnsimpl Strategy<Value = T>for the type insideJust; chaining.prop_map(...)may change the value type. Make sure yourStrategy<Value = ...>type matches what you actually generate.prop_assert_eqfails with totals not matching — yourtotal_inaccumulator is wrong. Each submit adds the order'sqtytototal_in, not the fill quantity. Re-check Step 4 — only sum at submit, not at fill.- Determinism fails — you likely introduced a HashMap somewhere, or a
time::Instant, or some non-deterministic primitive. Check the recent diff against Lessons 1–7's code; the bug is wherever a non-deterministic primitive was added.
Design reflection
Three load-bearing decisions encoded here:
-
Proptest is dev-dep, not runtime dep. Property tests run during
cargo test, not in production. Puttingproptestin[dependencies]would force every consumer ofopenhl-clobto compile and link proptest. The[dev-dependencies]discipline keeps the production dependency graph clean. -
The Action enum is a simplified intermediate. Each variant carries raw
u64s, not theOrderId(u64)/AccountId(u64)newtype-wrapped versions. The proptest strategies generate the raw values; the test body wraps them in newtypes before callingsubmit. This is on purpose — proptest's combinators work most easily with primitive types, and theas u64ergonomics save us boilerplate. Newtype enforcement happens at the API boundary (callingsubmit), not inside the test generator. -
determinismis the load-bearing property for consensus. A correct-but-non-deterministic matching engine breaks consensus; a deterministic-but-incorrect one is repairable. The test that catches non-determinism is the one that protects the chain's safety. Naming and prioritizing properties by what they protect — not by what they test — is the discipline.
Answer key
cd ~/code/openhl-reference
git checkout 55a9dff
diff -u ~/code/my-openhl/crates/clob/src/book.rs ./crates/clob/src/book.rs
diff -u ~/code/my-openhl/crates/clob/Cargo.toml ./crates/clob/Cargo.toml
After Lesson 8, your book.rs mirrors the reference at 55a9dff (modulo doc comments). Cargo.toml has the [dev-dependencies] proptest line.
Return:
git checkout main
Common questions
Q: Why cases: 256 and not 1024 or 100?
A balance. 256 cases × 3 properties × ~10ms per case ≈ 8 seconds total — fast enough to run on every cargo test. 1024 cases would push it to 30+ seconds, becoming a friction in dev iteration. 100 cases would risk missing rare bugs. Pick a case count that's small enough to run cheaply but large enough to catch common bugs.
Q: Why no cancel in the proptest actions?
Because cancel actions complicate the determinism + conservation properties. On the conservation side it looks easy — just add + total_canceled_qty to the right-hand side: total_in = 2 * total_filled + total_market_unfilled + resting_qty + total_canceled_qty. The real difficulty is on the generator side: to avoid producing "double-cancel of an already-filled-or-canceled order id" sequences, the strategy needs to statefully track which order IDs are still live — that's exactly the boundary Lesson 7's Test 8 (cancel_unknown_returns_false) covers, but in proptest it becomes the generator's responsibility. Stateful strategies break out of proptest's pure-combinator pattern, so the generator code grows quickly. The simplification "submit-only sequences" makes the 3 invariants tractable. Adding cancel-aware properties is a follow-up — get the foundation clean first, then layer on top. The existing 3 invariants are the highest-value ones to get right first.
Q: What happens when proptest finds a failing input?
It enters the shrinking phase. Starting from the failing input, proptest tries to find the smallest subset / smallest values that still fail. For our test case generators (which produce Vec<Action>), shrinking might reduce a 25-action sequence to a 3-action sequence that still reproduces the bug. The minimal sequence is what you debug against — much easier than the original input.
Q: Can I make arb_actions produce only Limit orders?
Yes — change arb_action's prop_oneof![3 => limit_action, 1 => market_action] to prop_oneof![1 => limit_action] (or just return limit_action directly without the prop_oneof). For the invariants we have, Market orders are useful (they exercise the discard-remainder path), but if you want to focus testing on Limit-only flows, you can. proptest strategies are composable.
Next lesson (Lesson 9)
The matching engine is fully tested. It's not yet integrated with consensus. Lesson 9 starts Module 4 (Bridge integration): adding Book + pending_fills fields to LiveRethEvmBridge, and a submit_order method that routes orders into the CLOB and accumulates resulting Fills in a buffer. After Lesson 9, the bridge owns a matching engine; Lesson 10 will drain the buffer at build_payload time.
Summary (3 lines)
- Three proptests: conservation (Σ fills + remaining == initial) + FIFO (oldest first at same level) + no-crossed-book (best_bid < best_ask).
- 256 iterations each = 768 total. Shrinking finds minimal counterexamples on failure.
- Together cover money leaks + fairness + sane state. Production CI 1000-10000 per invariant. Next: bridge integration.