Lesson 2 — forge fuzz — Solidity's proptest!
Question
forge fuzz is proptest! with a different syntax. Same theorem-first mindset — assertions must hold over the entire valid input space, not over hand-picked examples. Four disciplines from proptest! port directly: vm.assume, shrinking, corpus persistence, and profile-per-environment iteration counts.
Principle (minimum model)
- vm.assume(cond) is the Solidity prop_assume!. It filters out inputs that violate preconditions before the assertion runs, so the test only exercises the well-defined regime of the property.
- Default 256 iterations is a floor, not a goal. foundry.toml's fuzz.runs = 256 is the out-of-the-box default. Production CIs run 10 K – 100 K. Nightly fuzzers push to 1 M. [profile.ci.fuzz] enables profile-per-environment counts.
- Shrinking. When the fuzzer finds a counterexample (e.g. x = 0xa3b8...4d2f), it doesn't stop: it binary-search-shrinks to the smallest input that still triggers the failure (often a single-digit number). Heuristic, not exhaustive. Per-parameter.
- Corpus persistence. cache/fuzz/ stores failing inputs; the next run replays them first. Equivalent to Rust's proptest-regressions/.
- The vm.assume trap. A pinpoint filter like vm.assume(x == 42) exhausts max_test_rejects almost instantly (random sampling from 2²⁵⁶ values essentially never lands on 42) → TooManyAssumptions. vm.assume is only for excluding boundary conditions (< 1 %), not for selecting specific values; that's a unit test.
- vm.expectRevert and vm.assume play opposite roles. vm.expectRevert = negative-path test where the revert IS the success criterion. vm.assume = positive-path guard that excludes revert-causing inputs so the assertion runs on a well-defined domain. Same physical event (the contract reverts on this input); opposite test discipline.
- Per-iteration isolation. setUp() runs before every iteration → a failing iteration cannot poison the next. Reproducibility depends on this.
- Gas stats (μ / ~) come from passing iterations. Failing iterations don't contribute. So fuzz gas is typical-case, not worst-case. For worst-case, write a specific unit test.
Worked example + steps
Goal
Concepts you'll grasp in this lesson:
- forge fuzz is proptest! with different syntax. Same theorem-first mindset: write an assertion that should hold for all valid inputs (not just hand-picked examples), and let the runner search the input space for a counterexample. Same shrinker that reduces a 32-byte failing input to the minimal uint256 that triggers the bug. Same corpus persistence that replays known counterexamples instantly on the next run. If you wrote proptest! { #[test] fn balance_never_negative(...) { ... } } in openhl-liquidation Lesson 9, you already know the shape of a testFuzz_* function; Solidity just wraps it in contract syntax.
- vm.assume(condition) is the Solidity equivalent of prop_assume!. Both filter out inputs that violate preconditions before the assertion runs, so the test only exercises the regime where the property is well-defined. The pattern matches Liquidation Lesson 9's prop_assume!(entry * size > collateral) rule: when an input would push the test out of its meaningful domain, discard it. The fuzzer just generates another input and tries again.
- Default 256 iterations is a minimum, not a goal. foundry.toml's fuzz.runs = 256 is the out-of-the-box default: enough to catch the obvious bugs in seconds, not enough to prove a property. Production codebases bump it to 10_000 or 100_000 for CI and reserve the higher counts for nightly runs. It is the same trade-off Rust's proptest makes with its default of 256 cases.
- Shrinking is the difference between "the test failed somewhere" and "the test failed at exactly this input." When forge fuzz finds a counterexample (say, x = 0xa3b8...4d2f, a random 32-byte value), it doesn't just report the failure. It runs a binary-search-style reduction to find the smallest x that reproduces the failure. The output you see is the minimal counterexample, often a single-digit number, which makes debugging an order of magnitude faster than "well, some input broke it."
Verification:
forge test
…passes 4 tests (3 from Lesson 1 + 1 new fuzz test added in this lesson). All four are green, with both fuzz tests running the 1000 iterations configured in Step 1; you'll also see what happens at 100_000.
Specific changes:
- foundry.toml: adds a [fuzz] section with runs = 1000 for the default profile and a [profile.ci.fuzz] section with runs = 100000 for the heavy run. Demonstrates how to tune iteration counts without hard-coding them in each test.
- test/Counter.t.sol: appends one new fuzz test, testFuzz_IncrementPreservesPlusOne(uint256 x). Uses vm.assume(x < type(uint256).max) to filter out the overflow case before asserting the property holds.
Total: ~15 lines of new code. Lesson 2 is about what fuzzing is and why the shrinker matters, not about clever fuzz coverage.
Recap
After Lesson 1:
- forge test runs 3 tests cleanly (the 2 forge init defaults + your new vm.expectRevert test).
- You've internalized the project shape, the setUp per-test isolation pattern, and the -v through -vvvvv verbosity ladder.
- You've seen the testFuzz_SetNumber(uint256 x) test pass with 256 runs, but it was unexplained. Lesson 2 explains what it was doing.
Lesson 2 turns that mysterious 256-run line into the central tool of property-based testing.
Plan
Three edits:
- Open foundry.toml and add a [fuzz] section to tune the default iteration count. Add a [profile.ci.fuzz] section with a higher count for heavy runs. (No new contract code yet, just configuration.)
- Read testFuzz_SetNumber(uint256 x) from Lesson 1's Counter.t.sol. Understand why Foundry treats it as a fuzz test, what the runner does each iteration, and how the result line (runs: 256, μ: 31000, ~: 31161) is generated.
- Append one new fuzz test: testFuzz_IncrementPreservesPlusOne(uint256 x). Set counter to x, call counter.increment(), assert counter.number() == x + 1. Use vm.assume(x < type(uint256).max) to filter out the overflow case. Run with forge test -vvv.
(Answer: Most production CI runs at 10_000 or 100_000; nightly fuzzers push to 1_000_000. The trade-off: each iteration runs the full test (setUp → call → assertion → state cleanup). At 256 iterations a single fuzz test takes ~50ms; at 100_000 it takes ~20 seconds; at 1_000_000 it takes ~200 seconds. Past 100_000 the diminishing returns kick in unless your test is exercising a vast input space — most uint256 fuzz tests have de facto small interesting regions, and 100_000 hits them already. Run high counts on dedicated nightly CI, default counts on PR CI, low counts during local development.)
What forge fuzz actually does
flowchart TD
A[1. Generate random uint256] --> B[2. Run setUp<br/>fresh Counter, number = 0]
B --> C[3. Call testFuzz_* x = generated]
C --> D{4. vm.assume cond?}
D -->|false: discard iteration| A
D -->|true| E[5. Run assertion<br/>assertEq / assertTrue]
E -->|PASS: next iteration| A
E -->|FAIL: trigger shrinker| F[find minimal counterexample]
A -.->|max_test_rejects exceeded| H[TooManyAssumptions error exit]
A -.->|after fuzz.runs successes| G[report gas stats μ ~]
Three things to notice about the loop:
- setUp() runs every iteration. This is per-iteration state isolation: the same discipline as per-test isolation in Lesson 1, just at a finer grain. A failing iteration cannot poison the next iteration; each run is fresh. Per-iteration isolation is what makes fuzz failures reproducible.
- vm.assume(cond) inside a fuzz test silently discards the iteration if the condition is false. It doesn't fail the test and doesn't count as a pass; the runner just generates a new input. This is the input-filtering mechanism. Use vm.assume for preconditions; use vm.expectRevert for negative-path tests. They sound similar; they do opposite things.
- Gas statistics (μ, the mean, and ~, the median) come from the iterations that passed. Failing iterations don't contribute. So a fuzz test that mostly passes but occasionally hits an expensive edge case still reports a low μ because the cheap iterations dominate. Don't read fuzz gas numbers as worst-case; they're typical-case. For worst-case gas, use unit tests on the specific high-gas inputs.
When the shrinker kicks in
flowchart TD
A[Initial failing input<br/>x = 0xa3b8_f4c2_... huge number] --> B{Try halving<br/>x / 2}
B -->|still fails| B
B -->|passes| C[Roll back to last failure]
C --> D{Try small mutations<br/>x ± 1, x ± 2, ...}
D -->|smaller failure found| D
D -->|shrink exhausted| E[Final report<br/>counterexample args=5<br/>minimal x that reproduces bug]
Two things to notice about shrinking:
- The shrinker is not exhaustive. It uses heuristics (halving, small-step mutations, bit-flipping) to find a small failure, not the absolutely-smallest one. In practice this is fine: a counterexample of 5 debugs the same way as the absolute-minimum 3. Heuristic shrinking is good enough; exhaustive shrinking is impractical for 32-byte input spaces.
- Shrinking is per-parameter. A fuzz test taking (uint256 a, uint256 b) shrinks each parameter independently. Foundry doesn't try cross-products such as (a, b/2) followed by (a/2, b); it shrinks one parameter at a time. Multi-parameter shrinking is local, not global; the minimal counterexample you see is locally minimal per axis.
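The halve-then-step idea from the diagram can be modeled in a few lines. This is a toy sketch under stated assumptions, not Foundry's actual shrinker: ShrinkSketch, shrink, failsAbove41, demo, and the maxSteps budget are all hypothetical names introduced here for illustration.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.13;

/// Toy model of heuristic shrinking. Illustrative only; NOT Foundry's shrinker.
library ShrinkSketch {
    /// Reduce a failing input `x` toward a smaller input that still fails.
    /// `fails` is a hypothetical predicate: true means "the property breaks here".
    function shrink(
        uint256 x,
        function(uint256) pure returns (bool) fails,
        uint256 maxSteps
    ) internal pure returns (uint256) {
        // Phase 1: keep halving while the halved value still fails.
        while (x > 0 && fails(x / 2)) {
            x /= 2;
        }
        // Phase 2: step down by one while the smaller value still fails,
        // bounded by maxSteps so the search stays heuristic, not exhaustive.
        for (uint256 i = 0; i < maxSteps && x > 0 && fails(x - 1); i++) {
            x -= 1;
        }
        return x;
    }

    /// Example predicate: the "bug" triggers for every value above 41.
    function failsAbove41(uint256 v) internal pure returns (bool) {
        return v > 41;
    }

    /// Shrinks the failing input 1000 under the predicate above.
    /// Halving stops at 62 (31 passes); stepping then walks down to 42.
    function demo() internal pure returns (uint256) {
        return shrink(1000, failsAbove41, 64);
    }
}
```

With a predicate that fails for every input, the same loop halves all the way down to 0, which is the shape of report a broken assertion tends to produce. The step budget is why the result is only locally minimal: if it runs out, the sketch returns a small failing value rather than the smallest one.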
Walk-through
Step 1: Tune foundry.toml for fuzz iteration counts
Open foundry.toml. It should look like this after forge init:
[profile.default]
src = "src"
out = "out"
libs = ["lib"]
# See more config options https://github.com/foundry-rs/foundry/blob/master/crates/config/README.md#all-options
Append a [fuzz] section to the default profile and a heavier [profile.ci]:
[profile.default]
src = "src"
out = "out"
libs = ["lib"]
[fuzz]
runs = 1000
max_test_rejects = 65536
[profile.ci.fuzz]
runs = 100000
Three things to notice:
- runs = 1000 is the new default, roughly 4× the out-of-the-box 256. Tight enough to keep local development feedback under a second; loose enough to catch obvious bugs the default would miss. Bump from 256 to 1000 as soon as you write your second fuzz test; the cost is sub-second.
- max_test_rejects = 65536: the maximum number of vm.assume rejections before the test gives up and reports a failure. The default is 65536; you'll usually never hit it. If you do, your vm.assume predicate is too restrictive and the fuzzer can't find inputs that satisfy it. A max_test_rejects failure is a signal that your precondition is wrong, not that the fuzzer is broken.
- [profile.ci.fuzz] runs = 100000: when CI runs FOUNDRY_PROFILE=ci forge test, this 100K-iteration value overrides the default. Production codebases such as Uniswap, Compound, and Aave commonly use this profile-per-environment pattern. Profiles let you tune iteration counts per environment without hard-coding.
Run forge test to confirm the config didn't break anything:
forge test
Expected output now shows (runs: 1000, ...) for the existing fuzz test:
[PASS] testFuzz_SetNumber(uint256) (runs: 1000, μ: 31000, ~: 31161)
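Profile-level settings apply to every fuzz test. When a single test needs a different budget, Foundry's inline forge-config comments can override the count for that test alone. The sketch below assumes a Foundry version with inline configuration support and the ci profile defined in foundry.toml above; the contract name, test name, and the specific run counts are hypothetical.

```solidity
// SPDX-License-Identifier: UNLICENSED
pragma solidity ^0.8.13;

import {Test} from "forge-std/Test.sol";
import {Counter} from "../src/Counter.sol";

// Hypothetical sketch file; not one of the lesson's four tests.
contract InlineFuzzConfigTest is Test {
    Counter public counter;

    function setUp() public {
        counter = new Counter();
    }

    // Per-test override: one line per profile, written as a NatSpec comment
    // directly above the test. The values here are illustrative.
    /// forge-config: default.fuzz.runs = 5000
    /// forge-config: ci.fuzz.runs = 200000
    function testFuzz_SetNumberHeavy(uint256 x) public {
        counter.setNumber(x);
        assertEq(counter.number(), x);
    }
}
```

The design choice is the same as with profiles: the iteration count lives in configuration, not in the test body, so the property itself stays unchanged across environments.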
Step 2: Read testFuzz_SetNumber from Lesson 1
The test from forge init (which you already have):
function testFuzz_SetNumber(uint256 x) public {
    counter.setNumber(x);
    assertEq(counter.number(), x);
}
Four things to notice:
- Function name starts with testFuzz_. Foundry recognizes any function whose name starts with test AND takes parameters as a fuzz test. The testFuzz_ prefix is convention (not strict syntax); the parameter is what triggers fuzzing. Convention + parameter signature = fuzz test.
- uint256 x is the fuzz input. Foundry generates a random uint256 for each iteration. Multi-parameter signatures (e.g., function testFuzz_Op(uint256 a, address b)) get independently-fuzzed values for each. Each fuzz parameter is independently sampled.
- The assertion assertEq(counter.number(), x) is the property. Read it as: "for all uint256 values x, after setNumber(x), the counter holds x." That's a statement of program correctness, not a single example. A fuzz assertion is a universally-quantified property; a unit-test assertion is one example.
- There's no vm.assume because there's no precondition. Every uint256 value is valid input to setNumber. When every input is valid, you don't need to filter; just let the fuzzer iterate. vm.assume is for restricting the regime; omit it when the property holds universally.
This particular test is trivially true — setNumber just stores the value. The property is "the storage write actually stored what we passed in." It's a property worth proving (a future refactor that masked some bits in the setter would fail this fuzz test), but it's not an interesting demonstration of fuzzing's power. Our new test in Step 3 is.
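The point about independently sampled parameters can be sketched with a two-parameter property. This is a minimal example assuming the forge init Counter (setNumber and number only); the contract name and test name are hypothetical.

```solidity
// SPDX-License-Identifier: UNLICENSED
pragma solidity ^0.8.13;

import {Test} from "forge-std/Test.sol";
import {Counter} from "../src/Counter.sol";

// Hypothetical sketch file; not one of the lesson's four tests.
contract TwoParamFuzzTest is Test {
    Counter public counter;

    function setUp() public {
        counter = new Counter();
    }

    // Two parameters means Foundry draws a fresh (a, b) pair each iteration.
    // Property: for all a and b, the last write wins.
    function testFuzz_LastWriteWins(uint256 a, uint256 b) public {
        counter.setNumber(a);
        counter.setNumber(b);
        assertEq(counter.number(), b);
    }
}
```

No vm.assume is needed here either: every (a, b) pair is a valid input, so the property is asserted over the whole space.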
Step 3: Add testFuzz_IncrementPreservesPlusOne
Append to test/Counter.t.sol:
function testFuzz_IncrementPreservesPlusOne(uint256 x) public {
    // Precondition: x must not be at the type ceiling, otherwise
    // increment() would overflow and Solidity 0.8 would revert,
    // taking the assertion with it. vm.assume filters these inputs
    // before the assertion runs; same role as openhl-liquidation
    // Lesson 9's prop_assume!(entry * size > collateral).
    vm.assume(x < type(uint256).max);
    counter.setNumber(x);
    counter.increment();
    assertEq(counter.number(), x + 1);
}
Seven things to notice:
- vm.assume(x < type(uint256).max) filters the one input the property doesn't hold for: the maximum value, where x + 1 would overflow. Without this filter, the test would correctly fail on that single input. With the filter, the test proves the property for the meaningful input range. vm.assume defines the regime where the property is asserted.
- This is the opposite role from Lesson 1's vm.expectRevert. vm.expectRevert is a negative-path test that expects the revert to happen and treats it as success; vm.assume is a positive-path test that excludes inputs that would revert, so the property assertion can run on the well-defined domain. Same physical phenomenon (the contract would revert at this input), opposite test-discipline intent.
- The comment cross-references openhl-liquidation Lesson 9's prop_assume!: same role, same pattern, different syntax. Readers who came through that course recognize the discipline. Cross-language pattern recognition is the load-bearing pedagogical move of this whole course.
- The property counter.number() == x + 1 is the conservation law. Before increment: x. After increment: x + 1. The difference is exactly 1, and it holds for all valid x. Same shape as the Lesson 9 proptest withdraw_amount_plus_unfilled_equals_shortfall. Fuzz tests express conservation laws; unit tests express specific cases.
- x + 1 happens inside the assertion, after vm.assume rejected type(uint256).max. So the +1 arithmetic is always safe and never overflows. The vm.assume is what protects this assertion from misfire. Preconditions guard arithmetic; preconditions are part of the property.
- counter.setNumber(x) mutates state before the assertion. Each fuzz iteration is fresh (the per-iteration setUp from the loop diagram above), so the mutation only affects this iteration's contract instance. State setup + property assertion = one iteration; isolation prevents leak.
- No expectRevert. This is a positive-path fuzz test: we're not testing the overflow case (that was Lesson 1's job). We're testing that when overflow doesn't happen, the conservation law holds. One test per property; one property per test.
Run:
forge test -vvv
Expected output:
[PASS] testFuzz_IncrementPreservesPlusOne(uint256) (runs: 1000, μ: 36000, ~: 36000)
[PASS] testFuzz_SetNumber(uint256) (runs: 1000, μ: 31000, ~: 31161)
[PASS] test_Increment() (gas: 31303)
[PASS] test_RevertWhen_DecrementBelowZero() (gas: 8957)
Suite result: ok. 4 passed; 0 failed; 0 skipped
Four tests, all green, with both fuzz tests at 1000 iterations. The new fuzz test runs in ~50ms despite the iteration count because each iteration is cheap.
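The opposite roles of vm.assume and vm.expectRevert noted above are easiest to see side by side on the same boundary input. The sketch below assumes the forge init Counter, whose increment() uses checked arithmetic and therefore panics on overflow; the contract and test names are hypothetical, and stdError.arithmeticError is forge-std's encoding of that overflow panic.

```solidity
// SPDX-License-Identifier: UNLICENSED
pragma solidity ^0.8.13;

import {Test, stdError} from "forge-std/Test.sol";
import {Counter} from "../src/Counter.sol";

// Hypothetical sketch file; not one of the lesson's four tests.
contract AssumeVsExpectRevertTest is Test {
    Counter public counter;

    function setUp() public {
        counter = new Counter();
    }

    // Positive path: exclude the one reverting input, assert the property elsewhere.
    function testFuzz_IncrementBelowMax(uint256 x) public {
        vm.assume(x < type(uint256).max);
        counter.setNumber(x);
        counter.increment();
        assertEq(counter.number(), x + 1);
    }

    // Negative path: pin the reverting input and treat the revert as success.
    // Assumes increment() overflows with a checked-arithmetic panic.
    function test_RevertWhen_IncrementAtMax() public {
        counter.setNumber(type(uint256).max);
        vm.expectRevert(stdError.arithmeticError);
        counter.increment();
    }
}
```

The boundary value appears in both tests: once as the input to discard, once as the input to target. The second test is a plain unit test because it selects a single specific value.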
[!WARNING] The "Pinpoint Filtering Trap" and TooManyAssumptions Errors
vm.assume should be used exclusively to filter out a tiny fraction of the input space (typically less than 1%), such as boundary values (like overflow limits or zero).
If you write a filter to pass only one specific pinpoint value, like so:
vm.assume(x == 42); // ✗ Dangerous anti-pattern!
The fuzzer has practically zero chance of randomly drawing 42 from the massive $2^{256}$ space. As a result, the test runner fails to find a valid input and quickly exhausts the rejection budget max_test_rejects (default 65,536), aborting with a TooManyAssumptions error (or a Result::unwrap() panic).
- The Core Issue: Writing vm.assume(x == target) degrades a powerful fuzz test into a highly inefficient unit test.
- Remedy and Best Practice:
  - If you need to verify code behavior for specific pinpoint values (e.g., 42 or 0xdead...), do not use fuzzing. Write a standard unit test (test_...) instead.
  - Fuzz testing is designed to verify invariants across broad ranges, not to serve as a substitute for explicit scenario tests. Use each tool for its intended purpose.
Step 4: See the shrinker in action by breaking the test
To demonstrate shrinking, deliberately break the property. Change the assertion to:
assertEq(counter.number(), x + 2); // Wrong: should be x + 1
Run forge test -vvv:
[FAIL: assertion failed: ... ≠ ...]
testFuzz_IncrementPreservesPlusOne(uint256) (runs: 1, μ: ...)
counterexample: args=[0]
Notice: args=[0] — the shrinker reduced whatever 32-byte value originally failed to the minimal 0. Even though the first failing iteration probably had x = 0xa3b8_f4c2_... (some random huge number), the shrinker realized 0 also fails ($\text{number} = 0 + 1 = 1 \neq 2$), and reported the minimal case.
If you'd never seen shrinking, you might assume the bug only triggers at specific large inputs. With shrinking, you see immediately that every input fails — the bug is in your assertion, not in the contract.
Revert the assertion back to x + 1 before continuing.
assertEq(counter.number(), x + 1); // Restored
Re-run forge test. All four tests green again.
Step 5: Look at the corpus directory
Foundry persists failing inputs to cache/fuzz/. After your deliberate-break-and-revert above, look:
ls cache/fuzz/
You should see a directory with files named after test signatures. Each file holds failing inputs from past runs. The next time you run forge test, Foundry immediately re-runs against those persisted inputs before generating new random ones.
This means if you fixed a bug and re-broke it, the test fails immediately with the same counterexample — no waiting for the fuzzer to rediscover it. This is the corpus persistence pattern, and it's the same thing proptest's proptest-regressions/ files do in Rust.
# Persist a counterexample by intentionally breaking + reverting:
# (the bad assertion run already did this above)
ls cache/fuzz/
# → Directory holds the seed that broke testFuzz_IncrementPreservesPlusOne
You can git-ignore cache/fuzz/ (and forge init does by default) or commit it. The argument for committing: counterexamples that previously broke your code stay in the test suite forever, so a regression is caught instantly. Some production codebases commit cache/fuzz/; most don't. Pick a side per repo.
Step 6: Run with the CI profile
FOUNDRY_PROFILE=ci forge test
This runs with fuzz.runs = 100000 (the profile we added in Step 1). The output:
[PASS] testFuzz_IncrementPreservesPlusOne(uint256) (runs: 100000, μ: 36000, ~: 36000)
[PASS] testFuzz_SetNumber(uint256) (runs: 100000, μ: 31000, ~: 31161)
...
100× more iterations. On modern hardware this takes ~10–20 seconds for two fuzz tests; production codebases with dozens of fuzz tests run nightly, not on every PR. Use profiles to gate iteration counts to environment.
Common errors
- No tests to run: your test function doesn't have a parameter, so Foundry treats it as a non-fuzz test, but its name starts with testFuzz_. Add a uint256 x parameter or rename the function.
- called Result::unwrap() on an Err value: TooManyAssumptions: vm.assume rejected more than max_test_rejects inputs. Your predicate is too restrictive. Loosen it or rework the test.
- counterexample: args=[...] with a huge number: your shrinker hint isn't kicking in. Check that the failure is actually in the simple input range; if not, vm.assume may be filtering valid inputs.
- runs: 1 in the output of a [PASS] line: that's not actually a pass; that's forge fuzz finding a counterexample on iteration 1 and the shrinker working. Re-read the full output for the [FAIL] indicator.
Design reflection
Three load-bearing decisions in forge fuzz's design:
- Parameter signature is the fuzz signal, not a @fuzz annotation. Same convention-over-attribute discipline as forge test itself. Foundry's testing surface scales by naming + parameters, not by markup. Tooling doesn't need a syntax tree to discover tests.
- vm.assume filters rather than fails. The alternative would be a hypothetical vm.requirePrecondition(cond) that fails the iteration if false. Foundry chose the filter semantics because: (a) most precondition violations are inputs you genuinely don't want to test, not bugs; (b) treating them as test failures would flood your CI with noise; (c) max_test_rejects already catches the case where your precondition is too restrictive to ever find valid inputs. vm.assume says "this input isn't interesting"; failures say "this property is broken."
- Shrinking is per-parameter local, not global. A multi-parameter test taking (uint256 a, uint256 b) shrinks a independently of b. This trades cross-parameter optimality for runtime speed; in practice, single-axis minimal counterexamples are good enough for most debugging. Heuristic local shrinking beats exhaustive global shrinking when the input space is 64+ bytes.
Answer key
After Lesson 2:
my-foundry-lab/
├── foundry.toml (+ [fuzz] runs = 1000, + [profile.ci.fuzz] runs = 100000)
├── src/Counter.sol (unchanged from Lesson 1)
├── test/Counter.t.sol (+ testFuzz_IncrementPreservesPlusOne)
└── lib/forge-std/ (unchanged)
After Lesson 2:
- forge test passes 4 tests at 1000 iterations
- FOUNDRY_PROFILE=ci forge test passes 4 tests at 100,000 iterations
- You've seen the shrinker reduce a failing counterexample to its minimal form
- You've seen cache/fuzz/ persist failures for instant replay
Common questions
Q1: Why isn't the default fuzz.runs higher than 256? Wouldn't more iterations be strictly better?
Tradeoff: 256 is the speed-vs-coverage sweet spot for local development (sub-second feedback per test). Production codebases bump it for CI because they have time budget for it; local development needs to stay tight. 256 is for the inner loop; 10_000–100_000 is for the outer loop.
Q2: Why does forge fuzz use random input generation instead of exhaustive search?
Because uint256's input space is $2^{256} \approx 10^{77}$ values, exhaustive search is impossible. Random sampling with a good distribution finds counterexamples in the interesting regions (around $0$, $1$, type(uint256).max, $2^N$ boundaries, ...) thanks to a slight bias in Foundry's input generator toward edge values. Uniform sampling over $2^{256}$ would almost never land on an edge value; biased-random + shrinking hits them.
Q3: Should every state-changing function have a corresponding fuzz test?
Ideally yes — every external function that mutates state should have at least one fuzz test proving the relevant invariant. In practice, prioritize: arithmetic (overflow boundaries), access control (caller checks), and any function that has a conservation law (deposit/withdraw, mint/burn). Aim for fuzz coverage of properties, not lines.
Q4: How is forge fuzz different from forge invariant (Lesson 3)?
forge fuzz is single-call: each iteration calls one function with random parameters and checks an assertion. forge invariant (Lesson 3) is multi-call: each iteration calls many functions in random sequence and checks an invariant after each call. Fuzz tests one function in isolation; invariant tests sequences of function calls. Both are property tests; the granularity differs.
Q5: What happens if my fuzz test calls a function that internally calls vm.assume?
vm.assume works wherever you call it — even nested inside other functions called from your fuzz test. The first vm.assume(false) discards the iteration regardless of call depth. Composability is built into the cheatcode model.
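A minimal sketch of that composability, assuming the forge init Counter: the precondition lives in a helper, and the fuzz test calls the helper before touching the contract. The contract name, the helper _assumeIncrementable, and the test name are hypothetical.

```solidity
// SPDX-License-Identifier: UNLICENSED
pragma solidity ^0.8.13;

import {Test} from "forge-std/Test.sol";
import {Counter} from "../src/Counter.sol";

// Hypothetical sketch file; not one of the lesson's four tests.
contract NestedAssumeTest is Test {
    Counter public counter;

    function setUp() public {
        counter = new Counter();
    }

    // Shared precondition: a false condition here discards the whole
    // iteration, even though vm.assume is one call level down.
    function _assumeIncrementable(uint256 x) internal {
        vm.assume(x < type(uint256).max);
    }

    function testFuzz_IncrementViaHelper(uint256 x) public {
        _assumeIncrementable(x);
        counter.setNumber(x);
        counter.increment();
        assertEq(counter.number(), x + 1);
    }
}
```

Factoring preconditions into helpers keeps several fuzz tests on the same input domain, at the cost of hiding the filter from a reader who only looks at the test body.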
Q6: Does shrinking work with bytes and string parameters?
Yes. For bytes, the shrinker tries shorter slices. For string, it tries shorter strings + simpler character sets. Both work, though they're slower than uint256 shrinking (since each shrinking step requires a longer comparison). Don't avoid bytes/string fuzz tests just because they shrink slower; the shrinker still works, it just takes more wall-clock time.
Next lesson (Lesson 3) — forge invariant — multi-call invariant testing
Lesson 3 graduates from single-call fuzz testing to multi-call invariant testing — the closest Solidity primitive to per-scan conservation laws from openhl-liquidation Lesson 13.
The key concept: define a Handler contract whose functions are the "things the system can do" (deposit, withdraw, increment, etc.). Tell Foundry "treat this Handler as the surface area to fuzz." Foundry then generates random sequences of method calls — deposit(100), withdraw(50), increment(), withdraw(75) — and checks an invariant_* function after each step.
This is what catches multi-call bugs that single-call fuzzing never sees: token-balance reentrancy and ordering-dependent state corruption. Lesson 3 is where forge becomes a real adversary, not just a parameter generator.
Expert continuation
Single-call fuzzing is the Solidity-side primitive. The Expert tier takes the same property-testing idea down to the consensus layer — see Differential fuzzing & execution-spec-tests for cross-client diff testing of EVM implementations against the canonical spec.
Summary (3 lines)
- forge fuzz = Solidity-side proptest!. vm.assume = prop_assume!; shrinking, corpus persistence, profile-per-environment are the same disciplines. Function-name + parameter signature = fuzz signal.
- vm.assume is for excluding boundary conditions (< 1 %), never for pinning specific values (→ TooManyAssumptions). vm.expectRevert and vm.assume invert each other: negative-path success vs positive-path guard.
- Default 256 is the local-dev sweet spot (sub-second). Production CI 10 K – 100 K; nightly 1 M. Per-iteration setUp isolation makes failures reproducible; fuzz gas is typical-case. Next lesson: forge invariant for multi-call conservation laws.