FABRKNT
Build OpenHL Funding — perpetual funding state machine
Clock state machine
Lesson 11 of 12 · CONTENT · 25 min · 50 XP

Treat this page as a workbench, not a blog post. The goal is to extract a reusable mental model from the source and carry it into the rest of the Fabrknt stack.

Lesson 10 — No-catch-up invariant — the design philosophy in one test

Question

The no-catch-up invariant is the load-bearing design choice of openhl funding. One proptest proves it across all input sequences. Get this wrong, and the whole funding system retroactively charges users — a compliance and UX disaster.

Principle (minimum model)

  • The claim. No matter how many intervals are missed (chain halt, validator outage, sequencer downtime), the next Fund pays exactly one interval of funding, not the accumulated amount.
  • Why this matters. If the chain is offline for 3 hours, users don't want to pay 3 hours of funding when it comes back. The Hyperliquid choice: forgive missed intervals.
  • The proptest. Generate a random series of tick timestamps; record every Fund event's payment_amount; assert payment_amount always equals notional × rate × 1 interval — never 2× or 3×.
  • Edge cases caught by the proptest. (1) Long pause then resume. (2) Many fast ticks within one interval. (3) Tick exactly at interval boundary.
  • Why one test, not a suite. This invariant is so important — and so subtle — that a single comprehensive proptest is more reliable than a battery of hand-crafted cases. Proptest covers cases you didn't think of.
  • Trade-off. Catch-up would maximise protocol revenue; no-catch-up prioritises trader UX. Hyperliquid chose the latter.
  • Documented in code. The constant NO_CATCH_UP: bool = true is referenced in apply_funding. Removes ambiguity — the design philosophy is encoded explicitly.
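
The property in the bullets above can be approximated without the proptest crate. The sketch below is not the course's test: it assumes a stripped-down `MiniClock` (hypothetical, gate only, no prices or positions) and drives it with a small deterministic pseudo-random generator so that long pauses, fast ticks, and zero-length gaps all occur. It checks two things: settlements are never closer together than one interval, and a second tick at the same `now` never fires.

```rust
/// Hypothetical stand-in for the lesson's clock: interval gate plus
/// "advance to now", nothing else.
struct MiniClock {
    interval_secs: u64,
    last_settled_at: u64,
}

impl MiniClock {
    /// Returns true when a settlement fires, false when the gate is closed.
    fn tick(&mut self, now: u64) -> bool {
        if now < self.last_settled_at.saturating_add(self.interval_secs) {
            return false;
        }
        self.last_settled_at = now; // missed intervals are forgotten
        true
    }
}

/// Tiny linear congruential generator so the sketch needs no crates.
fn next_rand(state: &mut u64) -> u64 {
    *state = state
        .wrapping_mul(6364136223846793005)
        .wrapping_add(1442695040888963407);
    *state >> 33
}

/// Drives the clock through `steps` random time jumps of 0 to 20 intervals
/// and returns the timestamps at which a settlement fired.
fn settlement_times(seed: u64, steps: usize) -> Vec<u64> {
    let mut rng = seed;
    let mut clock = MiniClock { interval_secs: 3_600, last_settled_at: 1_000_000 };
    let mut now = 1_000_000u64;
    let mut fired_at = Vec::new();
    for _ in 0..steps {
        now += next_rand(&mut rng) % 72_000;
        if clock.tick(now) {
            fired_at.push(now);
            // A second tick at the same `now` must never fire.
            assert!(!clock.tick(now));
        }
    }
    fired_at
}
```

For any seed, consecutive entries of `settlement_times(seed, n)` should differ by at least 3_600 seconds, however large the random gaps were.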

Worked example + steps

Goal

Concepts you'll grasp in this lesson:

  • No-catch-up as a fairness invariant — after a 10-interval gap, settle once and advance to now; don't replay 10 ticks. Replaying with the current snapshot pummels the losing side 10x for time they couldn't have closed during. Funding's purpose is equilibration, not retroactive enforcement.
  • Advance to now, not to last_settled + interval — the deadline resets to the actual settlement timestamp, not to a mathematically next-aligned one. The clock forgets missed intervals entirely; this is the design choice the test pins.
  • Same-now second-tick is the strictest possible state-machine test — no time elapses between the two calls; only the clock's internal state has changed. Catches all implementations that fail to update last_settled_at on a late tick.
  • Catch-up policy lives outside the clock — a caller wanting catch-up writes a wrapper that calls tick() repeatedly with snapshots at intermediate historical timestamps. The clock can't do this; it doesn't have access to historical state. The primitive stays minimal; policy stays in the caller.
  • Design philosophy lives in three places: doc, code, test — module doc names the invariant, tick()'s self.last_settled_at = now line enforces it, no_catchup_after_long_gap proves it. Each location handles a different reader.

Verification:

cargo test -p openhl-funding

…passes 22 tests (21 from Lessons 4–9 + 1 new).

Specific changes:

The new test is no_catchup_after_long_gap — the milestone test that pins openhl's design choice on what happens when a validator misses multiple intervals.

After Lesson 10:

  • crates/funding/ is byte-identical to Stage 8b (cd94137).
  • All 22 tests pass: 20 hand-traced + 2 proptests.
  • Module 3 (Clock state machine) is complete.
  • The funding state machine is production-shape as a standalone crate.

The teaching focus is design philosophy under failure modes: when the clock falls behind, what's the right semantics? The naive answer (catch up by replaying ticks) is wrong, and Lesson 10 explains why.

Recap

After Lesson 9:

  • 6 of 7 clock tests pass.
  • Both interval-gating sub-invariants verified (boundary, persistence).
  • The math composition surfaces correctly through tick().

Lesson 9 covered the "normal operation" invariant. Lesson 10 covers the "abnormal operation" invariant — what happens when the clock is late.

The scenario

Imagine the openhl chain has been running normally, settling funding every hour. Then something happens:

  • Validator reboot (process restart taking 5 minutes).
  • Network partition (chain pause for 8 hours while validators reconnect).
  • Hardware failure on the leader, fallback validator picks up after 30 minutes.

Whatever the cause, the next tick() call has now - last_settled_at far exceeding interval_secs. What should the clock do?

Two design choices:

Choice A: Catch up

Replay 10 intervals' worth of funding. Each replay uses the current mark/index/positions snapshot. Apply 10 settlements in succession.

Pros: every interval gets a settlement, the chain "doesn't fall behind."

Cons:

  • Stale-snapshot problem: all 10 settlements use the same current snapshot, not the snapshot at each historical interval boundary. Whichever side the current rate charges pays 10 settlements computed from that one rate, even if the rate was different (or pointed the other way) for part of the gap. The losing side gets pummeled 10x, without ever having had a chance to close their position to escape it.
  • Concentrated risk: 10x funding at once can liquidate accounts that would have survived 10 separate hourly payments.
  • Path dependency: the funding history depends on when the gap occurred, not just on the cumulative time.

Choice B: Settle once, advance to now

Apply funding once at the current snapshot, then advance last_settled_at to now. The 10 missed intervals are skipped, not replayed.

Pros:

  • No concentrated punishment: at most one settlement at the cap per outage.
  • Path-independent: the result depends only on the current snapshot, not on the gap's timing.
  • External catch-up possible: a caller wanting catch-up logic can implement it themselves with repeated ticks + fresh snapshots at intermediate timestamps.

Cons:

  • Missing revenue: funding is the equilibration mechanism for the perpetual price; skipping intervals removes pressure on the basis.

openhl chooses Choice B. The catch-up logic, if anyone needs it, lives outside the clock — built on repeated tick() calls with snapshots at the right historical times.

The diagram below traces one outage scenario and shows, side by side, what Choice A and Choice B do on the first tick after a large time jump:

1_000_000 (Genesis, last_settled_at = 1_000_000)
   │
   ▼  +3,600 s (one healthy interval)
1_003_600 ── 【Normal tick】 success ──► last_settled_at = 1_003_600
   │
   ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░
   ░░  Outage! Chain halts for 10 hours          ░░
   ░░  Traders can't close their positions       ░░
   ░░  Suppose mark drifts above index throughout░░
   ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░
   │
   ▼  +36,000 s (first block after restart)
1_039_600 ── 【Late tick】
                  │
                  ├─► ❌ Choice A (catch-up replay):
                  │     replay settlement 10 times against the current snapshot
                  │     the losing side (longs) eats 10 consecutive cap-rate hits
                  │     traders had no way to close during the gap
                  │     → retroactive coercion for time in which they had no agency
                  │     last_settled_at: 1_003_600 → 1_007_200 → ... → 1_039_600
                  │
                  └─► 🟢 Choice B (openhl, this implementation):
                        settle once against the current snapshot
                        the 9 missed intervals are skipped entirely
                        the clock jumps straight to `now = 1_039_600`
                        → loses funding revenue, but fair to traders
                        last_settled_at: 1_003_600 ──► 1_039_600 ✨ (one step)

The key thing this picture pins down is that under Choice B, last_settled_at always advances in a single step. Ten-hour gap or ten-second gap, tick() is called once and advances once. That's the actual content of path-independence (the outcome doesn't depend on gap timing), and it's why a single test can pin the whole invariant.

(Answer: The losing side gets pummeled 10x. During a 10-hour gap, suppose mark drifted high relative to index, so longs have been overpaying in the "real" world. Choice A replays 10 settlements at the current rate, all charging longs. The trader who was already on the losing side of the basis pays all 10 at once at the current rate, rather than hour by hour at the rates that actually prevailed. Worse, they couldn't have closed their position during the gap (the chain was paused), so the catch-up charges them retroactively for time in which they had no agency. Choice B says: skip the missed payments and start fresh now. Bad for funding revenue; fair to traders.)
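
The 10x figure is plain arithmetic. A minimal sketch, assuming a flat per-settlement rate in basis points and a fixed notional (both simplifications for illustration; none of these function names come from the crate, and `interval_secs` is assumed to be non-zero):

```rust
/// Settlements applied by one late tick under Choice A (catch-up replay).
fn settlements_choice_a(elapsed_secs: u64, interval_secs: u64) -> u64 {
    elapsed_secs / interval_secs
}

/// Settlements applied by one late tick under Choice B (settle once).
fn settlements_choice_b(elapsed_secs: u64, interval_secs: u64) -> u64 {
    if elapsed_secs >= interval_secs { 1 } else { 0 }
}

/// Total charged to the paying side: one settlement costs
/// notional * rate_bps / 10_000, and every replay charges it again.
fn total_charge(notional: u64, rate_bps: u64, settlements: u64) -> u64 {
    notional * rate_bps / 10_000 * settlements
}
```

With the diagram's 36_000-second gap and a 3_600-second interval, Choice A applies 10 settlements and Choice B applies 1, so the same position is charged ten times as much under replay.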

Plan

One file edit:

  1. Append no_catchup_after_long_gap to crates/funding/src/clock.rs — inside the existing #[cfg(test)] mod tests block, after the Lesson 9 tests.

No production code, no lib.rs changes.

Walk-through

Step 1: Add the milestone test

After capped_rate_when_premium_extreme, add:

    #[test]
    fn no_catchup_after_long_gap() {
        // If 10 intervals elapse before the next tick, we settle ONCE and
        // advance to `now`. We don't replay 10 settlements with stale state.
        let params = FundingParams::hyperliquid_default();
        let mut clock = FundingClock::new(params, 1_000_000);

        let way_later = 1_000_000 + 10 * 3600;
        let out = clock.tick(way_later, MarkPrice(101), IndexPrice(100), &balanced_book());
        assert!(out.is_some(), "elapsed >> interval → tick fires");
        assert_eq!(clock.last_settled_at(), way_later);

        // Immediately ticking again at the same moment does NOT settle.
        let again = clock.tick(way_later, MarkPrice(101), IndexPrice(100), &balanced_book());
        assert!(again.is_none(), "no duplicate settlement at same now");
    }

Two parts. Each pins a separate sub-property of the no-catch-up invariant.

Part 1: settle once after long gap

        let way_later = 1_000_000 + 10 * 3600;
        let out = clock.tick(way_later, MarkPrice(101), IndexPrice(100), &balanced_book());
        assert!(out.is_some(), "elapsed >> interval → tick fires");
        assert_eq!(clock.last_settled_at(), way_later);

The setup: genesis at 1_000_000, then tick at 1_036_000 (= 1_000_000 + 10 × 3600). Ten full intervals have elapsed.

Two assertions:

  1. out.is_some() — the tick does fire. We don't skip it just because it's late. Choice B isn't "skip everything" — it's "settle once."

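  2. clock.last_settled_at() == way_later: the clock advances to now, not to 1_000_000 + 3600 (one interval after genesis). Because this test's gap is an exact multiple of the interval, now happens to equal the interval-aligned value 1_000_000 + 10*3600 as well; the assignment is still to now, and a gap that is not a whole number of intervals would make the two values differ. The clock forgets the missed intervals entirely.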
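
The difference between "advance to now" and "advance to the next aligned boundary" only shows up when the gap is not a whole number of intervals. A small sketch of the two update rules (both functions are hypothetical illustrations, and `now >= last_settled_at` is assumed):

```rust
/// The rule the lesson's test pins: the new base is simply `now`.
fn advance_to_now(_last_settled_at: u64, now: u64, _interval_secs: u64) -> u64 {
    now
}

/// Alternative shown for contrast: snap to the latest interval boundary
/// at or before `now`, counted from the previous settlement.
fn advance_to_aligned(last_settled_at: u64, now: u64, interval_secs: u64) -> u64 {
    let whole_intervals = (now - last_settled_at) / interval_secs;
    last_settled_at + whole_intervals * interval_secs
}
```

At `now = 1_036_000` both rules return 1_036_000; at `now = 1_036_017` the first returns 1_036_017 and the second still returns 1_036_000. A test that wants to separate the two would use a gap such as 10 * 3600 + 17.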

Part 2: no re-fire at same now

        let again = clock.tick(way_later, MarkPrice(101), IndexPrice(100), &balanced_book());
        assert!(again.is_none(), "no duplicate settlement at same now");

After the long-gap tick, immediately call tick again at the same now. It must return None. This proves the interval-gating invariant still holds after a late tick — we can't get a double settlement by calling tick twice in a row.

Why is this assertion important? Because without it, a buggy implementation could:

  • Detect "elapsed time >> interval" and decide "fire continuously until we catch up" (the buggy version of catch-up).
  • Forget to update last_settled_at on the long-gap tick, so subsequent ticks at the same now keep firing.

The same now is the strictest possible test. No time has passed between the two ticks; only the clock's internal state has changed. If last_settled_at == way_later (from Part 1), then the guard now < last_settled_at + interval becomes way_later < way_later + 3600, which is 0 < 3600, which is true — so tick correctly returns None.
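
The guard arithmetic can be traced in a few lines. The sketch below uses a hypothetical `GateOnlyClock` that keeps only the interval gate and the `last_settled_at = now` assignment; it is a model of the behaviour described here, not the crate's `FundingClock`.

```rust
/// Hypothetical gate-only model: no prices, no positions.
struct GateOnlyClock {
    interval_secs: u64,
    last_settled_at: u64,
}

impl GateOnlyClock {
    fn new(interval_secs: u64, genesis: u64) -> Self {
        Self { interval_secs, last_settled_at: genesis }
    }

    /// Returns Some(elapsed seconds) when the tick settles, None otherwise.
    fn tick(&mut self, now: u64) -> Option<u64> {
        if now < self.last_settled_at.saturating_add(self.interval_secs) {
            return None; // gate closed
        }
        let elapsed = now - self.last_settled_at;
        self.last_settled_at = now; // advance to now, not by one interval
        Some(elapsed)
    }
}

/// Replays the test's two calls and reports what each one observed.
fn long_gap_then_same_now() -> (Option<u64>, u64, Option<u64>) {
    let mut clock = GateOnlyClock::new(3_600, 1_000_000);
    let way_later = 1_000_000 + 10 * 3_600;
    let first = clock.tick(way_later);
    let settled_at = clock.last_settled_at;
    let again = clock.tick(way_later);
    (first, settled_at, again)
}
```

The first call settles with 36_000 seconds elapsed, the stored timestamp becomes 1_036_000, and the second call at the same `now` hits the closed gate.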

Step 2: Run tests

cargo test -p openhl-funding

Expected:

running 22 tests
test clock::tests::capped_rate_when_premium_extreme ... ok
test clock::tests::empty_positions_yield_empty_settlements_but_still_advance_clock ... ok
test clock::tests::first_tick_at_exact_interval_fires ... ok
test clock::tests::first_tick_before_interval_returns_none ... ok
test clock::tests::no_catchup_after_long_gap ... ok
test clock::tests::premium_drives_settlement_signs ... ok
test clock::tests::second_tick_requires_another_full_interval ... ok
... (15 tests from Lessons 4–7)

test result: ok. 22 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

22 tests, all green. Module 3 closes. crates/funding/ is byte-identical to Stage 8b.

Common errors:

  • Part 1 fails with out.is_none(): your guard's comparison points the wrong way. Re-check: if now < last_settled_at + interval { return None; }. At now = 1_036_000 and last_settled_at = 1_000_000, now < 1_003_600 is false, so the guard doesn't return; the tick fires.
  • Part 1 fails with last_settled_at() != way_later: you advanced the clock to something other than now. Re-check the line self.last_settled_at = now; near the end of tick(). Common typo: self.last_settled_at = self.last_settled_at + self.params.interval_secs; (the catch-up version) or self.last_settled_at += self.params.interval_secs; (equally wrong).
  • Part 2 fails with again.is_some(): last_settled_at wasn't updated on Part 1's tick. The Part 2 tick at the same now finds the gate still at genesis + interval (still satisfied), so it fires erroneously. Re-check the Part 1 assignment.
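
To see why the same-now tick separates these bugs, it helps to count how many consecutive calls at one `now` would fire under each update rule. The sketch below is a hypothetical model (the `Advance` enum and `fires_at_same_now` are illustrative names, not crate items):

```rust
/// How a tick implementation updates `last_settled_at` after firing.
#[derive(Clone, Copy)]
enum Advance {
    ToNow,         // correct: last_settled_at = now
    ByOneInterval, // typo: last_settled_at += interval
    Never,         // the assignment was forgotten
}

/// Calls a gate-only tick repeatedly at the same `now` and counts how many
/// calls fire. Stops at `max_calls` so the `Never` case terminates.
fn fires_at_same_now(
    policy: Advance,
    genesis: u64,
    now: u64,
    interval: u64,
    max_calls: u32,
) -> u32 {
    let mut last_settled_at = genesis;
    let mut fired = 0;
    for _ in 0..max_calls {
        if now < last_settled_at.saturating_add(interval) {
            break; // gate closed: every later call would return None too
        }
        fired += 1;
        match policy {
            Advance::ToNow => last_settled_at = now,
            Advance::ByOneInterval => last_settled_at += interval,
            Advance::Never => {}
        }
    }
    fired
}
```

With genesis 1_000_000, `now` 1_036_000 and a 3_600-second interval, the correct rule fires once, the `+= interval` typo fires 10 times, and the forgotten assignment fires on every call until the cap. Part 2 of the test fails for both buggy rules because the second call fires.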

Design reflection

Four load-bearing decisions in this lesson:

  1. Settle once on long gaps, advance to now. The alternative (catch up by replaying intervals) creates concentrated punishment for the losing side without giving them the chance to close. Funding's purpose is equilibration, not retroactive enforcement. Choice B aligns the math with fairness, at the cost of some funding revenue.

  2. The same-now second-tick test is the strictest possible. No time elapses; only state has changed. Catches all implementations that fail to update last_settled_at on a late tick. For state machines, "same input, repeated call" reveals state-update bugs.

  3. Catch-up logic lives outside the clock. A caller wanting catch-up can call tick() repeatedly with snapshots at intermediate historical timestamps. The clock is the primitive; the policy is the caller's.

  4. Design philosophy lives in three places: doc, code, test. The clock's module doc names the invariant; the self.last_settled_at = now line in tick() enforces it; this test proves it, and the test comments plus this lesson explain why.

Answer key

cd ~/code/openhl-reference
git checkout cd94137
diff -ru ~/code/my-openhl/crates/funding/ ./crates/funding/

After Lesson 10, crates/funding/ is byte-identical to Stage 8b. The diff is empty.

Module 3 closes. Module 4 (capstone) is Lesson 11.

Return:

git checkout main

Common questions

Q: What if I want catch-up semantics? Can I configure it? Not from inside the clock. You'd have to write a wrapper that calls tick() repeatedly with snapshots at historical intermediate timestamps:

// Pseudocode for an external catch-up wrapper:
while clock.last_settled_at() + interval < now {
    let next_target = clock.last_settled_at() + interval;
    let historical_snapshot = fetch_snapshot_at(next_target);  // !!! complex !!!
    clock.tick(next_target, historical_snapshot.mark, ...);
}
clock.tick(now, current_snapshot.mark, ...);

The hard part is fetch_snapshot_at(historical_timestamp) — the caller has to know what mark/index/positions looked like at past times. That's why catch-up isn't in the clock: it requires historical state the clock doesn't have. The application layer (which has the chain database) can do it.

The "complex !!!" comment marks what would go wrong if that complexity were pulled inside the clock: the clock would have to persist the last N intervals' worth of (mark, index, position snapshot) tuples on-chain. At HL's 1-hour cadence, keeping even one month means 24 × 30 = 720 snapshots per market, multiplied across every market the chain trades. Because that storage would be consensus state, every layout change would require a network upgrade, and the "pure, minimal state machine" virtue of openhl-funding would be gone. The application layer, by contrast, already maintains the chain database, so fetch_snapshot_at(t) is roughly "look up the state root at the block for time t and read the positions." The "primitive minimal, policy outside" separation shows up concretely in storage: 720 snapshots per market per month inside the clock, versus none when catch-up lives in the caller.
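
A compilable version of the wrapper idea, as a sketch only. It assumes a gate-only model clock and a two-field `Snapshot` (both hypothetical; the real tick() also takes positions), and it takes the historical lookup as a closure because that lookup belongs to the caller. `interval_secs` must be non-zero or the loop would not terminate.

```rust
/// Hypothetical snapshot type for illustration.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Snapshot {
    mark: u64,
    index: u64,
}

/// Hypothetical gate-only clock that reports which snapshot it settled on.
struct GateOnlyClock {
    interval_secs: u64,
    last_settled_at: u64,
}

impl GateOnlyClock {
    fn tick(&mut self, now: u64, snap: Snapshot) -> Option<Snapshot> {
        if now < self.last_settled_at.saturating_add(self.interval_secs) {
            return None;
        }
        self.last_settled_at = now;
        Some(snap)
    }
}

/// External catch-up: settle each missed boundary against the snapshot the
/// caller fetched for that boundary, then offer `now` to the clock as usual.
fn catch_up<F>(clock: &mut GateOnlyClock, now: u64, fetch_snapshot_at: F) -> Vec<(u64, Snapshot)>
where
    F: Fn(u64) -> Snapshot,
{
    assert!(clock.interval_secs > 0, "interval must be non-zero");
    let mut settled = Vec::new();
    while clock.last_settled_at + clock.interval_secs < now {
        let target = clock.last_settled_at + clock.interval_secs;
        if let Some(snap) = clock.tick(target, fetch_snapshot_at(target)) {
            settled.push((target, snap));
        }
    }
    if let Some(snap) = clock.tick(now, fetch_snapshot_at(now)) {
        settled.push((now, snap));
    }
    settled
}
```

The clock itself is unchanged: each call still settles at most once and advances to the timestamp it was given. Catch-up is entirely a property of how the caller sequences the calls and where it gets its snapshots.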

Q: How long can the gap be before way_later overflows? u64::MAX seconds is roughly 5.8 × 10^11 years, about 40 times the current age of the universe. The saturating_add in the guard handles last_settled_at near u64::MAX, but in practice we don't reach that regime. The pathological case is the guard's responsibility; the realistic case is the design's.
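
Both claims can be checked directly. The sketch assumes the guard has the shape `now < last_settled_at.saturating_add(interval)`, as the answer above suggests; the function names are hypothetical, and the wrapping variant exists only for contrast.

```rust
/// Gate check with a saturating deadline: near u64::MAX the deadline
/// clamps to u64::MAX instead of wrapping to a small number.
fn gate_closed_saturating(now: u64, last_settled_at: u64, interval_secs: u64) -> bool {
    now < last_settled_at.saturating_add(interval_secs)
}

/// Same check with wrapping arithmetic: near u64::MAX the deadline wraps
/// around and the gate opens far too early.
fn gate_closed_wrapping(now: u64, last_settled_at: u64, interval_secs: u64) -> bool {
    now < last_settled_at.wrapping_add(interval_secs)
}

/// u64::MAX seconds expressed in years of 365.25 days.
fn u64_max_secs_in_years() -> f64 {
    u64::MAX as f64 / (365.25 * 24.0 * 3_600.0)
}
```

With `last_settled_at = u64::MAX - 10` and a 3_600-second interval, the saturating gate stays closed five seconds later, while the wrapping deadline becomes 3_589 and the gate is reported open.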

Q: What if mark and index are both reasonable values at way_later, but the gap was caused by mark/index oracle being unavailable? The clock doesn't know about oracle staleness. If you call tick() with a stale mark, you get funding based on the stale data. Oracle freshness is the caller's responsibility. Production deployments add an oracle-staleness check before calling tick() — and skip the call if the oracle is too old. The skip happens above the clock; the clock just trusts its inputs.
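
A staleness check of this kind sits entirely in the caller. A minimal sketch, assuming each oracle reading carries its own update timestamp and that the deployment picks a maximum age (`OracleReading`, `fresh_prices`, and `max_age_secs` are hypothetical names, not part of openhl-funding):

```rust
/// Hypothetical oracle reading: a price plus the time it was last updated.
struct OracleReading {
    price: u64,
    updated_at: u64,
}

/// Returns the (mark, index) prices only when both readings are recent
/// enough. On None the caller skips tick() for this block.
fn fresh_prices(
    now: u64,
    mark: &OracleReading,
    index: &OracleReading,
    max_age_secs: u64,
) -> Option<(u64, u64)> {
    let fresh = |r: &OracleReading| now.saturating_sub(r.updated_at) <= max_age_secs;
    if fresh(mark) && fresh(index) {
        Some((mark.price, index.price))
    } else {
        None
    }
}
```

Skipping the call leaves `last_settled_at` untouched, so the first tick with fresh inputs settles once and advances to its own `now`, exactly as after any other gap.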

Q: Should we add a warning log when a long-gap tick happens? Logging is a side effect. The clock is pure (no I/O). A wrapper can log the gap if it cares: if elapsed > 2*interval { log!("late tick: {} hours behind", elapsed/3600); }. Keep the primitive pure; let the wrapper observe.
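
The inline check above can itself stay pure if the wrapper computes the message and leaves the actual logging to its caller. A small sketch (`late_tick_note` is a hypothetical helper, not part of the crate):

```rust
/// Returns a note when more than two intervals have elapsed since the last
/// settlement, and None otherwise. Performs no I/O.
fn late_tick_note(now: u64, last_settled_at: u64, interval_secs: u64) -> Option<String> {
    let elapsed = now.saturating_sub(last_settled_at);
    if elapsed > interval_secs.saturating_mul(2) {
        Some(format!("late tick: {} hours behind", elapsed / 3_600))
    } else {
        None
    }
}
```

The wrapper would call this just before tick(), using the clock's last_settled_at() accessor, and pass any returned string to whatever logger the application uses.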

Module 3 milestone — what you've built

After Lesson 10:

  • Module 3 complete. Clock state machine + 7 tests covering interval-gating, no-catch-up, math composition, cap surfacing.
  • Entire crate byte-identical to Stage 8b. ~635 LOC across types.rs / compute.rs / clock.rs.
  • 22 tests total: 20 hand-traced + 2 proptest.
  • Zero rustdoc warnings.

The funding state machine is now a complete, tested, production-shape crate. It computes funding deterministically, gates on the right cadence, and refuses to introduce path-dependent settlements after gaps.

What's left:

  • Module 4 (Capstone, Lesson 11) — synthesis, deferred items, bridge-integration preview. No code.
  • Future course — wiring this crate into the bridge (oracle integration, balance updates, liquidation triggers).

Next lesson (Lesson 11)

Lesson 11 is the capstone — no new code. We sketch the architecture, name the items deferred from this course (oracle integration, balance updates, liquidations, multi-market funding, funding-as-EVM-event), and trace where each will live when shipped. The lesson is for cementing the mental model and seeing the funding state machine as a piece of the larger openhl architecture.

Summary (3 lines)

  • No-catch-up: missed intervals are not paid retroactively. One Fund = one interval, regardless of elapsed time.
  • Proptest covers long pause + fast ticks + boundary timing. Hand-crafted suite isn't reliable; proptest catches subtleties.
  • Trade-off: protocol revenue (catch-up) vs trader UX (no catch-up). Hyperliquid chose UX. Capstone next.