FABRKNT
Validator Operations — Keys, Slashing, and Coordinated Upgrades
Validator Operations
Lesson 2 of 4 · CONTENT · 16 min · 40 XP

Treat this page as a workbench, not a blog post. The goal is to extract a reusable mental model from the source and carry it into the rest of the Fabrknt stack.


Lesson 2 — Slashing detection and the offline validator

Question

There are exactly two ways a validator loses stake. The expensive one: sign two contradictory messages (slashing: one provable event triggers a penalty and forced exit, and the loss can approach the whole stake when many validators are slashed together). The slow one: be offline when the network needs you (inactivity penalty: a few basis points a day). Every operational decision is "pick the smaller of these two losses." What are the decision axes?

Principle (minimum model)

  • Two paths to losing stake. Active (slashing — cryptographically provable, catastrophic) + passive (inactivity — gradual, small per epoch).
  • Three flavours of slashable violation. Double voting (same height, different blocks) + surround voting (Casper FFG: one vote's source→target range brackets another's) + BFT equivocation (same height/round, different pre-commit).
  • Slashing-protection DB is standard. Before signing, check whether a different message at this (H, R) was already signed → if yes, refuse → if no, sign + record.
  • The DB must be persistent + restart-safe + software-update-safe + backed up.
  • DB write and signature must be atomic, with the record written before the signature is released. A network-remote DB introduces a window where atomicity breaks → it has to live on the same machine.
  • Remote signer + DB = defence in depth. Even if the validator node has a bug, the remote signer's DB is the last line.
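  • Whistleblower reward. On Ethereum the whistleblower reward is ~1/512 of the slashed validator's effective balance, paid to the block proposer that includes the proof, so slashing a validator with $1 M effective balance pays ~$2 k, creating an economic incentive to monitor.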
  • Inactivity leak. When >1/3 are offline, finality stalls → every epoch shaves the offline stake → the chain self-heals back to >2/3 online.
  • During a network partition, fail-closed is the right answer. Keep signing (slashing risk) vs stop (small inactivity penalty) → stopping is always correct.

Worked example + steps


There are exactly two ways to lose validator stake. The expensive way: sign two conflicting messages (slashing — one event, big chunk of stake gone). The slow way: be offline when the network needs you (inactivity penalty — drips out over days). Every operational decision in this lesson comes down to picking the smaller of those two losses when something goes wrong.

1. The two ways to lose stake

Way | What | Why
Active misbehavior (slashing) | Sign two conflicting messages | Cryptographically provable; major penalty
Passive misbehavior (inactivity) | Don't sign at all | Gradual penalty that escalates when finality stalls (e.g. during partitions)

Both are protocol-enforced. Both reduce your stake. The key difference: slashing is sudden and severe (a penalty and forced exit in one event, rising toward the whole stake when many validators are slashed together), while inactivity is slow (small amounts lost over time).

For Ethereum mainnet (2026 parameters approx):

  • Slashing: ~1 ETH minimum, scales up with correlated slashing
  • Inactivity leak: ~0.1% of stake per day of non-participation during finality issues
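The figures above are enough for a rough break-even check: how many days of leak cost as much as one isolated slashing? A minimal sketch, treating the lesson's approximate numbers (32 ETH stake, ~1 ETH minimum slashing penalty, ~0.1% of stake per day of leak) as assumptions and simplifying the leak to a flat daily rate, which the real mechanism is not:

```python
# Break-even between one isolated slashing and days of inactivity leak.
# All three inputs are the lesson's approximate figures, treated as assumptions.
STAKE_ETH = 32.0              # assumed validator stake
SLASHING_PENALTY_ETH = 1.0    # "~1 ETH minimum" (ignores the correlation penalty)
LEAK_RATE_PER_DAY = 0.001     # "~0.1% of stake per day" (flat-rate simplification)


def leak_cost(days: float, stake: float = STAKE_ETH) -> float:
    """ETH lost to a flat-rate inactivity leak over `days`."""
    return stake * LEAK_RATE_PER_DAY * days


def break_even_days(stake: float = STAKE_ETH) -> float:
    """Days offline during a leak that cost as much as one minimum slashing."""
    return SLASHING_PENALTY_ETH / (stake * LEAK_RATE_PER_DAY)


if __name__ == "__main__":
    print(f"30 minutes offline: {leak_cost(30 / (24 * 60)):.6f} ETH")
    print(f"break-even: {break_even_days():.2f} days")
```

Under these assumptions the break-even is about 31 days of leak, while a half-hour outage during a leak costs under a thousandth of an ETH; that asymmetry is what the fail-closed rule leans on.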

2. Slashable offenses, in detail

2.1 Double voting

Two votes for the same height with different blocks:

Vote1: { height: 1000, block: 0xA..., signature: SigA }
Vote2: { height: 1000, block: 0xB..., signature: SigB }

If both votes are valid and both are signed by validator V, V is slashable. Anyone with both signatures can construct a slashing proof.
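A slashing proof for double voting reduces to a simple predicate over two signed votes. A minimal sketch, assuming both signatures were already verified against V's key; the `Vote` type and its fields are hypothetical, modelled on the example above:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    validator: str   # validator identifier (e.g. public key)
    height: int
    block: str       # hash of the block being voted for
    signature: str   # opaque here; real code verifies it against the validator's key


def is_double_vote(v1: Vote, v2: Vote) -> bool:
    """Same validator, same height, different blocks: a slashable double vote."""
    return (
        v1.validator == v2.validator
        and v1.height == v2.height
        and v1.block != v2.block
    )
```

For example, `Vote("V", 1000, "0xA", "SigA")` and `Vote("V", 1000, "0xB", "SigB")` form a double vote; the same vote compared with itself does not.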

The fix: validator software MUST track every signed vote and refuse to sign a second, different vote for the same height (re-signing the identical vote is harmless). This is the slashing-protection database.

2.2 Surround voting (Casper FFG specific)

Casper FFG (Ethereum's finality gadget) has validators vote on source → target checkpoint pairs. A surround vote is a pair of votes where one vote's range strictly contains the other's, regardless of which was cast first:

Vote1: source A → target B
Vote2: source C → target D

If C < A and D > B (the second "surrounds" the first), or A < C and B > D (the first surrounds the second), this is slashable. The fix: track every voted source/target pair; reject any vote that would surround, or be surrounded by, a prior vote.
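Because Casper FFG checkpoints are identified by epoch, the surround check can be sketched on (source epoch, target epoch) pairs. This is a hypothetical helper, not a client's API; it checks both directions, since either vote surrounding the other is slashable, and it leaves out the separate double-vote check on equal target epochs:

```python
from typing import NamedTuple, Sequence


class FfgVote(NamedTuple):
    source: int  # source checkpoint epoch
    target: int  # target checkpoint epoch


def surrounds(outer: FfgVote, inner: FfgVote) -> bool:
    """True if `outer`'s source->target range strictly contains `inner`'s."""
    return outer.source < inner.source and inner.target < outer.target


def is_surround_vote(v1: FfgVote, v2: FfgVote) -> bool:
    """Slashable if either vote surrounds the other, whichever was cast first."""
    return surrounds(v1, v2) or surrounds(v2, v1)


def safe_to_sign(new: FfgVote, history: Sequence[FfgVote]) -> bool:
    """Refuse a vote that would surround, or be surrounded by, any prior vote."""
    return not any(is_surround_vote(new, old) for old in history)
```

A protection DB would call something like `safe_to_sign` before every attestation and append the new pair to the history only once it is allowed.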

2.3 Equivocation in BFT (Tendermint, HotStuff)

Two pre-commits (the "I'm committed to this block" message in a BFT round) for the same height/round but different blocks. Same logic as double voting; the slashing-protection database must catch it.

3. The slashing-protection database

Every production validator runs one. Its job:

Before signing message M at height H, round R:
  if a prior signed message exists at (H, R):
    if its content differs from M:
      REFUSE to sign  → no slashing event
    else:
      sign M again (identical message, harmless)
  else:
    record M in database (durably, before the signature leaves)
    sign M
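One way to make this logic concrete is a small SQLite-backed store, sketched below under stated assumptions: `ProtectionDB` and `sign` are hypothetical names, messages are compared by SHA-256 hash, and the "signature" is a hash placeholder rather than a real scheme. The table's PRIMARY KEY on (height, round) turns the check and the record into a single atomic insert, and the record is committed before any signature is produced:

```python
import hashlib
import sqlite3
from typing import Optional


class ProtectionDB:
    """Slashing-protection store: at most one message hash per (height, round)."""

    def __init__(self, path: str = ":memory:") -> None:
        # ":memory:" keeps the demo self-contained; production needs a file on local disk.
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS signed ("
            "height INTEGER, round INTEGER, msg_hash TEXT, "
            "PRIMARY KEY (height, round))"
        )
        self.conn.commit()

    def check_and_record(self, height: int, rnd: int, msg: bytes) -> bool:
        """Return True if `msg` may be signed, committing the record first."""
        digest = hashlib.sha256(msg).hexdigest()
        try:
            with self.conn:  # commits the insert, or rolls back and re-raises
                self.conn.execute(
                    "INSERT INTO signed VALUES (?, ?, ?)", (height, rnd, digest)
                )
            return True  # first message at (H, R): now recorded
        except sqlite3.IntegrityError:
            # PRIMARY KEY hit: something was already signed at (H, R).
            row = self.conn.execute(
                "SELECT msg_hash FROM signed WHERE height = ? AND round = ?",
                (height, rnd),
            ).fetchone()
            return row[0] == digest  # only the identical message may be re-signed


def sign(db: ProtectionDB, key: bytes, height: int, rnd: int, msg: bytes) -> Optional[bytes]:
    """REFUSE conflicting messages; otherwise record first, then sign."""
    if not db.check_and_record(height, rnd, msg):
        return None  # a different message was already signed at (H, R)
    # Placeholder only: a real signer uses the chain's scheme (e.g. BLS, Ed25519).
    return hashlib.sha256(key + msg).digest()
```

Re-signing the identical message returns the same placeholder signature, while a different message at the same (H, R) returns `None`.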

The database must:

  • Persist across restarts
  • Survive software updates
  • Be backed up (so a lost disk doesn't leave you with an empty database that has forgotten what you already signed)
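For the backup requirement, Python's `sqlite3` module offers an online backup (`Connection.backup`) that copies a consistent snapshot even while other connections use the database. A minimal sketch, assuming the protection DB is a SQLite file; the function name and paths are hypothetical, and shipping the copy off the machine is out of scope:

```python
import sqlite3


def backup_protection_db(src_path: str, dst_path: str) -> None:
    """Copy a live SQLite slashing-protection DB into dst_path as a consistent snapshot."""
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    try:
        with dst:
            src.backup(dst)  # online backup API: works even while src is in use
    finally:
        dst.close()
        src.close()
```

A backup only contains what was recorded before it ran, so records written after the last copy are not in it.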

Common implementations:

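  • EIP-3076 interchange format: standard for exporting and importing protection data between Ethereum clients
  • CometBFT priv_validator_state.json: Cosmos chains (stores the last signed height/round/step)
  • Custom files: chain-specific formats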
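CometBFT's file-based signer takes a lighter approach than a full history: it keeps only the last signed height/round/step and refuses anything that moves backwards. The sketch below captures that idea in simplified form; the class and method names are hypothetical, and the real implementation additionally special-cases messages that differ only in timestamp:

```python
from dataclasses import dataclass
from typing import Optional

# Step order within a round follows CometBFT: propose < prevote < precommit.
PROPOSE, PREVOTE, PRECOMMIT = 1, 2, 3


@dataclass
class LastSignState:
    """Keeps only the most recent signing, like priv_validator_state.json."""

    height: int = 0
    round: int = 0
    step: int = 0
    sign_bytes: Optional[bytes] = None

    def may_sign(self, height: int, rnd: int, step: int, sign_bytes: bytes) -> bool:
        new, last = (height, rnd, step), (self.height, self.round, self.step)
        if new < last:
            return False  # regression: never sign at an older (H, R, S)
        if new == last:
            return sign_bytes == self.sign_bytes  # same slot: identical bytes only
        return True  # strictly newer (H, R, S)

    def record(self, height: int, rnd: int, step: int, sign_bytes: bytes) -> None:
        # In-memory here; a real signer writes this durably before releasing the signature.
        self.height, self.round, self.step = height, rnd, step
        self.sign_bytes = sign_bytes
```

Tuple comparison gives the lexicographic (height, round, step) ordering for free, which is why a single record is enough: anything older is refused outright.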

4. The remote signer integration

Most production validators run a remote signer that also enforces slashing-protection:

[Validator node]  --request to sign-->  [Remote Signer]
                                            |
                                            +- Check slashing-protection DB
                                            +- If safe: sign + record
                                            +- If unsafe: refuse

This gives defense in depth. The validator node might have a bug, but the remote signer's database is the final check. Even if the validator node tries to double-sign, the remote signer refuses.
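The defence-in-depth point can be illustrated with two independent guards: the node's own check and the remote signer's. A minimal in-memory sketch with hypothetical class names and a placeholder signature; it shows that a node which has lost its local state (say, after a botched restore) still cannot get a conflicting signature out of the signer:

```python
from typing import Dict, Optional, Tuple


class Guard:
    """An independent slashing-protection check: one message per (height, round)."""

    def __init__(self) -> None:
        self.seen: Dict[Tuple[int, int], bytes] = {}

    def allow(self, height: int, rnd: int, msg: bytes) -> bool:
        prior = self.seen.setdefault((height, rnd), msg)  # records msg if slot is new
        return prior == msg


class RemoteSigner:
    """Holds the key and its own guard; the node cannot skip this check."""

    def __init__(self) -> None:
        self.guard = Guard()

    def sign(self, height: int, rnd: int, msg: bytes) -> Optional[str]:
        if not self.guard.allow(height, rnd, msg):
            return None  # last line of defence: refuse the conflict
        return "sig:" + msg.hex()  # placeholder, not a real signature


class ValidatorNode:
    def __init__(self, signer: RemoteSigner) -> None:
        self.guard = Guard()  # the node's own (possibly buggy or wiped) check
        self.signer = signer

    def request_signature(self, height: int, rnd: int, msg: bytes) -> Optional[str]:
        if not self.guard.allow(height, rnd, msg):
            return None  # first line: node-side check
        return self.signer.sign(height, rnd, msg)
```

For example, after `ValidatorNode(signer).request_signature(5, 0, b"A")` succeeds, a freshly constructed node asking the same signer for `b"B"` at (5, 0) passes its own empty guard but gets `None` from the signer.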

Web3Signer (Ethereum) implements this. CometBFT validators have their own variants, such as TMKMS and Horcrux.

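The DB record and the signature must behave as one atomic step, and the safe order is record first, then sign. If you sign first and the record write then fails (say, the network drops on the way to a remote DB), a signature is out in the world with no record of it; after a restart the DB will approve a conflicting message at the same height. Recording first fails safe: the worst case is a record with no signature, which costs one missed duty. Network-remote DBs widen the window between the two steps, which is why the DB belongs on the signer's machine.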
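Why the order matters can be shown with a crash simulation. A minimal sketch under stated assumptions: a dict stands in for a durable DB that survives the crash, a list stands in for signatures already sent to the network, and both strategy functions are hypothetical:

```python
from typing import Callable, Dict, List, Tuple

Slot = Tuple[int, int]            # (height, round)
Store = Dict[Slot, bytes]         # stands in for a durable DB that survives crashes
Wire = List[Tuple[Slot, bytes]]   # signatures that have already left the machine


class Crash(Exception):
    """Simulated crash or network failure between the two steps."""


def sign_then_record(db: Store, wire: Wire, slot: Slot, msg: bytes, crash: bool) -> None:
    if db.get(slot, msg) != msg:
        return                    # refuse: conflicting message already recorded
    wire.append((slot, msg))      # signature released first...
    if crash:
        raise Crash()             # ...and the record is never written
    db[slot] = msg


def record_then_sign(db: Store, wire: Wire, slot: Slot, msg: bytes, crash: bool) -> None:
    if db.get(slot, msg) != msg:
        return
    db[slot] = msg                # record first
    if crash:
        raise Crash()             # worst case: recorded but never signed (missed duty)
    wire.append((slot, msg))


def produces_conflict(strategy: Callable[..., None]) -> bool:
    """Crash while signing A at (7, 0), restart, then get asked to sign B there."""
    db: Store = {}
    wire: Wire = []
    try:
        strategy(db, wire, (7, 0), b"A", crash=True)
    except Crash:
        pass
    strategy(db, wire, (7, 0), b"B", crash=False)
    return len({m for s, m in wire if s == (7, 0)}) > 1
```

With sign-then-record, the crash leaves a signature for A on the wire and no record, so the restarted process also signs B: two conflicting signatures. With record-then-sign, the restarted process finds A recorded and refuses B; the only loss is the duty it never completed.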

5. Whistleblower watchers

The protocol enforces slashing only when somebody submits the proof. That's where watchers come in. Slashing is cryptographically provable — anyone with the two conflicting signatures can submit a slashing transaction. Most chains pay a small fraction of the slashed stake to the submitter as a whistleblower reward.

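For Ethereum: the whistleblower reward is ~1/512 of the slashed validator's effective balance, paid to the block proposer that includes the proof. Slashing a validator with $1M of effective balance pays ~$2k, enough to incentivize watching for violations.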

Watcher implementations:

  • Scan blockchain for attestations
  • Index by (validator, height, round)
  • Detect conflicts
  • Submit slashing tx
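The four watcher steps can be sketched as a single pass over a stream of attestations. This is a simplified, hypothetical model: signatures are assumed already verified, the `Attestation` fields mirror the (validator, height, round) index above, and submitting the resulting pairs as slashing transactions is chain-specific and left out. An Ethereum watcher would also need the surround-vote check, which a same-key index cannot catch:

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Attestation:
    validator: str
    height: int
    round: int
    block: str
    signature: str  # assumed already verified


def find_conflicts(stream: Iterable[Attestation]) -> List[Tuple[Attestation, Attestation]]:
    """Index by (validator, height, round); pair each conflict with the first vote seen."""
    index: Dict[Tuple[str, int, int], Attestation] = {}
    conflicts: List[Tuple[Attestation, Attestation]] = []
    for att in stream:
        first = index.setdefault((att.validator, att.height, att.round), att)
        if first.block != att.block:
            conflicts.append((first, att))  # two signed votes, different blocks: evidence
    return conflicts
```

Each returned pair is exactly the "two conflicting signatures" a slashing transaction needs.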

If you're building a watcher: it's open-source territory, similar to MEV searchers but for protocol-level violations.

6. The offline validator — inactivity penalty

If a validator is offline:

  • During normal operations: they miss rewards (small daily loss)
  • During finality issues (>1/3 offline): inactivity leak kicks in

Inactivity leak (the mechanism Ethereum uses to force a partitioned chain back to >2/3 online): every epoch, offline validators lose stake. The rate increases the longer finality remains delayed. The chain self-heals — eventually online validators are >2/3 and finality resumes, leaving offline ones with reduced stake.

This is the protocol's response to mass offline events. Instead of halting until enough validators return (as classic BFT chains such as CometBFT do), it slowly removes offline stake until quorum is achievable.
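The self-healing dynamic can be illustrated with a toy model. A sketch under loudly stated assumptions: the offline share loses a fraction `k * t` of itself in epoch `t` after finality stalls (linear growth standing in for "the rate increases the longer finality remains delayed"), online validators lose nothing, and `k` is an arbitrary illustrative constant, not Ethereum's parameter:

```python
def epochs_until_quorum(online: float, offline: float, k: float = 1e-4,
                        max_epochs: int = 100_000) -> int:
    """Epochs of toy leak until online stake is strictly above 2/3 of the total."""
    t = 0
    while 3 * online <= 2 * (online + offline):  # online share still <= 2/3
        t += 1
        if t > max_epochs:
            raise RuntimeError("leak did not restore quorum")
        offline *= max(0.0, 1.0 - k * t)  # leak rate grows with epochs since finality
    return t
```

Starting from `epochs_until_quorum(60, 40)`, the loop runs until offline stake drops below 30 units, at which point the online share passes 2/3 and finality can resume; a set that already has more than 2/3 online returns 0.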

7. The network partition risk

The classic disaster:

  1. Network partition splits validator set
  2. Each side of the partition might believe it is the majority
  3. Each might continue signing on different forks
  4. When partition heals: massive cross-fork slashing

Mitigations:

  • Liveness watchdog: detect partition (no recent blocks from peers); stop signing
  • Network heartbeat: verify connectivity to other validators before signing
  • Forced sync: refuse to sign until catching up with the network
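The three mitigations combine naturally into one fail-closed gate: sign only if every check passes, and refuse otherwise. A minimal sketch; the `NodeHealth` fields, the 3-peer minimum, and the 30-minute timeout are assumptions chosen for illustration, not values from any client:

```python
import time
from dataclasses import dataclass
from typing import Optional

HEARTBEAT_TIMEOUT_S = 30 * 60  # assumed threshold for "no recent blocks from peers"


@dataclass
class NodeHealth:
    last_peer_block_ts: float  # unix time of the newest block received from peers
    connected_peers: int
    local_height: int
    network_height: int        # best height reported by peers


def should_sign(h: NodeHealth, min_peers: int = 3, now: Optional[float] = None) -> bool:
    """Fail closed: sign only if every liveness check passes."""
    now = time.time() if now is None else now
    fresh = now - h.last_peer_block_ts < HEARTBEAT_TIMEOUT_S  # liveness watchdog
    connected = h.connected_peers >= min_peers                 # network heartbeat
    synced = h.local_height >= h.network_height                # forced sync
    return fresh and connected and synced
```

Any single failed check returns `False`, so uncertainty always resolves to "don't sign".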

A well-built validator has several of these checks. Compromised validator software (where these checks are disabled) is a real risk.

Worked scenario: a validator loses contact with most of its peers for 30 minutes. If it continues signing: it might be on a minority partition, producing votes and blocks the rest of the network doesn't see. When the partition heals, its fork is the wrong one; rejoining the canonical chain means signing different messages at heights it already signed, the equivalent of a double sign → slashed.

If it stops signing: 30 minutes of inactivity (small penalty). When the connection is restored, it syncs up and resumes.

Stopping is correct: accept a tiny inactivity penalty rather than risk slashing. The validator software should detect this automatically and fail closed.

8. For my projects

Tempo validator (if you operate one)

  • Slashing-protection DB on each validator node + remote signer
  • 30-minute heartbeat check; refuse to sign if connectivity lost
  • 2-region active-passive (transition only via consensus protocol)
  • Daily DB backup to S3 with versioning

Watcher opportunity

There may be a market for slashing watchers on Tempo (assuming it has slashing at launch). Building a watcher takes a few hundred lines of Rust plus indexing, and could be a small income stream.

9. Practice

  1. Compute: validator with 1000 ETH stake. Slashed by 5% via double-vote. How much do they lose?
  2. Identify: if validator A is online with 95% participation, validator B is offline. After 1 month, which has more stake?
  3. Write pseudo-code for slashing-protection logic
  4. Read the Web3Signer slashing-protection documentation


Final check: in one sentence, why is "stop signing if uncertain" the correct default for a validator? If your answer doesn't reference "slashing > inactivity penalty," re-read §7.

Pass criteria

  • Name the two paths to losing stake and the order of magnitude of each.
  • Distinguish double-vote, surround-vote, and BFT equivocation in one sentence each.
  • Explain why the slashing-protection DB must be local (atomicity with the signature).
  • Sketch the four DB durability requirements.
  • Explain why remote-signer + local DB is defence in depth, not duplication.
  • Recall the Ethereum whistleblower share and what behaviour it incentivises.
  • Walk the inactivity-leak self-healing mechanism.
  • State the fail-closed rule for network partitions and why it dominates.

Summary (3 lines)

  • Two paths to losing stake: slashing (catastrophic, cryptographically provable) and inactivity (gradual, small). All operational decisions reduce to picking the smaller of the two.
  • Slashing-protection DB is the standard defence — it must be local for atomicity, persistent, restart-safe, and backed up. Remote signer + DB stacks them for defence in depth.
  • Network partitions: fail-closed (stop signing) every time — the inactivity tax is always smaller than the slashing risk. Next lesson covers chain-wide coordinated upgrades.