Lesson 3 — Tokio runtime internals
Question
Reth runs on Tokio. Understanding Tokio's internals = understanding Reth's parallelism. Work-stealing scheduler + reactors.
Principle (minimum model)
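- Tokio runtime structure. N worker threads poll async tasks; N defaults to the number of CPU cores. The multi-thread runtime has no dedicated reactor thread: the I/O driver is shared, and a worker that runs out of tasks parks on it.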
- Work-stealing scheduler. Each worker has a queue; idle workers steal from busy ones. Balances load automatically.
- Async tasks vs blocking. Async = lightweight task on the worker pool; blocking = tokio::task::spawn_blocking on a separate thread pool. Don't mix.
- Reth conventions. Per-component async; CPU-bound (revm exec) wrapped in spawn_blocking. Pattern transfers.
- Reactor. Polls epoll/kqueue for I/O readiness and wakes the task waiting on that resource. One driver per runtime, polled by an idle worker rather than by a thread of its own.
- Pitfalls. Holding a std::sync::Mutex guard across .await risks deadlock (and the guard is not Send, so such a future usually won't compile under tokio::spawn). Use tokio::sync::Mutex when a lock must be held across an await.
- Performance. Tokio's scheduler overhead is roughly ~1 µs per task switch, compared to ~10 µs for OS threads (order-of-magnitude figures). Wins on high task counts.
Worked example + steps
Tokio runtime internals
You've been writing #[tokio::main] and sprinkling .await over every async call. Reth has 200+ of those scattered through its codebase, and at peak load it handles thousands of concurrent peer connections plus dozens of background tasks on 8 worker threads. No magic; just a state machine the compiler wrote for you, a work-stealing scheduler, and an epoll loop. This lesson is what's underneath the .await.
1. The runtime stack
Tokio is composed of:
+--------------------+
| Your async code | ← futures
+--------------------+
| Executor | ← polls futures to completion
| (work-stealing) |
+--------------------+
| I/O driver | ← epoll/kqueue (IOCP on Windows), via mio
+--------------------+
When you write async fn, the compiler generates a state machine that implements the Future trait. The executor's job is to call poll() on that state machine until it returns Poll::Ready(value).
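To make the state-machine idea concrete, here is a minimal hand-written sketch of roughly what the compiler produces for an async fn with one suspension point. The type `CountToTwo` and the helper `run_to_completion` are hypothetical names invented for this illustration; real generated state machines are anonymous and also store local variables in their states. It uses only the standard library and assumes Rust 1.85+ for `Waker::noop`.

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, Waker};

/// Hand-written stand-in for a compiler-generated future.
/// Each variant is one resumption point of the imaginary async fn.
enum CountToTwo {
    Start,
    Halfway,
    Done,
}

impl Future for CountToTwo {
    type Output = u32;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        // The enum holds no self-references, so it is Unpin and get_mut is allowed.
        let this = self.get_mut();
        match *this {
            CountToTwo::Start => {
                *this = CountToTwo::Halfway;
                // Ask to be polled again, then yield back to the executor.
                cx.waker().wake_by_ref();
                Poll::Pending
            }
            CountToTwo::Halfway => {
                *this = CountToTwo::Done;
                Poll::Ready(2)
            }
            CountToTwo::Done => panic!("polled after completion"),
        }
    }
}

/// Play the executor's role: call poll() until Poll::Ready.
/// Returns the output and the number of polls it took.
fn run_to_completion() -> (u32, u32) {
    let mut fut = Box::pin(CountToTwo::Start);
    // The no-op waker ignores wake-ups, so this loop simply re-polls eagerly.
    let mut cx = Context::from_waker(Waker::noop());
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return (value, polls);
        }
    }
}
```

Calling `run_to_completion()` from a `main` shows the two-step life of the future: the first poll returns Pending and advances the state, the second returns the value. A real executor would not spin like this; it would wait for the waker before polling again.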
2. Work-stealing in 60 seconds
The problem: 8 worker threads, thousands of tasks. How do you distribute them without all 8 threads fighting over one shared queue? Tokio's answer: give each worker its own local queue (cheap, no contention), plus a fallback global queue. When a worker runs dry, it steals tasks from a busy neighbor's queue.
Before the steal:
Worker A: [task1, task2, task3, task4] ← busy
Worker B: [] ← idle, steals from A
After the steal:
Worker A: [task1, task2]
Worker B: [task3, task4]
This avoids contention on a global mutex while still balancing load.
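The before/after picture can be reproduced with a toy model. This is a sketch under loud assumptions: `Worker`, `steal_half`, and `steal_demo` are hypothetical names, tasks are plain integers, everything runs on one thread, and the choice of which end of the queue gets stolen simply follows the diagram. Tokio's real local queues are lock-free ring buffers and differ in detail.

```rust
use std::collections::VecDeque;

/// Toy model of one worker's local run queue; a task is just an id.
struct Worker {
    local: VecDeque<u32>,
}

/// Move half of the victim's queued tasks to the thief.
/// Returns how many tasks changed hands.
fn steal_half(thief: &mut Worker, victim: &mut Worker) -> usize {
    // The victim keeps the first half (rounded up) and gives away the rest.
    let keep = victim.local.len() - victim.local.len() / 2;
    let stolen = victim.local.split_off(keep);
    let count = stolen.len();
    thief.local.extend(stolen);
    count
}

/// Reproduce the diagram: A starts with four tasks, B starts empty.
fn steal_demo() -> (Vec<u32>, Vec<u32>) {
    let mut a = Worker { local: VecDeque::from(vec![1, 2, 3, 4]) };
    let mut b = Worker { local: VecDeque::new() };
    // An idle worker only steals when its own queue has run dry.
    if b.local.is_empty() {
        steal_half(&mut b, &mut a);
    }
    (a.local.into_iter().collect(), b.local.into_iter().collect())
}
```

Stealing in batches rather than one task at a time is the point of the design: the thief pays the synchronization cost once and then works from its own queue again.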
3. Spawning vs blocking
// Concurrent: spawn both tasks onto the runtime, then await their handles
let h1 = tokio::spawn(async { fetch().await });
let h2 = tokio::spawn(async { fetch().await });
// Each `?` propagates a JoinError (the task panicked or was cancelled)
let (r1, r2) = (h1.await?, h2.await?);
// CPU-heavy work: keep it OFF the async workers
let result = tokio::task::spawn_blocking(|| {
    expensive_sync_calc() // runs on Tokio's separate blocking thread pool
}).await?;
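Rule: never run long CPU-bound or blocking code directly on an async worker; wrap it in spawn_blocking. A worker stuck inside a synchronous call cannot poll any other task until the call returns, so you starve the runtime and the whole node grinds.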
4. Channels — picking the right one
| Channel | Use |
|---|---|
| tokio::sync::mpsc | many producers, one consumer |
| tokio::sync::broadcast | many producers, many consumers; every receiver sees every value (e.g., chain events) |
| tokio::sync::watch | latest value only, no history (e.g., latest block) |
| tokio::sync::oneshot | a single value, request-response |
ExEx uses broadcast for chain notifications because every ExEx wants every event.
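The difference between broadcast and watch is easiest to see in their delivery semantics. The sketch below is a toy model in plain std Rust, not Tokio's API: `ToyBroadcast`, `ToyWatch`, and `compare` are hypothetical names, there is no async and no capacity limit. The real tokio::sync::broadcast channel is bounded, and a receiver that falls too far behind gets a lag error instead of the dropped values.

```rust
use std::collections::VecDeque;

/// Toy broadcast: each subscriber has its own queue and sees every value.
struct ToyBroadcast<T: Clone> {
    queues: Vec<VecDeque<T>>,
}

impl<T: Clone> ToyBroadcast<T> {
    fn new(subscribers: usize) -> Self {
        Self { queues: vec![VecDeque::new(); subscribers] }
    }

    /// Deliver a copy of `value` to every subscriber.
    fn send(&mut self, value: T) {
        for queue in &mut self.queues {
            queue.push_back(value.clone());
        }
    }

    /// Next unseen value for one subscriber, oldest first.
    fn recv(&mut self, subscriber: usize) -> Option<T> {
        self.queues[subscriber].pop_front()
    }
}

/// Toy watch: a single slot; each send overwrites the previous value.
struct ToyWatch<T: Clone> {
    latest: T,
}

impl<T: Clone> ToyWatch<T> {
    fn send(&mut self, value: T) {
        self.latest = value;
    }

    fn borrow(&self) -> T {
        self.latest.clone()
    }
}

/// Send three block numbers through both and report what a slow reader sees.
fn compare() -> (Vec<u64>, u64) {
    let mut events = ToyBroadcast::new(1);
    let mut head = ToyWatch { latest: 0u64 };
    for block in [100u64, 101, 102] {
        events.send(block);
        head.send(block);
    }
    let mut seen = Vec::new();
    while let Some(block) = events.recv(0) {
        seen.push(block);
    }
    (seen, head.borrow())
}
```

The broadcast reader gets all three blocks in order; the watch reader only ever sees the newest one. That is the deciding question when picking between them: does the consumer need history, or just the current state?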
5. Custom executors / Future polling
Eventually you'll want to poll a Future manually:
use std::future::Future;
use std::task::{Context, Poll, Waker};
// Box::pin pins the future on the heap so it can be polled
let mut fut = Box::pin(my_async_fn());
// Waker::noop() (Rust 1.85+) returns a &'static Waker that ignores wake-ups
let mut cx = Context::from_waker(Waker::noop());
match fut.as_mut().poll(&mut cx) {
    Poll::Ready(v) => { /* done: `v` is the output */ }
    Poll::Pending => { /* not yet: a real executor re-polls once its Waker is signaled */ }
}
This is the foundation of writing your own Reth-internal scheduler — useful for, say, batching MEV simulations.
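A single poll with a no-op waker never learns when to try again. As a minimal sketch of the next step, the block below is a one-future executor whose waker unparks the polling thread; it closely follows the pattern shown in the standard library's documentation for `std::task::Wake`. `ThreadWaker` and `YieldOnce` are hypothetical names introduced here, and nothing in it is Reth or Tokio code.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Waker that unparks the thread blocked in `block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Poll `fut` on the current thread until it completes,
/// sleeping between polls until the waker fires.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return value,
            // park() returns once unpark() has been called (or spuriously);
            // either way the loop just polls again.
            Poll::Pending => thread::park(),
        }
    }
}

/// Test future: returns Pending once, waking itself, then completes.
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            return Poll::Ready(());
        }
        self.0 = true;
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}
```

For example, `block_on(async { YieldOnce(false).await; 21 * 2 })` polls twice and returns 42. A scheduler for many tasks adds a run queue on top of this: the waker pushes the task's id onto the queue instead of unparking a thread.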
6. How Reth uses Tokio in production
Reth doesn't expose Tokio directly — it wraps it in a TaskExecutor that adds panic supervision. From crates/tasks/src/runtime.rs:
pub fn spawn_task<F>(&self, fut: F) -> JoinHandle<()>
where
F: Future<Output = ()> + Send + 'static,
pub fn spawn_critical_task<F>(&self, name: &'static str, fut: F) -> JoinHandle<()>
where
F: Future<Output = ()> + Send + 'static,
Two flavors:
- spawn_task: fire and forget. If it panics, Tokio captures the panic in the JoinHandle; if nobody awaits that handle, the panic is silently lost.
- spawn_critical_task: registered with a name; if it panics, a TaskManager channel fires and the whole node shuts down with the task's name in the log.
This is real production discipline: you don't want a silently-dead background task to leave your node running in a degraded state. Critical tasks fail loudly.
The TaskExecutor = Runtime alias lets you pass it through stage code without dragging in raw Tokio types — clean abstraction with the safety net underneath.
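The "fail loudly" idea can be sketched without Tokio at all. The block below is a thread-based analogue, not Reth's implementation: `spawn_critical` and `supervise` are hypothetical names, OS threads stand in for async tasks, and a std mpsc channel stands in for the TaskManager's panic channel. It assumes the default `panic = "unwind"` build setting.

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};
use std::sync::mpsc::{channel, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical critical-task spawner: runs `work` on its own thread and
/// reports the task's name over `panics` if the work panics.
fn spawn_critical<F>(
    name: &'static str,
    panics: Sender<&'static str>,
    work: F,
) -> JoinHandle<()>
where
    F: FnOnce() + Send + 'static,
{
    thread::spawn(move || {
        // Catch the panic here so it becomes a message instead of vanishing.
        if catch_unwind(AssertUnwindSafe(work)).is_err() {
            // The supervisor may already be gone during shutdown; ignore that.
            let _ = panics.send(name);
        }
    })
}

/// Spawn one healthy and one panicking task; return the name the
/// supervisor receives, which is where a node would start shutting down.
fn supervise() -> Option<&'static str> {
    let (tx, rx) = channel();
    let healthy = spawn_critical("healthy", tx.clone(), || {});
    let failing = spawn_critical("engine", tx, || panic!("boom"));
    healthy.join().unwrap();
    failing.join().unwrap();
    rx.recv().ok()
}
```

`supervise()` returns `Some("engine")`: the healthy task sends nothing, and the failing task's name reaches the supervisor even though no one inspected its join result for a panic. The panic message itself still goes to stderr through the default panic hook.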
7. Reading list
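- tokio/tokio/src/runtime/scheduler/multi_thread_alt: an experimental alternative multi-thread scheduler; the default work-stealing scheduler lives in the sibling multi_thread directory
- reth/crates/tasks/src/runtime.rs: Reth's task supervisor wrapping Tokio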
Final check: in one sentence, what's the difference between async fn and fn as Rust types? If your answer is "one returns a Future and one doesn't," go deeper: what is a Future, structurally? The lesson isn't done with you until "Tokio is magic" becomes "Tokio polls compiler-generated state machines on a work-stealing scheduler."
Summary (3 lines)
- Tokio = N work-stealing workers + a shared I/O driver (reactor) on epoll/kqueue. Workers run async tasks; an idle worker polls the driver.
- CPU-bound code → spawn_blocking; async → worker pool. Std Mutex held across .await risks deadlock.
- Reth conventions: per-component async, revm exec wrapped in spawn_blocking. ~1 µs task switch.