Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Design decisions

Engine as pure state machine

The engine has no I/O. Every effect is an Action the host dispatches. This makes the engine:

  • Deterministic. Same inputs, same outputs.
  • Fast to test. The sim runs thousands of chaos steps per second.
  • Host-agnostic. Write your own host for gRPC, QUIC, or in-process testing.

A single step() method

Every event becomes an Event, every effect becomes an Action. No handle_request_vote, handle_append_entries, etc. That keeps the public surface small and the causal ordering of Actions obvious.

Async apply as a separate task

User apply() calls can be slow. If they ran on the driver task, heartbeats would stall. Instead, a bounded mpsc channel decouples commit from apply; the state machine lives on its own task. Backpressure is natural: if apply falls behind, the channel fills, the driver blocks on send, and replication slows. That’s the correct behavior when the state machine is the bottleneck.

Snapshots opaque through the stack

The library never looks inside a snapshot. Compress, encrypt, or hash-chain inside StateMachine::snapshot; undo inside restore. The engine, disk, and wire format all treat the bytes as opaque. Keeps the library dependency-free in that axis and gives users the hooks they want.

ReadIndex via a closure

Node::read_linearizable takes an FnOnce(&S) -> R that ships to the apply task and runs in FIFO order with applies. The read observes post-apply state, not mid-apply, and the StateMachine trait doesn’t have to grow a read method.

Causal action ordering

Every Action that must reach stable storage before a subsequent Send appears earlier in the vector. That’s Figure 2’s “respond to RPCs only after updating stable storage”, enforced at the contract level so hosts can’t accidentally violate it.

No OpenTelemetry dependency

The library emits tracing spans and events with stable targets and field names. OTel is bolted on by the user’s main. See Observability. OTel crates churn too fast to pin in a library.

§4.3 single-server over Joint Consensus

Membership changes are §4.3 single-server. Simpler than Joint Consensus, well-understood edge cases, fits the one-at-a-time add_peer / remove_peer API. Joint Consensus may land post-0.1 if operators need faster reconfiguration.