Architecture

How Reel is put together. This page is written for someone who has never seen the repo.

The one-paragraph version

Reel is a local HTTP proxy. Your application's LLM SDK (OpenAI, Anthropic, Gemini, raw httpx, LangChain, etc.) is configured to send requests to http://127.0.0.1:7878 instead of api.openai.com. In record mode Reel forwards the request to the real provider and writes the full request/response exchange — including timed SSE chunks — into an append-only JSONL cassette. In replay mode Reel matches incoming requests against the cassette and serves the cached response without ever touching the network. Auto mode does the right thing per-request: replay if there's a match, record if there isn't.
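Pointing an SDK at the proxy needs no code changes beyond a base-URL override. A minimal sketch for the official OpenAI Python SDK, which honors the `OPENAI_BASE_URL` environment variable (the port matches this page; in record mode your real API key still travels upstream):

```python
import os

# Reel's default listen address, per this page.
REEL_URL = "http://127.0.0.1:7878"

# The official OpenAI Python SDK reads OPENAI_BASE_URL when constructing a
# client, so redirecting it through Reel requires no code changes:
os.environ["OPENAI_BASE_URL"] = f"{REEL_URL}/v1"

# Equivalent explicit form (shown for illustration):
#   from openai import OpenAI
#   client = OpenAI(base_url=f"{REEL_URL}/v1")
```

Other SDKs expose the same knob under different names (e.g. a `base_url` constructor argument); the point is that Reel sits behind whatever override your client library already provides.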

High-level diagram

┌────────────┐   HTTP / SSE    ┌──────────────────────────┐   HTTP / SSE   ┌──────────────┐
│ Your app   │ ───────────────►│           Reel           │ ──────────────►│  OpenAI /    │
│ (any lang) │                 │     (local proxy :7878)  │                │  Anthropic / │
│            │ ◄───────────────│                          │ ◄──────────────│  Gemini      │
└────────────┘                 └──────────────────────────┘                └──────────────┘
                          ┌───────────────┼────────────────┐
                          ▼               ▼                ▼
                   ┌────────────┐  ┌────────────┐   ┌────────────┐
                   │  adapters  │  │  cassette  │   │   redact   │
                   │ (per prov.)│  │ (JSONL r/w)│   │ (secrets,  │
                   │            │  │            │   │  PII)      │
                   └────────────┘  └────────────┘   └────────────┘
                          │               │                │
                          └──────────┬────┴────────────────┘
                                ┌─────────┐
                                │   CLI   │  reel record | replay | auto | inspect | cost | diff
                                └─────────┘

Modules

Module      Responsibility
proxy/      Async HTTP server, request/response forwarding, SSE streaming, mode dispatch (record / replay / auto)
adapters/   Provider-specific request fingerprinting and response normalization: openai.py, anthropic.py, gemini.py
cassette/   JSONL read/write, schema, matching engine (exact / normalized / ignore-fields / fuzzy)
redact/     Secret + PII scrubbing on capture and post-hoc
cli/        The typer-based CLI
sdk/        The @cassette decorator + pytest plugin

Three operating modes

record

Every request is forwarded upstream. The full exchange — request, response, and (if streaming) timed chunks — is appended to the cassette. Use this for first-pass capture and refresh.

replay

Requests are matched against the cassette and served locally. A cache miss returns HTTP 404 — a loud failure beats a silent regression. Use this in CI.

auto

Replay if there's a match; record if there isn't. The default for local dev loops.
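The three modes collapse to one dispatch decision per request. A sketch, with illustrative stand-in callables rather than Reel's internals:

```python
def handle(mode, cassette_hit, record_upstream, serve_cached, serve_404):
    """Per-request mode dispatch. The callables are illustrative stand-ins:
    record_upstream forwards and appends to the cassette, serve_cached replays
    a matched entry, serve_404 is replay's loud cache-miss failure."""
    if mode == "replay":
        # Never touch the network; a miss is a hard 404.
        return serve_cached() if cassette_hit else serve_404()
    if mode == "record":
        # Always forward, always append.
        return record_upstream()
    # auto: replay on hit, record on miss.
    return serve_cached() if cassette_hit else record_upstream()
```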

Cassette format

One call per line. Plain JSONL.

{
  "id": "req_01",
  "ts": "2026-05-15T10:23:11Z",
  "provider": "openai",
  "endpoint": "/v1/chat/completions",
  "request": {
    "model": "gpt-5",
    "messages": [...],
    "stream": true,
    "_hash": "sha256:..."
  },
  "response": {
    "status": 200,
    "stream_chunks": [
      {"delta": "Hello", "t_offset_ms": 142},
      {"delta": " world", "t_offset_ms": 198}
    ],
    "final": { ... }
  },
  "meta": {
    "tokens_in": 412,
    "tokens_out": 89,
    "cost_usd": 0.0021,
    "ttft_ms": 142,
    "total_ms": 890
  }
}

JSONL was chosen because cassettes need to be:

  • Diff-friendly in PRs — line-level reviews
  • Greppable without parsing JSON
  • Append-safe — no rewrite-the-world on record
  • Splittable — large cassettes can be sharded by test name

A first line beginning with {"_meta": {...}} is reserved for per-cassette config (e.g. match-mode override).
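A reader that honors that convention is a few lines. This is a sketch against the format shown above, not Reel's actual parser:

```python
import json

def read_cassette(fp):
    """Parse a Reel-style JSONL cassette from a file-like object.

    A first line holding {"_meta": ...} is per-cassette config; every
    other non-blank line is one recorded call. Illustrative sketch.
    """
    meta, calls = {}, []
    for lineno, raw in enumerate(fp):
        if not raw.strip():
            continue
        obj = json.loads(raw)
        if lineno == 0 and "_meta" in obj:
            meta = obj["_meta"]
        else:
            calls.append(obj)
    return meta, calls
```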

Streaming fidelity

For SSE responses:

  • During capture, every data: ...\n\n frame is timestamped relative to the first byte of the response.
  • During replay, frames are emitted with asyncio.sleep between them so TTFT and inter-chunk gaps mirror the original.
  • Three timing modes are exposed on the CLI (--timing): realtime (default), fast (no sleeps), slow=<N> (chaos testing).

This matters because behavior under load — timeouts, partial buffering, race conditions in agent loops — depends on TTFT and inter-chunk gaps, not just the final text.
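The pacing logic amounts to sleeping the inter-chunk gap before each emit. A sketch, with illustrative names; `speed=1.0` roughly corresponds to realtime and `speed=0` to fast:

```python
import asyncio

async def replay_stream(chunks, emit, speed=1.0):
    """Re-emit captured chunks with their original pacing.

    Each chunk carries t_offset_ms relative to the first response byte,
    as in the cassette format above; the first sleep therefore reproduces
    TTFT. Illustrative sketch, not Reel's internals.
    """
    elapsed_ms = 0
    for chunk in chunks:
        # Sleep only the gap since the previous chunk, scaled by speed.
        await asyncio.sleep((chunk["t_offset_ms"] - elapsed_ms) / 1000 * speed)
        elapsed_ms = chunk["t_offset_ms"]
        emit(chunk["delta"])
```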

Request matching

Different test contexts need different strictness:

Mode                  Behavior
exact                 Byte-for-byte request equality
normalized (default)  Whitespace and JSON key order normalized before comparison
ignore-fields         User-specified fields are skipped (e.g. request_id, timestamps)
fuzzy                 Embedding similarity on prompt text (optional reel[fuzzy] install)

Per-cassette config lives in the optional first-line _meta entry, so once a cassette picks a mode every replay honors it.
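The default normalized mode boils down to canonicalizing the request before hashing, so two requests that differ only in serialization compare equal. A sketch (function name illustrative):

```python
import hashlib
import json

def normalized_fingerprint(request: dict) -> str:
    """Sketch of the `normalized` match mode: key order and insignificant
    whitespace are canonicalized before hashing."""
    canon = json.dumps(request, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canon.encode()).hexdigest()
```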

Provider adapters

Each adapter implements a small interface:

  • Fingerprint a request — compute a stable hash over the bytes that matter (model, messages, tools, stream flag) while ignoring fields that drift between identical-in-intent calls (whitespace, key order).
  • Identify a response shape — non-stream vs. SSE vs. server-sent JSON line stream (Gemini).
  • Surface a provider tag on captured entries.

This is where the per-provider quirks live. Everything outside adapters/ is provider-agnostic.
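That small interface can be sketched as a typing.Protocol; the names and signatures below are illustrative, not Reel's actual classes:

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class ProviderAdapter(Protocol):
    """The per-provider surface described above; everything else is generic."""

    provider: str  # tag stamped on captured cassette entries

    def fingerprint(self, request: dict) -> str:
        """Stable hash over the bytes that matter (model, messages, tools,
        stream flag), ignoring drift like whitespace and key order."""
        ...

    def response_kind(self, headers: dict) -> str:
        """One of "json", "sse", or "json-lines" (Gemini's line stream)."""
        ...
```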

Redaction

Two layers run on every captured response:

  1. Secret scrub — regex patterns for OpenAI / Anthropic / Google / GitHub / AWS / Slack key shapes and Bearer tokens.
  2. PII scrub — emails and phone numbers (default on; opt out with REEL_REDACT_PII=0).

Request headers are never serialized. API keys live there, and Reel drops them by design.

The reel redact CLI re-runs both passes on an existing cassette. The repo-local pre-commit-cassette-check.py hook refuses to commit any *.jsonl whose staged content still matches a secret pattern.
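The secret-scrub layer amounts to pattern substitution over the serialized capture. A sketch with two illustrative patterns; the real pass covers the full set of provider key shapes listed above:

```python
import re

# Illustrative patterns only; the real scrub also covers Google, GitHub,
# AWS, and Slack key shapes.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9_-]{20,}"),  # OpenAI-style secret keys
    re.compile(r"Bearer\s+[\w.~+/-]+"),    # Authorization bearer tokens
]

def scrub(text: str) -> str:
    """Replace anything matching a known secret shape before it hits disk."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text
```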

Non-goals

  • Not an eval framework. Reel records facts; it doesn't grade outputs.
  • Not an observability platform. Reel works against local JSONL. Ship cassettes to your store of choice for dashboards.
  • Not inference. Reel never generates tokens itself.
  • No telemetry, ever.

Source

See ARCHITECTURE.md in the repo for the canonical version of this document.
