# Reel — VCR for LLM APIs
Record real calls to OpenAI, Anthropic, and Gemini once, then replay them deterministically in tests, CI, and your local dev loop — for free, forever. No mocks. No SDK changes. No real network in CI. No surprise bills.
Reel is a local HTTP proxy that sits between your code and the LLM provider. On first call it forwards upstream and captures the wire-level request/response. On every call after, it replays from disk in ~3 ms. Cassettes are plain JSONL — you can grep them, jq them, git diff them in PRs. Secrets and PII are scrubbed at capture time.
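Because it sits at the HTTP layer, any client can exercise it; here is a minimal `curl` sketch, assuming the proxy is already running on the quick-start port shown below:

```bash
# Same wire format as api.openai.com; only the host changes
curl http://127.0.0.1:7878/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-5","messages":[{"role":"user","content":"Say hi"}]}'
```

The first call forwards upstream and is written to the cassette; the identical call afterwards is served from disk.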
Install and record your first cassette in 5 minutes · GitHub repo
## See it in action — Claude Opus demo
The same `claude -p` job run three times against three real markdown docs. The first run records and pays real Opus tokens. Runs 2 and 3 serve every call from disk in 2-3 ms and pay nothing.
Per-call latency in this exact run, from the proxy log:
| Run | Call 1 | Call 2 | Call 3 |
|---|---|---|---|
| 1 (record) | 1865 ms | 1708 ms | 2183 ms |
| 2 (replay) | 2 ms | 2 ms | 3 ms |
| 3 (replay) | 2 ms | 2 ms | 3 ms |
Output bytes are identical across runs. The cassette stays at 3 entries — replay never re-records. Reproduce locally with the bundled `opus-demo.sh` script.
## Why this exists
LLM tests are flaky and expensive. A pytest suite that calls `OpenAI().chat.completions.create(...)` in 40 tests bills real money on every CI run — multiply by every PR push, every retry, every contributor. With Reel: record once locally, commit the cassette, run CI with `pytest --reel-mode replay` for $0.
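Concretely, the workflow is a one-time recording pass plus a committed cassette. A sketch, assuming the plugin accepts `record` the same way the docs show it accepting `replay`:

```bash
# Once, locally: hits the real APIs and writes the cassettes
pytest --reel-mode record

# Commit the cassettes, then in CI: no network, no spend
pytest --reel-mode replay
```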
Production bugs in LLM responses are impossible to reproduce. A user reports a weird answer; you have logs but no way to replay the exact call from a different machine. Reel cassettes are portable byte-for-byte recordings of what the upstream actually returned.
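That means a bug report can travel as a file. A sketch of the repro loop, with a hypothetical cassette name, assuming `replay` mode takes the same `--cassette` flag as `auto`:

```bash
# Teammate attaches the cassette to the issue; replay it anywhere
reel replay --cassette bug-1234.jsonl &
export OPENAI_BASE_URL=http://127.0.0.1:7878/v1
python repro.py   # sees byte-for-byte what the user's call returned
```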
Prompt iteration burns tokens on every tweak. A two-hour prompt-engineering session can send the same prompt 100 times, paying for the same tokens each time. Reel makes each unique prompt cost real money exactly once.
AI coding agents are slow. Aider, opencode, Claude Code, Cursor, Codex CLI — most of them send the same file context, tool definitions, and embeddings to the LLM many times during a session. Reel caches the deterministic parts.
## 30-second demo
```bash
# 1. Install
pip install reel-vcr

# 2. Start Reel in auto mode (records first time, replays after)
reel auto --cassette tests/cassettes/quickstart.jsonl &

# 3. Point your SDK at it
export OPENAI_BASE_URL=http://127.0.0.1:7878/v1
export OPENAI_API_KEY=sk-...   # real key — Reel forwards it on first run only

# 4. Run your code. First run records. Every run after replays.
python my_app.py
```
That's it. The cassette is plain JSONL:
{"id":"req_01","provider":"openai","endpoint":"/v1/chat/completions",
"request":{"model":"gpt-5","messages":[...]},
"response":{"status":200,"body":{...}}}
Diff cassettes in PRs. Grep them. Share them. They're regular files.
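For instance, since each cassette line is one JSON object with the shape above, ordinary shell tools apply as-is (cassette path from the quick-start; search phrase illustrative):

```bash
# Count which models the recorded suite actually called
jq -r '.request.model' tests/cassettes/quickstart.jsonl | sort | uniq -c

# Find the entry that produced a given phrase in a response
grep -n 'refund policy' tests/cassettes/quickstart.jsonl
```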
## How Reel compares
| Tool | Layer | Non-Python clients? | SSE streaming? | Survives SDK transport swaps? |
|---|---|---|---|---|
| Reel | HTTP proxy | ✅ Yes (any language) | ✅ With timing fidelity | ✅ Yes — transport-agnostic |
| VCR.py / pytest-recording / pytest-vcr | Monkey-patches urllib3 / requests | ❌ Python only | Partial | ❌ Breaks when SDK changes transport |
| respx / pytest-httpx | Mocks httpx clients | ❌ Python only | Limited | ❌ Coupled to httpx |
| llm-test-harness | Wraps the SDK in Python (`harness.wrap(...)`) — bundles eval scoring | ❌ Python only | Limited | ❌ Coupled to specific SDK clients |
| agent-vcr | Records JSON-RPC for MCP servers (different layer entirely) | n/a — MCP, not LLM HTTP | n/a | n/a |
| WireMock / MockServer | HTTP proxy (Java) | ✅ Yes | Manual fixtures | ✅ Generic, not LLM-aware |
| Hand-rolled mocks | Inside your code | ❌ | Whatever you write | ❌ Whenever you forget to update them |
The trade-off: VCR.py is easier to drop into a single test in a single file. Reel is easier to use across a whole project and with any client that respects the standard `OPENAI_BASE_URL` / `ANTHROPIC_BASE_URL` env-var convention — including non-Python clients like Cursor and Aider.
## What works today
- OpenAI, Anthropic, Gemini HTTP APIs with path-based routing on a single proxy port
- Any OpenAI-compatible upstream: Ollama, NVIDIA NIM, vLLM, LM Studio, Groq, Together, OpenRouter
- Three modes: `record`, `replay`, `auto`
- SSE streaming captured chunk-by-chunk with millisecond timing fidelity (`--timing realtime | fast | slow=N`; see the sketch after this list)
- Smart matching: `exact`, `normalized`, `ignore-fields`, `fuzzy` (sentence-transformers cosine similarity)
- Capture-time redaction of API keys, Bearer tokens, AWS keys, GitHub PATs, emails, US phone numbers
- First-class pytest plugin — `pytest --reel-mode replay` for zero-network CI
- Analytics CLI — `reel inspect / cost / diff / stats / doctor`
- Local web inspector — `reel ui` for browsing cassettes in a browser (Starlette + HTMX, no JS build step)
- Pre-commit hook that refuses to commit cassettes still containing detectable secret patterns
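As an illustration of the timing flag (cassette name illustrative; assumes `--timing` combines with `replay` mode as the option list suggests):

```bash
# Replay SSE chunks with the recorded inter-chunk delays
reel replay --cassette stream.jsonl --timing realtime

# Collapse the stream for the fastest possible CI runs
reel replay --cassette stream.jsonl --timing fast
```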
## Get started
Install and record your first cassette in 5 minutes
Or jump to a specific topic:
- Add Reel to a pytest suite in 60 seconds
- Use replay mode in CI
- Run all three providers off one proxy
- Keep secrets out of committed cassettes
- CLI reference
- Architecture overview
- Roadmap
## Frequently asked
**Is this just VCR.py with extra steps?**
No. VCR.py monkey-patches Python HTTP clients, so when OpenAI or Anthropic ship a new SDK with a different transport, it silently breaks. Reel is an HTTP proxy: it sees the actual bytes on the wire, which makes it language-agnostic and SDK-agnostic.
**How is Reel different from llm-test-harness and agent-vcr?**
llm-test-harness wraps the SDK at the Python client level and bundles eval scoring — same Python-only / SDK-coupled shape as VCR.py. Reel sits one layer below as a language-agnostic HTTP proxy, and stays out of eval/scoring on purpose. agent-vcr records JSON-RPC for MCP servers (a different layer entirely) — it's complementary to Reel, not competitive: cassette your MCP tool servers with agent-vcr, cassette the LLM calls underneath with Reel.
**Will it work with Claude Code, Aider, opencode, Cursor, Codex CLI?**
Yes. All of them respect the standard `OPENAI_BASE_URL` / `ANTHROPIC_BASE_URL` env-var convention. Cursor needs one settings-file line. Verified live with Claude Code, opencode, and Aider.
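A sketch of an agent session through Reel (cassette name illustrative; the exact base-URL value your client expects may include or omit a path suffix):

```bash
# Start Reel in auto mode, then point the agent's Anthropic traffic at it
reel auto --cassette agent-session.jsonl &
export ANTHROPIC_BASE_URL=http://127.0.0.1:7878
claude -p "explain this repo"   # first run records; repeats replay from disk
```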
**What about API keys in committed cassettes?**
Reel never captures request headers — that's where keys live. Response bodies are scanned for `sk-*`, `sk-ant-*`, `AIza*`, `ghp_*`, `AKIA*`, and Bearer-token patterns; matches are redacted before write. A bundled pre-commit hook refuses to commit cassettes still containing detectable secrets.
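The hook automates this check, but a rough manual audit using the same documented patterns might look like:

```bash
# Scan cassettes for the key shapes listed above (heuristic, not the real hook)
grep -nE 'sk-[A-Za-z0-9]{8,}|sk-ant-|AIza[0-9A-Za-z_-]{10,}|ghp_|AKIA[0-9A-Z]{8,}' \
  tests/cassettes/*.jsonl && echo "possible secret found"
```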
**Does it work with local models — Ollama, vLLM, LM Studio?**
Yes. Anything OpenAI-compatible. Even with local models the win is real: replay is ~3 ms while local inference is 200-2000 ms.
**Why `pip install reel-vcr` but `import reel`?**
Bare `reel` on PyPI was taken by an unrelated async-subprocess library. Same convention as Pillow (`pip install pillow` → `import PIL`).
**Is there a Reel Cloud?**
No, and no plan to build one until there's clear pull. Runs entirely on 127.0.0.1, zero telemetry, no phone-home, Apache 2.0.
More questions: GitHub Discussions · Open an issue