How It Works
Quell's pipeline has four stages: read specs → check coverage → synthesize tests → verify and write.
1. Spec readers
Quell reads every spec source it can find in your codebase:
- Docstrings: `raises`, `returns`, and constraint language in function docstrings
- Pydantic models: field validators, `Field(ge=0)`, `model_validator`
- Type annotations: `Optional`, `Literal`, `Union`, guard clauses
- Bug descriptions: natural language via `quell reproduce`
Every source produces the same unified Requirement model, so later stages never branch on where a requirement came from.
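To make the idea of a unified model concrete, here is a minimal sketch of one reader feeding it. The `Requirement` fields and the `from_docstring` helper are hypothetical names chosen for illustration, not Quell's actual API, and the parser only recognizes simple `SomeError: reason` lines.

```python
import inspect
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    """Hypothetical unified record; the field names are assumptions."""
    target: str   # name of the function the requirement applies to
    kind: str     # e.g. "raises", "range", "enum"
    detail: str   # human-readable constraint
    source: str   # which reader produced it, e.g. "docstring"


def from_docstring(func) -> list[Requirement]:
    """Turn docstring lines like 'ValueError: if x < 0' into requirements."""
    doc = inspect.getdoc(func) or ""
    pattern = re.compile(r"^\s*(\w+(?:Error|Exception)):\s*(.+)$", re.MULTILINE)
    return [
        Requirement(func.__name__, "raises", f"{exc}: {why}", "docstring")
        for exc, why in pattern.findall(doc)
    ]


def withdraw(balance: float, amount: float) -> float:
    """Subtract amount from balance.

    Raises:
        ValueError: if amount is negative
    """
    if amount < 0:
        raise ValueError("amount must be non-negative")
    return balance - amount


requirements = from_docstring(withdraw)
```

A Pydantic or type-annotation reader would return the same `Requirement` type, which is what lets later stages stay source-agnostic.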
2. Coverage checker
Quell AST-scans your existing test files and marks each requirement as covered or uncovered. When uncertain, it marks uncovered — a duplicate test is better than a missed gap.
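The conservative matching described above can be sketched with the standard `ast` module. This is an illustration under assumptions, not Quell's real matcher: `is_covered` and `_calls` are hypothetical names, and the only evidence accepted is a call to the target inside a `pytest.raises(...)` block in a `test_` function.

```python
import ast


def is_covered(test_source: str, target: str, exception: str) -> bool:
    """True only if a test clearly calls `target` inside
    `pytest.raises(<exception>)`. Anything ambiguous counts as uncovered."""
    try:
        tree = ast.parse(test_source)
    except SyntaxError:
        return False  # unreadable test file: assume the gap is real

    for func in ast.walk(tree):
        if not (isinstance(func, ast.FunctionDef) and func.name.startswith("test_")):
            continue
        for node in ast.walk(func):
            if not isinstance(node, ast.With):
                continue
            for item in node.items:
                ctx = item.context_expr
                if (
                    isinstance(ctx, ast.Call)
                    and isinstance(ctx.func, ast.Attribute)
                    and ctx.func.attr == "raises"
                    and any(isinstance(a, ast.Name) and a.id == exception for a in ctx.args)
                    and _calls(node, target)
                ):
                    return True
    return False


def _calls(block: ast.AST, name: str) -> bool:
    """True if `name(...)` is called anywhere inside `block`."""
    return any(
        isinstance(n, ast.Call) and isinstance(n.func, ast.Name) and n.func.id == name
        for n in ast.walk(block)
    )


existing_tests = '''
import pytest

def test_negative_amount():
    with pytest.raises(ValueError):
        withdraw(10, -1)

def test_happy_path():
    assert withdraw(10, 3) == 7
'''

covered = is_covered(existing_tests, "withdraw", "ValueError")
uncovered = is_covered(existing_tests, "withdraw", "TypeError")
```

Every early exit returns `False`, so uncertainty produces a possibly redundant test rather than a missed gap.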
3. Test synthesis
For uncovered requirements, Quell tries the rule engine first:
- Rule engine: fast, deterministic, handles ~75% of cases (range checks, null guards, type errors, enum violations)
- LLM engine: fallback for complex cases, called only when the rule engine can't handle the requirement
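The rule-first ordering amounts to a lookup with a fallback. The sketch below assumes a hypothetical rule table keyed by requirement kind and an `llm` callable supplied by the caller; neither reflects Quell's internal interfaces.

```python
from typing import Callable, Optional

# Hypothetical: a rule is a template that renders test source code.
Rule = Callable[[str, str], str]


def range_rule(target: str, bad_args: str) -> str:
    """Render a pytest test that expects ValueError for out-of-range input."""
    return (
        f"def test_{target}_rejects_out_of_range():\n"
        f"    with pytest.raises(ValueError):\n"
        f"        {target}({bad_args})\n"
    )


RULES: dict[str, Rule] = {"range": range_rule}


def synthesize(kind: str, target: str, detail: str,
               llm: Optional[Callable[[str], str]] = None):
    """Try the deterministic rule first; call the LLM only on a miss."""
    rule = RULES.get(kind)
    if rule is not None:
        return "rule", rule(target, detail)
    if llm is not None:
        return "llm", llm(f"Write a pytest test for {target}: {detail}")
    return "none", None


# A stub stands in for the LLM so the example runs offline.
llm_calls = []


def fake_llm(prompt: str) -> str:
    llm_calls.append(prompt)
    return "# generated by stub"


rule_result = synthesize("range", "withdraw", "10, -1", llm=fake_llm)
llm_result = synthesize("state_machine", "checkout", "cannot pay twice", llm=fake_llm)
```

The stub is called once, for the requirement kind that has no rule.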
4. The 5-gate verification pipeline (THE MOAT)
This is what separates Quell from coverage tools. Every generated test must pass all 5 gates:
| Gate | Name | Description |
|---|---|---|
| 1 | AST Valid | Parses to valid Python AST before any execution |
| 2 | Original | Test not already present in any test file |
| 3 | Secure | No shell calls, no filesystem writes, no network |
| 4 | Passes Correct | Test must pass against original (correct) code |
| 5 | Fails Violated | Test must fail when the requirement is violated (guard removed) |
Gates 4 and 5 are the moat. Gate 5 injects a violation into the source (removes the guard), runs the test, and expects it to fail. If it doesn't fail, the test doesn't catch the bug — and Quell won't write it.
Source files are always restored in a finally block after gate 5. No side effects.
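Gates 4 and 5 together behave like a single-mutant mutation test. The following sketch shows the shape of that check, including the restore in `finally`. It assumes the guard can be removed by plain text replacement and runs the test with `exec`; how Quell actually injects violations or executes tests is not specified here, and `run_test` and `verify` are hypothetical names.

```python
import tempfile
from pathlib import Path


def run_test(source_path: Path, test_code: str) -> bool:
    """Execute the source, then the test, in one fresh namespace."""
    namespace: dict = {}
    try:
        exec(source_path.read_text(), namespace)
        exec(test_code, namespace)
        return True
    except Exception:
        return False


def verify(source_path: Path, guard: str, test_code: str) -> tuple[bool, bool]:
    """Return (gate4_passed, gate5_passed) and leave the file unchanged."""
    original = source_path.read_text()
    gate4 = run_test(source_path, test_code)  # must pass on correct code
    try:
        # Assumed violation: delete the guard text from the source file.
        source_path.write_text(original.replace(guard, ""))
        gate5 = not run_test(source_path, test_code)  # must fail without guard
    finally:
        source_path.write_text(original)  # always restore the source
    return gate4, gate5


GUARD = '    if amount < 0:\n        raise ValueError("negative amount")\n'
SOURCE = "def withdraw(balance, amount):\n" + GUARD + "    return balance - amount\n"

STRONG_TEST = (
    "try:\n"
    "    withdraw(10, -1)\n"
    "except ValueError:\n"
    "    pass\n"
    "else:\n"
    "    raise AssertionError('expected ValueError')\n"
)
WEAK_TEST = "assert withdraw(10, 3) == 7\n"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "bank.py"
    path.write_text(SOURCE)
    strong = verify(path, GUARD, STRONG_TEST)
    weak = verify(path, GUARD, WEAK_TEST)
    restored = path.read_text() == SOURCE
```

The weak test passes gate 4 but also passes against the violated code, so it fails gate 5: it exercises `withdraw` without ever checking the guard.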
5. The three-bucket output
After verification, each requirement lands in one bucket:
- ✓ WRITTEN — All 5 gates passed. Test written to disk via libcst (AST-safe injection, never string pasting).
- ~ SCAFFOLDED — Gates 1–3 passed but gate 4 or 5 couldn't be verified. Quell writes a stub with a clear TODO comment.
- ✗ FLAGGED — No automatable test path. Side effects, non-determinism, or external services. Documented with exact reason.
Nothing is silently dropped. Every requirement is accounted for.
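The bucket rules can be written as a small total function, which is one way to guarantee that no requirement falls through. In this sketch `Bucket` and `classify` are hypothetical names, a gate result of `None` means "could not be verified", and routing a failure in gates 1 to 3 to FLAGGED is an assumption, since the text above doesn't say where those land.

```python
from enum import Enum
from typing import Optional


class Bucket(Enum):
    WRITTEN = "written"
    SCAFFOLDED = "scaffolded"
    FLAGGED = "flagged"


def classify(gates: dict[int, Optional[bool]], automatable: bool = True) -> Bucket:
    """Map gate results (True, False, or None for unverified) to a bucket."""
    if not automatable:
        return Bucket.FLAGGED
    if all(gates.get(g) is True for g in range(1, 6)):
        return Bucket.WRITTEN
    if all(gates.get(g) is True for g in (1, 2, 3)):
        return Bucket.SCAFFOLDED  # gates 1-3 held; 4 or 5 did not verify
    return Bucket.FLAGGED  # assumption: early-gate failures are flagged


all_pass = classify({1: True, 2: True, 3: True, 4: True, 5: True})
unverified = classify({1: True, 2: True, 3: True, 4: True, 5: None})
external = classify({}, automatable=False)
```

Because the last branch is unconditional, every input lands in exactly one bucket.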
The Production Readiness Score (PRS)
PRS is the weighted share of requirements that have a test behind them: WRITTEN tests count in full, SCAFFOLDED stubs count half, and FLAGGED requirements count zero. It's not coverage; it's a measure of whether your edge cases are actually tested.
PRS = (WRITTEN × 1.0 + SCAFFOLDED × 0.5) / total_requirements × 100
A score of 80+ means production ready. Below 60 means your edge cases need attention.
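As a worked example, the formula translates directly into a few lines of Python. The function name and the input validation are illustrative additions, not part of Quell.

```python
def production_readiness_score(written: int, scaffolded: int, total: int) -> float:
    """PRS = (WRITTEN * 1.0 + SCAFFOLDED * 0.5) / total_requirements * 100."""
    if total <= 0:
        raise ValueError("total_requirements must be positive")
    if written < 0 or scaffolded < 0 or written + scaffolded > total:
        raise ValueError("bucket counts are inconsistent with the total")
    return (written * 1.0 + scaffolded * 0.5) / total * 100


# 20 requirements: 14 written, 4 scaffolded, 2 flagged.
score = production_readiness_score(written=14, scaffolded=4, total=20)
```

Here the 4 scaffolded stubs contribute as much as 2 written tests, giving (14 + 2) / 20 × 100 = 80, which just reaches the production-ready threshold. The 2 flagged requirements add nothing to the numerator but still count in the denominator.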