Quell reads your docstrings, Pydantic models, and type annotations, extracts every testable requirement, finds which ones have no test, generates pytest tests via a rule engine, verifies each test through a 5-gate pipeline, and writes only proven tests to disk.

Does Quell require an LLM API key?

No. The rule engine runs entirely in-process, so on that path no source code is transmitted; it handles roughly 75% of edge cases with no network call and no API key. The LLM fallback is opt-in and sends only the function signature, never the function body.

What is the 5-gate pipeline?

Every generated test must pass: Gate 1 (parses to a valid Python AST), Gate 2 (not already in a test file), Gate 3 (no shell calls, file writes, or network access), Gate 4 (passes against the original code), Gate 5 (fails when the requirement is violated). Only tests that clear all five gates are written to disk as verified tests.

What is the Production Readiness Score (PRS)?

PRS = (WRITTEN × 1.0 + SCAFFOLDED × 0.5) / total_requirements × 100. Tiers: 80-100 Production Ready, 60-79 Review Needed, 0-59 Needs Work.

How is Quell different from GitHub Copilot or Qodo for test generation?

Quell reads specifications that already exist in your code — it does not generate tests from scratch. It finds requirements already documented in your docstrings, Pydantic models, and type annotations that have no test. The 5-gate pipeline, especially Gate 5 (violation injection), verifies each test actually catches the bug it claims to catch. This verification step is not present in Copilot, Qodo, or Hypothesis.

Can Quell be used in CI pipelines?

Yes. Run quell ci src/ --threshold 80 to fail CI if PRS falls below 80. Set prs_threshold in pyproject.toml under [tool.quell]. Works with GitHub Actions, GitLab CI, and any system that checks exit codes.

How It Works

Quell's pipeline has four stages: read specs → check coverage → synthesize tests → verify and write.

1. Spec readers

Quell reads every spec source it can find in your codebase:

Docstrings — raises, returns, and constraint language in function docstrings
Pydantic models — field validators, Field(ge=0), model_validator
Type annotations — Optional, Literal, Union, guard clauses
Bug descriptions — natural language via quell reproduce

Every source produces the same unified Requirement model, so later stages follow a single path regardless of where a requirement came from.
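
To make the idea of a spec reader concrete, here is a minimal sketch of a docstring reader that turns a Google-style "Raises:" section into requirement records. The Requirement fields, the read_docstring_raises helper, and the withdraw example function are all hypothetical; Quell's actual model and parsing rules are not described in this document.

```python
import inspect
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    """Hypothetical unified record; the field names are illustrative."""
    function: str
    kind: str        # e.g. "raises"
    exception: str
    condition: str
    source: str      # which reader produced it, e.g. "docstring"


def withdraw(balance: float, amount: float) -> float:
    """Subtract amount from balance.

    Raises:
        ValueError: if amount is negative
        ValueError: if amount exceeds balance
    """
    if amount < 0:
        raise ValueError("amount must be non-negative")
    if amount > balance:
        raise ValueError("insufficient funds")
    return balance - amount


# Matches lines like "ValueError: if amount is negative".
_RAISES_LINE = re.compile(r"^\s*(\w+(?:\.\w+)*):\s*(.+)$")


def read_docstring_raises(func) -> list[Requirement]:
    """Extract one Requirement per entry in a 'Raises:' section."""
    doc = inspect.getdoc(func) or ""
    requirements = []
    in_raises = False
    for line in doc.splitlines():
        if line.strip() == "Raises:":
            in_raises = True
            continue
        if in_raises:
            match = _RAISES_LINE.match(line)
            if match is None:
                break  # a blank or unrecognized line ends the section
            requirements.append(
                Requirement(func.__name__, "raises", match.group(1),
                            match.group(2).strip(), "docstring")
            )
    return requirements


for req in read_docstring_raises(withdraw):
    print(req)
```

A Pydantic or type-annotation reader would emit the same record type, which is what lets the later stages ignore where a requirement originated.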

2. Coverage checker

Quell AST-scans your existing test files and marks each requirement as covered or uncovered. When uncertain, it marks uncovered — a duplicate test is better than a missed gap.
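
The conservative scan described above might look roughly like the following, using the standard-library ast module. The is_covered helper and the sample test file are hypothetical. This sketch matches only on function name and exception type, so it is coarser than a real checker would need to be; its point is the default, where anything it cannot prove is reported as uncovered.

```python
import ast

TEST_SOURCE = '''
import pytest
from bank import withdraw

def test_withdraw_rejects_negative_amount():
    with pytest.raises(ValueError):
        withdraw(100, -5)

def test_withdraw_happy_path():
    assert withdraw(100, 30) == 70
'''


def _is_pytest_raises(expr: ast.expr, exception: str) -> bool:
    """True for a call shaped like pytest.raises(<exception>, ...)."""
    return (
        isinstance(expr, ast.Call)
        and isinstance(expr.func, ast.Attribute)
        and expr.func.attr == "raises"
        and isinstance(expr.func.value, ast.Name)
        and expr.func.value.id == "pytest"
        and bool(expr.args)
        and isinstance(expr.args[0], ast.Name)
        and expr.args[0].id == exception
    )


def is_covered(test_source: str, function: str, exception: str) -> bool:
    """Report coverage only when a pytest.raises(exception) block calls function.

    Unparseable files, indirect calls, and aliased imports all fall through
    to False, i.e. uncovered.
    """
    try:
        tree = ast.parse(test_source)
    except SyntaxError:
        return False
    for node in ast.walk(tree):
        if not isinstance(node, ast.With):
            continue
        if not any(_is_pytest_raises(item.context_expr, exception)
                   for item in node.items):
            continue
        for inner in ast.walk(node):
            if (isinstance(inner, ast.Call)
                    and isinstance(inner.func, ast.Name)
                    and inner.func.id == function):
                return True
    return False


print(is_covered(TEST_SOURCE, "withdraw", "ValueError"))
print(is_covered(TEST_SOURCE, "withdraw", "TypeError"))
```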

3. Test synthesis

For uncovered requirements, Quell tries the rule engine first:

Rule engine — fast, deterministic, handles ~75% of cases (range checks, null guards, type errors, enum violations)
LLM engine — fallback, called only for complex cases the rule engine can't handle
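
A rule for one of the case types listed above (a range check) could be sketched as a template that renders pytest source, with a dispatcher that returns None when no rule applies so the caller can fall back to the LLM engine. RangeRequirement, synthesize_range_test, RULES, and the shop.set_quantity target are hypothetical names used only for this example.

```python
import ast
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RangeRequirement:
    """Hypothetical requirement: a value below `minimum` must raise `exception`."""
    module: str
    function: str
    parameter: str
    minimum: int
    exception: str


def synthesize_range_test(req: RangeRequirement) -> str:
    """Render a pytest test that passes one value just below the minimum."""
    bad_value = req.minimum - 1
    return (
        "import pytest\n"
        f"from {req.module} import {req.function}\n"
        "\n"
        f"def test_{req.function}_rejects_{req.parameter}_below_minimum():\n"
        f"    with pytest.raises({req.exception}):\n"
        f"        {req.function}({req.parameter}={bad_value!r})\n"
    )


# One deterministic rule per requirement type.
RULES = {RangeRequirement: synthesize_range_test}


def synthesize(req: object) -> Optional[str]:
    """Return test source from a matching rule, or None to signal LLM fallback."""
    rule = RULES.get(type(req))
    return rule(req) if rule else None


source = synthesize(RangeRequirement("shop", "set_quantity", "quantity", 0, "ValueError"))
ast.parse(source)  # raises SyntaxError if the rendered test is not valid Python
print(source)
print(synthesize("retry with backoff on a 503 response"))  # no rule matches
```

Because the rule is a pure function of the requirement, the same input always renders the same test, which is what makes this path deterministic.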

4. The 5-gate verification pipeline (THE MOAT)

This is what separates Quell from coverage tools. Every generated test must pass all 5 gates:

Gate	Name	Description
1	AST Valid	Parses to valid Python AST before any execution
2	Original	Test not already present in any test file
3	Secure	No shell calls, no filesystem writes, no network
4	Passes Correct	Test must pass against original (correct) code
5	Fails Violated	Test must fail when the requirement is violated (guard removed)
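
The two static gates in the table (1 and 3) can be approximated with the standard-library ast module, as in the sketch below. The deny-lists and helper names are assumptions chosen for illustration; they are deliberately small and would not be sufficient as a real security check.

```python
import ast

# Illustrative deny-lists only; a real Gate 3 would need to be far more thorough.
FORBIDDEN_MODULES = {"subprocess", "socket", "urllib", "http", "shutil"}
FORBIDDEN_CALLS = {("os", "system"), ("os", "popen"), ("os", "remove")}
WRITE_MODES = set("wax+")


def gate1_ast_valid(test_source: str):
    """Gate 1: return the parsed tree, or None if the source is not valid Python."""
    try:
        return ast.parse(test_source)
    except SyntaxError:
        return None


def _open_mode(call: ast.Call):
    """Return the literal mode passed to open(), 'r' if omitted, None if dynamic."""
    mode_node = call.args[1] if len(call.args) >= 2 else None
    for keyword in call.keywords:
        if keyword.arg == "mode":
            mode_node = keyword.value
    if mode_node is None:
        return "r"
    if isinstance(mode_node, ast.Constant) and isinstance(mode_node.value, str):
        return mode_node.value
    return None


def gate3_secure(tree: ast.AST) -> bool:
    """Gate 3: reject forbidden imports, shell calls, and open() in a write mode."""
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            if any(alias.name.split(".")[0] in FORBIDDEN_MODULES for alias in node.names):
                return False
        elif isinstance(node, ast.ImportFrom):
            if (node.module or "").split(".")[0] in FORBIDDEN_MODULES:
                return False
        elif isinstance(node, ast.Call):
            func = node.func
            if (isinstance(func, ast.Attribute)
                    and isinstance(func.value, ast.Name)
                    and (func.value.id, func.attr) in FORBIDDEN_CALLS):
                return False
            if isinstance(func, ast.Name) and func.id == "open":
                mode = _open_mode(node)
                # A mode that cannot be read statically is treated as unsafe.
                if mode is None or WRITE_MODES & set(mode):
                    return False
    return True


SAFE = "import pytest\n\ndef test_ok():\n    assert 1 + 1 == 2\n"
SHELL = "import os\n\ndef test_bad():\n    os.system('echo hi')\n"
WRITES = "def test_bad():\n    open('out.txt', 'w').write('x')\n"

for name, src in [("safe", SAFE), ("shell", SHELL), ("writes", WRITES)]:
    print(name, gate3_secure(gate1_ast_valid(src)))
```

Both checks inspect the tree without executing anything, which matches the table's note that Gate 1 runs before any execution.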

Gates 4 and 5 are the moat. Gate 5 injects a violation into the source (removes the guard), runs the test, and expects it to fail. If it doesn't fail, the test doesn't catch the bug — and Quell won't write it.

Source files are always restored in a finally block after gate 5. No side effects.
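
Gates 4 and 5, including the restore step, can be sketched as follows. This is a simplified stand-in under stated assumptions: the violated source is supplied by hand instead of being derived by removing the guard automatically, the test is a plain Python callable instead of a pytest run, and the source file is executed in a fresh namespace instead of being imported. All names are hypothetical.

```python
import tempfile
from pathlib import Path

ORIGINAL = '''\
def withdraw(balance, amount):
    if amount < 0:
        raise ValueError("amount must be non-negative")
    return balance - amount
'''

# The same function with the guard removed: the injected violation.
VIOLATED = '''\
def withdraw(balance, amount):
    return balance - amount
'''


def candidate_test(namespace: dict) -> None:
    """Generated test under verification: a negative amount must raise ValueError."""
    try:
        namespace["withdraw"](100, -5)
    except ValueError:
        return
    raise AssertionError("expected ValueError for a negative amount")


def run_test_against(source_path: Path, test) -> bool:
    """Execute the file's current contents in a fresh namespace, then run the test."""
    namespace: dict = {}
    exec(compile(source_path.read_text(), str(source_path), "exec"), namespace)
    try:
        test(namespace)
    except Exception:
        return False
    return True


def gates_4_and_5(source_path: Path, violated_source: str, test) -> tuple[bool, bool]:
    """Return (gate4, gate5); the source file is restored even if gate 5 errors."""
    original = source_path.read_text()
    gate4 = run_test_against(source_path, test)  # must pass on correct code
    gate5 = False
    if gate4:
        try:
            source_path.write_text(violated_source)          # inject the violation
            gate5 = not run_test_against(source_path, test)  # must now fail
        finally:
            source_path.write_text(original)                 # always restore
    return gate4, gate5


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "bank.py"
    path.write_text(ORIGINAL)
    print(gates_4_and_5(path, VIOLATED, candidate_test))
    print(path.read_text() == ORIGINAL)
```

A test that never fails, such as one with no assertion, would pass gate 4 and then be rejected at gate 5, which is the case the violation injection exists to catch.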

5. The three-bucket output

After verification, each requirement lands in one bucket:

✓ WRITTEN — All 5 gates passed. Test written to disk via libcst (AST-safe injection, never string pasting).
~ SCAFFOLDED — Gates 1–3 passed but gate 4 or 5 couldn't be verified. Quell writes a stub with a clear TODO comment.
✗ FLAGGED — No automatable test path. Side effects, non-determinism, or external services. Documented with exact reason.

Nothing is silently dropped. Every requirement is accounted for.
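
The bucket rules above reduce to a small decision function. The sketch below is one possible reading; Bucket and classify are hypothetical names, and the handling of a failure in gates 1 to 3 is an assumption, since the text does not say where such a requirement lands.

```python
from enum import Enum
from typing import Sequence


class Bucket(Enum):
    WRITTEN = "written"
    SCAFFOLDED = "scaffolded"
    FLAGGED = "flagged"


def classify(automatable: bool, gates_passed: Sequence[bool]) -> Bucket:
    """Map five gate results (index 0 is Gate 1) to exactly one bucket."""
    if len(gates_passed) != 5:
        raise ValueError("expected one result per gate")
    if not automatable:
        return Bucket.FLAGGED      # no automatable test path
    if all(gates_passed):
        return Bucket.WRITTEN      # all five gates passed
    if all(gates_passed[:3]):
        return Bucket.SCAFFOLDED   # gates 1-3 passed, gate 4 or 5 unverified
    # Assumption: a failure in gates 1-3 is flagged so the requirement
    # is still accounted for and never silently dropped.
    return Bucket.FLAGGED


print(classify(True, [True, True, True, True, True]))
print(classify(True, [True, True, True, True, False]))
print(classify(False, [False, False, False, False, False]))
```

Every branch returns a bucket, so no combination of inputs leaves a requirement unclassified.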

The Production Readiness Score (PRS)

PRS is the share of requirements backed by a test, weighted by confidence: a WRITTEN test counts in full and a SCAFFOLDED stub counts half. It is not code coverage; it measures whether your edge cases are actually tested.

PRS = (WRITTEN × 1.0 + SCAFFOLDED × 0.5) / total_requirements × 100

A score of 80 or above means Production Ready, 60-79 means Review Needed, and below 60 means Needs Work: your edge cases need attention.
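
As a worked example, a project with 20 requirements, of which 12 are WRITTEN, 6 are SCAFFOLDED, and 2 are FLAGGED, scores (12 × 1.0 + 6 × 0.5) / 20 × 100 = 75. The sketch below computes the same thing; the function names are hypothetical, the tier names are the ones given in the FAQ, and applying each tier's lower bound as a cutoff for fractional scores is an assumption.

```python
def prs(written: int, scaffolded: int, total_requirements: int) -> float:
    """PRS = (WRITTEN * 1.0 + SCAFFOLDED * 0.5) / total_requirements * 100."""
    if total_requirements <= 0:
        raise ValueError("total_requirements must be positive")
    return (written * 1.0 + scaffolded * 0.5) / total_requirements * 100


def tier(score: float) -> str:
    """Map a score to its tier, using each tier's lower bound as the cutoff."""
    if score >= 80:
        return "Production Ready"
    if score >= 60:
        return "Review Needed"
    return "Needs Work"


score = prs(written=12, scaffolded=6, total_requirements=20)
print(score, tier(score))
```

FLAGGED requirements contribute nothing to the numerator but still count in the denominator, so they lower the score until they are addressed.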