Quell reads your docstrings, Pydantic models, and type annotations and extracts every testable requirement. It then finds the requirements that have no test, generates pytest tests for them with a rule engine, verifies each test through a 5-gate pipeline, and writes only proven tests to disk.

Does Quell require an LLM API key?

No. The rule engine runs entirely in-process, so it transmits no source code and needs no API key. It handles ~75% of edge cases without a network call. LLM fallback is opt-in and sends only the function signature, never the full body.

What is the 5-gate pipeline?

Every generated test must pass five gates: Gate 1 (parses as valid Python), Gate 2 (does not duplicate a test already in a test file), Gate 3 (contains no shell calls or file writes), Gate 4 (passes against the original code), Gate 5 (fails when the requirement is violated). Only tests that clear Gate 5 are written to disk.

What is the Production Readiness Score (PRS)?

PRS = (WRITTEN × 1.0 + SCAFFOLDED × 0.5) / total_requirements × 100. Tiers: 80-100 Production Ready, 60-79 Review Needed, 0-59 Needs Work.
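As a worked example, the formula and tiers can be expressed in a few lines. This sketch assumes WRITTEN and SCAFFOLDED are counts of requirements in each state, that fractional scores fall into the tier whose lower bound they meet, and that an empty requirement set is an error; none of these details are confirmed by Quell itself.

```python
def prs(written: int, scaffolded: int, total_requirements: int) -> float:
    """Production Readiness Score as a percentage."""
    if total_requirements <= 0:
        # Assumption: a score is undefined when there are no requirements.
        raise ValueError("total_requirements must be positive")
    return (written * 1.0 + scaffolded * 0.5) / total_requirements * 100


def tier(score: float) -> str:
    """Map a score to its tier, using each tier's lower bound as the cutoff."""
    if score >= 80:
        return "Production Ready"
    if score >= 60:
        return "Review Needed"
    return "Needs Work"


# 5 written + 2 scaffolded out of 8 requirements: (5 + 1) / 8 * 100
score = prs(written=5, scaffolded=2, total_requirements=8)
print(score, tier(score))  # 75.0 Review Needed
```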

How is Quell different from GitHub Copilot or Qodo for test generation?

Quell reads specifications that already exist in your code — it does not generate tests from scratch. It finds requirements already documented in your docstrings, Pydantic models, and type annotations that have no test. The 5-gate pipeline, especially Gate 5 (violation injection), verifies each test actually catches the bug it claims to catch. This verification step is not present in Copilot, Qodo, or Hypothesis.

Can Quell be used in CI pipelines?

Yes. Run quell ci src/ --threshold 80 to fail CI if PRS falls below 80. Set prs_threshold in pyproject.toml under [tool.quell]. Works with GitHub Actions, GitLab CI, and any system that checks exit codes.

What Is Mutation Testing — and Why Your Team Keeps Skipping It

Here's a question worth sitting with: how do you know your tests are any good?

Coverage tells you which lines ran. Type checkers tell you whether types match. Linters catch style issues. But none of them answer the core question: if the code were wrong, would any test catch it?

Mutation testing does. And most teams skip it entirely.

The idea in one paragraph

A mutation tester takes your source code, makes a tiny deliberate change — flips a > to >=, swaps True for False, deletes a return statement — and then runs your test suite. If no test fails, the mutation "survived." A survived mutation means your tests didn't notice that the code was wrong. That's a gap.

Do this a few thousand times and you get a mutation score: the percentage of mutations your tests caught. A codebase with 95% line coverage might have a mutation score of 40%. Each mutation in the surviving 60% marks a requirement with no enforcement.
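The gap between coverage and mutation score is easy to reproduce by hand. In this illustrative sketch (all names are made up for the example), the mutants are written out manually rather than generated by a tool. The weak suite executes every line of the function yet kills no mutants, while the strong suite kills both.

```python
def is_adult(age: int) -> bool:
    """Original: adults are 18 or older."""
    return age >= 18


def mutant_boundary(age: int) -> bool:
    """Mutant 1: `>=` flipped to `>`."""
    return age > 18


def mutant_constant(age: int) -> bool:
    """Mutant 2: the comparison replaced with True."""
    return True


def weak_suite(fn) -> bool:
    """Runs every line of the function but never probes the boundary."""
    return fn(30) is True


def strong_suite(fn) -> bool:
    """Checks both sides of the boundary at 18."""
    return fn(30) is True and fn(18) is True and fn(17) is False


def mutation_score(suite, original, mutants) -> float:
    """Percentage of mutants the suite kills, i.e. fails on."""
    assert suite(original), "suite must pass on the original code"
    killed = sum(1 for mutant in mutants if not suite(mutant))
    return killed / len(mutants) * 100


MUTANTS = [mutant_boundary, mutant_constant]
print(mutation_score(weak_suite, is_adult, MUTANTS))    # 0.0
print(mutation_score(strong_suite, is_adult, MUTANTS))  # 100.0
```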

Why teams skip it

It's slow. Running your full test suite once for each mutation means 1,000 mutations = 1,000 test runs. On any real codebase that's minutes to hours.

Results are noisy. Not every survived mutation is a real problem. Mutations in dead code, in logging-only paths, or in logic that ends up equivalent (different code, same behavior) are false positives that eat attention.

It's hard to act on. Even if you get a report, knowing "test_payment_function has gaps" doesn't tell you what to write.

These are real objections. Mutation testing in the traditional sense requires patience and a dedicated engineer to interpret results. Most teams decide it's not worth the investment.

What you actually want from mutation testing

The useful insight from mutation testing isn't the score. It's the specific requirements that have no test catching their violations.

That's the thing worth extracting:

Which requirements does my code assert?
For each one, does any test actually verify it?
If I violated the requirement, would a test catch it?

These are smaller, answerable questions. You don't need to mutate everything, just the requirements you've stated explicitly in your docstrings, type annotations, and Pydantic models.
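To show what "stated explicitly" can mean in practice, here is a rough sketch that pulls exception requirements out of a Google-style Raises: section. The parser and its name are hypothetical and deliberately simplistic; Quell's actual extraction logic is not shown in this article and is likely more thorough.

```python
import inspect
import re


def process_payment(amount: float, currency: str) -> str:
    """Charge a customer.

    Raises:
        ValueError: if amount <= 0
        ValueError: if currency is not supported
    """
    if amount <= 0:
        raise ValueError("amount must be positive")
    if currency not in {"USD", "EUR", "GBP"}:
        raise ValueError("unsupported currency")
    return "ok"


def raises_requirements(fn) -> list[tuple[str, str]]:
    """Return (exception, condition) pairs from a Google-style `Raises:` section."""
    doc = inspect.getdoc(fn) or ""
    requirements = []
    in_raises = False
    for line in doc.splitlines():
        if line.strip() == "Raises:":
            in_raises = True
            continue
        if in_raises:
            match = re.match(r"\s+(\w+):\s*(.+)", line)
            if not match:
                break  # a blank or dedented line ends the section
            requirements.append((match.group(1), match.group(2)))
    return requirements


for exc, condition in raises_requirements(process_payment):
    print(exc, "->", condition)
```

Each extracted pair is a candidate for one targeted test, which is a much smaller search space than every possible mutation of the function body.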

How Quell approaches this

Quell reads your docstrings and models, extracts each requirement as a structured constraint, and for each uncovered one:

Generates a test targeting that specific constraint
Runs it on the original code — it must pass
Injects a minimal violation of that constraint — just enough to break it
Runs the test again — it must fail

Only tests that pass on the original code and fail on the violated code are written to disk. This gives you the core benefit of mutation testing, proof that the test actually catches the bug, without scanning the whole codebase or waiting for thousands of runs.
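The pass-then-fail check described above can be sketched in plain Python. The violated function here is written by hand and every helper name is hypothetical. How Quell actually injects violations is not described in this article, so treat this as an illustration of the idea only.

```python
def process_payment(amount: float) -> str:
    """Raises ValueError if amount <= 0."""
    if amount <= 0:
        raise ValueError("amount must be positive")
    return "charged"


def process_payment_violated(amount: float) -> str:
    """Same function with the documented check removed (the injected violation)."""
    return "charged"


def candidate_test(fn) -> None:
    """Generated test: a non-positive amount must raise ValueError."""
    try:
        fn(0)
    except ValueError:
        return
    raise AssertionError("expected ValueError for amount <= 0")


def vacuous_test(fn) -> None:
    """A test that runs the code but never exercises the requirement."""
    assert fn(5) == "charged"


def passes(test, fn) -> bool:
    """Run a test against an implementation and report whether it passed."""
    try:
        test(fn)
        return True
    except AssertionError:
        return False


def is_proven(test, original, violated) -> bool:
    """Keep a test only if it passes on the original and fails on the violation."""
    return passes(test, original) and not passes(test, violated)


print(is_proven(candidate_test, process_payment, process_payment_violated))  # True
print(is_proven(vacuous_test, process_payment, process_payment_violated))    # False
```

The vacuous test is the interesting case: it would raise coverage, but it cannot tell the original from the violated version, so it is rejected.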

quell find src/payments.py

  process_payment  MUST_RAISE   ValueError: amount <= 0     ✗ no test
  process_payment  MUST_RAISE   ValueError: bad currency    ✗ no test
  PaymentRequest   ENUM_VALID   currency in USD|EUR|GBP     ✗ no test

  → 3 gaps found. Run with --fix to generate and verify tests.

No mutation framework needed. No waiting. No noise from dead code.

When to use traditional mutation testing

Traditional tools like mutmut (Python) or Stryker (JavaScript, C#, Scala) are still worth running on critical modules such as payment processing, authentication, and data validation, once a quarter or before a major release. They'll catch things Quell won't, like logic errors in complex conditional trees.

But for the day-to-day question of "are my documented requirements tested," Quell is faster and more actionable. Use both in their appropriate lanes.

The mental shift that matters

Coverage answers "did the code run?" Mutation testing answers "would the code's failure be noticed?"

The second question is the one that matters in production. Start asking it.

Install Quell — no API key, no config. Run quell find src/ and see what your tests are missing.