Quell reads your docstrings, Pydantic models, and type annotations and extracts every testable requirement. It then finds the requirements that have no test, generates pytest tests for them with a rule engine, verifies each test through a 5-gate pipeline, and writes to disk only the tests that pass all five gates.

Does Quell require an LLM API key?

No. The rule engine runs entirely in-process and transmits no source code. About 75% of edge cases are handled without a network call or an API key. The LLM fallback is opt-in and sends only the function signature, never the function body.
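To make "signature, not body" concrete, here is a standard-library illustration of what a function signature contains. This is not Quell's code. `process_payment` is a hypothetical example, and the exact payload Quell's fallback sends is an assumption here.

```python
import inspect


def process_payment(amount: int, currency: str = "USD") -> dict:
    """Hypothetical example function. Its body is what stays local."""
    if amount <= 0:
        raise ValueError("amount must be positive")
    return {"transaction_id": "tx-1", "amount": amount, "currency": currency}


# The signature is the name plus the typed parameter list and return type.
# None of the validation logic in the body appears in it.
signature_only = f"def {process_payment.__name__}{inspect.signature(process_payment)}"
print(signature_only)
```

The printed line is `def process_payment(amount: int, currency: str = 'USD') -> dict`; the guard and the return statement do not appear anywhere in it.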

What is the 5-gate pipeline?

Every generated test must pass five gates: Gate 1 (valid Python that parses to an AST), Gate 2 (not already present in a test file), Gate 3 (no shell calls or file writes), Gate 4 (passes against the original code), and Gate 5 (fails when the requirement is violated). Only tests that clear Gate 5 are written to disk.

What is the Production Readiness Score (PRS)?

PRS = (WRITTEN × 1.0 + SCAFFOLDED × 0.5) / total_requirements × 100. Tiers: 80-100 Production Ready, 60-79 Review Needed, 0-59 Needs Work.
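The formula and tiers translate directly into code. In this sketch the function names are hypothetical, and so is the assumption that a non-integer score such as 79.5 falls into the lower tier. The multiplication by 100 happens before the division only so that simple cases come out as exact floats; mathematically it is the same formula.

```python
def production_readiness_score(written: int, scaffolded: int, total_requirements: int) -> float:
    """PRS = (WRITTEN * 1.0 + SCAFFOLDED * 0.5) / total_requirements * 100."""
    if total_requirements <= 0:
        raise ValueError("total_requirements must be positive")
    return (written * 1.0 + scaffolded * 0.5) * 100 / total_requirements


def prs_tier(score: float) -> str:
    """Map a score onto the three tiers (80-100, 60-79, 0-59)."""
    if score >= 80:
        return "Production Ready"
    if score >= 60:
        return "Review Needed"
    return "Needs Work"
```

For example, 30 written tests and 10 scaffolded tests out of 50 requirements give (30 + 5) / 50 × 100 = 70.0, which lands in Review Needed.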

How is Quell different from GitHub Copilot or Qodo for test generation?

Quell starts from specifications that already exist in your code instead of guessing what to test. It finds requirements documented in your docstrings, Pydantic models, and type annotations that do not yet have a test. The 5-gate pipeline, especially Gate 5 (violation injection), verifies that each test actually catches the bug it claims to catch. Copilot, Qodo, and Hypothesis do not perform this verification step.
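As a rough idea of what "requirements already documented in your docstrings" means, the sketch below pulls "Raises X if ..." clauses out of a docstring with a regular expression. Quell's real extractor is not shown in this article. The pattern, the `Requirement` record, and `extract_requirements` are all hypothetical, and the sketch ignores Pydantic models and type annotations.

```python
import inspect
import re
from dataclasses import dataclass

# Hypothetical pattern: "raises <ExceptionName> if <condition>", up to the next period.
RAISES_PATTERN = re.compile(r"raises\s+(\w+)\s+if\s+([^.\n]+)", re.IGNORECASE)


@dataclass(frozen=True)
class Requirement:
    function: str
    exception: str
    condition: str


def extract_requirements(func) -> list[Requirement]:
    """Return one Requirement per 'raises X if ...' clause in func's docstring."""
    doc = inspect.getdoc(func) or ""
    return [
        Requirement(func.__name__, exc, cond.strip())
        for exc, cond in RAISES_PATTERN.findall(doc)
    ]


def process_payment(amount: int, currency: str) -> dict:
    """Charge a payment.

    Raises ValueError if amount is zero or negative.
    Raises ValueError if currency is not an ISO 4217 code.
    """
```

Each extracted `Requirement` is a single, separately testable claim, which is what lets a tool ask "does any test cover this one?" requirement by requirement.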

Can Quell be used in CI pipelines?

Yes. Run quell ci src/ --threshold 80 to fail the CI job when PRS falls below 80. To keep the threshold in project configuration, set prs_threshold under [tool.quell] in pyproject.toml. This works with GitHub Actions, GitLab CI, and any other CI system that checks exit codes.

pytest Is Not the Problem — Verification Is

pytest is excellent. Fixtures, parametrize, plugins, readable output — it's the best testing framework Python has. The problem isn't pytest. The problem is what most teams write in their pytest tests.

Most test suites verify execution, not correctness. The distinction sounds subtle, but it matters a lot.

Execution vs correctness

An execution test checks that something ran and returned something reasonable:

def test_process_payment():
    result = process_payment(amount=100, currency="USD")
    assert result is not None
    assert "transaction_id" in result

This passes. It will keep passing even if the function:

Accepts amount=0 and processes it as a free payment
Accepts currency="BANANA" and stores it in the database
Returns a transaction_id that's always the same string
Silently swallows a network error and returns a fake success
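Here is a deliberately broken, hypothetical `process_payment` that has every problem in the list above. The execution test, copied unchanged, still passes against it.

```python
def process_payment(amount, currency):
    """Deliberately broken stand-in (hypothetical), used only for illustration."""
    try:
        # Simulate a payment gateway call that fails.
        raise ConnectionError("gateway unreachable")
    except ConnectionError:
        pass  # Bug: the network error is swallowed...
    # ...and a fake success is returned, with no validation and a constant ID.
    return {"transaction_id": "tx-0000", "amount": amount, "currency": currency}


def test_process_payment():
    result = process_payment(amount=100, currency="USD")
    assert result is not None
    assert "transaction_id" in result
```

Zero amounts, made-up currencies, identical IDs, and a dead gateway all go unnoticed, because nothing in the test asks about them.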

A correctness test checks the specific contract — each stated requirement individually:

import pytest

def test_process_payment_rejects_zero_amount():
    with pytest.raises(ValueError, match="amount must be positive"):
        process_payment(amount=0, currency="USD")

def test_process_payment_rejects_invalid_currency():
    with pytest.raises(ValueError):
        process_payment(amount=100, currency="BANANA")

These tests make specific claims, and each one would fail if its part of the contract were violated. The execution test would not.
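The correctness tests need an implementation to run against. The sketch below pairs them with a hypothetical `process_payment` that enforces the contract. The allow-list of currencies is an assumption standing in for whatever currency table a real payment service uses.

```python
import uuid

import pytest

# Hypothetical allow-list; a real service would use its own currency table.
SUPPORTED_CURRENCIES = {"USD", "EUR", "GBP"}


def process_payment(amount: int, currency: str) -> dict:
    """Raises ValueError if amount is not positive or currency is unsupported."""
    if amount <= 0:
        raise ValueError("amount must be positive")
    if currency not in SUPPORTED_CURRENCIES:
        raise ValueError(f"unsupported currency: {currency}")
    # A fresh ID per call, unlike the constant ID a broken version might return.
    return {"transaction_id": str(uuid.uuid4()), "amount": amount, "currency": currency}


def test_process_payment_rejects_zero_amount():
    with pytest.raises(ValueError, match="amount must be positive"):
        process_payment(amount=0, currency="USD")


def test_process_payment_rejects_invalid_currency():
    with pytest.raises(ValueError):
        process_payment(amount=100, currency="BANANA")
```

Running `pytest` on this file passes both tests. Delete either guard and the matching test fails, which is exactly the property the execution test lacks.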

Why teams write execution tests

It's faster. One test covers the happy path and pushes the coverage number up. The temptation is real, especially under deadline pressure.

The cognitive cost is also lower. You call the function, check the output isn't obviously wrong, move on. Writing correctness tests requires thinking through each failure mode, each boundary condition, each documented constraint. That takes time.

But the time saved by writing execution tests is paid back later, in time spent debugging production bugs.

The "would it catch a regression" test

Here's a useful question to ask of any test: if someone deleted the guard this test is supposed to cover, would this test fail?

Take a docstring that says "raises ValueError if amount is zero." If someone changed the code to silently accept zero:

An execution test still passes: assert result is not None doesn't care whether validation happened.
A correctness test using pytest.raises(ValueError) fails immediately.

If the answer to "would this catch a regression" is "maybe not," the test is providing coverage theater. It increments a number without providing safety.
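The "would it catch a regression" question can be asked mechanically: run the test against the real function and against a copy with the guard deleted. In this sketch the two implementations, both test functions, and the helper names are hypothetical. The tests take the function as a parameter so the same test can be pointed at either version.

```python
from typing import Callable

PaymentFn = Callable[[int, str], dict]


def process_payment(amount: int, currency: str) -> dict:
    """Raises ValueError if amount is zero."""
    if amount == 0:
        raise ValueError("amount must be positive")
    return {"transaction_id": "tx-1", "amount": amount, "currency": currency}


def process_payment_guard_deleted(amount: int, currency: str) -> dict:
    """The same function with the guard removed: zero is silently accepted."""
    return {"transaction_id": "tx-1", "amount": amount, "currency": currency}


def execution_test(fn: PaymentFn) -> None:
    assert fn(100, "USD") is not None


def correctness_test(fn: PaymentFn) -> None:
    try:
        fn(0, "USD")
    except ValueError:
        return
    raise AssertionError("expected ValueError for amount=0")


def passes(test: Callable[[PaymentFn], None], fn: PaymentFn) -> bool:
    """True if the test runs against fn without an assertion failure."""
    try:
        test(fn)
    except AssertionError:
        return False
    return True


def catches_regression(test, original: PaymentFn, mutant: PaymentFn) -> bool:
    """A useful test passes on the original and fails once the guard is gone."""
    return passes(test, original) and not passes(test, mutant)
```

`catches_regression(execution_test, ...)` is False and `catches_regression(correctness_test, ...)` is True, which is the coverage-theater distinction expressed as a boolean.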

Closing the gap systematically

The gap between "we have tests" and "our tests catch bugs" is the correctness gap. Closing it by hand on a large codebase is tractable but tedious, which is where tooling helps.

Quell reads your docstrings and Pydantic models, extracts each stated requirement, and checks whether any test would catch a violation. For every gap:

It generates a test targeting that specific constraint
Runs it against the existing code — it must pass
Injects a minimal violation (changes the guard condition just enough to break the requirement)
Runs the test again — it must fail
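One way to inject a minimal violation is to rewrite the guard in the syntax tree. The sketch below relaxes `amount <= 0` to `amount < 0`, so zero slips through, then checks that the requirement check holds on the original and breaks on the mutant. This is an illustration of the idea using Python's `ast` module, not Quell's mutation engine. The single mutation rule and all names are hypothetical.

```python
import ast

SOURCE = '''
def process_payment(amount, currency):
    if amount <= 0:
        raise ValueError("amount must be positive")
    return {"transaction_id": "tx-1", "amount": amount, "currency": currency}
'''


class RelaxGuard(ast.NodeTransformer):
    """Hypothetical mutation rule: turn `x <= y` into `x < y`."""

    def visit_Compare(self, node: ast.Compare) -> ast.AST:
        self.generic_visit(node)
        if len(node.ops) == 1 and isinstance(node.ops[0], ast.LtE):
            node.ops = [ast.Lt()]
        return node


def load(tree: ast.Module, name: str):
    """Compile a module AST and return the named function defined in it."""
    namespace: dict = {}
    exec(compile(tree, filename="<generated>", mode="exec"), namespace)
    return namespace[name]


def requirement_holds(fn) -> bool:
    """The generated check: amount=0 must raise ValueError."""
    try:
        fn(0, "USD")
    except ValueError:
        return True
    return False


original = load(ast.parse(SOURCE), "process_payment")
mutant_tree = ast.fix_missing_locations(RelaxGuard().visit(ast.parse(SOURCE)))
mutant = load(mutant_tree, "process_payment")

# The check must pass against the existing code...
passes_on_original = requirement_holds(original)
# ...and must fail once the violation is injected.
fails_on_mutant = not requirement_holds(mutant)
```

Changing `<=` to `<` is about the smallest edit that breaks "amount must be positive", so a check that survives it is not actually testing the boundary.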

This is a verification step that most test generation tools skip entirely. Generated tests that "pass" but don't catch violations are worse than no tests — they give false confidence.

pip install quelltest
quell find src/ --fix

The output is verified tests injected directly into your test files. Not skeletons to fill in — complete tests that prove each constraint is enforced.

pytest remains excellent

None of this is a criticism of pytest. pytest is the right tool. The upgrade is in what you ask pytest to verify — not just that functions run, but that documented contracts are enforced.

The goal is a test suite where every test would fail if its corresponding requirement were violated. That's a test suite worth trusting.

Quell is on PyPI as quelltest. Run quell find src/ to see the correctness gap in your codebase.