pytest Is Not the Problem — Verification Is
pytest is excellent. Fixtures, parametrize, plugins, readable output — it's the best testing framework Python has. The problem isn't pytest. The problem is what most teams write in their pytest tests.
Most test suites verify execution, not correctness. The distinction is small but it matters a lot.
Execution vs correctness
An execution test checks that something ran and returned something reasonable:
def test_process_payment():
result = process_payment(amount=100, currency="USD")
assert result is not None
assert "transaction_id" in result
This passes. It will keep passing even if the function:
- Accepts
amount=0and processes it as a free payment - Accepts
currency="BANANA"and stores it in the database - Returns a
transaction_idthat's always the same string - Silently swallows a network error and returns a fake success
A correctness test checks the specific contract — each stated requirement individually:
def test_process_payment_rejects_zero_amount():
with pytest.raises(ValueError, match="amount must be positive"):
process_payment(amount=0, currency="USD")
def test_process_payment_rejects_invalid_currency():
with pytest.raises(ValueError):
process_payment(amount=100, currency="BANANA")
These tests say something. They'd fail if the contract were violated. The execution test doesn't.
Why teams write execution tests
It's faster. One test, covers the happy path, gets the coverage number up. The temptation is real — especially under deadline pressure.
The cognitive cost is also lower. You call the function, check the output isn't obviously wrong, move on. Writing correctness tests requires thinking through each failure mode, each boundary condition, each documented constraint. That takes time.
But the time saved writing execution tests is paid back in debugging production bugs.
The "would it catch a regression" test
Here's a useful question to ask of any test: if someone deleted the guard this test is supposed to cover, would this test fail?
Take a docstring that says "raises ValueError if amount is zero." If someone changed the code to silently accept zero:
- An execution test: still passes.
result is not Nonedoesn't care whether validation happened. - A correctness test with
pytest.raises(ValueError): fails immediately.
If the answer to "would this catch a regression" is "maybe not," the test is providing coverage theater. It increments a number without providing safety.
Closing the gap systematically
The gap between "we have tests" and "our tests catch bugs" is the correctness gap. Closing it manually on a large codebase is tedious but tractable with the right tool.
Quell reads your docstrings and Pydantic models, extracts each stated requirement, and checks whether any test would catch a violation. For every gap:
- It generates a test targeting that specific constraint
- Runs it against the existing code — it must pass
- Injects a minimal violation (changes the guard condition just enough to break the requirement)
- Runs the test again — it must fail
This is a verification step that most test generation tools skip entirely. Generated tests that "pass" but don't catch violations are worse than no tests — they give false confidence.
pip install quelltest
quell find src/ --fix
The output is verified tests injected directly into your test files. Not skeletons to fill in — complete tests that prove each constraint is enforced.
pytest remains excellent
None of this is a criticism of pytest. pytest is the right tool. The upgrade is in what you ask pytest to verify — not just that functions run, but that documented contracts are enforced.
The goal is a test suite where every test would fail if its corresponding requirement were violated. That's a test suite worth trusting.
Quell on PyPI — run quell find src/ to see the correctness gap in your codebase.