3 min read · Shashank Bindal

pytest Is Not the Problem — Verification Is

pytest is an excellent framework. The gap isn't the tool — it's that most test suites verify execution, not correctness. Here's the distinction and how to close it.


pytest is excellent. Fixtures, parametrize, plugins, readable output — it's the best testing framework Python has. The problem isn't pytest. The problem is what most teams write in their pytest tests.

Most test suites verify execution, not correctness. The distinction sounds small, but it decides whether a test can catch a bug at all.

Execution vs correctness

An execution test checks that something ran and returned something reasonable:

def test_process_payment():
    result = process_payment(amount=100, currency="USD")
    assert result is not None
    assert "transaction_id" in result

This passes. It will keep passing even if the function:

  • Accepts amount=0 and processes it as a free payment
  • Accepts currency="BANANA" and stores it in the database
  • Returns a transaction_id that's always the same string
  • Silently swallows a network error and returns a fake success
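Here is a minimal sketch of how that can happen. The process_payment below is hypothetical (the post never shows the real function). It is deliberately broken in all four ways listed, minus the database write, which is left out to keep it self-contained. The execution test still passes against it.

```python
def _charge_gateway(amount, currency):
    # Hypothetical gateway call that always fails, simulating an outage.
    raise ConnectionError("gateway unreachable")


def process_payment(amount, currency):
    """Hypothetical, deliberately broken implementation for illustration."""
    # Bug: no validation, so amount=0 and currency="BANANA" both get through.
    try:
        _charge_gateway(amount, currency)
    except ConnectionError:
        pass  # Bug: the network error is swallowed and success is faked.
    # Bug: every call returns the same transaction_id.
    return {"transaction_id": "txn-000", "amount": amount, "currency": currency}


def test_process_payment():
    result = process_payment(amount=100, currency="USD")
    assert result is not None
    assert "transaction_id" in result


test_process_payment()  # passes: no AssertionError is raised
for amount, currency in [(0, "USD"), (100, "BANANA")]:
    print(amount, currency, process_payment(amount=amount, currency=currency))
```

Nothing in the two assertions looks at the amount, the currency, the uniqueness of the id, or whether the charge actually happened.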

A correctness test checks the specific contract — each stated requirement individually:

import pytest

def test_process_payment_rejects_zero_amount():
    # match= is applied with re.search to the exception message
    with pytest.raises(ValueError, match="amount must be positive"):
        process_payment(amount=0, currency="USD")

def test_process_payment_rejects_invalid_currency():
    with pytest.raises(ValueError):
        process_payment(amount=100, currency="BANANA")

These tests make a specific claim, and each one would fail if its part of the contract were violated. The execution test makes no such claim.
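To see what the two correctness tests pin down, here is a minimal sketch of an implementation that satisfies them. Everything except the "amount must be positive" message is an assumption for illustration: the SUPPORTED_CURRENCIES set, the uuid-based transaction id, the wording of the currency error, and the check helper, which mimics pytest.raises(..., match=...) with plain Python so the sketch runs without pytest. The CASES table lists boundary values worth one test each; -1 is included because "positive" rules out negatives too, not just zero.

```python
import re
import uuid

# Hypothetical allowlist; a real system would load this from configuration.
SUPPORTED_CURRENCIES = {"USD", "EUR", "GBP"}


def process_payment(amount, currency):
    """Charge `amount` in `currency`.

    Raises ValueError if amount is not positive or currency is unsupported.
    """
    if amount <= 0:
        raise ValueError("amount must be positive")
    if currency not in SUPPORTED_CURRENCIES:
        raise ValueError(f"unsupported currency: {currency!r}")
    # A fresh id per call, so repeated payments are distinguishable.
    return {"transaction_id": uuid.uuid4().hex, "amount": amount, "currency": currency}


# Boundary cases: (amount, currency, expected error pattern or None for success).
CASES = [
    (100, "USD", None),
    (0, "USD", "amount must be positive"),
    (-1, "USD", "amount must be positive"),
    (100, "BANANA", "unsupported currency"),
]


def check(amount, currency, expected_error):
    """Return True if process_payment behaves as the case table says."""
    try:
        process_payment(amount=amount, currency=currency)
    except ValueError as exc:
        # Like pytest.raises(match=...), search the message with re.search.
        return expected_error is not None and re.search(expected_error, str(exc)) is not None
    return expected_error is None


for case in CASES:
    print(case, "ok" if check(*case) else "FAIL")
```

In a pytest suite, a table like CASES would typically feed @pytest.mark.parametrize, so each row is reported as its own test and a single broken boundary is easy to spot.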

Why teams write execution tests

It's faster. One test covers the happy path and pushes the coverage number up. The temptation is real, especially under deadline pressure.

The cognitive cost is also lower. You call the function, check the output isn't obviously wrong, move on. Writing correctness tests requires thinking through each failure mode, each boundary condition, each documented constraint. That takes time.

But the time saved writing execution tests is paid back later, spent debugging the production failures those tests let through.

The "would it catch a regression" test

Here's a useful question to ask of any test: if someone deleted the guard this test is supposed to cover, would this test fail?

Take a docstring that says "raises ValueError if amount is zero." If someone changed the code to silently accept zero:

  • An execution test still passes. The check result is not None doesn't care whether validation happened.
  • A correctness test using pytest.raises(ValueError) fails immediately.

If the answer to "would this catch a regression" is "maybe not," the test is providing coverage theater. It increments a number without providing safety.

Closing the gap systematically

The gap between "we have tests" and "our tests catch bugs" is the correctness gap. Closing it by hand on a large codebase is tedious; with the right tool, it becomes tractable.

Quell reads your docstrings and Pydantic models, extracts each stated requirement, and checks whether any test would catch a violation. For every gap:

  1. Generates a test targeting that specific constraint
  2. Runs it against the existing code, where it must pass
  3. Injects a minimal violation (changes the guard condition just enough to break the requirement)
  4. Runs the test again, where it must now fail

This is a verification step that most test generation tools skip entirely. Generated tests that "pass" but don't catch violations are worse than no tests, because they give false confidence.
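The four steps amount to a small mutation-testing loop. Quell's own implementation isn't shown in this post, so what follows is only an illustrative sketch under stated assumptions: candidate tests are callables that receive the implementation under test, the "minimal violation" is written by hand rather than generated, and names such as verify, Verdict and expects_value_error are hypothetical. Two candidates target the same requirement; both pass against the original, but only one fails against the violation, so only that one is kept.

```python
from dataclasses import dataclass
from typing import Callable

Impl = Callable[[float, str], dict]
Test = Callable[[Impl], None]


def original(amount: float, currency: str) -> dict:
    # Hypothetical guard from the docstring: amount must be positive.
    if amount <= 0:
        raise ValueError("amount must be positive")
    return {"transaction_id": "txn-1"}


def minimal_violation(amount: float, currency: str) -> dict:
    # Step 3: the guard changed just enough to break the requirement
    # ("<=" became "<"), so zero slips through while negatives still fail.
    if amount < 0:
        raise ValueError("amount must be positive")
    return {"transaction_id": "txn-1"}


def expects_value_error(amount: float) -> Test:
    """Step 1 stand-in: build a candidate test asserting `amount` is rejected."""
    def test(impl: Impl) -> None:
        try:
            impl(amount, "USD")
        except ValueError:
            return
        raise AssertionError(f"amount={amount} was accepted")
    test.__name__ = f"rejects_amount_{amount}"
    return test


def passes(test: Test, impl: Impl) -> bool:
    try:
        test(impl)
    except AssertionError:
        return False
    return True


@dataclass
class Verdict:
    name: str
    passes_on_original: bool  # step 2
    fails_on_violation: bool  # step 4

    @property
    def keep(self) -> bool:
        return self.passes_on_original and self.fails_on_violation


def verify(test: Test, impl: Impl, violated: Impl) -> Verdict:
    return Verdict(test.__name__, passes(test, impl), not passes(test, violated))


for candidate in (expects_value_error(-5), expects_value_error(0)):
    verdict = verify(candidate, original, minimal_violation)
    print(verdict.name, "keep" if verdict.keep else "reject")

# rejects_amount_-5 reject
# rejects_amount_0 keep
```

The boundary mutation is what makes step 4 informative: a test for a negative amount looks like it covers "amount must be positive", yet it cannot tell the two versions apart.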

pip install quelltest
quell find src/ --fix

The output is verified tests injected directly into your test files: not skeletons to fill in, but complete tests that show each constraint is enforced.

pytest remains excellent

None of this is a criticism of pytest. pytest is the right tool. The upgrade is in what you ask pytest to verify — not just that functions run, but that documented contracts are enforced.

The goal is a test suite where every test would fail if its corresponding requirement were violated. That's a test suite worth trusting.


Quell is available on PyPI. Run quell find src/ to see the correctness gap in your codebase.
