Quell
Quell finds the edge cases your docstrings describe but your tests never prove. Built by Shashank Bindal after shipping one too many bugs that were documented but untested.
Mission
Code coverage measures which lines ran. It says nothing about whether the code is correct. Quell's goal is simple: for every testable requirement that exists in your codebase — in a docstring, a Pydantic model, a type annotation — there should be a test that proves the requirement holds. Not a test that exercises the line. A test that fails if the requirement is broken.
That's what the 5-gate pipeline does. Gate 5 mutates the source to violate the requirement and runs the test against the mutated code. If the test doesn't fail, the test doesn't ship. Only tests shown to catch the violation reach your repository.
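The Gate 5 idea can be illustrated with a minimal sketch. This is not Quell's actual implementation: the `charge` function, the hand-written mutant, and the helper names `load` and `proves_requirement` are all hypothetical, and a real tool would generate the mutation rather than hard-code it.

```python
import textwrap

# Original source: the docstring states the requirement, the guard enforces it.
ORIGINAL = textwrap.dedent('''
    def charge(amount):
        """Charge the account. Must not accept zero amount."""
        if amount == 0:
            raise ValueError("amount must be non-zero")
        return amount
''')

# Hypothetical mutation: the guard is removed, so the requirement is violated.
MUTATED = textwrap.dedent('''
    def charge(amount):
        """Charge the account. Must not accept zero amount."""
        return amount
''')


def load(source):
    """Execute a source string and return the `charge` function it defines."""
    namespace = {}
    exec(source, namespace)
    return namespace["charge"]


def strong_test(charge):
    """Passes only if charge(0) raises ValueError."""
    try:
        charge(0)
    except ValueError:
        return True
    return False


def weak_test(charge):
    """Exercises the function but never checks the zero-amount requirement."""
    return charge(5) == 5


def proves_requirement(test, original_src, mutated_src):
    """A test qualifies only if it passes on the original and fails on the mutant."""
    passes_on_original = test(load(original_src))
    fails_on_mutant = not test(load(mutated_src))
    return passes_on_original and fails_on_mutant
```

Under these assumptions, `proves_requirement(strong_test, ORIGINAL, MUTATED)` is true, while `weak_test` is rejected: it passes on both versions, so it cannot tell a correct implementation from a broken one.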
Principles
Proof, not coverage
A test that only achieves a line hit is worse than no test — it gives false confidence. Every test Quell writes must prove it catches a violation of the requirement it targets.
No silent failures
Every requirement gets a bucket: WRITTEN (proven), SCAFFOLDED (stub), or FLAGGED (reason given). Nothing is quietly skipped. You always know exactly what's covered and why.
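One way to picture the three buckets is as a small data model in which a flagged requirement cannot exist without a reason. The sketch below is an assumption for illustration only; `Bucket`, `Outcome`, and `summarize` are hypothetical names, not Quell's API.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Bucket(Enum):
    WRITTEN = "written"        # test proven against a violating mutation
    SCAFFOLDED = "scaffolded"  # stub emitted for a human to complete
    FLAGGED = "flagged"        # no test produced; a reason is required


@dataclass(frozen=True)
class Outcome:
    requirement: str
    bucket: Bucket
    reason: str = ""

    def __post_init__(self):
        # Enforce "no silent failures": a FLAGGED outcome must say why.
        if self.bucket is Bucket.FLAGGED and not self.reason:
            raise ValueError("a FLAGGED requirement must carry a reason")


def summarize(outcomes):
    """Count outcomes per bucket, reporting zero for empty buckets."""
    counts = Counter(o.bucket for o in outcomes)
    return {bucket.name: counts.get(bucket, 0) for bucket in Bucket}
```

Because every requirement maps to exactly one `Outcome`, the summary totals always add up to the number of requirements found, which is what makes a skipped requirement visible.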
Your code stays on your machine
The rule engine handles ~75% of cases locally with no network call. LLM fallback is opt-in and only activated for the remaining complex cases. No source code leaves your machine by default.
Determinism first
The rule engine is the primary path — fast, deterministic, reproducible. LLM is a fallback for the cases rules can't handle, not the default approach.
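The rules-first, opt-in-fallback ordering might look roughly like the sketch below. Everything here is hypothetical (`zero_amount_rule`, `plan_test`, and the returned labels are invented for illustration); the point is only that the fallback is never called unless the caller supplies one.

```python
from typing import Callable, List, Optional, Tuple

# A rule inspects a requirement and returns a test plan, or None if it does not apply.
Rule = Callable[[str], Optional[str]]


def zero_amount_rule(requirement: str) -> Optional[str]:
    """Hypothetical deterministic rule: matches one fixed phrase."""
    if "must not accept zero" in requirement.lower():
        return "assert_raises_on_zero"
    return None


def plan_test(
    requirement: str,
    rules: List[Rule],
    llm_fallback: Optional[Callable[[str], str]] = None,
) -> Tuple[str, Optional[str]]:
    """Try local rules first; use the fallback only if one was explicitly provided."""
    for rule in rules:
        plan = rule(requirement)
        if plan is not None:
            return ("rule", plan)
    if llm_fallback is not None:
        # Opt-in path: only reached when no rule matched and a fallback exists.
        return ("llm", llm_fallback(requirement))
    # Default: nothing leaves the machine, and the requirement is reported as unhandled.
    return ("unhandled", None)
```

With the default `llm_fallback=None`, a requirement that no rule matches comes back as `("unhandled", None)` rather than triggering a network call.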
History
The idea
A bug shipped that was documented in a docstring ('must not accept zero amount') but had no test. That made it clear that coverage metrics lie: a line can be executed without ever proving the contract holds.
First prototype
A weekend script that parsed Python docstrings and wrote naive pytest stubs. It caught three real bugs in the first project it ran on. The stubs were ugly, but the idea was validated.
Public beta
v0.4.0 shipped to PyPI with the rule engine, Groq LLM fallback, and a basic 4-gate pipeline. Feedback from early adopters shaped the WRITTEN / SCAFFOLDED / FLAGGED split.
v1.0.0 stable
The full 5-gate pipeline shipped. Gate 5, which proves a test fails when the requirement is violated, became the defining feature. It's what separates proof from coverage.
v2.0.0
Unified `quell find`, Production Readiness Score, libcst AST-safe injection, and the GitHub Action. Quell became a CI-first tool, not just a developer utility.