I hold the generally unpopular opinion that we write too many tests. Most tests have negative value: not only should they never have been written, we'd be better off actively deleting them. Most tests never catch a bug. They fail when the underlying code changes as intended, not when it breaks. The result is a growing tax on improvement disguised as safety.
This used to be just one of my grumpy opinions. The cost of overtesting was bounded by the effort it took to write tests; before coding agents came along, that effort kept things in check. Now you can generate a full suite of tests in seconds, and the balance has flipped. Writing tests is now free, but that freedom comes at a high cost.
Let's talk about how we measure the value of a test. The default assumption seems to be that more tests are always better, as if each one adds a small amount of safety. Test coverage metrics reinforce that idea: more green bars mean more confidence. But that's not actually how value works. A better way to think about it is to ask why we write tests in the first place: to catch future bugs. In finance, the value of an investment is the present value of its expected future cash flows, discounted back to today: more money is better, and money today is better than money tomorrow. We can apply the same idea to tests. The value of a test is the future stream of bugs it prevents, minus the cost of maintaining it, adjusted for how much we care about the future.
The first part is the value of the bugs a test might catch. Some are trivial, others critical. The key question when writing a test is simple: if this test fails, what did we just prevent, and what would it have cost us? That's what makes regression tests valuable: we already know the answer.
The second part is maintenance cost. This goes beyond simply having more lines of code to keep working. Tests fail for reasons unrelated to bugs, and the worst ones are flaky. A test that checks the text on a login button and breaks every time we change the copy is expensive. Even worse, tests can make us afraid to change things. If updating an API means fixing a dozen fragile tests, the cost goes well beyond maintaining them.
The final part is the interest rate. We talk about tech debt as if it's always bad, but that's too simple. Sometimes taking a shortcut gives us more value now, even knowing it will cost more later, and that makes it the right call. When you're experimenting, most things fail; when something succeeds, you'll have the time and resources to clean up. In that world, the future is discounted heavily, and speed now matters more than perfection later.
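To put rough symbols on the analogy: the value of a test is the discounted sum of the bugs it's expected to catch, net of what it costs to keep alive. The notation below is illustrative, something I'm sketching onto the argument rather than a formula from anywhere in particular.

```latex
% Illustrative notation, not a rigorous model:
%   p_t : probability the test catches a bug in period t
%   B_t : cost of that bug had it shipped
%   M_t : cost of maintaining the test in period t (reruns, false failures, fear of change)
%   r   : discount rate, i.e. how heavily we discount the future
V(\text{test}) = \sum_{t=1}^{T} \frac{p_t \, B_t - M_t}{(1 + r)^t}
```

When the expected bug value is small and the maintenance cost recurs every period, the sum goes negative: that's the formal version of "delete the test." And a high discount rate, as in the experimentation case above, shrinks every future term, which is why speed can rationally beat coverage there.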
This balance held as long as writing tests took real effort. That effort was a built-in regulator. Now it's gone. Coding agents can churn out a thousand tests in minutes, and suddenly the math no longer works. LLMs love writing tests, but they're not very good at it. They tend to make the same mistakes, over and over.
First, they write trivial tests. In UI code, for instance, they love to assert that the text of a button is exactly what we just set it to. Those tests only fail when we intentionally change the text, not when something actually breaks.
Second, they skip the hard part. They'll set up mocks, fixtures, and test scaffolding for twenty lines, and then skip the one line that matters: the part that actually calls the API, writes the file, or performs the logic we care about. Sometimes they even add a helpful comment: "In a real test we'd call the server here."
Third, they test how, not what. A good test checks outcomes, not implementation details. LLMs read code too literally and then write tests that encode exactly what the code does. Those tests fail the moment the implementation changes, even if the behavior stays correct.
All of this produces tests that look impressive but make things worse. They give us a comforting illusion of safety while quietly increasing the cost of change. Every copy edit, refactor, or dependency update turns into a small storm of meaningless failures. The more "free" tests we add, the more expensive progress becomes.
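To make the first and third failure modes concrete, here's a sketch of the difference. The LoginButton component and both tests are hypothetical, and the sketch assumes a Jest setup with React Testing Library and jest-dom; the specific libraries don't matter, only the contrast does.

```tsx
import React from "react";
import { render, screen, fireEvent } from "@testing-library/react";
import "@testing-library/jest-dom";

// Hypothetical component, standing in for any UI code under test.
function LoginButton({ onLogin }: { onLogin: () => void }) {
  return <button onClick={onLogin}>Log in</button>;
}

// The trivial kind: asserts the text is exactly what we just set it to.
// It fails only when we intentionally change the copy, never when login breaks.
test("button says 'Log in'", () => {
  render(<LoginButton onLogin={() => {}} />);
  expect(screen.getByRole("button")).toHaveTextContent("Log in");
});

// The outcome kind: pins down the behavior we actually care about.
// A copy edit won't touch it; a broken click handler will.
test("clicking the button attempts a login", () => {
  const onLogin = jest.fn();
  render(<LoginButton onLogin={onLogin} />);
  fireEvent.click(screen.getByRole("button"));
  expect(onLogin).toHaveBeenCalledTimes(1);
});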
LLMs can also help fix this. Prompt them with the same reasoning and they'll start cleaning up their own mess and possibly yours. They can drop the trivial tests, stop skipping the hard parts, and focus on outcomes instead of implementation. Used this way, they make testing harder to fake and easier to trust.
Writing tests has become free. Maintaining them hasn't. That's the high cost of free testing.

