SmHarter insights Blog | AI Coding Test-first Vs Test-after: pros & cons

Test-First is a clear winner in manual coding. But how does AI-assisted development shift the pros and cons and change the balance between Test-First and Test-After?

I am a long-time enthusiast of, and expert in, test-first TDD and A-TDD.

As part of the task of reimagining how we code with AI, I am reflecting with openness and curiosity on my experience of using AI-assisted coding (human-driven, agentic, with frequent reviews of the AI-generated code) to create production-grade software.

Specifically, in this post, I document the pros and cons I have observed in the test-first and test-after approaches when used with AI-assisted coding.

Structural changes AI coding brings to testing

Intent: The Test-first approach drives developers to clarify code intent and make design decisions, one test at a time. When pairing with an AI, the developer already does that every time they write a good prompt to generate code, whether they follow Test-first or Test-after.
Simple design & reusable code:
- AI generates tests in batches, eliminating the traditional Test-first Red-Green-Refactor micro-cycles that keep the implementation code simple and protect against false positives.
- The frequent reviews of AI-generated code encompass both code and design refactoring, potentially leading to simple design and reusable code, even with a Test-after approach.
- Once a few classes with a simple design and reusable code exist, they can be referenced in subsequent prompts as examples to follow, yielding the same simple design and code reusability as with Test-first.
Behaviour documentation: no major differences here.

My own conclusions

Let’s start from the end. So far, these are my conclusions following from the Pros and Cons I have experienced in my own context.

In manual coding, test-first is a clear winner. With AI-assisted coding, as of mid-2026, I don’t see a clear winner between AI-Test-first and AI-Test-after. Both seem valid options, each with its pros and cons. Further evolution of LLMs’ capabilities may change this; meanwhile, this is where things stand.

A senior software engineer experienced in TDD and A-TDD, who already applies the A-TDD and TDD way of thinking about behaviour and code design when writing each prompt, may benefit from the pros of AI-Test-after while countering the cons with an additional step in which they direct the AI to perform mutation testing, fault injection, or adversarial verification.

A junior software engineer, and anyone without ingrained TDD and A-TDD habits, may benefit from adopting the AI-Test-first approach. It guides development along the lines of the TDD and A-TDD ways of thinking, improving code design quality (because the code is generated after the tests) and test reliability.

Below is the complete list of pros and cons underpinning these conclusions.

Useful lens: Feedforward and Feedback

Birgitta Böckeler, in her article (*), describes how to harness a coding agent, both by anticipating and preventing unwanted outputs and by allowing the agent to self-correct. Specifically, she introduces two concepts that can be used as lenses when analysing the pros and cons of AI Test-first and Test-after:

Feedforward: the prompt and what comes with it, used to anticipate the agent’s behaviour and steer it before it acts. The guides provided in the prompt increase the likelihood that the agent produces good results at the first attempt.
Feedback: the output produced after the agent generates code (by a linter, a compiler, automated tests and so on), which helps the agent to self-correct.

With AI test-after, during the 1st step of code generation

Feedforward: it is easier to prompt and make LLMs generate good implementation code than good test code
Feedback: the implementation code generated must also pass the pre-existing test suite (regression) or, in case of failures, generate feedback that triggers self-correction

With AI test-first, during the 2nd step of code generation, which follows the tests’ generation

Feedforward: the tests generated can be easily referenced in the prompt as spec/guides, to quickly and effectively direct the code generation
Feedback: the executable tests create, in case of failures, feedback that triggers self-correction
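
The feedback side of this lens can be sketched as a small loop. This is only an illustration under stated assumptions: `feedback_loop`, `fake_agent` and `run_tests` are hypothetical names invented here, the "agent" is a stub rather than a real LLM call, and the check is a single assertion standing in for a test suite.

```python
from typing import Callable, Optional, Tuple


def feedback_loop(
    generate: Callable[[str, Optional[str]], str],
    check: Callable[[str], Optional[str]],
    prompt: str,
    max_attempts: int = 3,
) -> Tuple[Optional[str], int]:
    """Ask a (hypothetical) agent for code until the checks pass.

    generate(prompt, feedback) returns candidate code; feedback is None on
    the first attempt (feedforward only) and the last failure report after.
    check(code) returns None when every check passes, else a failure report.
    """
    feedback = None
    for attempt in range(1, max_attempts + 1):
        code = generate(prompt, feedback)
        feedback = check(code)
        if feedback is None:
            return code, attempt
    return None, max_attempts


def fake_agent(prompt: str, feedback: Optional[str]) -> str:
    # Stand-in for an LLM call: the first answer is wrong,
    # the second one is produced after receiving feedback.
    if feedback is None:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


def run_tests(code: str) -> Optional[str]:
    # Stand-in for the test suite: execute the candidate and check one case.
    namespace = {}
    exec(code, namespace)
    result = namespace["add"](2, 3)
    if result != 5:
        return f"add(2, 3) returned {result}, expected 5"
    return None


code, attempts = feedback_loop(fake_agent, run_tests, "Write add(a, b)")
```

With test-first, `run_tests` would be the newly generated tests; with test-after, it would be the pre-existing regression suite.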

(*) https://martinfowler.com/articles/harness-engineering.html#FeedforwardAndFeedback

Test-first and Test-after: the Pros and Cons I experienced

About Testable code (modular design, dependency inversion, etc.):
- AI-Test-first generates testable code and architecture and helps less experienced developers to methodically adopt the A-TDD and TDD way of thinking about behaviour and code design, maintaining sight of the important design decisions.
- With AI-Test-after, I direct the code behaviour and design with specific instructions in the prompt (instead of the tests); I further refine the design as necessary, with quick manual or AI-generated small refactorings during the code review step. I find this approach faster and more effective for me as I gain speed without losing quality.
The Asymmetric AI Efficiency (Rapid tests-after generation):
- With AI-Test-after, I have found LLMs extremely efficient at generating good-enough tests against existing code, with little instruction, giving me a huge productivity advantage compared to AI-Test-first.
AI micro-management Trap (for unit tests):
- In AI-Test-first, the prompt for generating granular unit tests (before the implementation code) requires a lot of detail; writing such a prompt is very time-consuming, and not too far from manually writing the unit tests.
- With AI-Test-after, it is enough to reference the class or module and ask in the prompt to generate the unit tests. Voilà.
High Friction:
- With AI-Test-first, the generated tests are more prone to small, time-consuming, hard-to-detect issues, such as deprecated test framework functions, subtle mocking errors, and wrong assumptions about interfaces or implementation.
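
One example of a wrong interface assumption hidden by a mock, as a minimal Python sketch. `PaymentGateway` and `checkout_wrong` are hypothetical names invented for this illustration; `Mock` and `create_autospec` come from the standard library's `unittest.mock`. A bare `Mock` accepts any method name, so a test that assumes a non-existent `pay` method passes anyway, while a mock built from the real class rejects it.

```python
from unittest.mock import Mock, create_autospec


class PaymentGateway:
    """Hypothetical collaborator; its only real method is charge()."""

    def charge(self, amount_cents: int) -> bool:
        raise NotImplementedError


def checkout_wrong(gateway, amount_cents):
    # Code written to satisfy a test that wrongly assumed a pay() method.
    return gateway.pay(amount_cents)


# A bare Mock accepts any attribute, so the wrong assumption goes unnoticed.
loose = Mock()
loose.pay.return_value = True
loose_test_passed = checkout_wrong(loose, 500) is True

# A mock specced from the real class raises on methods that do not exist.
strict = create_autospec(PaymentGateway, instance=True)
try:
    checkout_wrong(strict, 500)
    wrong_interface_detected = False
except AttributeError:
    wrong_interface_detected = True
```

Asking for specced mocks in the prompt is one possible way to surface this class of issue earlier; it does not cover the other issues listed above.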
Increased Cognitive Overhead:
- With AI-Test-first, spotting a problem in the generated tests, identifying the root cause, and fixing it often involves a triple conceptual indirection: Prompt => Test => Implementation code (yet to be generated). In general, it tends to require more time, attention and a higher cognitive effort.
The “Double Hallucination” Trap:
- With AI-Test-after, a hallucination can introduce a bug in the implementation code; later, the generated test may mirror the bug as intended behaviour. The generated test code may not cover all possible execution paths or return false positives. Furthermore, LLMs may not spot complex domain properties, existing semantic gaps, non-explicit invariants, and non-obvious failure modes; as a result, they may not generate tests for these.
  
  To counter these weaknesses, it is useful to take an additional step by directing the AI to perform mutation testing, fault injection, or adversarial verification.
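
The idea behind mutation testing can be shown with a hand-rolled sketch. Everything here is hypothetical and simplified: `is_adult`, the hand-written mutant and the two "suites" are invented for illustration, and in practice a mutation testing tool would generate and run the mutants automatically.

```python
def is_adult(age):
    # Original implementation.
    return age >= 18


def mutant_is_adult(age):
    # Hand-written mutant: ">=" replaced by ">".
    return age > 18


def weak_suite(fn):
    # Typical generated tests: values far away from the boundary.
    return fn(30) is True and fn(5) is False


def strong_suite(fn):
    # Same tests plus the boundary case that pins down the intended behaviour.
    return weak_suite(fn) and fn(18) is True


def survives(suite, mutant):
    """A mutant survives when the suite still passes against it."""
    return suite(mutant)


weak_result = survives(weak_suite, mutant_is_adult)      # mutant survives
strong_result = survives(strong_suite, mutant_is_adult)  # mutant is killed
```

A surviving mutant points at behaviour the tests do not actually constrain, which is where a test that merely mirrors the implementation tends to be weakest.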
Examples better than tests:
- Where the LLM struggled to generate the right code, with both AI-Test-first and AI-Test-after, what worked for me was providing examples (of the data, or of inputs and outputs).
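
As a sketch of what such examples might look like, the snippet below keeps input/output pairs in one place and uses them twice: pasted into the prompt, and run as a check on the generated code. The `slugify` function, the `EXAMPLES` data and `examples_as_prompt` are hypothetical and chosen only for illustration.

```python
import re

# Input/output examples: the same pairs go into the prompt and into the check.
EXAMPLES = [
    ("Hello World", "hello-world"),
    ("  AI  Test-first ", "ai-test-first"),
    ("Pros & Cons!", "pros-cons"),
]


def examples_as_prompt(examples):
    # Render the pairs as plain text that can be pasted into a prompt.
    lines = [f"{given!r} -> {expected!r}" for given, expected in examples]
    return "Implement slugify(title) so that:\n" + "\n".join(lines)


def slugify(title):
    # Stand-in for the code the LLM would generate from the prompt.
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)


# Collect every example the implementation gets wrong.
failures = [
    (given, expected, slugify(given))
    for given, expected in EXAMPLES
    if slugify(given) != expected
]
```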

Which Pros and Cons have you experienced in your own context?
