Mutation testing with Stryker.NET
It is becoming apparent to me that better testing will matter more and more as LLM usage increases within software development. I think it matters even more when people also let the tools write the unit tests for the code they are producing.
In my experience, an LLM dumping its code output and related tests alongside each other in a PR narrows your thinking as to what tests are missing. Maybe this is just a Matt problem, I don’t know. But your vision of edge cases and logical errors is wider when you are thinking-while-typing, an element that is lost when an agent is doing things for you.
One form of testing I don’t hear much about is mutation testing. I took a day recently to explore it further with Stryker.NET, the go-to tool for the .NET ecosystem (and used internally at Microsoft).
For the unaware, mutation testing helps test your tests. Tools like Stryker make small changes to your production code (“mutations”) and then run your test suite against each change, looking for changes that no test notices. A mutation (often called a “mutant” for a little fun) is killed if at least one unit test fails (good), and survives if your unit tests all still pass (bad).
Mutations can take several forms - a simple example is flipping booleans. Where you have written if (true), Stryker will change it to if (false). There’s a full list here.
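To make killed versus survived concrete, here is a minimal sketch. It is plain Python rather than C# and it does not use Stryker at all. The names (is_adult, is_adult_mutant, mutant_survives) are made up for illustration, and the mutant is written by hand, where a real tool would generate it for you. The mutation is a boundary change from >= to >, the sort of operator swap a mutation tool might try.

```python
from typing import Callable, List, Tuple

# A test case is (input age, expected result). Hypothetical example data.
Case = Tuple[int, bool]


def is_adult(age: int) -> bool:
    """Original production code."""
    return age >= 18


def is_adult_mutant(age: int) -> bool:
    """Hand-written mutant: '>=' has been changed to '>'."""
    return age > 18


def suite_passes(impl: Callable[[int], bool], cases: List[Case]) -> bool:
    """Return True when every case passes against the given implementation."""
    return all(impl(age) == expected for age, expected in cases)


def mutant_survives(mutant: Callable[[int], bool], cases: List[Case]) -> bool:
    """A mutant survives when the whole suite still passes against it."""
    return suite_passes(mutant, cases)


# This suite never touches the boundary value, so it cannot tell '>=' from '>'.
weak_cases: List[Case] = [(30, True), (5, False)]

# Adding the boundary case is what kills the mutant.
strong_cases: List[Case] = weak_cases + [(18, True)]

if __name__ == "__main__":
    print("weak suite, mutant survives:", mutant_survives(is_adult_mutant, weak_cases))
    print("strong suite, mutant survives:", mutant_survives(is_adult_mutant, strong_cases))
```

Both suites pass against the original code, so a plain green test run looks the same either way. Only the suite with the boundary case notices the mutant, and that missing case is the kind of gap a surviving mutant points you to.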
I hit a snag early on - Stryker doesn’t (yet) fully support Microsoft.Testing.Platform, and my current big codebase in the office has unit tests written against xUnit v3. I chose to downgrade to v2 on a temp branch so I could use all features available.
Running the analysis tool is straightforward, so this post doesn’t need to be a how-to. The docs are great. I personally started with our domain layer, which I had assumed was well tested, but Stryker handily found quite a few missing tests that I was able to add in short order.
Some surviving mutants I am okay with - Stryker removed log messages and no tests failed, for example. I personally don’t see much value in testing against log message strings. Stryker does allow for excluding mutation types, so I’ll look to refine the setup over time and hopefully get it added to our CI/CD pipeline.
Once we’re comfortable with mutation testing, I’m planning on taking a serious look at property testing with FsCheck - the two combined should help to shore up our test code base nicely.