Thursday, 16 February 2012

How I Learned to Stop Worrying about my LINQ queries and Love the integration test environment

In the past I've put a lot of effort into making Entity Framework LINQ queries testable by faking ObjectSet/DbSet and ObjectContext/DbContext/DataContext. The code that makes this possible can be found on GitHub.

The key principle is to provide interfaces for the context and the object sets, so that an alternative, faked implementation based on IEnumerable<T> can be used in the tests. A drawback is that the tests then run on LINQ to Objects instead of a LINQ implementation that actually generates SQL (LINQ to Entities or LINQ to SQL).

I've seen many others put effort into making this possible.

In the past I thought that testing your queries with LINQ to Objects was good enough. But if you really think about why you need the tests, you might start to rethink your approach.

In my opinion, the most important reasons for using automated tests are:

  • To assure the quality of your production code
  • To assert that your code behaves correctly now, and also in the future

When you can trust your tests, you will have more courage to refactor code and know that it will still work. This will lead to better code.

I've finally realized that LINQ to Objects and LINQ to Entities are too different (LINQ to SQL is also too different). I don't think tests that use LINQ to Objects can be trusted. I've test-driven many LINQ queries with LINQ to Objects tests that pass, only to see the query fail when I fire up the complete system. I've seen queries that work with LINQ to Objects but fail with LINQ to Entities for lots of different reasons: enum handling, string comparison, use of non-entity properties, invalid use of extension methods (for example Distinct()) and several others.
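The string comparison case is easy to reproduce outside .NET as well. Here is a minimal sketch in Python, where a plain list stands in for LINQ to Objects and the standard-library sqlite3 module stands in for the real database. The `customer` table, the helper names and the `NOCASE` collation are illustrative assumptions, chosen to mimic a database whose collation is case-insensitive.

```python
import sqlite3

names = ["Alice", "Bob"]

def find_in_memory(search):
    # In-memory "query": Python string equality is case-sensitive.
    return [n for n in names if n == search]

def find_in_database(conn, search):
    # Database query: the comparison follows the column's collation.
    rows = conn.execute("SELECT name FROM customer WHERE name = ?", (search,))
    return [row[0] for row in rows]

conn = sqlite3.connect(":memory:")
# Assumption: a case-insensitive collation, standing in for the collation
# configured on the production database server.
conn.execute("CREATE TABLE customer (name TEXT COLLATE NOCASE)")
conn.executemany("INSERT INTO customer (name) VALUES (?)", [(n,) for n in names])

in_memory_result = find_in_memory("alice")
database_result = find_in_database(conn, "alice")

print(in_memory_result)  # []
print(database_result)   # ['Alice']
```

The same predicate gives two different answers, so a test written against the in-memory version says nothing reliable about what the database will return.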

Small and simple LINQ queries might be easy to test with LINQ to Objects: the differences between LINQ to Objects and LINQ to Entities are easier to spot, and it is much easier to predict which SQL will be generated. But with large queries it is almost impossible (except for Jon Skeet) to predict what SQL the query will generate. It's when you write those large and complicated queries that you really need the tests, both now and in the future. Wouldn't you feel more comfortable editing a complicated query if it were covered by tests?

My suggestion is to stop putting effort into making Entity Framework or LINQ to SQL testable with LINQ to Objects. Spend the time creating a good integration test environment instead, one where you can test your queries against a real database. Some advice for your integration environment:

  • Make it resemble the production environment
  • Try to use the same database server product and version
  • Use the same LINQ implementation as in production
  • Use the same ADO.NET provider
  • Separate your integration tests from your unit tests. If the integration tests become slow, you can still run the unit tests fast and often; the integration tests will probably not run as often as the unit tests
  • Make it easy for the team to run the integration tests. This can involve scripting the creation of the database and master data, and then either setting up an environment that runs the scripts against your locally hosted database before the tests are executed, or using a shared instance of the integration test database. Choose what suits you best!
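Scripting the creation of the database and master data, as in the last point, could look roughly like the sketch below. It uses Python and sqlite3 only to stay self-contained. The two scripts, the table names and the `create_integration_database` helper are hypothetical; in a real project the scripts would live in version-controlled files and target the same database product as production.

```python
import sqlite3

# Hypothetical schema script (would normally be a file in the repository).
SCHEMA_SCRIPT = """
CREATE TABLE country (code TEXT PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE customer (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    country_code TEXT NOT NULL REFERENCES country(code)
);
"""

# Hypothetical master data that every integration test can rely on.
MASTER_DATA_SCRIPT = """
INSERT INTO country (code, name) VALUES ('SE', 'Sweden');
INSERT INTO country (code, name) VALUES ('NO', 'Norway');
"""

def create_integration_database(path=":memory:"):
    """Create a fresh database and run the schema and master data scripts."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # enforce REFERENCES constraints
    conn.executescript(SCHEMA_SCRIPT)
    conn.executescript(MASTER_DATA_SCRIPT)
    conn.commit()
    return conn

conn = create_integration_database()
country_count = conn.execute("SELECT COUNT(*) FROM country").fetchone()[0]
print(country_count)  # 2
```

Having one entry point like this means that anyone on the team, and the build server, can get a known database state with a single call before the tests run.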

With an environment like this, I can trust my tests for the LINQ queries again.

What are your suggestions for a good integration test environment?

1 comment:

  1. I like the way you think.

    The key to making this work is really your last point. If the integration tests aren't easy to run, they will eventually disintegrate into nothing and become a pain point when it comes to maintenance. I have seen two solutions that make it easy: one is to use a local database in which entries are created and deleted; the second is to use a shared (or local) database where no entry is actually created. I much prefer the second.

    There are no parallelism problems with this architecture. A really nice solution is to use transactions to make sure that no record actually enters your repository. Sure, you miss out on the very last bit of the integration test (data storage), but everything else, including constraints, will work just as you want it to. Just make an integration test base class, create a transaction in your OnStartup and Dispose it in your OnCleanup. It's magical.
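The transaction approach from the comment can be sketched as follows. This is a Python/unittest analogy, not the commenter's actual code: `setUp` and `tearDown` play the roles of OnStartup and OnCleanup, the explicit `ROLLBACK` plays the role of disposing the transaction without committing it, and the class and table names are hypothetical.

```python
import sqlite3
import unittest

class IntegrationTestBase(unittest.TestCase):
    """Runs every test inside a transaction that is rolled back afterwards."""

    @classmethod
    def setUpClass(cls):
        # isolation_level=None: sqlite3 opens no implicit transactions, so
        # the BEGIN/ROLLBACK below are the only transaction boundaries.
        cls.conn = sqlite3.connect(":memory:", isolation_level=None)
        cls.conn.execute("CREATE TABLE customer (name TEXT NOT NULL UNIQUE)")

    @classmethod
    def tearDownClass(cls):
        cls.conn.close()

    def setUp(self):
        self.conn.execute("BEGIN")

    def tearDown(self):
        # Nothing a test wrote survives past this point.
        self.conn.execute("ROLLBACK")


class CustomerQueryTests(IntegrationTestBase):
    def test_insert_is_visible_inside_the_transaction(self):
        self.conn.execute("INSERT INTO customer (name) VALUES ('Alice')")
        count = self.conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
        self.assertEqual(count, 1)

    def test_constraints_are_still_enforced(self):
        # Inserting 'Alice' again only works because the other test's row
        # was rolled back; the UNIQUE constraint itself is still checked.
        self.conn.execute("INSERT INTO customer (name) VALUES ('Alice')")
        with self.assertRaises(sqlite3.IntegrityError):
            self.conn.execute("INSERT INTO customer (name) VALUES ('Alice')")


suite = unittest.defaultTestLoader.loadTestsFromTestCase(CustomerQueryTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Both tests insert the same name, so they would collide on the UNIQUE constraint if the rollback did not isolate them from each other.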