Sunday, July 22, 2012

Practical Thoughts on Using ORMs

Object Relational Mapping (ORM) appears to be a polarizing topic. I recently read a very interesting article (ORM is an Antipattern) that made some very good points against ORMs. Actually, it was interesting for the comments on the article as much as for the content. A more realistic assessment (in my opinion) by Martin Fowler (ORM Hate) points out that ORMs are difficult to use because the problems that they attempt to solve are hard to solve, no matter what the strategy. I take a more practical approach to ORMs. I neither hate ORMs nor think they are the answer to mapping relational data to objects. In this article, I'd like to explore my thoughts and experiences on the subject.

The Benefits of ORMs

Most business data is stored in relational databases. These databases are ubiquitous and well understood in the software development industry. They are great at slicing and dicing data in all the ways we need to analyze the business, which is the rationale for our programs. Yet the two concepts, objects and relational data, don't always match up very well. Our objects don't always map directly to our tables. When we program, we must make the transitions back and forth between our object models and our data models as best we can. ORMs exist to make that easier.

The truth of the matter is that I've always been hesitant about ORMs because it seemed to me that they did the easiest work in my applications. Writing INSERT, UPDATE, DELETE and SELECT statements is easy. But ORMs do attempt to do more than just CRUD work; they manage concurrency and provide lazy loading, for example. Yet, they do none of that better than I can do it. The value, then, in an ORM is found in the development time savings they provide, which can be substantial. At least, this is one of their more appealing values on the surface.

Another even more appealing value, in my opinion, is the ability to use LINQ (see Language Integrated Query). In fact, I would not want to use an ORM if it did not have this magnificent feature. Understandably, some might not like the thought of learning another syntax that is similar, but not nearly similar enough, to T-SQL. I wrestled with the language when I first started and still do. It's worth it, though, for the simple fact that it is refactorable. If I need to change something about a class, my LINQ queries will get flagged by the compiler - or perhaps updated by the refactoring capabilities of my development tool. By contrast, a SQL statement inside a string is only evaluated at runtime.

Another valuable benefit, if the ORM supports it (Entity Framework does), is the capability to build your database during development. This is very valuable when doing TDD and Outside-In development because you do not have to focus on building the database before you work through your feature set. As you build your object model and run tests, the ORM builds the database. You do still have to think about your data model to make sure that the ORM builds it correctly, and you may have to adjust the data model when it is done. But much of the work will be done as you go through the process, especially if you use a database that allows you to regenerate each time you run your tests, like SQL Compact Edition.

ORM Drawbacks

ORMs are not without drawbacks - significant drawbacks. Every ORM is different, so I will limit my points to the one with which I am most familiar: Entity Framework.

The most significant issue that I've had is how Entity Framework tracks objects and maintains connections. Because objects in .NET are passed by reference, when you query a database, the ORM holds onto references to the objects it gives you. This is critical for how it functions. You update the objects and then tell the ORM to update the database. That's great. However, this can lead to a number of confusing situations.

The first situation occurs when you treat your objects as if they are not connected. In truth, they are not connected to the database, but they are connected to the ORM. It is quite easy to query objects in such a way that they interfere with the persistence of other objects. For example, if you get your hands on an object and do something to cause it to be invalid, you may get a validation error when you try to save some other object. But the error will appear to be unrelated. To be fair, from the ORM's perspective, you made a mistake; you probably have a design flaw somewhere, but it is sure difficult to troubleshoot. What all this comes down to is that you are not free to manipulate your objects however you want. There can be unintended consequences, thus raising the level of difficulty in working with ORMs.
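
To make this failure mode concrete, here is a minimal sketch in Python. It is not Entity Framework and does not use its API; `TinyContext` and `Customer` are hypothetical names invented for illustration. The only assumption carried over from the discussion above is that the context remembers every object it hands out and validates all of them when asked to save.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    id: int
    name: str


class TinyContext:
    """Hypothetical stand-in for an ORM context that tracks what it returns."""

    def __init__(self, rows):
        self._rows = rows      # pretend database: id -> name
        self._tracked = {}     # identity map: id -> Customer

    def get(self, customer_id):
        # Hand out the same tracked instance for the same key.
        if customer_id not in self._tracked:
            self._tracked[customer_id] = Customer(customer_id, self._rows[customer_id])
        return self._tracked[customer_id]

    def save_changes(self):
        # Validate every tracked object, not just the one the caller edited.
        for customer in self._tracked.values():
            if not customer.name:
                raise ValueError(f"Customer {customer.id}: name is required")
        for customer in self._tracked.values():
            self._rows[customer.id] = customer.name


def demo():
    ctx = TinyContext({1: "Ada", 2: "Grace"})
    scratch = ctx.get(1)
    scratch.name = ""              # careless edit made elsewhere in the code
    ctx.get(2).name = "Grace H."   # the change we actually want to save
    try:
        ctx.save_changes()
    except ValueError as err:
        return str(err)
    return "saved"
```

Calling `demo()` reports a validation error for customer 1 even though the code was trying to save customer 2, which is the "apparently unrelated error" described above.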

Another drawback of ORMs, and probably the one more often cited, is query performance. ORMs don't claim to improve query performance, but they can sometimes make it considerably worse without your realizing it. The particular situation that comes to mind is when the ORM makes multiple queries where you might have made only one. This happens when it must get child objects or collections. Fortunately, some ORMs provide methods to alleviate this problem (the Include() method in Entity Framework, for example). Yet even these helper methods are limited, and you might find an ORM making more queries than you intended, or building a query that is not the way you would have liked it. In the end, you are better suited to optimize reads and writes to the database than any ORM - and that may be critical to your application.
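
The multiple-query problem can be sketched without any ORM at all. The example below uses Python's built-in sqlite3 module and a made-up `orders` / `order_lines` schema (both are assumptions for illustration, not anything Entity Framework generates). `load_lazily` mimics what happens when child collections are fetched one parent at a time; `load_eagerly` is roughly the single joined query that an eager-loading hint is meant to produce.

```python
import sqlite3


def build_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT);
        CREATE TABLE order_lines (id INTEGER PRIMARY KEY, order_id INTEGER, product TEXT);
        INSERT INTO orders VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Edsger');
        INSERT INTO order_lines VALUES
            (1, 1, 'widget'), (2, 1, 'gear'), (3, 2, 'bolt'), (4, 3, 'nut');
    """)
    return conn


def load_lazily(conn):
    """One query for the parents, then one more per parent."""
    queries = 0
    orders = conn.execute("SELECT id, customer FROM orders ORDER BY id").fetchall()
    queries += 1
    result = {}
    for order_id, customer in orders:
        lines = conn.execute(
            "SELECT product FROM order_lines WHERE order_id = ? ORDER BY id",
            (order_id,),
        ).fetchall()
        queries += 1
        result[customer] = [product for (product,) in lines]
    return result, queries


def load_eagerly(conn):
    """A single JOIN that brings parents and children back together."""
    rows = conn.execute("""
        SELECT o.customer, l.product
        FROM orders o JOIN order_lines l ON l.order_id = o.id
        ORDER BY o.id, l.id
    """).fetchall()
    result = {}
    for customer, product in rows:
        result.setdefault(customer, []).append(product)
    return result, 1
```

Both functions return the same data for this sample, but the first issues one query plus one per order, so its cost grows with the number of parent rows.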


A Practical Approach to Using ORMs

I think in many circumstances the benefits of an ORM outweigh the drawbacks. With the exception of query performance, good design practices can alleviate most drawbacks. But even query performance can be mitigated to a large degree. I have a few recommendations for using an ORM to get the most out of it.

Use Database Views

Displaying data is a fundamental function in any business application. The data you need to display is often a summary or aggregate of the very data that your objects represent. You cannot reasonably display this data without view objects. View objects, like database views, are not (necessarily) updatable and are generally denormalized. If you are going to use such objects, and all you are really doing is pulling the data from the database to show in a report of some sort, you might as well use database views.

With a database view you can optimize the query as much as you like. Mapping an object to such a view is trivial. The queries are easily made and you get all the benefits of LINQ - plus, for most of us, writing the query to create the view is easier than writing the equivalent LINQ query.

The one drawback that I have encountered in using database views occurs when I'm initially developing an application and I'm using the features of the ORM to create the database. From an ORM's perspective, the view objects represent tables like any other object. That could be a problem during testing if you don't account for it. The resolution is simple enough: let the view be a table during development and add test data as you need it. Once you are ready to implement the real database, change the generated DDL.
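
As a rough illustration of the idea, the sketch below defines a view in SQLite and maps each row to a read-only Python object. The `customer_totals` view, the `orders` table, and the `CustomerTotal` class are all hypothetical, and the mapping is done by hand here rather than by an ORM.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerTotal:
    """Read-only view object: one row of the hypothetical customer_totals view."""
    customer: str
    order_count: int
    total: float


def build_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL);
        INSERT INTO orders VALUES (1, 'Ada', 10.0), (2, 'Ada', 5.5), (3, 'Grace', 20.0);

        -- The aggregation lives in the database, where it can be tuned in SQL.
        CREATE VIEW customer_totals AS
            SELECT customer, COUNT(*) AS order_count, SUM(amount) AS total
            FROM orders
            GROUP BY customer;
    """)
    return conn


def customer_totals(conn):
    # The application queries the view exactly as it would query a table.
    rows = conn.execute(
        "SELECT customer, order_count, total FROM customer_totals ORDER BY customer"
    )
    return [CustomerTotal(*row) for row in rows]
```

The frozen dataclass mirrors the point that view objects are for display only; attempting to assign to one of its fields raises an error instead of silently queuing an update.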

Separate Database and Domain Classes

This is a difficult decision to make. It seems too easy to create classes that define both the application domain and the database. That's Code First development, and it will generally work well enough, most of the time. But it does not take long to get into situations where dealing with the inherent problems of ORMs holding onto references will kill your productivity. So, use Code First techniques to create a data model and build a data access layer, but create a separate set of entities for your domain model.

One of the main reasons I have begun to adopt this approach is that it forces me to organize my project from the beginning in the way that I will inevitably want it once I start having problems with the ORM. It may seem like a lot of extra work, but these classes are fairly simple and, quite frankly, look very similar. However, don't be tempted to use inheritance; your ORM may interpret that in ways you do not intend. Another reason is that I often find it more convenient to denormalize certain properties. Sometimes, it just works better to do that in the application, even though you do not want to do it in the database.
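
A minimal sketch of that separation, written in Python rather than with Entity Framework, might look like the following. Every class and function name here is hypothetical. The persistence-side classes are shaped like tables, the domain-side class carries denormalized values, and a plain mapping function copies data across the boundary with no inheritance between the two sets.

```python
from dataclasses import dataclass
from typing import List


# Persistence-side classes: shaped like the tables.
@dataclass
class CustomerRow:
    id: int
    name: str


@dataclass
class OrderRow:
    id: int
    customer_id: int


@dataclass
class OrderLineRow:
    order_id: int
    product: str
    quantity: int
    unit_price: float


# Domain-side class: shaped for the application, with denormalized fields.
@dataclass(frozen=True)
class Order:
    id: int
    customer_name: str   # copied from CustomerRow
    total: float         # computed from the order lines


def to_domain(order: OrderRow, customer: CustomerRow, lines: List[OrderLineRow]) -> Order:
    """Copy values across the boundary; no persistence object escapes the data layer."""
    total = sum(
        line.quantity * line.unit_price
        for line in lines
        if line.order_id == order.id
    )
    return Order(id=order.id, customer_name=customer.name, total=total)
```

Because `Order` holds copies of the values, nothing the rest of the application does to it can affect what the data layer later tries to save.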

Forget Lazy Loading

Lazy Loading was one of the draws to ORMs. You get an object, but you don't have to pay for loading child collections unless you access them. That's a great concept for a very deep model. I implemented a Lazy Loading collection once for a project that had such a need and it worked great. But, I had a lot of control over how it worked. Not so with ORMs.

I have found that because I have this feature, I will try to use it. The problem is, while the object is connected to the database, accessing a property causes a database call. That's fine once, but it can get out of hand quickly, especially if you try to iterate through the object graph. Once the object is no longer connected to the database, the Lazy Loaded property, which was a proxy class, will throw an exception. I avoid using it for those reasons, so what's the point in even having the feature?
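
Both problems show up even in a toy model. The sketch below is a hypothetical Python stand-in, not Entity Framework's proxy mechanism: `Session` plays the role of the ORM context and counts queries, and `LazyOrder` loads its `lines` the first time the property is read.

```python
class Session:
    """Hypothetical stand-in for an ORM context; counts the queries it runs."""

    def __init__(self, lines_by_order):
        self._lines_by_order = lines_by_order
        self.query_count = 0
        self.closed = False

    def fetch_lines(self, order_id):
        if self.closed:
            raise RuntimeError("session is closed; cannot lazy-load")
        self.query_count += 1
        return list(self._lines_by_order.get(order_id, []))

    def close(self):
        self.closed = True


class LazyOrder:
    """Order whose lines are loaded on first access."""

    def __init__(self, order_id, session):
        self.id = order_id
        self._session = session
        self._lines = None

    @property
    def lines(self):
        if self._lines is None:
            self._lines = self._session.fetch_lines(self.id)
        return self._lines


def demo():
    session = Session({1: ["widget"], 2: ["bolt", "nut"], 3: []})
    orders = [LazyOrder(i, session) for i in (1, 2, 3)]
    # Iterating the graph quietly issues one query per order touched.
    product_count = sum(len(order.lines) for order in orders[:2])
    queries = session.query_count
    session.close()
    try:
        orders[2].lines   # never touched while the session was open
        error = None
    except RuntimeError as err:
        error = str(err)
    return product_count, queries, error
```

In `demo()`, a harmless-looking loop over two orders costs two queries, and reading the third order's lines after the session is closed raises instead of returning data.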

Fortunately, if you follow my advice about separating the domain and database models, you won't be tempted by the feature, since you won't have it. I find it a better design practice to be intentional when I query to get the objects and collections I need, even if I have to make separate queries for child collections. Also, a little denormalization in the domain layer can go a long way to alleviate the need for additional queries.

Conclusion

Some developers will take a long time to accept ORMs. Perhaps a day will come when a better tool is created. For now, I treat ORMs like any other tool: if it's right for the job, use it. I've only scratched the surface on determining whether an ORM is right for the job, but I hope I've provided some practical ideas that will help make using ORMs less painful. Writing SQL in strings is still viable, and good unit testing will still catch changes to the data model. So, in the end, even though I'll typically choose Entity Framework in a business application, it's not going to hurt the application if I don't.
