Sunday, July 22, 2012

Practical Thoughts on Using ORMs

Object Relational Mapping (ORM) appears to be a polarizing topic. I recently read a very interesting article (ORM is an Antipattern) that made some very good points against ORMs. Actually, it was interesting for the comments on the article as much as for the content. A more realistic assessment (in my opinion) by Martin Fowler (ORM Hate) points out that ORMs are difficult to use because the problems that they attempt to solve are hard to solve, no matter what the strategy. I take a more practical approach to ORMs. I neither hate ORMs nor think they are the answer to mapping relational data to objects. In this article, I'd like to explore my thoughts and experiences on the subject.

The Benefits of ORMs

Most business data is stored in relational databases. These databases are ubiquitous and well understood in the software development industry. They are great at slicing and dicing data in all the ways we need to analyze the business, which is, after all, why our programs exist. Yet the two concepts, objects and relational data, don't always match up very well. Our objects don't always map directly to our tables. When we program, we must make the transitions back and forth between our object models and our data models as best we can. ORMs exist to make that easier.

The truth of the matter is that I've always been hesitant about ORMs because it seemed to me that they did the easiest work in my applications. Writing INSERT, UPDATE, DELETE and SELECT statements is easy. But ORMs do attempt to do more than just CRUD work; they manage concurrency and provide lazy loading, for example. Yet they do none of that better than I can do it myself. The value, then, in an ORM is found in the development time it saves, which can be substantial. At least, this is one of its more appealing values on the surface.

Another even more appealing value, in my opinion, is the ability to use LINQ (see Language Integrated Query). In fact, I would not want to use an ORM if it did not have this magnificent feature. Understandably, some might not like the thought of learning another syntax that is similar to T-SQL, but not quite the same. I wrestled with the language when I first started and still do. It's worth it, though, for the simple fact that it is refactorable. If I need to change something about a class, my LINQ queries will get flagged by the compiler - or perhaps updated by the refactoring capabilities of my development tool. By contrast, a SQL statement inside a string is only evaluated at runtime.
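
To illustrate, here is a minimal sketch (the Customer class and the context variable are hypothetical) contrasting a compiled LINQ query with the equivalent SQL in a string:

// Refactoring-friendly: rename Customer.Name and the compiler flags this query.
var names = context.Customers
     .Where(c => c.IsActive)
     .Select(c => c.Name)
     .ToList();

// Runtime-only: rename the column and this breaks silently until it executes.
string sql = "SELECT Name FROM Customers WHERE IsActive = 1";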

Another valuable benefit, if the ORM supports it (Entity Framework does), is the capability to build your database during development. This is very valuable when doing TDD and Outside-In development because you do not have to focus on building the database before you work through your feature set. As you build your object model and run tests, the ORM builds the database. You do have to think about your data model to make sure the ORM builds it correctly, and you may have to adjust the data model when it is done. But much of the work will be done as you go through the process, especially if you use a database that allows you to regenerate each time you run your tests, like SQL Compact Edition.
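
As a rough sketch of what I mean (StoreContext and Product are made-up names), Entity Framework Code First can drop and rebuild the database from the object model on every test run:

public class Product
{
     public int Id { get; set; }
     public string Name { get; set; }
}

public class StoreContext : DbContext
{
     public DbSet<Product> Products { get; set; }
}

// In test setup: regenerate the database from the object model each run.
Database.SetInitializer(new DropCreateDatabaseAlways<StoreContext>());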

ORM Drawbacks

ORMs are not without drawbacks - significant drawbacks. Every ORM is different, so I will limit my points to the one with which I am most familiar: Entity Framework.

The most significant issue that I've had is how Entity Framework tracks objects and maintains connections. Because objects in .NET are passed by reference, when you query a database, the ORM holds onto references to the objects it gives you. This is critical for how it functions. You update the objects and then tell the ORM to update the database. That's great. However, this can lead to a number of confusing situations.

The first situation occurs when you treat your objects as if they are not connected. In truth, they are not connected to the database, but they are connected to the ORM. It is quite easy to query objects in such a way that they interfere with the persistence of other objects. For example, if you get your hands on an object and do something to cause it to be invalid, you may get a validation error when you try to save some other object, and the error will appear to be unrelated. To be fair, from the ORM's perspective, you made a mistake; you probably have a design flaw somewhere, but it is certainly difficult to troubleshoot. What all this comes down to is that you are not free to manipulate your objects however you want. There can be unintended consequences, which raises the level of difficulty of working with ORMs.
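
A contrived sketch of the kind of thing I mean (the classes are hypothetical, and Name is assumed to be marked required):

var customer = context.Customers.First();
customer.Name = null;                // invalid, but never meant to be saved

context.Orders.Add(new Order());     // the object we actually care about
context.SaveChanges();               // fails validation - because of Customer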

Another drawback of ORMs, and probably the more often cited, is query performance. ORMs don't claim to improve query performance, but they can sometimes make it considerably worse without your realizing it. The particular situation that comes to mind is when the ORM makes multiple queries where you would have made only one. This happens when it must get child objects or collections. Fortunately, some ORMs provide methods to alleviate this problem (the Include() method in Entity Framework, for example). Yet even these helper methods are limited, and you might find an ORM making more queries than you intended, or building a query differently than you would have liked. In the end, you are better suited to optimize reads and writes to the database than any ORM - and that may be critical to your application.
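
For example (again with hypothetical classes), a loop over lazy-loaded children issues one query per iteration, while eager loading pulls everything in a single round trip:

// One query for the orders, then one more per order for its customer.
foreach (var order in context.Orders.ToList())
     Console.WriteLine(order.Customer.Name);

// Eager loading: a single query with a join.
var orders = context.Orders.Include("Customer").ToList();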


A Practical Approach to Using ORMs

I think in many circumstances the benefits of an ORM outweigh the drawbacks. With the exception of query performance, good design practices can alleviate most drawbacks. But even query performance can be mitigated to a large degree. I have a few recommendations for getting the most out of an ORM.

Use Database Views

Displaying data is a fundamental function in any business application. The data you need to display is often a summary or aggregate of the very data that your objects represent. You cannot reasonably display this data without view objects. View objects, like database views, are not (necessarily) updatable and are generally denormalized. If you are going to use such objects, and all you are really doing is pulling the data from the database to show in a report of some sort, you might as well use database views.

With a database view you can optimize the query as much as you like. Mapping an object to such a view is trivial. The queries are easily made and you get all the benefits of LINQ - plus, for most of us, writing the query to create the view is easier than writing the equivalent LINQ query.
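
In Entity Framework Code First, for instance, a read-only class can be mapped to a view as though it were a table (the class and view names here are hypothetical):

public class CustomerSummary        // denormalized, read-only view object
{
     public int CustomerId { get; set; }
     public string Name { get; set; }
     public decimal TotalSales { get; set; }
}

// In the DbContext:
protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
     // EF will happily query this as if it were a table.
     modelBuilder.Entity<CustomerSummary>()
          .HasKey(s => s.CustomerId)
          .ToTable("vw_CustomerSummary");
}
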
The one drawback that I have encountered in using database views occurs when I'm initially developing an application and using the features of the ORM to create the database. From the ORM's perspective, the view objects represent tables like any other object. That could be a problem during testing if you don't account for it. The resolution is simple enough: let the view be a table during development and add test data as you need it. Once you are ready to implement the real database, change the generated DDL.

Separate Database and Domain Classes

This is a difficult decision to make. It seems so easy to create one set of classes that defines both the application domain and the database. That's Code First development, and it will generally work well enough most of the time. But it does not take long to get into situations where the inherent problems of ORMs holding onto references will kill your productivity. So, use Code First techniques to create a data model and build a data access layer, but create a separate set of entities for your domain model.

One of the main reasons I have begun to adopt this approach is that it forces me to organize my project from the beginning in the way that I will inevitably want it once I start having problems with the ORM. It may seem like a lot of extra work, but these classes are fairly simple and, quite frankly, look very similar. However, don't be tempted to use inheritance; your ORM may interpret that in ways you do not intend. Another reason is that I often find it more convenient to denormalize certain properties. Sometimes it just works better to do that in the application, even though you do not want to do it in the database.
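
A bare-bones sketch of the separation (all names are just for illustration):

public class CustomerEntity              // persisted by the ORM
{
     public int Id { get; set; }
     public string Name { get; set; }
}

public class Customer                    // domain class; no ORM baggage
{
     public int Id { get; set; }
     public string Name { get; set; }
     public int OpenOrderCount { get; set; }   // denormalized for the app
}

public static class CustomerMapper
{
     // The data access layer maps between the two by hand.
     public static Customer ToDomain(CustomerEntity entity)
     {
          return new Customer { Id = entity.Id, Name = entity.Name };
     }
}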

Forget Lazy Loading

Lazy Loading was one of the draws to ORMs. You get an object, but you don't have to pay for loading child collections unless you access them. That's a great concept for a very deep model. I implemented a Lazy Loading collection once for a project that had such a need and it worked great. But, I had a lot of control over how it worked. Not so with ORMs.

I have found that because I have this feature, I will try to use it. The problem is that while the object is connected to the database, accessing a property causes a database call. That's fine once, but it can get out of hand quickly, especially if you try to iterate through the object graph. Once the object is no longer connected to the database, the lazy-loaded property, which was a proxy class, will throw an exception. I avoid using the feature for those reasons - so what's the point in even having it?

Fortunately, if you follow my advice about separating the domain and database models, you won't be tempted by the feature, since you won't have it. I find it a better design practice to be intentional when I query to get the objects and collections I need, even if I have to make separate queries for child collections. Also, a little denormalization in the domain layer can go a long way toward alleviating the need for additional queries.
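
And if you do keep a single model, Entity Framework at least lets you turn the feature off outright:

// DbContext (EF 4.1 and later) exposes a simple switch for this.
context.Configuration.LazyLoadingEnabled = false;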

Conclusion

Some developers will take a long time to accept ORMs. Perhaps a day will come when a better tool is created. For now, I treat ORMs like any other tool: if it's right for the job, use it. I've only scratched the surface on determining whether an ORM is right for the job, but I hope I've provided some practical ideas that will help make using ORMs less painful. Writing SQL in strings is still viable, and good unit testing will still catch changes to the data model. So, in the end, even though I'll typically choose Entity Framework in a business application, it's not going to hurt the application if I don't.

Wednesday, July 4, 2012

Factories

The design pattern for a factory includes an interface for the factory with methods that return interfaces. Here is a diagram that includes the implementation of the factory:



FactoryInterface contains methods that return interfaces, in this case ServiceAInterface and ServiceBInterface. The implementation, Factory, is responsible for determining which objects implementing those services should be created.
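
In code, the contract amounts to this (the concrete ServiceA and ServiceB classes are assumed to exist):

public interface FactoryInterface
{
     ServiceAInterface CreateServiceA();
     ServiceBInterface CreateServiceB();
}

public class Factory : FactoryInterface
{
     // The factory alone decides which concrete types to hand back.
     public ServiceAInterface CreateServiceA() { return new ServiceA(); }
     public ServiceBInterface CreateServiceB() { return new ServiceB(); }
}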

A factory could be created without the use of interfaces, of course, but you would be limited in how you could use that factory in testing. See Developing Software with BDD for more on how factories can be used in testing.


Factory Trees

In most business applications where factories would be used, you are likely to need factories in each layer (see Layered Design Pattern for Enterprise Applications). For example, you will need a factory for the Presentation, Service and Data Access layers. In this scenario, the creation of concrete implementations at each layer will depend on creating the dependent layers via Dependency Injection. To accomplish this, each factory must have a property for the factory (or factories) beneath it. The following diagram illustrates this:


Here you see that the Factory class has a property for FactoryB.

You will notice that this property is not a part of the interface. It is not necessary because the fact that the factory is even needed may be considered an implementation detail. For example, suppose you have an interface that defines a service. Naturally, that service needs some data access objects, so the service factory will have a data access factory. However, you might also use the same service in a thick client scenario where the service is really a proxy for a web service. In that case, the data access factory is not required. Since it is the factory itself that knows whether another factory is needed, the interface need not define that dependency. Of course, it won't really hurt if you include it anyway; you'd just not set it for the service proxy factory.
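
Here is a sketch of how a service factory might carry the factory beneath it (shown with constructor injection; the Service class is assumed to take a data access object):

public class ServiceFactory : ServiceFactoryInterface
{
     private readonly DataAccessFactoryInterface dataAccessFactory;

     public ServiceFactory(DataAccessFactoryInterface dataAccessFactory)
     {
          this.dataAccessFactory = dataAccessFactory;
     }

     public ServiceInterface CreateService()
     {
          // The dependent factory supplies the layer below.
          return new Service(dataAccessFactory.CreateDataAccess());
     }
}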

Here is what the example factory tree would look like:



Instantiating Factories


Using a Factory in code is quite simple:

ServiceAInterface servicea = factory.CreateServiceA();

The challenge is when and where to instantiate the factory. If the factory is by itself, you might just create it when you need it, like so:

FactoryInterface factory = new Factory();
ServiceAInterface servicea = factory.CreateServiceA();

Of course, the problem with this approach is that you hardly get any value out of abstracting the factory itself. A better approach is to create the factory early, such as when the application starts, and use that object throughout the application. You might do this in the Main() method of a Windows Forms application or in Application_Start of an ASP.NET application.

Wherever you instantiate the factory, you will specify the concrete factories that are used in the tree. By doing this in one place in your application, you are able to switch out factories at different layers when you need to, as would be the case for unit tests. Here is an example of how you would instantiate the above factory tree:

PresenterFactory presenterfactory = new PresenterFactory(
     new ServiceFactory(
           new DataAccessFactory()
     )
);

This would be typical for an ASP.NET application. The variable presenterfactory would end up in Application state, or you might even invoke this at each postback, assuming these constructors do no significant work.

If you were instantiating the same PresenterFactory for a thick client that used a web service proxy, it would look a bit different:

PresenterFactory presenterfactory = new PresenterFactory(
      new ServiceProxyFactory()
);

No need for the DataAccessFactory here. Note too that both the PresenterFactory and any presenters it creates will have no knowledge of the fact that you used a different service factory. This allows the presentation layer to be entirely independent of the service layer.

Using Factories in Unit Tests

Finally, a word about factories in unit tests. Since unit tests are granular, i.e. you test a single method in isolation, the dependencies that factories create would normally already be created by your test. For example, if I'm writing a test for some service method, and using Moq, I might set up my service like so:

Mock<DataAccessInterface> dataAccessMock = new Mock<DataAccessInterface>();
ServiceInterface service = new Service(dataAccessMock.Object);

I did not use a factory as I normally would in the application. That's perfectly fine as long as I test my factory methods somewhere.

On the other hand, I might gain a benefit by letting the factory do its job even in unit tests. In that case, I would use the factory implementation along with the service implementation, but I would provide mocks for other layers. So, my test setup would then look like this:

Mock<DataAccessInterface> dataAccessMock = new Mock<DataAccessInterface>();
Mock<DataAccessFactoryInterface> dataAccessFactoryMock
     = new Mock<DataAccessFactoryInterface>();
dataAccessFactoryMock.Setup(f => f.CreateDataAccess())
     .Returns(dataAccessMock.Object);

ServiceFactoryInterface serviceFactory
     = new ServiceFactory(dataAccessFactoryMock.Object);

Notice how I set up a mock for the data access factory that will provide the data access mock for the service. This is useful in a base class for service tests (or whatever layer), where you can set up all the data access factory methods you need to create all the service objects. This allows your test setup to consist of a single line, assuming that the base test fixture exposes the service factory.
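
As a sketch, such a base fixture might look like this (NUnit assumed; the names are illustrative):

public abstract class ServiceTestFixture
{
     protected Mock<DataAccessInterface> DataAccessMock;
     protected ServiceFactoryInterface ServiceFactory;

     [SetUp]
     public void SetupFactories()
     {
          DataAccessMock = new Mock<DataAccessInterface>();
          var dataAccessFactoryMock
               = new Mock<DataAccessFactoryInterface>(MockBehavior.Loose);
          dataAccessFactoryMock.Setup(f => f.CreateDataAccess())
               .Returns(DataAccessMock.Object);
          ServiceFactory = new ServiceFactory(dataAccessFactoryMock.Object);
     }
}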

Note: When doing this, you will probably NOT verify the data access factory mock, because you will have set up all the methods when only a few are needed in any given test. That will not pose a problem for testing, since the absence of the data access mocks will cause the test to fail IF you did not properly implement the factory. To do this with Moq, use MockBehavior.Loose in the constructor and don't call VerifyAll().


Monday, July 2, 2012

Layered Design Pattern for Enterprise Applications

In this post, I'll briefly describe how I've been layering applications lately. I wanted to capture the concepts and provide some explanations. I do not pretend to think this is the only way to design applications, but I do think it is a good way. It is also fairly typical.


I'll start with the most basic of designs:


Data Access - this layer, in my mind, is very granular. The classes involved have the single mission of reading and writing data to the database. There may be a one-to-one relationship between data access classes and database tables, but not always. There are some cases where it is prudent to denormalize data.

Service - the service layer maps to application functionality from a logical perspective. Take the functionality of your application and partition it into logical modules, and you have a basis for services. Thus, service classes are coarser than data access classes, but you may have multiple services depending on the complexity of the application.

Presentation - the presentation layer provides the business functionality behind the user interfaces. Here is where the interactions between the user and the services are orchestrated. I typically have a one-to-one relationship between Presenters and UIs (views), although sometimes a UI may be a composition of complex user controls that each have their own presenter.

User Interface - this layer depends on the platform (web, Windows, mobile, etc.). I want this layer to be as simple as possible and to contain no business logic. It can get complicated, though, as more and more advanced controls are introduced.
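
To make the granularity concrete, here is a rough sketch of interfaces at each layer (all names hypothetical):

// Data Access: granular, with the single mission of reading and writing.
public interface CustomerDataAccessInterface
{
     Customer GetCustomer(int id);
     void SaveCustomer(Customer customer);
}

// Service: coarser, a logical module of application functionality.
public interface CustomerServiceInterface
{
     Customer GetCustomer(int id);
     void RegisterCustomer(Customer customer);
}

// Presentation: one presenter per view (MVP).
public interface CustomerPresenterInterface
{
     void Display(int customerId);
}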


Depending on how you are implementing an application, you might have additional layers. For instance, in a client-server environment, there is the need for a web service and proxy layer:


You also might need a layer to sit on top of the UI and Presenter layers. This layer (e.g. Application Controller) would orchestrate the interaction between the User Interfaces and it would interact with the Presenters (assuming an MVP pattern):



Organizing a Solution

Thus far, what I've laid out is straightforward and fairly ubiquitous. Now I want to lay out some details about how all this fits together in a .NET solution.

It is reasonable to think that each layer would have its own assembly, or set of assemblies. While that is perfectly fine, there is a little more complexity than that. For example, each class in each layer will likely have an interface. Although those interfaces can coexist with the classes that implement them, I find it makes more sense to separate them. I do not, however, see the need for a separate assembly of interfaces at each layer. Instead, I prefer a single assembly to contain the interfaces. This assembly then becomes something of a definition for the application.

Some may prefer to break this definition assembly into two separate assemblies, one for data access interfaces and another for the rest. This would be meaningful in a client-server application, where the client hardly needs any knowledge of the data access layer. (In truth, it would not need the service layer either, but the service interfaces are used to define the web service proxies, so they are needed.)

There are other classes that cannot be relegated to a specific layer and that will also need assemblies: domain objects or data access objects, for one, but also any framework classes you may require (your framework, not .NET's). With these considerations, there are at least six assemblies needed:

1. Framework and Domain
2. Definitions
3. Data Access
4. Service
5. Presentation
6. Client

I actually like separating Framework classes and Domain classes. When using Entity Framework, it may also be useful to separate the database context and entity classes (assuming they are different from the domain classes) into their own assembly - but that's a topic for another post.

To put a little more meat on these thoughts, here's how I would organize projects in a solution for application X:

  • X.Framework
  • X.Domain
  • X.Definition
  • X.DataAccess
  • X.Service
  • X.Presentation
  • X.Client (a Web App, perhaps?)
If I were using Entity Framework, it would be:

  • X.Framework
  • X.Domain
  • X.Definition
  • X.EntityFramework
  • X.DataAccess
  • X.Service
  • X.Presentation
  • X.Client 
If I needed a client server application, it would be:

  • X.Framework
  • X.Domain
  • X.Definition
  • X.EntityFramework
  • X.DataAccess
  • X.Service
  • X.WebService
  • X.WebServiceProxy
  • X.Presentation
  • X.Client