Entity Framework query performance patterns

Posted: 10.8.2013 5.15.49 (EET/GMT+2)

If you've been working with Entity Framework and large datasets, you might have noticed that table joins (combining data from two or more tables based on a key value) can bring down the performance of your queries.

When working with Entity Framework, I've noticed that it's often faster to "un-join" those queries, and instead create two simpler queries (instead of one larger) and then combine the results on the client (or of the server in case of an ASP.NET web application).

Let's take an example. For instance, say, you wanted to query "orders" and wanted to get the name of the "sales representative" who handled the order along with the order id and date, you'd generally join the "orders" and "salesreps" tables. Now, in the case of large tables, it might be more efficient to first read the names and id values of from the "salesreps" table into memory (as there usually are a lot fewer sales reps than orders), and then with another query get the needed order details.

Finally, with C# code, you would then loop through the received orders, and do a lookup on the list of sales representatives for each found order.

Although the old principle of "letting the database do its work" is still sound and valid in many situations, I've noticed that the above pattern of client-side combination (instead of database joins) yields much better performance especially in larger databases.

What is your experience in similar situations? Send me an email to share your thoughts.