Speeding up your SQL Server data searches with de-normalization in complex databases
Posted: (EET/GMT+2)
Most enterprise-level databases are pretty complex, with dozens of tables interlinked together using primary and foreign keys. For instance, I've been working recently on a database with around 100 tables, and 500,000 rows in the main tables each, which would be a mid-size database on Finnish standards.
Now, assume you're writing an ASP.NET web application that needs to search for data from this database given the user's search keyword. If your SQL database is properly normalized, you have several tables linked together with keys (I prefer id numbers, as in SQL Server's identity fields). If you have large amount of data, it takes a while to scan multiple tables to search for your data, even when you have primary keys set for each table.
For instance, assume a table structure like this. The main table is "Messages", which in turn has "Recipients", which links to "Teams" and finally "Persons". If you wanted to search on this four-table structure based on person's last name, team name or message status, SQL Server would need to do pretty many things at once.
Now, such a database design is a common, properly (but not overly) normalized structure, but if you want to allow fast searches to the data, you might want to de-normalize your table structure slightly, or at least move some of the data "up" to the "Messages" table.
Let's take an example. Given the above database size, number of rows and structure; let's say a particular query returning 500 "Message" records would take about 10 seconds if data is to be searched based on "Person" name.
But, if your data is pretty static in nature (doesn't change too often), you could take this approach: add a new column to the "Messages" table, called something like "RecipientPreview" as a regular (N)VARCHAR field, and then when you save the record originally, store the names of the "Persons" in that preview field. Then, if the user searches by "Person" name, you'd only need to search from one table.
For the system I'm mentioning, the performance benefits could be large. For instance, the new execution time with 500 records returned is a mere 0.05 seconds (50 ms) – a 200x improvement.
Something worth thinking!