Building Scalable Databases: Denormalization, the NoSQL Movement and Digg

Database normalization is a technique for designing relational database schemas that ensures that the data is optimal for ad-hoc querying and that modifications such as deletion or insertion of data does not lead to data inconsistency.

Database denormalization is the process of optimizing your database for reads by creating redundant data. A consequence of denormalization is that insertions or deletions could cause data inconsistency if not uniformly applied to all redundant copies of the data within the database.
Why Denormalize Your Database?

Today, lots of Web applications have “social” features. A consequence of this is that whenever I look at content or a user in that service, there is always additional content from other users that also needs to be pulled in to page. When you visit the typical profile on a social network like Facebook or MySpace, data for all the people that are friends with that user needs to be pulled in.

Or when you visit a shared bookmark on del.icio.us you need data for all the users who have tagged and bookmarked that URL as well. Performing a query across the entire user base for “all the users who are friends with Robert Scoble” or “all the users who have bookmarked this blog link” is expensive even with caching. It is orders of magnitude faster to return the data if it is precalculated and all written to the same place.

This is optimizes your reads at the cost of incurring more writes to the system. It also means that you’ll end up with redundant data because there will be multiple copies of some amount of user data as we try to ensure the locality of data.

Galhano //127.0.0.1