There’s a good practice that says “a database is a representer of facts”. If there’s more than one way to extract a single fact from the database, then there’s a redundancy in it. Every redundancy can cause different anomalies in the data, which in turn cause bugs in the application. To avoid that, there’s a process called normalization, which involves following sets of rules to restructure the database to remove redundancies without losing the original facts. The traditional set of normalization rules are the so-called Normal Forms: First Normal Form, Second, Third, etc. Unfortunately, those are frequently overlooked by developers due to their excessive formalism. But in fact, even the Normal Forms aren’t enough to avoid anomalies, since they’re concerned about redundancies only in a single table*. Since cross-table dependencies are very common in modern applications, we must go beyond normal forms to prevent problems.
In this talk, we’ll present normalization rules on a friendly language, going beyond normal forms. We’ll understand how the software requirements cause dependencies in database tables, both in-table and cross-tables. We’ll show real examples of non-trivial dependencies that happen on Django models. We’ll discuss how normalization prevents redundancies, inconsistencies, anomalies, and bugs. Knowing that normalization can cause slowdowns in queries, we’ll present how to increase performance with denormalization, which is not the same of not normalizing. Instead, denormalization means being able to represent data in multiple ways to speed up queries without introducing inconsistencies. We’ll discuss Django-related denormalization tools that use cronjobs, indexes, caching, materialized views and triggers, and NoSQL.
*It’s common to ignore the fact that normal forms only discuss redundancies inside a single table/record/relval. More about this in this article reviewed by Codd, Fagin and Date, key figures of the relational model.