August 1998        Issue: 4

Journal of Conceptual Modeling
www.inconcept.com/jcm

De-Normalization and Keeping Up With the Constraints
by Patrick Hallock

About the time we get most people understanding normalization and the advantages of doing so, we run into the new rage to create de-normalized models. It's sometimes important to point out that "to de-normalize" really means it was first "normalized". The process of de-normalization should not take the place of an initial normalized model. In fact, I would support many of my colleagues who do not want to de-normalize even for a data warehouse. Many feel that with new database engines and proper physical implementation, there is no need for de-normalization for performance. Further, Object-Relational databases also add new features that promote a de-normalized view such as, multi-valued columns. But, on with the de-normalization!

Image26.gif (3306 bytes)

Figure One: Normalized Model Fragment

This model presents the facts:

    1. Article must be categorized by the Article Type,
    2. Article Type must have a Description.

The constraints, supported by the example populations:

    1. An Article can have only one Article Type,
    2. Article Type has only one Description,
    3. Each Description has only one Article Type.

Side Note: The Article Type has Description with a uniqueness constraint over both columns, is a fast and easy 5th normal form.

There are some implied rules as well. An Article has only one Article Type and Article Type has only one Description therefore: Article has only one Article Type Description. The 5th normal form supports that either the Code or the Description could have been the primary key. It only makes sense that "Article has one Article Type which has one Article TYPE Description, or Article has one Article Type Description that has one Article Type code" are both true. The management of this model has no surprises. The normal issues of changing the code for Article Type apply such as cascading updates, prevention of deletion, etc.

The de-normalized view would have us associate the Description with the Article in addition to the original code. The argument is supported by the desire to avoid the join in order to get the Description during a read operation. There are two ways of doing this. One is merely do it, the other is keep some sense of the integrity between code and description. The first requires no work, so no explanation is given. The second is more interesting. If you want to keep the integrity between code and description which are now located in two places, you need to complete the transformation as follows:

 Image27.gif (4513 bytes)

Figure Two: The basic conceptual model adding the role of Article includes Description

In order to keep the original mandatory rules, the Description is required by the Article. The Article Type is organized by the Article Type was mandatory and the Article Type has Description was also mandatory. Therefore it is mandatory that an Article has an Article Type Description. If you are going to de-normalize, you should be prepared for some rules.

This model creates two tables:

Image28.gif (2330 bytes)

Figure Three: Relational Schema

The Description has been redundantly stored in the ARTICLE table. The double lines indicate primary identifiers and the single underline a unique column. This would probably be the end of it as long we accept the fact we have some code to write to enforce changes in Descriptions. We need to not only allow for changes in the Article Type has Description, but we need to reflect those changes in the 'Article includes Description' fact as well. What are the chances we could change the Description in the Article includes Description to some wrong value, you may ask? Well, maybe the initial developers will get it right, but things have a way of going wrong over time. So, let us apply a little preventive medicine and polish of the model with a subset constraint.

Image29.gif (4076 bytes)

Figure Four: Some control over the situation

An subset constraint does the following:

"Assuming (A) is an instance of Article or Article Type and (D) is an instance of Article or Description then For each Article/Article Type (A) and Article/Description (D) over their join paths, this holds:Article (A) is categorized by the Article Type ... Article (A) including some Description if and only if Article Type (A) has Description (A)."

OK - in English -.

The Article has an Article Type and a Description - which must match an Article Type and the given Description of the Article Type.

During the write, update or delete operations this is enforced. During read operations you do not need to join the Article Type Table to get the Description. We have kept some of the faith by enforcing the equality constraint; no program updating the database can accidentally get it wrong. Even changing the Article Type Description will require a BEGIN and END TRANSACTION while updating both tables. The final commit will only succeed if and only if the code and descriptions match.

Keeping the referential integrity between Article and Article Type is also a good idea, since it creates an index on the Article Type column, which is probably used in the WHERE clause of SQL.

A short start on recursive relationships:

Recursive relationships are usually presented as:

Image30.gif (1483 bytes)

Figure Five: Recursive Relationships

The entity is shown as having a relationship with itself. But, I have a problem with this when the recursive relationship is forming a defined hierarchy or tree. For example, assume we are talking about an organization. Group reports to Department, Department reports to Division, Division reports to Unit and Unit reports to Company. The recursion has an organization reporting to an organization, but is it reporting to the right type of organizational unit? Often these structures go bad after a few updates, because a group is reporting to a Division rather than a Department, and etc. We really need a model that states Organization of a Type is reporting to an Organization of a Type where the types are known to be related to each other. Ah…the types themselves are a hierarchy.

The Organization Type GROUP reports to the Organization Type DEPARTMENT.

Image31.gif (2420 bytes)

Figure Six: Sample Organization

Group is the CHILD of DEPT just as DEPT is the CHILD of DIVN…

Image32.gif (3344 bytes)

Figure Seven: A Hierarchy Pattern

Note: "DOT" means "depends on".

This is just a building block, but give it some thought since we also want to think about the structure of the organization over time. Think of the LEVEL TYPE TREE as a template for a tree, not the tree itself. It is the allowable relationships between LEVELS within a tree. LEVELS can also belong to more that one tree.

In the suggested reading below, Terry Halpin provides the ring constraints to control these structures, while David Hay and Martin Fowler provide patterns for implementing the structure itself. Hay provides the basic idea noted above and Fowler provides a more complete model known as the "responsibility pattern".

Here is some reading you may want to do before the next issue:

Conceptual Schema and Relational Database Design by Terry Halpin; Prentice Hall Australia, 1995, ISBN:0-13-355702-2

Data Model Patterns: Conventions of Thought by David Hay; 1996, Dorset House, ISBN: 0-932633-29-3

Analysis Patterns: Reusable Object Models by Martin Fowler; 1997, Addison-Wesley, ISBN: 0-201-89542-0

Patrick Hallock is a Senior Partner and Principal Consultant for InConcept. He has over 20 years of ORM/NIAM experience and is  a certified ORM consultant, trainer/train the trainer and a certified Visio trainer.

Contact Information:

Patrick Hallock
President and Co-Founder
InConcept
8171 Hidden Bay Trail N
Lake Elmo, MN 55042
path@inconcept.com
(651) 777-8484
fax: (651) 777-9634
http://www.inconcept.com

© Copyright, 1998-2004 InConcept (Information Conceptual Modeling, Inc.) All Rights Reserved. Privacy Statement.
ISSN: 1533-3825