June 1998                     Issue: 3

Journal of Conceptual Modeling
www.inconcept.com/jcm

De-normalized for Speed
 by Dr. John K. Sharp

How many times have you heard someone say that this application has been de-normalized for speed? This comment usually comes in the mist of a review of the project and is supposed to make the analyst feel better when someone (usually a good NIAM/ORM/NLM modeler) points out errors in the model. In my one experience of a "de-normalized for speed" model tested against the fully normalized model, the normalized model was faster for critical transactions. This may not be a standard outcome, but I am concerned that the "de-normalized for speed" comment may be given when the analyst has no idea about what the "normalized" system would look like. Furthermore, the analyst would have no basis for commenting about the difference in speed between the two for critical CRUD (Create, Retrieve, Update, Delete) transactions. After I have engaged the, usually reluctant, analyst in this conversation, I always finish with: "Is the speed of the computer more important than the speed of the human to computer interaction?"

One type of de-normalization that will be illustrated here is when the allowed populations of non-key columns in a table are dependent on the populations of other non-key columns in the same table. This recursive condition commonly occurs in models and it may be ignored or unknown.

Why must this condition be considered an error? Any structure that prevents valid data from being populated or allows invalid data to be populated must be considered an error. Eventually, the application will be maintained by staff who do not know about any undocumented decisions that were made either during design or implementation. If these recursive conditions are not formally maintained then update anomalies will exist and eventually invalid data occurs. Trying to clean this data up during maintenance or in future applications is always a significant problem.

Example Problem: Customer Address

Sentence instance:

1. Customer identified by 23415 has a ship address of 121 Market St., Clay Center, KS 67432.

Sentence type: (showing placeholders)

1. Customer identified by <CustomerNo> has a ship address of <AddressLine1>, <City>, <State> <Zip>.

Analysis using the NLM Matrix Analysis technique:

1. Customer identified by <CustomerNo> has a ship address of <AddressLine1>, <City>, <State> <Zip>.

23415

121 Market St.

Clay Center

KS

67432

 

------------

-------------------

---------------

----------

-----------

Allowed?

another

121 Market St.

Clay Center

KS

67432

Yes

23415

another

Clay Center

KS

67432

No

23415

121 Market St.

another

KS

67432

No

23415

121 Market St.

Clay Center

another

67432

No

23415

121 Market St.

Clay Center

KS

another

No

The matrix answers are generated by asking the subject matter expert the following question. Given that ‘Customer identified by 23415 has a ship address of 121 Market St., Clay Center, KS 67432.’ is true, is it allowed for another customer [such as 21523] such that ‘Customer identified by 21523 has a ship address of 121 Market St., Clay Center, KS 67432.’ may also be true?

The NLM analysis procedure identifies recursive dependencies by analyzing the "N-sentence" (The sentence consisting of the "No" placeholders in the previous sentence.).

2. Some customer has a ship address of <AddressLine1>, <City>, <State> <Zip>.

121 Market St.

Clay Center

KS

67432

 

------------------

----------------

----------

----------

Allowed?

another

Clay Center

KS

67432

Yes

121 Market St.

another

KS

67432

No

121 Market St.

Clay Center

another

67432

No

121 Market St.

Clay Center

KS

another

No

The "No" answers indicate that there is a recursive relationship in the original sentence (i.e., The placeholders with "No" answers in the original sentence have dependencies among themselves.). The NLM analysis procedure continues and then terminates with the following elementary sentences.

3. Zip code 67432 is for Clay Center, KS.

3. Zip code <ZipCode> is for <City>, <State>.

    67432

    Clay Center

    KS

     

    ------------

    ---------------

    ----------

    Allowed?

    another

    Clay Center

    KS

    Yes

    67432

    another

    KS

    No

    67432

    Clay Center

    another

    No

4. Customer identified by 23415 has a ship address line 1 of 121 Market St. and a zip code of 67432.

4. Customer identified by <CustomerNo> has a ship address line 1 of <AddressLine1> . and a zip code of <Zip>.

23415

121 Market St.

67432

 

-----------

------------------

----------

Allowed?

another

121 Market St.

67432

Yes

23415

another

67432

No

23415

121 Market St.

another

No

Sentences 3 and 4 are the fully normalized sentences contained in sentence 1. If the implementation team chose to create the table shown as table 1, then a population constraint must exist for both data entry and updates that maintain the dependency between the table implementing sentence 3 and table 1.

Customer

Customer No

Address Line 1

City

State

Zip

23415

121 Market St.

Clay Center

KS

67432

Table 1

Addresses have been repeatedly implemented without considering the recursive dependency. The official zip code definition (The city and state associated with a zip code is the city and state of the post office that provides local mail service.) establishes the recursive dependency. The de-normalized structure allows individuals to use local names instead of the official city name (e.g., Albuquerque Acres instead of Albuquerque), but this also allows invalid (as well as alternate) names to be entered and requires more data entry time. Deciding not to maintain the recursive relationship must be a business decision and not a design or implementation decision.

Have you ordered a pair of flannel lined blue jeans from L L Bean lately? Order clerks are handling thousands of calls a day. What is the cost of individually typing in the city and state on each order? The cost is very significant! L L Bean order clerks ask the customer for their street address and zip code. The zip code, 67432, is validated by the order clerk stating that the customer lives in Clay Center, KS. I know of the cost of providing the city name because it sometimes takes several minutes to assist in the spelling of "Alberkerturky." Some companies have implemented additional improvements to this procedure that request the telephone number of repeat customers in order to bring up all the previous address information.

The ability to quickly access the data in the de-normalized application may be in direct competition with developing the most efficient way of capturing the input data. The information analyst has a responsibility to the business analyst to support him/her in creating more efficient processes. A rigorous analysis procedure, such as NLM, identifies all of the dependencies within the subject area and generates the required set of elementary sentences through questioning the subject area expert (The "Yes" and "No" answers above.). This provides a solid basis for the business analyst to use in improving processes.

Happy Modeling!

Dr. John Sharp is the founder and principal consultant for Sharp Informatics.Before starting Sharp Informatics in 1997 he was employed by Sandia National Laboratories in Albuquerque, NM for 18 years. While at Sandia he held staff and management positions in all areas of information technology, including analysis, design, implementation, maintenance, information architecture, data administration, and information technology research. He has worked closely with Prof. Shir Nijssen of The Netherlands to improve the NIAM analysis methodology. Dr. Sharp is the creator of the first information analysis procedure known to be mathematically precise.This procedure reformulates the usual (imprecise and inaccurate) statements and examples from a subject area into verified fact types. The output of this productivity enhancing process (a set of information requirements) is compatible with all the latest and most productive database application creation tools. John is the editor of the international standard for conceptual schemas. He has co-chaired two international conferences on natural language modeling and he has presented numerous papers and seminars at professional conferences.

Contact information:

Dr. John Sharp
Sharp Informatics
1604 Vassar SE
Albuquerque, NM 87106
sharp@sharp-informatics.com
505-243-1498
fax 505-248-0345
http://www.sharp-informatics.com

© Copyright, 1998-2004 InConcept (Information Conceptual Modeling, Inc.) All Rights Reserved. Privacy Statement.
ISSN: 1533-3825