September
2004 Issue: 33
Journal of Conceptual Modeling
www.inconcept.com/jcm
A Costly
Illusion That Won’t Go Away
by
Fabian Pascal
My point in Irrational Exuberance, which I think Craig Mullins missed in his posted response to my analysis of his article, was that as long as knowledge about fundamentals such as normalization is poor or inexistent (and getting worse due to a “market-driven” dumbing down of the educational system, see The Myth of Market-based Education), there will be no real improvement in technology or products, and there is no reason to expect that poor practices such as denormalization will go away either, as Craig believes. The industry is trapped into poor tools and practices by its own failure to require and reward foundation knowledge. We have been providing ample evidence to that effect at DATABASE DEBUNKINGS, in our papers, books, and seminars.
Here’s, for example, Craig himself on normalization:
1NF specifies that all underlying domains contain atomic values only. So 1NF eliminates repeating groups. 2NF specifies that every non-key attribute is fully dependent on the key. So 2NF eliminates functional dependencies. 3NF specifies that every non-key attribute is non- transitively dependent on the primary key. So 3NF eliminates functional dependencies. Normalization past 3NF does not occur often in normal practice because most tables in 3NF are usually also in 5NF. The additional normal forms are: * Boyce Codd Normal Form (BCNF) is a further refinement of 3NF. In his later writings Codd refers to BCNF as 3NF [Ed.: ?]. A row [sic] is in BCNF if every determinant is a candidate key. Most entities in 3NF are already in BCNF. * An entity is in 4NF if it is in 3NF and has no multiple sets of multi-valued dependencies [Ed.: ???]. In other words, 4NF states that no entity can have more than a single one-to-many relationship within an entity if the one-to-many attributes are independent of each other. * 5NF specifies that every join dependency for the entity must be a consequence of its candidate keys.
--Craig Mullins, DATABASE ADMINISTRATION
Points arising:
· 1NF: As we explain in PRACTICAL DATABASE FOUNDATIONS papers #1, What Normal Form Really Means, and #2, What First Normal Form Means Not, atomicity is not precisely definable and, therefore, neither would be a 1NF definition based on it. A table faithfully representing a relation variable (relvar) is by definition in 1NF and, therefore, has exactly one attribute value of the pertinent data type in each cell; any types of value are permissible, as long as they are associated with applicable operators. For example, if a data type has an INTEGER possible representation, there should be exactly one integer value per row in any of the columns defined over that data type; ditto for set- or even relation-valued attributes (although that may not be the best design idea).
· 2NF: Every attribute is functionally dependent on the whole key. Further normalization to 2NF eliminates functional dependencies (FDs) on composite key components.
· 3NF: Every attribute is functionally dependent on nothing but the key. Further normalization to 3NF eliminates FDs on nonkey columns.
· BCNF: Further normalization to BCNF strengthens 3NF to cover tables with two composite keys that overlap.
· 4NF and 5NF: Further normalization to these levels eliminates generalized dependencies:
§ 4NF: multivalued dependencies (MVD) that are not FDs
§ 5NF: join dependencies (JD) that are not MVDs
Note: On MVDs and JDs see PRACTICAL DATABASE FOUNDATIONS paper #6, The Costly Illusion: Normalization, Integrity and Performance.
Relational tables (relvars) are in 1NF by definition. Relvars without composite keys that are in 3NF are also fully normalized (in 5NF*).
Note: There is now a sixth normal form (6NF), but it was defined much after the above “teacher” posted his question (see TEMPORAL DATA AND THE RELATIONAL MODEL), so he was referring to a concept inexistent at that time, claiming, nevertheless, that it “made sense” to him.
Here are some debunkings of nonsensical pronouncements on normalization that are, alas, prevalent in the industry.
I have been trying to find the correct way of normalizing tables in Access. From what I understand, it goes from the 1st normal form, to 2nd, then 3rd. Usually, that's as far as it goes, but sometimes to the 5th and 6th. Then, there's also the Cobb [sic] 3rd. This all makes sense to me. I am supposed to teach a class in this starting next week, and I just got the textbook. It says something entirely different. It says 2nd normal form is only for tables with a multiple-field primary key. 3rd normal form is only for tables with a single-field key. 4th normal form can go from 1st to 4th [sic], where there are no independent one-to-many relationships between primary key and non-key fields. Can someone clear this up for me please?
--Online question
One does not know whether to laugh or cry. Not only doesn’t the person have a clue about normalization—he could not get even Codd’s name right!--but he is actually going to teach others! Yet another aspect of The Ignorance Mechanism for anybody who thinks “a whole industry cannot be wrong”.
Let us make the understanding of normalization very simple. Forget about it! Normalization is for academics and in its strictest form is generally impractical due to its adverse effect on performance in a commercial environment, especially 3NF, 4NF and 5NF.
--Gavin Powell, A Layman’s Approach to Relational Database Normalization, DataWarehouse.com
One of the advantages for the many Powell-like “authors” in the industry in writing for “laymen”--of which there too many in the industry too--is that the latter can’t see thru the ignorance of the former.
As we demonstrate in paper #6, normalization and denormalization are purely logical and, therefore, have nothing to do with performance, which is determined entirely at the physical level, by implementation details. Performance gains, if any accompany denormalization, are not due to denormalizing, but rather to trading off integrity for them.
The mother of normalization was not someone idly thinking about data integrity issues, but people concerned with getting RDBMS to perform adequately. Only afterwards did people start "covering up" the fact that the reason normalization rules were invented was to address performance, and make believe that the normalization rules were invented to deal with data integrity.
--B.P. Margolin, microsoft.public.sqlserver.programming
What can we say? Margolin has it completely backwards! Obviously another of the many in the industry who feels competent to pronounce on a subject that he has not bothered to ever educate himself on (or perhaps he had the “teacher” above), and of which he knows practically nothing. A quite common phenomenon that social psychologists were able to identify scientifically (see Unskilled and Unaware of It).
No major application will run in third normal form.
--George Koch, ORACLE 8, THE COMPLETE REFERENCE
Koch means to say here “if the database is in 3NF”, as normal forms pertain to databases, not applications. Anyway, if this claim were true—it is not—whose fault would that be, pray? Certainly not normalization’s. Does Koch, a top Oracle expert who authored many popular books and teaches seminars, understand what determinesperformance? What really yields performance gains that sometimes follow denormalization? Is Koch aware of the cost of such gains, among them bias against certain applications, integrity risk, and update anomalies? Has he ever stopped to criticize his favorite DBMS, which, like all SQL implementations, fails to fully and truly implement the relational model, which would have relieved practitioners from trading off integrity for performance? Without knowledge of fundamentals, we suspect not.
Can Craig Mullins explain why, under these circumstances, does he expect products to improve, and denormalization to go away?
Aside from Date’s AN INTRODUCTION TO DATABASE SYSTEMS 8th Ed. and his series of articles on the subject in Database Programming and Design, here are more references on normalization.
PRACTICAL DATABASE FOUNDATIONS
paper #1, What Normal Form Really Means
paper #2, What First Normal Form Means Not.
paper #6, The Costly Illusion: Normalization, Integrity and Performance
More on Denormalization, Redundancy, and MVDBMS
More On Repeating Groups and Normalization
On Normalization and Repeating Groups
On Automation of Normalization Validation
On Normalization, Performance and Integrity
On Normalization and the Scientific Method
On Denormalization for Performance
On Normalization, Performance and Database Correctness
On Normalization and the Logical-Physical Distinction
_______________________________________________________________________________________________
Fabian
Pascal has a national and international reputation as an independent
technology analyst, consultant, author and lecturer specializing in data
management. He was affiliated with Codd & Date and for 20 years held
various analytical and management positions in the private and public sectors,
has taught and lectured at the business and academic levels, and advised vendor
and user organizations on data management technology, strategy and
implementation. Clients include IBM, Census Bureau, CIA, Apple, Borland,
Cognos, UCSF, and IRS. He is founder, editor and publisher of DATABASE DEBUNKINGS, a web site
dedicated to dispelling persistent fallacies, flaws, myths and misconceptions
prevalent in the IT industry. Together with Chris Date he has
recently launched the PRACTICAL
DATABASE FOUNDATIONS series of papers that also serve as
text for seminars.
Author of three books,
he has published extensively in most trade publications, including DM Review,
Database Programming and Design, DBMS, Byte, Infoworld
and Computerworld. He is author of the contrarian columns Against the Grain, Setting Matters Straight, and Test Your Foundation Knowledge.
Special Offer: Author Fabian Pascal is offering DBAzine.com readers subscriptions to the PRACTICAL DATABASE FOUNDATIONS series of papers at a discount. To receive your discount, just let him know you’re a DBAzine reader before you subscribe! Contact information is available on the About page of his site.
__________________________________________________________________________________________________________