May 2004                                    Issue: 32

Journal of Conceptual Modeling
www.inconcept.com/jcm

TEST YOUR FOUNDATION KNOWLEDGE

by Fabian Pascal

 

RELATIONAL WEAKNESSES MY FOOT!

 

 

Those who follow my writings (here, at DATABASE DEBUNKINGS, and elsewhere) know that the main thrust of my work has been demonstrating the negative effects of the lack of foundation knowledge on data management practice. They are familiar with the Weekly Quotes I post regularly on my site as evidence of the sad state of that knowledge. At the beginning of a new year I collect the “best” pearls from among the preceding year’s quotes for a yearly presentation called To Laugh or To Cry? Fallacies in Data Management (in fact, I have just completed the 2004 edition, which will be offered on May 3rd at the DAMA conference in Los Angeles).

 

If they do not know, coming in, why they ought to laugh, cry, or both, those who attend the presentation get to figure it out by the time they leave (or at least, so I hope). It has occurred to me, though, that given the state of knowledge, those who do not attend may be puzzled as to what’s wrong with those quotes (there is evidence to that effect from the feedback I get); and even for some of those who do attend things may not be entirely clear. So it is probably a good idea to make sure that as many as possible understand what’s wrong with those quotes and why, because they pertain to foundation knowledge.

 

The reader is urged to try and figure out what’s wrong with the quotes before proceeding to the explanation.

 

Note: There is usually more to say about these matters than what the space here allows, so for more in-depth treatment of data management fallacies we refer the reader to our DATABASE DEBUNKINGS web site and our DATABASE FOUNDATIONS papers.

 

Consider the following, posted on an Internet web site:

 

·   With regard to data modeling, you can't define attributes which are of complex types (arrays, records, tables). Each relation has to be in first normal form. Or in other words: A "simple" natural structure must be divided into many flat structures (= tables/relations). The result is a complex structure of relations.

·   Result of a query is a flat table. Any complex structure, which has been input of the query got lost.

·   No way to define recursive programme structure (using SQL).

·   The type system of SQL doesn't match with the type system of the embedding language ("type mismatch").

·   Controlling integrity constraints costs a lot of time (need to control the usage of primary/foreign keys).

·   Lack of mechanisms to control the physical level of the database (only simple clustering).

·   Definition of operations detached from data definition.

--Fachbereich Informatik, Weaknesses of the Relational Model

 

Each and every one of the seven arguments is wrong, quite an achievement. That this quote comes from an academic institution—a technical university—is all the more appalling, but hardly a rare occurrence.

 

We take each bullet turn and number the comments for convenience.

 

1.      First of all, we are sick and tired of the often-regurgitated nonsense that relations are “flat structures”. Their pictures on paper or screen—what is presented to the user—are flat due to the nature of the medium. But an n-ary relation—that is, a relation with n attributes—represents (informally) an n-dimensional “entity type”. That is, in fact, a critical difference between a relation and a table—the picture of a relation--that are, erroneously, deemed the same thing in the quote.

 

Second, organizing the real world is based on perceptions of reality and there is no scientific way to prefer one to another, so what is “natural” is in the eye of the beholder. Anyway, any structure can be represented relationally, even hierarchies such as organizational charts, or part-subpart assemblies (see chapter 7, PRACTICAL ISSUES IN DATABASE MANAGEMENT). But the purpose for which data is structured is manipulation—logical inferencing—and to guarantee correctness, data must have its integrity protected. So the real issue is not structure representation for its own sake—which is what many fail to realize--but what is the “best” representation for integrity and manipulation purposes. And in that sense, for a multiplicity of reasons that we have amply expounded over and over again, the relational structure beats any other hands down. In particular, it is also the simplest general structure known that has a real world interpretation that lends itself to logical inference and mathematical manipulation, one of the main reasons it superseded the systems preceding it more than three decades ago. The two other best known data models, hierarchic and network, add only representation, integrity and manipulation complexity, but no power.

 

Third, a common misconception in the industry is that first normal form (1NF) requires only “simple” (character strings and numbers) data types. In fact, 1NF means that every relation “cell” contains exactly one value; there are no restrictions on the kinds of value attributes can take. Attributes of arbitrary complexity are permitted, as long as pertinent data types are properly defined. For example, given, say, a relation-valued data type (and the appropriate associated operators on relation values), the attributes defined over it will be relation-valuedtoo. 1NF would be violated only if an attribute had zero or two or more relation values of the pertinent type in some row. In such a case the table would not be a relation at all, because a relation is in 1NF by definition (see DATABASE FOUNDATIONS papers #1, What First Normal Form Really Means, and #2, What First Normal Form Means Not). Note very carefully, though, that such violations would not be possible with a truly relational DBMS (TRDBMS), because proper support of data types and enforcement of attribute constraints would prevent values violating 1NF from being entered in the database.

 

Note: The fact that SQL products failed for a long time to support user-defined data types (UDT) of arbitrary complexity is not the relational model’s fault (some of them currently provide some such support, albeit with flaws and limitations). It is fair to say, however, that support of such types raises some nontrivial implementation issues for both vendors and users, which is, at least in part, the reason so-called “universal DBMSs” did not fare as well as expected (see chapter 1, Careful What You Wish For: Data Types, in PRACTICAL ISSUES IN DATA MANAGEMENT). UDTs are a fundamental feature in data management, so they must be implemented correctly and with extreme care, to be usable.

 

2.      See earlier response on “flat tables”.

 

3.      SQL and its implementations have also only themselves, not the relational model, to blame for failure until recently to support recursive queries; and even what they currently support has flaws. Hierarchies can be readily represented relationally and there is nothing in the relational model to prevent recursive operations (see chapter 7, Climbing Trees in SQL, in the book mentioned above).

 

4.      Ditto for any “type mismatch” (see earlier comment), which has nothing to do with the relational model.

 

5.      The argument on integrity is absurd and reveals an appalling lack of basic understanding of what database management is. First, integrity is hardly just keys. Second, the sum total of integrity constraints is the DBMS-understood meaning of a database, and the best approximation a DBMS can have of the user-understood meaning (see DATABASE FOUNDATIONS paper #4, Un-muddling Modeling). Integrity is the DBMS function par excellence: without it, a DBMS does not understand what the data means and cannot manage it. Indeed, the move from application-file based data management to databases and DBMS was driven by the practical need to centralize shared functions in the DBMS, which made integrity even more important, because with shared databases any one application program could corrupt the database for all others. So we would say that integrity is the database shared function par excellence. Renouncing integrity is tantamount to employing “DBMSs that are not DBMSs”, so to speak.

 

6.      Another old and persistent fallacy that won’t die: The Logical-physical Confusion. The relational model is logic and mathematics applied to data management for the purpose of mechanized inferencing, which has absolutely nothing to do with physical implementation. And intentionally so: by being mute on such details—physical data independence—the relational model leaves DBMS designers total freedom to use whatever physical means at their disposal to maximize performance, without affecting the user’s and applications’ views of the data, a huge saver of development and especially maintenance costs.

 

7.      A fallacy induced by object orientation, which is For Application Development, Not Data Management. The quote alludes to the poorly defined OO concept of encapsulation that is used to mean a variety of things. For a deconstruction see Encapsulation Is a Red Herring. Suffice it to state here that the relational model supports encapsulation where it makes sense (scalars) and not where it doesn’t (e.g. relations, see FOUNDATION FOR FUTURE DATABASE SYSTEMS THE THIRD MANIFESTO, 2nd. Ed.).

 

 

Fabian Pascal has a national and international reputation as an independent technology analyst, consultant, author and lecturer specializing in data management. He was affiliated with Codd & Date and for 20 years held various analytical and management positions in the private and public sectors, has taught and lectured at the business and academic levels, and advised vendor and user organizations on data management technology, strategy and implementation. Clients include IBM, Census Bureau, CIA, Apple, Borland, Cognos, UCSF, and IRS. He is founder, editor and publisher of DATABASE DEBUNKINGS, a web site dedicated to dispelling persistent fallacies, flaws, myths and misconceptions prevalent in the IT industry. Together with Chris Date he has recently launched the DATABASE FOUNDATIONS SERIES of papers. Author of three books, he has published extensively in most trade publications, including DM Review, Database Programming and Design, DBMS, Byte, Infoworld and Computerworld. He is author of the contrarian columns Against the Grain, Setting Matters Straight, and for The Journal of Conceptual Modeling. His third book, PRACTICAL ISSUES IN DATABASE MANAGEMENTserves as text for his seminars.

 

 

Special Offer: Fabian Pascal is offering JCM readers subscriptions to the DATABASE FOUNDATIONS series of papers at a discount. To receive your discount, contact him (contact information is available on the About page of his site).