October 2003        Issue: 29

Journal of Conceptual Modeling
www.inconcept.com/jcm

 

THE CHASING OF MAYFLIES
REPLY TO RIGGS
by Fabian Pascal
 

  I hope very much that computing science at large will become more mature, as I am annoyed by two phenomena that both strike me as symptoms of immaturity. 

The one is the widespread sensitivity to fads and fashions, and the wholesale adoption of buzzwords and even buzznotes. Write a paper promising salvation, make it a "structured" something or a "virtual" something, or "abstract", "distributed" or "higher-order" or "applicative" and you can almost be certain of having started a new cult. 

The other one is the sensitivity to the market place, the unchallenged assumption that industrial products, just because they are there, become by their mere existence a topic worthy of scientific attention, no matter how grave the mistakes they embody. In the sixties the battle that was needed to prevent computing science from degenerating to "how to live with the 360" has been won, and "courses" -- usually "in depth"!-- about MVS or what have you are now confined to the not so respectable subculture of the commercial training circuit. But now we hear that the advent of the microprocessors is going to revolutionize computing science! I don't believe that, unless the chasing of mayflies is confused with doing research. A similar battle may be needed. 

--E. Dijkstra, Proc. 4th Intl. Conf. on Software Engineering, 1979

 

 In the first editorial for the launch of DATABASE DEBUNKINGS, Skyscrapers with Shack Foundations, I wrote as follows:

 In fact, under industry pressure there is little database education to be had. Product-specific training reigns supreme and even academic computer science programs are becoming increasingly vocational in character.

I repeated the argument in my second editorial, The Myth of Market-Based Education. Since then, I have demonstrated in at least one article, Denormalization for Performance – Et Tu Academia?, that the state of academia was actually sadder. That was further reinforced recently in an exchange with a faculty member from the University of Washington who, when I asked if there would be interest in my presenting to their department, responded as follows:

I don't think your presentation would generate enough interest in our group, because nobody here works on relational data.  Sorry...

In other words, they are not interested in the theoretical foundation of their field!!! But I do know what they are almost exclusively interested in: XML.

 This brings home why I left academia in disappointment many years ago, and why I have come to dread exchanges with today’s academics. I do not expect much of the industry, but for those who purport to be scientists and educate new generations by teaching them how to think independently, critically and logically, lack of interest in and knowledge/understanding of the history and foundation of their own field is a mortal sin.

Riggs’ reaction to my previous JCM column is an excellent example of the consequences of the sad state of affairs so accurately described by Dijkstra. As is usually the case, there are so many imprecise, unsubstantiated and inaccurate arguments packed into his article that it would be simply impossible, not to mention tedious and painful, to address them all. For an idea of what it would take, consider Figure 1, a copy of only one page of Riggs’ article annotated by Chris Date, to whom I sent the article for comments. There are five pages altogether!!

 

Figure 1: Annotated page of Riggs’ article

Fortunately—as is also usually the case—it is not necessary to delve deep into the morass of Riggs’ arguments to demonstrate reasoning flaws sufficiently serious not to bother with the rest.

 Says Riggs:

 It is a tenet, since Von Neumann, that anything may be data. In this regard, there is no difference between a conceptual model, a logical model and a physical model. All maintain data. The differences are simply what the models mean to represent. These models are at three levels because: 1) they represent three levels of abstraction in one process, 2) what is modeled at each level is different.

But as Chris Date points out, “Surely, all von Neumann stated was that we could operate on binary programs same as data. Is that the same thing?” Anyway, if the three types of model are all the same, why are there three of them, not one? Does each of them not serve a distinct purpose, which can only mean that there must be some differences between them? Such inaccurate inferences and fuzzy statements do not bode well for Riggs.

Among proponents of XML there are two distinguishable viewpoints, related to the purposes of the viewers. The first view is a data centric view. Persons with this view are interested in more or less the same set of operations as traditional DBMS. XML seems to be, with a suitable query language, adequate for this. This view is fundamental in XML-QL and in semi-structured databases [Abiteboul]. The second (and the original) view of XML is a text centric view. Persons with this view are concerned with capturing the data content of texts without the violence of re-writing the source. XML marking while useful, is not perfect at capturing the data in even the most common texts. [Riggs]

I instinctively become suspicious of exactly what Dijkstra warns against: arguments by academics based on analysis of what the industry does (even when sources are cited). It is the precise point of my whole endeavor that industry practice is the least likely source of intellectual inspiration.

Be that as it may, with regard to database management, on what grounds does Riggs declare XML “adequate”? What exactly does he mean by “adequate”? Is adequate enough? Isn’t there any evidence that we can do better (and, in fact, did), and that we should? Is the academic function to advance technology, or to rationalize whatever the industry and trade media promote?

Is Riggs aware of the history of the database field? Does he know what happened, and why, to DBMS products which, like XML, were based on the hierarchic data model? That the relational data model was invented precisely to address serious shortcomings of the hierarchic model, which was found inadequate in practice? (See Those Who Don’t Know or Forget the Past Are Condemned to Repeat It and Managing Data with XML: Forward to the Past.)
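One commonly cited shortcoming of hierarchic structures can be sketched in a few lines of Python. The example is hypothetical: the supplier and part data and the names `hierarchy` and `supplies` are invented for illustration and are not taken from Riggs’ article or from any product. In a hierarchy, a question that follows the nesting is easy, while the inverse question needs differently shaped code; over a relation, both questions have the same form.

```python
# Hypothetical data: which supplier supplies which part.

# Hierarchic representation: parts are nested under suppliers.
# Note that part "P2" has to be recorded twice.
hierarchy = {
    "S1": ["P1", "P2"],
    "S2": ["P2"],
}

# Relational representation: one set of (supplier, part) tuples.
supplies = {("S1", "P1"), ("S1", "P2"), ("S2", "P2")}

# Question A: which parts does S1 supply?
parts_tree = set(hierarchy["S1"])                     # follows the nesting
parts_rel = {p for (s, p) in supplies if s == "S1"}

# Question B: which suppliers supply P2?
# In the hierarchy every parent must be visited and its children searched.
suppliers_tree = {s for s, parts in hierarchy.items() if "P2" in parts}
# Over the relation, B has exactly the same form as A.
suppliers_rel = {s for (s, p) in supplies if p == "P2"}
```

Both representations give the same answers here; the point of the sketch is only that the relational formulation treats the two questions symmetrically, whereas the hierarchic one favors whichever direction was chosen for the nesting.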

 In my seminars I argue that any data model claimed to be superior to the relational model—my definition of adequacy—must demonstrate

    (a)  a formal foundation
    (b)  at least as sound as predicate logic, set theory and normalization theory
    (c)  which is as complete and general
    (d)  and as simple or simpler

Instead of surveying the industry, has Riggs, as a scientist, subjected XML to a rigorous analysis using those criteria, that demonstrates this is the case? And if it is not the case, why should we reinvent the database wheel (and a square wheel at that)? Is that what progress means today?

Ironically (given that Riggs is the academic here), my notion of “fundamental” is fundamentally different from his. It is certainly neither XML-QL, nor “semi-structured”, nor any other industry-hyped concoction or buzzword. (Can Riggs define, precisely please, “semi-structured”? See Unstructured Thinking for my take on this subject.)

Had Riggs done the research, he would have found that XML, as invented, had no formal foundation and practically no semantics (integrity and manipulation), because those who came up with the idea were unaware of the need for such; nor did they intend it for data management or, in fact, really for text management either. As with hierarchical systems before it, theory and semantics, in the form of integrity and manipulation, are now being retrofitted to it, a dubious practice whose fruitlessness was painfully and expensively discovered by those who tried it with IMS and CODASYL. It is rather telling that in seeking such retrofitting, the core XML concept of “document” had to be abandoned, and a new abstraction called a “sequence” had to be invented. The industry never learns, in large part because academia renounced its intellectual/scientific function in favor of rationalizing industry practice.

 Technically a model is just a set with relations such that there is an interpretation function that makes the standard commutation diagram work … A data model is not a map. A data model is not “a general theory of data used to map enterprise-specific business model … to enterprise specific logical models that are understood by DBMSs” … XML is a model of data in text, as complete with respect to data as relational theory. XML maps far more directly to text data sources. That this ready mapping extends to many other program systems may be no more than the reflection of the nature of programs themselves. It is however a fact of real world practice, shown by the proliferation of XML as an underlying means of logical storage in more and more software systems.

When I was in academia I was lucky to have it drilled into me that pronouncements on a subject require knowledge of the scientific history of the subject. Since Codd invented the term data model, Riggs should have been aware of the original definition and, at the very least, should have justified why his own diverges so much from it. Here is Codd’s and, even though he was not very precise, it is clearly nothing like Riggs’:

 “[A data model] is a combination of three components:

1. a collection of data structure types (the building blocks of any database that conforms to the model);

2. a collection of operators or inferencing rules, which can be applied to any valid instances of the data types listed in 1, to retrieve or derive data from any parts of those structures in any combinations desired;

3. a collection of general integrity rules, which implicitly or explicitly define the set of consistent database states or changes of states, or both—these rules may sometimes be expressed as insert-update-delete rules.”

--E. F. Codd, Data Models in Database Management, IBM Research Laboratory, San Jose, CA, 1980

In the relational model, the structure, integrity and manipulation are precisely defined by predicate logic and set theory. Can Riggs specify, in as precise and simple a manner, the structure, integrity and operations underlying the “XML data model” (which XML data model, I hasten to add, as there are many)? We would probably agree on the tree structure, but how about integrity constraints and operations? Until he can come up with equivalent specs in a manner that practitioners can understand as readily as they do data types, tables, keys, table operations, etc., he should not make such statements. Whether he realizes it or not, whether he likes it or not, a data model is just what Codd invented. If Riggs wants to redefine it, he should explain why and to what advantage (instead of digressing to compilers, as he does; see the next quote).
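For readers who prefer code to prose, the three components of Codd’s definition can be sketched in relational terms as follows. This is only a toy illustration under stated assumptions: the `Relation` class, its method names and the sample employee data are hypothetical, and come neither from Codd’s paper nor from any DBMS. Structure is the heading plus a set of tuples, integrity is a key constraint checked on construction, and manipulation is a pair of operators that take a relation and return a relation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """Hypothetical toy relation: a heading, a set of tuples, and a key."""
    heading: tuple      # structure: attribute names
    body: frozenset     # structure: a set of tuples, so no duplicates
    key: tuple          # integrity: attributes that identify each tuple

    def __post_init__(self):
        # Integrity rule 1: every tuple must fit the heading.
        for row in self.body:
            if len(row) != len(self.heading):
                raise ValueError(f"tuple {row} does not match heading")
        # Integrity rule 2: no two tuples may share a key value.
        key_values = [self._pick(row, self.key) for row in self.body]
        if len(key_values) != len(set(key_values)):
            raise ValueError(f"key {self.key} is violated")

    def _pick(self, row, attrs):
        return tuple(row[self.heading.index(a)] for a in attrs)

    # Manipulation: operators take a relation and return a relation.
    def restrict(self, predicate):
        rows = frozenset(r for r in self.body
                         if predicate(dict(zip(self.heading, r))))
        return Relation(self.heading, rows, self.key)

    def project(self, attrs):
        rows = frozenset(self._pick(r, attrs) for r in self.body)
        # Simplifying assumption: the result's key is all of its attributes.
        return Relation(tuple(attrs), rows, tuple(attrs))


# Hypothetical sample data.
employees = Relation(
    heading=("emp_no", "name", "dept"),
    body=frozenset({(1, "Ann", "R&D"), (2, "Bob", "Sales"), (3, "Cy", "R&D")}),
    key=("emp_no",),
)

# Names of employees in R&D: a restriction followed by a projection.
rd_names = employees.restrict(lambda t: t["dept"] == "R&D").project(("name",))
```

Attempting to construct a relation in which two tuples share the same `emp_no` raises a `ValueError`, which is the sketch’s stand-in for a general integrity rule. The challenge posed above is to state the equivalent three components for XML with comparable brevity.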

Moreover, I would like to hope (though I am skeptical) that when he says “adequate” he does not mean that text-structured processing has the same informational nature and value as relationally-structured processing, because that confusion is also a “fact of real world practice” of which he is so accepting.

 A compiler is a map from a conceptual model to a physical model. Compiler theories (such as LR-k grammars and algorithms) are maps from language classes to compilers. Relational theory is a general data model, in which specific models may be constructed. The paradigm here is first order predicate calculus in which a theory may be embedded.

 As a practical matter, creating a database (or information base, etc.) requires a judicious choice of theoretically possible representations. This is quite as true for relational models as any other. Fictitious (non-domain) entities may be created (as for m-n relationships). Constraints must be selected or omitted. The entire design may become subject to efficiency and purposive constraints as well.

On the one hand, a data model is “not a general theory of data”, but on the other, “relational theory is a general data model”. Riggs is not very clear in his mind, is he?

 Since Riggs does not define what a conceptual model is, it is impossible to assess his statement. However, in my book, conceptual models are informal models of specific enterprises (also referred to as “business models” or “entity-relationship models”). And it is the very informal nature of those models that makes them non-mechanizable. It escapes me, therefore, how conceptual models can be “compiled to physical models”. It is, in fact, the precise purpose of a (proper) data model to effect mapping of such informal enterprise-specific models to formal representations that can be computerized and, thus, implemented physically. That is exactly why the data model must have a formal foundation. The relational model does have a usable one. Experience with old hierarchic and network products has shown that to the extent that XML is retrofitted with a formal foundation—and that has not happened yet—it will be as usable as IMS or CODASYL were; that is, not much.

Creating a database does, indeed, “require a judicious choice of theoretically possible representations”. But Riggs has it somewhat backwards when he says, “this is quite as true for relational models as any other”. Aside from the fact that his use of the plural term “relational models” betrays his confusion of logical models based on the relational model with the relational data model itself (there is only one), what Riggs should have said is, “this should be as true of nonrelationally derived logical models (such as those based on XML) as it is of relationally derived ones”. Which, of course, it is not.

 I find it rather sad that I have to clarify such things to a practicing academic. I would have addressed the rest of Riggs’ article but, to be honest, I have neither the time nor the inclination.

Fabian Pascal has a national and international reputation as an independent technology analyst, consultant, author and lecturer specializing in data management. He was affiliated with Codd & Date and for 20 years held various analytical and management positions in the private and public sectors, has taught and lectured at the business and academic levels, and advised vendor and user organizations on data management technology, strategy and implementation. Clients include IBM, Census Bureau, CIA, Apple, Borland, Cognos, UCSF, and IRS. He is founder, editor and publisher of DATABASE DEBUNKINGS, a web site dedicated to dispelling persistent fallacies, flaws, myths and misconceptions prevalent in the IT industry. Together with Chris Date he has recently launched the DATABASE FOUNDATIONS SERIES of papers. Author of three books, he has published extensively in most trade publications, including DM Review, Database Programming and Design, DBMS, Byte, Infoworld and Computerworld. He is author of the contrarian columns Against the Grain and Setting Matters Straight, and of a column for The Journal of Conceptual Modeling. His third book, PRACTICAL ISSUES IN DATABASE MANAGEMENT, serves as text for his seminars.

© Copyright, 1998-2004 InConcept (Information Conceptual Modeling, Inc.) All Rights Reserved. ISSN: 1533-3825