December 2003, Issue 30
Journal of Conceptual Modeling
www.inconcept.com/jcm
To O OR to R:
IS THIS A DATABASE QUESTION?
by Fabian Pascal
The following exchange was culled from the comp.databases.object newsgroup and exposes, yet again, the utter ignorance among practitioners, particularly those with a programming background, of data and database fundamentals. Such ignorance is, of course, rooted in an educational system and an industry that fail to require and instill such knowledge and that flout the scientific foundation of the field. An associated problem, also deriving from educational failure, is the difficulty many practitioners have in expressing themselves clearly, consistently and succinctly; poorly defined, fuzzy terminology is thrown around, whether or not it applies or makes sense, often to impress the reader with jargon that obscures weaknesses and confusion in the arguments. Debunking such material is a royal pain, frequently requiring the invocation of Date’s Incoherence Principle (see The Chasing of Mayflies):
It is difficult to treat coherently that which is incoherent.
Not to mention tedious.
The exchange was initiated by the following, rather meaningless issue statement:
Problem: Which is better, an object database (ODB) or a relational database (RDB)?
I would like to hope—although I suspect it’s not realistic to expect—that not much explanation is necessary as to the silliness of this question. “Better” in what sense? And for what purpose? In fact, are object and relational databases equivalents?
Note: The intended term is DBMS, not database (see next). The two are, of course, distinct, but are used interchangeably in the industry.
The anemic case for relational was made by Mike Bresnahan:
RDBMS's [... provide] a simple flexible model of the data (based in set theory - however that helps) and a general-purpose declarative query language. [...] ODBMS's [...] provide a relatively much more complex data model and procedural query language to navigate it.
It is an indicator of the sad state of foundation knowledge in the industry that even those who take a positive stance on the relational model (RM) do not do so for the proper reasons and, in fact, more often than not have the same poor understanding of it as its detractors.
Essentially RM is predicate logic and set theory applied to database management. It is this dual theoretical foundation that confers a multitude of practical benefits such as soundness, flexibility and simplicity. Yet Bresnahan admits he has no idea why set theory is important. One gets the distinct impression of yet another case of practitioners regurgitating things they “heard” about RM, without having bothered to learn and understand it (and no, I do not mean that they should become logicians or mathematicians). If proponents of RM cannot explain its benefits, what else can we expect other than the general tendency to dismiss it as “just theory” (see below)?
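As a rough illustration of what "predicate logic and set theory applied to database management" can mean, the following Python sketch treats a relation body as a set of tuples, each asserting that a predicate is true, and derives new relations with set-level operators. The relation names, the data and the helper functions `restrict`, `project` and `join_on_dept` are all hypothetical, invented for this example; attributes are addressed by position only to keep the sketch short.

```python
# Hypothetical data: each tuple asserts the predicate
# "employee NAME works in department DEPT" is true.
works_in = frozenset({
    ("Alice", "Sales"),
    ("Bob", "Sales"),
    ("Carol", "Research"),
})

# Each tuple asserts "department DEPT is located in city CITY".
located_in = frozenset({
    ("Sales", "Boston"),
    ("Research", "Austin"),
})


def restrict(relation, condition):
    """Restriction: keep only the tuples for which the condition holds."""
    return frozenset(t for t in relation if condition(t))


def project(relation, *positions):
    """Projection: keep the chosen attributes; duplicates vanish because
    the result is a set."""
    return frozenset(tuple(t[i] for i in positions) for t in relation)


def join_on_dept(left, right):
    """Join (name, dept) with (dept, city) on the shared dept attribute."""
    return frozenset(
        (name, dept, city)
        for (name, dept) in left
        for (dept2, city) in right
        if dept == dept2
    )


sales_staff = restrict(works_in, lambda t: t[1] == "Sales")
departments = project(works_in, 1)
staff_cities = project(join_on_dept(works_in, located_in), 0, 2)
```

Every result is again a set of tuples, so the operators can be nested freely. That closure property comes from the set-theoretic foundation, and it holds for any predicate the tuples happen to represent.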
Note: Bresnahan says “model of the data”, which quite likely betrays confusion between the enterprise-specific conceptual or logical models and the general data model via which the former are mapped to the latter. A data model is not a model of the data of a specific enterprise, but a theory of data in general, and we would actually prefer the term data theory (see DATABASE FOUNDATIONS paper #4, Un-muddling Modeling).
Now, true RDBMSs (TRDBMS) could, and would, provide a declarative data (not just query) language, which may or may not be general purpose. But Bresnahan most likely refers to current commercial products, which are not TRDBMSs, but rather SQL DBMSs. First, SQL is a data language that is neither based on the relational model as defined by its inventor, nor a general purpose language. And second, while SQL is more declarative than what preceded it, it is less so than it could and should have been.
The case for ODBMS (and against RDBMS) was made by Drew Wade:
The general purpose declarative query language [SQL] was a big step forward, and is continued with ODBMS's, in fact made more powerful, because you can now query not just the primitives (gears, pistons, nuts and bolts), but all the way up to the highest application objects (engine, frame, car), using methods in your query just as if they were built-ins, using relationships the same way, etc.
Aside from repeating the mistake about the nature of SQL and its confusion with RM, I frankly have no idea what Wade is talking about. Gears, pistons and bolts?! Engines, frames and cars?! The grounds on which he makes this rather vague, if not weird, claim escape me.
If I try to make some sense out of this claim (which it does not deserve), I can only say that Wade has it backwards. There is little that is “built into” ODBMSs. Somebody must program classes and methods before any use can be made of the products. Any class or method needed but not programmed is not available to applications. In this sense ODBMSs are not DBMSs, but actually “DBMS building kits”. A TRDBMS (or even SQL DBMS) product, on the other hand, can be used right out of the box and does not require programming.
But the simpler model of the RDB world does not simplify applications, but rather makes them harder. A simple model is good for proving theorems, formal studies, etc. But in the real world, your real-world problem (application) determines the complexity. If your problem is simple, then RDBs work fine (simple, here, means simple, flat, tabular data structures, fixed length fields, no relationships, no traversal, no nested structures, no user-defined structures, no methods, etc.).
Those familiar with the weekly quotes at DATABASE DEBUNKINGS should hardly be shocked at absurdities; both Chris Date and I have seen it all, and we are quite jaded about it. Yet when I passed Wade’s comments to Chris, his reaction was “This is outrageous. You ought to rebut it.” Hence this article. Unfortunately, if anything evokes the Incoherence Principle, this paragraph does (which is why it takes several long paragraphs to debunk a short one).
• Note the “just theory” dismissal I referred to earlier. To appreciate the absurdity, consider that this is equivalent to, say, civil engineers dismissing the laws of physics as good only for theoretical purposes and therefore to be ignored in the practice of bridge building!
• It was the very purpose of RM to simplify database management in a variety of ways:
· requires only one data structure, the simplest possible, that is both necessary and sufficient* (see Un-muddling Modeling)
· substitutes set-processing for record-at-a-time processing
· does away with navigation
· supports data independence, which drastically reduces application development and maintenance effort
· centralizes database functions (integrity in particular) in the DBMS, which relieves applications of a huge programming burden
and more. ODBMSs are a regression from most of this progress: they increase complexity severalfold, without adding any power.
* Had Wade been aware of this, he would have realized the inconsistency in his position: if relations are both necessary and sufficient, adding more (and more complex) structures can only complicate, not simplify matters. That is also why “user-defined structures” are neither necessary, nor beneficial. Structures are for manipulation (and integrity) purposes and each additional structure requires its own set of operations and constraints, hence the proliferation of methods, a burden that proponents of such schemes are oblivious to.
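Two of the points in the list above, set-processing versus record-at-a-time processing and integrity centralized in the DBMS, can be sketched with Python's standard `sqlite3` module. This is only an approximation: SQLite is an SQL DBMS, which, as noted earlier, is not a true RDBMS. The `employee` table, its columns and the data are hypothetical, chosen purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The integrity rule (salary must be positive) is declared once, in the DBMS,
# instead of being re-coded in every application that touches the data.
conn.execute(
    "CREATE TABLE employee ("
    " name TEXT PRIMARY KEY,"
    " dept TEXT NOT NULL,"
    " salary INTEGER NOT NULL CHECK (salary > 0))"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [("Alice", "Sales", 100), ("Bob", "Sales", 200), ("Carol", "Research", 300)],
)

# Record-at-a-time style: the application fetches rows and loops over them.
rows = conn.execute(
    "SELECT name, salary FROM employee WHERE dept = 'Sales'"
).fetchall()
for name, salary in rows:
    conn.execute(
        "UPDATE employee SET salary = ? WHERE name = ?", (salary + 10, name)
    )

# Set-at-a-time style: one declarative statement states the same raise for
# the whole set of Sales employees; the DBMS decides how to carry it out.
conn.execute("UPDATE employee SET salary = salary + 10 WHERE dept = 'Sales'")

# The DBMS, not the application, rejects data that violates the declared rule.
try:
    conn.execute("INSERT INTO employee VALUES ('Dave', 'Sales', -5)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

sales_salaries = sorted(
    row[0]
    for row in conn.execute("SELECT salary FROM employee WHERE dept = 'Sales'")
)
```

Both update styles apply the same raise here, but only the second leaves the choice of access strategy to the DBMS and keeps the application free of looping logic.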
• Wade claims, without any evidence or explanation, that relational databases are good only for “simple problems” (whatever that means), due to the nature of their tabular structure. Relations (or, more correctly, relation variables, or relvars) are indeed simple, and intentionally so. But they are abstractions. Tables are only material representations of relations for visual purposes: two-dimensional pictures, if you will. And a picture of a thing should not be confused with the thing itself (see DATABASE FOUNDATIONS papers #1 and #2). Consequently:
· While tables are flat*, relations/relvars are not; they are, in fact, multidimensional structures, where each attribute represents a dimension.
· There is no relational requirement of fixed-length fields (this is a physical storage implementation issue, on which RM, a purely logical construct, is intentionally silent; that is what physical data independence is about).
· It is, of course, patently false that relationships are not represented. Some are represented by foreign keys and referential constraints; others (those that represent entity types with properties of their own) by associative relvars.
· Ditto for nested structures: as we demonstrate in the already mentioned papers #1 and #2, nested relations are perfectly valid structures, although they are not advisable because they add little value but considerable complications, one of which is navigation, the very “traversal” that Wade wants to bring back*.
* Nesting and traversal are characteristic of older hierarchic databases/DBMSs, which relational databases/DBMSs were invented to replace. Another fallacy in the industry is that relational DBMSs “cannot handle hierarchies”. For a refutation of this claim see Chapter 10 in my PRACTICAL ISSUES IN DATABASE MANAGEMENT.
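The point that relationships are represented by foreign keys, referential constraints and associative relvars can be made concrete with a small sketch. It again uses Python's standard `sqlite3` module as a stand-in for an SQL DBMS (not a true RDBMS). The `student`, `course` and `enrollment` tables and their data are hypothetical; `enrollment` plays the role of an associative relvar because the many-to-many relationship has a property of its own, the grade.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE course  (course_id  INTEGER PRIMARY KEY, title TEXT NOT NULL);

-- Associative table: a many-to-many relationship with its own property.
CREATE TABLE enrollment (
    student_id INTEGER NOT NULL REFERENCES student (student_id),
    course_id  INTEGER NOT NULL REFERENCES course (course_id),
    grade      TEXT    NOT NULL,
    PRIMARY KEY (student_id, course_id)
);

INSERT INTO student VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO course  VALUES (10, 'Logic'), (20, 'Sets');
INSERT INTO enrollment VALUES (1, 10, 'A'), (1, 20, 'B'), (2, 10, 'C');
""")

# The relationship is queried declaratively, by matching values;
# no pointers are followed and no traversal code is written.
report = conn.execute("""
    SELECT s.name, c.title, e.grade
    FROM enrollment AS e
    JOIN student AS s ON s.student_id = e.student_id
    JOIN course  AS c ON c.course_id  = e.course_id
    ORDER BY s.name, c.title
""").fetchall()

# The referential constraint rejects an enrollment for a nonexistent student.
try:
    conn.execute("INSERT INTO enrollment VALUES (99, 10, 'A')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

Adding another relationship, say between courses and instructors, would mean declaring one more table and constraint; the existing query would not need to change.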
As your problem gets more complex, the more complex the worse the RDB approach and the more advantage from the ODB approach. The RDB approach requires the application programmer to map his actual problem objects (entities, relationships, methods, nets, etc.) down to primitives, to do the hard part of the queries himself (mapping them down to flattened primitives), etc. This mapping layer gets bigger, more complex, more error prone, and very hard to maintain whenever you change your object model (which, of course, always happens...).
Backwards again: exactly the opposite is true. This is also utterly confused* about levels of representation (see DATABASE FOUNDATIONS paper #4, Un-muddling Modeling). The reader for whom this is not obvious should seriously consider education on the fundamentals. I must invoke again the Incoherence Principle: frankly, I cannot waste more of my time with the remaining nonsense.
* Object terminology and concepts are inherently fuzzy, with little agreement among proponents, which tends to induce such confusions.
I will, however, briefly address comments by Joshua Duhl, who added that:
ODBs have better performance for complex data models [...] The relational model, as implemented in most RDBMSs, can represent a lot of different models, but has difficulty representing inheritance hierarchies, and complex relationships (many many-to-many's) are costly to process.
To repeat, the data model is a purely logical construct; it has nothing to do with performance, which is entirely a function of the physical implementation. Thus, either an RDBMS or an ODBMS can perform very well or badly, depending on how it is implemented, not on its data model.
Moreover, type inheritance is completely orthogonal to (that is, independent of) the data model, which Date and Darwen demonstrate in THE THIRD MANIFESTO by proposing a (sounder) type inheritance scheme within the relational framework. That SQL products do not support such a scheme is neither a fault of RM nor a reason to resort to object orientation.
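The separation between the logical level and the physical implementation can be sketched as follows, once more using Python's standard `sqlite3` module as an approximation of an SQL DBMS. The `part` table, the index name and the data are hypothetical. Adding an index changes only the physical access paths available to the DBMS; the query text and its result stay the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE part (part_no INTEGER PRIMARY KEY, color TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO part VALUES (?, ?)",
    [(i, "red" if i % 3 == 0 else "blue") for i in range(1, 1001)],
)

# The logical request: which parts are red? It names no access path.
QUERY = "SELECT part_no FROM part WHERE color = 'red' ORDER BY part_no"

before = conn.execute(QUERY).fetchall()

# A purely physical change: the DBMS may now use an index to find red parts.
conn.execute("CREATE INDEX part_color_idx ON part (color)")

after = conn.execute(QUERY).fetchall()

# Same query, same answer; only the time taken to produce it may differ.
same_result = before == after
```

Whatever performance difference the index makes is a property of this particular implementation and its storage choices, not of the data model in which the query is expressed.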
Fabian Pascal has a national and international reputation as an independent technology analyst, consultant, author and lecturer specializing in data management. He was affiliated with Codd & Date and for 20 years held various analytical and management positions in the private and public sectors, has taught and lectured at the business and academic levels, and advised vendor and user organizations on data management technology, strategy and implementation. Clients include IBM, Census Bureau, CIA, Apple, Borland, Cognos, UCSF, and IRS. He is founder, editor and publisher of DATABASE DEBUNKINGS, a web site dedicated to dispelling persistent fallacies, flaws, myths and misconceptions prevalent in the IT industry. Together with Chris Date he has recently launched the DATABASE FOUNDATIONS SERIES of papers. Author of three books, he has published extensively in most trade publications, including DM Review, Database Programming and Design, DBMS, Byte, Infoworld and Computerworld. He is author of the contrarian columns Against the Grain, Setting Matters Straight, and for The Journal of Conceptual Modeling. His third book, PRACTICAL ISSUES IN DATABASE MANAGEMENT, serves as text for his seminars.