December 2002 Issue: 27
Journal of Conceptual Modeling
www.inconcept.com/jcm
What Meaning Means
by Fabian Pascal
“… there is really no difference between a document and a database [sic] -- In both cases, you have to abstract information and a certain amount of metadata that helps the system understand the meaning and uses of that extracted information ... XML could put an end to that by breaking down the traditional barriers between document and database processing."
"… XML data is fundamentally different from relational data. XML data are extremely well-suited to hierarchical storage … In XML databases, an online tax return can be stored in its entirety. In a relational database, each line of the return would have to be a different table [of data in rows and columns]. Trying to "force fit" an XML document into the rigid relational structure can waste storage space and lead to inefficiencies in queries and retrievals."
By definition, random data has no informational content. Only data that is organized in some way, that is, structured in accordance with some organizing principle, carries meaning (see Unstructured Thinking). An organizing principle (or structure) is a central element of meaning that is captured via a data model, the other elements being data types, integrity and manipulation.
There are many organizing principles and, therefore, many possible data models (again, see Unstructured Thinking). For general data management purposes, the following properties are desirable for a data model:
· Generality: the ability to represent as many kinds of data as possible
· Formality: a sound theoretical foundation (not ad-hoc)
· Completeness: captures as much meaning as possible
· Simplicity: as simple as possible (but not simpler!)
The relational model is the only known data model that has all four properties (see PRACTICAL ISSUES IN DATABASE MANAGEMENT):
· Claims to the contrary notwithstanding, there is no information that cannot be represented and manipulated relationally
· Dual theoretical foundation of predicate logic and set theory
· Provides data types (domains), structure (R-table), integrity (domain, table, column and database constraints)
· Simplicity was a major reason it displaced preceding models, which were too complex; no simpler alternative has been proposed
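A minimal Python sketch may help fix these elements in mind. It is an illustration only, not a relational DBMS: the Shirt heading, the SIZES domain and the helper functions are hypothetical names invented for this example. A relation is modeled as a set of typed tuples, integrity as checks that must hold before the set is accepted, and manipulation as operations that take sets and return sets.

```python
from typing import Callable, FrozenSet, NamedTuple

SIZES = {"S", "M", "L"}  # hypothetical domain of valid sizes


class Shirt(NamedTuple):
    """Heading of a hypothetical SHIRT relation: attribute names and types."""
    sku: str
    size: str
    price_cents: int


def check_domains(t: Shirt) -> None:
    """Domain integrity: every value must belong to its declared domain."""
    if t.size not in SIZES:
        raise ValueError(f"invalid size: {t.size!r}")
    if not isinstance(t.price_cents, int) or t.price_cents <= 0:
        raise ValueError(f"invalid price: {t.price_cents!r}")


def relation(*tuples: Shirt) -> FrozenSet[Shirt]:
    """Structure: a relation is a set of tuples, so duplicates collapse."""
    for t in tuples:
        check_domains(t)
    body = frozenset(tuples)
    # Table-level integrity: sku is a key, so no two tuples may share it.
    if len({t.sku for t in body}) != len(body):
        raise ValueError("key constraint violated on sku")
    return body


def restrict(r: FrozenSet[Shirt], pred: Callable[[Shirt], bool]) -> FrozenSet[Shirt]:
    """Manipulation: keep the tuples for which the predicate is true."""
    return frozenset(t for t in r if pred(t))


def project_sizes(r: FrozenSet[Shirt]) -> FrozenSet[str]:
    """Manipulation: project onto one attribute; the result is again a set."""
    return frozenset(t.size for t in r)


shirts = relation(
    Shirt("A1", "M", 1999),
    Shirt("A2", "L", 2499),
    Shirt("A2", "L", 2499),  # duplicate tuple, absorbed by the set
)
cheap = restrict(shirts, lambda t: t.price_cents < 2000)
```

The design choice worth noting is that every operation returns a set of the same kind it was given, so results can be fed to further operations.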
A data model should not be confused, as it so often is in the IT industry, with either business models or logical models, their representations in the database. A data model is a general theory of data used to map enterprise-specific business models, which no DBMS can understand, to enterprise-specific logical models, which DBMSs do understand. Declaring a logical model to the DBMS in the data definition stage is nothing but conveying to the DBMS the meaning of the data (see What is a Data Model; Models, Models Everywhere, Nor Any Time to Think; Something to Call One’s Own).
Simply put, there cannot be data management without some data model, and I dare anybody to prove otherwise. Therefore, any data management technology or product -- including, to coin a term, XDBMS -- claimed to complement, extend, improve on, or replace the relational model and TRDBMSs (true relational DBMSs, and I do not mean SQL DBMSs!) (a) must be based on some data model that (b) has, at a minimum, the four properties above. Is there a data model behind XML and, if so, does it have those properties?
In their introductory article in Scientific American, Bosak and Bray, two of the people behind XML, write:
"Give people a few hints, and they can figure out the rest. They can look at this page, see some large type followed by blocks of small type and know that they are looking at the start of a magazine article. They can look at a list of groceries and see shopping instructions. They can look at some rows of numbers and understand the state of their bank account. Computers, of course, are not that smart; they need to be told exactly what things are, how they are related and how to deal with them. The solution, in theory, is very simple: use tags that say what the information is, not what it looks like. For example, label the parts of an order for a shirt not as boldface, paragraph, row and column -- what HTML offers -- but as price, size, quantity and color."
There are several interesting things about this. First, the focus is on conveying meaning to the system. By definition this must involve a data model and, in fact, they provide what (roughly) amounts to a description of a data model’s elements (without realizing this, of course):
· what things are = data types and integrity
· how they are related = structure and integrity
· how to deal with them = manipulation
Yet it is quite obvious that XML was not approached with any explicit, well-defined data model in mind (see below). If the intention behind XML was, as is claimed, to provide a standard data interchange format, then plenty of those are available already and any agreed-on format will do (see PRACTICAL ISSUES IN DATABASE MANAGEMENT); no data model is necessary. If, on the other hand, XML is to be used for data management, then the specification of a data model cannot be avoided. What is not possible is to use a data interchange format as a data management technology without a specific data model -- the interchange tail cannot wag the management dog (as XML proponents are now learning).
Second, consider Bosak and Bray’s solution: “tags that say what the information is”. Note very carefully that out of their initial three aspects of meaning, only one – “what the information is” – survives in their proposed solution. Whatever happened to “how things are related and how to deal with them”, which are an integral part of meaning? As Chris Date points out, meaning exists only in the context of a proposition. I can say “Mary has 2 children”. I can’t say “2”, which is what an XML document is. Otherwise put, values in themselves don’t carry meaning; propositions do.
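The point about propositions can be made concrete with a small Python sketch. The predicate wording and the HasChildren tuple type are hypothetical, chosen only to mirror the example above: a tuple yields a proposition when its values are substituted into a predicate, while a bare value has no predicate to plug into.

```python
from typing import NamedTuple


class HasChildren(NamedTuple):
    """Heading for the hypothetical predicate: '<name> has <count> children'."""
    name: str
    count: int


PREDICATE = "{name} has {count} children"


def proposition(t: HasChildren) -> str:
    """Instantiate the predicate with a tuple's values to get a proposition."""
    return PREDICATE.format(**t._asdict())


fact = HasChildren(name="Mary", count=2)
print(proposition(fact))  # Mary has 2 children

bare_value = 2  # no attribute name, no predicate: nothing is being asserted
```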
Markup tags do not say much about “how things are related” and nothing at all about “how to handle them”. You can tell a DBMS “this is price, size, quantity and color”, but unless you also tell it what are valid prices, sizes, quantities and colors, and what operations it can perform on these types of data, its understanding of what the data means will be too limited to permit any meaningful (pun intended) management.
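To illustrate, here is a hedged Python sketch using the standard library's XML parser. The order document, the domains and the violations helper are all assumptions made up for this example. The document is well-formed and every value is tagged, yet nothing in the tags themselves stops a price of "purple" or a negative quantity; the notion of a valid value has to be supplied from somewhere else.

```python
import xml.etree.ElementTree as ET

# Hypothetical order document; the tag names follow the shirt example.
DOC = """
<order>
  <price>purple</price>
  <size>M</size>
  <quantity>-3</quantity>
  <color>19.99</color>
</order>
"""

order = ET.fromstring(DOC.strip())
# Well-formed XML parses fine: tags name the values but do not constrain them.
raw = {child.tag: child.text for child in order}

# Assumed domains, supplied outside the markup (illustrative only).
VALID_SIZES = {"S", "M", "L"}
VALID_COLORS = {"white", "blue", "purple"}


def violations(values: dict) -> list:
    """Return the names of the fields whose values fall outside their domain."""
    bad = []
    try:
        if float(values["price"]) <= 0:
            bad.append("price")
    except ValueError:
        bad.append("price")
    if values["size"] not in VALID_SIZES:
        bad.append("size")
    try:
        if int(values["quantity"]) <= 0:
            bad.append("quantity")
    except ValueError:
        bad.append("quantity")
    if values["color"] not in VALID_COLORS:
        bad.append("color")
    return bad


print(violations(raw))  # ['price', 'quantity', 'color']
```

Every parsed value arrives as a plain string, so even the question of which operations apply to a price as opposed to a color is left to code outside the document.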
You may want to ponder the implications of this for XML data management. More about this in future columns.
Fabian Pascal has a national and international reputation as an independent technology analyst, consultant, author and lecturer specializing in data management. He was affiliated with Codd & Date and for more than 15 years held various analytical and management positions in the private and public sectors, has taught and lectured at the business and academic levels, and advised vendor and user organizations on data management technology, strategy and implementation. Clients include IBM, Census Bureau, CIA, Apple, Borland, Cognos, UCSF, IRS. He is founder, editor and publisher of DATABASE DEBUNKINGS (dbdebunk.com), a web site dedicated to dispelling persistent fundamental fallacies and flaws prevalent in the information management industry, where C.J. Date is a senior contributor. Author of three books, he has published extensively in most trade publications, including DM Review, Database Programming and Design, DBMS, Byte, Infoworld and Computerworld, and is contrarian columnist for The Data Administration Newsletter (TDAN) and DBAzine.com. His third book, PRACTICAL ISSUES IN DATABASE MANAGEMENT serves as text for his seminars.
© Copyright, 1998-2004 InConcept (Information Conceptual Modeling, Inc.) All Rights Reserved. Privacy Statement. ISSN: 1533-3825