March 2004, Issue: 31
Journal of Conceptual Modeling
www.inconcept.com/jcm
This is the first in a series of three papers that challenge the widely held assumption that the entity relationship (ER) model is the basis for practical data modelling. Entity-relationship (ER) diagrams are widely used and many practitioners talk about doing ‘ER modelling’. Textbooks on data modelling refer to the ER model as the basis for practical data modelling. However, it turns out that many data modelling tools that offer ‘ER diagrams’ are not based on the ER model at all. It has always been assumed that these tools implement variants of the model. This paper shows that on the contrary, these tools represent real data modelling practice and are based on ideas that pre-date the ER model by fifteen years (at least). The practical data modelling that takes place is actually based on something different from the ER model. This is not such a provocative challenge as it sounds given no formal published evidence that the ER model has ever been successfully used in a practical situation. Most of the academic work on the ER model is ‘theory about theory’. The piece of work demonstrating that the theory works in practice isn’t there. At the very least we should be wondering why that is. On the other hand, this author has collected a set of evidence, over a period of years, showing that the ER model is unlikely to be viable in a practical situation. This first paper summarises this research and proposes a new explanation of the basis for the practical data modelling that takes place. The second paper in the series takes the argument further by looking at data modelling tools that use adapted ideas from the ER model to see whether these contradict the challenge. The third paper examines practical high level conceptual modelling.
Keywords
Entity-Relationship Modelling, ERM, data modelling, conceptual modelling, class diagrams.
Most writers on conceptual modelling would agree that, “One of the most widely accepted enhanced conceptual models is the ER model of Chen … It has been recognized as an excellent method of high level design because of its many convenient tools for the conceptual modelling of reality… The model is easy to use and understand and is pictorial, i.e. graphical. It shows clearly all types of concept abstractions, various relationships, mapping constraints and cardinalities.” (Thalheim, 1998, p.29-30). Similarly, “This book focuses on the Entity-Relationship (ER) model, the most widely used data model for the conceptual design of databases.” (Batini et al 1992, p.30). The ER model formally proposed a design notation. According to Chen (1983, p.127-28) “The entity-relationship (ER) diagramming technique is a graphic way of displaying entity types, relationship types, and attributes … one of the reasons often cited is that the ER diagram is easy to understand …”
The ‘ease of use’ for the ‘conceptual modelling of reality’ claim is a fundamental problem for the proponents of the ER model. This author can find no formal evidence, since 1976, to support the claim that the ER model can be effectively used in a practice situation. This reflects a lackadaisical attitude to checking that the theory predicts what happens in practice. The situation is also confused because we cannot be certain exactly what constitutes the ER model because no ‘standard’ ER model has been maintained. Curiously, many authors writing about the ER model seem to produce their own version of the model (like definitions for entities), additions to the model, or variations of the diagram. For example, Thalheim (1998) also produces a new version, called ‘HERM’. Therefore very few people actually use the ER model that was proposed. The ER model is a theory that is more abused than adhered to. Some authors provide a new diagram notation without reference to a model, compounding the abuse. Therefore the ER model theory is unusual in that it seems to encourage variety rather than succeeding in defining a set of rules for model adherence. The idea of ubiquitous use of the ER model is really based on the use of all kinds of variations of the use of ER diagrams.
The next section summarises the evidence about practical data modelling that has led this author to challenge the provenance of the ER model in the context of modelling tools and notations that do not support the ER model.
2. Multi-method evidence about practical data modelling
There are only two published papers where practitioners have been formally asked for their opinions about modelling notations. A survey of practitioners (Hitchman, 1995) established that practitioners perceive the notations in ER diagrams to be poorly understood by both business experts and other practitioners. Practitioner perceptions contradict the claim that the model is intuitive or easy to use or understand. These findings were confirmed by a second survey (Hitchman 2000) looking at the use of the Object Modelling Technique (OMT). The findings show that the notation proposed by Chen is hardly used at all. OMT is interesting because it specifically supports the idea of ternary relationships. Ternary relationships were perceived to be problematic and practitioners had difficulty using them.
Widely used positivist research methods have not uncovered information about practical modelling. Although there is quite a large body of laboratory experiments using naïve, particularly student, subjects, the findings from this research do not generalise to practice (Hitchman 1977, 1999). Analysis of positivist methods, concentrating on the use of relationship set types, or n-ary relationship types, reveals many of the problems of generalising positivist research methods to the practice situation (Hitchman 1999). The positivist research method is generally laboratory based and comparative, showing that students perform better with one model (or diagram). It has always been the case, however, that students generally perform badly whatever model they use. Goldstein and Storey (1990) concluded that the ER model was not natural or intuitive for novice users, and confirmed the perception that students have a surprising amount of difficulty understanding and applying the model. In this sense, the student experiments have served to confirm that ER modelling is difficult for novices.
An analysis of the use of ER model relationship set types, particularly ternary relationships (Hitchman, 1999), shows that ternary relationships will not work well in practical data design. An interesting aspect of this evidence was that the positivist researchers lagged several years behind what was happening in the practice domain. An interpretive study (Hitchman 2003) confirmed the idea that ternary relationships undermine the design process. The evidence here suggested that, from a practice point of view, the researchers framing the positivist experiment scenario (used in several laboratory experiments) had themselves been misled by the ternary analysis they had undertaken.
Practitioners use ER diagrams that constrain the language used for modelling (Hitchman, 2002). Practitioners use the diagram in a purposeful way to talk about data; they do not translate from informal thought. The diagram is the way to think about the data situation; it is not a way to translate ‘natural thinking’. The ER diagram has to facilitate being read back by the designers as an unambiguous business specification of the data. These practitioner diagrams exclude the idea of relationship set types, so the ER model itself is not being used.
Veres & Hitchman (2002) argued that aspects of Psychology, especially the work of Jackendoff (for example, 2002), provide a good basis for understanding how conceptual modelling works. This is because the modelling process includes conversation about design. Up to this point the positivist research method had used single modellers drawing diagrams from written scenarios. This reflected the idea that the diagram and the scenario represented an abstraction of reality. The most fundamental part of the modelling process, talking, had not been widely included in research methods. Positivist methods tend to exclude aspects that are difficult to quantify.
Interpretive research with three practitioners (Hitchman, 2003) found that a version of ER diagramming was being used but this was based on the underlying relational model. This was the first piece of evidence that directly exposed the way that practitioners converse about a situation. Three striking findings were made. Firstly, the diagram that was being used by the practitioners constrained the conversation. There was not a translation into the language of the diagram. The diagram notation provided the language to talk about the model. Secondly, the practitioners were not using the ER model, although they were using an ER diagramming technique. Thirdly, the practitioners were using the diagram to talk about a normalised relational data structure. From the assumed point of view that ER diagrams are part of the ER model, these findings are difficult to explain.
This is strong evidence to support the idea that there is widespread use of a design technique, but that this technique is not, and possibly never has been, based on the ER model. This is not direct evidence that data design cannot be done with the ER model. Probably any of the many proposed variations and methods will work in some context, particularly when the users have direct contact with the originators. The argument, though, is that, taken as a whole, there emerges a consistent picture that we do not understand how the ER model is relevant to practice. The research raises the interesting question of where to lay provenance for practitioner modelling. Based on this evidence, the next section argues for a new understanding of ER diagramming provenance.
Chen’s (1977, p.77,78) basic argument is that, “In general, to design a database is to decide how to organize data into specific forms (record types, tables) … the output of the database design process – the user schema (a description of the user view of data) – is not a “pure” representation of the real world. …the user schema is not a direct representation of the real world. This makes the user schema difficult to understand ... A possible solution … is to introduce an intermediate stage in the database design process … which is a “pure” representation of the real world …”.
The ER model proposed by Chen included the definition of an ER diagramming notation, as a ‘pure’ representation of the real world. Chen (1976, p.19) says “In this section we introduce a diagrammatic technique for exhibiting entities and relationships …” (The entity relationship diagram actually shows (types of) entity and relationship sets. Unfortunately there is now a widespread misuse of the term entity for entity set type.) The implication is that a diagrammatic technique for entity set types is new. Most authors include attribution of ER diagrams in with the ER model. For example, Thalheim (1998, p.3): “The Entity-relationship … model is one of the most popular database design models. Its popularity is based on simple graphical representations … The ER model is currently widely used …”. Ryan & Smith (1995, p.59) are more direct: “The basic notation for Entity-Relationship modelling was introduced by Chen …”.
Chen tells us something about where his ideas originated. After completing his PhD, and before moving to the MIT Sloan School of Management, from June 1973 to August 1974 Chen (2002) worked as a junior member of the team working on the “next-generation computer system” project for Honeywell Information Systems. Charles Bachman was also a member of this team. Chen (2002), speaking in the third person, says that “One of the requirements of such a “distributed system” was to make the files and databases in different nodes of the network compatible with each other. The ER model was motivated by this requirement. Even though the author started to crystallize the concepts in his mind when he worked for Honeywell, he did not write or speak to anyone about this concept then.”
The ER model claims to act directly on how we think. In some way the ER model represents those aspects of ‘pure’ thinking that also help design data in a business situation. Chen (1983, p.127) proposed a set of guidelines for “… converting English descriptions into ER diagrams. This motivates our research into the correspondence between English sentence structure and entity-relationship diagrams.” The idea here is that the users of the model will be thinking in their own terms, translated into an ER model. It is worth noting that the positivist method fails to question this aspect because the assumption about reality is embedded in the laboratory methodology itself.
The clue to what practitioners do is found in Chen’s (1976, section 4.2, p. 29-31) discussion of diagrams to explain the network model: “Each rectangular box represents a record type.” Thalheim (1998, p.540) explains that “Network schemas can be represented by Bachman diagrams … using rectangles for the representation of record types and arrows for the representation of set types...”. Similarities with practitioner use of diagrams to represent the underlying relational model encourage us to look back another ten years to Bachman’s original paper on Data Structure Diagrams.
The surprise is that Bachman proposed a diagrammatic design technique based on entity classes, derived from experience of practical data modelling. Users of the General Electric Integrated Data Store used diagrams with entity classes to help design databases for over five years before Bachman published his paper defining the diagramming technique in 1966. Bachman (1966) used different semantic ideas in the diagrammatic technique to those used in the underlying (network) model being structured. The technique defined:
“… entity to mean a particular object being considered; the term entity class will mean an entire group of entities which are sufficiently similar, in terms of attributes that describe them, to be considered collectively … entity set … associates a group of entities in one entity class with one entity of a different entity class in a subordinate relationship” Bachman (1966, p.4).
Bachman clearly proposed the use of entity classes and relationships as concepts in a diagrammatic technique for data design. This technique is based on practical experience. (Note that the use of the term ‘entity set’ can be confusing because in the network model an ‘entity set’ equates to a ‘relationship set’ in the ER model. A Bachman ‘entity class’ equates to a Chen ‘entity set type’. This usage reflects the Bachman idea of ‘classes’ of entities based on similar attributes, and the ER idea of bunches of entities in sets based on a predicate, that are typed.)
An important constraint on Bachman relationships is that an entity set refers to a particular entity. Bachman (1966, p.4-5) modelled ownership in relationships:
“… a department has a set of employees currently assigned to it, these employees can legitimately be considered as subordinate entities .. of that department … Each department is considered to be the owner of the set in which its employees are members”.
The idea of an entity set had various constraints attached to it, which approximate to the idea of referential integrity in the relational model.
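The ownership idea can be illustrated with a minimal sketch. It is not Bachman's notation or any product's API. The names `BachmanSet`, `SetType` and `connect` are hypothetical, and the rule that a member belongs to at most one set of a given type at a time is an assumption drawn from the 'currently assigned' wording in the quotation.

```python
from dataclasses import dataclass, field


@dataclass
class BachmanSet:
    """One set occurrence: a single owner entity and its member entities."""
    owner: str
    members: list = field(default_factory=list)


class SetType:
    """Hypothetical helper: a named set type linking an owner class to a member class."""

    def __init__(self, name):
        self.name = name
        self.sets = {}      # owner entity -> BachmanSet
        self.owner_of = {}  # member entity -> owner entity

    def add_owner(self, owner):
        self.sets.setdefault(owner, BachmanSet(owner))

    def connect(self, member, owner):
        # A member must refer to an existing owner (compare referential integrity).
        if owner not in self.sets:
            raise ValueError(f"unknown owner {owner!r}")
        # Assumption: a member is in at most one set of this type at a time.
        if member in self.owner_of:
            raise ValueError(f"{member!r} is already a member of a {self.name} set")
        self.sets[owner].members.append(member)
        self.owner_of[member] = owner


department_employee = SetType("department_employee")
department_employee.add_owner("sales")
department_employee.add_owner("IT")
department_employee.connect("Fred", "sales")
department_employee.connect("Doris", "sales")

print(department_employee.sets["sales"].members)
print(department_employee.owner_of["Fred"])
```

Each department owns exactly one set, and every employee reaches a department only through the set it is a member of. This is the constraint that approximates referential integrity.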
Bachman’s data structure diagram was used to design the data in the context of using a network ‘logical’ model. A logical model implemented the important idea of data independence. Data independence means that the logical model of the data structure is separated from the way the data is physically stored. This makes it possible to change the physical storage of the data but maintain the same logical model. This isolates the application code from the physical design, for example. Bachman’s diagrams used new semantic ideas, such as ‘entity class’, to describe the way to talk about the underlying structure of the logical network model independent of the physical data storage. Making entity classes fit the underlying model construct makes sense because the diagram directly exposes the user’s logical view of the data. A user has a single model of the data to work with, and a way of thinking about the main model constructs. The diagram is not part of the model, but is a way of thinking about the model. Using the diagram is a design technique.
The relational model of data eventually became the widely used model and the network model lapsed into largely legacy use. Codd (1970, p.377) proposed the relational model of data as
“… superior in several respects to the … network model … It provides a means of describing data with its natural structure only – that is, without superimposing any additional structure for machine representation purposes”.
In the relational model the idea of relationships again only supports 1:m relations, but without the idea of ownership. We might informally say that an employee would ‘know’ what department they currently worked for through their foreign key. This way of thinking about relationships supports the simple two-dimensional relation and is the way the model encourages the user to perceive the data. This logical model says nothing about how the data is stored. Codd (1970, p.380) proposed that
“Each user need not know more about any relationship than its name together with the names of the domains …”.
Therefore both Bachman and Codd used a constrained version of relationships in their models for thinking about data. Bachman had shown how useful entity class diagrams were in understanding data, but the relational model had no associated diagramming technique. Not surprisingly, we can see that entity class diagrams could be the way to design relational databases. When the network and relational models are used to think about a non-redundant design, the same entity classes would emerge in the diagram. An entity class will equate to either a relation or to a record set in the network (and possibly hierarchic) model. Bachman’s diagrams are interesting because the entity classes provide a mechanism to generalise design thinking using existing logical models.
Chen takes the view that Bachman’s diagrammatic technique does not count as entity relationship modelling because the ‘rectangular box represents a record type’. This counts as ‘impure’ thinking. There is no mention in Chen, or in many subsequent discussions, that Bachman proposed the use of entity classes. This is unfortunate because it does not allow us to separate the development of a diagrammatic technique using entity classes from the development of an ER model.
Chen (1983) provided a category of ER models that included the CODASYL network model. The main classification was based on whether the model included only binary, or included n-ary relationships. Chen (1983, p.24) recognised this binary category as “ …the model used in many commercially available data dictionaries.” This category would include Bachman diagrams. Chen called this category BERM (Binary ER model) but we need to separate the diagrammatic ideas from the model ideas. Entity class diagrams (ECDs) should be used to describe the diagrammatic technique proposed by Bachman. This is a good description of what they are since the diagram shows entity classes. These are diagrammatic techniques that facilitate a conversation about the underlying model. ECDs are not a different data model. ECDs are about modelling and not about the model. The widespread use of these diagrams (for example in commercially available data dictionaries) should be attributed to Bachman.
Therefore, when we research practical data modelling, the findings expose the use of Bachman’s ideas for a period extending to nearly forty years. Practitioner use of ECD technique is not alluded to in the literature, although this practical experience pre-dates the ER model by about fifteen years. The important point to note about the positivist research stream is that it has not uncovered this fact because the methods have been self-referencing in the laboratory.
This section develops the example used in Hitchman (2003) to illustrate the difference between ECDs and the ER model. The first two examples show what happens when practitioners use Bachman’s ideas about relationships in ER diagrams. In Figure 1, early in analysis, we have established two entity classes and recognise that there is a many to many relationship between them. The details of this notation can be found in Hitchman (2002).

Figure 1. A Binary Relationship
At this stage conceptualising about the relationship is deferred. Essentially, we are saying that there is something more complex in the relationship that is not yet considered. We do not directly conceptualise about the relationship itself as a modelling construct, outside of the context of the two entity classes involved. In order to understand the situation we have to provide the missing conceptualisation – a new entity class. This is shown in Figure 2.

Figure 2. Decomposition to a New Entity Class
Now that we have the entity class to conceptualise about we can understand the situation. Examples of the data would help. In this case we might think about the relation:
worker#   department#   start date#
W1        D1            30/05/2003
W1        D2            30/07/2003
W2        D1            12/05/2003
Probably we would write down the table but say aloud;
This assignment is filled by W1 (Fred) with D1 (sales department) from 30/05/2003,
This assignment is filled by W1 (Fred) with D2 (IT department) from 30/07/2003,
This assignment is filled by W2 (Doris) with D1 (sales department) from 12/05/2003,
or possibly (with a new unique identifier for an assignment);
A1 is an assignment filled by W1 with D1 starting on 30/05/2003,
A2 is an assignment filled by W1 with D2 starting on 30/07/2003,
A3 is an assignment filled by W2 with D1 starting on 12/05/2003.
To understand the situation we have to use the underlying relational data model. The situation for the network model would involve the same basic classes. Our choice in conceptual thinking is to initially defer, or ignore, the details of the situation about assignment. Of course, we may well conceptualise about assignment as an entity class before the many-to-many (m:n) relationship is drawn. The m:n relationship ‘flags’ the existence of a new entity class. It is worth noting that the assignment entity class may reflect the core business – what the business is ‘all about’.
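The decomposition of the m:n relationship into an assignment entity class can be sketched as a small relational schema. The sketch uses Python's built-in sqlite3 module. The table names, column names and composite key are illustrative assumptions, not a schema taken from Hitchman (2003). The data values are those of the example relation above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE worker     (worker_id     TEXT PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE department (department_id TEXT PRIMARY KEY, name TEXT NOT NULL);
    -- The m:n relationship becomes an entity class with two 1:m relationships.
    -- The composite key is an assumption made for this sketch.
    CREATE TABLE assignment (
        worker_id     TEXT NOT NULL REFERENCES worker (worker_id),
        department_id TEXT NOT NULL REFERENCES department (department_id),
        start_date    TEXT NOT NULL,
        PRIMARY KEY (worker_id, department_id, start_date)
    );
""")
conn.executemany("INSERT INTO worker VALUES (?, ?)",
                 [("W1", "Fred"), ("W2", "Doris")])
conn.executemany("INSERT INTO department VALUES (?, ?)",
                 [("D1", "sales"), ("D2", "IT")])
conn.executemany("INSERT INTO assignment VALUES (?, ?, ?)", [
    ("W1", "D1", "30/05/2003"),
    ("W1", "D2", "30/07/2003"),
    ("W2", "D1", "12/05/2003"),
])

rows = conn.execute("""
    SELECT a.worker_id, w.name, a.department_id, d.name, a.start_date
    FROM assignment a
    JOIN worker w     ON w.worker_id = a.worker_id
    JOIN department d ON d.department_id = a.department_id
    ORDER BY a.worker_id, a.department_id
""").fetchall()

# Read each row back as a sentence, as a designer might say it aloud.
sentences = [
    f"This assignment is filled by {wid} ({wname}) with {did} ({dname} department) from {start}"
    for wid, wname, did, dname, start in rows
]
for sentence in sentences:
    print(sentence)
```

Reading the rows back as sentences is only possible once assignment exists as a relation in its own right, which is the point of the decomposition.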
Figure 3 shows a Bachman diagram of the situation, together with the way that Bachman defined the reading of the relationships. Users of the diagram could add information specific to the underlying network model. It is easy to see that the entity class diagrams being used today have simply reversed the arrow heads to show a ‘crow’s foot’ (to indicate ‘fan out’), and explicitly attached the relationship semantics in a more straightforward way – without reference to set membership. Over time, other aspects of notation have been added to make attributes, unique identifiers and relationship identification explicit, for example. Sub-typing notation has also been added. This is perhaps the only aspect of the ECD that is not directly supported in the relational model, but then sub-typing was not defined in the ER model either. Effective ways of masking the presence of reference data entity classes are also used. Masking reference data (or ‘look up’ relations) simplifies the diagram by reducing the number of classes.

Figure 3. Bachman diagram of the situation
The Bachman diagram specifies 1:m relationships with a single line (arrow). When the design is specified as a network schema, a set type specifies the relationship in the form:
set name is department_assignment
owner is department
…..
member is assignment
…..
set name is worker_assignment
owner is worker
…..
member is assignment
…..
These set specifications are separate from the specifications of the record types. In the relational model, on the other hand, we would see the relationship specified as part of one relation, referring to another relation (roughly):
create table assignment
…..
department_name varchar(30) not null,
…
foreign key ass_dep (department_name) references department
This declaration can only be made in the context of the relation that contains the foreign key. The relational model therefore gives the user a view of the relationship from the relation point of view. The user does not see the relationship as an individual concept. This is partly because of the use of referential integrity defined within the model. The interesting aspect of Bachman diagrams is that, because a line indicates the relationship between two classes, the diagram converts perfectly to relational database design.
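A small runnable sketch shows the relationship being visible only from the relation that holds the foreign key. It uses Python's built-in sqlite3 module. The `assignment_id` column and the cut-down `department` table are assumptions added to make the fragment complete, and SQLite's `PRAGMA foreign_key_list` stands in for whatever catalogue a given DBMS provides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE department (
        department_name VARCHAR(30) PRIMARY KEY
    );
    CREATE TABLE assignment (
        assignment_id   INTEGER PRIMARY KEY,      -- assumed identifier
        department_name VARCHAR(30) NOT NULL,
        CONSTRAINT ass_dep FOREIGN KEY (department_name)
            REFERENCES department (department_name)
    );
""")
conn.execute("INSERT INTO department VALUES ('sales')")
conn.execute("INSERT INTO assignment (department_name) VALUES ('sales')")

# An assignment that refers to a non-existent department is refused.
try:
    conn.execute("INSERT INTO assignment (department_name) VALUES ('marketing')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# The relationship is recorded against assignment, not against department.
fk_on_assignment = conn.execute("PRAGMA foreign_key_list(assignment)").fetchall()
fk_on_department = conn.execute("PRAGMA foreign_key_list(department)").fetchall()

print(rejected)
print([(fk[2], fk[3]) for fk in fk_on_assignment])
print(fk_on_department)
```

The catalogue lists the relationship under assignment and nothing under department. The single line on the diagram therefore maps onto exactly one foreign key declaration.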
The situation in the ER model is different. Figure 4 shows the ER model version of the example. Chen did not provide an explicit way of reading the situation or interpreting the diagram as a sentence. Presumably this is because the model is supposed to be a pure model of reality and not a way of talking about the data. We have two entity set types (department and worker) and a relationship set type (department-worker). (Note that we can also assign role names to the two relationship lines.) In the ER model the user has to think about the relationship set type ‘department-worker’ as a semantic object like the entity classes. To understand this situation we have to be able to reason about the relationship set type as a semantic construct in its own right.

Figure 4. ER model diagram version
It is revealing to consider how Chen thinks about ‘department-worker’. Figure 4 is similar to Chen’s (1976, p.19) Figure 10. The way that Chen explains this is to provide a table showing example entity values, exactly like the relation example used above to show the entities in ‘assignment’. Chen’s example is shown in his Figure 8 and this is the only place that we would see the attribute start date. So the way that we can understand the situation is by thinking about a relation. Comparing Figures 7, 8 and 9 in Chen’s original paper shows that entity and relationship set types are all understood through the same (or very similar) underlying relations. This way of thinking about and explaining relationship types persists. Dey et al (1999, p.457) discussing n-ary relationships (involving any number of entities) use an example of a relationship set type between four entities (doctor, patient, drug and prescription). Like Chen, the way they explain this situation is to provide a relation with example entities.
Choosing a name for the relationship set type is important. In the example we have used the naming convention suggested by Chen in his Figure 10. We could equally have chosen ‘worker department’, ‘assigned to’ or ‘assigned with’. The problem is that the choice may imply ownership, or ‘looking one way’. The relationship set can be viewed from the point of view of either of the entity classes. Using ‘assignment’ is therefore a deliberate choice to prevent ‘one way thinking’. The reader may also be wondering how to decide if an ‘assignment’ (interesting since it is a noun) is an entity class or a relationship set type. This is an interesting question for the ER model and discussed in a later section.
It is also worth noting that Bachman included a dashed line as the notation for non mandatory relationship participation. Bachman called this ‘sometime member entity classes’. We could make the relationship from worker to assignment optional (or ‘sometime’). An assignment, when created, must be with a department, but is an unfilled vacancy if no worker is currently involved. This equates to having a ‘nulls allowed’ constraint on the foreign key from assignment to worker. This makes us check the semantics of the domain: for example, is there now a many-to-many relationship between assignment and worker? Clearly, consideration of optionality is crucial in the modelling process, but missing from the notation proposed by Chen. This knowledge from practical modelling experience was not included in the ER model notation.
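Optional participation can be sketched as a nullable foreign key, again using Python's built-in sqlite3 module. The table and column names are illustrative assumptions. The only rules encoded are the two stated above: an assignment must have a department, and it may have no worker.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE worker     (worker_id     TEXT PRIMARY KEY);
    CREATE TABLE department (department_id TEXT PRIMARY KEY);
    CREATE TABLE assignment (
        assignment_id TEXT PRIMARY KEY,
        -- mandatory: every assignment is with a department
        department_id TEXT NOT NULL REFERENCES department (department_id),
        -- optional ('sometime'): nulls allowed, so a vacancy can exist
        worker_id     TEXT          REFERENCES worker (worker_id)
    );
""")
conn.execute("INSERT INTO worker VALUES ('W1')")
conn.execute("INSERT INTO department VALUES ('D1')")
conn.execute("INSERT INTO assignment VALUES ('A1', 'D1', 'W1')")
conn.execute("INSERT INTO assignment VALUES ('A2', 'D1', NULL)")  # unfilled vacancy

vacancies = [row[0] for row in conn.execute(
    "SELECT assignment_id FROM assignment WHERE worker_id IS NULL ORDER BY assignment_id")]

# An assignment without a department violates the mandatory relationship.
try:
    conn.execute("INSERT INTO assignment VALUES ('A3', NULL, 'W1')")
    rejected_without_department = False
except sqlite3.IntegrityError:
    rejected_without_department = True

print(vacancies)
print(rejected_without_department)
```

The dashed line on the diagram corresponds to the absence of `NOT NULL` on one column. Whether a vacancy may later be filled by several workers over time is a separate domain question that this sketch does not settle.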
The user perception of the situation in the ER model is based on the network schema idea of entity sets, but expanded (not generalised) to promote the relationship set type as a semantic construct on the diagram. On the other hand the relational model took the opposite approach and ‘downgraded’ the user view of the entity set to a foreign key within a relation. The ER model created a semantic dissonance with the relational model. The Bachman diagram technique seems to have anticipated the relational view of how to perceive relationships in diagrams. The Bachman ECD and the ER model diagram both contain the same number of constructs. The ER model chooses to differentiate the constructs into either entity set types or relationship set types. The research findings show that practitioners do not make this differentiation. Since this differentiation is at the heart of the ER model definition, this means that practitioners are not using the ER model. Therefore we must be careful to separate the use of ECDs from the use of a model.
Wand & Weber (2001) have shown that there is no solid theoretical foundation for particular models of data. Current research is aiming to provide a model based on firmer foundations using an established ontology. In the context of looking for ‘natural thinking’, naïve subjects (often students) are used to try and discover if proposed semantic concepts are useful or natural for thinking. For example, Weber (1996) used a group of second year undergraduate students to establish that their memory structures appear to reflect that they perceive memory entities and attributes to be two distinct constructs; Shanks et al (2003) studied the use of whole-part representations concluding that these are perceived as entity classes. Evaluating this evidence may eventually support the use of an ontological theory as the foundation for a data model. So, an interesting aspect of the ER model is that the theory used to build it does not directly account for ‘thinking’ even though the model claims to be ‘pure’.
Chen (1976, p.10) claimed that the ER model “can achieve a high degree of data independence … The reader may view the entity-relationship model as a generalisation or extension of existing models …”. Chen was not claiming that his model was ‘conceptual’. The idea of a conceptual model was used later to differentiate the ER model from existing ‘logical’ models. There was an important historical context at the point that Chen proposed the ER model. There were competing existing (‘logical’) models. Chen proposed a model that could generalise both the relational and network (and other) models. The user could think with the ER model and convert the result to think in any available ‘logical’ model. Users of the ER model have to deal with two different models of data. This has led to the idea that some users involved in database design will just use the ER model in an initial part of the design process. This in turn leads to the assumption that the ER model is somehow ‘conceptual’ because it has an early use in the design process and is therefore closer to conceptual thinking than the ‘logical’ models later used for database design. For example, “Conceptual design … results in the conceptual schema of the database. A conceptual schema is a high-level description of the structure of the database, independent of the particular DBMS software that will be used to implement the database.” (Batini et al 1992, p.6). The confusion between DBMS (database management system) product support for a model, and the model itself, is used to suggest that it is not a good idea to use supported models. This is a legacy of the time when competing data models were supported. For the last twenty years however, the ubiquitous support for the relational model has really made the case for its use. It is difficult to see why, in most organisations, there is a need to avoid the use of the relational model because of support by DBMS products.
Chen (1976, p.10) was explicit about the realm of “Information about entities and relationships which exist in our minds”. Chen proposed building a model of a particular situation by starting with these ideas in the mind and working ‘top down’ to a data design. This is not entirely ‘top down’. Chen did not adopt the idea of entity classes, but rather worked ‘bottom-up’ from the idea of entities in the mind, rather than ‘top down’ from the idea of classes in the mind. Chen used the idea of classifying distinctly identifiable entities into entity sets using some predicate. This is like finding entity classes by example and accounts for the name of the ‘entity relationship’ model. Do we think this way, or do we devise classes and then find examples to fit? The ER model is not built on this kind of thinking theory.
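The ‘bottom-up’ step described above, in which individual entities are gathered into an entity set by a predicate, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample entities, their attributes and the predicate `is_worker` are hypothetical and are not taken from Chen.

```python
# Individual, distinctly identifiable "things" (hypothetical sample data).
entities = [
    {"name": "Fred", "employee_no": 17},
    {"name": "Doris", "employee_no": 23},
    {"name": "IT", "budget": 50000},
]


def is_worker(entity):
    """Hypothetical predicate: anything with an employee number is a worker."""
    return "employee_no" in entity


# The entity set is whatever satisfies the predicate; the "class" is found
# by example rather than being declared first.
worker_set = [e for e in entities if is_worker(e)]
worker_names = [e["name"] for e in worker_set]
```

Note that the sketch says nothing about where the predicate comes from; that choice is left to the designer, which is the point at issue in the text.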
An important way that the ER model extended design is through people having ideas in their minds about relationship sets and types. This idea of ER relationship set types is an extension, not a generalisation, of ideas about relationships in the network model. This results in the use of m:n relationships and n-ary relationships as elements of the conceptual model. In the Chen model ‘marriage’ can be conceptualised as a relationship set type. Chen (1976, p.9) claimed that the ER model adopts “… the more natural view that the real world consists of …”. The ER model makes the claim that we will naturally and effectively think with relationship set types. Using the example about departments and workers, what does it mean to think about the relationship ‘department-worker’ as a construct? The ER model theory does not explain how or why relationship set types are ‘pure’ concepts for thinking, nor is there evidence to support the claim.
The problems are not restricted to relationship set types. The basic building block, the entity, is also problematic. Thalheim (1998, p.4) is perhaps the most complete exposition of the ER model theory and describes a basic problem with the model definition:
“… a missing standard for ER modelling … variety of definitions to the entity definition is an example of the confusion … the confusion is almost complete since most of the database and software engineering books do not define the concept of entity at all.”
To illustrate the confusion about what the ER model consists of, Thalheim lists twelve different definitions for ‘entity’. Thalheim (1998, p.30) uses Chen’s definition: “An entity is something which involves information. It is usually identifiable. Each entity has certain characteristics, known as attributes.” This is unclear in terms of practical use, particularly the use of ‘information’ with data design. This inability to pin down the main building block of the model is a serious flaw and accounts for the widespread and diverse opinions about what an entity actually is. For example, Watson (1996, p. 62) starts with a ‘real world’ definition, which is extended to differentiate physical and conceptual things: “… entity, which is some real world thing. Some entities are physical – CUSTOMER, ORDER AND STUDENT; others are conceptual – WORK, ASSIGNMENT and AUTHORSHIP.” Later Watson (1996, p. 143) gives the Bachman definition of an entity: “An entity is a thing about which data should be stored … An entity describes a thing that will be stored in a database.” Given that design “begins with the definition of the required Kernel or basic objects. These are called entities.” (Thalheim, 1998, p.30), the imprecise definition of a main component must present a serious problem for practical use of the model. The problem arises because the ER model does not theorise about what constitutes an entity in thinking, but assumes that entities can be easily conceptualised because they are in the ‘real world’.
Not surprisingly, it turns out to be difficult to be certain about whether some things can be classified into an entity set type. Chen uses the example of ‘marriage’. The ER model user would need to establish that there were identifiable things ‘Fred and Doris got hitched on 12/04/1976’, that could be then bunched into a set called ‘marriage’. This decision cannot be made on the basis of what the model tells us about entities. Chen delegates this decision to the database designer. The ER model entity is something of a mystery because the theory does not extend to ‘entities in our mind’.
There are some diagram notations that do support n-ary relationships. These need to be viewed with caution when interpreting their use. UML (the Unified Modeling Language) is a good example. To illustrate the difficulty that researchers have understanding practice: “The recently developed UML specification methods only allows binary relationship types.” (Thalheim 1998, p.41). In fact, UML is interesting because it does specify the use of n-ary relationships, as did OMT, a predecessor. However, we cannot assume that this notation will be used. According to Dorsey & Hudicka (1999, p.292) “The question is whether you want to reflect that the insurance policy is a result of the relationship among Person, Car, and Coverage … or is better represented another way (i.e. as a class). In this example there is no reason to use the n-ary relationship. As of this writing we have not seen an example where the n-ary syntax is obviously clearer than the other representation.” Even where notations support n-ary relationships the diagrams may be used as though the n-ary notation was not there, that is, as an entity class diagram. So even where notation supports ER model ideas, this is not evidence that practitioners make widespread use of them.
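The choice Dorsey & Hudicka describe can be made concrete with a small Python sketch. It holds the same facts twice: once as an n-ary relationship among Person, Car and Coverage, and once as an ordinary class. The sample values, the `policy_no` attribute and the helper `as_triples` are assumptions made for illustration; they do not come from Dorsey & Hudicka.

```python
from dataclasses import dataclass

# n-ary view: the relationship is a set of (person, car, coverage) triples.
policy_relationship = {
    ("Fred", "Car A", "Comprehensive"),
    ("Doris", "Car B", "Third party"),
}


# Class view: each fact is an instance of an entity class that refers to
# one person, one car and one coverage (three binary 1:m links).
@dataclass(frozen=True)
class InsurancePolicy:
    policy_no: int  # hypothetical identifier, not part of the quoted example
    person: str
    car: str
    coverage: str


policies = [
    InsurancePolicy(1, "Fred", "Car A", "Comprehensive"),
    InsurancePolicy(2, "Doris", "Car B", "Third party"),
]


def as_triples(policy_list):
    """Project the class instances back onto the n-ary relationship."""
    return {(p.person, p.car, p.coverage) for p in policy_list}
```

In this sketch the class view loses nothing that the triples hold, and it gives attributes such as a policy number an obvious home, which is consistent with the preference for the class representation reported in the quotation.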
Accounting for the research findings exposes the difficulty in understanding how to apply ER theory. The ER model is not theory about using diagrams in design. We might say that the ER model is theory about theory (a model), and not theory about modelling. We have no way of knowing whether ER model theory relates at all to ideas in our minds, to conceptualisation, or to the external world of data modelling practice. Therefore the ER model does not tell us about using ER diagrams as thinking aids in the design process. The ER model could be an abstract exercise in mathematics. It is very important to separate the idea of using ER diagrams to aid data design thinking from the ideas in the ER model.
We can view the ER model as a response to the practical use of entity class diagramming. The proponents of the ER model did not explain how the ER diagram related to the previous fifteen years of ECD design. The idea of a “pure” representation of the real world, as a basis for ER model assumptions, creates a very difficult situation for making sense of the theory. Take the data to support a sales order, for example. Here we would be modelling the data to support a business process. This process is removed from the actual business process of selling things. The business process supports the selling part of the business, but is not actually the business of selling. So here we are modelling the data required to support a function that supports a social invention. It is difficult to see how a ‘pure’ representation of the real world is a firm foundation here. Therefore ECDs are based on the idea that data design is a purposeful technique.
Neither of the logical models made any claim for pure thinking. These were purposeful models for thinking about data requirements. These models succeed if they facilitate our thinking about business data. The relational model “provides a means of describing data with its natural structure only – that is without superimposing any additional structure for machine representation. Accordingly, it provides a basis for a high level data language …” (Codd 1970, p.377). Business data is a purposeful social construction needing a purposeful model for understanding. This is different to laying claim to a ‘natural’, ‘intuitive’ or ‘pure’ model of reality. Both the relational and network models are constrained ways of thinking specifically about data. The ER model has to explain how we naturally think, whereas the relational model has only to explain how we choose to think about data.
Look again at Chen’s example of ‘marriage’. In the ER model we cannot know if this is an entity or a relationship. Bachman’s (1969, p.4) entities are “particular objects being considered”. An entity class is “an entire group of entities, which are sufficiently similar, in terms of the attributes that describe them, to be considered collectively”. Bachman’s definition is a class-with-attributes concept. ‘Marriage’ is a class with a set of attributes. In the network and relational models this is exactly the main component of the model. Bachman makes it clear that we are thinking about classes rather than about entities, and the classes are those of the underlying data model. Therefore in Bachman’s diagrams we know exactly what constitutes an entity.
Chen (1976, p.30) argues that a Bachman diagram “… is a representation of the organisation of records and is not an exact representation of entities and relationship.” This poses a serious theoretical dilemma, given that “An entity is a ‘thing’ that can be distinctly identified” (Chen 1976, p.10). Assignment conforms to this definition, as would any entity class in a Bachman diagram. This is because these all represent ‘things’ in data. The real problem here is knowing what constitutes a ‘thing’ in thinking. Any data item represents a ‘thing’ in thinking about data. Designers can think about rows in tables as ‘things’. Data items are ‘reality’ to designers. Chen’s theory does not help with this point: if the designers have relations and table rows (tuples) in their minds, then using Chen’s definition it is appropriate to model them as entities. To use the ER model the designers would have to ‘pretend’ that the relational model did not exist.
The situation with ECD relationships is considerably simpler than in the ER model. For a particular example we can think about Fred employed, in the past, on assignments with sales and marketing and currently with IT. We can think about the IT department currently owning 30 assignments, including two assignments with Fred and Doris. Generally, we can think about this relationship in terms of ‘a worker employed in one or more assignments’. Relationships exist in thinking only in the context of the entity classes involved. Of course, there are problems with the use of these ideas that are left to the design team’s expertise. For example, how does the diagram show ‘current’ assignments, and are we aware that the diagram shows both current and historical assignments?
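The worker, department and assignment example can be written down as a minimal Python sketch in which every relationship is 1:m and exists only through the classes involved. The field names, the end years and the convention that a missing end year marks a ‘current’ assignment are all assumptions made for illustration; the text deliberately leaves that last question to the design team.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Worker:
    name: str


@dataclass(frozen=True)
class Department:
    name: str


@dataclass(frozen=True)
class Assignment:
    # Each assignment refers to exactly one worker and one department, so a
    # worker has many assignments and a department owns many assignments.
    worker: Worker
    department: Department
    end_year: Optional[int] = None  # assumed convention: None means current


fred, doris = Worker("Fred"), Worker("Doris")
sales, marketing, it = Department("Sales"), Department("Marketing"), Department("IT")

assignments = [
    Assignment(fred, sales, end_year=1999),      # hypothetical past assignment
    Assignment(fred, marketing, end_year=2001),  # hypothetical past assignment
    Assignment(fred, it),
    Assignment(doris, it),
]


def assignments_of(worker):
    """'A worker employed in one or more assignments'."""
    return [a for a in assignments if a.worker == worker]


def current_assignments_in(department):
    """Assignments a department currently owns, under the assumed convention."""
    return [a for a in assignments
            if a.department == department and a.end_year is None]
```

There is no separate m:n construct between worker and department here; the Assignment class carries that link, which is the simpler Bachman-style view of relationships discussed in this section.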
The users’ model is restricted to 1:m relationships. As Weber et al. point out, there is no theory to support this either, but it is much easier to see how to convert this way of viewing relationships to a language used to support thinking about business data. Practitioners have adopted this way of thinking because it seems easy to use – adoption is essentially a measure of ease of use here. Chen argues that this is flawed because new ‘impure’ entity classes are required. However, the evidence from practice suggests that the simpler Bachman idea about relationships fits better with data design thinking.
The ECD is a design tool purposefully used to constrain the language used to understand the data situation. The ECD is a purposeful way of talking about and therefore understanding the data situation. It is now clear that theories about language (for example Jackendoff) go some way to explaining how to use ECDs. ECDs are about talking through the data situation; they are not pure abstract models of reality. This helps considerably in understanding how to deal with different ‘sorts’ of entity classes. For example, worker, department and assignment are different sorts of classes. A worker is a ‘brute’ fact, can be found and even talked to. A department is a different sort of class, and not the same kind of physical thing. An assignment is again different, perhaps representing the event of assignment. However, in language we do not have to worry about explicitly differentiating sorts of things (otherwise it would be very difficult to have a conversation). By viewing ECDs as a way to talk about the business data, we can rely on language theory to support the treatment of classes in the same sort of way. Therefore the research findings have given a new way to understand classes in ECDs. This may explain why the relational model view of data is so successful: basing thinking on a single concept is sensible from a language point of view.
All of the entity classes reflect data concepts involved in a particular business domain. The classes are subtly different and have to be understood in a business context. This business meaning is captured in the modelling process as a class definition, some examples, a list of attributes and their definitions and so on. Some of these concepts are ‘low level’, like a list of skills owned by workers. Some are ‘brute’ facts like a worker; some are organisational concepts like department. There is no requirement to categorise classes in order to talk about and understand them. We have to be careful to separate the ontology of the data model from the ontology of the situation. It is possible, and probably beneficial, to categorise business classes as a taxonomy or ontology for the domain. This would include sub-types, for example. Users of ECDs take the view that the simplicity of relations outweighs any potential gain from using a complex model ontology.
The idea embedded in the ERM is that of the model as an abstraction of reality (the generally used textbook description). This is different from the ECD approach. An ECD is simply a set of propositions about business data – each entity will conform to the class proposition.
Therefore, it is a mistake to assume that when ER diagrams are used in practice they represent a new conceptual layer based on the ER model. Given their long practitioner history, it should not be surprising if the ER diagrams that are actually being used have their origins in Bachman’s structure diagrams. There is an important distinction between using ER diagrams for design and using the ER model. ER model theory is a response to practitioner use of entity class diagrams. We need to ask whether there is widespread change from using ECDs to using the ER model. With the current state of our knowledge of practice it is possible to assert that the ER model is restricted to being an abstract mathematical model.
Positivist methods have produced a dearth of evidence to support the widespread use of the ER model, in the context of a lackadaisical attitude to checking that the theory accounts for what happens in practice. Positivist methods have also avoided the messy practice situation. A strand of multi-method research has uncovered the fact that practitioners use a diagrammatic technique based on Bachman’s original ideas. This research method is a useful tool in understanding how to confirm the relevance of theory in practice. Practitioners use ECDs purposefully for talking through data design and understanding the business situation. ECDs are not based on attempting to conceptually model reality with the ER model.
The conclusion is not that the ER model has failed, but that there is a lack of evidence about how to apply the ER model in practice. The available evidence raises difficult questions for the model proponents. A concerted mixed-method research effort could be made to establish what happens in practice so that theorists can learn from expert practitioners. Conceptual modelling theory needs to account for the conceptual thinking that takes place. Certainly, Bachman has not been given due credit in the literature for the ECD (ER diagram) technique. Every text that discusses an ER diagram should be citing Bachman as the source of the technique and should be careful to disentangle the ER model from the diagram technique.
The second paper in this series examines the use of ER diagramming tools that have adapted some of the ideas in the ER model.
Bachman, C. W. (1969) Data Structure Diagrams. DATA BASE 1(2) 4-10
Chen, P. P. (1983) A Preliminary Framework for Entity-Relationship Models, in: Entity-Relationship Approach to Information Modeling and Analysis (edited by P. Chen), North-Holland (Elsevier), 19-28
Chen, P. P. (2002) Entity-Relationship Modeling: Historical Events, Future Trends, and Lessons Learned, in: Software Pioneers: Contributions to Software Engineering, Broy, M. and Denert, E. (eds.), Springer-Verlag, Berlin, Lecture Notes in Computer Science, June 2002, pp. 100-114, ISBN 3-540-43081-4
Chen, P. P. (1977) The Entity-Relationship Model: A Basis for the Enterprise View of Data. Proc. of National Computer Conference, 1977, AFIPS Press, 77-84
Chen, P. P. (1983) English sentence structure and entity-relationship diagrams. Information Sciences 29 127-149
Chen, P. P. (1976) The Entity Relationship Model: Towards a unified view of data. ACM Transactions on Database Systems 1(1) pp. 9-36
Codd, E. F. (1970) A Relational Model of Data for Large Shared Data Banks. Communications of the ACM 13(6) 377-387
Darke, P. & Shanks, G. (1999) Understanding Corporate Data Models. Information and Management 35 19-30
Dey, D., Storey, V. C. and Barron, T. M. (1999) Improving Database Design Through The Analysis Of Relationships. ACM Transactions on Database Systems 24(4) pp. 453-486
Dorsey, P. & Hudicka, J. (1999) Oracle 8 Design Using UML Object Modeling. Oracle Press/McGraw-Hill
Goldstein, R. C. & Storey, V. (1990) Some findings on the intuitiveness of entity-relationship constructs, in F. H. Lochovsky (ed) Entity-Relationship Approach to Database Design and Querying, Elsevier Science, Amsterdam
Hitchman, S. (1995) Practitioner perceptions on the use of some semantic concepts in the entity-relationship model. European Journal of Information Systems 4 31-40
Hitchman, S. (1997) Using DEKAF To Understand Modelling In The Practitioner Domain. European Journal of Information Systems 6(3) pp. 181-189
Hitchman, S. (1999) Ternary Relationships – to three or not to three, is there a question? European Journal of Information Systems Vol 8 December pp. 224-231
Hitchman, S. (2000) Object-Oriented Modelling In Practice: Class Model Perceptions In The ERM Context. Proceedings of the 19th International Conference on Conceptual Modelling, ER2000, edited by Liddle, S. W., Mayr, C. & Thalheim, B., Salt Lake City, USA, 9-12 October 2000 (published as Springer Verlag Lecture Notes in Computer Science vol 1920, ISBN 3-540-41072-4) pp. 397-408
Hitchman, S. (2002) The Details Of Conceptual Modelling Notations Are Important – A Comparison Of Relationship Normative Language. Communications of AIS 9 167-179
Hitchman, S. (2003) An Interpretive Study of How Practitioners Use Entity-Relationship Modeling in a Ternary Relationship Situation. Communications of AIS 11 451-485
Jackendoff, R. (2002) Foundations Of Language. Oxford: Oxford University Press
Shanks, G. (1997) The challenges of strategic data planning in practice: an interpretive case study. Journal of Strategic Information Systems 6 69-90
Sugumaran, V. & Storey, V. C. (2002) Ontologies for conceptual modeling: their creation, use, and management. Data & Knowledge Engineering 42 251-271
Veres, C. and Hitchman, S. (2002) Using Psychology to Understand Conceptual Modelling, in Wrycza, S. (ed.) Proceedings of the Xth European Conference on Information Systems, Gdansk, Poland, 2002, pp. 473-481
Wand, Y. and Weber, R. (2002) Research Commentary: Information Systems and Conceptual Modelling - A Research Agenda. Information Systems Research 13(4) 363-376
Watson, R. (1996) Data Management. John Wiley
Weber, R. (1996) Are Attributes Entities? A Study Of Database Designers’ Memory Structures. Information Systems Research 7(2) 137-162
Steve Hitchman (steve@infomod.fsbusiness.co.uk) has been lecturing and consulting in data modelling issues for over ten years. Steve is currently managing a team of Data Architects in a government department. This paper was the result of work carried out during a teaching and research semester at Melbourne University.
© Copyright, 1998-2004 InConcept (Information Conceptual Modeling, Inc.) All Rights Reserved. ISSN: 1533-3825