December
1999 Issue: 11
Journal of Conceptual Modeling
www.inconcept.com/jcm
Entity
Relationship Modeling from an ORM Perspective: Part 1
By Dr. Terry
Halpin
Introduction
Entity Relationship modeling (ER) views the application domain in terms of entities that have attributes and participate in relationships. For example, the fact that an employee was born on a date is modeled by assigning a birthdate attribute to the Employee entity type, whereas the fact that an employee works for a department is modeled as a relationship between them. This view of the world is quite intuitive, and in spite of the recent rise of UML for modeling object-oriented applications, ER is still the most popular data modeling approach for database applications.
The ER approach was originally proposed by Peter Chen in 1976, in the very first issue of an influential ACM journal [2]. As shown in Figure 1, Chen's original notation used rectangles for entity types, diamonds for relationships, and ellipses for attributes. The double ellipse indicates unique identifier attributes, and the "n and "1" indicate the relationship is many to one (each employee works for at most one department, but many employees may work for the same department).

Figure 1 An early ER notation used by Chen
The direction in which relationship names are to be read is formally undecided, unless we add some additional marks (e.g. arrows) or rules (e.g. always read from left to right and from top to bottom). For example, does the employee work for the department, or does the department work for the employee? Although we can use our background knowledge to informally disambiguate this example, it is quite common nowadays to see ER models with relationships whose intended direction can only be guessed at by anybody other than the model's creator. For example, consider the impact of misreading the intended direction for the following: Person killed Animal; Person is loved by Person. This problem is exacerbated if the verb phrase used to name the relationship is shortened to one word (e.g. "work", "love"), unfortunately still a fairly common practice.
Chen's notation evolved over time. His current ER-Designer tool uses hexagons instead of diamonds, and supports n-ary relationships. Outside academia however, Chen's notation seems to be rarely used nowadays, so I'll say no more about it here. One of the problems with the ER approach is that there are so many versions of it, with no single standard. In industrial practice, the most popular versions of ER are the Oracle and Information Engineering (IE) notations. Another popular data modeling notation is IDEF1X, but since this is a hybrid of ER and relational notation, I don't count it as a true ER representative. As discussed in earlier articles [3], UML class diagrams can be regarded as an extended version of ER. The rest of this article focuses on basic aspects of the Oracle notation for ER. Later articles will examine IE and IDEF1X.
Oracle ER: the basics
The Oracle ER notation has been used for many years in CASE tools from Oracle Corporation, and the discussion here is based on the classic treatment by Richard Barker [1]. Oracle's Object Designer tool now supports UML as an alternative to its traditional ER notation. For database applications, many modelers still prefer Oracle's ER notation in preference to UML, and it will be interesting to see whether this changes over time. Dave Hay, an experienced modeler and ardent fan of the Oracle ER notation, argues that "there is no such thing as 'object-oriented analysis'" [5], only object-oriented design, and that "UML is … not suitable for analyzing business requirements in cooperation with business people"[6].
While I agree with Dave Hay that UML class diagrams are less than ideal for data modeling, I feel that his preferred ER notation shares some of UML's weaknesses in being attribute-based. As I've discussed before in a UML context [3, 4], using attributes in a base conceptual model adds complexity and instability, while making it harder to validate models with domain experts using verbalization and sample populations. Attributes are great for logical design, since they allow compact diagrams that directly represent the data structures (e.g. relations or object-relations) used for the actual design. However when I'm performing conceptual analysis, I just want to know what the facts and rules are about the business, and I want to communicate this information in sentences, so that the model can be understood by the domain experts. I sure don't want to bother about how facts are grouped into multi-fact structures. Whether some fact will end up in the design as an attribute is not a conceptual issue to me. As Ron Ross says, "Sponsors of business rule projects must sign off on the sentences-not on graphical data models. Most methodologies and CASE tools have this more or less backwards" [6, p.15]. The ORM reporting facilities in Visio Enterprise allow the domain expert to inspect ORM models fully verbalized into sentences with examples, making validation much easier and safer.
Now that I've stated my bias up front, let's examine the Oracle ER notation itself. The basic conventions are illustrated in Figure 2. Entity types are shown as soft rectangles (rounded corners) with their name in capitals. Attributes are written below the entity type name. Some constraint information may appear before an attribute name. A "#" indicates that the attribute is the primary identifier of the entity type, or at least a component of its primary identification scheme. A "*" or heavy dot "·" indicates the attribute is mandatory (i.e. each instance in the database population of the entity type must have a non-null value recorded for this attribute). A "°" indicates the attribute is optional. Some modelers also use a period "." to indicate the attribute is not part of the identifier.

Figure 2 The basic ER notation used by Oracle
Relationships are restricted to binaries (no unaries, ternaries or longer relationships), and are shown as lines with a relationship name at the end from which that relationship name is to be read. This name placement overcomes the ambiguous direction problem mentioned earlier. Both forward and inverse readings may be displayed for a binary relationship, one on either side of the line. This makes the notation superior to UML for verbalizing relationships.
From an ORM perspective, each end (or half) of a relationship line corresponds to a role. Like ORM, Oracle ER treats role optionality and cardinality as distinct, orthogonal concepts, instead of lumping them together into a single concept (e.g. multiplicity in UML). A solid line-half denotes a mandatory role, and a dotted line-half indicates an optional role. For cardinality, a fork or crow's foot intuitively indicates "many", by its many "toes". The absence of a crow's foot intuitively indicates "one". The crow's foot notation was invented by Gordon Everest, who originally used the term "inverted arrow" but now just calls it a "fork". The basic correspondence with the ORM notation for uniqueness and mandatory role constraints is shown in Figure 3.

Figure 3 The ER diagram (a) is equivalent to the ORM diagram (b)
If each of the two roles in a binary association may be assigned one of optional/mandatory and one of many/one, there are sixteen patterns. The equivalent Oracle ER and ORM diagrams for the first eight of these cases are shown in Figure 4.

Figure 4 Some equivalent cases
The other eight cases are shown in Figure 5. Although all eight are legal in ORM, the last case where both roles of a many:many relationship are mandatory is considered illegal in Oracle.

Figure 5 Other equivalent cases
Ring associations that are illegal in Oracle are shown in Figure 6(a). Although rare, they sometimes occur in reality, so should be allowed at the conceptual level, as permitted in ORM. As an exercise, you may wish to invent satisfying populations for the ORM associations in Figure 6 (b).

Figure 6 Illegal ring associations in Oracle (a) that are rare but allowed in ORM (b)
In Oracle ER, a bar "|" across one end of a relationship indicates that the relationship is a component of the primary identifier for the entity type at that end. In Figure 7 for example, Employee and Building have simple identifiers, but Room has a composite reference scheme, being identified partly by its room number and partly by the building in which it is included.

Figure 7 Room is identified by combining its room nr and its relationship to Building
The use of identification bars provides some of the functionality afforded by external uniqueness constraints in ORM. For example, the schemas in Figure 8 are equivalent. The other attributes of Room and Building in ORM would be modeled in ORM as relationships. ORM's external uniqueness notation seems to me to convey more intuitively the idea that each RoomNr, Building combination is unique (i.e. refers to at most one room). But maybe I'm biased. At any rate, this constraint (as well as any other graphic constraint) can be automatically verbalized in natural language.

Figure 8 Composite identification in Oracle ER (a) and ORM (b)
Some people misread the bar notation for composite identification as a "1", since this is what the symbol means in many other ER notations. But this isn't a problem if you don't have to work with multiple versions of ER. The main problem with the bar notation is that it doesn't generalize, since it can't be used for declaring composite identification schemes involving only attributes (no relationships). Because ORM always uses relationships instead of attributes, it doesn't have this problem. As an example, suppose we wanted to model the information shown in Table 1, as well as other facts about rooms.
Table 1 A simple data use case for room scheduling
| Room | Time | ActivityCode | ActivityName |
| 20 | Mon 9 am | VMC | VisioModeler class |
| 20 | Tue 2 pm | VMC | VisioModeler class |
| 33 | Mon 9 am | AQD | ActiveQuery demo |
| 33 | Fri 5 pm | SP | Staff party |
| ... | ... | ... | ... |
The table suggests that rooms can be simply identified by room numbers, so let's accept that. One way of modeling the situation in Oracle ER is shown in Figure 9. Here the bar notation is used to show that RoomTimeSlot is identified by combining its time and room number.

Figure 9 An ER diagram for room scheduling
The use of attributes in this model makes it hard to verbalize and populate the schema for validation purposes. Moreover, there is at least one constraint missing. Compare this with the populated ORM model for the same situation. Here the facts are naturally verbalized as a ternary (Room at Time is used Activity) and a binary (Activity has ActivityName). The associated fact tables include the original facts, as well as counter-facts (italicized) to test the constraints. The first counter-row (20, Mon 9 am, AQD) tests the uniqueness constraint that a room at a time is used for at most one activity. The second counter-row tests the uniqueness constraint that at most one room can be used for a given activity at a given time. This constraint may well be wrong, but at least we can express it and test it in ORM. With the ER model there is no way of even specifying the constraint, much less testing it. The counter rows (SP, Sales phonecalls) and (PTY, Staff party) are designed to check the uniqueness constraints that each Activity has at most one Activity name and vice versa. If these are rejected, the association really is 1:1, as its basic population suggests. Since the ER notation being discussed doesn't include a way of indicating that attributes other than the primary identifier are unique, it isn't very helpful here. As a small point, the Y2K row has been added to the original population to indicate that it is possible for some listed activities to be unscheduled.

Figure 10 An ORM diagram for room scheduling, with sample and counter data
In case it looks like I'm just bashing attribute-based approaches like ER in this article, let me say again that I find attribute-based models useful for compact overviews and for getting closer to the implementation model. However I generate these by mapping from ORM, which I use exclusively for conceptual analysis. This makes it easier to get the model right in the first place, and to modify it as the underlying domain evolves. Unlike ER (and UML for that matter), ORM was built from a linguistic basis, and its graphic notation was carefully chosen to exploit the potential of sample populations. To reap the benefits of verbalization and population for communication with and validation by domain experts, it's better to use a language that was designed with this in mind. An added benefit of ORM is that its graphic notation can capture many more business rules than popular ER notations.
Next issues
Later articles in this series will consider more advanced aspects of the Oracle ER notation, including exclusion constraints, frequency constraints, subtyping and non-transferable relationships, and then examine the Information Engineering notation for ER, before concluding with a discussion of IDEF1X.
References
![]()
Dr Terry Halpin, BSc, DipEd, BA, MLitStud, PhD, is a Program Manager in Database Modeling for the Enterprise Frameworks and Tools Unit, Microsoft Corporation, USA., Seattle WA, USA. During a lengthy career as an academic in computer science, he also worked in industry on database modeling technology and as a data modeling consultant. His recent positions include head of database research at Asymetrix Corporation, and research director of InfoModelers Inc., which was acquired by Visio Corporation. For several years, his research has focused on conceptual modeling and conceptual query technology for information systems, using a business rules approach. Dr Halpin has presented papers and tutorials at many international conferences. His doctoral thesis provided the first full formalization of Object-Role Modeling (ORM/NIAM), and his publications include over ninety technical papers, as well as four books, including Information Modeling and Relational Databases (Morgan Kaufmann, 2001).
Contact Information:
Dr Terry Halpin
Program Manager, Database Modeling
Enterprise Framework & Tools Unit, Microsoft Corporation
One Microsoft Way
Redmond WA 98052-6399 (USA)
terryha@microsoft.com
(425) 705 9190
fax: (425) 936 7329
http://www.orm.net
![]()
© Copyright, 1998-2004 InConcept
(Information Conceptual Modeling, Inc.) All
Rights Reserved. Privacy Statement.
ISSN: 1533-3825