April 2000        Issue: 13

Journal of Conceptual Modeling
www.inconcept.com/jcm


Case Study: Delaying the "Attribute or Entity" Decision

By Scot A. Becker

Introduction

The following case study occurred at a client site. I think it illustrates an important benefit of Object-Role Modeling (ORM): delaying the decision of whether an element is an attribute or an entity. Incidentally, it also illustrates a good use for an independent object type.

Note: The model shown has been abstracted from the actual client model, both to conform to Non-Disclosure Agreements and for the sake of brevity. Therefore, only the relevant objects are shown. Further, to improve readability, I have modeled some reference modes as names rather than the actual system identifiers, and I have provided sample data.

Background

My client is using the Unified Modeling Language (UML) to design most of the application artifacts. The Object-Oriented (OO) approach taken at the client site is iterative, meaning a "small" component of the application is taken through analysis, design, construction, and so on through deployment. In this manner, many components are in various stages of development simultaneously. The data modeling "deliverables" are a logical and physical data model as well as the mapping from the data schema to the persistent classes (i.e., classes whose members require storage in the database). Technically, these deliverables are produced only during the construction phase of the iteration.

To ensure data model consistency, I inserted ORM into the process in the analysis phase (create the ORM diagram) and the design phase (update the ORM diagram to capture any changes due to design) for each component. Doing so allowed me to find a potential error that could have slipped in had the data modeling been done entirely with Entity-Relationship (ER) modeling at the logical/physical level.

The Problem

The dependency I discovered via ORM occurred across three seemingly unrelated components. The first component, among many other non-persistent responsibilities, translated an application class member (a.k.a. variable or property) to an alias (business) name. The second component translated an attribute name to its (allowable values) code set housed in another system. The third component mapped the system element names to the source (legacy) field names to assist in dynamically performing translation between the target (new) system and the source (legacy) system.

Note that these components are of little value to a user interface; therefore, any resulting tables are known only to the system. Had we derived the tables entirely from the components' class diagrams, it would have been easy to create the following models for each component (showing only the persistent elements from the classes):

Figure 1: The first component's Logical Data Model (derived from the class diagrams)

Figure 2: The second component's Logical Data Model (derived from the class diagrams)

Figure 3: The third component's Logical Data Model (derived from the class diagrams)

The first component had already been modeled as shown in Figure 1. It was during the analysis phase, while I was creating ORM diagrams for the latter two components, that I discovered a common domain (and thus an ORM object type) across the three components: namely, the Application Attribute Name. The first component merely gives aliases for some of the attributes known to the application. The second component finds the list of allowable values for a given attribute known to the application. The third component maps an attribute known to the application to the source (legacy) system's "equivalent" attribute ("equivalent" being in quotes because some data cleaning and translation is, of course, needed to perform this task).

Without domain consistency enforced across all instances of Application Attribute Name, the same Application Attribute could be used in the different components under different names. For consistency's sake, error-free inter-component communication (if any), and semantic clarity, some sort of domain enforcement should be performed. Considering that the different components would be constructed and deployed at different times and by different developers, the risk of domain enforcement errors is high.

Further, suppose we want to start tracking other information about the Application Attribute in later components (or in later iterations of any of the three components above). Without modeling the Application Attribute as an entity early on, larger changes to the model (and to the resulting code) would be required later.

Attempting to Modify the Schema in ER

Of course, it is possible to recognize that the primary key of the resulting table in component two is the same data element used in components one and three. In this case, the tables resulting from components one and three would contain a foreign key to the (primary key of the) table resulting from component two. However, this would enforce a requirement that isn't true: namely, that an attribute must be registered by component two before it can be used by any of the other components. Most experienced ER modelers would, at this point, realize that Application Attribute is actually an entity (albeit with a name potentially confusing to those well-versed in ER terminology) and model it as such, migrating the primary key of the resulting new entity to the tables used by the components. However, keep in mind that these components aren't very interesting (visible to the user, complex, or otherwise attention-catching). I would wager that most people wouldn't have noticed the missing domain enforcement, unless, of course, they used ORM.
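To make the false requirement concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names are hypothetical (the client's actual schema is not shown), and only components two and three are modeled. The foreign key from component three's table to component two's table rejects a legacy mapping for an attribute that simply has no code set.

```python
import sqlite3

# Hypothetical tables sketching the ER "fix": component 3 references component 2.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE lookup_code_set_map (          -- component 2
    attribute_name TEXT PRIMARY KEY,
    code_set_id    INTEGER NOT NULL
);
CREATE TABLE legacy_field_map (             -- component 3
    attribute_name    TEXT NOT NULL REFERENCES lookup_code_set_map(attribute_name),
    legacy_field_name TEXT NOT NULL,
    PRIMARY KEY (attribute_name, legacy_field_name)
);
""")
conn.execute("INSERT INTO lookup_code_set_map VALUES ('gender_code', 1)")

def try_map_legacy(attribute, field):
    """Return True if the legacy mapping row was accepted, False on a constraint error."""
    try:
        conn.execute("INSERT INTO legacy_field_map VALUES (?, ?)", (attribute, field))
        return True
    except sqlite3.IntegrityError:
        return False

# "hr.gender" is a hypothetical legacy field name used only for illustration.
ok_registered = try_map_legacy("gender_code", "hr.gender")  # has a code set: accepted
ok_unregistered = try_map_legacy("emp_name", "emp.name")    # no code set: rejected
print(ok_registered, ok_unregistered)  # True False
```

The rejected row is perfectly valid in the business domain; the constraint comes purely from where the key happened to be declared.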

The Solution

Using the Conceptual Schema Design Procedure (CSDP), one merely identifies the facts of interest to the components.

For readers not familiar with the CSDP, the steps are [Halpin]:

  1. Transform familiar information examples into elementary facts, and apply quality checks.

  2. Draw the fact types, and apply a population check.

  3. Check for entity types that should be combined, and note any arithmetic derivations.

  4. Add uniqueness constraints, and check arity of fact types.

  5. Add mandatory role constraints, and check for logical derivations.

  6. Add value, set comparison, and subtyping constraints.

  7. Add other constraints and perform final checks.

For step one, we (with the help of one or more domain experts) verbalize the facts of the three components, complete with information examples (which will be used again in step two). Some of these fact verbalizations are:

  1. The Class Member with the name "gender_code" has the Alias Name "sex"

  2. The Class Member with the name "gender_code" has the Alias Name "sex_code"

  3. The Class Member with the name "born_date" has the Alias Name "dob"

  4. The Class Member with the name "emp_name" has the Alias Name "person_name"

  5. The Class Member with the name "cust_name" has the Alias Name "person_name"

  6. The Attribute with the name "gender_code" maps to the Lookup Code Set identified by "1"

  7. The Attribute with the name "length_unit" maps to the Lookup Code Set identified by "2"

  8. The Attribute with the name "width_unit" maps to the Lookup Code Set identified by "2"

  9. The Attribute with the name "height_unit" maps to the Lookup Code Set identified by "2"

  10. The System Element with the name "emp_name" maps to the Legacy Field with the name "emp.name"

  11. The System Element with the name "emp_name" maps to the Legacy Field with the name "hr.hirename"

  12. The System Element with the name "factory_code" maps to the Legacy Field with the name "log.sitecode"

  13. The System Element with the name "warehs_code" maps to the Legacy Field with the name "log.sitecode"

Now that we have fact instances, it's easy to identify the fact types. They are:

  1. Class Member (name) has Alias Name ()

  2. Attribute (name) maps to Lookup Code Set (id)

  3. System Element (name) maps to Legacy Field Name ()
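One way to see the fact types and their sample populations side by side is to write them down as plain data. The sketch below is an informal illustration, not part of the CSDP itself: each fact is a (subject, object) pair copied from the fact instances above, and the hypothetical `verbalize` helper reads a population back as sentences, which is a rough stand-in for checking that the populated fact types still say what the domain expert said.

```python
# Sample populations for the three fact types, copied from the fact instances above.
alias_facts = [  # Class Member (name) has Alias Name ()
    ("gender_code", "sex"),
    ("gender_code", "sex_code"),
    ("born_date", "dob"),
    ("emp_name", "person_name"),
    ("cust_name", "person_name"),
]
lookup_facts = [  # Attribute (name) maps to Lookup Code Set (id)
    ("gender_code", "1"),
    ("length_unit", "2"),
    ("width_unit", "2"),
    ("height_unit", "2"),
]
legacy_facts = [  # System Element (name) maps to Legacy Field Name ()
    ("emp_name", "emp.name"),
    ("emp_name", "hr.hirename"),
    ("factory_code", "log.sitecode"),
    ("warehs_code", "log.sitecode"),
]

def verbalize(template, facts):
    """Hypothetical helper: turn each (subject, object) fact back into a sentence."""
    return [template.format(subject, obj) for subject, obj in facts]

for sentence in verbalize('The Class Member with the name "{}" has the Alias Name "{}"', alias_facts):
    print(sentence)
```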

For step two, we transform the fact types into their equivalent graphical notation and populate them with the sample fact instances, as shown in the following figure:

Figure 4: The ORM schema at CSDP step two

It is in step three (Check for entity types that should be combined, and note any arithmetic derivations) that we realize the missing object type. Note that the object types "Class Member (name)", "Attribute (name)", and "System Element (name)" have the same reference mode and are, upon further inspection of the provided example data, talking about the same thing: attributes known to the system. They can be combined into one object type (Application Attribute (name)).
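The "same reference mode, same kind of values" observation can be checked mechanically against the sample data. The sketch below (plain Python, names taken from the fact instances above) lists which candidate object types share name values. Overlap is only evidence, not proof: in this sample, Attribute and System Element share no values directly and are linked only through Class Member, so the decision to combine them still rests with the modeler and the domain experts.

```python
from itertools import combinations

# Reference-mode values (names) populating each candidate object type,
# taken from the subject role of the sample facts.
populations = {
    "Class Member": {"gender_code", "born_date", "emp_name", "cust_name"},
    "Attribute": {"gender_code", "length_unit", "width_unit", "height_unit"},
    "System Element": {"emp_name", "factory_code", "warehs_code"},
}

def shared_values(pops):
    """Map each pair of object types that share values to the names they have in common."""
    return {
        (a, b): pops[a] & pops[b]
        for a, b in combinations(sorted(pops), 2)
        if pops[a] & pops[b]
    }

overlaps = shared_values(populations)
# If the types are combined, Application Attribute (name) is populated by the union.
combined = set().union(*populations.values())
print(overlaps)
print(len(combined))  # 9
```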

Because we combined the three entity types into one, we had to adjust the facts slightly for component 2. Namely, we introduced another entity type, Lookup Code Set Registration, whose reference mode is explicit (each registration is identified by an Application Attribute (name)). We then adjusted the fact involving Lookup Code Set (id) accordingly.

Note that correctly applying the rigorous process used to form an ORM schema ensures that the mistake we made earlier (using ER) will not occur. Further note that there are no arithmetic derivations to account for in this schema. The resulting ORM conceptual schema now looks like this:

Figure 5: The ORM schema at CSDP step three

In step four, we can apply the uniqueness constraints by simply looking at the example data and noting the unique columns. Assuming the sample data is sufficient (and in this case, it is), applying the uniqueness constraints is easy. Namely, the combination of Application Attribute and Alias Name is unique, as is the combination of Application Attribute and Legacy Field Name. Since an Application Attribute uniquely identifies an instance of Lookup Code Set Registration, there is a one-to-one uniqueness between the two object types. Likewise, a given Lookup Code Set Registration can have only one Lookup Code Set. Further, when a uniqueness constraint is the primary identifier of an entity type, we annotate that constraint with a "P". At the end of step four, one should also look at the arity of the fact types and how they correspond to the uniqueness constraints. Since our ORM schema is simple (no arities higher than 2), I'll forgo this check and any discussion of it in this article. The reader is encouraged to read more about this subject in [Halpin].
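The "look for unique columns" test can be sketched as a small function over the step-two populations. Keep in mind that a sample population can only refute a uniqueness constraint, never prove one, which is why the text hinges on the sample data being sufficient. The lookup facts here are in their step-two form (attribute mapped directly to a code set), before the Lookup Code Set Registration entity type was introduced.

```python
def is_unique(facts, roles):
    """True if no two facts agree on every position listed in `roles`.

    A sample population can refute a uniqueness constraint but cannot prove it.
    """
    seen = set()
    for fact in facts:
        key = tuple(fact[i] for i in roles)
        if key in seen:
            return False
        seen.add(key)
    return True

alias_facts = [("gender_code", "sex"), ("gender_code", "sex_code"), ("born_date", "dob"),
               ("emp_name", "person_name"), ("cust_name", "person_name")]
lookup_facts = [("gender_code", "1"), ("length_unit", "2"),
                ("width_unit", "2"), ("height_unit", "2")]
legacy_facts = [("emp_name", "emp.name"), ("emp_name", "hr.hirename"),
                ("factory_code", "log.sitecode"), ("warehs_code", "log.sitecode")]

for name, facts in [("alias", alias_facts), ("lookup", lookup_facts), ("legacy", legacy_facts)]:
    print(name, {roles: is_unique(facts, roles) for roles in [(0,), (1,), (0, 1)]})
```

For the alias and legacy facts, only the combination of both roles survives the check (a many-to-many fact type), while each attribute appears at most once in the lookup facts, matching the one-to-one constraint described above.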

Figure 6: The ORM schema at CSDP step four

In step five, we add mandatory constraints and check for any logical derivations. Only two roles are truly mandatory, and both are played by the "Lookup Code Set Registration" entity type (they are required because this object's reference mode is explicit). Again, since our ORM schema is simple, I will forgo any discussion of checking for logical derivations (there are none).

We apply the mandatory constraints in the following figure:

Figure 7: The ORM schema at CSDP step five

In step six, we look for value, subset, equality, and subtyping constraints. The only one we need to identify in this case is a value constraint. Take a second look at the Application Attribute entity type in Figure 7 and note that it plays no mandatory roles. This suggests we want to make this object type independent, which means we wish to maintain a separate list of values for this object type regardless of which roles it happens to be playing at any given moment in the system. In essence, it is here that we make the crucial distinction that the system has to know about its attributes regardless of which components (if any) happen to be using a given attribute.
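The observation that Application Attribute plays no mandatory roles can be cross-checked against the sample data: a role can only be mandatory if every Application Attribute plays it. The sketch below computes, for each role, the attributes that do not play it. The role readings are hypothetical labels for the roles in Figure 7, and, as with uniqueness, a population can refute a mandatory constraint but never prove one.

```python
# Application Attribute population (the union found in step three).
attributes = {"gender_code", "born_date", "emp_name", "cust_name", "length_unit",
              "width_unit", "height_unit", "factory_code", "warehs_code"}

# Attributes playing each role in the sample data (role readings are hypothetical labels).
role_players = {
    "has Alias Name": {"gender_code", "born_date", "emp_name", "cust_name"},
    "identifies Lookup Code Set Registration": {"gender_code", "length_unit",
                                                "width_unit", "height_unit"},
    "maps to Legacy Field Name": {"emp_name", "factory_code", "warehs_code"},
}

def non_players(population, players):
    """For each role, the instances that do not play it (counterexamples to 'mandatory')."""
    return {role: population - who for role, who in players.items()}

gaps = non_players(attributes, role_players)
can_be_mandatory = [role for role, missing in gaps.items() if not missing]
print(can_be_mandatory)  # []
```

Every role has counterexamples, so none can be mandatory; yet the attributes still need to exist as a shared list, which is exactly what marking the object type independent captures.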

Note that failing to make this object type independent would make the resulting logical schema look like the combination of Figures 1 through 3 (with no domain enforcement between the tables).

Further note that if the object type had played any mandatory functional roles (that is, roles with a uniqueness constraint spanning only the Application Attribute role), specifying it as independent would have been incorrect: a separate table for the Application Attribute and its functional roles would be mapped automatically.
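Why a functional role produces a table "automatically" can be sketched with a heavily simplified version of the table-grouping idea used when mapping ORM to relational schemas: fact types whose uniqueness constraint spans a single role are grouped into a table keyed by that role's object type, fact types with a compound uniqueness constraint get their own table, and independent object types always get a table. This is an illustrative assumption, not the full mapping procedure; it ignores reference schemes (the registration's identification by Application Attribute is not modeled), one-to-one subtleties, mandatory roles, and subtypes. The fact readings and the "Data Type" fact are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FactType:
    reading: str
    roles: tuple  # object type playing each role, in order
    uc: tuple     # positions spanned by the fact type's uniqueness constraint

def group_tables(fact_types, independent=()):
    """Simplified grouping: functional fact types go in their key object type's table."""
    tables = {ot: [ot] for ot in independent}  # independent object types always get a table
    for ft in fact_types:
        if len(ft.uc) == 1:  # functional: one key role determines the others
            key = ft.roles[ft.uc[0]]
            columns = tables.setdefault(key, [key])
            columns.extend(r for i, r in enumerate(ft.roles) if i != ft.uc[0])
        else:  # compound uniqueness: the fact type maps to its own table
            tables[ft.reading] = list(ft.roles)
    return tables

schema = [
    FactType("Application Attribute has Alias Name",
             ("Application Attribute", "Alias Name"), (0, 1)),
    FactType("Application Attribute maps to Legacy Field Name",
             ("Application Attribute", "Legacy Field Name"), (0, 1)),
    FactType("Lookup Code Set Registration uses Lookup Code Set",
             ("Lookup Code Set Registration", "Lookup Code Set"), (0,)),
]
print(sorted(group_tables(schema)))  # no Application Attribute table
print(sorted(group_tables(schema, independent=["Application Attribute"])))

# A hypothetical functional role would create the table without the independence setting.
with_type = schema + [FactType("Application Attribute has Data Type",
                               ("Application Attribute", "Data Type"), (0,))]
print(group_tables(with_type)["Application Attribute"])
```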

This subtle change from Figure 7 is shown below in Figure 8.

In step seven, we check for constraints not noted in the previous steps such as frequency constraints, ring constraints and any other missing rules that we need to express as text on the model. There are no constraints of this nature that we need to account for in this schema.

The following figures illustrate the ORM diagram and resulting logical model encompassed by the three components:

Figure 8: The final ORM conceptual schema (also at CSDP steps six and seven)

Figure 9: The resulting logical model derived from the (final) ORM conceptual schema
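Since the figure itself is not reproduced here, the following sqlite3 sketch shows one plausible shape for a logical model of this kind; all table and column names are hypothetical and may differ from Figure 9. Every component table references a shared `application_attribute` table, so an attribute can be mapped to the legacy system without having a code set, while a misspelled attribute name (the hypothetical `emp_nam`) is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE application_attribute (           -- the independent object type
    attribute_name TEXT PRIMARY KEY
);
CREATE TABLE attribute_alias (                 -- component 1
    attribute_name TEXT NOT NULL REFERENCES application_attribute(attribute_name),
    alias_name     TEXT NOT NULL,
    PRIMARY KEY (attribute_name, alias_name)
);
CREATE TABLE lookup_code_set_registration (    -- component 2
    attribute_name TEXT PRIMARY KEY REFERENCES application_attribute(attribute_name),
    code_set_id    INTEGER NOT NULL
);
CREATE TABLE legacy_field_map (                -- component 3
    attribute_name    TEXT NOT NULL REFERENCES application_attribute(attribute_name),
    legacy_field_name TEXT NOT NULL,
    PRIMARY KEY (attribute_name, legacy_field_name)
);
""")
conn.executemany("INSERT INTO application_attribute VALUES (?)",
                 [("gender_code",), ("emp_name",)])
conn.execute("INSERT INTO lookup_code_set_registration VALUES ('gender_code', 1)")

def try_insert(sql, params):
    """Return True if the row was accepted, False on a constraint violation."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

legacy_sql = "INSERT INTO legacy_field_map VALUES (?, ?)"
no_code_set_ok = try_insert(legacy_sql, ("emp_name", "emp.name"))   # no code set needed
misspelled_ok = try_insert(legacy_sql, ("emp_nam", "hr.hirename"))  # unknown attribute
print(no_code_set_ok, misspelled_ok)  # True False
```

Compared with the earlier ER sketch, the foreign keys now point at the shared attribute list rather than at another component's table, which gives domain enforcement without the false ordering requirement.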

The Benefits of Independence

Now the Application Attribute is correctly modeled as an entity type, ensuring domain consistency. Because the Application Attribute (name) object type is independent, it is mapped to its own table, which allows other information about the Application Attribute to be tracked later, most likely with only minimal changes to the model. Without the independence setting, the schema in Figure 4 could be modeled differently. However, in that case the Application Attribute (name) would likely end up as the primary key of one of the tables (here, component two's), which means an Application Attribute would have to play the roles of that component before it could play the roles of the other components. That is not how the application works: an attribute such as a name or description may have no list of allowable values yet still be mapped to the legacy system. Therefore, the independence setting is the best approach.

In practice, independent object types of this nature are rare. Usually, entity types have at least one mandatory role (other than those in the reference scheme). In my experience, independent object types usually result from nesting a predicate as an object type and then attaching some optional role to the nested object type, such as temporal information (for example, a termination date that the user fills in later).

Conclusion

This case study illustrates the importance of delaying the decision of whether a model element is an attribute or an entity. Object-Role Modeling makes no assumptions about an object's importance until the mapping from the conceptual schema to its logical derivative is performed. In doing so, it alleviates costly data integrity and (later) schema change problems. While experienced ER modelers may scoff and say they would have correctly modeled this "attribute" as an entity in the first place, it still stands to reason that it is better to use a method that (provided you correctly follow the steps of the CSDP) guarantees the correct schema every time.

References

Halpin, Terry, Conceptual Schema and Relational Database Design, Second Edition, WytLytPub, 1999.

Scot A. Becker is a software consultant and the founder of Orthogonal Software Corporation. He is also a certified ORM consultant and trainer, a certified Visio trainer, and former Editor of the Journal of Conceptual Modeling.  

Contact Information:

Scot A. Becker
Orthogonal Software Corporation
scot@orthogonalsoftware.com

www.orthogonalsoftware.com

© Copyright, 1998-2004 InConcept (Information Conceptual Modeling, Inc.) All Rights Reserved.
ISSN: 1533-3825