April 2001        Issue: 19

Journal of Conceptual Modeling
www.inconcept.com/jcm

Conceptual Data Modeling
 in an Object-Oriented Process (Part Two)

by
 
Scot A. Becker

Abstract

Object-Oriented (OO) software development processes (iterative and component-driven) are being used more frequently in the industry. Each component iteration generally goes through the following phases: Analysis, Design, Construction, Verification, and Deployment. In an OO process, business rules tend to get captured during the analysis and design phases.

Ideally, the components are specified such that they are relatively independent of each other. In actuality, rigorous analysis of data and rule dependencies between components may reveal that some components are not truly independent of each other.

During the analysis phase, use cases are used to create an analysis class diagram that will then specify a component. Traditional use case analysis is often more narrative and less specific, which results in use cases that are inherently weak at capturing business rules in a consistent manner. On the other hand, if a very robust analysis is done, use cases with an excruciating amount of detail can be produced. While many rules can be captured and documented in this manner, from a data perspective, rules are not easily verified and many rules can be easily missed. Such OO analysis approaches also tend to be heavily design-centric, particularly at the application architecture level. Further, such use cases are often heavily biased towards UI and other process/implementation (and often, design) concerns. While this is important, necessary, and better than past software development processes, it is at a level of abstraction that is often lower than the conceptual level.

Typically, the analysis artifacts are then used as inputs into the design phase that will generate, among other artifacts, design class diagrams. However, OO processes are neither formal nor rigorous; success is dependent on the skills and experience of the analysts and designers, as the resulting constraints are not easily verified. In addition, the OO class diagrams (from a data perspective) are inherently rooted at the logical level and not at the conceptual level.

In this second installment, we'll introduce how Object-Role Modeling (ORM) can not only overcome the above weaknesses in the OO approach, but can actually improve the quality of the resulting artifacts.

3. A Brief Overview of Object-Role Modeling (ORM)

Now that we have fully explored what is meant by an OO process, and have elaborated on what that process's benefits and problems are, it's time to discuss how to improve the process.

The technique I am proposing to improve the quality of the OO Process is that of Object-Role Modeling (ORM). ORM has been in use since the 1970s (about as long as the more common ER style of data modeling). It is not within the scope of this paper to fully explain the syntax of the ORM language. Rather, the author will attempt to explain the key concepts of the method in comparison to the OO process, and suggest how to incorporate ORM's features into that process. The reader is encouraged to explore the works in the reference section for more detailed information on ORM's syntax.

The main difference between ORM and ER is that ORM makes no distinction as to whether a model element is an entity or an attribute - they are just objects that play roles with other objects (the resulting relationships are known as "facts"). In this manner, constraints can be expressed freely and easily across the objects and the roles that they play. In addition, ORM is expressed in a completely natural language (for example, English sentences). Thus, ORM facts are readily extracted from and verified by the users. Further, ORM makes use of "Data Use Cases", which include real data in the model for easy verification of structures and constraints. In fact, with the aid of CASE tools, most of the constraints can be completely derived and verified from a significant set of sample data.
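
As a minimal sketch of how sample data can suggest constraints, the Python below searches a small fact population for the smallest combinations of roles whose values never repeat; these are candidate uniqueness constraints that a user would still have to confirm. The function name candidate_uniqueness and the sample rows are hypothetical illustrations and do not describe the behavior of any particular ORM CASE tool.

```python
from itertools import combinations


def candidate_uniqueness(rows, role_names):
    """Return the minimal role combinations with no repeated values in rows."""
    found = []  # index tuples of the candidates discovered so far
    for size in range(1, len(role_names) + 1):
        for combo in combinations(range(len(role_names)), size):
            # A superset of an existing candidate adds no information.
            if any(set(c) <= set(combo) for c in found):
                continue
            projected = [tuple(row[i] for i in combo) for row in rows]
            if len(set(projected)) == len(projected):
                found.append(combo)
    return [tuple(role_names[i] for i in combo) for combo in found]


roles = ("Person", "Department")

# Invented sample population for "Person works for Department".
small_sample = [("Ann", "Sales"), ("Bob", "Sales")]
print(candidate_uniqueness(small_sample, roles))

# One more row shows that a person may work for several departments.
larger_sample = small_sample + [("Ann", "R&D")]
print(candidate_uniqueness(larger_sample, roles))
```

With the two-row sample, the Person role alone looks unique; the third row rules that out and leaves only the constraint spanning both roles. This is why the sample data needs to be significant before derived constraints can be trusted.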

When an ER style abstraction of the model is helpful (and at various stages of the process, the more succinct ER notation has many benefits), an ER model may be completely derived from the ORM model. In this manner, the derived ER model may be easily compared to persistent class diagrams, but this topic will be discussed in greater detail later.

ORM is defined by discovering easily verbalized facts (for example, as recited by the user) about the Universe of Discourse (UoD, also known as the system domain, area of interest, etc.). For example, consider the fact: "the Movie named 'Pulp Fiction' received the Rating described by 'R - Restricted' in the Country identified by the name 'United States'". The capitalized words are the object types, the qualifiers (such as "named") are the reference modes of the object types (i.e., how you distinguish one movie from another), and the remainder of the sentence is the predicate, which indicates the roles the object types play. Thus, the above fact instance can be generalized into the fact type: "Movie(name) received Rating(description) in Country(name)". The predicate of this sentence is then "… received … in …" where the ellipses (…) indicate the "object holes" in a "mixfix" notation. In the mixfix notation, the sentence may be rearranged such that the objects may appear in any order. Such rearranged sentences are often referred to as "inverse" or "alternate" readings. Note that because an object type may appear at any position within the corresponding predicate, this mixfix notation works for any language (which may place verbs at the end of the sentence, for example).
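
The fact type and its mixfix readings can be sketched as a small data structure. This is only an illustration in Python: the class names ObjectType and FactType, and the use of "{0}", "{1}", "{2}" as the object holes, are assumptions made for this sketch and are not ORM syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectType:
    name: str            # e.g. "Movie"
    reference_mode: str  # e.g. "name"

    def __str__(self):
        return f"{self.name}({self.reference_mode})"


@dataclass(frozen=True)
class FactType:
    object_types: tuple  # one ObjectType per role, in a fixed order
    readings: tuple      # mixfix predicates; "{0}", "{1}", ... are the object holes

    def verbalize(self, reading_index=0):
        """Fill the object holes of the chosen reading with the object types."""
        return self.readings[reading_index].format(*self.object_types)


movie_rating = FactType(
    object_types=(
        ObjectType("Movie", "name"),
        ObjectType("Rating", "description"),
        ObjectType("Country", "name"),
    ),
    readings=(
        "{0} received {1} in {2}",   # primary reading
        "in {2}, {0} received {1}",  # an alternate reading with the holes reordered
    ),
)

print(movie_rating.verbalize())
print(movie_rating.verbalize(1))
```

Because each hole is addressed by position rather than by where it falls in the sentence, the same fact type can carry readings with any word order.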

The number of roles in a fact type is known as the "arity" of the fact (in the case of the above fact type, the arity is three). Note that in ORM, facts may be of any arity (not so in many styles of ER which mandate binary relationships) as long as the fact is "elementary" (also said to be "atomic", which means it cannot be broken down into smaller facts without some information loss). The above fact is elementary and we cannot express it in any "smaller" way without information loss. For example, "Movie(name) received Rating(description)" would lose the information that a movie is released in many countries and that each country may have a different rating system. An example of a fact that is not elementary would be "the Movie named 'Pulp Fiction' starred the Person named 'John Travolta' and was directed by the Person named 'Quentin Tarantino'". In this case, the use of the word "and" is a giveaway that the fact may be divided into the fact types: "Movie(name) starred Person(name)" and "Movie(name) was directed by Person(name)" without any loss of information. Also note that a fact may have an arity of one (a "unary" fact) such as "the Person named 'Pat Hallock' is eccentric". Such unary facts are often implemented as Boolean logical fields (true or false: the person either plays the role or they don't).
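
One way to see why the ternary fact type is elementary is a projection-join check on a sample population: split the rows into two binary projections, join them back on the shared role, and see whether rows appear that were never stated. The sketch below assumes the split is on the first role (Movie); the helper names and the sample rows are illustrative only and are not offered as authoritative ratings data.

```python
def project(rows, columns):
    """Keep only the given column positions of each row."""
    return {tuple(row[c] for c in columns) for row in rows}


def rejoin_on_first_role(population):
    """Split a ternary population on its first role and join the parts back."""
    left = project(population, (0, 1))   # e.g. Movie received Rating
    right = project(population, (0, 2))  # e.g. Movie was rated in Country
    return {(a, b, c) for (a, b) in left for (a2, c) in right if a == a2}


# Illustrative sample population for "Movie received Rating in Country".
population = {
    ("Pulp Fiction", "R", "United States"),
    ("Pulp Fiction", "18", "United Kingdom"),
}

rejoined = rejoin_on_first_role(population)
spurious = rejoined - population  # rows invented by the split
print(sorted(spurious))
```

The rejoined population contains rows such as "Pulp Fiction received 18 in United States" that were never asserted, so this particular split loses information and the ternary fact cannot be divided this way.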

Due to the nature of attribute-free models, and since facts may be of any arity, it is easy to express virtually any style of static constraint upon the model. Such constraints are of the following types:

These constraint types may also be combined. For example, an exclusion constraint may be combined with a mandatory disjunction to specify that an object must either play role x or role y but not both. Further, some more complicated constraints may be specified. For example, consider the fact types: "Person(name) works for Department(name)", "Department(name) manages Project(TLA)", and "Person(name) works on Project(TLA)". The constraint that a person may only work on projects that are managed by his/her department is a subset constraint in ORM. This constraint cannot be expressed in, for example, any variation of ER modeling or the UML class diagram syntax without introducing intermediate (and unnatural) structures.
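
The subset constraint in this example can be checked against sample populations with a few lines of Python. This is a sketch under stated assumptions: the three fact populations are invented, and the function name subset_violations is hypothetical.

```python
def subset_violations(works_for, manages, works_on):
    """Return (person, project) pairs that break the subset constraint.

    A person may only work on projects managed by a department
    that the person works for.
    """
    allowed = {
        (person, project)
        for (person, dept) in works_for
        for (managing_dept, project) in manages
        if dept == managing_dept
    }
    return works_on - allowed


# Invented sample populations for the three fact types.
works_for = {("Ann", "Sales"), ("Bob", "R&D")}
manages = {("Sales", "CRM"), ("R&D", "ERP")}
works_on = {("Ann", "CRM"), ("Bob", "CRM")}

print(subset_violations(works_for, manages, works_on))
```

Here Bob works on a project managed by Sales while working for R&D, so his row is reported as a violation. The population of "works on" must be a subset of the pairs obtained by joining the other two fact types on Department.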

As a side note, ORM also has specifications for subtypes (inheritance) and various ring constraints (applied when an object plays a role with itself, for example, "Person(name) reports to Person(name)").
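
Two common ring constraints, irreflexive (nobody reports to themselves) and acyclic (no reporting loops), can be checked on a sample population of "Person(name) reports to Person(name)". The sketch below is illustrative Python with invented data; the function names are hypothetical, and which ring constraints actually apply is a modeling decision for the UoD.

```python
def is_irreflexive(pairs):
    """True if no object plays both roles of the same fact instance."""
    return all(a != b for a, b in pairs)


def is_acyclic(pairs):
    """True if following the relationship never leads back to the start."""
    graph = {}
    for a, b in pairs:
        graph.setdefault(a, set()).add(b)

    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return True
        if node in visiting:  # reached a node already on the current path
            return False
        visiting.add(node)
        ok = all(visit(nxt) for nxt in graph.get(node, ()))
        visiting.discard(node)
        if ok:
            done.add(node)
        return ok

    return all(visit(node) for node in list(graph))


# Invented sample populations.
chain = {("Ann", "Bob"), ("Bob", "Cy")}
loop = {("Ann", "Bob"), ("Bob", "Ann")}

print(is_acyclic(chain), is_acyclic(loop))
```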

In addition to all of the benefits of its syntax, ORM is applied through a rigorous and complete process known as the Conceptual Schema Design Procedure [Halpin].

3.1. The Conceptual Schema Design Procedure (CSDP)

The Conceptual Schema Design Procedure (CSDP) is a series of steps that encompass verbalization, application of constraints, model validation, specialization and generalization, and various model transformations. The steps of the CSDP are:

  1. Transform familiar information examples into elementary facts, and apply quality checks. 

  2. Draw the fact types, and apply a population check. 

  3. Check for entity types that should be combined, and note any arithmetic derivations. 

  4. Add uniqueness constraints, and check arity of fact types.

  5. Add mandatory role constraints, and check for logical derivations. 

  6. Add value, set comparison, and subtyping constraints. 

  7. Add other constraints and perform final checks.

Correctly applying the steps of the CSDP (the details of which are outside of the scope of this article) ensures that the resulting model is correct, validated, consistent, verified by the users, populated with sample data that conforms to the constraints, and, as an added benefit, fully normalized (the elementary nature of the facts, by definition, finds all functional dependencies and can thus derive a fully normalized logical model).
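
The checks behind steps 5 and 6 can be sketched as population checks on sample data: a mandatory role constraint means every known instance of an object type appears in the role, and a value constraint limits the values that may appear. The Python below is a minimal illustration with invented data; the function names are hypothetical and are not part of the CSDP itself.

```python
def mandatory_role_violations(object_population, fact_rows, role_index):
    """Objects that exist in the sample but never play the given role."""
    players = {row[role_index] for row in fact_rows}
    return set(object_population) - players


def value_constraint_violations(fact_rows, role_index, allowed_values):
    """Values found in the given role that fall outside the allowed set."""
    return {row[role_index] for row in fact_rows} - set(allowed_values)


# Invented sample data for "Person works for Department".
people = {"Ann", "Bob", "Cy"}
works_for = {("Ann", "Sales"), ("Bob", "Shipping")}

# Assumed rule: each Person works for some Department.
print(mandatory_role_violations(people, works_for, 0))

# Assumed rule: Department names are limited to a known list.
print(value_constraint_violations(works_for, 1, {"Sales", "R&D"}))
```

A non-empty result means either the sample data is wrong or the proposed constraint is; in both cases the question goes back to the users.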

3.2. ORM Compared to Class Diagrams

ORM's attribute-free nature overcomes many of the problems previously identified with class diagrams. Thus, the main differences between ORM and class diagrams are:

3.3. ORM Compared to Use Cases

Use cases are best at defining dynamic constraints and other process-specific concerns, while ORM adequately defines the data structures and the static constraints that apply to the data population. In practice, one finds that in some areas use cases and ORM may capture the same rules, but this overlap will be addressed in the section on incorporating ORM into the OO Process. Both methods are similar in that they rely on the use of a natural language that is easily expressed and verified by the users.

3.4. Benefits and Weaknesses of ORM

This brief overview of ORM is concluded by addressing ORM's strengths and weaknesses (as compared to the OO Process). The strengths of ORM (which happen to complement OO's weaknesses) are: ORM uses a formal, rigorous process; it is easy to verify via data use cases and a natural language; ORM contains additional "attribute level" constraints; ORM can easily be expressed in many model transformations; ORM is a natural way to express facts and their constraints; ORM can be used for both a "problem centric" analysis and a "solution centric" design; and ORM contains the significant amount of detail that is needed by design but is not needed by analysis (data types, allowable values, other data-centric concerns, etc.).

The weakness of ORM (from an OO perspective) is that it inherently does not consider process when defining the data structures. It is the opinion of the author that defining the process that acts upon the data is important. However, unless the process defines the static rules that apply to the set of data instances, process has no bearing on what the correct persistent data structure should be. (Other implementation constraints, such as performance, may indeed have an impact on the persistent data structure, but that is a different issue than process vs. data.)

To reiterate: process is important, and OO artifacts do a fine job at capturing those processes. The author agrees that they should be used to specify dynamic behavior, architecture considerations, and other implementation concerns. As we will see in the next section, ORM can be used in tandem with OO techniques without sacrificing any of OO's strengths; rather, it complements OO's weaknesses, resulting in a better system process overall.

4. Integrating ORM into OO Processes

This section will illustrate how to insert the ORM technique into the OO Process to maximize the quality of the artifacts (and indeed the system as a whole). This section will conclude with some evidence of this combination of techniques and the benefits reaped from using this technique at a major manufacturing company.

4.1. ORM's Role in Determining Components

Recall that components are discovered in the very early stages of the project by the creation of rough-cut models used to isolate attributes and processes into relatively independent groupings. Further recall that the use of generalized use cases and simple class diagrams, as typically applied during component analysis, does not always achieve the goal of fully encapsulated components.

Using ORM at the component definition stage along with the rough-cut class diagrams and use cases will improve the accuracy of the initial component divisions. ORM can be used to quickly model the important object types and the roles that they play. Often, and in a similar manner, the rough-cut class diagram will attempt to accomplish the same goal. When both the rough-cut ORM model and class diagrams are completed, one can simply derive a logical schema from the ORM schema and compare it to the persistent class diagrams. If both models agree, then components can be easily divided out (after considering behavior described by the use cases, of course). If they do not agree, one model may have discovered a dependency that the other did not (and the author would wager that it was the use of ORM that discovered the dependency).
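
The comparison described above can be sketched as a simple structural diff. The Python below assumes, purely for illustration, that both the ORM-derived logical schema and the persistent class diagram have been reduced to a mapping from table (or class) name to a set of column (or attribute) names; the function name compare_schemas and all schema contents are hypothetical.

```python
def compare_schemas(orm_derived, class_model):
    """Report, per table, what each model has that the other lacks."""
    report = {}
    for table in sorted(set(orm_derived) | set(class_model)):
        orm_columns = orm_derived.get(table, set())
        class_columns = class_model.get(table, set())
        if orm_columns != class_columns:
            report[table] = {
                "only_in_orm": sorted(orm_columns - class_columns),
                "only_in_classes": sorted(class_columns - orm_columns),
            }
    return report


# Hypothetical rough-cut models of the same area.
orm_derived = {
    "Person": {"name", "department"},
    "Project": {"tla", "managing_department"},
}
class_model = {
    "Person": {"name", "department"},
    "Project": {"tla"},
}

print(compare_schemas(orm_derived, class_model))
```

An empty report suggests the two models agree; any entry, such as the managing department missing from the class model here, points at a dependency that one model found and the other did not.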

4.2. ORM in Tandem with the OO Approach

Now that the components are identified and divided out, the project team is ready to begin to run the components through the iterations. As such, ORM has a direct impact on three of the phases of each iteration: Analysis, Design, and Construction. It is also useful to note that since each phase of the iteration feeds the next, ORM has an indirect impact on the subsequent phases (rules and constraints documented in the ORM model, for example, should become test requirements during the verification phase).

4.2.1. ORM in the Analysis Phase

ORM is best applied during the analysis phase in tandem with the use case analysis. The ORM analyst should team up with the use case analyst while conducting requirements gathering sessions with the user community and subject matter experts. The ORM analyst would then capture the data constraints that ORM expresses, and the use case analyst would document the dynamic constraints and general processes.

Once both the use cases and ORM schema have been validated and verified by the users and subject matter experts, the rules, constraints, and other requirements gathered may be grouped together (for example, in a requirements gathering tool such as Rational Software's Requisite Pro) and handed off to the design team. It may also be useful for the ORM analyst to derive a logical schema that can be used as an initial persistent class model by the design team.

4.2.2. ORM in the Design Phase

During the design phase, it is best for the ORM analyst to leave the design team alone while they work out the class diagrams and implementation concerns and generate the class and interaction diagrams such that they meet the requirements specified by the analysis artifacts. Once the design team has completed their models, it is useful to compare their results with the initial use cases and ORM schema to ensure all constraints and requirements have been met.

During this comparison, one often discovers that the persistent class model differs from the logical model derived from the ORM. This may occur for several reasons:

  1. The designer, because of other constraints, generalized the functionality but still met the requirements (for example, created a generalized "data driven" structure whose persistent model looks quite different from the specialized ORM derived logical model). In this case the ORM model may need to be adjusted to correspond with the design model. Remember that there are many ways to model a situation, and transformations can and do need to be performed. Further remember that the initial ORM model illustrated an analysis view of the world (what the problem is) and the design view of the world (how to solve the problem) may and usually will look different. Those differences must then be agreed upon and implemented. In this manner, it is likely that two ORM schemas will exist: one for the analysis view, and one for the design (and implementation) view. This difference is nothing new to data analysts who often construct "conceptual" models, "logical" models, and "physical" models to account for needed changes between the perspectives.

  2. The designer omitted some constraints or requirements. In this case, the design needs to be adjusted such that it entails the missing requirements. 

  3. The designer, during the process of specifying the design, discovered requirements that the analysts missed (or the requirements were added on after the analysis phase was completed). When this happens, the added functionality is either stripped from the design (and possibly noted for a later iteration of the component) or added to (synchronized with) the analysis artifacts.

Note that in this manner, there exists a "checks and balances" system between the designers and the analysts. The author has found such teamwork to be very healthy for project success.

4.2.3. ORM in the Construction Phase

ORM's role in the construction phase, particularly with the aid of CASE tools, is fairly trivial. The design-oriented ORM schema can be used to derive the physical structure of the database schema that the application will be constructed against. The analysis ORM schema may be a useful input to the construction team members if they desire any sort of knowledge as to why the design is the way it is (i.e., what prompted the design structures).
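
As a greatly simplified sketch of that derivation, the Python below turns one fact type into a table definition: each role becomes a column, and the roles spanned by the uniqueness constraint become the primary key. The function name, the column names, the VARCHAR(100) type, and the assumption that the movie rating fact type is unique over Movie and Country are all illustrative; a real mapping (such as the one a CASE tool performs) also groups fact types into shared tables and handles many more constraint types.

```python
def fact_type_to_ddl(table_name, roles, unique_roles, sql_type="VARCHAR(100)"):
    """Emit a CREATE TABLE statement for a single fact type.

    roles:        column names, one per role of the fact type
    unique_roles: the roles spanned by the uniqueness constraint
    """
    unknown = [role for role in unique_roles if role not in roles]
    if unknown:
        raise ValueError(f"unknown roles in uniqueness constraint: {unknown}")
    lines = [f"    {role} {sql_type} NOT NULL" for role in roles]
    lines.append(f"    PRIMARY KEY ({', '.join(unique_roles)})")
    return f"CREATE TABLE {table_name} (\n" + ",\n".join(lines) + "\n);"


# Assumed constraint: each Movie has at most one Rating per Country.
ddl = fact_type_to_ddl(
    "movie_rating",
    ["movie_name", "rating_description", "country_name"],
    ["movie_name", "country_name"],
)
print(ddl)
```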

Note that typically during the construction phase, the ORM analyst - usually being one of the few data folks on the team - is also tasked with physical database implementation concerns like abbreviations, naming conventions, and standardizing the resulting schema into meta-data repositories and the like.

4.3. Case Study Results

As mentioned before, this paper is the result of applying the techniques contained herein at a major Minnesota manufacturing company. When the author first joined the project team (which, incidentally, consisted of very skilled OO designers and analysts), the data modeling activity was centered on generating physical schema from design's persistent class diagrams; this was mostly successful, but with a couple of notable failures. Over time, the author worked more and more towards the analysis phase and introduced the techniques contained in this paper. As a result, the author is pleased to report the following results:

5. Conclusion

This paper detailed the use of an OO process in the practice of software development along with its inherent weaknesses. In addition, this paper briefly covered a rigorous and formal data modeling technique and how it can be incorporated into the OO process with improved results as illustrated by the previous case study.

Thus, it has been demonstrated that the use of ORM in an OO process greatly improves the quality of the resulting analysis and design artifacts, speeds up the overall software development process, and better supports quality assurance. In summary, it is important to reiterate that the use of ORM in an OO process does not detract from nor sacrifice any of the many benefits of using an OO process in the first place. ORM simply counters OO weaknesses while working in tandem to analyze, design, and construct the desired application. The result is a better system.

 

Scot A. Becker is a software consultant and the founder of Orthogonal Software Corporation. He is also a certified ORM consultant and trainer, a certified Visio trainer, and former Editor of the Journal of Conceptual Modeling.  

Contact Information:

Scot A. Becker
Orthogonal Software Corporation
scot@orthogonalsoftware.com

www.orthogonalsoftware.com

© Copyright, 1998-2004 InConcept (Information Conceptual Modeling, Inc.) All Rights Reserved.
ISSN: 1533-3825