February 2001        Issue: 18

Journal of Conceptual Modeling
www.inconcept.com/jcm

Conceptual Data Modeling in an
Object-Oriented Process (Part One)

by
Scot A. Becker

Abstract

Object-Oriented (OO) software development processes (iterative and component driven) are being used more frequently in the industry. Each component iteration generally goes through the following phases: Analysis, Design, Construction, Verification, and Deployment. In an OO process, business rules tend to get captured during the analysis and design phases.

Ideally, the components are specified such that they are relatively independent of each other. In actuality, rigorous analysis of data and rule dependencies between components may reveal that some components are not truly independent of each other.

During the analysis phase, use cases are used to create an analysis class diagram that will then specify a component. Traditional use case analysis is often more narrative and less specific, which results in use cases that are inherently weak at capturing business rules in a consistent manner. On the other hand, if a very robust analysis is done, use cases with an excruciating amount of detail can be produced. While many rules can be captured and documented in this manner, from a data perspective, rules are not easily verified and many rules can be easily missed. Such OO analysis approaches also tend to be heavily design-centric, particularly at the application architecture level. Further, such use cases are often heavily biased towards UI and other process/implementation (and often, design) concerns. While this is important, necessary, and better than past software development processes, it is at a level of abstraction that is often lower than the conceptual level.

Typically, the analysis artifacts are then used as inputs into the design phase that will generate, among other artifacts, design class diagrams. However, OO processes are neither formal nor rigorous; success is dependent on the skills and experience of the analysts and designers, as the resulting constraints are not easily verified. In addition, the OO class diagrams (from a data perspective) are inherently rooted at the logical level and not at the conceptual level.

In this first installment, this paper discusses OO processes in more detail.

1. Introduction

The trend towards using object oriented (OO) techniques to design and develop software is steadily increasing. This trend is often looked upon as the dawn of a new age where software will be easier to design, develop, maintain, and subsequently upgrade with new functionality. Others will contend that the OO way of looking at development is nothing new. This latter group of people will aver that the OO approach is simply the "waterfall" approach used in the previous decades with merely the addition of smaller waterfalls and a few "new" diagrams.

While this is a compelling -- and heated -- discussion, this paper will not attempt to resolve this debate. This paper will, however, try to address the same old issues that have always plagued software development: quality, accuracy, rigor, consistency, speed of software delivery, and user support.

While the OO approach, due to the nature of its many diagrams, tends to improve the resulting analysis and design artifacts as a whole, it is fundamentally lacking in a key area: data. The OO approach tends to be overly process centric. While this approach is results oriented, tends towards greater reuse of code, and yields an overall application consistency, it also neglects key issues centering on data integrity and quality. Thus, and it should come as no surprise, if the supporting data is bad, even the best user interface and middle tiers will not keep the software project alive.

This paper will reveal how to introduce a relatively old way of looking at data into this new way of looking at application analysis and design. As it turns out, this formal method of data modeling does not compete with nor detract from the benefits of an OO approach. Rather, this method actually improves the OO process. As supporting evidence, this paper was formulated after a year of applying these techniques at a major Minnesota manufacturing company with positive results.

2. A Brief Overview of Object-Oriented (OO) Processes

2.1. Components

The typical OO approach centers around two key concepts. The first concept is the notion of a component. A component is a portion of the application that is divided in such a way that each component is relatively independent of the next. Components bundle data and the processes acting upon that data in such a way that they yield greater reuse. The resulting application is then constructed in such a way that it merely calls the components when it needs the data and functionality which the components provide. Keeping like data and functionality centered upon the notion of a component tends towards greater reuse and smaller code that is easier to maintain. Further, a component can be improved upon (upgraded, or fixed) at any time with little interference to the surrounding application. As a final note, from a data perspective a component is usually divided amongst groups/clusters of related tables. From a process perspective, a component is usually centered on core workflows.
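As a minimal sketch of the idea (not taken from the project described in this paper), a component can be pictured as a class that owns its data and exposes only the processes that act on it. The name CustomerComponent and its methods are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CustomerComponent:
    """Hypothetical component: owns customer data and the processes acting on it."""

    # Data the component encapsulates (stands in for a cluster of related tables).
    _customers: dict[int, str] = field(default_factory=dict)

    def register(self, customer_id: int, name: str) -> None:
        """Process acting on the component's own data."""
        if customer_id in self._customers:
            raise ValueError(f"customer {customer_id} already registered")
        self._customers[customer_id] = name

    def name_of(self, customer_id: int) -> str:
        """Read access offered to the surrounding application."""
        return self._customers[customer_id]


# The application merely calls the component; it never touches _customers directly.
customers = CustomerComponent()
customers.register(1, "Acme Manufacturing")
print(customers.name_of(1))
```

Because the application depends only on register and name_of, the internals of the component could be upgraded or fixed without disturbing its callers.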

Components are typically determined by making rough-cut models of the system as a whole (the typical models used are covered in a later section). These models are not specific enough to begin design, but they identify key process flows and data clusters. In addition, these rough-cut models will aid in the identification of any architecture constraints that will need to be addressed as well as provide some idea of the scope of the project as a whole (which leads to better estimates of time, materials, etc.). It is in this manner that the resulting components are (presumably) correctly identified and divided such that they are independent. Further, in practice this is largely the case. However, the details of these models are not specific enough to ensure 100% encapsulation of the components. The technique addressed in the second part of this paper will ensure the components are (virtually -- not accounting for small amounts of error that will exist no matter how perfect your approach is) 100% encapsulated.

2.2. Iterations

The second key concept of the OO approach is the notion of iterative development. Each version of each component will step through the phases of the iteration (details of the phases are covered in the following sections). In fact, it often happens that some part of the application is actually deployed while the remaining functionality of the application is still in the various phases of development. This produces an exciting and unusual phenomenon: the users get the (part of the) application faster.

This component driven approach is different from the typical "waterfall" approach in which the entire application will move through each phase before proceeding to the next. In this manner, the (version of the) application is not delivered until it has passed through all phases. Further, it is difficult to begin work on subsequent versions of the application (with presumably increased functionality) until the preceding application versions are completed and delivered.

2.2.1. Iteration Phases

The phases of an iteration are generally: analysis, design, construction, verification, and deployment. Each preceding phase of an iteration is crucial to the success of the next. Without proper analysis, the resulting design will be incorrect. Without proper design, the resulting code will be incorrect. Without proper coding, the testing will fail, and so on.

2.2.1.1. Analysis

The analysis phase of an iteration is used to determine the functionality, requirements, interface specifications, and business rules of the component. Typically, use cases and analysis class diagrams (detailed later) are constructed to elaborate the result of the analysis and verify those results with the users. The analysis phase is intended to merely state the requirements without having any design bias. This is good; in this manner analysis identifies the problem and the constraints such that any design that meets those constraints and provides a solution to the problem is right on target. If one thinks about it, why would the users need to worry about how the data is represented in the underlying tables and application classes, or whether the application is using COM or CORBA architecture? The answer is simply that they don't need to know (nor do they care).

The analysis phase is completed once the design team has enough information to design the implementation of the component. It is also interesting to note that the analysis requirements map directly into verification tests (i.e. does the resulting application adhere to the constraints and rules specified by analysis and the users?).
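To illustrate how an analysis requirement can map directly into a verification test, the following sketch states one rule and derives its checks one-for-one. The rule itself (a User ID and a Password must both be supplied) and the function names are assumptions made for this example only.

```python
def login_allowed(user_id: str, password: str) -> bool:
    """Hypothetical analysis rule: both a User ID and a Password must be supplied."""
    return bool(user_id.strip()) and bool(password)


def verify_login_rule() -> list[str]:
    """Verification checks derived directly from the stated rule."""
    failures = []
    if not login_allowed("jdoe", "secret"):
        failures.append("valid credentials rejected")
    if login_allowed("", "secret"):
        failures.append("missing User ID accepted")
    if login_allowed("jdoe", ""):
        failures.append("missing Password accepted")
    return failures


# An empty list means the constructed behavior adheres to the rule.
print(verify_login_rule())
```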

2.2.1.2. Design

The design phase of an iteration is the technical solution to the artifacts presented by the analysis. In the design phase, the details of the architecture, the overall system constraints (i.e. performance, compatibility, etc.), the user interface, the data structures, and overall application cohesiveness and consistency are fretted over. The design is completed once the implementation (construction) team has enough information to write the code to make the application work.

2.2.1.3. Construction

The construction phase of an iteration is the actual implementation of the design. If proper analysis and design were performed, the construction phase is relatively straightforward. The construction phase is complete once the construction team has handed off enough code to begin verification.

2.2.1.4. Verification

The verification phase of an iteration is the testing of the requirements, constraints, business rules, and other specifications of a component against the actual component that was constructed. Verification is complete once the component can be deployed.

2.2.1.5. Deployment

The deployment phase of an iteration is the actual use of the component. Deployment does not necessarily mean that the users have access to it. It may be deployed in such a manner that other components can interface to it or be a subset of the functionality of an overall application release.

2.2.1.6. Iterating a Component

If -- at any phase -- a component is deemed to be fundamentally incorrect, the component is immediately "iterated" (the component is versioned and starts the iteration all over again beginning with analysis). Likewise, if the scope (functionality) of a component is altered, the component is simply versioned and begins a new iteration. Each iteration is divided such that subsequent component versions (or entirely independent components) can be worked on in a pipeline fashion (Figure 1); once a component version has completed a phase, the next component (or version of the same component) may enter that same phase. In this manner, the overall functionality of the application is delivered piece by piece.

Figure 1: How components move through an iteration.
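The pipeline behavior described above can be sketched as a small schedule. This is only an illustration under simplifying assumptions that are not in the paper: every phase takes exactly one time step, and component versions enter the analysis phase one step apart. The version labels are hypothetical.

```python
PHASES = ["Analysis", "Design", "Construction", "Verification", "Deployment"]


def pipeline_schedule(versions: list[str]) -> list[dict[str, str]]:
    """Return, for each time step, which component version occupies which phase.

    Assumes each phase takes one time step and versions enter one step apart.
    """
    steps = []
    total_steps = len(versions) + len(PHASES) - 1
    for t in range(total_steps):
        step = {}
        for i, version in enumerate(versions):
            phase_index = t - i  # version i enters Analysis at step i
            if 0 <= phase_index < len(PHASES):
                step[PHASES[phase_index]] = version
        steps.append(step)
    return steps


for t, step in enumerate(pipeline_schedule(["A v1", "B v1", "A v2"])):
    print(t, step)
```

In this sketch the first version reaches deployment at step 4 while the later versions are still in verification and construction, which is the piece-by-piece delivery the text describes.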

2.3. Typical Use of Data Modeling Techniques in an OO Process

The OO process typically lacks in one important detail: persistence of the data. The data structures are -- by definition -- object oriented. If the target Database Management System (DBMS) is also object oriented, mapping the persistent application classes to persistent OODBMS classes is straightforward. However, if the target DBMS is a Relational Database Management System (RDBMS), as is presently the typical case, the mapping is not so straightforward. Because of this, the traditional data modeling techniques of creating a Logical Data Model (LDM) and Physical Data Model (PDM) are often employed. Because the data structures are defined by the design phase, and are needed for the construction phase, the LDM and PDM are typically defined during the handoff between design and construction. In addition, because mapping the OO classes to tables and columns specifies the persistence of the data, there is seldom much value in having an LDM differing in any significant way from a PDM. In other words, when data modeling is performed in this fashion, one is only mapping classes to tables. Therefore, the only usefulness of an LDM in this scenario is to use "Business" names rather than the physical names usually subject to length and abbreviation standards and constraints. In this manner, the typical deliverables of a data modeler for each component are the realization of the design in persistent data structures and some sort of mapping specification that details which member of which class maps to which column of which table. Thus, the data modeler has little impact on the quality and accuracy of the analysis and design.
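A minimal sketch of this class-to-table mapping follows. The class PurchaseOrder, the abbreviated column names, and the type mapping are all hypothetical; the point is only that the mapping specification (which member of which class maps to which column of which table) is mechanical once the design class exists.

```python
from dataclasses import dataclass, fields

# Assumed mapping from Python member types to SQL column types.
SQL_TYPES = {int: "INTEGER", str: "VARCHAR(100)", float: "REAL"}


@dataclass
class PurchaseOrder:
    """Hypothetical persistent design class using business names."""

    order_number: int
    customer_name: str
    total_amount: float


# Mapping specification: class member -> physical column (hypothetical abbreviations).
COLUMN_MAP = {
    "order_number": "ORD_NBR",
    "customer_name": "CUST_NM",
    "total_amount": "TOT_AMT",
}


def create_table_sql(cls: type, table: str, column_map: dict[str, str]) -> str:
    """Build a CREATE TABLE statement by mapping each class member to a column."""
    columns = [f"{column_map[f.name]} {SQL_TYPES[f.type]}" for f in fields(cls)]
    return f"CREATE TABLE {table} ({', '.join(columns)})"


print(create_table_sql(PurchaseOrder, "PURCH_ORD", COLUMN_MAP))
```

Nothing in this translation adds or checks a business rule, which is why a data modeler working this way has little influence on the quality of the analysis and design.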

The author wonders why an OO Process performed in this manner needs a skilled data architect at all; this task could be easily accomplished by someone who is able to read a class diagram and write the SQL on the target database. Further, the use of expensive Computer Aided Software Engineering (CASE) tools to generate the SQL seems rather moot. The author isn't the only one with this opinion; the maker of one of the most popular OO Design CASE tools recently released a set of UML stereotypes that simply express the persistent classes of a class diagram as "tables" and generates the SQL needed to implement them on the target platform. If the reader is expressing any hesitation with this concept, s/he is correct, but the author will address these points shortly.

2.4. OO Artifacts (UML)

In continuing our overview of the OO Process approach, it is necessary to briefly describe the key OO artifacts produced during the analysis and design phases of an iteration. The wildly popular OO syntax (and it is a syntax, not a methodology) known as the Unified Modeling Language (UML) is often employed to graphically express the analysis and design artifacts.

2.4.1. Use Cases

Use cases are the UML's primary way of specifying -- during the analysis phase of an iteration -- the user's perspective of process, constraints, business rules, and other requirements. The core use case diagram is often called a context diagram. In this diagram, stick figures are used to represent actors (such as user types/roles/classifications, external systems, etc.) and ovals represent the use cases. Association lines are then drawn between the actors and the use cases they call/instantiate/use. Also note that use cases can use other use cases (via uses or extends stereotypes).

While the context diagram is useful for visualizing the behavior of the system as a whole, and is a brilliant representation of the reuse of the system pieces, the "devil is in the details": the use case document. The use case document usually contains all of the business rules, attributes, constraints, requirements, assumptions, and exceptions of the use case. As there may be more than one way to perform the same task, these variants are broken out into separate flows within the same use case. Then, for each flow, the actual path of behavior is usually mapped out in a table fashion with the actor's behavior on one side and the system behavior/response on the other. Attributes are usually specified only in general terms (i.e. "the actor supplies their User ID and Password") and any constraints on those attributes -- static or dynamic -- are then textually specified. In order to limit repetition of the same requirements/constraints/business rules, use cases are often broken up such that a given rule is written once and used (called) often.

The biggest benefit to using use cases is that they are written in a natural language. Their format is usually easy to explain and read, and therefore users can easily validate them. Paradoxically, the biggest weakness of the use case is the fact that it is just a document. There is no formal method for gathering, specifying, or verifying the requirements. Inconsistency, inaccuracy, and vague rules can run rampant unless the analyst is extremely careful. This fact leads to two interesting results, which will be elaborated upon in a following section on the weakness of the OO approach.

2.4.2. Class Diagrams

Class diagrams are employed to graphically express classes, their data structures (such as members and/or dependent classes), and the encapsulated process that acts upon the data (methods). Communication between classes (roughly analogous to data modeling's relationships) is expressed via associations while subtypes are expressed via inheritance. From a pure data-oriented perspective, persistent class diagrams are roughly equivalent to Entity Relationship (ER) diagrams with methods and multi-valued attributes. In mapping persistent class diagrams to ER structures, one needs to account for other relationship types such as composition, aggregation, and dependencies, but that is really beyond the scope of this paper.
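The elements a class diagram expresses can be sketched in code as follows. The classes Person, Employee, and Department are hypothetical examples, not drawn from the paper; the comments mark which class diagram element each line stands for.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    name: str  # member (attribute)


@dataclass
class Employee(Person):  # inheritance expresses a subtype of Person
    employee_number: int


@dataclass
class Department:
    title: str
    # Association to Employee; as data, it behaves like a multi-valued attribute.
    staff: list[Employee] = field(default_factory=list)

    def hire(self, employee: Employee) -> None:
        """Method: the encapsulated process that acts upon the data."""
        self.staff.append(employee)


engineering = Department("Engineering")
engineering.hire(Employee("Pat", 7))
print(engineering.staff[0].name, isinstance(engineering.staff[0], Person))
```

Stripped of the hire method, what remains is close to an ER structure: two entity types, a subtype link, and a one-to-many relationship.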

The most obvious difference between a class diagram and an ER model is the use of all sorts of non-persistent classes, interfaces, dependencies, and methods. These elements are crucial to proper application design and implementation of the chosen architecture, but utterly irrelevant to proper data design.

Because of their inherent detail, class diagrams are seldom fully detailed in the analysis phase. Thus, most of the specification of class diagrams is performed during the design phase.

2.4.3. Other UML Diagrams

The UML contains many other diagram types useful for specifying various parts of the system. Interaction diagrams, for example, detail the behavior of and communication between classes. In fact, interaction diagrams are probably the third most commonly used UML diagram type. Other diagram types will specify states, deployment plans, architecture, and activity. Because this paper is concerned largely with the static constraints expressed by use cases and class diagrams, the author will not elaborate any further on these diagram types.

2.5. Benefits of an OO approach

It is hopefully evident by now what the benefits of an OO approach are; they include: greater re-use, more detailed specification of system constraints and processes, easier communication and verification with the users/stakeholders, and a clearer distinction between what the problem is (analysis) and how the problem will be solved (design and construction). If one were to specify in one word what the biggest advantage of using the OO approach is, that word would be "process"; conversely, if one were to specify the biggest drawback to the OO approach, that word would be "data".

2.6. Weaknesses of an OO approach (from a data perspective)

The largest weakness of the OO approach is its lack of rigor, verification, and completeness when expressing static data constraints. The approach is largely process-centric and is usually implementation biased (focusing on how to solve the problem, not what the problem is).

2.6.1. Class Diagrams

As previously discussed, class diagrams, when abstracted down to the persistent data storage layer, are little more than ER diagrams with behavior and multi-valued attributes. Because of this, they are subject to the same criticisms previously reserved for ER diagrams: lack of attribute-level rule specification, un-natural ways of expressing dependencies amongst data, tendency towards un-normalized schemas, inflexibility when it is time to change the system, and ease of introducing errors that are correct as far as the syntax is concerned. All of these criticisms can be addressed via traditional data modeling techniques (specifically, by using a technique that is not so concerned with the entity-attribute classification of system elements).

2.6.2. Specifying Components

The primary weakness in the OO process way of specifying components is that only "rough-cut" models are used. Once those components enter the analysis and design phases, one may discover that they are not so independent after all. This is largely due to incomplete models when specifying the components in the first place.

2.6.3. Analysis Phase

Due to the nature of the analysis artifacts, the analysis phase can be as vague or as elaborate as the analyst wishes. Further, since the details of the analysis are expressed as text, the analyst has the most flexibility of all the iteration phases in performing their tasks. This flexibility, however, can also pose some problems.

2.6.3.1. Overly Vague Use Cases

Use cases can be constructed such that they are overly vague. They may not contain any or all of the attributes needed by the system, they may not express any or all of the constraints on those attributes, and they may not describe all of the desired functionality of the system. Designers tend to like this sort of use case, by the way, in that they are free to implement the system as they need to; in other words, they are told what the problem is, not how to solve it. The author would counter, however, that if the designers do not have an accurate picture of what the system is supposed to do, any resulting design is inherently flawed.

2.6.3.2. Overloaded Use Cases

The converse of the above style of use cases is the "overloaded" use case. In such a use case, the analyst tries to express every single constraint, requirement, and business rule imaginable. Such overloaded use cases tend to include: user interface specifications, attribute data typing, interface requirements when sending attributes between systems, static attribute rules such as mandatory fields, subset, equality, and exclusion constraints, specification as to how the attribute is formatted on the user interface, etc. While this is indeed a noble effort, such use cases tend to attempt to express things that really aren't analysis concerns (again, what is the problem vs. how to solve the problem). Further, they tend to be inconsistent in how they express all of these numerous constraints and they lack the rigor in ensuring all constraints have indeed been documented and subsequently verified by the users.
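To make the static attribute rules named above concrete, the following sketch checks one record against a mandatory, an exclusion, a subset, and an equality constraint. All four rules and the attribute names are hypothetical examples chosen for illustration; a use case would state such rules in prose, where a missing or inconsistent one is hard to detect.

```python
def present(record: dict, key: str) -> bool:
    """True when the record carries a value for the given attribute."""
    return record.get(key) is not None


def check_constraints(record: dict) -> list[str]:
    """Return the static rules (all hypothetical) that one employee record violates."""
    violations = []
    # Mandatory: every employee has a name.
    if not present(record, "name"):
        violations.append("mandatory: name")
    # Exclusion: an employee has a salary or an hourly rate, never both.
    if present(record, "salary") and present(record, "hourly_rate"):
        violations.append("exclusion: salary, hourly_rate")
    # Subset: only employees with a driver's license may have a company car.
    if present(record, "company_car") and not present(record, "drivers_license"):
        violations.append("subset: company_car requires drivers_license")
    # Equality: leave start and leave end are recorded together or not at all.
    if present(record, "leave_start") != present(record, "leave_end"):
        violations.append("equality: leave_start, leave_end")
    return violations


good = {"name": "Pat", "salary": 50000}
bad = {"salary": 50000, "hourly_rate": 25, "company_car": "XYZ 123", "leave_start": "2001-02-01"}
print(check_constraints(good))
print(check_constraints(bad))
```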

2.6.3.3. Dangers of Implementation (or Design) Centric Analysis

By now, the dangers of the implementation or design centric approach should be obvious. Without using any rigor in expressing the underlying data structures and constraints, the application may be doomed to failure due to bad data that is otherwise acceptable to the system and the constraints that the said system specifies.

2.6.4. Lack of Formal Techniques

All of the criticisms previously elaborated can be overcome by the addition of a formal process as to how to specify the artifacts. To date, none (that the author is aware of) has been published from a purely OO standpoint. The UML does have a mechanism for expressing anything that its syntax cannot: the note. Specified as text, or in some syntax such as the Object Constraint Language (OCL), the note field can contain anything the modeler wishes to express. That is indeed good, but the question remains as to whether or not the analysts and designers realized those additional concerns were needed at all. The lack of a formal method to determine that those rules do indeed exist means such errors are easy to introduce and propagate down to the design, construction, and verification phases.

In the next issue...

We'll finish up this discussion in the next issue of the JCM (April, 2001). In that installment, we'll introduce how Object-Role Modeling (ORM) can not only overcome the above weaknesses in the OO approach, but can actually improve the quality of the resulting artifacts.

 

Scot A. Becker is a software consultant and the founder of Orthogonal Software Corporation. He is also a certified ORM consultant and trainer, a certified Visio trainer, and former Editor of the Journal of Conceptual Modeling.  

Contact Information:

Scot A. Becker
Orthogonal Software Corporation
scot@orthogonalsoftware.com

www.orthogonalsoftware.com

© Copyright, 1998-2004 InConcept (Information Conceptual Modeling, Inc.) All Rights Reserved.
ISSN: 1533-3825