Data Warehouse Construction: Transforming STARNET into a Star Schema via Object Role Modeling
by Dick Barden and Necito dela Cruz

Introduction

This paper specifies the steps in a proposed Data Architecture for the analysis, design, and specification of a Data Warehouse solution. Two major techniques are used in this process. First is the STARNET Modeling Technique as described by Harjinder S. Gill in ‘The Official Guide to Data Warehousing’, published in 1996 by Que Corporation. The second is Object-Role Modeling established by Dr. Terry Halpin in ‘Conceptual Schema and Relational Database Design’, published in 1995 by Prentice-Hall Limited.

The Data Architecture

Data ArchitectureThe Data Architecture for this approach is depicted in the following figure. This approach is a fundamental activity for the enterprise to undertake because applications for the data warehouse cannot be developed properly until a somewhat stable understanding of what the enterprise needs to know is understood. This approach utilizes the latest modeling methodologies for creating a Data Warehouse design specification.

Briefly, the business users define subject areas and the set of query footprints. These are consolidated into a STARNET for each subject area. The STARNET diagrams can be thought of as a high level of abstraction for the data model needed further on in this process. The STARNET gives the business user a model of the data warehouse that is simple to understand, navigate, and create ‘what if’ queries against.

After the STARNET and query footprints have been verified with the business user, they are enriched semantically and transformed into an Object-Role Model (ORM). The ORM is used to create a semantic definition of the high level STARNET. At this time the data sources needed to populate the data warehouse are also modeled and integrated into a common data source model. The common data source model is mapped to the data warehouse model to provide the foundation for specifying the loading of the data warehouse. This is reviewed by the data warehouse administrators and a core set of business users.

When the complete definition has been reviewed, it is mapped into a logical and physical model for the data warehouse and data sources. This is reviewed by the data warehouse implementation team.

This forms the core set of documentation and scripting files needed to build the data warehouse, provide a specification for the data warehouse loading programs, and provide a foundation for specifying the user interface to the data warehouse.

STARNET Analysis

STARNETIn the book by Harjinder Gill, there is a sample STARNET on page 124, figure 5.14. The STARNET in the figure below is based on the example subject area in his book. The STARNET below is a consolidation of several different query footprints for the Product Sales subject area. A number of analysis techniques can be applied against the STARNET to explore the subject area, such as aggregation, drill-down, and summary types of queries.

In this way the STARNET can serve to illustrate the business users questions before the process of building the logical and physical database is started.

 

Mapping STARNET into the Object-Role Model

Object Role Modeling (ORM) is used to create an conceptual schema where the conceptual schema specifies all the permitted states and transitions of the conceptual database. In order to visualize the objects of the conceptual schema, the model is flattened from the usual three dimensions, entity, attribute, and relationship to a two dimensional model of only objects and relationships. The conceptual model is also technology or implementation neutral - the resulting use of the model is open. The conceptual schema is acted upon using a mapping algorithm to create a logical model. This model is more commonly understood by the database design community as Entity-Relationship Model (ER). However, many of business rules are lost in the ER diagrams. They are not lost in the ORM conceptual model, they merely do not appear on ER models.

The STARNET has provided a two dimensional view of a multi-dimensional situation. If, this is verbalized we would translate the following dimension line as follows:

Dimension

Each sentence "fact" is expressed in plain language using a meaningful predicate and non-technical object names. These name can be mapped to technical database names latter. Each dimension can be expressed using this same approach. These "facts" express the fundamental step in ORM - "Object plays a role with another Object". Refer to the STARNET example and express each dimension in a like manner. The addition of the "Product Key" provides a reference concept to the object Product dimension.

So how do the dimensions relate to the Product Sale subject area? Simply, state this relationship between objects as:

The subject area "Product Sale" is a conceptual object in the model related to the dimensions. Again we add a key for the subject area. We now have a framework from the STARNET which can be used in ORM. Refer to the attachments for the ORM diagrams and the logical model presented in the appendix to see the actual conceptual model and logical models.

The STARNET provided the framework for the ORM conceptual model. In addition, we gained a semantic representation of the model. The STARNET model does not indicate the information in the subject area that are required based on the dimensions. For example, the cost and price of the product within any given combination of dimensions are not represented. These can be added to the model in ORM. These are additional "facts" about Product Sale.

All other facts are likewise stated in ORM. At this point we have a set of statements entered in ORM which can now be refined for business rules. Before going on, we should look at the seven steps of ORM and decide where we are:

  1. Transform familiar information examples into elementary facts, and apply quality checks.

  2. Draw the fact types, and apply a population check..

  3. Check for entity types that should be combined, and note any arithmetic derivations.

  4. Add uniqueness constrains and check the arity of fact types.

  5. Add mandatory role constraints, and check for logical derivations.

  6. Add value, set comparisons and subtyping constraints.

  7. Add other constraints and perform final checks.

1. Transform familiar information examples into elementary facts, and apply quality checks.

We have completed the first step and need to note the quality checks. At this stage make sure that each fact is a simple sentence that cannot be reduced without loss of information. For examples of the ORM drawings refer to the provided example.

2. Draw the fact types, and apply a population check.

Refer to the ORM drawing provided to see how the facts are represented as drawings.

2. (continued) Population Check

Remember that Product is identified by a Product Key and Product Line is by name.

Note the pattern of the examples. Each Product Key is unique while the Product Line Names repeat (vegetables). This is a proper example set since it shows the true relationship between Product and Product Line Name. If the example set did not include duplicates for Product Line Name we would conclude that each Product is unique and each Product Line is unique within this relationship. This would be wrong. Note: We do not draw conclusions about the relationship without supporting examples that are approved by the review groups. As a reward for doing this step properly InfoModeler will automatically derive the proper uniqueness constraint for us.

3. Check for entity types that should be combined, and note any arithmetic derivations.

Find any facts the are the result of a math operation. Change these objects to derived and state the rules of derivation. In all cases, this the an operation using other fact/s within the model.

4. Add uniqueness constrains and check the arity of fact types.

Returning the examples for a fact. We notes that providing a representative set of examples provided a uniqueness constraint over one of the roles. This is only one pattern. There a several possible patterns for a simple fact. In our model only, a simple single role constraint is required.

5. Add mandatory role constraints, and check for logical derivations.

Apply a mandatory constraint on roles that "must" be played by all instances of an object.

6. Add value, set comparisons and subtyping constraints.

Apply other constraints to the model, such as: equality, subset, exclusionary, disjunctive, frequency, value, range, etc.

7. Perform final checks.

Check the model for consistency and quality.

Hopefully you can see how useful the conceptual model is for non-technical project members.

Also, the ease of moving from the STARNET framework to an ORM framework, which is then expanded to complete the model, is a straight forward process that can be accomplished quickly.

Modeling Sources of Data

The data sources for the data warehouse are modeled using the seven steps in the previous section. Here the expertise of people in the operational world are needed to be sure that we are getting the correct information for the data warehouse and the meta data for that information. If there are several sources of information, as is often the case, then each will be modeled and then integrated. The integration finds the points of commonality with each source and creates a plan to map the data sources into a integrated data source.

Mapping Integrated Data Sources to the ORM

The integrated data sources are then mapped into the data warehouse model. There will be a number of rules created for mapping the data into the star schema structure. These rules are documented and become part of the specification for creating the load programs for the data warehouse.

Creating the Logical and Physical Models

Once the conceptual model has been verified, the logical is derived. We are now closer to the ER modeling approach. We can now take the resulting tables and map technical names to our conceptual names. For Example:

Tables and indexes, and keys can all be aimed to suit the mapping to a physical model. Finally the logical model is used to create a data definition language script of the proposed database. This includes, CREATE DATABASE, CREATE TABLE, CREATE PRIMARY KEY, CREATE UNIQUE INDEX, CREATE CONSTRAINT, CREATE FOREIGN KEY and CREATE TRIGGER.

Conclusion

This is very brief example. Please read the diagrams and reports provided by InfoModeler following this description. It provides the various view starting with conceptual model, then logical model and finally the physical model. Several reports are provided to give you an idea on how much information is stored.

Dick Barden is a Senior Partner and Principal Consultant for InConcept. He has over 15 years of ORM/NIAM experience and is  a certified ORM consultant and trainer and a certified Visio trainer.

Contact Information:

Dick Barden
Vice President and Co-Founder
InConcept
8171 Hidden Bay Trail N
Lake Elmo, MN 55042
dickb@inconcept.com
(651) 777-8484
fax: (651) 777-9634
http://www.inconcept.com

Necito dela Cruz is a Senior Partner and Principal Consultant for InConcept. He has over 20 years of ORM/NIAM experience and is  an ORM expert and trainer and a certified Visio trainer.

Contact Information:

Necito dela Cruz
Vice President and Co-Founder
InConcept
8171 Hidden Bay Trail N
Lake Elmo, MN 55042
nc2@inconcept.com
(651) 777-8484
fax: (651) 777-9634
http://www.inconcept.com