June 2001        Issue: 20

Journal of Conceptual Modeling
www.inconcept.com/jcm

Implementing a Generic Attribute Domain
Validation Model  (Part 1)

by
Adrian Miley

Introduction

In almost every database application ever designed and built there has been a requirement to hold and maintain Attribute Value Constraints (a.k.a. Domain Values, Enumerated Values, Reference Codes etc.) that define the valid values that are allowed for a particular Attribute. 

In its simplest form (as generated by many CASE packages such as Oracle Designer), this is simply a lookup table with the basic following structure:

Table 1: Simple Lookup Table

Attribute Data Type Null Definition
#Domain Name String No Collective name for each distinct set of Domain Values being validated.
#Domain Value String No Unique Value within each Domain being validated.
Meaning String No A free text description of what the Value means.

However, in all but the most trivial cases, this is a simplistic implementation and has the following limitations:

The purpose of this article is to discuss the various functional enhancements that can be made to the basic model to overcome the above limitations and develop a generic data model to support those requirements.

Benefits of a Generic Model

Once in place and populated as part of the development process the information recorded in a Domain Validation database has numerous uses, some of which are:

Defining the Extended Model 

Domain validation

There are two types of Domain validation that may be required in an application:

Checking that an enumerated or code value assigned to a Domain is an allowed value for that Domain. This is the functionality provided by the Lookup table described above.

Secondly we have Data Type Constraints checking that the value being assigned to an Attribute is of the correct datatype (e.g. "Employee Surname" is String, "Date of Birth" is Date, "Number of Children" is Integer), in the correct format (e.g. "Employee Surname" is Alphabetic or "Address Postcode" is Uppercase, "Product Code" format is "999-9-XXX") or is within a valid range of values (e.g. "Number of Children" between 0 and 10).

This type of validation is normally performed by creating a User Defined Datatype (UDT) which is perfectly acceptable in a "closed" database application where the developed programs are the only method of manipulating the data.

However in large scale systems where many different applications may be sharing sub-sets of the data and different programming languages are used to perform different functions (e.g. Java for the GUI, C for large volume back-end feeds, stored procedures for replication, an ETL package for Data Warehouse loading etc.) the use of a single UDT may not be compatible across the entire user community.

Having this information in a central user maintainable database1, and using the information appropriately (a good idea with any information system!) resolves many of the incompatibility problems.

In addition, even if we restrict the model to only performing Domain Value Validation we still need a sub-set of the datatype constraints in order to ensure that the codes we are maintaining conform to the characteristics of the Domain they will be assigned to.

Satisfying this requirement is done by adding a new Domain entity as a parent of the Domain Value entity, as follows:

Figure 1: Domain and Domain Value

At this point, it was tempting to introduce sub-types on the Domain entity based on the underlying datatype of the Domain (e.g. "String Domain", "Numeric Domain" etc.) to describe the characteristics of the Domain which would have been the rigorous and correct way of implementing a Domain. 

However because we are discussing Domain Values validation and are discussing usage rather than maintenance of Domain Values, I decided to leave the issue of Domain sub-typing until later.

The essential attributes of the Domain entity are:

Table 2: Essential attributes of the Domain entity

Attribute Data Type Null Definition
#Domain Name String No The Name of the Domain for which we are recording Domain Values.

The other attributes of the new Domain Entity are purely dependent on the characteristics that will be validated when maintaining the Domain Values. Consequently, because the current scope of the discussion is focused on providing validation for a known Domain Value against a Domain rather than maintaining the Domain Values themselves, identifying the additional attributes has been left until we get to the issue of data maintenance and defining the maintenance rules.

Multiple Attributes sharing a Domain

In an ideal world (usually a virgin development) it is good practice to ensure that the same attribute name is used consistently throughout the model as exemplified by the following model fragment:

Figure 2: Consistent Attribute Naming

In our example, both Payment and Payment Instruction would have an attribute named Payment Method and, therefore, it is very obvious that the Payment Method Domain is used to validate the contents of both Payment Method Attributes2.

However in many large scale applications, especially ones that are initially created by integrating two or more existing systems, consistent attribute naming across the entire application is very unlikely to have happened.

For example, in our model fragment, Payment may already be defined in a Purchase Ledger application and Payment Instruction (a mandate to take payments as and when required) may already be defined in a Customer Management application. Combining the two parts together and we have the situation shown below where two attribute with different names are validated against a single Domain.

Figure 3: Integrated Attributes

Consequently we now have the case where we need to separate the Attributes from the Domains they are validated against. To resolve this requirement we need to add an Attribute entity, relating many Attributes to a single Domain, into our model.

Figure 4: Adding Attribute

The essential attributes of the Attribute entity are:

Table 3: Essential attributes of the Attribute entity

Attribute Data Type Null Definition
#Entity Name String No The Name of the Entity / Class / Program containing the Attribute being validated.
#Attribute Name String No The name of the Attribute / Variable being validated within the Entity.
Domain Name String No The Domain that this Attribute is referenced against.

Synonyms

Nowadays with stateless, message based applications and geographically distributed development teams becoming more popular it is possible to develop any application without knowing the name of the persistent Attribute that a value will be stored in. Instead all the application programmer has access to is the Screen Name / Field Name the data will be maintained through or a Message Name / Field Name used to pass the data across the front-end / back-end interface.

Consequently because we have an additional level of data hiding it is necessary to be able to refer to an Attribute by many different names.

To implement this we simply require a Synonym entity to hold the alternative names that an Attribute may be known by, as follows:

Figure 5: Adding Synonym

The essential attributes of the Synonym entity are:

Table 4: Essential attributes of the Synonym entity

Attribute Data Type Null Definition
#Entity Name String No The Name of the Entity / Class / Program containing the Attribute being validated.
#Attribute Name String No The name of the Attribute / Variable being validated within the Entity.
#Alternative Entity Name String No Alternative Entity / Class name by which the reference Entity many be known.
#Alternative Entity Name String No Alternative Attribute / Variable name by which the Attribute within the Entity being referenced is known.

No other attributes are required because, as it is an alternative name for an Attribute, the Synonym should inherit all it's characteristics and allowed values from the Attribute it is linked to.

Sub-Domains or Sub-Sets (Collections)

In practice, particularly in the area of super-types and sub-types, it is common for an Attribute to only allow certain Domain Values depending on the sub-type but each sub-type may allow Domain Values common to other sub-types.

For example if we recorded Payment Method (with Domain Values of {Direct Debit, Standing Order, Cash} against a Payment and Payment had two sub-types of Manual Payment and Electronic Payment then it may be required to restrict Electronic Payment : Payment Method to {Direct Debit, Standing Order} and Manual Payment : Payment Method to {Cash}.

The question then is how to model groups of Domain Values using our basic Domain -> Attribute structure.

The two choices we have are:

If we decided to implement a Domain hierarchy (and we maintain our restriction that a Domain Value is unique within a Domain then we need to create non-overlapping sub-domains with a potential sub-domain being required for each discrete intersection in the domain.

For two attributes the VENN diagram of the possible non-overlapping sub-domains is:

Figure 6: VENN diagram of the possible non-overlapping sub-domains for two attributes

When three Attributes share the Domain the diagram becomes:

Figure 7: VENN diagram of the possible non-overlapping sub-domains for three attributes

Further progressive analysis of three, four, five … etc. attributes reveals that there is actually a exponential progression where the number of distinct Domains (D) required to exhaustively model all the sub-domain combinations for N distinct attributes sharing the Domain can be expressed as:

which equates to 

D = 2N - 1

In any complex data model we would almost certainly also have a super-type / sub-type hierarchy on at least one of our entities. An extreme (real-life) case of such an hierarchy is shown in Figure 8: 

Figure 8: Complex super-type/sub-type hierarchy

To fully exhaust all possible Sub-Domains for the above hierarchy we would potentially have to create a possible 255 Sub-Domains ordered into a six tier hierarchy. Admittedly this is an extreme example of a Sub-Type hierarchy but three and four tier hierarchies are not uncommon. 

Consequently it is reasonable to say that implementing Sub-Domains is impractical and should not be included in the model. Instead the Domain structure should be a single level flat structure where each Domain contains all the values allowed for that Domain and there is no relationship between one Domain and another.

Instead we allow for the definition of discrete collections of Domain Values against an Attribute by adding an Attribute Value entity as follows:

Figure 9: Adding Attribute Value

The main restriction of the Attribute à Attribute relationship, indicated by the "sub-set" qualifier, is that an Attribute Value must be one of the Domain Values contained in the Domain that the Attribute is linked to.

The essential attributes of the Attribute Value entity are:

Table 5: Essential attributes of the Attribute Value entity

Attribute Data Type Null Definition
#Entity Name String No The Name of the Entity / Class / Program containing the Attribute being validated.
#Attribute Name String No The name of the Attribute / Variable being validated within the Entity.
#Domain value String No Unique Value within each Domain being validated.

No other attributes, such as Meaning, are required because these are inherited from the Domain Value attributes of the equivalent Domain Value.

Super-Types and Sub-Types

Single Inheritance

The structure we have defined so far does not however fully ensure the consistency of the Sub-Type hierarchy.

By definition a Sub-Type can only further restrict an inherited Attribute and never relax the constraints therefore it is a requirement of an inherited Attribute that it does not allow more Values than it's immediate Super-Type equivalent allows. For example given the following entity hierarchy:

Figure 10: Single inheritance example

If Entity 1 had a Status attribute then all of its Sub-Types would inherit that attribute and all the restrictions placed on that attribute so, for example, if Entity1:Status allowed {A,B,C.D} then Entity2.Status could not allow {E}, which is an illegal relaxation of the super-type constraint, but it could be further restricted to only allow {A,B,C} and exclude {D}.

The same applies to Entity4, the Status attribute would only be allowed to further restrict the values as defined for Entity2 and by inference Entity1, but as the hierarchy is recursive we only have to check the immediate Super-Type for allowed values.

Consequently in order to correctly validate an inherited Attribute we need to relate Attributes in a super-type / sub-type hierarchy.

We could place a recursive relationship on the Attribute entity to link many sub-type Attributes to the super-type Attribute they are based on. However, in the spirit of normalisation, the Attribute Name part of the Attribute primary key should be the consistent for all Attributes inherited within the type hierarchy. Consequently only the Super-Type Entity is required to identify the super-type Attribute and, in a single inheritance hierarchy where all occurrences of a particular sub-type Entity always have the same super-type Entity, the hierarchy can be keyed on the sub-type Entity Name alone.

This leads us to the introduction of an "Entity" entity into the model with a relationship to many Attributes link to one Entity and a recursive relationship on Entity linking many sub-type Entities to a single super-type Entity as shown below.

Figure 11: Adding Entity

The essential attributes of the "Entity" entity are:

Table 6: Essential attributes of the "Entity" entity

Attribute Data Type Null Definition
#Entity Name String No The Name of the Entity / Class / Program containing the Attribute being validated.
Supertype Entity Name String No Reference to the Entity that this Entity is a sub-type of.

No other attributes are required.

Multiple Inheritance

Having established that we need to add an "Entity" entity with a recursive relationship to link Super-Types to their Sub-Types to deal with single inheritance, we now need to consider the possible requirements for Attributes with multiple inheritance:

Figure 12: Multiple inheritance example

In this case because a Sub-Type must be a instance of the Super-Type and Entity5 has two Super-Types then the allowed values for the Status attribute on Entity5 would be the intersection of the Status attributes for Entity2 and Entity3.

That is, if Entity2 allowed {A,B,C} and Entity3 allowed {A,C,D} then Entitry5 could at most allow {A,C} in order to be a valid instance of both Super-Types, {B} is not allowed because although Entity5 would be a valid Entity2 it would not be a valid Entity3.

In addition this is the simple case of a single hierarchy with all Sub-Types sharing the same ultimate Super-Type. In the very rare case of two Sub-Types inheriting from two non-related Super-Types, as can happen with Object Orientated Design when designers get carried away with Abstract Classes, the problem becomes even more complex as we need to ensure that not only is the Value allowed for each of the Super-Types but also that the Value has the same meaning within the Domain of the Super-Types.

Figure 13: Adding Entity Subtypes

At this point it is time to take a reality check on whether we actually want to support multiple inheritance for Attribute Value validation. Given that this is an implementation model rather than a conceptual design model because I don't think we do! There are two reasons for this conclusion:

Consequently, because we would almost certainly never have to implement Attribute Value checking for attribute with multiple inheritance I decided to not include it in the data model and stayed with just having a recursive 1:M relationship on the Entity entity.

Conclusion 

So far we have developed a robust metadata model for handling point in time validation of attributes in a data model and a number of constraints that need to be applied to the data held in that model to ensure internal consistency.

Namely:

Figure 14: The meta-model so far

With the following basic entity definitions:

Tables 7 - 12: Basic entity definitions

Name Definition
Domain Defines a distinct set of Domain Values that may be used to validate the contents of one or more Attributes.
Attribute Datatype Null Definition
#Domain Name String No The Name of the Domain for which we are recording Domain Values.
Relationships Definition
Domain Value Domain may contain one or more Domain Values
Attribute Domain may be define one or more Attributes

 

Name Definition
Domain Value Definitions of all the distinct Values allowed for a Domain
Attribute Datatype Null Definition
#Domain Name String No Collective name for each distinct set of Domain Values being validated.
#Domain Value String No Unique Value within each Domain being validated.
Meaning String No A free text description
Relationships Definition
Domain Domain Value must be in one Domain

 

Name Definition
Attribute Definitions of all Attributes being validated against a particular Domain. An Attribute may be any Database Column or Class / Program Variable that exists within the application.
Attribute Datatype Null Definition
#Entity Name String No The Name of the Entity / Class / Program containing the Attribute being validated.
#Attribute Name String No The name of the Attribute / Variable being validated within the Entity.
Domain Name String No Reference to the Domain that this Attribute is validated against.
Relationships Definition
Domain Attribute must be defined by one and only one Domain
Entity Attribute must be part of one and only one Entity
Attribute Value Attribute may contain one or more Attribute Values
Synonym Attribute may be known by one or more Synonyms

 

Name Definition
Synonym Alternative names by which a particular Attribute may be known by the application.
Attribute Datatype Null Definition
#Entity Name String No The Name of the Entity / Class / Program containing the Attribute being validated.
#Attribute Name String No The name of the Attribute / Variable being validated within the Entity.
#Alternative Entity Name String No Alternative Entity / Class name by which the reference Entity many be known.
#Alternative Entity Name String No Alternative Attribute / Variable name by which the Attribute within the Entity being referenced is known.
Relationships Definition
Attribute Synonym is an alternative name for one and only one Attribute

 

Name Definition
Attribute Value Particular Domain Values that the Attribute will be validated against if the Attribute is not validated against the full set of Domain Values. This is to allow non-exclusive collections of Domain Values to be defined for an Attribute.
Attribute Datatype Null Definition
#Entity Name String No The Name of the Entity / Class / Program containing the Attribute being validated.
#Attribute Name String No The name of the Attribute / Variable being validated within the Entity.
#Domain Value String No Unique Value within each Domain being validated.
Relationships Definition
Attribute Attribute Value must be part of one and only one Attribute
Domain Value 
(Derived)
Attribute Value must be one of the set of Domain Values allowed for the Domain that defines the parent Attribute.

 

Name Definition
Entity Defines Entities and the hierarchy of Entities within the application. 

An Entity is any distinct named component of the Logical Data Model. 

Primarily used to define the inheritance hierarchy of Entities within the Logical Data Model.

Attribute Datatype Null Definition
#Entity Name String No The Name of the Entity / Class / Program containing the Attribute being validated.
Supertype Entity Name String Yes Reference to the Entity that this Entity is a sub-type of.
Relationships Definition
Attribute Entity may contain many Attributes
Entity 
(Recursive)
Entity may be a sub-type of one and only one Entity.

Entity may be a super-type for many other Entities.Entity must not itself be:

  • A super-type of the super-type Entity being referenced (a descendant cannot be an ancestor of its own ancestor)
  • A sub-type of the sub-type Entity referencing this Entity (ancestor cannot be a descendant of its own descendants).

However the model still has a number of limitations that need to be addressed and in subsequent articles we will expand this model to deal with changing Domain Values over time, validating Transition States and Data Maintenance.

References

In all honesty everything documented in this article is derived from basic principles over a number of different database projects (I produced my first Reference Codes table in Oracle5) or because I simply wanted to know which rules did and did not work.

However the following publications in particular prompted the production of this article and provided me with some background thinking about some of the aspects / requirements of this type of data model:

  1. Date, C.J., What Not How: The Business Rules Approach to Application Development, 2000, Addison-Wesley

  2. Fowler, Martin, Analysis Patterns: Reusable Object Models, 1997, Addison Wesley

  3. Hallock, Patrick, "Dealing with De-Normalization of Lookup Tables", 2000, The Journal of Conceptual Modeling, Issue 13

Footnotes

  1. A Data Model repository, such as Oracle Designer, also serves exactly this purpose. However it is not always feasible or desirable to provide end-user access to the entire Data Model just for the purpose of maintaining Domain Values.

  2. This is desirable behaviour with programming languages such as Java that feature introspection in Classes because if the attribute can discover its own name then it can also dynamically identify the Domain it needs to validate itself against at runtime.

  3. VisioModeler applied the dotted line as a default to denote the secondary Super-Type.

Adrian Miley is a freelance data architect / database designer with over 15 years experience as an I.T. practitioner and, prior to moving into I.T., a further ten years experience as a Management Accountant. Adrian is a member of the British Computer Society and formerly a member of the Institute of Cost and Management Accountants (now Chartered Institute of Management Accountants and one of the top three Accountancy institutes in the UK). Adrian can be contacted by e-mail at adrian.miley@lineone.net.

© Copyright, 1998-2004 InConcept (Information Conceptual Modeling, Inc.) All Rights Reserved. Privacy Statement.
ISSN: 1533-3825