June
2001 Issue: 20
Journal of Conceptual Modeling
www.inconcept.com/jcm
Implementing
a Generic Attribute Domain
Validation Model (Part 1)
by
Adrian Miley
Introduction
In almost every database application ever designed and built there has been a requirement to hold and maintain Attribute Value Constraints (a.k.a. Domain Values, Enumerated Values, Reference Codes etc.) that define the valid values that are allowed for a particular Attribute.
In its simplest form (as generated by many CASE packages such as Oracle Designer), this is simply a lookup table with the basic following structure:
Table 1: Simple Lookup Table
| Attribute | Data Type | Null | Definition |
| #Domain Name | String | No | Collective name for each distinct set of Domain Values being validated. |
| #Domain Value | String | No | Unique Value within each Domain being validated. |
| Meaning | String | No | A free text description of what the Value means. |
However, in all but the most trivial cases, this is a simplistic implementation and has the following limitations:
It only supports validation of defined values for a Domain but does not support validation of generic characteristics of the Domain such as "Uppercase String" or "Integer between 1 and 5".
It assumes that any existing entry in the Domain is always, and has always been, valid whereas most databases store and maintain data over time and different Domain Values may be valid in the Domain at different points in time.
Does not handle Values that are common across Attributes in multiple Entities.
Does not deal with sub-sets of Attribute Values being valid for Sub-Types of an Entity and the dependencies of the Entity sub-types.
Does not handle Attribute Synonyms i.e. where the Attribute has a different name in different application data models.
The purpose of this article is to discuss the various functional enhancements that can be made to the basic model to overcome the above limitations and develop a generic data model to support those requirements.
Benefits of a Generic Model
Once in place and populated as part of the development process the information recorded in a Domain Validation database has numerous uses, some of which are:
Provides a generic "List of Values" functionality for any column populated via an interactive User Interface.
Automatic generation of User
Defined Type (UDT) Classes (for those that really like Objects).
Once the metadata for a UDT is defined in a consistent format it is very
easy to create a code-generator to interpret the data and generate the
necessary structures and validation code required for the language.
In my experience about 50% of the information stored in a database
application is stored using some sort of Domain Validation so being able
to quickly generate all these Base Classes in whatever language we
require (e.g. Java front-end, C++ back-end, C bulk loading programs) is
a real benefit.
If data validation is being implemented in the code rather than the database then it helps the Testing phase of software development if (Pre-Event) Check Triggers, using the same central data as the software, can be generated to check that the implementation matches the specification. This reduces the time required for testing the software.
Test Cases can be generated very quickly as it is just a case of finding an Attribute Value that is allowed and another one that is not allowed for each Attribute.
As less procedural code needs to
be created this should reduce the time required to implement the
Application Data Model.
All UDT classes / libraries / components / "whatever we're
using" are effectively skeleton definitions with constructor /
mutation functions implemented as calls to a generic check_value()
function (ideally a database function).
This way we almost guarantee a consistent implementation across the
entire application.
Provides flexibility in iterative development because new Domain Values can simply be inserted into the database i.e. from the analysis it is sufficient to know that the Attribute will be validated without caring what the actual values are.
Defining the Extended Model
Domain validation
There are two types of Domain validation that may be required in an application:
Checking that an enumerated or code value assigned to a Domain is an allowed value for that Domain. This is the functionality provided by the Lookup table described above.
Secondly we have Data Type Constraints checking that the value being assigned to an Attribute is of the correct datatype (e.g. "Employee Surname" is String, "Date of Birth" is Date, "Number of Children" is Integer), in the correct format (e.g. "Employee Surname" is Alphabetic or "Address Postcode" is Uppercase, "Product Code" format is "999-9-XXX") or is within a valid range of values (e.g. "Number of Children" between 0 and 10).
This type of validation is normally performed by creating a User Defined Datatype (UDT) which is perfectly acceptable in a "closed" database application where the developed programs are the only method of manipulating the data.
However in large scale systems where many different applications may be sharing sub-sets of the data and different programming languages are used to perform different functions (e.g. Java for the GUI, C for large volume back-end feeds, stored procedures for replication, an ETL package for Data Warehouse loading etc.) the use of a single UDT may not be compatible across the entire user community.
Having this information in a central user maintainable database1, and using the information appropriately (a good idea with any information system!) resolves many of the incompatibility problems.
In addition, even if we restrict the model to only performing Domain Value Validation we still need a sub-set of the datatype constraints in order to ensure that the codes we are maintaining conform to the characteristics of the Domain they will be assigned to.
Satisfying this requirement is done by adding a new Domain entity as a parent of the Domain Value entity, as follows:

Figure 1: Domain and Domain Value
At this point, it was tempting to introduce sub-types on the Domain entity based on the underlying datatype of the Domain (e.g. "String Domain", "Numeric Domain" etc.) to describe the characteristics of the Domain which would have been the rigorous and correct way of implementing a Domain.
However because we are discussing Domain Values validation and are discussing usage rather than maintenance of Domain Values, I decided to leave the issue of Domain sub-typing until later.
The essential attributes of the Domain entity are:
Table 2: Essential attributes of the Domain entity
| Attribute | Data Type | Null | Definition |
| #Domain Name | String | No | The Name of the Domain for which we are recording Domain Values. |
The other attributes of the new Domain Entity are purely dependent on the characteristics that will be validated when maintaining the Domain Values. Consequently, because the current scope of the discussion is focused on providing validation for a known Domain Value against a Domain rather than maintaining the Domain Values themselves, identifying the additional attributes has been left until we get to the issue of data maintenance and defining the maintenance rules.
Multiple Attributes sharing a Domain
In an ideal world (usually a virgin development) it is good practice to ensure that the same attribute name is used consistently throughout the model as exemplified by the following model fragment:

Figure 2: Consistent Attribute Naming
In our example, both Payment and Payment Instruction would have an attribute named Payment Method and, therefore, it is very obvious that the Payment Method Domain is used to validate the contents of both Payment Method Attributes2.
However in many large scale applications, especially ones that are initially created by integrating two or more existing systems, consistent attribute naming across the entire application is very unlikely to have happened.
For example, in our model fragment, Payment may already be defined in a Purchase Ledger application and Payment Instruction (a mandate to take payments as and when required) may already be defined in a Customer Management application. Combining the two parts together and we have the situation shown below where two attribute with different names are validated against a single Domain.

Figure 3: Integrated Attributes
Consequently we now have the case where we need to separate the Attributes from the Domains they are validated against. To resolve this requirement we need to add an Attribute entity, relating many Attributes to a single Domain, into our model.

Figure 4: Adding Attribute
The essential attributes of the Attribute entity are:
Table 3: Essential attributes of the Attribute entity
| Attribute | Data Type | Null | Definition |
| #Entity Name | String | No | The Name of the Entity / Class / Program containing the Attribute being validated. |
| #Attribute Name | String | No | The name of the Attribute / Variable being validated within the Entity. |
| Domain Name | String | No | The Domain that this Attribute is referenced against. |
Synonyms
Nowadays with stateless, message based applications and geographically distributed development teams becoming more popular it is possible to develop any application without knowing the name of the persistent Attribute that a value will be stored in. Instead all the application programmer has access to is the Screen Name / Field Name the data will be maintained through or a Message Name / Field Name used to pass the data across the front-end / back-end interface.
Consequently because we have an additional level of data hiding it is necessary to be able to refer to an Attribute by many different names.
To implement this we simply require a Synonym entity to hold the alternative names that an Attribute may be known by, as follows:

Figure 5: Adding Synonym
The essential attributes of the Synonym entity are:
Table 4: Essential attributes of the Synonym entity
| Attribute | Data Type | Null | Definition |
| #Entity Name | String | No | The Name of the Entity / Class / Program containing the Attribute being validated. |
| #Attribute Name | String | No | The name of the Attribute / Variable being validated within the Entity. |
| #Alternative Entity Name | String | No | Alternative Entity / Class name by which the reference Entity many be known. |
| #Alternative Entity Name | String | No | Alternative Attribute / Variable name by which the Attribute within the Entity being referenced is known. |
No other attributes are required because, as it is an alternative name for an Attribute, the Synonym should inherit all it's characteristics and allowed values from the Attribute it is linked to.
Sub-Domains or Sub-Sets (Collections)
In practice, particularly in the area of super-types and sub-types, it is common for an Attribute to only allow certain Domain Values depending on the sub-type but each sub-type may allow Domain Values common to other sub-types.
For example if we recorded Payment Method (with Domain Values of {Direct Debit, Standing Order, Cash} against a Payment and Payment had two sub-types of Manual Payment and Electronic Payment then it may be required to restrict Electronic Payment : Payment Method to {Direct Debit, Standing Order} and Manual Payment : Payment Method to {Cash}.
The question then is how to model groups of Domain Values using our basic Domain -> Attribute structure.
The two choices we have are:
Model an hierarchy of Domains at the Domain level
Model groups of Values at the Attribute level
If we decided to implement a Domain hierarchy (and we maintain our restriction that a Domain Value is unique within a Domain then we need to create non-overlapping sub-domains with a potential sub-domain being required for each discrete intersection in the domain.
For two attributes the VENN diagram of the possible non-overlapping sub-domains is:

Figure 6: VENN diagram of the possible non-overlapping sub-domains for two attributes
When three Attributes share the Domain the diagram becomes:

Figure 7: VENN diagram of the possible non-overlapping sub-domains for three attributes
Further progressive analysis of three, four, five … etc. attributes reveals that there is actually a exponential progression where the number of distinct Domains (D) required to exhaustively model all the sub-domain combinations for N distinct attributes sharing the Domain can be expressed as:
![]()
which equates to
D = 2N - 1
In any complex data model we would almost certainly also have a super-type / sub-type hierarchy on at least one of our entities. An extreme (real-life) case of such an hierarchy is shown in Figure 8:

Figure 8: Complex super-type/sub-type hierarchy
To fully exhaust all possible Sub-Domains for the above hierarchy we would potentially have to create a possible 255 Sub-Domains ordered into a six tier hierarchy. Admittedly this is an extreme example of a Sub-Type hierarchy but three and four tier hierarchies are not uncommon.
Consequently it is reasonable to say that implementing Sub-Domains is impractical and should not be included in the model. Instead the Domain structure should be a single level flat structure where each Domain contains all the values allowed for that Domain and there is no relationship between one Domain and another.
Instead we allow for the definition of discrete collections of Domain Values against an Attribute by adding an Attribute Value entity as follows:

Figure 9: Adding Attribute Value
The main restriction of the Attribute à Attribute relationship, indicated by the "sub-set" qualifier, is that an Attribute Value must be one of the Domain Values contained in the Domain that the Attribute is linked to.
The essential attributes of the Attribute Value entity are:
Table 5: Essential attributes of the Attribute Value entity
| Attribute | Data Type | Null | Definition |
| #Entity Name | String | No | The Name of the Entity / Class / Program containing the Attribute being validated. |
| #Attribute Name | String | No | The name of the Attribute / Variable being validated within the Entity. |
| #Domain value | String | No | Unique Value within each Domain being validated. |
No other attributes, such as Meaning, are required because these are inherited from the Domain Value attributes of the equivalent Domain Value.
Super-Types and Sub-Types
Single Inheritance
The structure we have defined so far does not however fully ensure the consistency of the Sub-Type hierarchy.
By definition a Sub-Type can only further restrict an inherited Attribute and never relax the constraints therefore it is a requirement of an inherited Attribute that it does not allow more Values than it's immediate Super-Type equivalent allows. For example given the following entity hierarchy:

Figure 10: Single inheritance example
If Entity 1 had a Status attribute then all of its Sub-Types would inherit that attribute and all the restrictions placed on that attribute so, for example, if Entity1:Status allowed {A,B,C.D} then Entity2.Status could not allow {E}, which is an illegal relaxation of the super-type constraint, but it could be further restricted to only allow {A,B,C} and exclude {D}.
The same applies to Entity4, the Status attribute would only be allowed to further restrict the values as defined for Entity2 and by inference Entity1, but as the hierarchy is recursive we only have to check the immediate Super-Type for allowed values.
Consequently in order to correctly validate an inherited Attribute we need to relate Attributes in a super-type / sub-type hierarchy.
We could place a recursive relationship on the Attribute entity to link many sub-type Attributes to the super-type Attribute they are based on. However, in the spirit of normalisation, the Attribute Name part of the Attribute primary key should be the consistent for all Attributes inherited within the type hierarchy. Consequently only the Super-Type Entity is required to identify the super-type Attribute and, in a single inheritance hierarchy where all occurrences of a particular sub-type Entity always have the same super-type Entity, the hierarchy can be keyed on the sub-type Entity Name alone.
This leads us to the introduction of an "Entity" entity into the model with a relationship to many Attributes link to one Entity and a recursive relationship on Entity linking many sub-type Entities to a single super-type Entity as shown below.

Figure 11: Adding Entity
The essential attributes of the "Entity" entity are:
Table 6: Essential attributes of the "Entity" entity
| Attribute | Data Type | Null | Definition |
| #Entity Name | String | No | The Name of the Entity / Class / Program containing the Attribute being validated. |
| Supertype Entity Name | String | No | Reference to the Entity that this Entity is a sub-type of. |
No other attributes are required.
Multiple Inheritance
Having established that we need to add an "Entity" entity with a recursive relationship to link Super-Types to their Sub-Types to deal with single inheritance, we now need to consider the possible requirements for Attributes with multiple inheritance:

Figure 12: Multiple inheritance example
In this case because a Sub-Type must be a instance of the Super-Type and Entity5 has two Super-Types then the allowed values for the Status attribute on Entity5 would be the intersection of the Status attributes for Entity2 and Entity3.
That is, if Entity2 allowed {A,B,C} and Entity3 allowed {A,C,D} then Entitry5 could at most allow {A,C} in order to be a valid instance of both Super-Types, {B} is not allowed because although Entity5 would be a valid Entity2 it would not be a valid Entity3.
In addition this is the simple case of a single hierarchy with all Sub-Types sharing the same ultimate Super-Type. In the very rare case of two Sub-Types inheriting from two non-related Super-Types, as can happen with Object Orientated Design when designers get carried away with Abstract Classes, the problem becomes even more complex as we need to ensure that not only is the Value allowed for each of the Super-Types but also that the Value has the same meaning within the Domain of the Super-Types.

Figure 13: Adding Entity Subtypes
At this point it is time to take a reality check on whether we actually want to support multiple inheritance for Attribute Value validation. Given that this is an implementation model rather than a conceptual design model because I don't think we do! There are two reasons for this conclusion:
Multiple inheritance of an Attribute is not that common in practice and is completely unknown in Physical Data Models as most DBMS's do not support it.
Most programming languages do not
implement multiple inheritance, for example Jave, C, Cobol and VB have
no support whatsoever for multiple inheritance.
Where a language, such as C++, does allow multiple inheritance this is
usually implemented by specifying which of the possible Super-Types a
particular attribute is inherited from essentially reducing the problem
to one of single inheritance.
This is the behaviour that is implied in the example diagram where the
Primary Super-Type is Entity2 and Entity3 is a secondary Super-Type
denoted by a dotted, rather than solid, connector3.
Consequently, because we would almost certainly never have to implement Attribute Value checking for attribute with multiple inheritance I decided to not include it in the data model and stayed with just having a recursive 1:M relationship on the Entity entity.
Conclusion
So far we have developed a robust metadata model for handling point in time validation of attributes in a data model and a number of constraints that need to be applied to the data held in that model to ensure internal consistency.
Namely:

Figure 14: The meta-model so far
With the following basic entity definitions:
Tables 7 - 12: Basic entity definitions
|
||||||||
|
||||||||
|
|
||||||||||||||||
|
||||||||||||||||
|
|
||||||||||||||||
|
||||||||||||||||
|
|
||||||||||||||||||||
|
||||||||||||||||||||
|
|
||||||||||||||||
|
||||||||||||||||
|
|
||||||||||||
|
||||||||||||
|
However the model still has a number of limitations that need to be addressed and in subsequent articles we will expand this model to deal with changing Domain Values over time, validating Transition States and Data Maintenance.
References
In all honesty everything documented in this article is derived from basic principles over a number of different database projects (I produced my first Reference Codes table in Oracle5) or because I simply wanted to know which rules did and did not work.
However the following publications in particular prompted the production of this article and provided me with some background thinking about some of the aspects / requirements of this type of data model:
Date, C.J., What Not How: The Business Rules Approach to Application Development, 2000, Addison-Wesley
Fowler, Martin, Analysis Patterns: Reusable Object Models, 1997, Addison Wesley
Hallock, Patrick, "Dealing with De-Normalization of Lookup Tables", 2000, The Journal of Conceptual Modeling, Issue 13
Footnotes
A Data Model repository, such as Oracle Designer, also serves exactly this purpose. However it is not always feasible or desirable to provide end-user access to the entire Data Model just for the purpose of maintaining Domain Values.
This is desirable behaviour with programming languages such as Java that feature introspection in Classes because if the attribute can discover its own name then it can also dynamically identify the Domain it needs to validate itself against at runtime.
VisioModeler applied the dotted line as a default to denote the secondary Super-Type.
![]()
Adrian Miley is a freelance data architect / database designer with over 15 years experience as an I.T. practitioner and, prior to moving into I.T., a further ten years experience as a Management Accountant. Adrian is a member of the British Computer Society and formerly a member of the Institute of Cost and Management Accountants (now Chartered Institute of Management Accountants and one of the top three Accountancy institutes in the UK). Adrian can be contacted by e-mail at adrian.miley@lineone.net.
![]()
© Copyright, 1998-2004 InConcept
(Information Conceptual Modeling, Inc.) All
Rights Reserved. Privacy Statement.
ISSN: 1533-3825