August 2001        Issue: 21

Journal of Conceptual Modeling
www.inconcept.com/jcm

Implementing a Generic Attribute
 Domain Validation Model (Part 2)

by Adrian Miley

As mentioned in part 1, a major limitation of the simple "Code : Description" version of the Domain Entity is that it only caters for values existing at a point in time, that is if a Code is valid then it is assumed to have always been and always will be valid and conversely once a code becomes invalid (and is removed from the Domain) then its absence logically means that it has never been valid.

Of course, this situation rarely matches the business requirements in that nearly all database applications records information over a period of time1 and have to cater for evolving business requirements during that time.

A scenario for this is:

Our Business initially accepts Payment by Cash or Cheque i.e. the Payment Method domain is {Cash, Cheque}, however from the first of next month, the Business will also start taking Credit Cards, that is the Payment Method domain becomes {Cash, Cheque, Credit Card}.

Or alternatively:

Our Business currently accepts Payment by Cash, Cheque or Credit Card but from the first of next month (for whatever reason) the Business will no longer accept Cash Payments.

Assuming that we want to avoid the problems of having to making Domain changes on the actual date that they are effective (and hope the changes are done before the first transaction using the new data is received by the application) the solution is to provide Effective Start and Effective End dates to identify when a Value is valid.

This discussion falls into three parts:

The answer to these business specific questions are the crux of implementing date ranging for Domain Values and Attribute Values.

What should be Date Ranged?

The obvious decision would be to date range everything but this adds a lot of complexity not only to the model but also to any functions that access and maintain the model.

For example, if we decided to use a Domain Date Range, defining the period that the Domain is valid for, and an Attribute Date Range, defining the period that the Attribute is validated against the Domain, then the following are restrictions on the Attribute Date Range:

In a Data Warehouse environment the question of date ranging is less critical because we are predominantly dealing with historic data and the benefit of dealing with history is that once it is established it invariably doesn't change.

However with OLTP databases, mainly supporting the business "from this point in time onwards", the requirements for date ranging and data maintenance are exclusively related to preparing for future events. This is more about supporting intentions rather than actual events and, because intentions may not actually be carried out, it is important to minimise the impact of a change and only implement functionality that is essential and avoid implementing something because "it might be useful someday".

This leads us to define an important boundary to our Universe of Discourse.

In OLTP applications only the current version of the application (software) is important and relevant because that it what defines the universe and what came before or comes after the current universe (other software versions) are irrelevant to the current definition of the universe.

Hence, if we are only recording things that are of relevant and practical benefit2, then the Domain Model, in turn, should only reflect what is relevant within the current reality.

All discussion about whether each entity in the Domain Model should be date ranged is made in this context.

Both Attribute Values and Domain Values require date ranging because this is part of the basic functionality as implied by the example Scenario i.e. to implement a change to the set of Domain Values in advance of the change becoming effective.

Entity does not require date ranging because we would effectively be saying that the existence of any occurrences of the Entity is only valid (allowable) between the two specified dates. This implies that the application should have no knowledge of the Entity, i.e. there are no facilities to insert, update, delete or retrieve the Entity, outside of those dates.

It could even be argued that, from the viewpoint of the application, that if instances of the Entity did exist but the application does not know how to process the data, then the Entity is corrupt data and if accessed this "unknown" data would (should) cause a software error.

Consequently the conclusion should be that within the application an Entity either exists or it doesn't, and if the Entity is not supported by the application then, in order to avoid software problems, all evidence of the Entity should be removed from the database3.

Attributes have two possible date ranges that can be defined:

  1. The period that the Attribute is part of the Entity.

  2. The period that the reference Domain is being used to validate the Attribute changes.

The first type of date range is not required because, using the argument previously given for not having an Entity Date Range, the Attribute either exists in the current universe or it doesn't, it cannot exist for some of the time.

The need for the second type of date range depends purely on whether there is a business requirement to be able to switch Attributes from one Domain to another during the life of the application. This is an often stated enhancement that really should be avoided unless there is a business requirement that makes this an essential requirement.

The main problem with allowing the Domain à Attribute association to be changed is the complexity of the data maintenance rules involved. The more important rules are:

Given that the equivalent result of a Attribute à Domain change can be produced by manipulating the existing Domain Values and Attribute Values I don't believe that we gain anything by allowing Attributes to be switched from one Domain to another.

The conclusion is that Attributes could be date ranged but shouldn't be.

Synonyms do not require date ranges because, by definition, all characteristics of the Synonym are inherited from the Attribute it is associated with.

In addition a Synonym is defined for use by the application software, not the users, and hence should only change when the application is changed. When this happens then the Synonym is just replaced because previous incarnations of the Synonym are now invalid.

Domains do not require date ranging because the use of a Domain is defined by the Attributes associated with it and even if there are no Attributes currently associated with the Domain the Domain should still valid and available for future use. The only exception to this is if the domain is invalid in which case it should be removed because it is not usable. Either way date ranging of Domains is not required.

Based on the above brief arguments the model only includes date ranges for Domain Values and Attribute Values and so are the only entities discussed in the following sections.

Allow different Meaning over time?

Currently in our model the primary key for the Domain Value is Domain Name and Value which, by definition, dictates that a given Value will only ever have one Meaning within the Domain and if the Meaning is changed then the change applies to all occurrences of that Value wherever and whenever it existed.

If the business has total control over the domain definitions then this limitation is not a problem because we simply state that Domain Values will never be reused for anything other than the original meaning. This restriction may not be palatable to the business (mainly because most business users have a problem with IT imposing constraints on their actions) but the Domain Value is just shorthand for the Meaning, so although it would be nice if the Domain Value bore some resemblance to the Meaning, there is no absolute requirement for this to be true.

However if the domain is externally defined4, for example Country Codes defined by the International Standards Organisation (ISO) or Bank Automated Clearing Services (BACS) defined Transaction Codes which appear in automated payment files, then we cannot enforce uniqueness of Value over time or dictate that the Value will only ever have one meaning, although we can be sure that the same code will not have two meanings at the same point in time.

An initial attempt at a possible solution would be to include the Allowable Date Range as part of the primary key for the Domain Value and so allow the Value to occur with multiple definitions over time. However this solution causes many problems with our data model because as well as changing the primary key to the Domain Value and Attribute Value entities it also requires us to record an attribute specific "modification" date against each physical attribute that is validated against the domain.

For example assuming the following Values are defined for a Payment Type domain:

Value

Effective From

Effective To

Meaning

CC

01-Jan-2001

30-Jan-2001

Credit Card

CC

01-Jul-2001

<whenever>

Counter Credit

If we recorded just "Payment Type" against Payment Transactions then when we subsequently tried to retrieve the meaning of the "CC" value we get two possible meanings. In order to identify which of the two possible meanings applied at the time the attribute was updated we also need to record "Payment Type Change Date" as well.

This is just one attribute! We would have to "date stamp" every Attribute for which we subsequently want to retrieve the meaning, each time the Attribute is updated. This would be a significant overhead within the application not only in additional storage requirements but also because the software maintaining the data would have to examine each update transaction and determine exactly which attributes have been modified and set the timestamp accordingly.

An alternative solution for externally supplied domains is to replace the external Domain with an internal Domain and then map the external Domain Values to an equivalent set of Values in the internal Domain:

Internal Value

External Value

Effective From

Effective To

Meaning

CC

CC

01-Jan-2001

30-Jun-2001

Credit Card

CCC

CC

01-Jul-2001

<whenever>

Counter Credit

CC

CR

01-Jul-2001

<whenever>

Credit Card

This works fine if there is a one-to-one mapping between the internal and external domains because we can ensure that a consistent internal value is used i.e. in the above example the internal "CC" value consistently means "Credit Card".

All that aside, in reality even Externally Defined Domains are unlikely to reuse Domain Values5 and in the very rare instance that we do require it we could always enhance the model to introduce mapping for the relevant Domain. Consequently I don't believe there is really a case for allowing any part of our Domain Model to support multiple meanings over time.

Use of surrogate keys - a note

The "non-reuse of domain values" restriction is the argument most commonly used for using system generated ID's rather than pre-defined values to implement enumerated domains. The argument being that if the Values will eventually become meaningless over time we should accept this and use meaningless numbers right at the start. At least we save the headaches of having to come up with decent mnemonic Values.

Although this has some merits and avoids most the problems outlined in this section, it also cancels out many of the usability benefits of having a natural key for navigating into the Domain Validation data plus, of course, we also have a second level of indirection introduced into the hierarchy.

I also prefer to use three / four character String values because in many instances these can be directly reported, without translation, because people are much better at remembering strings than numbers, e.g. compare "Y" and "N" to 0 and 1 (or is it 1 and 0?) for Yes and No.

Allow multiple Date Ranges for Domain Values?

At the Domain Value level the Effective Date Range indicates that the Value is a member of the Domain during that period, was not a member of the Domain before the Effective Start Date and is not a member of the Domain after the Effective End Date. In addition, this constraint cascades to all Attributes associated with the Domain and all Attribute Values (if any) defined against the Attribute.

A scenario for this might be:

Our company currently accepts Credit Card as a Payment Method. However due to a dispute over the commission payable on each transaction the company decides not to accept Credit Cards for any type of Financial Transaction for the foreseeable future. After three months the dispute is resolved (the Credit Card company drops it's commission rate) and Credit Cards are acceptable again6.

This is an across the board exclusion so, although we could manage this as an exclusion against each Attribute associated with the Payment Method domain, it would be more efficient to impose it as a Domain level exclusion.

If imposed at the Domain level then, because the scenario requires the Credit Card to be valid for two different periods within the Payment Method domain, we need the following extension in our model to record the many date ranges that a Value may be valid for:

The relationship is necessarily mandatory at both ends because (logically) the Domain value must have had an initial valid date range. Also we can't use the absence of a date range to assume that it is always valid because this conflicts with using the absence of a particular date to identify that the Value is not valid.

This scenario is also another argument against reusing Domain Values. The reason that a Value may not currently be in use could be due to a business reason rather than it simply being no longer valid. In our scenario if the Credit Card value, held as "CC", is recorded as invalid during the period of the dispute then it could be deemed to be available for re-use. If it has been reused for something else, e.g. CC is used for "Counter Credit", then re-enabling the value for it's original purpose has a cascading impact outside of its immediate scope i.e. the second (re-used) "CC" value has to be changed to remove the conflict. Of course, unnecessary updates should always be avoided if possible.

Allow multiple Date Ranges for Attribute Values?

Before we start on the discussion of whether multiple date ranges should be allowed for Attribute Values we should first discuss how Attribute Values, which are sub-sets of the Domain Values, will be recorded in the model. There are two methods of doing this, which are:

Ideally we would want to use the first method because it is more rigorous but, in practice, this tends to be unworkable for the following reasons:

Consequently, for ease of implementation and maintenance, we will use the Exclusion approach to defining Attribute Value sub-sets.

In addition, because we do not hold Meaning against the Attribute value we can simply add the date range to the Attribute Value entity, with the sub-set restriction switching from Attribute Value à Domain Value to Attribute Value à Domain Value Validity, to produce the following data model:

If our Domain scenario becomes:

Our company currently accepts Credit Card as a Payment Method. However due to a dispute the company decides not to accept Credit Cards as a Payment Method but, for example, will still make Refunds to a Credit Card. After three months the dispute is resolved and Credit Cards are acceptable again.

This is a restriction solely against Payments rather than all types of Financial Transaction so is an attribute restriction and, because we are only recording excluded Domain Values against the Attribute, our exclusion entry is:

Entity Name

Attribute Name

Value

Effective Start Date

Effective End Date

Payment

Payment Method

CC

01-Jan-2001

31-Mar-2001

If the situation then arose again for a different period of time then the entries would become:

Entity Name

Attribute Name

Value

Effective Start Date

Effective End Date

Payment

Payment Method

CC

01-Jan-2001

31-Mar-2001

Payment

Payment Method

CC

01-Sep-2001

31-Dec-2001

Based on this scenario, the conclusion should be that multiple date ranges should be allowed against Attribute Values and our model is changed as described.

Footnotes

  1. Even in real-time databases, such as telephone switching, there is usually a requirement to keep an historic log of each telephone call made for subsequent billing.

  2. A "goes without saying" rule of software engineering! We do not record Employee Income Tax details in an Inventory Control system just because we can conceptualise that the Employee responsible for Goods Inwards may also be paying Income Tax.

  3. This statement is may be a bit contentious (mainly because people don't like throwing information away) but the alternative requires proving the case for retaining what is effectively corrupt data (something I'm not about to do).

  4. Handling external domains is only really a problem if we need to interface with an external application that either supplies or receives data using the domain. If we are simply using a standard set of codes to initialise a domain that the business has subsequent control over then we're under no obligation to implement subsequently conflicting values.

  5. Imagine the chaos if the ISO committee decide that Country Code "UK" now meant "Ukraine" instead of "United Kingdom" or BACS decided that Transaction Code "01" was "Withdrawal" rather than "Deposit" whilst there was a possibility that the previous codes were still in use.

  6. This scenario is actually a "Insert Transition" restriction i.e. for a period of time we no longer want to create Credit Card Payments, but will do for our discussion.

 

Adrian Miley is a freelance data architect / database designer with over 15 years experience as an I.T. practitioner and, prior to moving into I.T., a further ten years experience as a Management Accountant. Adrian is a member of the British Computer Society and formerly a member of the Institute of Cost and Management Accountants (now Chartered Institute of Management Accountants and one of the top three Accountancy institutes in the UK). Adrian can be contacted by e-mail at adrian.miley@lineone.net.

© Copyright, 1998-2004 InConcept (Information Conceptual Modeling, Inc.) All Rights Reserved. Privacy Statement.
ISSN: 1533-3825