March 2002        Issue: 24

Journal of Conceptual Modeling
www.inconcept.com/jcm

Validating Information Models
by John Sharp       
Sharp Informatics Albuquerque, NM

The JCM email group has spent considerable effort trying to understand the differences among a number of modeling approaches.  The old dogs in these discussions are all experts in NIAM modeling.  The big discussions in the early days of NIAM were between NIAM and ER models.  There was no contest for leadership in modeling direction; ER had support in the United States from government, universities and industry.  Apparently widespread use of ER was not sufficient to control the future direction of analysis, because OO came along and has made significant inroads in the area of data modeling.  The argument over whether the model should be NIAM, ER, OO, ORM, LDS or any other modeling methodology continues.  All of these methodologies define a way of presenting the results of analysis.  None of them has a procedure for doing analysis.

The knowledge contained in a data model comes from the subject matter experts in a company.  Subject matter experts cannot read the models presented in any of the previously mentioned methodologies.  I will show my NIAM bias by declaring that the ORM presentation of sentence rules comes the closest.

My approach to data modeling is to start with a true sentence that is supplied by a subject matter expert and then derive the data model rules by asking the subject matter expert questions about the sentence.  The Natural Language Modeling (NLM) procedure provides a way to take knowledge from subject matter experts and create correct data models.  The presentation of the data model results can use the methodology that supports the implementation of the application.  

An interesting variant of the NLM procedure is to take a model presented in some methodology and validate the model using the NLM procedure.  A model developed in a tool that supports any of these methodologies will have a "Yes" or "No" for each of the NLM questions.  A subject matter expert then answers "Yes" or "No" to the same questions.  When the answers are different, the NLM procedure is used to determine the proper data model.  Once the revised model's answers agree with the subject matter expert's answers, the model is validated.
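The comparison step described above can be sketched in a few lines of Python.  This is only an illustration under stated assumptions: the function names and the two sample questions are hypothetical and are not taken from the NLM procedure itself, which defines its own questions.

```python
from typing import Dict, List


def find_disagreements(model_answers: Dict[str, bool],
                       expert_answers: Dict[str, bool]) -> List[str]:
    """Return the questions where the model and the expert answer differently."""
    unmatched = set(model_answers) ^ set(expert_answers)
    if unmatched:
        # Both sides must answer the same set of questions.
        raise ValueError(f"questions answered by only one side: {sorted(unmatched)}")
    return [question for question, answer in model_answers.items()
            if answer != expert_answers[question]]


def is_validated(model_answers: Dict[str, bool],
                 expert_answers: Dict[str, bool]) -> bool:
    """A model counts as validated once no answers disagree."""
    return not find_disagreements(model_answers, expert_answers)


# Hypothetical questions and answers (True = "Yes", False = "No").
model_answers = {
    "Can a part instance have more than one serial number?": False,
    "Can two part instances share the same serial number?": True,
}
expert_answers = {
    "Can a part instance have more than one serial number?": False,
    "Can two part instances share the same serial number?": False,
}

# Each question listed here is one to take back to the subject matter expert.
to_review = find_disagreements(model_answers, expert_answers)
```

Each entry in to_review marks a place where the model and the subject matter expert disagree, which is where the analysis work resumes.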

The NLM procedure is a procedure for doing analysis.  It is not a set of rules for the presentation of analysis results.  The opportunity in using the NLM procedure is to create a precise set of requirements that is correct the first time, and to make the subject matter expert accountable for the implementation of these requirements.

 The graphical presentation of the resulting knowledge can be in a suitable object-oriented or relational methodology.  Having a precise model and having someone who is accountable for the knowledge content of the model allows for "real" engineering opportunities in the information technology arena.

 Recent NLM Examples:

A manufacturer of complicated parts was going to build a new database for some of its limited life components.  These components could be reused if certain conditions were met.  The parts had been tracked using a glorified spreadsheet that had been developed over a number of years.  The clerks who managed this information told us that one of the parts was more confusing than the others, so it was saved until last.  The internal analysts and I both knew how to model an instance of a part.  All of us had built systems that use the ‘Model Number’ and ‘Serial Number’ to identify an instance of a part.  This approach was confirmed as we analyzed each of these limited life parts, until we got to the last one.  The clerks had tried to explain why this last part was different, but they could not present a clear explanation of these rules.

We created true sentences about this last part and started using the NLM procedure.  The clerks started giving answers that differed from their answers to similar questions asked about the other parts.  When someone gives an answer that is outside of my universe of modeling expertise I usually say ‘That answer is INTERESTING.’   [This is similar to a nurse looking at your chart and saying ‘That number is IMPRESSIVE.’  This means that you should already be dead!]  The meaning of my response is that I do not believe your answer.  I then help the subject matter experts to create two more true sentences and these sentences are analyzed.  In this case the clerks gave consistent answers to the same questions when they were presented differently.  We finally established that a part family type and a serial number defined a part instance for this part.  Once this was understood all of the confusion was gone.  It turned out that a part that was reused would have a fitting removed and the part would be recertified.  A new fitting could then be installed and this could change the part number.  The reuse of the base unit was still controlled through the use of the ‘Part Family Type.’
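A small Python sketch shows why the choice of identifier matters here.  All record values are invented for illustration, and I assume that the part number plays the role of the ‘Model Number’ in the standard approach; neither the field names nor the data come from the client's system.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class PartRecord:
    part_family_type: str
    part_number: str
    serial_number: str
    note: str


def group_records(records: List[PartRecord],
                  key: Callable[[PartRecord], Tuple[str, str]]
                  ) -> Dict[Tuple[str, str], List[PartRecord]]:
    """Group records by the identifier that 'key' builds for each record."""
    groups: Dict[Tuple[str, str], List[PartRecord]] = defaultdict(list)
    for record in records:
        groups[key(record)].append(record)
    return dict(groups)


# Hypothetical history of ONE physical base unit that was reused:
# the fitting was replaced, so the part number changed.
history = [
    PartRecord("FAM-A", "PN-100", "S001", "original build"),
    PartRecord("FAM-A", "PN-200", "S001", "recertified with new fitting"),
]

# Standard approach: part number + serial number.
by_part_number = group_records(
    history, lambda r: (r.part_number, r.serial_number))

# Rule established with the clerks: part family type + serial number.
by_family = group_records(
    history, lambda r: (r.part_family_type, r.serial_number))
```

Under the standard identifier the one base unit appears as two unrelated part instances, so its limited life history is split.  Under the part family type identifier both records stay with the same instance.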

Would I suggest this data model result if I were starting with a new production control system? No.  But with an existing system you must understand the known rules before you can begin to suggest changes, and if the owners do not want to change you must be able to support their current rules.  The internal analysts told me that they would have built the system using the standard approach for identifying part instances if we had not used the NLM procedure.  The rework costs could have been significant for this project.  If you want to know how hard it is to take your universe of modeling expertise out of analysis, try the example problem (March 2002 Analysis Problem) that I put in each issue of this journal.  Most analysts who do not know the NLM approach have a very difficult time getting the correct answer to these problems.  When real world values replace the variables, most good analysts can get the correct answer.  Our real world biases are supported in most modeling efforts, but we need to know when a subject matter expert is challenging them.

The NLM training course usually spends three days working example problems that provide an understanding of specific parts of the NLM procedure.  I have changed the last two days from working structured problems to validating an attendee’s data model.  In my last course, David Hay (who besides being an excellent data modeling author is a very talented analyst) brought a pipeline model that he was in the process of completing.  He played the role of the subject matter expert and we validated his model.  It was not surprising that each Entity/Object had been identified correctly.  On two of the questions about attributes in his model the model said ‘Yes’ and Dave was not sure enough to confirm the ‘Yes’ answer.  He called his client subject matter expert and asked the two questions.  The subject matter expert instantly answered both questions ‘No.’  Dave said he would not have asked these questions without the NLM structured procedure.  He found two alternate identifiers that he had missed.

Try Validating a Data Model: 

If you think that you can now catch all of the errors in your data models, try the example below.  You should be able to read the IDEF1x (ER) model and I assume that you have gone to college, so you have some subject matter expertise.  This example is published in the standard for IDEF1x modeling.  The creators of this ER modeling methodology all had one or more graduate degrees and were the experts in IDEF1x modeling (They created it!).  How many errors would you say that this model should contain?  Zero would be a good answer.  Any subject matter expert (with no data modeling experience) can find all nine errors when the NLM procedure is used for validation.  If you cannot find all nine errors in fifteen minutes or less, go to my website www.sharp-informatics.com and download the example problem that shows the NLM validation of this model.  [While you are there please use your real name and email address, so I can send you information on future NLM courses.  You can even select not to be notified about future courses.]  When you print out the IDEF1x example problem give it to several of your colleagues and see if they can find any errors.  I will be surprised if anyone finds an error.

 Learning about the NLM Procedure:

 I am not interested in converting any analyst to the NLM procedure instead of their current preferred graphical method.  NLM doesn’t even have a graphical version.  I am interested in providing another tool in your analyst tool kit that will improve the quality of your delivered models.  Finding just a few errors in one model that are corrected before the application is built will save a lot more than the cost of learning about the NLM procedure.

 I am offering my next NLM course the week of May 13, 2002.  You can learn more about this course and the early registration discount at my website www.sharp-informatics.com

Dr. John Sharp is the founder and principal consultant for Sharp Informatics. Before starting Sharp Informatics in 1997 he was employed by Sandia National Laboratories in Albuquerque, NM for 18 years. While at Sandia he held staff and management positions in all areas of information technology, including analysis, design, implementation, maintenance, information architecture, data administration, and information technology research. He has worked closely with Prof. Shir Nijssen of The Netherlands to improve the NIAM analysis methodology. Dr. Sharp is the creator of the first information analysis procedure known to be mathematically precise. This procedure reformulates the usual (imprecise and inaccurate) statements and examples from a subject area into verified fact types. The output of this productivity enhancing process (a set of information requirements) is compatible with all the latest and most productive database application creation tools. John is the editor of the international standard for conceptual schemas. He has co-chaired two international conferences on natural language modeling and he has presented numerous papers and seminars at professional conferences.

Contact information:

Dr. John Sharp
Sharp Informatics
1604 Vassar SE
Albuquerque, NM 87106
sharp@sharp-informatics.com
505-243-1498
fax 505-248-0345
http://www.sharp-informatics.com

© Copyright, 1998-2004 InConcept (Information Conceptual Modeling, Inc.) All Rights Reserved. ISSN: 1533-3825