February 2001        Issue: 18

Journal of Conceptual Modeling
www.inconcept.com/jcm

Natural Language Modeling
by
Patrick Hallock

I just spent a week with John Sharp at his Natural Language Modeling training session. It was an interesting week and my first visit with John in a while. John is an old CDC fellow, like many ORM or NIAM people. Over the years he has developed his own way of looking at natural language modeling approaches and has devised an interesting way to analyze data modeling problems. Actually, I was tired of seeing his analysis puzzles in the JCM and not being able to solve them using his approach. I knew only the first 2-3 steps in the process and just had to learn how to do them for myself. So, off I went to get the full story. I told John straight out that my definition of success at the class was to solve those puzzles. Time will tell if I am a good student. I liked the course and consider it a good set of skills to add to my others. It is always useful to see how others have solved sticky modeling problems. I see some modeling issues differently now and I am the better for it. After all, we all steal the best of each thing we learn and then go forward. In this case I'll share a solution after giving a brief explanation of Natural Language Modeling (NLM).

The NLM approach ensures that you can determine the proper normalized structure for a given valid sentence. It also assumes that the modeler has no knowledge of the subject area. The analyst merely interviews a subject matter expert to get the answers to some questions. These questions require only a Yes or No reply. I suspect that a good set of reports or other documents could substitute for a real live person and the answers could be validated later.

There are 3 questions, asked many times over, and fourteen possible paths. The first question is to gather the results from an instance of the matrix.

A        B        C        D
a1       b1       c1       d1       -- is a valid sentence
Another  b1       c1       d1       Yes or No
a1       Another  c1       d1       Yes or No
a1       b1       Another  d1       Yes or No
a1       b1       c1       Another  Yes or No

Each line in the matrix asks whether another value can be used while all other values are held fixed. For the first row: is it possible that another A, such as a2, can be used with b1, c1, d1? Y/N. Each row in the matrix moves the open position over by one column. The final result is ALL NO, ALL YES, or a MIXTURE OF YES/NO. Each of these three possible results is then processed by its own analysis approach.
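The matrix and its three outcomes can be sketched in a few lines of Python. This is only an illustration, not John's published procedure: the names `question_one_rows` and `classify_answers` are made up here, and the Yes/No answers are assumed to come from the subject matter expert (the values used at the end are hypothetical).

```python
from typing import List, Sequence


def question_one_rows(columns: Sequence[str], sentence: Sequence[str]) -> List[str]:
    """Build one question per column: open that column, hold the others fixed."""
    questions = []
    for i, column in enumerate(columns):
        held = [value for j, value in enumerate(sentence) if j != i]
        questions.append(
            f"Is it possible that {' - '.join(held)} can exist with another {column}?"
        )
    return questions


def classify_answers(answers: Sequence[bool]) -> str:
    """Reduce the Yes/No answers (True means Yes) to one of the three outcomes."""
    if all(answers):
        return "ALL YES"
    if not any(answers):
        return "ALL NO"
    return "MIXTURE"


questions = question_one_rows("ABCD", ["a1", "b1", "c1", "d1"])
for question in questions:
    print(question)

# Hypothetical expert answers, one per question, in column order.
outcome = classify_answers([True, False, False, False])
print(outcome)
```

The outcome string is what selects the next branch of the procedure; the follow-up processes themselves are not modeled in this sketch.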

Question two is always, "Does this exactly identify the specified object?"
Question three is always "Does this partially identify the specified object?"

The procedure then guides you through the correct next step based on the answers. Actually the process is ripe for automation. The results always have a clearly defined next step. A simple tool for the analyst would be a great help. Send money to me and I'll make sure John gets it, minus the usual 90% handling fee.

In order to demonstrate this approach I am presenting one problem and the entire solution with all questions and decisions made by the analyst. The end result will be a normalized set of tables with table names, primary keys, attributes and foreign keys.

Here is my first solution to one of the variable puzzles using the procedure. I will explain each step as I go, making this a longer article in the process.

The initial significant sentence population is:

A   B   C   D
a1  b1  c1  d1
a2  b1  c1  d1
a3  b2  c1  d1
a4  b1  c2  d1
a5  b1  c1  d2

All we need to start is a single true sentence, so we will take the first one and create the matrix for question 1:

A   B   C   D
a1  b1  c1  d1
--  b1  c1  d1  Yes
a1  --  c1  d1  No
a1  b1  --  d1  No
a1  b1  c1  --  No

Is it possible that b1 - c1 - d1 can exist with another A such as a2? Yes
Is it possible that a1 - c1 - d1 can exist with another B such as b2? No
Is it possible that a1 - b1 - d1 can exist with another C such as c2? No
Is it possible that a1 - b1 - c1 can exist with another D such as d2? No

The answers indicate a Mixture of Yes and No.
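For readers who want to check the matrix mechanically, here is a minimal Python sketch that answers question 1 from the sample population instead of from an expert. That substitution is an assumption: it treats the five sentences as the complete population, which a real interview would not. The helper name `another_value_exists` is invented for this illustration.

```python
COLUMNS = ("A", "B", "C", "D")
POPULATION = [
    ("a1", "b1", "c1", "d1"),
    ("a2", "b1", "c1", "d1"),
    ("a3", "b2", "c1", "d1"),
    ("a4", "b1", "c2", "d1"),
    ("a5", "b1", "c1", "d2"),
]


def another_value_exists(population, sentence, open_index):
    """True if some sentence differs from `sentence` only in the open column."""
    for other in population:
        same_elsewhere = all(
            other[i] == sentence[i] for i in range(len(sentence)) if i != open_index
        )
        if same_elsewhere and other[open_index] != sentence[open_index]:
            return True
    return False


start = POPULATION[0]
answers = [another_value_exists(POPULATION, start, i) for i in range(len(COLUMNS))]
for column, answer in zip(COLUMNS, answers):
    print(column, "Yes" if answer else "No")
```

Run against this population, the check reproduces the Yes, No, No, No pattern of the matrix.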

Enter the Mixture Process:

Create the No Sentence (B, C, D):

B   C   D
b1  c1  d1  The valid starting sentence
--  c1  d1  Yes
b1  --  d1  Yes
b1  c1  --  Yes

The result is all Yes, so enter the ALL YES Process.

Ask Question 2: Does this identify the specified object? NO

Why? b1, c1, d1 repeat.
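The No sentence step can be checked the same way. The sketch below assumes the five sample sentences are the whole population and uses hypothetical helper names. It projects the population onto B, C, D, repeats the matrix, and then approximates question 2 by looking for a repeated combination, which is the reason given above for the NO answer.

```python
from collections import Counter

POPULATION = [
    ("a1", "b1", "c1", "d1"),
    ("a2", "b1", "c1", "d1"),
    ("a3", "b2", "c1", "d1"),
    ("a4", "b1", "c2", "d1"),
    ("a5", "b1", "c1", "d2"),
]

# Keep only the No columns B, C, D (positions 1 to 3 of each sentence).
no_sentences = [row[1:] for row in POPULATION]


def another_value_exists(rows, sentence, open_index):
    """True if some row matches `sentence` everywhere except the open column."""
    return any(
        row[open_index] != sentence[open_index]
        and all(row[i] == sentence[i] for i in range(len(sentence)) if i != open_index)
        for row in rows
    )


start = no_sentences[0]  # ("b1", "c1", "d1")
answers = [another_value_exists(no_sentences, start, i) for i in range(len(start))]
print(answers)

# Question 2: a combination that occurs more than once cannot, by itself,
# identify one object.
occurrences = Counter(no_sentences)[start]
print("identifies the object:", occurrences == 1)
```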

Create a binary sentence pairing the known key (A) with each of the other columns, and analyze each sentence.

A   B
--  b1  Yes
a1  --  No

A binary Yes-No gives us A with attribute B: (A,B). A is the key.

A   C
--  c1  Yes
a1  --  No

A binary Yes-No gives us A with attribute C: (A,C). A is the key.

A   D
--  d1  Yes
a1  --  No

A binary Yes-No gives us A with attribute D: (A,D). A is the key.
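Each of the three binary results says that one A goes with exactly one value of the other column, while the reverse does not hold. Assuming the sample population is complete, that reading can be confirmed with a small functional-dependency check; `determines` is a hypothetical helper name, not part of the NLM procedure.

```python
COLUMNS = ("A", "B", "C", "D")
POPULATION = [
    ("a1", "b1", "c1", "d1"),
    ("a2", "b1", "c1", "d1"),
    ("a3", "b2", "c1", "d1"),
    ("a4", "b1", "c2", "d1"),
    ("a5", "b1", "c1", "d2"),
]


def determines(population, key_index, attr_index):
    """True if each key value is paired with exactly one attribute value."""
    seen = {}
    for row in population:
        key, attr = row[key_index], row[attr_index]
        # setdefault stores the first pairing and returns it on later visits.
        if seen.setdefault(key, attr) != attr:
            return False
    return True


results = {}
for attr_index in (1, 2, 3):
    name = COLUMNS[attr_index]
    results[f"A -> {name}"] = determines(POPULATION, 0, attr_index)
    results[f"{name} -> A"] = determines(POPULATION, attr_index, 0)

for dependency, holds in results.items():
    print(dependency, holds)
```

A determines each of B, C, and D, but none of them determines A, which matches the Yes-No pattern in each binary matrix.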

Create the Yes Sentence (A):

a1
--  Is it possible that another A can exist? Yes

Ask Question 2: Does this exactly identify the specified object? Yes

Note: All A's are unique. Each A (a1 - a5) is unique and identifies an A.

So we have an object A.

The result is: A, AB, AC, AD, with A an identifier of an object (Key) in each elementary sentence.

We can put this back together as a relational table A (A, B, C, D) where A is the Key.
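As a rough illustration of that final table, the sketch below builds it in an in-memory SQLite database from Python and loads the five sample sentences. The table and column names come from the puzzle; the choice of SQLite and the TEXT column types are assumptions made only for this example.

```python
import sqlite3

POPULATION = [
    ("a1", "b1", "c1", "d1"),
    ("a2", "b1", "c1", "d1"),
    ("a3", "b2", "c1", "d1"),
    ("a4", "b1", "c2", "d1"),
    ("a5", "b1", "c1", "d2"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (A TEXT PRIMARY KEY, B TEXT, C TEXT, D TEXT)")
conn.executemany("INSERT INTO A VALUES (?, ?, ?, ?)", POPULATION)

# A second sentence for an existing A violates the key and is rejected.
try:
    conn.execute("INSERT INTO A VALUES ('a1', 'b2', 'c1', 'd1')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM A").fetchone()[0]
print(row_count, rejected)
conn.close()
```

The rejected insert is the database-level counterpart of the No answer to "can another B exist with a1?".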

This is a very short description of part of the NLM procedure for doing analysis. The results of analysis can be populated in any type of graphical cartoon (ORM - yea, ER, or even O-O). The NLM contribution is that the subject matter expert precisely specifies the system requirements and the expert can then be held accountable for the specified rules in the implemented system.

John (on his web site: http://www.sharp-informatics.com) has a wonderful little example of a logical design with errors. This is interesting since the model is an IDEF1X standard teaching model. The software you can download lets you examine the model and determine the errors. It's a very small College, Student, and Class model. Since we have all been to school, and the people creating the standard and the teaching example are all highly educated people in the field of modeling, how can there be so many errors? Well, visit John's puzzle in this edition of the JCM and use this example to see if you can get an answer. If not, well then, John has another training class scheduled for April 2-6 [www.sharp-informatics.com].

As I am sending samples of my efforts to John for correction, I am reminded that the method is more important than the right answer. I may just be good at guessing and then be no good when the guess is wrong. Besides, New Mexico is more fun in the winter than Minne-snow-ta. The walk between the training site and the hotel is 1 mile, a good morning and afternoon stroll. You can see forever across the plains to the mountains, a truly lovely sight.

Patrick Hallock is a Senior Partner and Principal Consultant for InConcept. He has over 20 years of ORM/NIAM experience and is a certified ORM consultant, trainer/train the trainer and a certified Visio trainer.

Contact Information:

Patrick Hallock
President and Co-Founder
InConcept
8171 Hidden Bay Trail N
Lake Elmo, MN 55042
path@inconcept.com
(651) 777-8484
fax: (651) 777-9634
http://www.inconcept.com

 

© Copyright, 1998-2004 InConcept (Information Conceptual Modeling, Inc.) All Rights Reserved.
ISSN: 1533-3825