December 2001        Issue: 23

Journal of Conceptual Modeling
www.inconcept.com/jcm

Sharp Informatics Example Problem
by Dr. John K. Sharp

The following simplified example shows how Natural Language Modeling (NLM) can be used to express three distinct kinds of knowledge that result in 1) database requirements, 2) derivation rules for knowledge presentation to humans or automated rule enforcement, and 3) instructions for humans to follow. Natural Language Modeling always starts with one or more examples of the problem to be modeled. In this case the examples are taken from an instruction manual for security guards and a legacy security application. Subject matter experts are asked to create a true sentence based on the example information, and the results of the analysis are presented here. The NLM procedure will turn any true declarative sentence into one or more instances of valid fact types. For brevity the example is not wholly defined. This analysis technique scales linearly: knowledge is defined only once, so as the scope of a project expands, previously defined knowledge does not need to be reanalyzed.

This example problem is of a physical security area around a building. The security perimeter is monitored by motion- and thermal-detection sensors. The sensors function according to a fixed set of rules. The sensor knowledge is presented to guards according to another fixed set of rules. Guards respond to sensor activity according to a third set of rules. The problem is expressed below in a set of diagrams (drawings and screens) and excerpts from an existing application and a training manual. A detailed analysis of one fact type is presented, and only the summary information is presented for the other fact types.

   

Sensor Application Code (Lines 978 - 989)

//** A sensor light is red when movement is detected and a thermal signature between 95 and 105 is detected at that sensor. A sensor light is yellow when… **//

Guard Training Manual (Page 21)

Guard instruction for responding to the security sensor code alerts:

For a sensor code color of green a guard is to review all sensors once every thirty minutes via video monitors.
For a sensor code color of yellow a guard is to review action at the sensor via video monitors until the reason the sensor turned on is established or the sensor code color returns to green.
For a sensor code color of red a guard is to alert all guards on duty and review action at the sensor via video monitors for at least 15 minutes or until the reason is established.

Guard Training Manual (Page 35)

The following lists provide the ordered appropriate action responses once a reason for the sensor code alert has been established:

single person
1 determine if the perimeter fence has been violated
2 scan the intruder to detect the presence of weapons
3 …….

animal
1 determine if the perimeter fence has been violated
2 …….

The presentation of the analysis results here assists in the understanding of the precision available with the Natural Language Modeling procedure. The presentation is not structured to be an encompassing tutorial that explains all possible analysis results. Nor is it structured to show how valid fact types can be generated from any initial sentence.

Step 1: Verbalization and Highlighting

The analyst asks the subject matter expert to transform a portion of the previous data into sentences that the expert would use in explaining this information to a colleague. This first sentence comes from the Current Sensor State form. The number of data elements that appear in a particular sentence is not important, but there is a requirement that every data element appears in a sentence. This requirement is achieved by highlighting the individual data elements as they appear in sentence(s). This is the initial step used in NLM to establish and/or validate the knowledge rules behind any form, report, graphical information model or legacy application data structure.

The expert answers:
"On 7-16-97 at 16:55, active sensor 3867 detected movement and recorded a thermal signature of 99 degrees F."

The analyst requests another sentence that provides a different data instance for each portion of the original sentence that can vary. The original sentence is a true statement about the subject area under investigation. The second sentence does not need to be a true statement within the subject area, for two reasons: as yet unspecified business rules may prevent some or all of the instances in the original sentence from varying at the same time (for example, because the original sentence is not normalized), or there may simply be no other true statement at this time in which every element varies. The portion of the sentence that is restricted from varying in this analysis is the verb.

The expert answers:
"On 7-18-97 at 7:55, inactive sensor 3473 detected no movement and recorded a thermal signature of 62 degrees F."

Step 2: Placeholder Assignment

The analyst then restates the two sentences and requests that the expert validate the portions of the sentence that are to be assigned placeholders during analysis.

"On 7-16-97 at 16:55, active sensor 3867 detected movement and recorded a thermal signature of 99 degrees F."
"On 7-18-97 at 7:55, inactive sensor 3473 detected no movement and recorded a thermal signature of 62 degrees F."

Step 3: Qualification

The analyst requests names for the class, label, and placeholder location for each varying element.

"Of which class are the elements referred to by 3867 and 3473 a member?"

The expert answers:
"Sensor."

The analyst asks:
"What is the name used (label) for an individual element of the population, 3867, of the class sensor?"

The expert answers:
"Sensor Identifier."

The analyst asks:
"What would you like to name the placeholder for the position where 3867 and 3473 appear in this sentence?"

The expert answers:
"SensorId."

These three questions are repeated for each placeholder. As instances reappear in sentences, only the placeholder name for the corresponding position in the sentence needs to be requested. The analyst concludes this step by creating a candidate fact type for evaluation:

1: On <Date> at <Time>, <Status> sensor <SensorId> detected <MovementReading> and recorded a thermal signature of <Temperature> degrees F.
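As an illustration only, a candidate fact type can be treated as a sentence template in which each placeholder is filled with one instance. The Python sketch below assumes the angle-bracket notation used above; the function names `placeholders` and `verbalize` are hypothetical and are not part of the NLM procedure itself.

```python
import re

# Candidate fact type 1, copied from the text above.
FACT_TYPE = (
    "On <Date> at <Time>, <Status> sensor <SensorId> detected "
    "<MovementReading> and recorded a thermal signature of "
    "<Temperature> degrees F."
)


def placeholders(fact_type):
    """Return the placeholder names in order of appearance."""
    return re.findall(r"<(\w+)>", fact_type)


def verbalize(fact_type, instance):
    """Fill every placeholder with the matching value from `instance`."""
    return re.sub(r"<(\w+)>", lambda m: instance[m.group(1)], fact_type)


# The expert's two sentences, expressed as placeholder populations.
first = {
    "Date": "7-16-97", "Time": "16:55", "Status": "active",
    "SensorId": "3867", "MovementReading": "movement", "Temperature": "99",
}
second = {
    "Date": "7-18-97", "Time": "7:55", "Status": "inactive",
    "SensorId": "3473", "MovementReading": "no movement", "Temperature": "62",
}

for instance in (first, second):
    print(verbalize(FACT_TYPE, instance))
```

Filling the template with either population reproduces the corresponding sentence from Step 2, which is a quick check that no varying element was left without a placeholder.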

Step 4: Fact Type Validation

The first step in the validation of a candidate fact type is to ask if it is ever allowed for a new sentence that differs by only a single variable placeholder to coexist with the original true sentence. Redundancy between the two sentences is ignored because the sentences are not normalized. The results from this step are presented in matrix form as:

1: On <Date> at <Time>, <Status> sensor <SensorId> detected <MovementReading> and recorded a thermal signature of <Temperature> degrees F.

7-16-97   16:55     active    3867      movement   99
-------   -------   -------   -------   --------   -------   Allowed?
another   16:55     active    3867      movement   99        Yes
7-16-97   another   active    3867      movement   99        Yes
7-16-97   16:55     another   3867      movement   99        No
7-16-97   16:55     active    another   movement   99        Yes
7-16-97   16:55     active    3867      another    99        No
7-16-97   16:55     active    3867      movement   another   No

The answers for individual rows in the matrix are generated using the following question. "Given that "On 7-16-97 at 16:55, active sensor 3867 detected movement and recorded a thermal signature of 99 degrees F." is true, is it possible for another valid Date [for example "7-17-97"] to exist such that the fact instance "On 7-17-97 at 16:55, active sensor 3867 detected movement and recorded a thermal signature of 99 degrees F." is true?"
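The matrix rows and the question asked for each row follow mechanically from the sample sentence, so they can be generated. The sketch below is one possible way to do that in Python; the helper names (`fill`, `matrix_rows`, `validation_question`) are assumptions made for this illustration, and the Yes/No answers themselves still have to come from the subject matter expert.

```python
FACT_TYPE = (
    "On <Date> at <Time>, <Status> sensor <SensorId> detected "
    "<MovementReading> and recorded a thermal signature of "
    "<Temperature> degrees F."
)

# The original true sentence as a placeholder population.
SAMPLE = {
    "Date": "7-16-97", "Time": "16:55", "Status": "active",
    "SensorId": "3867", "MovementReading": "movement", "Temperature": "99",
}


def fill(fact_type, values):
    """Substitute each <Placeholder> with its value."""
    sentence = fact_type
    for name, value in values.items():
        sentence = sentence.replace(f"<{name}>", value)
    return sentence


def matrix_rows(sample):
    """One row per placeholder, with only that placeholder set to 'another'."""
    names = list(sample)
    return [
        ["another" if name == varied else sample[name] for name in names]
        for varied in names
    ]


def validation_question(fact_type, sample, varied, example):
    """Build the question put to the expert for one matrix row."""
    changed = {**sample, varied: example}
    return (
        f'Given that "{fill(fact_type, sample)}" is true, is it possible for '
        f'another valid {varied} [for example "{example}"] to exist such that '
        f'the fact instance "{fill(fact_type, changed)}" is true?'
    )


for row in matrix_rows(SAMPLE):
    print(row)
print(validation_question(FACT_TYPE, SAMPLE, "Date", "7-17-97"))
```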

The NLM analysis procedure uses the Yes/No answer vector to either generate new candidate fact type(s) or to validate that the current fact type is an elementary fact type (normalized fact type). This analysis procedure depends only on the Yes or No answers provided by the subject matter expert. Thus, the subject matter expert is fully accountable for the resulting information model. This example problem will not be fully analyzed using the NLM analysis procedure here. This shortcut is taken to focus on the results of the analysis for this problem. One of the elementary fact types contained in sentence 1 will be presented in the remaining analysis steps in order to provide an example for each step.

Step 5: Pattern Recognition

One of the elementary fact types generated from sentence 1 has the pattern:

FT-1: On <Date> at <Time> sensor <SensorId> is <Status>.

Step 6: Diagramming

The knowledge generated about this elementary fact type can be presented graphically as:

 

FT-1 diagram

Step 7: Additional Population Restrictions

The NLM analysis procedure assigns the referential integrity (foreign keys) required among the elementary fact types, but other population restrictions need to be investigated once an elementary fact type is established. The type of additional population restriction evaluated here is the derivation rule. This population restriction is specified by the subject matter expert through the expert's response to the following question: "Is a status instance always or sometimes derived in the "On <Date> at <Time> sensor <SensorId> is <Status>." fact type?"

For this question the expert would respond No. If the response to this question is Yes, then the expert would be requested to describe the derivation rule using other fact types defined in the analysis. (An illustrative case of this will be found in the discussion of FT-4 below.)

Step 8: Summary

The results from the previous steps can be presented in a condensed form as:

FT-1: On <Date> at <Time> sensor <SensorId> is <Status>.

On 7-16-97 at 16:55 sensor 3867 is active.
"7-16-97" is a Date of a Day.
"16:55" is a Time of a Day.
"3867" is a Sensor Identifier of a Sensor.
"active" is a Status of a Sensor State.

On <Date> at <Time> sensor <SensorId> is <Status>.

7-16-97   16:55     3867      active
-------   -------   -------   -------   Allowed?
another   16:55     3867      active    Yes
7-16-97   another   3867      active    Yes
7-16-97   16:55     another   active    Yes
7-16-97   16:55     3867      another   No

Are instances of this fact type derived? No
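One way to read the FT-1 matrix is as a uniqueness rule: for a given Date, Time and SensorId, only one Status may be recorded. The sketch below shows how that reading could be enforced if FT-1 were mapped to a relational table. It uses Python's standard `sqlite3` module; the table name, column names and helper functions are hypothetical choices for this example, not output of the NLM procedure.

```python
import sqlite3


def build_ft1_table():
    """Create an in-memory table for FT-1 with (date, time, sensor) as key."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE sensor_status (
            reading_date TEXT NOT NULL,
            reading_time TEXT NOT NULL,
            sensor_id    TEXT NOT NULL,
            status       TEXT NOT NULL,
            PRIMARY KEY (reading_date, reading_time, sensor_id)
        )
        """
    )
    return conn


def allowed(conn, row):
    """Try to store a fact instance; return False if the key rejects it."""
    try:
        conn.execute("INSERT INTO sensor_status VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


conn = build_ft1_table()
allowed(conn, ("7-16-97", "16:55", "3867", "active"))           # original sentence
print(allowed(conn, ("7-17-97", "16:55", "3867", "active")))    # another Date
print(allowed(conn, ("7-16-97", "16:55", "3867", "inactive")))  # another Status
```

Under this mapping the database accepts the three rows the expert answered Yes to and rejects the row answered No.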

 

Remaining Fact Type Summaries

Other significant elementary fact types will be provided in summary form so that the results of the NLM analysis can be discussed.

FT-2: On <Date> at <Time> sensor <SensorId> detected <MovementReading>.

On 7-16-97 at 16:55 sensor 3867 detected movement.
"7-16-97" is a Date of a Day.
"16:55" is a Time of a Day.
"3867" is a Sensor Identifier of a Sensor.
"movement" is a Movement Reading of a Movement Sensor.

On <Date> at <Time> sensor <SensorId> detected <MovementReading>.

7-16-97   16:55     3867      movement
-------   -------   -------   --------   Allowed?
another   16:55     3867      movement   Yes
7-16-97   another   3867      movement   Yes
7-16-97   16:55     another   movement   Yes
7-16-97   16:55     3867      another    No

Are instances of this fact type derived? No

 

FT-3: On <Date> at <Time> sensor <SensorId> has a thermal signature of <Temperature> degrees.

On 7-16-97 at 16:55 sensor 3867 has a thermal signature of 99 degrees.
"7-16-97" is a Date of a Day.
"16:55" is a Time of a Day.
"3867" is a Sensor Identifier of a Sensor.
"99" is a Temperature of a Thermal Sensor.

On <Date> at <Time> sensor <SensorId> has a thermal signature of <Temperature> degrees.

7-16-97   16:55     3867      99
-------   -------   -------   -------   Allowed?
another   16:55     3867      99        Yes
7-16-97   another   3867      99        Yes
7-16-97   16:55     another   99        Yes
7-16-97   16:55     3867      another   No

Are instances of this fact type derived? No
 

FT-4: On <Date> at <Time> sensor <SensorId> violates rule <RuleNo>.

On 7-16-97 at 16:55 sensor 3867 violates rule 1.
"7-16-97" is a Date of a Day.
"16:55" is a Time of a Day.
"3867" is a Sensor Identifier of a Sensor.
"1" is a Rule Number of a Rule.

On <Date> at <Time> sensor <SensorId> violates rule <RuleNo>.

7-16-97   16:55     3867      1
-------   -------   -------   -------   Allowed?
another   16:55     3867      1         Yes
7-16-97   another   3867      1         Yes
7-16-97   16:55     another   1         Yes
7-16-97   16:55     3867      another   Yes

Are instances of this fact type derived? Yes
The instance values and the parameter names for each point in time in FT-2 and FT-3 are passed through the rule checker FT-5 to determine if the sensor is in violation of any rule. This derived fact may or may not be permanently stored in the resulting application.

FT-5: Rule <RuleNo> requires <ParameterName> to be <OperatorName> <ParameterValue>.

Rule 1 requires temperature to be greater than 96.
"1" is a Rule Number of a Rule.
"temperature" is a Parameter Name of a Parameter.
"greater than" is an Operator Name of an Operator.
"96" is a Parameter Value of a Parameter.

Rule <RuleNo> requires <ParameterName> to be <OperatorName> <ParameterValue>.

1         temperature   greater than   96
-------   -----------   ------------   -------   Allowed?
another   temperature   greater than   96        Yes
1         another       greater than   96        Yes
1         temperature   another        96        Yes
1         temperature   greater than   another   No

Are instances of this fact type derived? No

Note: the other parts of Rule 1 are:
Rule 1 requires temperature to be less than 105.
Rule 1 requires movement reading to be equal to "movement."
This implementation of rules in FT-5 mixes meta-data (e.g. temperature) and data (e.g. 96). Fact types 2 and 3 could be rewritten as: On <Date> at <Time> parameter <ParameterName> has parameter value of <ParameterValue>. This was not done in this case because for most subject areas the business rules change more often than the types of data that are used in the rules.
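The derivation of FT-4 from FT-2, FT-3 and FT-5 can be sketched as a small rule checker in which the rules are held as data. The Python below is a minimal sketch under one stated assumption: a sensor is taken to "violate" a rule when every part of that rule listed in FT-5 holds for the readings at a single point in time. The three parts of Rule 1 are copied from the text above; the names `RULE_PARTS`, `OPERATORS` and `violated_rules` are hypothetical.

```python
import operator

# Operator Name -> comparison function.
OPERATORS = {
    "greater than": operator.gt,
    "less than": operator.lt,
    "equal to": operator.eq,
}

# FT-5 instances: (RuleNo, ParameterName, OperatorName, ParameterValue).
RULE_PARTS = [
    (1, "temperature", "greater than", 96),
    (1, "temperature", "less than", 105),
    (1, "movement reading", "equal to", "movement"),
]


def violated_rules(readings, rule_parts=RULE_PARTS):
    """Return the rule numbers whose parts all hold for these readings.

    `readings` maps a Parameter Name to the value recorded for one sensor
    at one Date and Time (the FT-2 and FT-3 instances).
    """
    satisfied = {}
    for rule_no, name, op_name, value in rule_parts:
        holds = name in readings and OPERATORS[op_name](readings[name], value)
        satisfied[rule_no] = satisfied.get(rule_no, True) and holds
    return sorted(rule_no for rule_no, ok in satisfied.items() if ok)


# Readings for sensor 3867 on 7-16-97 at 16:55.
readings = {"temperature": 99, "movement reading": "movement"}
for rule_no in violated_rules(readings):
    print(f"On 7-16-97 at 16:55 sensor 3867 violates rule {rule_no}.")
```

Because the rule parts are plain data, adding, changing or deleting a rule in this sketch means editing `RULE_PARTS`, not the checking function.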

FT-6: A rule <RuleNo> violation requires a <LightColor> sensor light.

A rule 1 violation requires a red sensor light.
"1" is a Rule Number of a Rule.
"red" is a Light Color of a Sensor Light.

A rule <RuleNo> violation requires a <LightColor> sensor light.

1         red
-------   -------   Allowed?
another   red       Yes
1         another   No

Are instances of this fact type derived? No

FT-7: A <HigherLightColor> sensor light has priority over a <LowerLightColor> sensor light.

A red sensor light has priority over a yellow sensor light.
"red" is a Light Color of a Sensor Light.
"yellow" is a Light Color of a Sensor Light.

A <HigherLightColor> sensor light has priority over a <LowerLightColor> sensor light.

red       yellow
-------   -------   Allowed?
another   yellow    Yes
red       another   Yes

Does red, yellow at any moment in time identify exactly one light color having priority over another light color? Yes
Are instances of this fact type derived? No

FT-8: On <Date> at <Time> sensor <SensorId> has a sensor light color of <LightColor>.

On 7-16-97 at 16:55 sensor 3867 has a sensor light color of red.
"7-16-97" is a Date of a Day.
"16:55" is a Time of a Day.
"3867" is a Sensor Identifier of a Sensor.
"red" is a Light Color of a Sensor Light.

On <Date> at <Time> sensor <SensorId> has a sensor light color of <LightColor>.

7-16-97   16:55     3867      red
-------   -------   -------   -------   Allowed?
another   16:55     3867      red       Yes
7-16-97   another   3867      red       Yes
7-16-97   16:55     another   red       Yes
7-16-97   16:55     3867      another   No

Are instances of this fact type derived? Yes
The results from FT-4 derivation rule are used along with FT-6 and FT-7 to determine the sensor light color.
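A minimal Python sketch of this derivation is given below. It looks up a light color for each violated rule (FT-6) and keeps the color that no other candidate has priority over (FT-7). Several details are assumptions made only for the illustration: rule 2 and its yellow light are hypothetical, and green is assumed to be the color when no rule is violated. The names `RULE_LIGHT`, `PRIORITY` and `light_color` are likewise hypothetical.

```python
# FT-6 instances: RuleNo -> LightColor.
# Rule 1 -> red comes from the text; rule 2 -> yellow is hypothetical.
RULE_LIGHT = {1: "red", 2: "yellow"}

# FT-7 instances: (HigherLightColor, LowerLightColor).
PRIORITY = {("red", "yellow")}


def light_color(violated_rules, default="green"):
    """Derive one sensor light color from the rules violated at one moment.

    `default` is an assumption: the color shown when nothing is violated.
    """
    candidates = {RULE_LIGHT[rule_no] for rule_no in violated_rules}
    if not candidates:
        return default
    # Keep the colors that no other candidate color has priority over.
    top = [
        color for color in candidates
        if not any((other, color) in PRIORITY for other in candidates)
    ]
    if len(top) != 1:
        raise ValueError(f"priority facts do not single out one color: {top}")
    return top[0]


print(light_color([1]))     # rule 1 violated
print(light_color([1, 2]))  # both rules violated, red has priority
print(light_color([]))      # nothing violated, assumed default
```

Raising an error when the priority facts leave more than one candidate makes a gap in the FT-7 population visible instead of silently picking a color.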

FT-9: On <Date> at <Time> sensor <SensorId> has an intrusion reason of <ReasonText>.

On 7-16-97 at 16:55 sensor 3867 has an intrusion reason of single person.
"7-16-97" is a Date of a Day.
"16:55" is a Time of a Day.
"3867" is a Sensor Identifier of a Sensor.
"single person" is a Reason Text of an Intrusion Reason.

On <Date> at <Time> sensor <SensorId> has an intrusion reason of <ReasonText>.

7-16-97   16:55     3867      single person
-------   -------   -------   -------------   Allowed?
another   16:55     3867      single person   Yes
7-16-97   another   3867      single person   Yes
7-16-97   16:55     another   single person   Yes
7-16-97   16:55     3867      another         Yes

Are instances of this fact type derived? No

FT-10: A <ReasonText> intrusion reason has step <StepNumber> that states: <InstructionText>.

A single person intrusion reason has step 1 that states: determine if perimeter fence has been violated.
"single person" is a Reason Text of an Intrusion Reason.
"1" is a Step Number of an Intrusion Step Number.
"determine if perimeter fence has been violated" is Instruction Text of an Instruction.

A <ReasonText> intrusion reason has step <StepNumber> that states: <InstructionText>.

single person   1         determine ….
-------------   -------   ------------   Allowed?
another         1         determine ….   Yes
single person   another   determine ….   No
single person   1         another        No

Are instances of this fact type derived? No
Note: FT-10 has two identifiers. One is Intrusion Reason and Step Number. The other is Intrusion Reason and Instruction.
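The note above suggests how an answer vector for an elementary fact type can be read: each placeholder answered No is determined by the remaining placeholders, so those remaining placeholders form an identifier, and if every answer is Yes the whole combination is the identifier. The Python sketch below implements only this reading, which is an interpretation consistent with the note and not a statement of the full NLM procedure. The function name `identifiers` is hypothetical.

```python
def identifiers(names, answers):
    """Read candidate identifiers off a Yes/No answer vector.

    `names` are the placeholder names of an elementary fact type and
    `answers[i]` is the expert's answer when only names[i] is varied.
    """
    dependents = [name for name, answer in zip(names, answers) if answer == "No"]
    if not dependents:
        # Every placeholder may vary alone: the full combination identifies.
        return [tuple(names)]
    return [
        tuple(name for name in names if name != dependent)
        for dependent in dependents
    ]


# FT-10: Yes, No, No gives the two identifiers named in the note.
print(identifiers(["ReasonText", "StepNumber", "InstructionText"],
                  ["Yes", "No", "No"]))

# FT-1: Yes, Yes, Yes, No.
print(identifiers(["Date", "Time", "SensorId", "Status"],
                  ["Yes", "Yes", "Yes", "No"]))
```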

FT-11: A <LightColor> sensor light requires a guard to take the action of <ActionText>.

A green sensor light requires a guard to take the action of review all sensors every thirty minutes via video monitors.
"green" is a Light Color of a Sensor Light.
"review all …" is Action Text of an Action.

A <LightColor> sensor light requires a guard to take the action of <ActionText>.

green     review all …
-------   ------------   Allowed?
another   review all …   Yes
green     another        Yes

Does green, review all… at any moment in time identify exactly one light color requiring a guard to take the action? Yes
Are instances of this fact type derived? No
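Because both answers in the FT-11 matrix are Yes, a light color may require several actions and an action may apply to several colors, so the population is naturally a list of pairs. The Python sketch below stores the guard instructions from page 21 of the training manual in that form. Splitting the red instruction into two separate actions is an assumption made for this example, and the names `GUARD_ACTIONS` and `actions_for` are hypothetical.

```python
# FT-11 instances: (LightColor, ActionText), taken from the training manual.
# Splitting the red instruction into two actions is an assumption.
GUARD_ACTIONS = [
    ("green", "review all sensors every thirty minutes via video monitors"),
    ("yellow", "review action at the sensor via video monitors until the "
               "reason the sensor turned on is established or the sensor "
               "code color returns to green"),
    ("red", "alert all guards on duty"),
    ("red", "review action at the sensor via video monitors for at least "
            "15 minutes or until the reason is established"),
]


def actions_for(light_color, population=GUARD_ACTIONS):
    """Return every action a guard must take for the given light color."""
    return [action for color, action in population if color == light_color]


for action in actions_for("red"):
    print(f"A red sensor light requires a guard to take the action of {action}.")
```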

FT-12: A <LightColor> sensor light exists.
A green sensor light exists.
"green" is a Light Color of a Sensor Light.

A <LightColor> sensor light exists.

green
-------   Allowed?
another   Yes

Does green at any moment in time identify exactly one light color? Yes
Are instances of this fact type derived? No
 

All of the previous fact types are displayed in the following relational diagram.


FT-1 to FT-12 diagram

This simple example shows how both data rules (tables, columns, keys, etc.) and business rules (when to turn the sensor light red, what the response steps are, etc.) are modeled using NLM. The benefit of modeling the entire set of knowledge is that the business rules can be specified as data to the application, and they can be maintained by the expert without coding changes being required to implement a new rule or to modify or delete an existing rule. This structure is valuable because in most businesses the business rules (e.g., a director has to approve all purchases over $10,000.00) change more often than the bulk of the data maintained (e.g., the knowledge specified on a purchase order). Questions arise from this analysis that should be addressed before the model is finalized, including whether Action Text and Reason Text need to be prioritized. This complete documentation of knowledge enables the automatic generation of manuals and training programs that use the precise rules that are implemented in the application.

The subject matter expert(s) can answer each of these questions and then the answer may be validated by any qualified expert. This up-front precision in information requirements greatly improves communication of requirements among the subject matter experts, analysts, and implementers. The improved communication of requirements virtually eliminates all of the rework and unfinished projects that are due to misunderstandings of the requirements among various participants. The knowledge captured using the NLM procedure can be documented in an ORM or NIAM CASE tool or mapped directly into relational tables.

Dr. John Sharp is the founder and principal consultant for Sharp Informatics. Before starting Sharp Informatics in 1997 he was employed by Sandia National Laboratories in Albuquerque, NM for 18 years. While at Sandia he held staff and management positions in all areas of information technology, including analysis, design, implementation, maintenance, information architecture, data administration, and information technology research. He has worked closely with Prof. Shir Nijssen of The Netherlands to improve the NIAM analysis methodology. Dr. Sharp is the creator of the first information analysis procedure known to be mathematically precise. This procedure reformulates the usual (imprecise and inaccurate) statements and examples from a subject area into verified fact types. The output of this productivity-enhancing process (a set of information requirements) is compatible with all the latest and most productive database application creation tools. John is the editor of the international standard for conceptual schemas. He has co-chaired two international conferences on natural language modeling and he has presented numerous papers and seminars at professional conferences.

Contact information:

Dr. John Sharp
Sharp Informatics
1604 Vassar SE
Albuquerque, NM 87106
sharp@sharp-informatics.com
505-243-1498
fax 505-248-0345
http://www.sharp-informatics.com

© Copyright, 1998-2004 InConcept (Information Conceptual Modeling, Inc.) All Rights Reserved.
ISSN: 1533-3825