March 2004        Issue: 31

Journal of Conceptual Modeling
www.inconcept.com/jcm

Entity Class Classification
And
Entity Relationship Types

By Steve Hitchman

Abstract

This is the second in a series of three papers that examine the relevance of the ER model to practical data modelling.  This paper compares the use of entity class classification with the practical use of entity relationship types.  An information engineering (IE) notation is shown to provide pragmatic support for aspects of the entity relationship (ER) model idea of relationship types.  Entity class classification in the IE notation is examined in the context of a framework proposed for using entity relationship types in design.  The practical use of IE classification seems to result in a clearer design approach than the use of relationship types.  However, there is no apparent design gain in using either of these constructs.  This is because the information in the classification duplicates information specified by relationships.  It is difficult to form a firm conclusion because ER model theory is not clear in practical application.  The suggestion is that the added value of the ER model in the design process is at best weak when compared to Bachman’s entity class design technique.

Introduction

The previous paper in this series (Hitchman, 2004) summarises findings of a strand of research, partly based on an information engineering (IE) notation that does not support ideas from the entity relationship (ER) model:

·         The ER model does not explain available empirical findings from practice.  Instead, data modellers use Bachman’s ideas about entity class diagrams (ECDs).  These ideas pre-date the ER model by fifteen years but are given little prominence in the literature.  Bachman proposed the use of diagrams as a design tool for talking about an underlying data model in terms of entity classes.

·         Bachman’s ideas can be clearly used to explain data modelling practice.  An ECD is used to talk about the data situation involved in an underlying normalised relational design.  This is the provenance of widely used IE notations.

·         Theories from psychology, especially linguistics, explain why a single model construct, the entity class representing a relation, works well when talking about data.  So Bachman’s ideas have a good theoretical base that can be used to understand the technique.

·         Practitioners use a design technique for the relational model; they do not use a different data model.

This strand of research raises questions about what added value the ER model provides in the design process.  This paper takes the argument further by considering a notation from the IE context that goes some way to supporting ideas from the ER model.  The next section discusses the notation extensions in comparison with ER model ideas.  Previous work on the design implications of using n-ary relationships is used as a context for this discussion.

Notation support for weak entities and relationship types.

Chen (1976, p.18) defined: “If relationships are used for identifying the entities, we shall call it a weak relationship type… If some entities in the relationship are identified by other relationships, we shall call it a weak relationship relation”.  Chen (1976, p.20) used a double rectangle box to indicate “the information about entities in this set is organised as a weak entity relation.”  Chen’s definition is not entirely clear where an entity is identified by a combination of relationship and attribute.  Chen (1976, p.20) mixed in with this definition a discussion of existence dependency: “For example, the arrow in the relationship set EMPLOYEE-DEPENDENT indicates that existence of an entity in the entity set DEPENDENT depends on the corresponding entity in the entity set EMPLOYEE.  That is, if an employee leaves the company, his dependents may no longer be of interest.”  Some confusion occurs in the literature between the idea of existence dependency and the idea of a weak entity.  Any mandatory relationship must define ‘existence dependence’, regardless of whether the relationship is used as an identifier.  The problem here seems to be that Chen did not explicitly define mandatory relationships and his examples of weak entities are always also existence dependent.  Therefore it is possible to confuse ‘identifying relationships’ with ‘existence dependency’.  A non-identifying relationship can also be ‘existence dependent’ if it is mandatory.
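The distinction between an identifying relationship and a merely mandatory one can be sketched in a small relational schema.  The sketch below is an illustration only, using Python's built-in sqlite3 module.  The employee and dependent tables follow Chen's EMPLOYEE-DEPENDENT example; the company_car table, all column names and the is_rejected helper are hypothetical and are not taken from Chen or from any of the notations discussed.

```python
import sqlite3

# In-memory database; foreign key checks are off by default in SQLite.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE employee (emp_id INTEGER PRIMARY KEY);

-- Identifying relationship: the foreign key is part of the primary key.
CREATE TABLE dependent (
    emp_id INTEGER NOT NULL REFERENCES employee(emp_id),
    name   TEXT    NOT NULL,
    PRIMARY KEY (emp_id, name)
);

-- Mandatory but non-identifying relationship (hypothetical example):
-- the row has its own identifier, yet cannot exist without an employee.
CREATE TABLE company_car (
    car_id INTEGER PRIMARY KEY,
    emp_id INTEGER NOT NULL REFERENCES employee(emp_id)
);
""")


def is_rejected(sql):
    """Return True if the statement violates an integrity constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False
```

Under these assumptions, a row in either dependent table is rejected when its employee does not exist, so both are existence dependent, although only dependent is identified through the relationship.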

Chen (1976, p.18) gave this reason for differentiating two classes of entity set types: “The distinction between regular (entity/relationship) relations and weak (entity/relationship) relations will be useful in maintaining data integrity.”  Chen needed to classify the entity boxes because he did not have a way of showing identifying relationships in the relationship notation.  Figure 1 shows a Chen weak entity type also with an existence dependency, similar to Chen (1976, Figure 11).  It is obviously pointless to have the two notations unless the arrow (existence dependency) can be used on regular entity types, since a weak entity will always be existence dependent.  A worker is existence dependent on a department (arrow) and is also identified partly by the department identifier, i.e. has an identifying relationship.

 
Figure 1.           Chen’s notation for weak entity types and existence dependence.

Later examples in this section are equivalent to Figure 2.  Here there is an m:n relationship between worker and department.  We are not sure what the attributes of the relationship are (if any) as these were not shown on Chen’s original notation.  The implication is that the department-worker relation will be fully identified by the two relationships.  This identification is problematical for practical modelling, since it only allows us to specify data for a very constrained circumstance.  The worker can only move to a different department.  If the worker moves back to a department they previously worked for, then their original data must be lost – because of the unique identifier created.  Another attribute is generally needed, such as start date, to identify the worker department so that we can track worker departments over time.  This is a fundamental requirement of most business systems.  Businesses almost always want to know ‘what happened’ as well as ‘what’s happening’.  There is, then, some confusion about how to apply this idea of relationship sets to a practical situation.
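The loss of history described above can be illustrated with a short sketch.  It is an assumption-laden illustration, not part of Chen's model: the dictionaries, the assign function and the sample worker and department names are all hypothetical.  One store is identified by the two relationships only, the other adds start date to the identifier.

```python
from datetime import date

# Identified by (worker, department) only, as the m:n relationship implies.
current = {}
# Identified by (worker, department, start date), as practice usually requires.
history = {}


def assign(worker, department, start):
    """Record an assignment in both stores (hypothetical helper)."""
    current[(worker, department)] = {"start": start}
    history[(worker, department, start)] = {"start": start}


assign("w1", "sales", date(2001, 1, 1))
assign("w1", "audit", date(2002, 1, 1))
assign("w1", "sales", date(2003, 1, 1))  # the worker returns to sales
```

After the third call the store keyed only by the relationships holds two rows, and the original sales assignment has been overwritten.  The store that includes start date in the identifier holds all three.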

 
Figure 2            A Chen diagram showing an m:n relationship set.

Several notations support what could be a version of Chen’s idea of ‘weak entities’.  For example, the ERwin (2003) notation, based on IDEF1X (1993), represents relationships with identifying (solid) or non-identifying (dashed) lines.  Entity classes are regular or may be classified into ‘existence dependent’ entity classes.  An existence dependent entity class has at least one identifying relationship, or one mandatory relationship (IDEF1X, 1993, p.3, 2.27).  An identifying relationship is always mandatory (since the foreign key is included in the primary key).  An example of the notation is shown in Figure 3.  The existence dependency information therefore appears twice, once by differentiating the entity class, and again by relationship specification.  It is not clear why this is useful.  There is no theory to support the use of the differentiation in thinking about the design, nor empirical evidence to support the use of this differentiation.  The cost of differentiation is that the user of the diagram has to deal with the more complex situation of two different entity classes.


Figure 3            ERwin Historical Assignment as weak entity.

System Architect (SA, Popkin 2003) also provides a version of the weak entity that works the same way.  An example is shown in Figure 4.  SA also provides a notation that pragmatically supports the idea of relationship types by flagging some entity classes as ‘associative’ types.  The SA notation tool therefore classifies three types of entity classes based on the following rules (Popkin 2003):

A class with at least one mandatory relationship is a weak entity class, shown with a double soft box (e.g. Figure 4).  Note that a solid relationship line indicates that the relationship is included in the identifier, and a dashed line would indicate the opposite.  Information about ‘weakness’ is again displayed two ways, firstly through the classification of the entity class as weak, and secondly through having a mandatory relationship.

A class with a unique identifier that is only derived from relationships is an associative class, shown by a ‘diamond’ in a rectangle (e.g. Figure 5, reflecting the diamond in the Chen notation).  It is the exclusion of any attribute as part of the identifier that results in the classification change.  Note that an associative class can include other optional and non identifying relationships.

Anything else is an entity class, shown with a soft box.

(Sub-types are shown as weak entity classes with one-to-one ‘is a’ relationships.) 


Figure 4            Popkin SA Historical Assignment as weak entity.


Figure 5            Popkin SA Current Assignment as associative entity.

An SA associative entity class therefore has a rough correspondence with the idea of a relationship type in the ER model.  Figure 5 is equivalent to Figure 2, except that the SA notation is a kind of entity class, and the Chen notation is a relationship type.  The SA notation is therefore interesting because it extends the previous discussion, in Hitchman (2003a, 2003b).  In the notation used previously to discuss this issue (Hitchman 2003a, 2003b) the classifications were not available in the notation considered.  All of the details about identifying relationships and existence dependency were derived from relationship notation.  In this situation the practitioners cannot use the ER model idea of relationship types.  Therefore the use of the SA notation may shed further light on how practitioners deal with relationship types when some notation support is available.

Associative entity classes and relationship types

Dey et al (1999) have provided a framework of ‘design implications’ for using relationship types.  Therefore this framework is a very useful context in which to examine the SA construct.  According to Dey et al (1999, p.454, 455) “… there is ample evidence that these higher-degree relationships may occur frequently in some real world situations … provide a more “natural” representation of the real world … The naturalness of the representation depends upon the application being modelled and the users’ perception of it.  Consequently, naturalness is a relative and context-dependent concept. … Empirical studies … suggests that the misrepresentation of relationships is a common error in the design process …”  Dey et al do not present any theory about what constitutes a ‘natural’ representation.  Wand & Weber (2002) point out that there is no theory to support any particular representation being ‘natural’.  The ‘ample evidence’ quoted by Dey et al is not sourced, so we are not sure what they are referring to in respect of frequent occurrence.  This is difficult to understand since virtually no commercial data modelling tools support higher degree relationship notations – how can they occur frequently?  The empirical studies referred to are all on naïve student subjects performing abstract laboratory tasks.  The findings from this research do not generalise to practice (e.g. Hitchman 1999).  In fact, there is no evidence at all that practitioners misrepresent relationships.  It would again be hard to understand how this would occur since few practitioner data modellers use tools that support higher order relationships.  In short, there is no case at all to suppose that higher order relationships are relevant to practice (Hitchman 2003a, 2003b).

SA is implying that an associative entity class represents a relationship set type when the entity class is entirely identified by relationships.  Chen (1976, p.16) discussed the use of unique identifiers for relationship set types by example and implied that a relationship set would be identified only by its relationships:

“Since a relationship is identified by the involved entities, the primary key of a relationship can be represented by the primary keys of the involved entities.”

Chen used the example of ‘employee’ and ‘project’ where an employee can only be assigned to the project once.  Unfortunately, Chen did not discuss the situation when a potential relationship set type was identified by a combination of relationships and an attribute (such as in the ‘assignment’ example where ‘start-date’ is part of the unique identifier).  For example, Chen did not discuss the example where an employee can be assigned to a project several times (with distinct start and end dates) or where an employee could be assigned to several departments over time.  In the example of marriage, we can marry the same person twice with different start and end dates.  Therefore, we are not sure what happens in the ER model in this situation.   

For Dey et al (1999, p.456) “A relationship type (or relationship in short) is a set of similar relationship instances.”  The issue of unique identifiers is not included in this definition of relationship set types.  However, the following ‘design implication’ is given (Dey et al 1999, p.475):

“Since a relationship can be converted to an entity, given a real world concept, the designer needs to decide whether to model it as a relationship or as an entity.  The designer’s objective is to develop a conceptual model that is close to the user’s perception.  In real-world situations, the users’ perception is primarily driven by the existence (or lack thereof) of a unique identifier for the concept… If there is a unique identifier for a real-world concept, the user is likely to view the concept as an entity.  To understand why, recall that only entities have primary keys in the ER model; the concept of a primary key as the unique identifier of a relationship does not exist.” 

It is not clear how Dey et al’s conclusion about primary keys is derived from Chen’s definition of the model.  What Dey et al seem to be saying is that the ER model definition is not clear on this point, and in practical design we can use this pragmatic approach to help make decisions about what constitutes a relationship.  Relationship status should be assigned where the potential entity is only identified by relationships.  Whilst Chen did not make this clear in his original paper, the design implication (not a model definition) is based on the assertion that we discriminate between entities that are identified by attributes, and those that are identified by relationships only.  The mystery is why this is not always the case, and further, why was this not a part of the ER model definition?  The pragmatic SA interpretation of ‘associative’ entity classes therefore corresponds exactly to Dey et al’s design implication for relationship types.

The ‘associative’ entity class in SA seems to be based on a pragmatic interpretation that relationship set types are special because they are only identified by relationships.  Any number of relationships can be involved, so this includes n-ary relationships.  If this is the intention of relationship set types then they are considerably easier to find than Chen (1976, p.10) would lead us to believe with the statement that:

“It is possible that some people may view something (e.g. marriage) as an entity while other people may view it as a relationship.  We think it is a decision which has to be made by the enterprise administrator.  He should define what are entities and what are relationships so that the distinction is suitable for the environment.”   

If Chen intended relationship sets to be restricted to those only identified by relationships, this conflicts with his view of flexibility in deciding what constitutes a relationship set.  If this restriction is always true, then this raises the interesting question of why we need a new semantic construct (relationship set types) for this situation.  What do we gain from making this distinction, and why is it useful?  Do we make this distinction in ‘pure’ or ‘natural’ thinking, and why?

Dey et al (1999, p.475) give another reason for considering identifiers when deciding about relationships, related to the ‘start date’ issue in assignment.  This is the situation where the relationship is not ‘current’ but is ‘over time’, where we need to keep the data about assignments completed over time.  This is a fundamental requirement of most business systems.  Businesses almost always want to know ‘what happened’ as well as ‘what’s happening’:

“There is another practical reason for representing relationships as entities. … the participating entity instances may not uniquely identify each relationship instance. … Since the ER model cannot capture the relationship history, it is necessary to represent the relationship as an entity…” 

This is a difficult position for a ‘natural’ model of thinking about data because using the ER model we do not seem to be able to think about business data in the real world.   

There are three possibilities to explain the situation about relationship types in the ER model.  Firstly, the idea of unique identifiers for relationships is just an omission; we ought to be able to declare a relationship attribute (like start date) as part of the unique identifier for a relationship.  Dey et al support the second possibility: this situation is not a relationship set but an entity set.  We are not sure what theory underpins this.  The third possibility is that the ER model just doesn’t model data in the real world.  In SA there is no issue because the situation can only be modelled one way, using an associative entity class.  The mystery is why the ER model does not explicitly tell us how to deal with this very common situation.  How can the model claim to be a ‘natural’ representation of reality if it cannot clearly model ‘real data’?

Applying Classification To Example Scenarios

Dey et al use three key examples of n-ary relationships.  The first example is a relationship set type called ‘issues’ from doctor, prescription, patient and drug entity types.  This example is shown in Figure 6.  Every prescription must be issued and only once (1,1), so a prescription can only exist if it is also simultaneously issued.  Drug, patient and doctor may never become involved in ‘issues’, or may become involved many times (0,*).  Where an issues relationship exists, participation by all four entities is mandatory.  Dey et al use this notation because participation is always mandatory in a relationship type.  Note that Chen did not originally specify the notation shown; this is one of the many variations proposed since.  In the original notation it was not possible to constrain the prescription relationship, for example.


Figure 6         N-ary Relationship (From Dey et al 1999, p.457, Fig 1)

All of the n-ary situations cannot be modelled in SA because we do not know enough about the unique identifier for the relationship type.  Dey et al provide a design solution, with unique identifiers, using their own design implications.  The SA equivalent, shown in Figure 7, is based on Dey et al’s own design discussion.  This seems to show that relationship types and associative entities are not equivalent.  The issues relationship type merges with prescription.  Prescription then acquires mandatory relationships with doctor, patient, and drug.  In the SA version ‘issues’ simply disappears.  Prescription is not an associative entity class because there is a unique prescription identifier (from the prescription entity type).  The relationships from doctor, patient and drug would not uniquely identify a prescription, although the addition of a timestamp would.  Therefore we could instead create prescription as a weak entity.


Figure 7            SA Regular Entity Class Equivalent

A possible explanation for this conflict is that the example itself is flawed.  Firstly, the ‘issues’ relationship has a natural unique identifier, prescription ID.  This identifier triggers the design implication to promote the relationship to an entity class.  However, this is not an attribute of the relationship, it is itself a relationship.  The interpretation could be that ‘issues’ and ‘prescription’ are really the same thing.  There is no real justification in separating these as an entity class and as a relationship.  This is just a prescription with three relationships to doctor, patient and drug.  The prescription is issued by the doctor, to the patient, for a drug.  It seems self evident that the SA design makes it easier to understand the situation than the original relationship type.  Considering the SA notation leads us to suggest that there is no added value in thinking about ‘issues’ as a relationship at all.  In other words, the SA solution seems a lot easier to conceptualise.  Therefore we are not sure whether this is a valid example to use.  It is also worth commenting on other relationships in this example.  Does a doctor see a patient without issuing a prescription (in a consultation)?  What happens when renewal prescriptions are issued on a consultation for long-term illnesses?  Is there really only one drug on a prescription?  This author has never seen a convincing example of a ‘one’ constrained four-way relationship and this example is no exception.

The second example is shown in Figure 8.  A ternary relationship, ‘advises’ is optional for all three entity classes.  A student is constrained to take part only once in a relationship.  As in the first example, one of the original entity types, ‘student’, can acquire relationships to the other two types, ‘faculty’ and ‘major’.  This is again because of the ‘one’ constraint.  In SA the student would again be a regular entity class and may have a faculty major.  The situation here is therefore like the first example.  


Figure 8            Dey et al (1999, p.459, Fig 3)

Dey et al advise that a better design uses a new entity type ‘advises’; the SA solution is shown in Figure 9.  This has the same primary key as ‘student’ – if a row exists here then it exactly matches one row in ‘student’, and when it exists there are always values for ‘faculty’ and ‘major’.  Using this second design choice, ‘advises’ represents the relationship type as a weak entity class, partly identified by ‘student’.  This suggests that the SA associative class is again not equivalent to a relationship type.  However, another interpretation is that again the SA notation reveals a more understandable solution.  A one-to-one relationship between ‘advises’ and ‘student’ suggests a sub-type pattern.  We can think of the situation from just the student perspective.  Some students have faculty majors, some don’t.  It is that simple.  This is just a partitioning of students, or sub-typing by virtue of relationships owned by types of student with faculty majors.  We can now view ‘advises’ as just ‘advised students’.  This would be shown in SA as a sub-type of student, also as a weak entity through identification.  This author finds this concept considerably simpler than conceptualising about an ‘advises’ relationship involving all three entity classes.  The suggestion is that the use of relationship types, being themselves vague concepts, leads to a vague conceptualisation.  The other thing to note about this example is the lack of history.  How do we keep track of changes to faculty majors?  What happens when a few students change faculty majors?  Can advised students get a particular major from every faculty?  As with the first example, the author has never seen a convincing example of a ternary relationship with a ‘one’ constraint, and this is no exception.
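The sub-type reading of ‘advised students’ can be sketched as follows.  This is only an illustration of the partitioning idea under stated assumptions: the class names, field names and sample values are hypothetical and do not come from Dey et al or from the SA tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Student:
    # Identifier shared by every student, advised or not.
    student_id: str


@dataclass(frozen=True)
class AdvisedStudent(Student):
    # An advised student is still a student, identified the same way,
    # and always has values for both faculty and major.
    faculty: str
    major: str


plain = Student("s1")
advised = AdvisedStudent("s2", faculty="f7", major="history")
```

Here the ‘advises’ relationship does not appear as a separate construct.  Some students simply belong to the sub-type that carries the faculty and major relationships.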


Figure 9            SA Solution for the second example

The third example involves ‘supplier’, ‘project’ and ‘part’ in an n-ary relationship ‘supplies’, shown in Figure 10.  This is the basic ‘regular’ ternary relationship with no unusual ‘one’ constraints.  All three entities can optionally take part in ‘supplies’, and can take part many times.  


Figure 10          Ternary relationship (Dey et al 1999, p.459, Fig 4)

This situation is modelled in SA with an associative class, shown in Figure 11.  In any n-ary situation where the entities have a 0:n relationship, SA would generate an associative entity class.  In this example there is a clear correspondence between SA and the relationship type that encourages us to think that these are equivalent concepts in a practical situation.  The number of model constructs is the same (4), and the idea of classifying the entity class seems to correspond, pragmatically, with the idea of using a relationship type.  Note that the ‘supply’ is now a noun indicating the thing supplied and not a verb, as in ‘supply something’.  This is reasonable from the ER model perspective since the supplied thing is indeed a thing – the thing supplied.  This is an example of the difficulty in applying the ER model to a practical situation since this calls into question the use of a relationship type in the first place!  The identification of supply is only through the relationships and that determines the ‘associative’ classification.  Note that if we choose to add a new identifier then supply would be a regular class; if we choose to include an attribute in the identifier then this will be a weak class.  The associative class only corresponds to the situation where the relationship type is identified only by relationships.  This agrees, therefore, with the design implications of Dey et al, and with the implied definition of Chen.
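The way the classification of ‘supply’ follows from its identifier can be sketched as a small function.  This is one reading of the paragraph above, based only on what makes up the identifier.  It is not the rule set implemented by the SA tool, it ignores the mandatory relationship rule quoted earlier from Popkin, and the function and argument names are hypothetical.

```python
def classify(identifier_relationships, identifier_attributes):
    """Classify an entity class from the composition of its identifier.

    Assumed reading: relationships only -> associative;
    relationships plus attributes -> weak; attributes only -> regular.
    """
    if identifier_relationships and not identifier_attributes:
        return "associative"
    if identifier_relationships and identifier_attributes:
        return "weak"
    return "regular"


participants = ("supplier", "project", "part")

# Supply identified only through its three relationships.
by_relationships = classify(participants, ())
# Supply with an attribute (here a hypothetical start date) in the identifier.
with_attribute = classify(participants, ("start_date",))
# Supply given its own new identifier (hypothetical supply_id).
own_identifier = classify((), ("supply_id",))
```

On this reading the same business concept moves between all three classifications purely as a result of the choice of identifier.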


Figure 11          SA associative class equivalent for n-ary relationship

A final example is shown in Figure 12.  This also illustrates the difficulty in interpreting examples from the literature in practical terms.  Here we have to suspend the idea of ‘home’ and ‘visitor’ teams.  In this scenario there is no home stadium, a fixture between two teams can take place at any stadium.  In practical terms we would need to know what season we were in and what week we were playing.  We would also need to know what time to turn up for the match.  Occasionally, matches will have to be cancelled due to poor weather, or referee illness.  We need to be able to re-schedule matches and know about this status.  Probably we need to know who the referee is supposed to be and what the score was.  Presumably teams have agreed to matches at particular slots, such as Saturday afternoon.  We need to book stadium slots so that we can allocate them to fixtures.  There are all sorts of business rules involved.  We can only create one fixture per week for a team, so we cannot schedule the same team to two fixtures in the same stadium in the same week, for example.  Another example is that each team has to play every other team in the league just once in the season (unless perhaps there is a re-match if the league is tied).  These details are missing from the Dey et al discussion.  One important aspect of this specification is that a team can only exist if it is scheduled to at least one fixture.  This is a very restrictive situation since we cannot ‘know about’ a team until we have fixed them up with at least one match.  This means we cannot keep a list of teams prior to scheduling.  We are not sure what the identifier for the relationship is here (which makes the situation harder to understand).  This is because, in a single season, an identifier can be created from just the teams involved, regardless of the stadium.  Of course, we have the usual problem with the ER model in that we can only represent one season of fixtures.  
It is not possible to model last year’s and next year’s fixtures with this specification.  Therefore, what looks like a simple, clear-cut specification is actually very difficult to interpret.


Figure 12          A symmetric recursive relationship (adapted from Dey et al 1999, p.478 Fig 19)

 The SA equivalent is shown in Figure 13.  The reason that we have reverted back to a weak class is because the fixture can be made before the stadium is chosen.  This makes sound business sense, but creates a real dilemma for the ER situation.  Can we assume that in a relationship type one of the relationships is ‘optional’?  The definitions given both by Chen and Dey et al do not explicitly address the point but imply that all participants must be present.  This is confirmed by the fact that it is not possible to specify optionality in the notation used by Dey et al, whereas in the SA notation it is possible to specify the cardinality situation ‘both ways’ for the class.  If we keep to the assumption that participation is always mandatory then we can never connect a relationship type to an ‘optional’ class.  We have to create ‘plays against’ as a weak class in the ER model to specify this situation.  It is also possible for the business to use a ‘fixture ID’ which makes this a regular entity type.  So far as can be discerned from the ER model definition by Chen, it does not seem to make sense for optionality to alter the nature of the concept from relationship type to entity type.   


Figure 13          SA equivalent for Figure 12

Therefore, in the SA notation we could have any classification and in the ER notation we could have either a relationship type, a regular entity type, or a weak entity type to represent the fixture.  This turns out to be important because it means that we cannot use the classification to help us understand when to look for particular design issues.  The issue here, discussed by Dey et al, is that the team relationships are ‘symmetrical’.  The problem is that we cannot have a fixture with both team A&B and team B&A in the same season.  This is an interesting problem that the designer needs to be aware of, but is only flagged by the ‘double’ relationship with team.  The important point is that classifying or using relationship types will not flag the issue in itself. 
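The symmetry issue can be illustrated with a short sketch.  It shows one common way of treating team A&B and team B&A as the same fixture, by ordering the pair before using it as an identifier.  This technique is an assumption of the sketch and is not taken from Dey et al.  It covers a single season only, and the function names and team names are hypothetical.

```python
def fixture_key(team_a, team_b):
    """Return an order-independent identifier for a pair of teams."""
    if team_a == team_b:
        raise ValueError("a team cannot play itself")
    return tuple(sorted((team_a, team_b)))


fixtures = set()


def schedule(team_a, team_b):
    """Add the fixture unless the same pairing already exists."""
    key = fixture_key(team_a, team_b)
    if key in fixtures:
        return False
    fixtures.add(key)
    return True


first = schedule("Rovers", "United")
second = schedule("United", "Rovers")  # same pairing, other way round
```

The second call is refused because both orderings reduce to the same identifier.  Nothing in the classification of the fixture class, or in its status as a relationship type, would have prompted the designer to add this rule.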

Discussion of the consequences of associative and relationship type equivalence.

It is important to point out that there is some doubt about the existence of ternary relationships in a practical business situation (Hitchman 1999, 2003a).  For example, in Figure 10 we need to check that there are no important relationships between the three regular classes.  Do suppliers only supply certain parts?  If so then we need to know this or we will order parts from suppliers that do not supply them.  We are not sure how common ‘pure’ ternary situations are because the literature currently only discusses invented scenarios, such as the ones used by Dey et al.  In this discussion we have taken the examples largely at face value, but have still found two of them difficult to interpret.

Classification in notations seems to be of limited use in the design process.  Generally, it does not seem to be useful to classify based on identity since this is a minor point in the design process – what is the added value of having different classes in this situation when the relationship specification already clearly shows the information?  In the ER model, on the other hand, relationship types are very limited in their scope and unlikely to occur often in a business situation.  It is therefore very unclear why there is a design advantage in using them.  It seems that relationship types correspond to the SA idea of associative classes, but we cannot be entirely sure because the practical use of relationship types is unclear.  Understanding how SA notation and Chen notation correspond raises interesting questions that extend the previous debate in Hitchman (2003a).

Firstly, is the relationship type just the idea of classification?  We do not know the answer to this question because there is no natural thinking theory developed by Dey et al to use as a guide, and there is no empirical evidence to indicate what happens in practice.  This paper has argued that the distinctions are of little use, but we cannot be entirely sure about this. 

Secondly, how does a relationship set add design value over the use of an SA associative class?   

Thirdly, what is the design value of these ideas anyway?  After considering the SA notation it seems that the ER model is a disguised way of classifying some classes that are used in an ECD.  This seems to be the main contribution of the ER model to the design process already established by the use of ECDs in normalised relational design (as discussed in Hitchman 2003a, 2003b).  In the SA tool, an entity class diagram (ECD) is used, but the classes are themselves classified.  The model users are presented with no choice about what to use: the tool itself allocates either the diamond or weak class symbol.  This is just the idea of how to identify classes.  It is not clear what advantage the user gets from this classification ‘after the fact’.  Put another way, the contribution of the ER model to an established ECD design process seems to be very limited.
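The point about classification ‘after the fact’ can be illustrated with a short sketch.  The rule coded below is a reading of the argument in this paper (a class identified only by relationships is treated as associative, and a class identified partly by relationships is treated as weak).  It is an assumption for illustration and is not taken from SA documentation.  The function name `classify` and the example identifiers are hypothetical.

```python
def classify(identifier_attributes, identifier_relationships):
    """Derive a class symbol purely from how the class is identified."""
    if not identifier_relationships:
        return "regular"          # identified by its own attributes only
    if not identifier_attributes:
        return "associative"      # identified only by relationships
    return "weak"                 # identified partly by relationships


# Hypothetical identifier specifications.
print(classify(["team name"], []))                         # regular
print(classify([], ["supplier", "part", "project"]))       # associative
print(classify(["line number"], ["order"]))                # weak
```

The function takes nothing but the identifier specification as input, so its result cannot tell the designer anything that the relationship specification did not already say.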

Finally, did Chen intend that relationship set types could be related to other entity and relationship set types?  This is not included in Chen’s discussion of the ER model, and Dey et al (1999) do not discuss the issue.  The inference is that it is not possible to relate relations outside of their inherent n-ary relationships.  Obviously in SA this is no problem, because everything has the status of an entity class.  In the ER model this would raise the interesting situation where a relationship set type can itself have relationships with entity set types.  More complex still is the case where two n-ary relationship set types are themselves associated through another n-ary relationship set type.  Would anyone ever understand such a model with relationship types?

Conclusion

Comparing the ER relationship type with a practical notation that is adapted to support it is useful in understanding the practical issues involved.  The pragmatic interpretation by SA seems to correspond with the design implications used by Dey et al to make practical use of relationship types.  In this case the idea of relationship set types seems to be extremely limited.  We are basically classifying the few cases in a model where an entity class is identified only by relationships, or by some relationships.  It is not at all clear why this is an advantage in design.  All of the classes that appear in SA are the same entity classes that appear with a normalised relational design, using ECDs.  The SA notation exposes the fact that the ER model added two minor classifications to using Bachman ECDs with the relational model.  If this is the case, then it is difficult to see why the ER model is any real improvement.  There is no theory about natural thinking to support these classifications, and no empirical evidence to show they are useful.

The third paper in the series examines the use of practical high level conceptual modelling.

References

Popkin (2003) Information from the Popkin website.  For example, several whitepapers available in July, 2003: http://www.popkin.com/customers/customer_service_center/downloads/whitepaper/index.htm

ERwin (2002) Information from ERwin manuals.  For example AllFusion ™ ERwin ® Data Modeler Methods Guide 4.1, Available when ERwin is installed, © 2002 Computer Associates International, Inc. (CA)

IDEF1X (1993) Federal Information Processing Standards Publication 184, 1993 December 21 Announcing the Standard for INTEGRATION DEFINITION FOR INFORMATION MODELING (IDEF1X)

Dey, D., Storey, V.C. and Barron, T.M. (1999) Improving Database Design Through The Analysis Of Relationships. ACM Transactions on Database Systems (24)4 pp. 453-486

Hitchman, S. (1999) Ternary Relationships – to three or not to three, is there a question? European Journal Of Information Systems Vol 8 December pp. 224-231

Hitchman, S (2004) The Entity Relationship Model And Practical Data Modelling, In publication

Hitchman, S. (2003b) An Interpretive Study of How Practitioners Use Entity-Relationship Modeling in a Ternary Relationship Situation, Communications of AIS, 11, 451-485

Wand, Y. and Weber, R. (2002) Research Commentary: Information Systems and Conceptual Modelling - A Research Agenda. Information Systems Research (13)4 pp. 363-376

Steve Hitchman (steve@infomod.fsbusiness.co.uk) has been lecturing and consulting in data modelling issues for over ten years.  Steve is currently managing a team of Data Architects in a government department.  This paper was the result of work carried out during a teaching and research semester at Melbourne University.

© Copyright, 1998-2004 InConcept (Information Conceptual Modeling, Inc.) All Rights Reserved. ISSN: 1533-3825