August 2005        Issue: 36

Journal of Conceptual Modeling
www.inconcept.com/jcm

Logical Navigation in the Concept-Oriented Data Model
By Alexandr Savinov

The paper describes logical navigation in the concept-oriented data model. This model explicitly and formally separates physical structure and logical structure so that each element of the model is simultaneously a collection and a combination of other elements. The physical structure is used to representing and access by elements by means of references. The logical structure is used to reflect the problem domain dependencies. The two-level model considered in the paper consists of a set of concepts and a set of items. Concept structure defines the model syntax while item structure defines its semantics. In the paper it is shown how the properties of the model can be used for logical navigation where we do not need to specify join conditions or other complicated parameters of queries.

 1.    Introduction

One of the main achievements of the relational model of data proposed in [Codd, 1970; Garcia-Molina et al., 2003] is that it provides a solution of the problem of physical navigation, which existed in the previous models. However, it is recognized [Kent, 1981] that it fails to solve the problem of logical navigation. This means that the user still has to specify concrete access paths in order to get a correct result set. In other words, even though a data item is not physically bound to some access path we still need to encode a logical access path in each query which describes how this item is reached. In SQL query language this means the necessity to have long and complex join conditions to be a part of each query. Such join statements specify all intermediate tables and the corresponding join conditions. In contrast, if the mechanism of logical navigation is incorporated into a data model then we need only to specify constraints for the source data items and indicate the type of target items possibly with some additional hints in the case of complex data structure or ambiguous queries. All the rest is then done by the database itself. In particular, it will propagate the constraints and build intermediate relationships between data items.

For example, assume that we have a data model consisting of 100 data types. Then we might issue a query like "retrieve all 'Students' related to 'Prof. Smith' via course 'Data models'". Here the database should understand that we are talking about Students, Instructors and Courses. It also understands that only two items have to be considered ('Prof. Smith' from Instructors and 'Data models' from Courses) and these constraints need to be appropriately propagated. After that the database has to identify all intermediate types that relate the source types with the target type. And then it has to automatically build all necessary join conditions. Finally it builds the access path which will retrieve the requested data items. If we were using SQL then we would have to do all this ourselves with all possible types of errors.

This problem of automating logical navigation is directly considered in the universal relation model (URM) [Fagin et al., 1982; Kent, 1981; Maier et al., 1984]. The idea of this model is that it is possible to introduce the notion of the universal relation which will contain the complete model semantics. Once we have such a universal representation (implicitly or explicitly) we can answer all queries without the need to specify explicitly all numerous complex join conditions. In this approach it is assumed that all attribute names have unique names with global meaning and for each set of attributes there is one basic relationship that the user has in mind (called the connection). Although the URM solves some problems its assumptions and their consequences are frequently not satisfied even for relatively simple and useful database configurations.

Another solution for the problem of logical navigation is provided by the Microsoft WinFS system which is based on the Object-Role Modelling (ORM) approach [Halpin, 1999].

Items in WinFS are related via facts which are relationship instances. Once data items have been connected the system answer simple queries by finding related data items.

In this paper we describe how the problem of logical navigation is solved in the concept-oriented model (COM) [Savinov, 2004]. This model makes several assumptions of the nature of data and its semantics. One principle similar to that postulated in URM is that the whole model is viewed as one global construct with concrete canonical syntax and semantics. In other words, we do not consider parts of the model (like tables) as primary constructs which then can be used to build different new constructs. Instead, we see the model as one global construct with some constituents.

We also assume that an important characteristic of any system or model is its dimensionality (degrees of freedom). In the concept-oriented model we go further. It is not only multidimensional but also hierarchical. This means that the model as whole is characterized by its primitive or canonical dimensionality as well as by the rank, which is the maximal depth of dimension hierarchy. Using this approach we can express the canonical semantics of the model in terms of points in the universe of discourse where each point is represented by some data item. In the sense of hierarchical dimensionality modelling this approach is similar to OLAP [Berson & Smith, 1997].

In order to describe such a multidimensional and hierarchical space we order all concepts by means of subconcept-superconcept relation. In this case each concept has a number of superconcepts (identified by the dimension names) and a number of subconcepts. The directed acyclic graph of concepts is complemented by the top concepts and the bottom concept, which represent the most abstract and the most detailed levels of the model. This method is similar to that used in concept-lattices and formal concept analysis (FCA) [Ganter & Wille, 1999] and ontologies [Fensel, 2004].

Once the syntactic structure of the model has been defined it can be used to automate query formation and data access. In other words, if all the concepts in the model are connected and the semantics is defined by their items we can use this structure to easily get related items. No additional information like join criteria, manual projection and constraint propagation is needed. For logical navigation in the concept-oriented model we propose the mechanism of dimensions and subdimensions (inverse dimensions). Dimensions are always single-valued and lead to superconcepts in the graph. Subdimensions are always multiple-valued and lead in the opposite direction to subconcepts. By applying them consecutively we can build an access path. Such an approach is close to that used in the functional data model (FDM) [Shipman, 1981; Gray et al, 1999; Gray et al, 2004]. Access paths and multidimensional queries can be defined as derived properties of concepts and then used just like normal dimensions.

In order to demonstrate the mechanism of logical navigation we will use an example shown in Fig. 1. It is assumed that there exist a number of users who may create auctions and make bids. Each auction is created at some date by some user and for some concrete product from some category. Each auction bid has some price, is made by some user at some date for some auction.

Fig. 1. At the logical level the problem domain is represented by a directed acyclic graph of concepts where each concept has a number of superconcepts and a number of subconcepts.

2.    Physical Structure of the Model

Each element in the concept-oriented model is a physical composition of other elements. By physical composition we mean that the element directly stores other elements (by value or by reference) and knows its own composition while the stored elements may not know directly where they are stored and what other elements use them. We separate two dual types of element composition: collections and combinations.

Collection is a number of elements interpreted as a set. We denote a collection by enclosing its elements into curly brackets: C = {c1,c2,K,cn}. This type of collection is called a physical collection in order to distinguish it from logical collections described below. Elements within a collection are identified by means of its references.

It is important that physical collections in the concept-oriented model are used to provide physical access to all elements and this is why they have a hierarchical structure. This structure is then used to implement references which uniquely identify elements within collections. A complex reference is a sequence of segments where each segment identifies an element within its physical collection. An element cannot change its parent physical collection during its life time, i.e., it is created, manipulated and deleted within one parent physical collection.

We distinguish several types of model depending on their physical structure:

          One-level model consists of one root physical collection which contains a set of
data items. This model is interesting only from theoretical point of view.

          Two-level model consists of one root element which physically includes a set of
concepts which in turn include a set of items. This model will be considered in this
paper.

          Multi-level model have an arbitrary hierarchy which starts from one root element
and then may physically include any number of internal elements. This model is
considered as the basis for the concept-oriented mark-up language which is not
considered in this paper.

Two-level concept-oriented data model has a physical structure consisting of one root element R, which consists of a set of concepts R = {C1,C2,K,Cn}, each of which in turn

consists of a set of data items Cj ={i1,i2,K}.

A physical structure of the two-level model is shown in Fig. 2. It consists of one root element which includes 8 concepts which in turn include data items called also concept instances. Note that data items cannot change their physical position, i.e., if a data item has been created within concept Products, it will belong to this concept until it is deleted. This guarantees that references to this item will guarantee access to it at any time.

One subtle moment here is that the physical structure itself cannot use references because its main function consists in implementing the mechanism of references. Thus the physical structure of the concept-oriented model is supposed to provide a mechanism of representation and access for all other needs. And this is why we cannot use this mechanism for representation of the problem domain laws.

Fig. 2. Physical structure of the problem domain for the two-level model. Each concept is physically included into the root element and each item is physically included into one concept.

3.    Logical Structure of the Model

In addition to physical collection we consider also combinations as a method of composition of other elements. Combination is a number of elements interpreted as an object. We denote a combination by enclosing its elements into angle brackets: O = (ouo2,...,on). If an element belongs to a combination then we write it as follows:

ele C, i = 1,2,K,n . Elements within a combination are identified by means of their positions.

Positions are known in advance and are used for accessing and manipulating data in queries or programs but they cannot be stored in data. References are not manipulated explicitly but they are stored as the contents. (This has some exceptions such as well known references for bootstrapping purposes.) Shortly, positions are elements of code (of program, query or other type of code), while references are elements of data (of objects, records or other types of data).

It is one of the main original assumptions of the concept-oriented paradigm that any element has two parts:

          it is a physical collection of other elements (collectional part), and

          it is a combination of other elements (combinational part).

Thus it can be represented as a pair of collection and combination:

E = {cl,c2,...,cm}(ol,o2,...,on). One or both parts can be empty. Normally we will follow a

convention that elements with non-empty collectional part will be written from the uppercase letter while elements with non-empty combinational part will be written from the lowercase letter.

It is important that in contrast to the physical collections the combinational part of any element can be changed during its life time. Thus the definition of any element as a combination of other element may change if we change the model. Moreover, most syntactical and semantic properties of the COM are determined by the combinational part of its elements while collectional part is used to implement the mechanism of physical representation and access. The properties of the model defined by combinational part of its elements are called the logical structure because we can easily change these properties independent of the hierarchical physical structure. Note that an element does not know directly what elements are included into its logical collection (their references are not stored in this element description).

Fig. 1 shows logical structure of the two-level model where concepts are boxes with arrows pointing to the concepts in their combinational part. For example, element Auctions is a combination of three other elements of this model Users, Dates and Products although physically all the four elements are physically included into one common root element. Later we can change this combination, e.g., by defining Auctions as a combination of four other elements from the model and hence we change the logical structure of the model while at physical level the structure does not change because all these elements are members of the root.

From the point of view of logical structure each element is considered as a combination of some other elements (and this composition may change at any time). It is an important principle of the the concept-oriented model that it provides a dual interpretation of the combinations as logical collections and logical membership in collections (as opposed to physical collections and physical membership). If element C is a member of combination O = (...,C,...) then O is a member of the logical collection C: C = {K,O,K}. If we draw collection C and combination O as two points then the two types

of membership can be shown as two opposite arrows: Oe C and Ce O, where the former means membership in a logical collection, and the latter means inclusion into acombination. We will follow a convention where combinations are written below their elements and collections are drawn above their elements.

For example, element AuctionBids in Fig. 1 is a combination of elements Prices, Users, Dates and Auctions. Dually, element Users is a logical collection of elements Auctions and AuctionBids. Thus incoming arrows from all elements below the current element denote elements from the logical collection. All outgoing arrows to all elements above the current element denote elements from the combination. Data items have the same structure as concepts except that they physically live in concepts rather than in the model root. In particular, an item is defined as a combination of other items but dually it is a collection of items where it participates in combinations. For example, a user item is a logical collection of several auction items and several auction bid items because both auctions and auction bids include user in their combinational part.

4.    Syntax and Semantics of the Model

Syntax of the two-level model is described by the logical structure of its concepts while semantics is described by the logical structure of its items. In other words, at syntactic level we define the combinational part for concepts while at the semantic level we define the combinational part of items.

A concept is combination of other concepts from this model:

C = (C1,C2,...,Cn)

Concepts C1,C2,K,Cn are called superconcepts of rank 1. Concept C is called subconcept

of rank 1. For example, Categories is a superconcept for concept Products, and Products is a subconcept for concept Categories.

The concept definition establishes a subconcept-superconcept relation between concepts:

Cj §C , j = 1,2, K,n

Cycles in this relation are not permitted. (In practice we can easily permit loops with special assumptions.)

If a concept does not have superconcept then it is assumed to be a top concept T. If a concept does not have a subconcept then it is assumed to be a bottom concept B. Direct subconcepts of the top concept are called primitive concepts. Top concept is a direct or indirect superconcept for all other concepts, i.e., all concepts are logically included into the top concept collection. Bottom concept is direct or indirect subconcept for all other concepts. For example, in Fig. 1 there are four primitive concepts Prices, Users, Dates and Categories; and AuctionBids is the bottom concept.

If one of two concepts is a direct or indirect parent of another then they are called syntactically dependent or parallel concepts; otherwise the two concepts are referred to as syntactically independent or orthogonal. In particular, all primitive concepts are syntactically independent.

Each superconcept has a uniquely identified position within a concept definition, which is called dimension (of rank 1 or local dimension):

C = {xi : C1,x2 : C2,K,xn : Cn)

Here superconcepts C1,C2,K,Cn are called domains for dimensions x1,x2,K,xn:

Cj =Dom(xj). All dimensions x1,x2,K,xnare unique and are used for access while some

concepts within the combination (C1,C2,...,Cn) can be the same, i.e., dimensions may

have one and the same domain. Normally dimensions (concept positions within a combination) are identified by names or integer values. For example, concept AuctionBids has four dimensions price, user, date, and auction which identify upward arrows leading to the corresponding domain superconcepts.

A dimension of rank k is a sequence of k dimensions of rank 1 separated by dots where each next dimension in the sequence belongs to the domain of the previous one

x1 .x2.L.xk, where xy§Dom(xy l), j = 1,2,K,n

Dimensions will be frequently prefixed by the first concept corresponding to the first dimension in the sequence:

Dom(x°).x1.x2.L.xk

Also we frequently will use the terms dimension and domain interchangeably.

Primitive dimension has primitive domain (of the last element in the sequence). The number of all primitive dimensions of a concept is referred to as the concept primitive dimensionality. The primitive dimensionality of the model is that of the bottom concept. The maximal rank of a primitive dimension of a concept is referred to as the concept rank. The rank of the model is that of the bottom concept. Thus each model is characterized by its dimensionality (width of multidimensional space) and its rank (depth of the hierarchy).

For example, concept AuctionBids has a dimensions AuctionBids.auction.product.category which has rank 3.This dimension is primitive because its domain Categories is a primitive concept. This model has dimensionality 6 because the bottom concept AuctionBids has 6 primitive dimensions. It has rank 3 (for comparison, the flat multidimensional space has rank 1 because it consists of primitive concepts and one common subconcept with all primitive dimensions).

Dually, each concept is a logical collection of its subconcepts:

C={S1,S2,K,Sn

Here C is a superconcept of rank 1 and S1,S2,K,Sn are subconcepts or rank 1. For

example, concept Dates is a logical collection of Auctions and AuctionBids (which are syntactically dependent or parallel concepts).

A subdimension is an inversed dimension, i.e., a dimension with the opposite direction. We denote subdimensions by enclosing the corresponding dimension into curly brackets. If x1.x2.L.xk is a dimension of rank k then {x1.x2.L.xk} is a subdimension of the same

rank k. In contrast to dimensions which identify superconcepts, the role of subdimensions is dual - they identify subconcepts. The domain of subdimension is the very first concept

in the sequence, i.e., if {Dom(x°).x1.x2.L.xk} is a subdimension then Dom(x°) is its

domain. For example, concept Products has one subdimension of rank 1 {Auctions.product} with the domain in Auctions and one subdimension of rank 2 {AuctionBids.auction.product} with the domain in AuctionBids.

The number of subdimensions of a concept with the domain in the bottom concept is referred to as the concept primitive subdimensionality. The primitive subdimensionality of the model is that of the top concept. The model primitive dimensionality is equal to the model primitive subdimensionality because each primitive subdimension is an inversed primitive dimension. For example, the top concept has 6 subdimensions with the domain in AuctionBids.

Such a structure of concepts can be represented by a concept graph where nodes are concepts and edges are instances of the subconcept-superconcept relation Ce Cj

identified by dimensions. Each path in the concept graph consisting of k edges leads from the source concept to a superconcept of rank k. Such a path is interpreted as a dimension of rank k, which is identified by a sequence of k local dimensions. The model dimensionality and subdimensionality is the number of paths from the bottom to the top and from top to the bottom, respectively. There may be several different paths between a concept and a superconcept.

Semantics of the two-level model is described by the structure of its items. In general case without syntactic constraints an item is a combination of other items or nulls if they are allowed Items i1,i2,K,in are called superitems of rank 1. Item i is called subitem of rank 1. Thus each item is a combination of its superitems and a logical collection of its subitems.

In the presence of syntactic constraints each item (physically) belongs to one concept where it is called an instance of this concept. In this case each item can combine only superitems from its superconcepts (rather than any other items):

i = </1,/2,...,/„>, where ieC = (Cl,C2,...,Cn) and ij^Cj

Using syntactic constraints we can effectively restrict possible items of a concept. For example,

So what is an item semantically? In the concept-oriented approach each element is physically characterized by its reference (identifier in the parent physical collection) and logically it is characterized by a combination of its superitems. Thus it is a combination of other items that provides a meaning for an item. In other words, each combination of items in the model has its unique meaning. In this sense, in order to get an item semantics we need to retrieve its superitems, however, these items themselves have their meaning encoded in other items and so on. In general case in the concept-oriented model we follow a principle of globality of semantics which means that each item has its semantics distributed all over the model (all over other items). Hence the question is what part of the whole model semantics we want to retrieve.

5.    Access Paths and Derived Properties

If ieC is an item and x = x1.x2.L.xk one of dimensions of concept C then i.x = i.x1.x2.L.xk returns a superitem of rank k referenced by item i via intermediate items along the path x. For example, dimension i.product.category of concept Auctions might return 'Cars' if auction item i references product 'Porsche' which in turn reference category 'Cars'.

If i is an item from concept C and {x} = {x1.x2.L.xk} one of its subdimensions of rank k

with the domain S then i.{x} = i.{x1.x2.L.xr} returns a collection of subitems of rank k from

the subconcept S where each item from this collection references item i via intermediate items along the path x: i.{x} = {seS\s.x = i}. For example, subdimension

{Auction.product.category} of concept Categories will return collection {1, 2} of auctions for a category of Dogs.

An access path is a sequence of segments P1.P2.L.Pr separated by dots where each

segment Pj is either a dimension or a subdimension. Each next segment is applied to all items returned by the previous segment and the result is their union. If all segments are dimensions then the result is one item. If there is at least one subdimension in the access path then the result is a collection.

An access path consists of upward and downward segments in the concept graph. Upward segment corresponds to a dimension leading to a superconcept while downward segment corresponds to a subdimension leading to a subconcept. Thus access path can change its direction and has a zigzag form. For example, we can fix a product item, then find its auctions, for which find their users and finally find the bids of these users. Such an access path for a product item i is written as a sequence of 3 segments:

i.{Auctions.product}.user.{AuctionBids.user}

A subdimension in an access path may have constraints, which restrict a set of subitems. The constraints are specified by predicate f which returns true or false for each subitem

from the domain of the subdimension:

i.{s : S.x | f(s)} = {seS\s.x = i& f(s) = true}

Note that the predicate f itself describes constraints by using dimensions, subdimensions

or arbitrary access path. For example, in the last access path we might want to restrict auctions selected in the first segment by only the latest ones:

i.{a in Auctions.product | a.date==today}.user.{AuctionBids.user}

This access path starts from some product item, then selects today's auctions for this product, then finds their users for which returns all their auction bids. To restrict auctions we introduced an instance variable a, which takes values from concept Auctions. It is important that these restrictions on selected auctions do not influence the concept Auctions so that the next segments can still use this concept with all its items. (In the next section it is shown how we can restrict items in the concept itself, which leads also to automatic constraint propagation.)

The mechanism of access path is a convenience method which provides a simple mechanism for accessing data semantics (more general query method is described in the next section). It is especially useful for defining derived properties of concepts. A derived property is a named definition of a query formulated for a concept constrained by one item when executed. For example, we might define a derived property as follows:

C = (xl: C1,x2 : C2,K,xn : Cn,prop = this.x\.a.b.{S.u.v}.c.d)

Here we define a concept with n normal dimensions and one derived property. This property starts from the first dimension of the current item (denoted by keyword this) and returns a set of items according to the access path a.b.{S.u.v}.c.d. In this way we can

reformulate the earlier considered query as a concept property:

Products.property1 = {Auctions.product}.user.{AuctionBids.user}

This property can be used like any other dimension or subdimension and it will return bids of users of auctions with given product. It is very convenient because we can directly apply other properties to this property or to use aggregation functions. For example, this property can be used to define the second property:

Products.property2 = avg(this.property1.price)

It returns mean price of all bids returned by the first property. In the next example, we would like to get all bids for an auction:

Auctions.bids = {AuctionBids.auction};

This simple property returns a collection of bids and it can be applied to any instance variable taking its values from a collection. For example, we might define a property that returns the maximal bid for an auction (or null):

Auctions.maxBid = max(this.bids.price);

Here max is an aggregation function applied to a collection of prices for bids returned by already defined property. This property can be used to get an auction with the current maximal bid price:

{a in Auctions | a == this.maxBids);

Another example is where we want to get the mean price for each category. This property can be defined as follows:

Category.meanPrice =

avg( this.{AuctionBids.auction.product.category}.price );

This property finds a collection of all bids for this category using a subdimension of rank 3 and returns their prices. This collection of prices is then passed to the aggregation function as a parameter. We can modify this query so that it returns the mean price for today:

Category.meanPriceForTenDays = avg( {ab in AuctionBids |

ab.auction.product.category == this && ab.auction.date == today }.price );

Now let us suppose that we want to find all categories corresponding to one user. In this case we select one user and then this constraint is automatically propagated down till the bottom concept. This means that the database includes now only subitems of the selected user in concepts Auctions and AuctionBids. After that it is necessary to select a collection of categories. However, in this example the path cannot be chosenautomatically because there are two alternatives: either to return categories corresponding to auctions (of this user) or to return categories corresponding to auction bids (of this user). Thus the query

Users.categories = {c in Categories};

is ambiguous and cannot be resolved. In order to provide a hint to the database we can specify the necessary constraint propagation path explicitly:

Users.auctionCategories = {c in Categories | Auctions.user == this }; Users.bidCategories = {c in Categories | AuctionBids.user == this };

Here it is clear that we want to propagate the constraints till the concept Auctions sin the first query and till the concept AuctionBids in the second query.

If it is necessary to find mean prices of user categories then we can simply combine the properties:

Users.auctionCategoriesMeanPrice = Users.auctionCategories.meanPrice; Users.bidCategoriesMeanPrice = Users.bidCategories.meanPrice;

These two queries will return a collection of prices rather than a collection of categories.

The same approach can be used to impose semantic constraints on any concept. The idea is that using the derived properties we specify a predicate that must evaluate to true for each data item. For example, if we want to prevent users from making bids for their own auctions then it could be formulated as the following constraint:

AuctionBids.myConstraint = (this.user != this.auction.user);

Constraints can be thought of as normal derived properties that return logical values rather than items or collections of items. The only difference is that they are checked automatically in order to maintain consistency.

As we already mentioned the mechanism of access paths is s simple and efficient method for defining queries and derived properties. Each next collection along the path is built from the previous collection. For logical navigation this mechanism is especially convenient because it does not require any joins. Instead of manually joining this mechanism used the syntactic structure of the model, i.e., the multidimensional and hierarchical structure of concepts in order to build a result collection. This structure is also used to automatic constraint propagation when we can impose constraints in one concept and then retrieve related items from another concept. All the rest is done by the underlying concept-oriented database engine.

6.    Multidimensional Queries

The mechanism of access paths builds new collections by selecting items from another collection, which acts as a one-dimensional universe of discourse for the new collection. In general case we frequently want to build a new collection by selecting items from a multidimensional universe of discourse. In this case the universe of discourse to be used is described by specifying all the source collections. The Cartesian product of these source collections is then considered the universe of discourse for a new collection produced by a query.

Multidimensional query has the following format:

C = {c1 eChc2 eC2,...,cn eCn |f(c1,c2,K,cn)}

Here in the first part before the bar we describe a set of source concepts and their instance variables, which define the universe of discourse of this query n = C1 xC2x...xCn

where each possible item is a combination of individual items ω = <c1,c2,...,cn)eQ . In the

second part after the bar symbol we specify constraints imposed on the items from the universe of discourse as a predicate f. The result collection includes items from the universe of discourse for which the predicate is true: C = {»eQ|/(») = true}. It is

important that the new collection is a subconcept for all the source concepts/collections or, dually, the source concepts are superconcepts for the new collection. In particular, the source collections have to be built and exist before the items for the result collection can be chosen (before instance variables can be instantiated).

For example, if we want to get all combinations of dates and categories then we specify concepts Dates and Categories as source collections of the query:

{d in Dates, c in Categories}

Elements of collection are represented by means of their own references, which are unique for each new collection. The items from the result collection then provide access to the original items. However, frequently we need to include some additional information as new dimension values for each item in the collection. This can be done by specifying them after the multidimensional query in angle brackets:

C = {c1 eCx,c2 eC2,...,cn eCn | f(cx,c2,...,cn)}(vx(cx,c2,. ..,<?„),...)

Here v1(c1,c2,K,cn) is a function that returns a single item given instance variables

defined in the query body. Note that this function itself may include complex multidimensional queries and access paths applied to instance variables.

For example, if for each combination of one date and one category we want to store also category mean price then we specify it as a property (defined in the previous section) in angle brackets:

{d in Dates, c in Categories}<c.meanPrice>

It is important to understand that each use of curly brackets evaluates to an absolutely new collection, which is a new subconcept to the source concepts. We can assign this new collection to a new variable or it can be used anonymously. For example,

C = {c1 eCx,c2 eC2,...,cn eCn |L} defines a new collection with its reference stored in

variable C. This new collection is a subconcept to the source collections C1,C2,K,Cn,

which are superconcepts. One important principle for building new collections is that all the source collections have to be already defined and constructed before the new collection (subconcept) can be built.

The source collections can be themselves specified via independent queries rather than using references to existing concepts. For example, we might want to restrict a set of items in some source concepts and then we need to write it as a nested query:

C = {c1 e{Cx |f1},c2 e{C2 |f2},K,cn e{Cn \fn}\f(cl,c2,...,cn)}{vl(cl,c2,...,cn),...)

Here we did not show instance variables for nested collections, which might be written as follows:

Thus nested collections are normal collections just like their external collection where they are used. It is important that each nested source collection is evaluated before its external collection.

For example, if we want to use only some interval of dates then we can describe it as follows:

{d in {d in Dates | d > '02.02.2002'}, c in Categories}

Nested collections can also be used in the predicate part of the query but here they play a different role than the source nested collections. These collections are evaluated in the context of their external queries (rather than before the external query for source nested collections). In particular, the predicate nested collections can use instance variables from their external contexts. Thus for each collection its instance variables defined for source collections are visible from all internal collections in the predicate part. For example, in some internal collection we might impose constraints by using its own instance variables, its external collection instance variables and even instance variables from the parent query context

For example, let us suppose that we need to compute an average bid price for each combination of existing dates and categories. For each combination of dates and categories it is necessary to get a (nested) collection of all bids for which we can compute an average bid price. Such a query can be written as follows:

{d in Dates, c in Categories

NestedCollection = {ab in AuctionBids |

ab.date == d && ab.auction.product.category == c

}.price }<avg(NestedCollection)>;

Here for convenience we assigned the nested collection to the internal variable. Then this internal variable computed for each combination of date and category is used in the output part of the query as a parameter for the aggregation function.

As we already mentioned each new result collection is different from each of its source collections even if we use only one source collection. This new collection is a new subconcept which has its own independent set of items with their own references. For example, a collection {ieC\i.pmp = 5} which selects items with the specified property has nothing to do with the source collection C. Its items will have different references and it will a subconcept to C. In particular, it is important that C will still have the same original set of items in it whenever we use it again somewhere in the query.

Sometimes however we want to restrict the number of items in some concept or new collection so that these changes are visible to all internal queries. For example, if we know that our analysis deals with only an internal of dates then it is more convenient to restrict the set of dates in the very beginning in order to avoid specifying this constraint each time when we use dates. Such a mechanism is even more important where we want these constraints to be propagated automatically over the model and imposed onto other concepts.

One approach to this problem consists in redefining one of existing concepts so that the new definition is visible for all internal query contexts (but not from external contexts). Such a redefinition can be done explicitly when we assign a new constrained collection to the same existing variable (concept name). For example, if we want to deal only with today's transactions then in the beginning of any query we write:

Dates = {d : Dates | d == today}

After that the concept Dates will be redefined and consist of less items than the original source concept. Moreover, this constraint will be automatically propagated down to all the subconcepts. This means that the subconcepts will include only items with today's date assigned to them directly or indirectly. In particular, in all internal queries we will see only today's auctions and auction bids.

There could be other ways to restrict visible items but the general idea is still the same and consists in redefining an existing concept rather than using a new concept. For example, if in the body of query we define a new collection based on some source concept and then do not use it anywhere then this should be interpreted as imposing constraints on the source concept. For example, the query

{Auctions | {d : Dates | d == today} }

will return all today's auctions because the internal query is interpreted as a constraint for the concept Dates which is propagated down and restricts items from concept Auctions.

In general case concept-oriented query involves the following constituents:

Prolog. Before query starts executing and after its body we frequently need to perform some actions. In prolog we normal want to prepare the source concepts/collections. In these section we can create and use intermediate variables that will be visible from internal contexts. As usual these variables may have either a type of collection or a type of data item.

Universe of discourse. In this section we specify a set of existing source concepts/collections along with their instance variables. For example, we can use collections defined in the prolog section or the source collections can be taken from the external context (like preexisting concepts). Their Cartesian product is the universe of discourse. Each point of the universe of discourse is referenced by the keyword 'this' in the query body.

Query body. This section involves operations in the context of one point from the universe of discourse. In other words, all operations from this section assume that 'this' variable as well as all instance variables of the source collections are assigned some concrete data items.

Before. This section is part of the query body and it is executed for each point from the universe of discourse. Normally it is needed if we want to prepare some intermediate results to be used in the next section.

If. This section is part of the query body and it is where we evaluate if the current point has to be included into the result collection returned from the query. If this condition is evaluated to true then the current point will be included into the result collection.

Then. This section is part of the query body and it is executed after the point is decided to be included into the result collection. For example, here we might compute some return variables for return section.

Else. This section is part of the query body and it is executed after the point is decided to be not included into the result collection.

After. This section is executed after the evaluation of the current point and before we iterate to the next point from the universe of discourse.

Return. Here we specify variable to be returned as new dimensions of the result collection.

Epilog. In this section we execute operations before the result collection is returned. After this section all intermediate variables will be destroyed.

8.    Conclusion

In the paper we described the concept-oriented data model and how it can be used for logical navigation. The whole model and each element can be viewed from two sides: physical and logical. From physical point of view each element is a collection of other elements. This structure is hierarchical and is used to implement references. From logical point of view each element is a combination of other elements in the model. The logical organisation has a dual interpretation where each element is a logical collection of other elements. The logical structure is used to represent the problem domain dependencies. In the two-level model considered in the paper the root element physically contains a set of concepts which in turn contain their sets of data items. Concepts and items are defined as combinations of other concepts and items, respectively. Such a logical structure of concepts describes the model syntax while the logical structure of items defines the model semantics.

In order to access data in such a model we described a mechanism of access path and derived properties. An access path consists of a sequence of segments where each segment is either a dimension or a subdimension (inverse dimension). Such a mechanism provides very easy to use means for navigating in the model using its syntactic structure of dimensions. This approach is especially useful if access paths are used to define derived properties of concepts which are considered normal dimensions. In general case the model provides a mechanism of multidimensional queries with automatic constraint propagation.

Taking into account the described characteristics the concept-oriented model it is a good choice for applying in very different domains for modelling a wide range of practical situations and use cases.

9.    References

[Berson & Smith, 1997]  A. Berson and S.J. Smith, Data warehousing, data mining, and OLAP, New York, McGraw-Hill, 1997.

[Codd, 1970]  E. F. Codd, A relational model of data for large shared data banks. Communications of the ACM 13 (6), 377-387, 1970.

[Fagin et al., 1982]  Ronald Fagin, Alberto O. Mendelzon, Jeffrey D. Ullman, A Simplified Universal Relation Assumption and Its Properties. ACM Trans. Database Syst. 7(3): 343-360, 1982.

[Fensel, 2004]   Dieter Fensel, Ontologies: a silver bullet for knowledge management and electronic commerce. 2004.

[Ganter & Wille, 1999]   Bernhard Ganter, Rudolf Wille, Formal Concept Analysis: Mathematical Foundations, Springer, 1999.

[Garcia-Molina et al., 2003]  Hector Garcia-Molina, Jeff Ullman, Jennifer Widom, Database Systems: the Complete Book, Prentice Hall, 2003.

[Gray et al, 1999] P.M.D. Gray, P.J.H. King, L. Kerschberg (eds.), Functional Approach to Intelligent Information Systems (Special issue). Journal of Intelligent Information Systems 12, 107-111 (1999).

[Gray et al, 2004]  P. M. D. Gray, L. Kerschberg, P. King, and A. Poulovassilis (Eds.), The Functional Approach to Data Management: Modeling, Analyzing, and Integrating Heterogeneous Data, Heidelberg, Germany: Springer-Verlag, 2004.

[Halpin, 1999] T.A. Halpin, Entity Relationship modeling from ORM perspective. Journal of Conceptual Modeling (www.inconcept.com/jcm), 11, 1999.

[Kent, 1981]  W. Kent, Consequences of assuming a universal relation, ACM Trans. Database Syst., vol. 6, no. 4, pp. 539-556, 1981.

[Maier et al., 1984]   D. Maier, J. D. Ullman, and M. Y. Vardi. On the foundation of the universal relation model. ACM Trans. on Database System (TODS), 9(2):283-308, 1984.

[Savinov, 2004]  A. Savinov, Principles of the Concept-Oriented Data Model, Research Report, Institute of Mathematics and Informatics, 2004, 54pp. http://conceptoriented.com/savinov/publicat/imi-report'04.pdf

[Shipman, 1981]  D.W. Shipman, The Functional Data Model and the Data Language DAPLEX. ACM Transactions on Database Systems, 6(1), 140-173, 1981.

------------------------------------------------------------------------------------------------------------------------

Alexandr Savinov is a researcher at the Fraunhofer Institute for Autonomous Intelligent Systems, Germany. His research interests include very different aspects of data and knowledge representation/manipulation. In particular, he is interested in expert systems, data mining and data modelling. He has published more than 60 papers. Currently his research efforts are aimed at developing the next generation concept-oriented paradigm that includes such branches as the concept-oriented data model and the concept-oriented programming. More about the author and this new approach can be found at http://conceptoriented.com.