October
2001 Issue: 22
Journal of Conceptual Modeling
www.inconcept.com/jcm
Entropy of a Data Model
by
Sakti
Prasanna Bhattacharya
Introduction
In this short paper we are trying to establish the definition of Entropy of a database system and thereby determine a quantitative figure to judge the efficiency of a data model. In a data model, the basic building blocks are:
Columns/attributes: The smallest amount of information, which can not be broken down.
Referential Relationship: Ultimately mapped as columns.
Tables: logically related set of information
The main objective of data modeling is to arrange the smallest information (attributes) into logical sets so the storage and retrieval is efficient. Any system has a natural tendency to disintegrate and thereby increase the degree of disorder in that system. This is the entropy of the system. In this paper we are trying to establish a definition of entropy of a database table and relate it with the performance measure of a data model.
First Cut Entropy Definition of a Table
To formulate the mathematical definition of the entropy of a table present in a RDBMS, we have to draw some analogy from material science. Let us consider that the attributes are analogous to atomic particles and the set of attributes constitutes the atom. Each instance (record) of the Table (entity) represents a different state of the atom. This makes a table analogous to an atom with the exception that the number of states can be unlimited. The instance or the state is uniquely identified by the value of the primary key.
Let us consider a table (entity) E with n attributes. Let p be the number of attributes constituting the primary key. Let us define a parameter called degree of the entity as m = n-p. Let W be the total number of records present in that table at that point in time. Then let us first define entropy as En = Ln W
Refined Entropy Definition of a Table
Let us try to validate
the above definition with an example.
Example 1:
There is a table with 8 rows. The entropy is Ln 8 = 2.079. Equally partition the
table into 2 tables having 4 rows in each based on the range of the value of
the primary key. The total entropy becomes Ln (4 + 4 ) = 2.079 meaning no
change in performance.
Example 2: Let us assume that the above table contains one column called sex which is not the part of the primary key with allowable values as M & F. if we partition on the basis of sex, we normally experience an improved performance even if our so called sum total entropy increases. Therefore our definition is not correct and the degree of an entity (proportional to number of non-key attribute) has some influence on the definition of entropy. We take the modified definition as En = Ln W* m
We now take the above definition and apply it to the M & F partition where each has one column (sex) less than the original table.
Mathematically, the sum total entropy = Ln ( W1m1 + W2m2 )
Where W1 + W2 = W and m1 + 1 = m also m2 + 1 = m. We see that the entropy decreases meaning improved performance.
Validity of the definition
With the above definition in mind let us investigate some quantitative parameter in a Database. The rate of change of entropy with respect to state (proportional to database size) dEn/dW = 1/W
The above expression shows that the rate of change of entropy is inversely proportional to the number of states only and does not depend on the degree of the Table.
The above signifies that
the rate of performance degradation during the life cycle as the database
grows will some what remain same what ever be the initial data model. This
is exactly what we see in real life. If the growth rate is x% then what ever
be the data model the performance degradation will be proportional to x%
under the same environment.
It is clear from the
above definition of entropy that the
tables with more number of rows
should have less number of columns and vices versa to keep the entropy of
the database at minimum.
![]()
Sakti Prasanna Bhattacharya
Manager Projects -Cognizant Technology
Solutions Pvt. Limited
Pekon Building, Saltlake
Electronics Complex, Calcutta – 700 091
India, Phone : (+91) 33 – 3573211/
3573212
E-Mail : sakti@cal.cognizant.com
![]()
© Copyright, 1998-2004 InConcept
(Information Conceptual Modeling, Inc.) All
Rights Reserved. Privacy Statement.
ISSN: 1533-3825