Friday 16 March 2012

Relational Database Model - Codd's Rules - Part 1

The Background -
Dr. Edgar Frank Codd (1923-2003) introduced the "Relational Model" for database management systems in 1970 in a research paper entitled "A Relational Data Model of Data for Large Shared Data Banks" while working at IBM Research Laboratory, San Jose, California. This model uses the concept of a "Mathematical Relation" which is nothing but a table of values with certain specific constraints (Refer posts - Database Design - Normalization ... Part I, II, III) as its building blocks. The concept has its theoretical basis in set theory and first-order predicate logic. However, those "Rules" sometimes referred to as "Codd's 12 Commandments" were proposed by him in 1985 to articulately define his vision of the relational database, which he feared was being diluted by many vendors in early 1980s to repackage their existing products with a "relational" veneer.

There are actually 13 rules numbered 0 - 12. However, in practice, rule 0 is ignored (from the count) as it only insists on exclusive use of "relational facility" to manage the database. It is considered as fundamental to the database system to qualify as being called as "Relational".

It is said that NONE of the popular commercial database systems today available in the market are ideally "Relational" going by the rules of Codd and more formal definitions derived subsequently by other authorities in the subject. The reasons for this being are some controversial rules (like Rule #3) and some debatable issues like Three Valued Logic. Also some rules are difficult to implement if not impossible.

The simplified rules and their explanation in brief follows -

Codd's Rule #1 (The Information Rule) -
Data is "presented" in Tables. A set of related tables forms the database and all the data is represented as tables. There is no other way than the tables, in which data can be placed or viewed.


Codd's Rule #2 (Guaranteed Access Rule) -
Data is "logically" accessible. A relational database does not refer to the data by position or physical location. For example, you can not say, in a relational database like 5th record and at character position 24-29 is the designation of an employee XYZ. Each of data must be logically accessible by referencing a "table", a column (or some smallest combination of columns) with unique value designated as "primary key", and column(s) (of which data is to be known). For example you may know the job of an employee Peter with empid 1234 from the table named emp. 

Codd's Rule #3 (Systematic Treatment of "NULL" values) -
NULL s are treated uniformly as unknown values. "NULL" means the value that is simply not present or not available or may not be applicable. It is not allowed to be represented as "0" (zero) or '' (an empty string) for numeric or string values respectively, because this substitution does not represent the fact that the value is not present. Imagine, does it not look strange substituting 0 for the age of an employee whose information of age is currently not available? The NULL is independent of datatype. However NULL may cause problem of comparison (NULL can not be compared) and it propagates through arithmetic expression. For example comparison "if x = NULL then" is illegal and x + y will result in NULL if x is NULL and y holds 10. Actually in comparison like statements for NULL we write "if x is NULL then" or "if x is not NULL then". For dealing with the NULL values in expressions, we need to substitute an appropriate value in place of NULL (for example 0 in addition and 1 in multiplication) by using some function. (for example nvl(V,R) where V is the value; if NULL is replaced with R).


Codd's Rule #4 (Dynamic on-line catalog based on relational model) -
Database is self describing. This pertains to how should the database keep the data about itself - the Metadata. Metadata is the information about the structure of the tables, their relationships, ownership, allocation of privileges etc. Dr. Codd insists to store this information into tables (also called as catalog or data dictionary). This data must also be accessible to the appropriately authorized users using the same query language as the users use for accessing their owned tables.


Codd's Rule #5 (Comprehensive data sub-language rule) -
Use of single language to communicate with the database. Dr. Codd insisted on use of a "Non Procedural" or "Declarative" language for accessing the database; such language in which the users will only provide (declare) what they want and not how to get it. Apart from the basic query operations the language must also support Data Definition Operations (DDL), Data Manipulation Operations (DML), security and integrity constraints (DCL), Transaction Management Operations (commit or rollback), etc.


Codd's Rule #6 (The view updating rule) -
Provide alternative way for viewing the data. All the data is stored in tables. However, a relational database must not be limited to source tables when presenting the data to users. So we may be able to create a "view" on the data (technically it is a query statement stored in the database with a name and can be accessed in the same way as the table but in the background the data presented by the view is actually has been fetched from the table(s)). So views are termed as "virtual tables" or abstractions of the source tables. For to be able to show the current state of the data, the rule says - All views that are theoretically updateable must be updateable by the system.


With this we have covered the first 6 of the Codd's Rules in this part. Click Here for the definitions and explanation on codd's rules 7 through 12.

No comments:

Post a Comment