Thursday, 8 March 2012

What is RDBMS? - Part II

You are advised to visit Part I of this same heading as the following text may contain some references to it!

There were a few questions left to the readers at the bottom of the part I of this same heading and we answer them here and try to understand the concept of "Relation" (special tables) in the Relational DBMS. The order of the answers may differ from those of the questions.

Reorganisation / Restructuring of the Table -
Actually the table in Part I (containing the columns Student id, Subject, Fees and Phone) is technically called as "Universal Relation Schema" which is nothing but single table (call it relation for some time) that includes all the attributes of the database. But as we have seen such relation presents us with a set of anomalies, thus we have problems performing the data operations with it, and so you if you have guessed correctly we are going to split this table into two to see those anomalies are gone (to some extent).
Table : Student
Student IdPhoneSubject
12345C
21546ORACLE
38910LINUX
21546PHP
44672ORACLE
Table : Subject
SubjectFees
C300
ORACLE1500
LINUX1000
PHP1200
VB2000
What we have done here (by splitting the original "universal relation") is technically called as Relational Decomposition. 

Why was this Decomposition (restructuring) required -
In the original table (universal relation) we had combination of Student id and Subject as a means for unique identification of each row. Any such smallest combination of columns in a table which decides uniqueness of the rows in it is called as "Primary Key" and the remaining columns in the table as "Non Keys".

Now why in this decomposition you did not put "Fees" column in Student table and "Phone" column in Subject table? What a ridiculous question you said. Right? Because Fees is charged for a subject and Phone number belongs to a student. There you are! You have already explored an extremely important concept - there are columns in a table which show some kind of relationship that their values can be deduced by knowing their associations with some other columns. You deduce the value of fees by knowing the subject and the phone number by knowing the value of Student id. And incidentally the columns "Subject" and "Student id" form the part of "Primary key".

This relationship is typically called as Functional Dependency. This is a mathematical term and certainly has scientific derivation, proof and expression. I know my promise. No, no mathematics and hi-tech jargon. So a simple analogy in order - you go to a bank and want to inquire about your account number from your balance. You immediately become a laughing stock. Why? You will be promptly told they can tell your balance if you know your account number but not the other way round. There is an important aspect to this - because your account number is "unique" not the balance. So, balance may be determined uniquely (and articulately) from your account number is mathematically phrased as - Balance is function of account number. Hence the term "functional dependency". And also note that as such account number is primary key.


An important rule in structuring of the table in RDBMS says that the tables only must show "Functional Dependency" i.e. all the non key attributes must and must only depend (uniquely determinable) on Primary key. Then such a table will be called as pure "Relation". Of course there is much more to just this definition but we are close.


But why were those anomalies present in Universal Relation? -
There we had combination (student id + subject) as Primary key and non-keys fees and phone. The non-key Fees depends on Subject (only part of primary key) whereas phone depends on Student id (again only part of primary key). This culprit dependency is called Partial Dependency. It was the source for all the anomalies.


Delete Anomaly - A subject is taught for a fees and student joins for a particular subject are the two facts. But subject is taught is the fact which must exist first before students wished to join for a subject. A fact which must exist first we shall call as "Parent" and of course the one dependent on it as "Child". Now when student 1 wanted to leave our class (and he was the only student for subject C) and hence we wanted to delete the row (recall deletion is row operation?) we were actually wanting to delete a child fact but in the process loosing the parent which is absurd and hence "Delete Anomaly".


Insert Anomaly - Just think other way round of delete anomaly. We could not have recorded information about VB because we did not have any student wishing to take a course in VB. Again equally absurd that we could not enter a parent fact (an independent fact) for non existence of child fact and hence "Insert Anomaly".


Update Anomaly - If there are multiple students for a subject, like Oracle, then fact Oracle costs 1500 repeats for equal number of times as the students. Don't you think this to be unnecessary and if you want to change (update) the fees for Oracle it will have be changed at equal number of multiple places and hence "Update Anomaly". But wait. The unnecessary repetition of a fact is called as Redundancy.  

So you might think the source of Delete Anomaly, Insert Anomaly and Update Anomaly (redundancy) is the Partial Dependency. Actually yes, that is true in this case, but wait.


What if we implement the rule in our institute that one student may learn only one subject? Then our universal relation in the last blog would not have the row of 2 joining PHP as 2 is already studying Oracle. So now the "Universal Relation" will be like -

Student idSubjectFeesPhone
1C3002345
2ORACLE15001546
3LINUX10008910
4ORACLE15004672

Now the Primary Key is "Student Id". The columns Subject, Fees and Phone are all non-keys. But does the problems (Anomalies) seem to have gone? No way. Surprised?

Observe, what the Fees depends on? Subject! Right? So Fees (a non-key attribute) depends on Subject (another non-key attribute), an unscrupulous dependency indeed. This is called as Transitive Dependency. The name is derived from a very logically simple concept from arithmetic called property of transitivity - if a = b and b = c then a = c. In the same way Fees tends to depend on Student id through its dependence on Subject. So "Transitive Dependency" is an equal culprit as "Partial Dependency".

Incidentally, what you have done here is Relational Decomposition by synthesizing dependencies and of course Normalization. Wasn't it so simple? yes indeed. But that's not all about Relational Decomposition and Normalization. Visit again for further discussion on them. Coming shortly. 

2 comments:

  1. Very nice information.
    Please keep up the good work

    -Adi

    ReplyDelete
  2. Almost got a clear idea about all the dependency and anomalies as well.
    Revision is quick and easy just because of your blog sir.
    Thanks.

    ReplyDelete