Tuesday 6 March 2012

What is RDBMS? - Part I

Need for non-procedural databases was felt around 1960 for providing an alternative to  programming language interfaces to be replaced with high-level (statement like) query language. Thus they separated the physical storage of the data from the conceptual representation and application. This separation was meant to provide flexibility to develop new queries quickly and to reorganize the database as requirements changed.

At the very foundation of these non-procedural databases was the idea of organizing the database in the shape of "special tables" called Relations; and hence the name Relational DataBase Management System (RDBMS). It was Dr. Edgar Frank Codd (1923-2003), who for the first time, published the concept of RDBMS in 1970 in a research paper entitled "A Relational Data Model of Data for Large Shared Data Banks" while working at IBM Research Laboratory, San Jose, California. The Relational Database model has a firm base in the mathematical theory of relational algebra. But how many of you readers are ready to be spooked by the arithmetic details of this concept? Perhaps nobody! The case was not very much different at IBM in 1970. IBM did not much consider to pay attention to this model however until Codd showed the customers of IBM the potential of the implementation of this model and they in turn pressured IBM, and that while commercial rivals of IBM had already started implementing it. So no mathmatical jargon? Right! That's guaranteed.

Data in Tables -
Did we say that in RDBMS the data is maintained in "special tables" called Relations? But nevertheless there are tables. "Table" is an obvious way of storing the data as even known to most non-technical of us.
(Dr. Codd does not seem to have done much new at this stage! Right? But then who did? - Newton or Archemedes also only theorised obvious experiences of the common people through ages. Dude, Scientists just don't see, they observe the same things as we do and think about them differently.)
So we create a table here for our example of students studying different languages in a computer institute -
Student idSubjectFeesPhone
1C3002345
2ORACLE15001546
3LINUX10008910
2PHP12001546
4ORACLE15004672
Now I am going to ask you play Codd, Newton or Archemedes - Observe (not just see) this table and think about it on following lines -
  1. How do you identify each row uniquely (different from all others)?
  2. Is there any problem you think faced with update (change) of any values?
  3. Is there any problem you think faced with deletion of any row from the table?
  4. Can you identify any problem with inserting a row in this table with this given structure?
Remember that insertion, deletion and selection are the row operations i.e. you can perform these operations only in terms of rows, whereas update may be column operation where you may change the values pertaining to a column.

If there is a problem associated with any operation then that is identified as an anomaly of the same type as the operation. So those quetions above are aimed at if you can locate some anomalies in the above table with respect to the four operations of - select, update, insert and delete. (and you don't have to consider these operations being done using a computer or any particular computer language as the theory of database is not related to any computerization; remember the databases have always been around there, computers existed or not.)

I am sure you have identified the answers to those questions by now -
  1. You require to consider student id and Subject (in combination) to identify each row uniquely.
  2. If you want to update the fees of subject Oracle then it has to be updated at number of places as the number of students opting for the subject Oracle. (Same is the case with if student id 2 changes his phone number). [update anomaly]
  3. If the student id 1 wants to leave the class then if I delete this row I loose the information about the  subject C that its cost is 300. [delete anomaly]
  4. Why we can't see the subject VB and its fees? Since there is no student joining for this subject VB. [insert anomaly]
But then here are tougher questions on these answers to keep you playing Codd -
  1. What is the reason behind all those anomalies?
  2. Whether a student joins because a subject is taught or a subject is taught because a student joins for it? Which fact comes first - Institute teaches a particular subject or a student joins for a particular subject?
  3. What is the solution / how to reorganize / structure the table so that it will eliminate those anomalies?
  4. What is the basis for the restructuring of the table?
Keep thinking and post your answers in comments and bear with me till the next part of this heading. Coming shortly.

4 comments:

  1. Replies
    1. Thank you so much Tushar for your encouraging comments. Keep visiting for new topics on same line and post them on your facebook, twitter, linkedin, Google+ or social bookmark group those you like for others to share this knowledge.

      Delete
  2. Great way to explain all the anomalies...

    ReplyDelete