You are requested to visit "Database Design .... Part II" for continuity with this post.
In the last two posts with the same heading we did all the ground work required to understand the concept and technique of Normalization and Normal Forms. This post may contain some bookish / hi-tech definitions - but no worries. The readers with heuristic approach may skip the hi-tech stuff and they still can understand normalization and normal forms from the examples.
Relational Database Design - The goal of a relational database design is to generate a set of relation schema that allows us to store information without redundancy and perform the various operations on the data without anomalies. This is achieved by designing the schema in appropriate Normal Form.
Decomposition - It is a process to divide appropriately a "universal relation schema" with all database attributes into several relations each with fewer attributes to achieve a targeted Normal Form. A "BAD" decomposition procedure is one which may give rise to Lossy Decomposition or Lossy-join Decomposition. Ideally the decomposition should have following important properties -
In the last two posts with the same heading we did all the ground work required to understand the concept and technique of Normalization and Normal Forms. This post may contain some bookish / hi-tech definitions - but no worries. The readers with heuristic approach may skip the hi-tech stuff and they still can understand normalization and normal forms from the examples.
Relational Database Design - The goal of a relational database design is to generate a set of relation schema that allows us to store information without redundancy and perform the various operations on the data without anomalies. This is achieved by designing the schema in appropriate Normal Form.
Decomposition - It is a process to divide appropriately a "universal relation schema" with all database attributes into several relations each with fewer attributes to achieve a targeted Normal Form. A "BAD" decomposition procedure is one which may give rise to Lossy Decomposition or Lossy-join Decomposition. Ideally the decomposition should have following important properties -
- Lossless-join decomposition
- Dependency Preservation
- No Repetition of Information
Functional dependency concept is at the center of the definition of "Normal Forms" or process of "Normalization". Functional dependency is a constraint on set of legal relation.
Definition
– A Functional Dependency, denoted by X → Y,
between two sets of attributes X and Y
that are subsets of R specifies a constraint on the possible tuples (read rows) that can
form a relation state r of R. The constraint is that, for any two tuples t1 and
t2 in r that have t1[X] = t2[X], they must also have t1[Y] = t2[Y].
This means that the values of the Y
component or a tuple in r depend on, or are determined by, the value of the
component X; alternatively, the values of X component of a tuple uniquely (or functionally) determine the values of
the Y component.
(This was only a definition in mathematical format of the concept of dependency we discussed in "what is RDBMS - Part II" with an example of "account number" and "balance". This and some such stuff is for readers with scientific approach and academic interest)
Closure of a set of Functional Dependencies -
There may be several functional dependencies that may be observed in a relation. For example a relation of students S = {Stid, Sub, Fees, Phone, Attendance}.
Here Stid
→Phone Attendance Sub and Sub →Fees are
the dependencies which are semantically obvious dependencies. But also there is
dependency Stid →Fees which is inferred (since Stid → Sub and Sub → Fees, by property of transitivity Stid → Fees).
Let F
be a set of functional dependencies. The
Closure of F (indicated by F+) is the set of all functional dependencies
logically implied by F.
Closure of Attribute Sets - Given a set α of attributes of R and a set of functional dependencies F, we need a way to find out all the number of attributes of R that are functionally determined by α. This set of attributes is called as closure of attributes α under F. It is denoted by α+. Finding α+ is useful because –
Closure of Attribute Sets - Given a set α of attributes of R and a set of functional dependencies F, we need a way to find out all the number of attributes of R that are functionally determined by α. This set of attributes is called as closure of attributes α under F. It is denoted by α+. Finding α+ is useful because –
- If α+ = R then α is superkey for R.
- If we find α+ for all α R then we have computed F+
Normalization of Relations -
Normalization of data can be considered as
a process of analyzing the given "universal relation schema" based on their FDs
(Functional Dependencies) and primary keys to achieve the desirable properties
of –
- Eliminating redundancy.
- Eliminating the insertion, deletion and update anomalies.
First
Normal Form (1NF) –
A relation is in the First Normal Form (1NF) if the value of any attribute in the tuple (read row)
must be single value (atomic) from
the domain of that attribute. (in the following example however the "skills" column contains multiple skills.)
Empid | Ename | Qualification | Skills |
---|---|---|---|
101 | Peter | BS | Java,VB,Oracle |
.... |
Empid | Ename | Qualification | Skills |
---|---|---|---|
101 | Peter | BS | Java |
101 | Peter | BS | VB |
101 | Peter | BS | Oracle |
.... |
Empid | Ename | Qualification | Skill1 | Skill2 | Skill3 | Skill4 |
---|---|---|---|---|---|---|
101 | Peter | BS | Java | VB | Oracle | |
.... |
Employee
Empid | Ename | Qualification |
---|---|---|
101 | Peter | BS |
.... |
Skills
Empid | Skill |
---|---|
101 | Java |
101 | VB |
101 | Oracle |
102 | PHP |
102 | .... |
.... |
Second Normal Form (2NF) – A relation is said to be in the Second Normal Form (2NF) if it is in the "first normal form" and then it does not show any "partial dependency" and hence the chances of a relation violating "2NF" is only when there is "composite primary key" (primary key in a relation is combination of 2 or more columns).
In the relation that follows :
This relation has been decomposed into three relations to achieve the "2NF". Observe here that there is at least one relation that remains with the original composite Primary Key (stid + subject). In this case this relation is "Attendance".
Student
Subject
Attendance
Third Normal Form (3NF) – A relation is said to be in Third Normal Form (3NF) if it is already in the "1NF" and "2NF" and further does not have a non-key attribute functionally dependent on another non-key attribute, that is, there should be no "transitive dependency" of a non-key attribute on primary key.
In the following relation observe that the non-key attribute "location" is decided by (dependent upon) "deptid" which is a non-key attribute. The primary key is "empid". So here is the case of "transitive dependnecy". There is redundancy and also all the transaction anomalies -
And here below the same relation has been decomposed into two relations "Employee" and "Department" where "location" and "dname" has been moved to relation "Deparment" (and thus eliminating the "transitive dependency" -
Employee
Department
Boyce-Codd Normal Form (BCNF) – A relation is said to be in Boyce-Codd Normal Form (BCNF) if it is already in the "3NF" and further does not have overlapping composite key. A relation in "BCNF" always satisfy "3NF" but NOT every "3NF" is also "BCNF". For Example the following relation is in "3NF" (does not show any transitive or partial dependency).
In the following example we assume that a teacher may teach only one subject but one subject may be taught by 2 or more teachers and only a particular teacher teaches that subject to a particular student. Then combination (stid + subject) will decide the teacher and teacher decides the subject. The combination (stid + subject) is primary key, Teacher is (tends to be) candidate key, and potentially (stid + teacher) also tends to become the primary key. [however we don't accept the combination (stid + teacher) as primary key as it will give rise to partial dependency - subject depends on teacher and hence it is rejected]
The solution will be, to split the relation into two – {studentid, teacher} and {teacher,subject. In this example we have given more importance to "losslessness" and "non-additive decomposition" while ignoring the property of "preserving functional dependency" in the decomposed relations. (The discussion on the decomposition properties coming up shortly in next posts)
Multivalued Dependency & Fourth Normal Form (4NF)-
Multivalued dependencies (MVD) are a consequence of when we have two non-key multivalued attributes. For example consider the following relation -
Now for this relation to be in "1NF" -
Now this relation decomposed to "4NF" -
EmpSkills -
EmpLang -
EmpSpouse -
In here we have learned the technique of Normalization. There are however some properties of a "good" decomposition remain to be covered. Let us meet again in the next blog shortly.
In the relation that follows :
- Primary key is composite (stid + subject)
- Partial dependency is observed - "Fees" depends on "Subject" and "Phone" depends on "stid".
- Attendance has functional dependency - on composite primary key (stid + subject).
stid | subject | fees | phone | attendance |
---|---|---|---|---|
101 | C | 1000 | 12345 | 20 |
102 | VB | 1200 | 76589 | 25 |
101 | Oracle | 2000 | 12345 | 30 |
.... |
This relation has been decomposed into three relations to achieve the "2NF". Observe here that there is at least one relation that remains with the original composite Primary Key (stid + subject). In this case this relation is "Attendance".
Student
stid | phone |
---|---|
101 | 12345 |
102 | 76589 |
103 | ..... |
.... |
Subject
subject | fees |
---|---|
C | 1000 |
VB | 1200 |
Oracle | 2000 |
PHP | .... |
.... |
Attendance
stid | subject | Attendance |
---|---|---|
101 | C | 20 |
101 | Oracle | 30 |
102 | VB | 25 |
103 | .... | .... |
.... |
Third Normal Form (3NF) – A relation is said to be in Third Normal Form (3NF) if it is already in the "1NF" and "2NF" and further does not have a non-key attribute functionally dependent on another non-key attribute, that is, there should be no "transitive dependency" of a non-key attribute on primary key.
In the following relation observe that the non-key attribute "location" is decided by (dependent upon) "deptid" which is a non-key attribute. The primary key is "empid". So here is the case of "transitive dependnecy". There is redundancy and also all the transaction anomalies -
Empid | Ename | job | salary | deptid | dname | location |
---|---|---|---|---|---|---|
101 | Peter | Manager | 4000 | 10 | Accounting | New York |
102 | Sarah | Analyst | 3000 | 20 | Research | Boston |
103 | Albert | Clerk | 2000 | 10 | Accounting | New York |
.... |
And here below the same relation has been decomposed into two relations "Employee" and "Department" where "location" and "dname" has been moved to relation "Deparment" (and thus eliminating the "transitive dependency" -
Employee
empid | ename | job | salary | deptid |
---|---|---|---|---|
101 | Peter | Manager | 4000 | 10 |
102 | Sarah | Analyst | 3000 | 20 |
103 | Albert | Clerk | 2000 | 10 |
.... |
Department
empid | ename | job |
---|---|---|
10 | Accounting | New York |
20 | Research | Boston |
30 | Sales | Dallas |
.... |
Boyce-Codd Normal Form (BCNF) – A relation is said to be in Boyce-Codd Normal Form (BCNF) if it is already in the "3NF" and further does not have overlapping composite key. A relation in "BCNF" always satisfy "3NF" but NOT every "3NF" is also "BCNF". For Example the following relation is in "3NF" (does not show any transitive or partial dependency).
In the following example we assume that a teacher may teach only one subject but one subject may be taught by 2 or more teachers and only a particular teacher teaches that subject to a particular student. Then combination (stid + subject) will decide the teacher and teacher decides the subject. The combination (stid + subject) is primary key, Teacher is (tends to be) candidate key, and potentially (stid + teacher) also tends to become the primary key. [however we don't accept the combination (stid + teacher) as primary key as it will give rise to partial dependency - subject depends on teacher and hence it is rejected]
stid | subject | teacher |
---|---|---|
101 | RDBMS | Peter |
102 | OS | Albert |
103 | RDBMS | Sarah |
101 | Micro Processor | George |
104 | RDBMS | Peter |
.... |
The solution will be, to split the relation into two – {studentid, teacher} and {teacher,subject. In this example we have given more importance to "losslessness" and "non-additive decomposition" while ignoring the property of "preserving functional dependency" in the decomposed relations. (The discussion on the decomposition properties coming up shortly in next posts)
Multivalued Dependency & Fourth Normal Form (4NF)-
Multivalued dependencies (MVD) are a consequence of when we have two non-key multivalued attributes. For example consider the following relation -
empid | skills | langspeak | spousename |
---|---|---|---|
101 | VB,Oracle | French,English | Sarah |
.... |
Now for this relation to be in "1NF" -
empid | skills | langspeak | spousename |
---|---|---|---|
101 | VB | French | Sarah |
101 | VB | English | Sarah |
101 | Oracle | French | Sarah |
101 | Oracle | English | Sarah |
.... |
Now this relation decomposed to "4NF" -
EmpSkills -
empid | skills |
---|---|
101 | VB |
101 | Oracle |
.... |
EmpLang -
empid | Langspeak |
---|---|
101 | French |
101 | English |
.... |
EmpSpouse -
empid | SpouseName |
---|---|
101 | Sarah |
.... |
In here we have learned the technique of Normalization. There are however some properties of a "good" decomposition remain to be covered. Let us meet again in the next blog shortly.
STOP spamming usenet.
ReplyDeleteWhy anonymous? Appears to be some disgruntled soul. Carry on crazy4db your mission to reach those thousands who need this knowledge or how could people like me could know about this wonderful post?
ReplyDeleteThanks Raj for your encouraging comments. Keep visiting; some more posts in time to come.
DeleteIf you would look at the responses your spam generates, perhaps you would figure out that you are actually being stupid. The only disgruntlement I have is you are spamming. If you would bother to read the responses to your spam, you might understand that everyone does not think your spam is wonderful. I make no judgements about the accuracy or relevancy of your blog, simply your spam. You should read the charters of the usenet groups you post to, before you post.
ReplyDeleteIf you would actually respond to comments made on usenet, you'd see a lot of people who are not anonymous. You might also note that people are confusing you with someone else who is known to spout misinformation. People were happy when your blog 404'd. They are now unhappy that it is back. That is what happens when you spam.
You must understand what normalization mean and simplify to easy words. Not get everything from book or class notes.
ReplyDeleteNormalization is reduce complexity and remove duplicate categories.
Basically, the first normalization is simple : no duplicate groups or columns in table. That is one to many relationship
the second normalization is simple too : every table should have primary key, and that is many to one relationship. you must separate dynamic data from static data.
the third normalization is simple also :no transitive dependencies.
Normalization join command cause so many problems and slow down search speed. Such Normalization theory Only for schooling use, not for real world. What real world use ? Industrial use Data House concept and Normalization concept combine for their data modeling project
Thanks for your views "Direct Forward Concept". Actually frankly speaking I am not originally from IT engineering background but with a degree in Electrical Engg that I passed in the days when even calculators were not allowed to be used(in 1980). I have studied the database technology as part of my passion which I picked at very advanced stage of my career and did not obtain any formal training in DBMS. I have always believed that the authoritative knowledge comes from books written by esteemed authors be it electrical, databases or any technology and so I studied databases technology following some books and manuals and all by myself. The subject I am trying to put here is in the same way as I have understood it. There is no claim that I am an authority in the subject and must humbly agree that I am still a student and will be so for entire of my life. There may be errors of fact, of understanding and technical too. But one thing is for sure that following only those concepts which I am putting here, I got the success as an Oracle DBA and have been working so for last about 17-18 years. Practice is not much different from theory, rather theory only is projected in practice or put straight, practice is implementation of theory. Only thing is; theory is ideology whereas its implementation has some limitations. And since I am not originator of RDBMS theory, there is absolutely no shame in accepting that I have got everything from books but trying to combine with examples which I came across in real life and what I did in such situation and probably show how it is supported in theory. This blog I have created to share views of everybody, but I will appreciate if the views are about technology rather than about the person or his/her personality. Technology is independent of person who would write here and the contents may be according to his/her own experience. Anybody may contradict based on logical reasoning and examples or quotes from authorities, books and manuals. Let this forum be a discussion forum for some technical concepts rather than battlefield of persons claiming/confronting superiority or level of knowledge. Thanks for your views and hope you would keep on visiting the blog and share your experience with the readers.
DeleteI Like your Article. It really excellent…
ReplyDeleteWe all developer share the real time experience, but sometimes we must have to see the structural definition behind it. As we don’t have enough time from our daily jobs we are unable to see the real structural configuration.
After long time I really find it is interesting. When I read it and compare with my daily works, it’s really exciting.
Good man keeps it up.