Sunday 11 March 2012

Database Design - Normalization & Normal Forms - Part II

You are advised to visit Part I of the same heading!

With a brief introduction to the topic of Database Design, we started with some explanation on very important terminology related to the subject of "Database Design - Normalization and Normal Forms" in the last post with the same heading. There are some more such terminologies to be understood before we actually take to actual design and normalization and hence we shall continue with the same here -

Constraints -
The constraints may be required to be defined in a database schema for the sake of ensuring that the data follow certain rules of integrity (validation). They are broadly classified as Mapping Cardinality, Key Constraints and Participation Constraints.

Mapping Cardinality :
Also called Cardinality Ratio express the number of entities to which another entity can be associated via a relationship set. For a binary relationship (relationship involving two participant entities i.e. relationship with degree 2. visit Part I for explanation.) set R between entity sets A & B, there may be one of the following mapping cardinality -

o   One-to-One – An entity in A may be associated with at the most one entity in B and same is for an entity in B for an entity in A. (example BS student and branch he/she opts)
o   One-to-Many – An entity in A is associated with any number (zero or more) of entities in B. However an entity in B can be associated with at most one entity in set A. (example department and lecturer)
o   Many-to-One – An entity in A is associated with at most one entity in B, however an entity in B can be associated with any number (zero or more) of entities in set A (just the reverse of above).
o   Many-to-Many – An entity in A is associated with any number(zero or more) of entities in B, and an entity in B is associated with any number(zero or more) of entities in A.(example actors and movies)
  
Key Constraints :
Real world has natural/obvious ways of distinguishing entities, but from the database perspective distinction among them must be expressed in terms of their values of their attributes. No two entities in an entity set are allowed to have exactly the same values for all attributes.

The concept of key allows us to identify a set of attributes that is enough to distinguish entities from each other in entity sets. Keys also help uniquely identify relationships and thus distinguish relationships from each other. Following are the few "key" definitions -
  • Superkey - Set of one or more attributes, that taken collectively, allow us to identify an entity in an entity set. (Example empid, empname, empjob, entrypassnumber etc. taken collectively or together).
  • Candidate Key - Superkey for which no proper subset (set containing any less attributes than in consideration) is a superkey is called as candidate key (For example empid or entrypassnumber are candidate keys in the above example).
  • Primary Key - One of the chosen candidate keys to identify an entity uniquely in the entity set is called as Primary Key. The choice is driven mainly by the candidate key value of which may never be required to change and is never a null is more eligible to become the primary key. (For example "empid" may become primary key whereas "entrypassnumber" may not, though also unique but may change if new entry pass may be required to be issued in case of loss and also may temporarily have null value until the new one is issued).
Participation Constraints :
The participation of an entity set E in a relationship set R is said to be total if every entity in E participates in at least one relationship in R. If only some entities in E participate in relationship R then it is called partial. For example an employee joining a company must belong to some one department of the company (Total). However we may not have same constraint applicable for the department as there may be a department in company (an empty department) which may not yet have any employee (Partial)). 
Weak / Strong Entity Sets -
A weak entity is the one which may NOT have sufficient attributes to form a primary key.
A strong entity is the one which has a primary key.
Consider a case of customer buying a consumer durable appliance on hire purchase (installment of payment). Such payment details may have natural attributes like instalment_number, instalment_amount and instalment_date. None of these attributes or combination of them may become primary key, because may customers may have same combination of these three numbers. For example two customers Peter and John might both might have paid 3rd installment of amount $100 on December 15' 2011.

Now obviously to identify these installments we need to associate them with the customers who paid them. So a weak entity set must be associated with an identifying or Owner entity set. In such case the weak entity set is said to be existence dependent on the identifying entity set.

The relationship associating the weak entity set with identifying entity set is called identifying relationship. The identifying relationship is typically "many-to-one" from the weak entity set to the identifying entity set, and the participation of the weak entity set in the relationship is total.
However it is possible to form a primary key in such entity by combining discriminator and primary key of owner entity set. The attribute(s) in the weak entity that allows a distinction to be made that helps to relate / identify those entities in the weak entity set that depend on one particular strong entity is called as discriminator. Like "instalment_number" in the ongoing example is the "discriminator" (which combined with customer_id will become primary key of set of installment entities. The discriminator of a weak entity is typically the partial key of the entity set.

In some cases (and when the commercial Database Systems allow) the database designer may choose to express a weak entity set as a multivalued composite attribute of the owner entity set. (like a nested table or varray type as allowed in Oracle 8i onwards. Here we may use one composite attribute say "instalment_details" with instalment_number, instalment_amount and instalment_date as elements of this composit attribute. But this is convenient when weak entity has only small number of attributes.)
Now probably we are well equipped with the terminologies and concepts and we are now ready with the launch pad. In whatever ways we try, but the process of Normalization though intuitive has to some extent involve some very scientific approach and hence a bit of technical definitions. However I will try my level best to balance the next part of the post with the "same heading - Part III" exclusively dedicated to "Relational Database Design" for readers with academic interest and those with heuristic approach. Coming soon the most interesting post. Thanks for your patience. 

P.S. - Those with any query or doubts or would like to just add some more info may do so in the comments area. I would also appeal you to link these posts with your FB walls, Google+ , twitter or LinkedIn so the knowledge reaches to everyone and I see there are many who may not have an independent blog but would like to share their experience and knowledge about databases with us.

1 comment: