Saturday, 10 March 2012

Database Design - Normalization & Normal Forms - Part I

You are advised to visit posts "What is RDBMS" Part I & II as the following text may contain some references to them!

Introduction -
The example that we considered in earlier posts "What is RDBMS" part I and II have been very simple, just enough to explain the basic concepts behind the term Relation. However there are some more conditions to be imposed on a table for it to become a "relation" and we did not encounter all the real world circumstances and shapes we may find our "universal relation" in. Moreover, expected use of the database influence a variety of design choices at physical, logical and view levels.

Database designing is considered as one of the important phases of the database based applications. The database designers must interact with the users of the database application to understand the need of application and represent them in a high-level fashion that can be understood by the users and then translate the requirements into lower levels of the design. A high level data model, particularly E-R model, helps to provide a conceptual framework to represent data requirements of the users and database structure that fulfills these requirements. 

Database Design Phases -
  •  Initial phase of the database design is to characterise fully the data needs of the prospective database users. The outcome of this phase is a specification of user requirements.
  • Next, the designer chooses a data model and, by applying the concepts of the chosen data model, translates these requirements into a conceptual "schema" (structure or layout) of the database. The "E-R model" is typically used to represent conceptual design. Conceptual design phase results in the creation of an entity-relationship diagram that provides graphical representation of the schema.
  • Next, specification of functional requirements evolve from the conceptual model. Users describe the kinds of operations (say transactions) that will be performed on the data. The access patterns are identified and the designer can review the schema to ensure that it meets those functional requirements.
  • Last is the implementation of the abstract data model drawn so far which proceeds in two final design sub-phases –
    • The logical design phase typically consists of mapping the conceptual schema defined using the E-R model into a relational schema. (Assuming that the relational data model is typically used in the implementation.)
    • In the final Physical design phase the physical features like files organization, internal storage structures etc. are evolved.
Database Design Pitfalls -
  • Redundancy – Unnecessary repetition of the information is called as redundancy. Ideally information should appear in exactly one place. 
  • Incompleteness – Restriction of the database design to not allow insertion of the parent fact for non-availability of the child fact may give rise to incompleteness. Parent fact registration in the database should be independent of the availability of the child fact. For example not able to insert the information of the subject being taught in an institute for no students opting for the subject offered. (from the example of post "What is RDBMS? Part I").
E-R Model -
The entity relationship data model facilitates database design by allowing specification of an enterprise schema that represents the overall logical structure of a database. The E-R model is very useful in mapping the meanings and interactions of real world enterprises onto a conceptual schema.

E-R model deploys three basic notions : entity sets, relationship sets, and attributes.

Entity Sets : 
An entity is a concrete “thing” or “object” ( example a person in company) or logical concept (example a loan from the bank) in the real world that is distinguishable from all the other objects.

An entity set is a set of entities of the same type that share the same properties or attributes.

The individual entities that constitute a set are said to be the extension of the entity set. For example "Customer" is an entity set while "every individual bank customer" is an extension of the entity set customer.

An entity is represented by a set of attributes. Attributes are descriptive properties possessed by each member of an entity set.

Database stores similar information concerning each entity in the entity set; however each entity may have its own value for each attribute.

A database includes a collection of entity sets, each of which contains any number of entities of the same type.

Relationship Sets :
A relationship is an association of one entity with another entity or entities; as entities never happen to be isolated in real world.

For example, we use a relationship borrower to denote the association between an entity "bank customer" and entity "loan". (a bank customer "borrows" a loan).

The association between entity sets is referred to as Participation. A Relationship Instance in an E-R schema represents an association between the named entities of the real world.

For example, the individual "customer" entity Peter identified by 1234 and "loan" entity L-15 "Participate" in a "relationship instance" or "borrower". (Peter has taken loan detailed under L-15).

The function that an entity plays in a relationship is called as entity's Role. "Roles" need to be specified generally as the entities participating in a relation are distinct. However, in the case of Recursive Relationships, where same entity set participates twice then explicit role names may become necessary. For example in case of employees' table an employee may be reporting to an executive (who is also an employee obviously); in such case the role may described as pair (worker, manager).

A "relationship" may have optionally descriptive attributes. For example "Borrow Date" for identifying an instance of relationship "borrower".

The number of entity sets that participate in a relationship set is called degree of the relationship set. A "binary relationship" is of degree 2 (usual); a "ternary relationship" set is of degree 3 (for example customer, loan, guarantor).

Attributes :
An attribute of an entity set is a functions that maps from the entity set into domain. For example "customer name" attribute maps from entity set "customer" into a domain of strings containing only alphabets (name may not have any other characters than alphabets). Each entity cab be described by a set of (attribute, data value) pairs.

Attributes in the E-R model can be characterized by the following attribute types -
o   Simple Attributes – Simple attribute are those which can not be divided into subparts. (example "salary" of an "employee").
o   Composite Attributes – Those attributes which may be divided into sub-parts. (example "address"). A composite attribute sometimes may appear as a hierarchy. (example "street_address" attribute of composite attribute "address" may be further divided into "street_name", "plot_number", "apartment_name", "flat_number" etc.)
o   Single Valued Attribute – Simply may have single value. (for example "salary" may also be example for single valued attribute along with example of simple attribute).
o   Multi-valued Attribute – may have multiple values (example "phone_number" may hold cell number as well as land line residence number)
o   Derived Attributes – The value of the attribute may be derived from a related attribute. (example "age" which may be derived from attribute "date_of_birth"). The attribute from where the value of derived attribute is arrived is called as Base attribute. However because of the dynamic nature of the derived attribute, it is preferred to calculate it as an when required than storing it.
So a lot of stuff that has been here guys. I know you would need some time to soak yourself. But of course will be back shortly with next part of this same heading. Thanks for your patience!