Saturday 28 April 2012

Data Integrity Constraints

A Relational Database Management System must implement Data Integrity in the database in declarative manner (Codd's Rule #10) to ensure the data integrity independence as proposed in the theory of RDBMS. However, though it may be possible to implement the data integrity through a non-declarative means using triggers in the database, the most convenient way of implementing the data integrity is through declaring "Integrity Constraints" on the columns of the table. Implementation of Integrity Constraints is more or less same in all the commercial database systems, however our model database for reference in this post will be Oracle.

Data Integrity Constraints -

Data Integrity Constraints is a declarative means of implementing data validation rules by creating constraint objects on the columns as a tight integration with the definition (read structure) of the table. This way, the integrity constraints are the most reliable and easiest method to prevent entry of the invalid data into the tables. When any of the data involved in a DML operation on the table violates the rule of the constraint then operation is rejected by the database system. 

Mechanism of Data Validation -

An important question to be answered is, when does the system run check for the validity of data against the rule of the constraint? By default the system checks the validity of the data when the DML statement completes the execution. For example, if a delete statement fired at a table is supposed to hit some 10 rows, then those rows shall be first deleted and then system will run the check of validation and if it finds that for any of the rows so deleted would result into violation of the constraint, then the delete operation is implicitly rolled back. This is called "Statement Level Rollback" - is an implicit operation, transparent to the user. Then the database system reports the error in rejection of the operation. It is important to understand that there is something called "statement level rollback" which comes into effect when the statement has to fail (for any reason) and it is the right (prerogative) of the system an no explicit "commit" or "rollback" could supersede it. (Actually this is the reason for triggers are not allowed to contain "commit" or "rollback" statements. important question asked in interviews, attention job seekers!!!).

The foregoing mechanism of constraint checking is called as "immediate" constraint. All the constraints created by default, are immediate. However optionally you may create a constraint which may defer the validation of the data until the user fires the "commit" statement. Such constraints are called "deferrable". However deferrable constraints also behave like "immediate" by default but may be deferred by using "alter session" statement to defer them. Databases like Oracle also provide facility to create them deferred with "initially deferred" keyword. (See the syntax section for more actions). Deferred state of constraint ensures transaction level integrity of data as the validation of data is performed at the end of the transaction (when user says "commit"). If any data which violates the constraint is observed at the end of the transaction i.e. while committing, then the entire transaction is rolled back.


Types of Integrity Constraints -

Integrity constraints come in following 5 types -

Primary Key constraint is used on column or a combination of columns to designate a primary key on a table. You may (obviously) define only one primary key on a table and it ensures that the column(s) under primary key constraint must always receive a value (not null) and a unique value. It actually implements "entity integrity" and it is an important part of parent/child relationship between the tables. For example an employee id or a student's university enrollment number.

Unique constraint, as the name suggests, allows only unique values if specified, however it accepts null. Unique constraint is used for such columns in the table which must contain unique values (but they can not become primary key, which is only one in any table). For example employee's access card number.

Referential constraint, also called as Foreign Key, requires that each row of a table has a value in such column that matches to a value in the "parent key", if the value is present or it may remain null. The "parent key", also called as "referenced key", is usually a "primary key", though it may be possible for a foreign key column to refer to some "unique" constrained column. For example, a department id column in the employees' table refers to department id column in the departments' table, and since an employee must belong to an existing department, his department id must match with one of the values in the department id column of the department table or an employee may not have a department id at all (null) reflecting that the employee is yet to be allotted a department. The referential constraint implements the "referential integrity".

Check constraint allows you to specify a condition to be true for the data if specified or be there no data (allowed null) in the constrained column for every row in the table. There may be multiple check constraints defined on a single column, as such you have to take care that the conditions in the check constraints on the same column do not conflict or contradict each other. This is because, database systems do not have any mechanism to verify if the conditions specified in the different check constraints on the same column are or not mutually exclusive. Moreover there is no specific order to the evaluation of those multiple check constraints on the same column. Database systems like Oracle allow to use built-in functions in the check constraint with some limitations. Check constraints are actually "domain integrity" part of implementation of data integrity. For example you may allow only Male, Female or 'Rather Not Say' (or null) values for a gender column in a table of customers.

Not Null constraint restricts that the value for the column must be provided. Not Null constraint is actually implemented as check constraint with a built-in condition "<column name> is not null" implicitly when a column is declared as Not Null.


Referential Constraint DML restrictions -

Referential Constraint puts following restrictions on the DML performed on the existing rows in the Parent and Child tables -
  • Parent row(s) which has dependent child rows is(are) not allowed to be deleted by default.
  • Parent row(s) which has dependent child rows is(are) allowed to be deleted if the referential constraint is created with "on delete cascade" option whereby the corresponding child rows are also deleted along with the parent row(s).
  • Parent row(s) which has dependent child rows is(are) allowed to be deleted if the referential constraint is created with "on delete set null" option whereby the corresponding foreign key column shall be set to null when their referenced parent row(s) is(are) deleted.
  • Parent key value can not be updated if it has been referenced by a child row(s).
  • Foreign key column value(s) may be updated to a null or any other value provided the new value exists as parent key value.
  • Child row(s) may be deleted without any restriction.

Constraint Indexes -

Unique and Primary Key constraints generate unique indexes in the background on the columns which have these constraints declared. These indexes are necessary for implementation of uniqueness, otherwise when an insert or update operation on the constrained column is performed then the database system will have to verify the uniqueness of the new value by scanning the entire table, which will be very inefficient. They are plain b*tree indexes. No other constraints create any index on the constrained columns automatically. (this is one of very popular questions asked in interview, attention job seekers!!!)
  
After this theoretical background on the integrity constraints , we shall see the actual creation and administration of the constraints.  Click Here to continue.
 

1 comment: