Saturday 26 May 2012

Procedure vs. Function

A back-end database programming language like PL/SQL in Oracle exists to push data processing to the server, where the data actually resides. This reduces the network traffic that would otherwise be needed to ship the data from the database to the front end for processing in the application. The two main types of programs used in the database to handle data on calls from the front end are Procedures and Functions. They are a source of confusion when choosing which one to use for a problem, because in most cases either can be written with equal ease; the question to address is appropriateness rather than correctness. Beginners are also often unable to explain the differences and similarities between the two confidently and correctly in interviews. Hence this post. The examples here use PL/SQL, but other database systems have their own back-end languages with very similar syntax and concepts, so this discussion should help their users too.

About the PL/SQL Procedure -

A PL/SQL procedure, like procedures in other languages, is a workhorse: it is meant to perform some "action" or "operation" (a query or DML). The action may produce an outcome in the form of one or more values that need to be sent back to the calling program; the procedure does this by assigning those values to OUT or IN OUT mode parameters passed to it by the caller. There is, however, no compulsion: a procedure may send nothing back at all (an action with no outcome). Technically, a procedure does not "return" a value. A return statement inside a procedure is never associated with an expression (no return <x>;, only return;) and may optionally be used to end execution and hand control back to the caller. Because a procedure only "sends" values to the caller through its parameters (rather than returning a value as it ends), it cannot be used in SQL statements and cannot appear in an expression. For example, we cannot write select p(arg) from <table>; nor x := p(arg);. A procedure may only be invoked as an executable statement in its own right, for example p(arg);.
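A minimal sketch of such a procedure, assuming a hypothetical employee table with empid and salary columns; the caller receives the outcome through the OUT parameter and invokes the procedure as a statement, never in an expression:

create or replace procedure raise_salary (p_empid  in  number,
                                          p_pct    in  number,
                                          p_newsal out number) is
begin
   -- the "action": a DML statement
   update employee set salary = salary * (1 + p_pct/100) where empid = p_empid;
   -- send the outcome back through the OUT parameter
   select salary into p_newsal from employee where empid = p_empid;
end;
/

declare
   v_sal number;
begin
   raise_salary(101, 10, v_sal);   -- called as an executable statement
end;
/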

About the PL/SQL Function -

A PL/SQL function is supposed to perform only "computation" on the values provided through its arguments, and no "action" or "operation" (query or DML). The result of the computation must be returned through a "return" statement, which must be associated with an expression; the expression may be a hard-coded value, a variable, or an arithmetic / logical expression. Execution of the function ends the moment the value is returned. This is what makes a function callable inside a SQL statement and as a term in an expression, but never as a standalone executable statement. And since execution must end together with the return of the value, a function can return only one value.
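A minimal sketch of such a function, again assuming a hypothetical employee table; note that it can be called both from SQL and as a term in a PL/SQL expression:

create or replace function annual_salary (p_monthly number) return number is
begin
   return p_monthly * 12;   -- pure computation, returned through an expression
end;
/

select ename, annual_salary(salary) from employee;   -- used in a SQL statement

-- or in an expression inside PL/SQL:
-- x := annual_salary(3000);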


Why should there be a confusion to choose? -

As described above, procedures and functions have quite distinct characteristics, so why this discussion, "Procedure vs. Function", at all? The reason is that a function may contain everything a procedure may contain. A function may include a query or a DML statement; that is not illegal, even though such statements are not supposed to be used in a function. Conversely, a procedure may be written for a problem that could be solved with a function, since the "action" in a procedure is optional. Nor is there any rule that a function may not have OUT or IN OUT mode parameters, or that it cannot assign values to them (though that is not the appropriate way for a function to return values). For example, the following two programs are technically legal but not appropriate -


create function delrows (dno emp.deptno%type)
return number is
   x number;
begin
   delete from emp where deptno = dno;   -- DML inside a function: legal but inappropriate
   x := sql%rowcount;
   return x;
end;
/


OR


create procedure addnum (n1 number, n2 number, n out number)
is
begin
   n := n1 + n2;   -- pure computation inside a procedure: legal but inappropriate
end;
/

In the above case "delrows" should have been developed as a procedure and "addnum" as a function. Even so, both are fully legal and work within their own ambit: "delrows" will indeed return the number of rows deleted and "addnum" will return the sum of the two numbers. So where have things gone wrong?

Function Purity - 

You can see from the foregoing explanation that a function is the more versatile entity: ideally it should be callable from SQL as well as from the PL/SQL (programming) environment, whereas a procedure may only be called in the PL/SQL environment as an executable statement. A function like "delrows", however, cannot be called from SQL; it can only be called in the PL/SQL environment, in an expression, to return the number of rows deleted. So the function, though syntactically correct, now has a restricted use. The versatility of a function actually comes from following certain rules defined in terms of its purity levels - WNDS, RNDS, WNPS and RNPS. They respectively mean -
  • WNDS (Writes No Database State) - the function does not modify database tables (no DML statements).
  • RNDS (Reads No Database State) - the function does not read data from tables (no queries).
  • WNPS (Writes No Package State) - the function does not assign values to package variables.
  • RNPS (Reads No Package State) - the function does not read package variables.
When all the above conditions are met, the function is truly pure and fully versatile with respect to its calling environment. In the present case, "delrows" violates the WNDS purity level by the presence of the delete statement in it, and hence cannot be used in SQL.
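For packaged functions, Oracle also lets you assert these purity levels explicitly with the RESTRICT_REFERENCES pragma, so a violation is caught at compile time rather than discovered at the point of use. A small sketch, with a hypothetical package and function name:

create or replace package calc_pkg as
   function double_it (n number) return number;
   -- assert that double_it neither reads nor writes database or package state
   pragma restrict_references (double_it, WNDS, RNDS, WNPS, RNPS);
end calc_pkg;
/

create or replace package body calc_pkg as
   function double_it (n number) return number is
   begin
      return n * 2;
   end;
end calc_pkg;
/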

On the other hand, adding two numbers is a simple computational activity, but in the form of a procedure it cannot be used conveniently in a SQL statement - for example by passing it the salary and commission columns of the employee table to calculate each employee's total earnings.
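Written as a function instead, the same computation drops straight into SQL. A sketch, assuming the employee table has salary and commission columns (commission possibly null):

create or replace function total_earning (p_sal number, p_comm number)
return number is
begin
   return p_sal + nvl(p_comm, 0);
end;
/

select ename, total_earning(salary, commission) as earnings from employee;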

Guidelines to make a choice -
The following considerations may be made while choosing between procedure or function for solving a problem -
  • If the program is supposed to generate multiple values, all of which must be sent back to the calling program, use a procedure.
  • If the program is supposed to perform a query and/or DML (cursor processing), then irrespective of the number of values to be returned to the calling program, use a procedure.
  • If the program is not supposed to use a query and/or DML and must return a single value, use a function.
  • A function whose purity is knowingly forgone (and which is therefore usable only in a restricted environment) may still be written where that is a matter of real convenience; such special cases are rare (truthfully, never faced by me).

Monday 21 May 2012

Delete vs. Truncate

The Delete and Truncate SQL commands are both frequently used to empty tables. The actual difference between the two is not clear to many users - not only beginners but experienced users as well - and many have only a vague idea of when to use which. This post is an effort to clarify that. The bulleted lists of properties below should make the differences easy to see, and we also look at the situations in which "delete" or "truncate" is the appropriate choice.

Delete -
  • This is a DML category of command.
  • Being a DML category command, it stores rollback/undo data (the original version of the data being changed; in the case of delete, the entire set of rows being deleted).
  • Being a DML category command, the operation can be undone.
  • Being a DML category command, it requires an explicit "commit" command to be fired post operation if the deletion is to be confirmed.
  • The delete command does not reset the high water mark (HWM), i.e. the level the rows had once reached in the table.
  • The delete command may be used with a "where" clause; only when the predicate is omitted does it empty the table, and even then no space is freed up for new object(s).
Truncate -
  • This is a DDL category command.
  • Being a DDL category command, it stores no rollback / undo data.
  • Being a DDL category command, the operation can NOT be undone by using a "rollback" command.
  • Being a DDL category, it DOES NOT require an explicit "commit" command.
  • The truncate command resets the high water mark to zero, preserving only the initial extent allocated for the existence of the table; all extents above the initial one are reclaimed as free space.
  • The truncate command does not accept a "where" clause, which means it always empties the whole table, returning the allocated space (except the initial extent) for new objects.
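A quick sketch of the contrast, using a hypothetical sales table:

delete from sales where region = 'WEST';   -- selective removal, generates undo
rollback;                                  -- the deleted rows come back

delete from sales;                         -- empties the table, HWM unchanged
commit;                                    -- needed to make the delete permanent

truncate table sales;                      -- empties the table, resets the HWM,
                                           -- auto-committed, cannot be rolled back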
Is "Truncate" same as "Delete + commit" ? -
There is a normal perception of many users of the databases that truncate is equivalent to deleting all rows and commit rolled into one. And it is just that. Technically however, this is a big NO. As is apparent from the properties of these two commands as described above, the main difference is the reclamation of the space and resetting of the High Water Mark. Many are also seen to be advising use of Truncate over Delete + commit if the users want to empty the table, since the truncate command is "more efficient". Actually there are some considerations to be made before deciding upon the "truncate" or "delete" for emptying the table -
  • If the table is huge, delete will require a correspondingly huge amount of space to store the rollback/undo data, so truncate may be more appropriate - but only if the data is definitely to be removed and there is no possibility of second thoughts and a rollback.
  • If a moderately sized table is to be reloaded with approximately the same amount of data (up to the same HWM), delete may be recommended. Truncate would release the extents above the initial one, and the subsequent de-allocation and re-allocation of space may cost more than the extension of the rollback segments where the undo data is stored.
  • Truncate may be recommended over delete if the volume of data to be reloaded is much smaller than the present volume, since the space above the HWM of the present data is reclaimed by the truncate operation.
  • Delete may be recommended when data is being replaced only temporarily, to run "select" queries against the intermediate data, so that a simple rollback can restore the original state.
  • If no bulk load is planned after the table is emptied, truncate is more appropriate.
The above scenarios are not an exhaustive list, but knowing the properties of delete and truncate, a prudent consideration of the differences should guide the choice between the two.
 
Other suggested reading - Materialized Views, Use of External Tables, DDBMS

Monday 14 May 2012

Materialized Views

What are Materialized Views? -

Let us start with an example. A company has a database with a table "sales" holding millions of rows for the orders it has received from its customers over a couple of years. The company's managers periodically want to know the region-wise total sales, the region-wise average order value, the customer-wise total sales, and so on, for which you would write queries such as -

select region, sum(OrderValue) from sales group by region;
select region, avg(OrderValue) from sales group by region;
select CustName, sum(OrderValue) from sales natural join customer group by CustName;

Every time one of these queries is fired, the millions of rows are read and loaded into memory, the aggregations / calculations are performed, and only then are the results returned to the user.

Now answer this simple question for yourself - why did you use log tables or trigonometric tables in school when you knew the methods and formulas for calculating logarithms and trigonometric ratios? You used them as a "ready reckoner", right? To save time!

Materialized Views are exactly such "ready reckoners" in the database: created with such queries, they capture and store the summaries so they can be referenced quickly whenever such aggregating queries are fired. Because a Materialized View stores only the summaries, the number of records in it is small. For example, if the territory in which the company does business has 10 regions, then the first and second examples above will produce only 10 rows each. So not only are far fewer I/Os and scans needed to satisfy such queries, but far less memory is needed to cache the data, conserving server resources. Materialized Views are a data-warehousing / decision-support tool that can cut by orders of magnitude the response time of queries that aggregate or join hundreds of thousands to millions of records - and they do so transparently. Because the use of Materialized Views is transparent, the SQL queries are still written in the same form and syntax against the tables, without any need for users to change them.
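A sketch of such a ready reckoner for the first query above, assuming the same sales table; with query rewrite enabled (and the QUERY REWRITE privilege in place), the optimizer can transparently answer the region-wise total from the materialized view instead of scanning the millions of rows:

create materialized view mv_region_sales
   build immediate
   refresh complete on demand
   enable query rewrite
as
   select region, sum(OrderValue) as total_sales
   from sales
   group by region;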


Another purpose for creating materialized view(s) is to provide a local copy of the data (replication) in order to reduce network load, to enable data subsetting (by rows and/or columns) for local sites, or to enable disconnected computing in a distributed database environment.
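A sketch of such a replication-style materialized view, assuming a database link named hq_link to the master site; fast refresh additionally requires a materialized view log on the master table:

create materialized view emp_local
   refresh fast                      -- needs a materialized view log on emp at the master site
   start with sysdate next sysdate + 1
as
   select * from emp@hq_link;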

View vs. Materialized View - 

A typical view is a stored query written over table(s), which is implicitly invoked each time the view is referenced in a SQL query. It is only a definition and is stored as such - there is no data storage - which is why a view is called a virtual table. A view is created to provide an abstraction layer, to hide the source (table), to implement a constraint, or to give different users a predefined visibility of the data. Views always provide fresh data, but incur the overhead of executing the stored query on every reference.


A Materialized View stores data, in most cases precomputed summaries. Queries need not reference the materialized view directly; they are fired against the tables in the same form as always, but if they involve the expressions on which a materialized view has been built, they are internally redirected to the materialized view (query rewrite) and satisfied from it. Materialized Views may serve stale data (if they are non-updatable or refreshed infrequently), and because their defining query is not executed on every reference they incur little execution overhead, but unlike ordinary views they must use disk space to store the precomputed summaries.


Materialized view updatability -
Materialized Views by that name are used in Oracle, while other DBMSs implement equivalent functionality through slightly different concepts and arrangements - for example SQL Server has "Indexed Views" and MySQL users often rely on the "Flexviews" tool. The discussion that follows uses Oracle as the model database.

Materialized views come in 3 possibilities with respect to their updatability at the remote site :
  • Read-Only materialized Views
  • Updatable materialized Views
  • Writable materialized views
A read-only materialized view, as the name suggests, does not allow updates at the remote materialized view site; it is created simply by omitting the "FOR UPDATE" clause from the definition.
An updatable materialized view allows changes to be made at the remote site. It must belong to a "materialized view group" so that those changes can be pushed back to the master site during refresh. Such views include the "FOR UPDATE" clause in their definition.
A writable materialized view is the same as an updatable one except that, not being part of a "materialized view group", it cannot push changes back to the master site; instead the local changes are lost from the materialized view during refresh.
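A sketch of the difference in definition, assuming a database link master_link to the master site (an updatable view additionally has to be registered in a materialized view group through the replication API to push its changes back):

-- read-only: no FOR UPDATE clause
create materialized view emp_ro
   refresh fast
as
   select * from emp@master_link;

-- updatable: FOR UPDATE clause present
create materialized view emp_upd
   refresh fast
   for update
as
   select * from emp@master_link;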

Other topics of interest - A short challenging quiz, Evolution of DB systems

Thursday 10 May 2012

Openbook Quiz - 2

Challenge yourself with the following questions. Each question also has links to the text where the possible answer may be found. Please post your answers in the comments, and do share / circulate this quiz with your contacts over social networks, LinkedIn groups, your own blogs etc., so that everybody gets a chance to pick up some knowledge.

Question 1 :
The requirement is to provide a form where the user enters a number that uniquely identifies an application submitted to an organization. The application has to pass through various departments, each of which assigns it a status as soon as it has processed it. The application form id that the user enters is the primary key of the table in which the status is updated. Which of the following indexes will be suitable for processing this query quickly and efficiently? (You may click on the options to find the properties of each of those indexes.)

 (a) Cluster Index  (b) Descending Index  (c) Bitmap Index  (d) Index Organized Table (IOT)

Question 2 :
A client wants to store multiple addresses for each customer in the database and, most of the time, wants to list all the addresses of a customer in one query by providing the unique customer id. Which of the following table types will be most suitable for this requirement -

(a) Heap Organized Table  (b) Clustered Tables  (c) Nested Table  (d) Object Table

If this interests you, you may also visit the previous quiz if you missed it. More quizzes on Data Integrity are coming shortly, so prepare yourself.

Wednesday 9 May 2012

Data Integrity Using Views

Data Integrity is implemented in databases using various constraints and triggers. Whereas integrity constraints are a non-procedural, declarative way of implementing data validation rules (refer to Codd's rule #10), triggers offer a procedural method of implementing validation rules where constraints show their limitations. There is, however, a third method of ensuring the validity of data: creating "views". Our discussion will again use Oracle as the model database.

What are Views? -

Views are created over tables to provide an abstraction layer, to mask the actual source of the data, and to give limited access or visibility to the data contained in the tables. They simplify matters by allowing a complex query to be stored in the database as a named object (hence the abstraction layer), besides serving as a security feature (by providing limited access and/or visibility).
Technically, a view is a query (select statement) stored in the database in parsed and compiled form under a name, and it can be used as a virtual source of data. Views are referenced in SQL statements in place of tables; on such a reference, the select statement associated with the view implicitly executes against the underlying tables to present the data for the operation or for viewing. As such, views DO NOT store any data; they exist only as a definition.

The syntax for creating the plain view (without constraint implementation) is simple -

create [or replace] view <ViewName> as <Select Statement>;

The "or replace" clause is optional (indicated by presence of [], a universal standard to indicate optional part of the syntax), when used, it will replace an existing definition of the view with the same name. The "ViewName" may be any suitable name of user's choice, but of course, must follow all the naming conventions imposed by the DBMS. The "Select Statement" may be any select statement, inclusive of where, having, group by, order by, distinct, sub-queries, joins or relational algebra expressions like Union, Intersect or Minus.

Data Integrity Implementation Example -

Consider a company with managers, each in charge of multiple departments: manager M1 is in charge of departments (10, 20, 30), manager M2 of departments (40, 50, 60), and so on. A manager may only make appointments and transfers of employees for the departments under his/her control. How would you implement this rule using integrity constraints or triggers? This is where views become convenient, through the "with check option" clause. This clause restricts data manipulation to the range of the view's "where" clause, i.e. any manipulation through the view that would put row(s) beyond the selection of the view is NOT allowed. For example, to implement the above rule we create two views -

create or replace view empm1 as select empid, ename, job, sal, deptno from employee
where deptno in (10, 20, 30) with check option;

create or replace view empm2 as select empid, ename, job, sal, deptno from employee
where deptno in (40, 50, 60) with check option;

More such views may be created for other managers, covering the sets of departments under their control. Each manager is granted access only to the view created for his/her set of departments - neither to the other views nor to the table itself. So the views "empm1" and "empm2" are available to managers "m1" and "m2" respectively.

The following statements will be valid for manager m1 -


insert into empm1 values (101, 'peter', 'clerk', 900, 20);
update empm1 set deptno = 10 where empid = 101;

In these, the insert statement places the employee's row in department number 20, and the update statement transfers that employee from department 20 to department 10 - both well within the scope of the view.

However, the following statements if fired by manager m1 will result into an error -


insert into empm1 values (102, 'robert', 'analyst', 2500, 60);
update empm1 set deptno = 60 where empid = 101;

In these, the insert statement tries to place the employee's row in department 60, which is not within the scope of the view allocated to manager m1. The update statement fails for the same reason: after the update, the row would move outside the scope of the view "empm1".


This was not a complete tutorial on views; we restricted ourselves to the use of views for implementing data validation, restriction, or business rules that cannot be implemented through integrity constraints or triggers. Oracle also allows primary key, unique, rely, and read-only constraints to be declared on views.
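For example, a view meant purely for reporting can refuse DML altogether. A small sketch, assuming the same employee table:

create or replace view emp_report as
select empid, ename, job, sal, deptno from employee
with read only;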


If you have liked this, Click Here for "concurrency & locking" (one of the topics most liked by other readers).

Monday 7 May 2012

Data Integrity - Triggers

Please Click Here to continue from the background.

Integrity constraints are the most convenient way of implementing data validation rules, as they are purely declarative and involve no programming. They also check existing data in the columns for validity when they are created after the tables have been populated, or when for some reason they are disabled and re-enabled; they leave no chance of a column containing invalid data. Hence, constraints are considered the most reliable way of implementing data integrity.

However, they are not without limitations. For example, constraints cannot implement conditional validation, nor can they perform minor transformations on the data to make it conform to the rule. They can only fail the statement / transaction if the data does not conform. What if you want the names of employees, departments and locations always to be in upper case? What if you want the salary of employees not to cross 5000, but only for those working in department number 30? What if no department in your company should have more than 100 employees? The list of such requirements can be long, and integrity constraints offer no solution for them. Such restrictions on data (columns) may, however, be implemented using triggers -

What are Triggers? -
Triggers are programs, written in a back-end programming language and residing in the database, which execute automatically in response to an event in the database.

Triggers come in different types, classified primarily by their events. Triggers that fire (read: execute) on system-related events are called system triggers, while those that fire on DML events against tables are called DML triggers. Some databases also allow triggers to fire on DML events against views; these are called instead-of triggers. While it is not possible to provide an entire lesson on triggers in this small post, we shall briefly discuss DML triggers, the type most used and the one relevant to our present topic. The discussion refers to Oracle as the model database; minor differences may appear in other databases, and readers are requested to post comments about those differences to enrich the reading experience of other visitors.
DML Triggers - A Primer -
DML triggers (all references below mean DML triggers) may be created to fire at "row level" or at "statement level" / "table level". The former, identified by the presence of the "for each row" clause, executes once for every row affected by the DML statement for which it fires; the latter fires only once, irrespective of the number of rows affected by its conjugating DML statement. Row-level triggers are useful when the triggering action has to refer to or use row-level values, since such a reference is only possible in a row-level trigger (it fires for each row). Oracle provides :NEW and :OLD to refer to the new and old state of the record being affected by the DML in a row-level trigger; statement-level triggers cannot refer to :NEW and :OLD.

Either kind may be timed to fire "before" or "after". A "before" trigger performs its action before the conjugating DML statement performs its operation, which automatically defines what an "after" trigger means. A "before" trigger may be used where some transformation of the data in transition is required, or where the operation is to be aborted by failing the trigger, whereas an "after" trigger may be used where the trigger action depends on, or has to refer to, the outcome of the DML statement. Oracle allows a trigger to name columns for the update event, so that it is saved from firing unnecessarily when only other columns are being updated. A trigger may also have an optional condition, specified in a "when" clause, under which alone it should fire; this clause can be used only in row-level triggers, as the restricting condition refers to the new and old versions of the row(s).

Oracle also allows a trigger to be combined for the insert, update and delete events, or any combination thereof, provided the triggers would have been of the same level (row or statement) and timing (before or after) had they been written separately for each event. The code for each separate event can then be made to execute by using boolean built-in variables which Oracle calls "predicates", viz. inserting, updating and deleting; a skeletal example of such a combined trigger follows below.
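A minimal sketch of a combined row-level trigger using the event predicates, assuming a hypothetical emp_audit table with (empid, action, action_date) columns:

create or replace trigger emp_audit_trig
after insert or update or delete on employee
for each row
begin
   if inserting then
      insert into emp_audit values (:new.empid, 'INSERT', sysdate);
   elsif updating then
      insert into emp_audit values (:new.empid, 'UPDATE', sysdate);
   elsif deleting then
      insert into emp_audit values (:old.empid, 'DELETE', sysdate);
   end if;
end;
/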

That was a rather densely packed primer on triggers and is by no means a substitute for a complete explanation, but it provides enough of a base that the forthcoming discussion will not feel alien. A dedicated post on triggers may come in the near future; meanwhile you may refer to this link.


Trigger Examples - Implementation of Data Integrity -
Example 1 - In the first example we implement a rule that the data in the "dname" and "location" columns of the "department" table should always be in upper case. We could simply add a check constraint, but the problem would be that the constraint fails the insert or update statement whenever the data is not provided in upper case, wasting time and resources on the rejection and forcing the user to re-execute the operation. With a trigger, instead, we implicitly convert the case of the data supplied by the user, so the data that reaches the columns is always upper case, no matter in what case it was provided. So the example -


create or replace trigger upper_case_dept_trig
before insert or update of dname, location on department
for each row
    begin
          :new.dname := upper(:new.dname);
          :new.location := upper(:new.location);
   end;
/

Explanation : The second line is the triggering statement - it specifies the timing of the trigger ("before" in this case), the events, which are "insert" and "update" (with a column list; in the absence of such a list it would fire for updates on any column), and the table name, "department". The "for each row" clause makes this a row-level trigger: since we need to manipulate the data provided by the user, we use a "before" trigger, and since that data is available only through :NEW, we use a row-level trigger. The body between begin and end is the trigger action - here it converts the user-provided data to upper case (upper is a built-in function) and stores it back in the same :NEW variables, ensuring the stored data is always upper case.


Example 2 - In the second example we ensure that the salary of an employee may not exceed the limit of 5000, but only in department number 30. Remember that if an employee in department 30 already has a salary greater than 5000 before the trigger is written, or a user makes it so while the trigger is disabled, the trigger CANNOT detect it - unlike an integrity constraint. But then, this rule cannot easily be applied using an integrity constraint either. So the example -


create or replace trigger limitsal
before insert or update of deptno, salary on employee
for each row
when (
        (new.deptno = 30 and new.salary > 5000) or
        (new.deptno = 30 and old.salary > 5000) or
        (old.deptno = 30 and new.salary > 5000)
     )
begin
   raise_application_error (-20000, 'salary limit crossed - operation disallowed');
end;
/

Explanation : The other parts are the same as in the previous example; what is new is the restricting "when" clause. Observe that the when clause is part of the trigger header and its conditions are stored in the data dictionary; while parsing the insert or update statement, Oracle decides whether or not to fire the trigger (i.e. whether the conditions are satisfied). The trigger action in this case calls a built-in procedure, raise_application_error, to fail the trigger and hence the conjugating DML. The error numbers allowed here are -20000 through -20999, and the second argument is the error message to be passed on to the user.

Conclusion -
So here we had two very simple examples of triggers implementing rules that would not have been possible with integrity constraints. The most important limitation of triggers, however, is that they do not check data already residing in the tables; such data may have to be validated separately if the trigger was disabled for some time or was created after the table was populated.


Data integrity and DML restrictions, such which may not be possible with integrity constraints or triggers, may be implemented through views. Click Here to continue reading about the views.

Wednesday 2 May 2012

Integrity Constraints - Creation

Please Click Here for the explanation and background on the Data Integrity Constraints.

Note : The set of commands / statements used here and the explanations thereof are with respect to Oracle as a model database. Readers may experience minor differences in the other database systems.

Constraint creation with the table -
Constraints may be created along with the table. For example -

create table employee (empid   number(5) constraint pk_employee primary key,
                       ename   varchar2(15),
                       job     varchar2(15),
                       salary  number(5) not null,
                       cardno  number(5),
                       deptno  number(3) constraint fk_employee
                                         references department (deptno),
                       constraint chk_employee check (salary between 1500 and 12000),
                       constraint uk_employee unique (cardno)
                      );

Explanation : The constraint definitions attached directly to a column (pk_employee, not null and fk_employee above) are called column-level definitions, and those written after all the columns (chk_employee and uk_employee above) are called table-level definitions. The word "level" should not be read as a hierarchy; it only denotes the place where the definition is written. Constraints ultimately apply to columns, and their behaviour is exactly the same irrespective of the level (read: place) of definition. Only one constraint definition may be integrated with a column (though a column may have multiple constraints), whereas there is no limit on the number of constraint definitions that may be included after all the columns have been defined (table level). So if a column needs multiple constraints, one of them may be integrated with the column and the rest declared at table level. Table-level definitions must be used when a constraint references multiple columns, for example a composite primary key (a primary key consisting of multiple columns). Some developers prefer to define all constraints at table level just to keep the table definition neat and clean; remember, however, that a table-level definition is not possible for a NOT NULL constraint.

A constraint definition may contain the optional constraint clause, for example "constraint pk_employee", which provides a user-defined name for the constraint - convenient when fetching constraint metadata from the catalog. In the absence of a "constraint <name>" clause, as for "not null" above, the system implicitly generates a name, which is hard to remember and has to be looked up in the metadata every time. The check constraint above has been defined at table level because its column also needed "not null", which cannot be defined at table level. The unique constraint is shown at table level to demonstrate that it could be defined there too; as said, this is a matter of convenience and we could do the same with the other constraints as well.
Adding constraints to an existing table -
It is possible to add a constraint to an existing table if it was not created with the table; some developers even prefer to create tables without constraint definitions and add the constraints afterwards.
Add Primary Key : alter table employee add constraint pk_employee primary key (empid);
Add Not Null : alter table employee modify salary not null;
Add Check : alter table employee add constraint chk_employee check (salary between 1500 and 12000);
Add Unique : alter table employee add constraint uk_employee unique (cardno);
Add Foreign Key : alter table employee add constraint fk_employee foreign key (deptno) references department (deptno);

Explanation : Observe that for the "not null" constraint we use "modify" instead of "add", because not null is really a column specification, as explained above. Also observe that the "foreign key (deptno)" clause had to be added to the definition, because here we must state which column is to become the foreign key; that was not necessary in the previous example, where the constraint was defined at column level. The "constraint <name>" part remains optional in all the above statements.

Adding constraints when table has rows -
The ambit of a constraint is to allow no invalid data to remain in the column(s) on which it is defined. Hence, if data is already present, it is validated while the constraint is being created, since the default status of a constraint is "enabled". The constraint creation fails if the existing data does not conform to the rule of the constraint.

Some database systems like Oracle allow the "novalidate" keyword to be used to ignore the existing invalid data. This keyword, however, is not allowed with a primary key or unique constraint, because a PK or UK may be referenced by foreign keys and skipping validation could compromise referential integrity - a child row might end up referring to multiple parent rows (important for job seekers, a possible interview question!), which has to be strictly avoided in any case. The use of "novalidate" is demonstrated for a check constraint in the example below -

alter table employee add constraint chk_employee
check (salary between 1500 and 12000) novalidate;

Disabling/enabling a constraint -
A constraint, when created, is in the enabled state by default. Sometimes, however - during a bulk data load, for instance, or for other reasons - the constraint may need to be disabled. Disabling marks the data as "NOT VALIDATED" in the data dictionary, and on enabling, the data is verified again before the constraint state is changed back to enabled.

alter table <TableName> disable constraint <ConstraintName> [cascade];
alter table <TableName> enable [novalidate] constraint <ConstraintName>;

"Novalidate" option does not apply to primary key or unique constraint for the reasons already explained, but it may be optionally used for other constraints if you want the constraint to ignore any invalid data that was put in the column while the constraint was in the disabled state. "cascade" option may be only used while disabling the primary key. It is used/useful when there are foreign keys referencing to it, which are also supposed to be disabled. The primary key constraint is not allowed be disabled with an enabled foreign key constraint referencing it. However, "cascade" option is not available with enable action on primary key, which means that Foreign Key constraint will have to be enabled explicitly after enabling the primary key.


Dropping a constraint -
A constraint may be dropped using the following statement :


alter table <TableName> drop constraint <ConstraintName> [cascade];

Here the "cascade" option may only be used while dropping the primary key constraint. It is used/useful when there are enabled foreign key constraints referencing to it and they (FKs) are also required to be dropped. The primary key constraint is not allowed to be dropped with an enabled foreign key constraint referencing it.


Dropping a Parent Table -
A table whose primary key constraint is referenced by an enabled foreign key constraint on another table (the child table) is called a parent table. It is not possible to drop a parent table, irrespective of whether the tables contain any data. This default behaviour may, however, be overruled by using the "cascade constraints" clause in the "drop table" statement as given below -


drop table <ParentTableName> [cascade constraints];

The "cascade constraints" clause not only allows to drop the parent table but also drops implicitly the referencing foreign key constraints enabled on the child tables, while retaining the child tables and its data.


Foreign key constraint options -
Foreign key constraints enforce referential integrity. The default behaviour is that parent row(s) (in the parent table) cannot be deleted while any child row(s) (in the child table) depend on them. This behaviour may be altered with the "on delete cascade" or "on delete set null" options in the definition of the foreign key constraint, as given below -


...constraint fk_employee [foreign key (deptno)] references department (deptno)
on delete ....;


The options are mutually exclusive: the former cascades the delete of the parent row(s) to the related child row(s), while the latter sets (updates) the corresponding foreign key value to null without affecting the rest of the child row. The "foreign key (deptno)" clause (shown above in square brackets) is required when the constraint is defined at table level or while adding the constraint to an existing table.
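For instance, a sketch of adding the same foreign key with each behaviour (only one of these would exist at a time):

-- deleting a department removes its employees as well
alter table employee add constraint fk_employee
foreign key (deptno) references department (deptno) on delete cascade;

-- or: deleting a department keeps the employees but sets their deptno to null
alter table employee add constraint fk_employee
foreign key (deptno) references department (deptno) on delete set null;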


Deferrable Constraints -
By default, constraint rules are enforced at statement level, i.e. the data is validated immediately after the statement finishes its action; such constraints are classified as "immediate" with reference to when they validate the data. Some situations, however, require validation against the rule of the constraint to be delayed until the end of the transaction rather than checked at statement level. Such constraints are called "deferrable". The following example demonstrates such a situation -


create table employee (empid   number(4) constraint pk_employee primary key,
                       ename   varchar2(15) not null,
                       job     varchar2(15),
                       salary  number(5) not null,
                       mgrid   number(4) constraint fk_employee
                                         references employee (empid) deferrable,
                       .......
                      );

Explanation : In the above partial definition of an employee table, the column "mgrid" is a foreign key referencing the "empid" column of the same table, which is the primary key. This is because "mgrid" holds the "empid" of the employee to whom the corresponding employee reports, depicting the organizational hierarchy within a single table (managers and subordinates are all employees). Establishing the subordinate-manager relationships may require multiple insert statements, so the constraint checking has to be delayed until all such rows have been inserted (the end of the transaction). "Deferrable" constraints, however, are still checked as "immediate" by default and require an explicit statement to be deferred, as under -


alter session set constraints = deferred;


The above statement "defers" (for every transaction) all deferrable constraints in that particular session. If the constraint is required to be deferred right from its creation, use "deferrable initially deferred" in place of plain "deferrable".
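A sketch of the scenario this solves, with hypothetical rows: the subordinate is inserted before the manager row exists, and the deferred foreign key is checked only at commit:

alter session set constraints = deferred;

-- subordinate first: mgrid 100 does not exist yet, allowed because the check is deferred
insert into employee (empid, ename, job, salary, mgrid)
values (101, 'SMITH', 'CLERK', 1800, 100);

-- now the manager row arrives
insert into employee (empid, ename, job, salary, mgrid)
values (100, 'BLAKE', 'MANAGER', 5000, null);

commit;   -- the deferred fk_employee constraint is validated here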


Constraints are the most convenient way of implementing data integrity, as we have seen above, but they are not without limitations. There are situations where constraints cannot implement certain custom business rules and we have to resort to other techniques such as triggers or views. Click Here for data integrity implementation using triggers. Don't forget to leave your comments and ask your questions in the comments section.