Saturday 28 April 2012

Data Integrity Constraints

A Relational Database Management System must implement data integrity in the database in a declarative manner (Codd's Rule #10) to ensure the integrity independence proposed by relational theory. Although data integrity can also be implemented through non-declarative means, using triggers in the database, the most convenient way is to declare "Integrity Constraints" on the columns of a table. The implementation of integrity constraints is more or less the same in all commercial database systems; however, our model database for reference in this post will be Oracle.

Data Integrity Constraints -

Data Integrity Constraints are a declarative means of implementing data validation rules: constraint objects are created on the columns in tight integration with the definition (read structure) of the table. This makes integrity constraints the most reliable and easiest method of preventing entry of invalid data into the tables. When any of the data involved in a DML operation on the table violates the rule of a constraint, the operation is rejected by the database system.

Mechanism of Data Validation -

An important question to be answered is: when does the system check the validity of data against the rule of the constraint? By default the system checks the validity of the data when the DML statement completes execution. For example, if a delete statement fired at a table is supposed to hit some 10 rows, then those rows are first deleted, the system then runs the validation check, and if it finds that any of the deleted rows would result in a violation of the constraint, the delete operation is implicitly rolled back. This is called a "Statement Level Rollback" - an implicit operation, transparent to the user. The database system then reports the error and rejects the operation. It is important to understand that statement level rollback comes into effect whenever a statement has to fail (for any reason); it is the right (prerogative) of the system, and no explicit "commit" or "rollback" can supersede it. (This is actually the reason triggers are not allowed to contain "commit" or "rollback" statements - an important question asked in interviews, attention job seekers!!!)

The foregoing mechanism of constraint checking is called an "immediate" constraint. All constraints, by default, are created immediate. However, you may optionally create a constraint which defers the validation of the data until the user issues the "commit" statement. Such constraints are called "deferrable". Deferrable constraints also behave like "immediate" by default, but may be deferred using an "alter session" (or "set constraint") statement. Databases like Oracle also provide the facility to create them deferred from the start with the "initially deferred" keywords. (See the syntax section for more actions.) The deferred state of a constraint ensures transaction-level integrity of data, as the validation is performed at the end of the transaction (when the user says "commit"). If any data which violates the constraint is observed at the end of the transaction, i.e. while committing, then the entire transaction is rolled back.
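
A minimal sketch of creating and deferring such a constraint is given below; the table and constraint names (emp, dept, emp_dept_fk) are hypothetical and only illustrate the Oracle syntax.

-- create the constraint as deferrable (validation may be postponed to commit)
alter table emp add constraint emp_dept_fk
  foreign key (deptno) references dept (deptno)
  deferrable initially immediate;

-- defer its checking for the current transaction only
set constraint emp_dept_fk deferred;

-- or defer all deferrable constraints for the whole session
alter session set constraints = deferred;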


Types of Integrity Constraints -

Integrity constraints come in the following 5 types -

Primary Key constraint is used on a column or a combination of columns to designate the primary key of a table. You may (obviously) define only one primary key on a table, and it ensures that the column(s) under the primary key constraint always receive a value (not null) and a unique value. It actually implements "entity integrity" and is an important part of the parent/child relationship between tables. For example, an employee id or a student's university enrollment number.

Unique constraint, as the name suggests, allows only unique values in the constrained column(s); however, it accepts nulls. A unique constraint is used for columns which must contain unique values but cannot be made the primary key (of which there is only one per table). For example, an employee's access card number.

Referential constraint, also called a Foreign Key, requires that each row of a table either has a value in the constrained column that matches a value in the "parent key", or remains null. The "parent key", also called the "referenced key", is usually a primary key, though a foreign key column may also refer to a column under a "unique" constraint. For example, a department id column in the employees table refers to the department id column in the departments table; since an employee must belong to an existing department, his department id must match one of the values in the department id column of the departments table, or the employee may have no department id at all (null), reflecting that the employee is yet to be allotted a department. The referential constraint implements "referential integrity".

Check constraint allows you to specify a condition that must be true for the data in the constrained column, if a value is present (null is allowed), for every row in the table. There may be multiple check constraints defined on a single column, so you have to take care that the conditions in the check constraints on the same column do not conflict with or contradict each other. This is because database systems have no mechanism to verify whether the conditions specified in different check constraints on the same column are mutually exclusive. Moreover, there is no specific order of evaluation for multiple check constraints on the same column. Database systems like Oracle allow the use of built-in functions in a check constraint, with some limitations. Check constraints are actually the "domain integrity" part of the implementation of data integrity. For example, you may allow only Male, Female or Rather Not Say (or null) values for a gender column in a table of customers.

Not Null constraint requires that a value for the column must always be provided. In Oracle the Not Null constraint is actually implemented as a check constraint with the implicit condition "<column name> is not null" when a column is declared Not Null.
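
As a minimal sketch (the table and constraint names below - departments, employees, emp_card_uk etc. - are hypothetical and only illustrative), all five constraint types may be declared together at table creation:

create table departments (
  deptid  number(4)     constraint dept_pk primary key,          -- primary key
  dname   varchar2(30)  not null                                 -- not null
);

create table employees (
  empid   number(6)     constraint emp_pk primary key,           -- primary key
  ename   varchar2(30)  not null,                                -- not null
  cardno  varchar2(10)  constraint emp_card_uk unique,           -- unique
  gender  varchar2(15)  constraint emp_gender_ck
            check (gender in ('Male','Female','Rather Not Say')),-- check
  deptid  number(4)     constraint emp_dept_fk
            references departments (deptid)                      -- referential (foreign key)
);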


Referential Constraint DML restrictions -

Referential Constraint puts following restrictions on the DML performed on the existing rows in the Parent and Child tables -
  • Parent row(s) which have dependent child rows are not allowed to be deleted by default.
  • Parent row(s) which have dependent child rows are allowed to be deleted if the referential constraint is created with the "on delete cascade" option, whereby the corresponding child rows are also deleted along with the parent row(s).
  • Parent row(s) which have dependent child rows are allowed to be deleted if the referential constraint is created with the "on delete set null" option, whereby the corresponding foreign key column is set to null when the referenced parent row(s) are deleted. (Both delete options are sketched after this list.)
  • A parent key value cannot be updated if it has been referenced by child row(s).
  • A foreign key column value may be updated to null or to any other value, provided the new value exists as a parent key value.
  • Child row(s) may be deleted without any restriction.
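
A minimal sketch of the two delete options mentioned above, reusing the hypothetical employees/departments tables (only one of the two can exist on a given constraint, so the first would have to be dropped before adding the second):

-- deleting a parent department also deletes its employees
alter table employees add constraint emp_dept_fk
  foreign key (deptid) references departments (deptid)
  on delete cascade;

-- alternatively, deleting a parent department sets the employees' deptid to null
alter table employees add constraint emp_dept_fk
  foreign key (deptid) references departments (deptid)
  on delete set null;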

Constraint Indexes -

Unique and Primary Key constraints generate unique indexes in the background on the columns on which these constraints are declared. These indexes are necessary for the implementation of uniqueness; otherwise, when an insert or update operation on the constrained column is performed, the database system would have to verify the uniqueness of the new value by scanning the entire table, which would be very inefficient. They are plain B*Tree indexes. No other constraints create any index on the constrained columns automatically. (This is one of the very popular questions asked in interviews, attention job seekers!!!)
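
Assuming the hypothetical employees table above, the automatically generated indexes can be verified from the data dictionary; a minimal sketch (the index_name column of user_constraints is available in recent Oracle versions):

select constraint_name, constraint_type, index_name
from   user_constraints
where  table_name = 'EMPLOYEES'
and    constraint_type in ('P', 'U');

select index_name, uniqueness
from   user_indexes
where  table_name = 'EMPLOYEES';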
  
After this theoretical background on integrity constraints, we shall see the actual creation and administration of the constraints. Click Here to continue.
 

Thursday 26 April 2012

Data Integrity

If you have visited an earlier post entitled "Relational Database Model - Codd's Rules", then you must have realized that rule #10 emphasizes the need for enforcement of "data integrity independence" in commercial RDBMSs. The most important purpose of the whole RDBMS paradigm is that the database should be independent of the applications which access it, so that changes in either of them do not interfere with the other; to achieve this, the database systems should use only a non-procedural (declarative) approach. Rule #10 thus suggests that "data integrity" is the responsibility of the database system: it must be definable as an object, or in the form of a self-executing program (read trigger) written within the database (not the application), and its existence must be part of the metadata stored in the "data dictionary" (or catalog). Also, any change in the "data integrity" should be allowed in the database as and when appropriate, without affecting the existing applications.

What is data integrity ? - 

It is important that data adhere to a predefined set of rules, as determined by the database administrator or application developer, as well as to obvious and natural validation. For example, no two customers of a bank may have the same account number, and a value for a gender column can only be one of Male, Female and Rather Not Say.

So, in short, "data integrity" ensures the quality of the data in the database, and its enforcement prevents entry of any invalid data into the database tables.

Types of Data Integrity - 

Two important steps in designing a table from the "data integrity" point of view are, first, identifying the set of valid values for certain columns and, secondly, deciding how to ensure that only valid values make their way into the column(s). This entails the need for categorizing "data integrity" -
  • Entity Integrity
  • Referential Integrity
  • Domain Integrity
  • User-Defined (custom) integrity
Entity Integrity defines each row as a unique entity of a table. This is obvious from the definition of an entity - a real world object, person or logical idea with a set of attributes, identified uniquely by its own set of values (which form a row in the table, i.e. the entity set). So here we require at least one column in the table to be constrained to hold values which may never repeat. For example, a student's university enrollment number.

Referential Integrity preserves the real world relationships between entities. In such a relationship there is an entity with independent existence, called the parent, which may be uniquely referenced by dependent entities (possibly multiple), called children. The rule enforces that a child may refer only to an "existing" parent, or to none (it may be termed an orphan), and a parent may be referred to by multiple child entities or by none (childless). For example, an employee (child) may belong to one and only one existing department (parent) or may not be assigned to any department at all, and a department may contain multiple employees or none.

Domain Integrity enforces that the valid values for a column follow a restriction of range or list, and type (numeric, string, date etc.). For example the only valid values for a gender column shall be Male, Female or Rather Not Say.

User-Defined (custom) integrity allows you to implement specific business rules. For example a certain level of manager in a bank may approve a loan case only to a certain maximum limit.
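
As a purely illustrative sketch (the table, columns and the limit of 500000 below are hypothetical), such a business rule could be declared as a check constraint, or enforced procedurally through a trigger where the rule is too complex for a declaration:

create table loan_approvals (
  loanid          number(8)     primary key,
  approver_level  number(1)     not null,
  amount          number(10,2)  not null,
  -- a level 1 manager may approve at most 500000; higher levels are unrestricted here
  constraint loan_limit_ck check (approver_level > 1 or amount <= 500000)
);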

How to enforce Data Integrity ? -

Data integrity may be enforced in many different ways, depending upon the requirement and suitability. Broadly, it may be enforced declaratively, through integrity constraints, or procedurally, through triggers written within the database. Each of these methods is discussed elaborately in the linked posts on this blog.

Here is a link to an interesting quiz, some food for thought : Click Here

Tuesday 24 April 2012

Database Indexes - Usage and Types

In a previous post entitled "Database Indexes - An Overview" we discussed the two major categories of indexes - B*Tree and Bitmap. We also discussed the difference between their structures and their suitability for different access patterns and index key cardinalities. This post I am dedicating to some more types under B*Tree, and our reference database system shall once again be "Oracle". However, this may be a good read for users of other database systems as well.

The types of indexes a database like Oracle may potentially have are - 
  • Index Organized Tables (IOT)
  • Cluster Indexes
  • Reverse Key Indexes
  • Descending Indexes
  • Function-based Indexes

Index Organized Tables (IOT) - 

Index Organized Tables are tables stored in a B*Tree index structure based on their primary key values. These have already been discussed in a previous post about tables, as IOT is one of the table types as well. Click here to read about them in detail.

Cluster Indexes -

Cluster Indexes are used to index the cluster key. They differ from the normal indexes (the types we are discussing here) in that the cluster key entries in such indexes point to the address of the block which contains the data for the cluster key value in consideration, unlike a normal index key entry, which points to a row address in the related table (on which the index is created). Cluster indexes have been discussed in detail in a previous post. Click here to read about them in detail.

Reverse Key Indexes - 

These indexes are particularly a feature used in a database set-up where users connecting to the database from different instances (called OPS [Oracle Parallel Server] in older versions or RAC [Real Application Cluster] from version 9i onwards) contend for the same index block while inserting their entries into the index. This particularly happens when the index entries arrive in a sequential manner (generated by some sequence). Reverse key indexes help in reducing the I/O otherwise caused by pinging, and also in reducing the amount of resources required to correct the imbalance in a B*Tree index arising out of new index entries being stored at only one end of the index tree.


Explanation - Suppose there are users u1, u2, u3, u4, of whom u1 and u2 connect to the database through instance i1 and u3 and u4 through instance i2 (an instance is defined as the combination of a common memory area, which Oracle calls the SGA, and the supporting background processes, which together provide the resources for users' operations on the data in the database). Let us further assume that all these users are performing insert operations on some table T at the same time, and there is some column C (the index key) in the table which takes its values from a sequence, the current values being 1021, 1022, 1023 and 1024 for the rows users u1, u2, u3 and u4 want to insert respectively. Now, as the actual data is going to be stored in a Heap Organized Table (a normal table), the rows may get inserted at any available location in the table, because the table does not need to maintain any ordering of rows (not even the order in which they are inserted). However, the corresponding entries in the index created over C on table T must be located in key order (an index must keep its values ordered). In a plain B*Tree, which is the basic structure of the Reverse Key Index, this insert activity will therefore concentrate on one particular leaf block. This causes two problems. First, the B*Tree, which uses a balanced tree (nodes divide the values half way through) to guide the search to the leaf blocks, will get skewed due to entries landing only on one side of the tree (hence reorganization will be required continuously / frequently). Secondly, in OPS or RAC the pinging of the block (writing a block modified by one user to disk when a user from another instance accesses it, and reloading it in memory) will cause heavy I/O and should be avoided, by avoiding multiple users colliding in the same block.

Now, if you reverse the above values (1021, 1022, 1023, 1024), they become 1201, 2201, 3201 and 4201 - no longer sequential - and because of the wide gaps between them they will be stored in different blocks, thereby avoiding the concentration of activity in a single block, and hence the pinging, as well as the skewing of the index. The "sequential" index key values get distributed over multiple leaf blocks. Of course, at the time of a search, the values provided by the users will also have to be reversed before searching through the index.


Drawback - Reverse key indexes are not suitable for range scans, i.e. predicates like "where c > 1234" will not use the index. The only queries that may benefit are those using equality predicates like "where c = 1234".


Descending Indexes -

The default ordering of the index key values in a normal B*Tree index is ascending, i.e. from lower values to higher. Such indexes are useful for all queries whose ordering clause is unidirectional, whether ascending or descending, as databases are able to scan the index forwards or backwards according to the single "asc" or "desc" direction specified on the columns in the "order by" clause. Descending indexes allow the index key values to be stored in order from higher to lower. Oracle (8i onwards) and SQL Server 2000 are known to support descending indexes, to deal with the situation where the order by clause in MOST queries mixes directions across the key columns. With a descending index, the final sorting step is eliminated from the execution plan and hence there is a considerable enhancement in the performance of the query.


Explanation - If in most of your queries you use an ordering clause of the form "order by col1, col2 DESC" or "order by col1 DESC, col2", then you should prefer to create an index of the form "create index <name> on <tablename> (col1, col2 DESC)". If you create the index without the "DESC" option on either of the columns, then queries using an order by clause of the aforesaid patterns will have a "sort" step in their execution plan. However, if the ordering clause is of the form "order by col1 desc, col2 desc" or "order by col1, col2", then you don't have to use "desc" with any of the columns while creating the index.


Function-Based Indexes -

Database optimizers have a tendency to ignore indexes created on bare column(s) when the predicate in the actual query wraps the column(s) in some function (or uses the column(s) in some expression or equation). It means that, to make effective use of an index in such cases, the index should be created on that very expression or equation rather than on the bare column(s) involved in it. So function-based indexes give us the ability to index computed columns, which allows searches on various expressions - for example case-insensitive searches, or searches on an equation involving several columns - as explained below.

Explanation - Imagine you allow users to store data in the "ename" column of the "emp" table without any case restriction. However, because searches are case sensitive, you may need to use a predicate with a function around ename, for example "where upper(ename) = 'SCOTT'", to hit all the rows for "Scott" irrespective of the case in which the name is stored. If you have only created an index on "ename", the index will be ignored for such predicates, and despite its presence the optimizer will go for a full table scan. So we may create the index itself on "upper(ename)" rather than just "ename". We may also create an index on an expression like "(x / y + z)", where x, y and z are columns of the table on which the index is being created.
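
A minimal sketch using the familiar emp table (note that in older Oracle releases the settings query_rewrite_enabled / query_rewrite_integrity may also have to be set before the optimizer uses such an index):

create index emp_upper_ename_idx on emp (upper(ename));

-- this predicate can now be resolved through the function-based index
select empno, ename, job
from   emp
where  upper(ename) = 'SCOTT';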


Create Index Statements -

For a normal B*Tree index -
create index <name> on <tname> (col1[,col2,col3,...coln]);

For a bitmap index -
create bitmap index <name> on <tname> (col1[,col2, col3,...coln]);

For a reverse key index -
create index <name> on <tname> (col1) reverse;

For descending index -
create index <name> on <tname> (col1 , col2 DESC);

For Function-Based index -
create index <name> on <tname> (F(col1));

Why not take this small quiz and challenge yourself to test your DB knowledge, or gain that knowledge if you don't have it yet : click here

Thursday 19 April 2012

Choose a suitable table type - Challenge 1

Examination must be an integral part of the learning process - or, the other way round, learning is incomplete without examination. But not the kind of examinations which just declare a learner pass or fail; rather, those which give an opportunity to learn. In my opinion, when we call a person "experienced", it means the person has learned by facing and solving problems (examinations). Truly, experience is a great teacher: it gives the exercise first and leaves it up to you to draw a lesson from it, which, as we all know from "experience", we never forget in life. What part of the lessons do you really remember from your schooling or college? A very small part indeed. Firstly, because we did not know their application in real life when the lessons were given to us, and secondly - the answer lies in this very first reason - because we did not learn the lessons through exposure to problems; the lessons were, in a way, just "taught" (say, read out) to us.

Here the questions I am going to post are simple but practical questions faced in real life as a developer. I have liberally provided links in the options of the questions themselves to different pages / posts of this same blog, and maybe some other blogs too, or you are free to search for the answers elsewhere in books, docs, manuals or from your own experience. So you have an opportunity to thoroughly read, or critically learn from, those pages in search of the proper answer. I would of course like you to post your answers with reasons / explanations in the comments of the respective posts, and let there be a technical analysis in those reasons / explanations. So I hope you are eager, and ready as well, to face this first challenge (many more to come; become a follower of this blog so that you get automatic updates and of course such posers) -

Choose a suitable type of table to be created for following situation -
I want to store data about the employees and departments of a company. The data about the employees consists of (empid, name, job, salary, commission, departmentid) and that about the departments consists of (departmentid, department name, location). The requirement is: while I need to access and/or manipulate the data of employees and departments individually a few times, more frequently I list the names of the employees and their salaries along with the names and locations of their departments in my application. Which of the following tables / structures would you prefer to use, and why? (This poser assumes the Oracle database, but professionals from other DB implementations are invited to answer from their own area, for readers to have a comparative study, which of course leads to cross platform knowledge - and that is also the purpose behind posts of this type.) Your options / links are -

(a) Heap Organized Tables (b) Nested Tables (c) Object Tables (d) External Tables (e) Clustered Tables (f) Temporary Tables (g) Index Organized Tables

If you have liked it, then here is another, more interesting quiz : Click Here

PS : Share this post generously on your social or professional network.

Tuesday 17 April 2012

Database Indexes - An Overview

Why create index? -
DBMS is all about storing data, allowing for easy access, and performing transactions on the data. An RDBMS insists that such storage be in the structure of normalized relation(s) obeying certain rules, popularly referred to as Codd's Rules. Of course, as the volume of data increases manifold, on top of an increase in the number of users, the query execution time and the amount of resources consumed in carrying out such data operations (searches) increase steeply. "Indexes" are another prime set of objects, created along with the tables in an RDBMS, to cut down on both the execution time and the resources. Database indexes are analogous to the index or table of contents we normally consult in "large" books to help us locate a particular topic of interest. The alphabetical order of the titles, along with the page numbers against them, are reflected in a database index in the form of ordered values from a column(s), usually called the index key, and a pointer to the exact address of the row in the database, respectively. Just as the absence of an index or table of contents in a book would require you to read the entire book to locate a topic, the absence of an index on the "search column" of a table requires reading ALL the table data to find what the user is looking for, giving a considerable rise in I/O.

Indexes don't serve any mandatory functionality in the database and hence are purely optional objects. Yet there is hardly any database, small or big, without them. However, the use of indexes, and of an "appropriate" type at that, is an extremely crucial issue, as using them (or not using them, or not using an appropriate type) involves a trade-off between speedy retrieval of query results and slower DML operations. Most DML operations require reorganization of the index, and since the time spent in such reorganization is directly proportional to its size and to the volume, density and distribution of the transactions, the decision requires prudence on the part of developers and DBAs alike. There is no limit to the number of indexes you may create on a table; however, you may create only ONE index on a particular column or combination of columns (index key). The use of an index is transparent to the application (and of course the user) and does not require any change in the SQL, but an application developer is required to be well aware of the subject of indexes, their types, how they work and especially their suitability to a pattern of access while framing SQL.

It is observed that, unfortunately, the creation of indexes is often a reactive part of the tuning process; in a small proportion of cases this may be the right approach, but by and large it should not be so. Creation of indexes should be a proactive part of the tuning strategy, considered during the development of the application itself and not as an afterthought.

Index Schemes -
Basically there are broadly two types of indexes, based on their organization characteristics, used in commercial database systems like Oracle (this DBMS will be our reference for further discussion), namely B*Tree and Bitmap.
[Figure - B*Tree Index Structure]
  • B*Tree Indexes, considered the more conventional type of index and the default type in most database systems, including Oracle, are implemented in a similar way to a binary search tree. The goal is not just to minimize the time taken to search for value(s) but also to equalize it. The search process to reach the page (block) containing your value is analogous to searching for a word in a dictionary (with some discipline / rule). For example, suppose you want to search for the word "tiger" - you open the dictionary half way through (the discipline / rule) and, say, land on a page of words that begin with "g"; that means there is no chance of "tiger" being in the left hand pages (you have eliminated 1/2 of the pages from your search). On the right hand side you open half way through again (remember the discipline / rule) and land on some page of "s" words, so there is no chance of "tiger" being in the left 1/4 of the remaining pages. You continue the process, each time cutting away half of the remaining pages on the left or right of the landing page, until you hit a small number of pages, each having two words at the top, one on the left and one on the right, which help you guess whether "tiger" could be on that page without fully reading it. The pages scanned at the end of this example are the "leaf blocks" of our B*Tree indexes, and the landing pages, where we decide to move right or left, are the "nodes" of our indexes. The pictorial representation above shows multiple levels of branches; the number of such levels is called the "Blevel".
  • Bitmap Indexes were first proposed in the year 1985 in a research paper entitled "Storage and Retrieval Considerations of Binary Databases", published by Professor Israel Spiegler and Rafi Maayan. The first commercial use was made in 1987 in a database product called Model 204 from Computer Corporation of America. Oracle implemented them in version 7.3. The principle of a bitmap index is simple. First, the index works very well for low cardinality, i.e. for columns where the number of possible values is small (so they appear repeatedly in that column) compared to the number of rows in the table. Second, the index contains bit vectors, equal in number to those possible values, with a 1 bit placed in the jth position of a vector if the jth row contains the value being indexed. Each vector contains as many bits as there are rows in the table, so there is a one-to-one positional correspondence between the bits and the rows in the table. Hence it does not have to store row addresses (in Oracle, "rowid").

    Example : Suppose we are maintaining information about cars in an application developed for a regional transport registration office, and say there are 100,000 cars registered so far (and many more to be registered in time to come), and one of the columns in the table is vehicle type, which may have five possible values - sedan, hatchback, SUV, MUV, and van. Then indexing the "VehicleType" column is a perfect case for using a "bitmap" index. There will be five bit vectors, one each for sedan, hatchback, SUV, MUV, and van. Each such vector will have 100,000 bits (equal in number to the registered vehicles, and growing as new vehicles register). For instance, the vector for sedan could be 00010001000001000000..... Observe here that there is a bit "1" in positions 4, 8, 14..., so the cars at rows 4, 8, 14 ... are of sedan type. Another vector, say for hatchback, could be 0000010010000010000.... Observe here that there is a bit "1" in positions 6, 9, 15..., so the cars at rows 6, 9, 15... are of hatchback type. And so on. (A small sketch of creating and using such an index follows this list.)
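
As a minimal sketch (the table and column names are hypothetical, following the example above), the index creation and a query it can answer might look like this:

create bitmap index cars_vtype_bix on cars (vehicletype);

-- may be answered from the bitmap index alone, without visiting the table
select count(*) from cars where vehicletype = 'sedan';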
B*Tree v/s Bitmap -
B*Tree indexes in general are best suited for more selective queries (those selecting a very small % of the total rows from the table) and particularly for high cardinality index keys (the column(s) to be indexed). Bitmap indexes are best suited for low cardinality index keys. Bitmap indexes take very little space, because they store only bit vectors; for example, in the above case each vector will store 100,000 bits, i.e. about 12.5 KB, and the total size of the index will be around 12.5 * 5 = 62.5 KB, whereas the size of a B*Tree index for the same number of rows will be at least 976 KB (the minimum space required to store only the rowids) and will take further space to store the index key values (not easily predictable at this stage). Bitmap indexes are considered very good for ad-hoc queries and particularly for aggregations like count, whereas B*Tree indexes are considered suitable for selecting a small portion of rows, often based on a range of values of the index key column, and can sustain a fair mix of selection and DML. A query like "select count(*) from cars where vehicletype = 'sedan';" may be answered from the bitmap index without even visiting the table, and the same would be the case for a query on some column "c" of a table "t" (replacing cars with t, vehicletype with c and 'sedan' with an appropriate value) where c is indexed under the B*Tree scheme. But the point here is the volume of data scanned (and hence loaded in memory), which will be much less with the bitmap index than in the B*Tree case. Because OLAP and DWH databases involve mostly aggregation / ad-hoc queries and close to zero transactions, bitmap indexes are more suitable there in most cases, whereas OLTP databases are observed to have more B*Tree indexes; however, this should not be construed as a rule.


When to Index and What -
Indexes are generally known to enhance query performance, but only if proper attention is paid to when (access pattern), what (choice of index key) and which type (described above); otherwise indexes may prove detrimental to the performance. Though there are no concrete rules, a few guidelines do provide a clue -
  • Index generally when you need to access data no more than 10 / 15 % from the table or some aggregation queries like count are frequently required to be answered (from the index alone).
  • Indexes should be generally avoided on small tables as full table scan may be much more economical in such case than scan index and then table.
  • Index those columns that may be frequently involved in joins, where clause, order by, group by, union or distinct operations.
  • The index key columns should be fairly static (normally not involved in update operations) or overheads in reorganization will be considerable.
  • In composite index, the driving column (first column in composite index key) should be more selective.
Apart from the above two "general" index schemes, sophisticated database management systems like Oracle provide some more flavours of B*Tree indexes - Index Organized Tables, Cluster Indexes, Reverse Key Indexes, Descending Indexes and Function-Based Indexes. Click here for an elaborate description of these special indexes and their usage.

Thursday 12 April 2012

What are Nested Tables ?

"Nested Table", as the name suggests is table within a table. A normal Heap Organized Table , if has a column whose datatype is table like object, then such column is called as "column object" and since it being a table may store "unordered" set of data elements, all of the same type as the datatype of the column, hence it is very much like a table within the table and is called as "Nested Table". Actually creating nested table is sort of going out of bounds that allows us to create a relationship within a table that could have been easily done with normal two separate "parent" and "child" like relational tables. At least some of the authors like Thomas Kyte believe it so and do not much encourage to use nested tables in this way.


"Nested Table", as a part of Object Relational Extension, in Oracle, has been provided as one of the type of "collection objects" (read array but somewhat different in some sense), the other being "varray". The only points of differences between these two types of collections are, the former is virtually unbounded (with no upper limit to number of elements it may contain) whereas latter is bounded and also while the former may be manipulated easily by "Extended" SQL, the latter may not and requires PL/SQL program to manipulate. So the discussion about Nested Table is considered enough to give a fair idea about the varrays.


As has been said above, the Nested Table is very much similar to the child table in a traditional parent/child pair in the relational model; it has just been facilitated by being defined as an object data type which is used to create a column in a normal table. So, in a way, it gives the illusion that each row in the parent table has its own child table. Visually the structure could somewhat resemble the following -

Table deptemp :

deptno  dname     loc   empt (nested table)
------  --------  ----  -----------------------------------------------------------
                        empno  ename   job      salary  darate  itxrate  xemplim
10      acct      NY    1      Peter   manager  3000    120     10       24000
                        2      George  clerk    800     140     5        6000
                        ...    ...     ...      ...     ...     ...      ...
20      research  CA    15     Scott   analyst  2900    110     9        22000
                        25     Julia   analyst  3200    120     10       23000
                        ...    ...     ...      ...     ...     ...      ...

A Live Example -

We shall consider the same object (empitax) that we created in our post about Object Table and use it to create our Nested Table type collection (emptaxtab) and its column (empt) in a "deptemp" table.

Step 1 -
Of course, creating the object type empitax, which we did in our post about Object Tables. (Not repeated here.)


Step 2 -
We shall create "Nested Table" type collection


create or replace type emptaxtab as table of empitax;
/


Step 3 -
Creating the table "deptemp"

create table deptemp (deptno number(3) primary key,
                      dname  varchar2(15),
                      loc    varchar2(2),
                      empt   emptaxtab)
nested table empt store as empnest;

alter table empnest add constraint u_empnest unique (empno); 


Observe here that we are creating the column "empt" as the "emptaxtab" nested table type collection we created in step 2. We are adding a "unique" constraint with the "alter table ..." statement to force the "empno" to be unique. The nested table, however, does not allow adding any foreign key constraint, not even one referring to its own unique column. The clause "nested table ... store as ..." is explained at the bottom of this post.

Step 4 -
Data manipulation -
We may insert the rows in this table as -

insert into deptemp
values(10,'acct','NY',emptaxtab(empitax(1,'peter','manager',3000,120,10,24000),
                                empitax(2,'George','clerk',800,140,5,6000),
                                empitax(......))); 

In this same way we may put in rows for all the departments and their employees. (Observe the constructors "emptaxtab" and "empitax" being called; "constructors" are functions used for initializing (read: providing) values for object attributes.)


Now suppose we have recruited one more employee in deptno 10, then to add such employee row we require to use an operator "the" or "table" as given below -

insert into the(select empt from deptemp where deptno = 10)
values (5,'Bobby','accountant',1200,130,5,9000);

Now suppose we want to change salary of Peter working in department 10 from 3000 to 4000 then

update the (select empt from deptemp where deptno = 10)
set salary = 4000 
where ename = 'Peter';

Now if George leaves our company, so to delete his row -

delete from the(select empt from deptemp where deptno = 10)
where ename = 'George';

The "Table" or "the" operator (in fact a function) may be used to list the data in an un-nest way (similar to relational table join) -

select d.deptno, d.dname, d.loc, e.*
from deptemp d, table(d.empt) e;

Nested Table storage -

Recall the clause "nested table ... store as ...". Actually it implicitly did create a real physical table "empnest" apart from our table "deptemp" and stored it separately. Also look at each "select" statement in "the" or "table" operator. The predicate contains only an equality condition, "select empt from deptemp where deptno = 10". This is virtually selecting the "nested table" for one row in the table, here for department 10. And it must return at least one row or it will result into error "ORA-22908:reference to NULL object value". If this same predicate (for example deptno > 10) returns multiple nested tables, it will return error "ORA-01427:Single row subquery returns more than one row".

Nested Table Limitations -

So we may conclude that Oracle considers each row in the "deptemp" table to point to a "table" (a nested table virtually meant only for it) and not to a set of rows from another table, as happens in parent/child relational tables. In the relational model there may be a set of rows in the child table for one parent row. This basic difference imposes some limitations on the use of nested tables.

The most important limitation is that we cannot access the nested table directly; it has to be accessed through the main table (deptemp). So it is not easily possible to answer queries like "which department does Peter work in?". Also, how do you transfer an employee working in department 10 to department 30 in a nested table? In relational tables we would just update the "deptno" column of the emp table (a reference column pointing to the deptno column of the dept table) from 10 to 30. (A sketch of the kind of un-nesting query the first question forces is given below.)
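
As a minimal sketch of the first question above: it can still be answered, but only by un-nesting the collection through the parent table with the table() operator, which is clumsier than a simple relational join -

select d.deptno, d.dname
from   deptemp d, table(d.empt) e
where  e.ename = 'Peter';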

Tuesday 10 April 2012

What are Object Tables ?

For an introduction to the table types and list visit : What are Index Organized Tables?

The era of the pure Relational Database Management System (RDBMS) is now left far behind; for around a decade or so we have been treading the era of the Object Relational Database Management System (ORDBMS), ever since some commercial database systems (Informix, and Oracle in particular since version 8) provided SQL to implement the "Object Relational Model" in the databases running on their systems. Before we delve into the "Object Relational Extension", again with reference to Oracle as our model DBMS, we need a brief primer on what "Object Oriented" is all about.

Object Oriented Concepts -
"Object Oriented" programming approach was introduced in late 80s but if going on the spiritual lines one should say the creator of this universe had this same approach in designing this universe; this statement I am making on the same lines as "this world is worth living because it's non-procedural" I made in an earlier post. Now I must also say "the world is worth living because it runs the object oriented way" (of course apart from being non-procedural).

Technically, the object oriented approach is defined as the "encapsulation" of attributes (variables) and methods (programs) which manipulate those variables to produce certain output depending upon which of the methods has been invoked. An example is in order for this definition to make sense to beginners or juniors.


What do you do when you feel like having a "brick oven pizza" for your lunch? You call a nearby pizza hut service, right? Then you provide the service with the toppings of your choice (technically, each topping is a variable and you provide a value for it) and tell the service the type of pizza you want, in this case "brick oven pizza". Now, you know each of those classified pizzas is made in a certain way or process (read: Method), so technically you invoke the method. And you get the pizza delivered (the output). So we may call the "pizza hut" an object.


Actually we may use and extend this same example to understand various properties of an "object", like inheritance, polymorphism, overloading etc., but not for now, as we have got enough of a dose of object oriented concepts for our job at hand: creating an Object Table.


One important point before we leave this item: in Oracle we actually create a "type" rather than a "class" (the terminology used by other object oriented languages). Conceptually, class and type have no difference, as both are a "template" (read: definition) of an object. Oracle's "type" terminology is picked from the idea of a "datatype"; technically, declaring a variable of a datatype and assigning a value to it, if studied carefully, reveals that it is very much an object.


Defining an Oracle Object Type -
Oracle programmers use the PL/SQL back-end language to define an "object type", which is written in two parts - header and body. Simply put, the header contains all the "declarations" and the body contains the "code" for all the methods declared in the header. The header has to be successfully compiled before you attempt to write the body. In this example we are creating an object type "empitax" containing the salary details of an employee along with methods to calculate the CTC, taxable income and tax amount -

-- Type Header
create or replace type empitax as object (
                                          empno   number(4),
                                          ename   varchar2(15),
                                          job     varchar2(15),
                                          salary  number(4),
                                          darate  number(3),
                                          itxrate number(3),
                                          xemplim number(8),
member function ctc return number,
member function taxincome return number,

member function taxamt return number);
/

-- Type Body
create or replace type body empitax is 
  member function ctc return number is 
   totearning number;
  begin
   totearning := self.salary * 12; -- annual salary
   totearning := totearning + (totearning * self.darate / 100); -- sal + allowance
   return totearning;
  end;


  member function taxincome return number is
   tincome number;
  begin
   tincome := self.ctc() - self.xemplim; -- taxable income = CTC minus exemption limit
   return tincome;
  end;


  member function taxamt return number is
   tamt number;
  begin
   tamt := self.taxincome() * self.itxrate / 100; -- tax amount = taxable income x tax rate percent
   return tamt;
  end;
end;
/
Some explanations - When an object type has to refer to its own attribute or method, it must be qualified with "self" (self.salary, self.darate, self.ctc(), etc.). The calculation formulas are arbitrary and not standard. The "or replace" clause is optional; it is used to overwrite an old definition with a new one.

Creating Object Table -
Once you have created the object type, you may create the object table simply by using the following form of the "create table" statement -

create table emptaxtab of empitax;

This statement will create a table named emptaxtab in which the attributes of empitax become the columns of the table. The table is called an "Object Table" and each row stored in the table is called a "row object". All the normal SQL statements may be used to handle the data in this table.

Inserting rows -
insert into emptaxtab values (101, 'peter', 'manager', 3000, 120, 10, 24000);
insert into emptaxtab values (102, 'diana', 'analyst', 2500, 110, 8, 20000);

Observe that the insert statement is the same as would be used against a normal heap organized table. The other statements - select, update or delete - would also be the same. But then what is so special about it? Well, remember, there are some methods integrated in the object type from which the table has been created. So we can write a statement -

select ename, x.taxamt() as taxamount from emptaxtab x;

and thus you can list the names of the employees along with their tax amounts (or even CTC or taxable income, by calling the appropriate methods) very easily. The methods are integrated and remain part of the table. (You may also use an appropriate predicate, i.e. a "where clause", to list only the names of your choice in this select statement, as sketched below.)
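
For instance, a minimal sketch of such a filtered listing (the threshold value is arbitrary, purely for illustration):

select x.ename, x.ctc() as annual_ctc, x.taxamt() as taxamount
from   emptaxtab x
where  x.taxamt() > 1500;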

Object Oriented is a great concept; so next time when you order a pizza of your choice, thank god for creating this world the object oriented way, and enjoy it. Click Here to continue reading about "Nested Tables", another implementation of Object Oriented concepts.

Saturday 7 April 2012

What are External Tables ?

For an introduction to the table types and list visit : What are Index Organized Tables?

"External Tables" are probably the types of tables used in Oracle only (other database systems might use some other mechanism, and I request users of those systems may share their knowledge through posting comments / links), as an intermediate staging area during bulk data load operations, particularly in Data Warehouses. They were introduced in Oracle version 9i and the limitation of being only "read only" has been removed since version 10g. The "External Tables" feature is a complement to existing SQL*Loader functionality. It enables the user to access the data in external sources as if it were in a table in the database, and also allows data transformations.

What are External Tables? -
Contrary to what their name indicates, "external tables" are not really tables. They are only objects in the form of a definition stored in the data dictionary (catalog). They are capable of presenting data in the normal table format while the data is actually stored in operating system files "external" to the database (hence the name "external tables"). So dropping an external table only removes its definition from the data dictionary, while the data remains safe in the files. These tables allow all sorts of operations like normal tables do - queries, joins, and DML (from Oracle 10g onwards). So external tables are objects in the form of a definition which do not store any data themselves, but only enable you to view externally stored data as if it were inside a table in the database. The trick is that the data fields in the external file are merely "mapped" to the external table columns, not actually loaded into them.

External tables are said to imitate the functionality of the bulk data load utility called "SQL*Loader", and an external table definition is considered roughly equivalent to storing a "SQL*Loader control file" in the form of a table definition; hence plain SQL (and also PL/SQL, a server-side programming language in Oracle) may be used against external tables. Moreover, their data transformation and filtering capabilities are considered to be much more flexible than SQL*Loader's.


A working example of External Tables -

Step 1 - creating a directory object
A directory object is an object defined in the Oracle database to point to a file system location where our source flat files are stored. It is a way of protecting the source files from unauthorized access, as there are privileges associated with directory objects which are granted from the database with the "grant" command -

create directory extdir as 'c:\oracle\dfile'; -- directory creation


grant read, write on directory extdir to scott; --read & write privilege granted

Step 2 - Creating external table

create table empxt (
   empno  number(4),
   ename  varchar2(15),
   job    varchar2(15),
      .....
   )
organization external
(
type oracle_loader
default directory extdir
access parameters
(
   records delimited by newline
   badfile 'empxt.bad'
   logfile 'empxt.log'
   discardfile 'empxt.dis'
   fields terminated by ','
   missing field values are null
)
location ('emp.dat')
)
reject limit unlimited;

Explanation : The part of the definition above "organization external" is a plain create table with column definitions; the "organization external" clause tells Oracle that the table is to be created as an external table; the "type" clause specifies the access driver - oracle_loader is used for loading data from a file into the database; "default directory" points to the directory where all of our source data files and the rejection & log files will be housed; and "location" provides the name of the input data file.

Step 3 - the input data file (external file) 
The input data files may contain the records in two formats - (1) fields delimited by a character, for example a comma as used here, or (2) fields in fixed width format [in which case the access parameters of this example will change accordingly]. For this example the data may appear in the following way -

7788,scott,analyst,....,20
7839,king,president,....,10
     .......
Step 4 - We can now query the external table with simple SQL

select * from empxt;

Step 5 - We can populate our EMP table (internal) 

insert into EMP select * from empxt;

This was a very simple example, but it is possible to perform a lot of transformation operations, use Oracle's built-in functions, in-line formulas etc. for a more exciting run of the bulk data load operation, for example as sketched below.
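
A minimal sketch of such a transformation during load (the built-in functions used are standard Oracle functions; the filter and the formula are purely illustrative, and the elided column list of empxt is assumed to include sal and deptno):

insert into EMP (empno, ename, job, sal, deptno)
select empno,
       upper(ename),        -- normalize the case while loading
       initcap(job),
       nvl(sal, 0) * 1.10,  -- apply an illustrative 10% revision
       deptno
from   empxt
where  deptno in (10, 20);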

The most important advantage of using external tables is that you can use common SQL to query the data from them, masking the source data files behind a table structure, and almost any operation that we normally perform in a query may be performed against external tables too.