Tuesday 24 April 2012

Database Indexes - Usage and Types

In a previous post entitled "Database Indexes - An Overview" we discussed about two major categories of the indexes - B*Tree and Bitmap. We also discussed about the difference between their structure and their suitability for different access patterns and index key cardinality. This post I am dedicating to some more types under B*Tree, and our reference database system shall be once again "Oracle". However this may be a good read for other database systems users as well.

The types of indexes a database like Oracle may potentially have are - 
  • Index Organized Tables (IOT)
  • Cluster Indexes
  • Reverse Key Indexes
  • Descending Indexes
  • Function-based Indexes

Index Organized Tables (IOT) - 

Index Organized Tables are the tables stored in B*Tree index structure based on primary key values. The same has been already discussed in a previous post related to tables as IOT is one of the table types as well. Click here to read about them in detail.

Cluster Indexes -

Cluster Indexes are used to index the cluster key and they differ from the normal index (the types of which are discussing here) in the cluster key entries in such indexes point to the block address which contains the data about the cluster key value in consideration unlike the normal index key entry which points to a row address in the related table (on which the index is created). Cluster index has been discussed in detail in a previous post. Click here to read about it in detail.

Reverse Key Indexes - 

These indexes are particularly a feature used in a database set up where users connect to the database from different instances (called OPS [Oracle Parallel Server] in old version or RAC [Real Application Cluster] from version 9i onwards) contend for a same index block for inserting their entry in the index. This particularly is required when the index entries come in a sequential manner (generated by some sequence). Reverse key indexes help in reducing the I/O otherwise caused due to pinging and also reducing the amount of resources that may be required to clear the unbalance in B*Tree index arising out new index entries being stored on only one end of the index tree.


Explanation - There are users u1, u2, u3, u4 and suppose u1, u2 are connecting to the database through instance i1 and users u3, u4 are connected through instance i2 (instance is defined as a combination of common memory area which Oracle calls as SGA and supporting background processes, which together provide the resources for users' operations on the data in the database). Let us further assume that all these users at a time are performing insert operations on some table T and there is some column C (index key) in the table which takes the values from a sequence, the current values being 1021, 1022, 1023 and 1024 for the rows users u1, u2, u3 and u4 want to insert respectively. Now, as the actual data is going to be stored in Heap Organized Table (a normal table), the rows may get inserted at any location available in the table, because the table does not need to have any sequence of rows to be maintained (not even in the order in which they are inserted). However, the corresponding entries in the index created over C on table T for the above values must be located in same order (juxtaposed as index must contain the values in an order). In a B*Tree, which is the basic structure of the Reverse Key Indexes, this insert activity will focus in a particular leaf block. This will cause two problems - First, the b*tree, which uses the balance tree (nodes divide the values half way through) for guiding the search to the leaf blocks, will get skewed due to entries only on one side of the balance tree (hence reorganization will be required continuously / frequently). Secondly, in the OPS or RAC the pinging of block (writing the block to the disk modified by one user when accessed by user from another instance and reloading it in memory) will cause heavy I/O, and hence should be avoided, by avoiding the multiple users colliding in the same block. Now if you reverse the above values (1021, 1022, 1023, 1024), they will become 1201, 2201, 3201 and 4201 and hence no more sequential and because of the vast gap between them, now will be stored in different blocks, thereby avoiding the concentration of activity on single block and hence pinging and also the skewing phenomenon on the index. The "sequential" index key values will get distributed over multiple leaf blocks. Of course at the time of search too the values provide by the users will also have to be reversed and then searched through the index.


Drawback - The reverse key indexes are not suitable for range scans i.e. predicates like "where c > 1234 " will not use index. The only queries that may be benefited will be using the predicates like "where c = 1234".


Descending Indexes -

The default ordering of the index key values in the normal B*Tree index is ascending i.e. from lower value to higher. Those indexes are useful in all situations of queries which use unidirectional ordering clause whether ascending or descending, as the databases are known to have the capacity to scan the index in forward or reverse according to unidirectional "asc" or "desc" options specified on the columns in "order by" clause. The descending indexes allow to store the index key values in order from higher to lower. Oracle 8i onwards and SQL Server 2000 are known to use the descending indexes, to deal with a situation when the order by clause in MOST queries may contain different directions for the key columns. With the descending indexes the last step of sorting in the phase of optimization is eliminated and hence there is considerable enhancement in the performance of the query.


Explanation - If in most of the queries you use the ordering clause of the form "order by col1, col2 DESC" or "order by col1 DESC, col2" then you should prefer to create the index of the form "create index <name> on <tablename> (col1, col2 DESC)". If you create the index without using "DESC" option on any one of the columns then the queries using the order by clause of the aforesaid patterns will have a "sort" step in their execution plan. However if the ordering clause is of the form "order by col1 desc, col2 desc" or "order by col1, col2" then you don't have to use "desc" with any of the columns while creating the index.


Function-Based Indexes -

The database optimizers have tendency to ignore the indexes created on bare column(s) when in actual query the predicate uses some function around the column(s) (or if the column(s) is(are) used in some expression or equation). It means that the indexes should be created on just the same expressions or equations to effectively make use of them rather than the bare column(s) involved in them. So function-based indexes give us the ability to index computed columns, which allows us to have searches on different expressions for example - case insensitive searches, search on an equation involving columns etc as explained below.

Explanation - Imagine if you are allowing the user to store the data in "ename" column of "emp" table without any case restriction. However, because the search is case sensitive, you may require to use the predicate with a function around ename. For example "where upper(ename) = 'SCOTT'" to hit all the rows of "Scott" irrespective of the case in which the name is stored. If you have only created the index on "ename" then, for such predicates the index will be ignored and despite the presence of index the optimizer will go for a full table scan. So we may create the index itself on "upper(ename)" rather than just "ename". We may also create index on some expressions like "(x /y + z)" where x, y and z may be the columns of some table on which index is being created.


Create Index Statements -

For a normal B*Tree index -
create index <name> on <tname> (col1[,col2,col3,...coln]);

For a bitmap index -
create bitmap index <name> on <tname> (col1[,col2, col3,...coln]);

For a reverse key index -
create index <name> on <tname> (col1) reverse;

For descending index -
create index <name> on <tname> (col1 , col2 DESC);

For Function-Based index -
create index <name> on <tname> (F(col1));

 Why not take this small quiz and challenge yourself to test your DB knowledge or get it if you don't have this information : click here

No comments:

Post a Comment