Tuesday, 3 April 2012

What are Index Organized Tables (IOT)?

Introduction -
Tables are what, RDBMS is all about. Tables provide a higher level abstraction layer for the logical storage of data, so much so as reference to the tablename is all required for specifying the source (read location) of data in the non-procedural SQL language. We all commonly know about the structure of the table simply to consist of columns and rows. But this basic table may not serve the purpose of different access patterns and data handling needs from optimization point of view. Hence sophisticated commercial RDBMS like Oracle may have some more types of tables than the "Regular" or "Normal" table; which by the way is called as "Heap Organized Table". We are going to look into these many different types of tables used in Oracle as our reference database system. There are about the same types of tables also used in database systems from other vendors, but they may be named differently or may use slightly different terminology. The visitors are free to post the links or comments to enrich the article in that respect.

Here is the list of different types of tables used in Oracle -
Heap Organized Tables -
This is the basic or "normal" standard table type based on the concepts of Codd's Rules #1 and #2. "Heap" is a bunch of space used in somewhat random fashion, and that is how the rows are managed (stored) in these tables and hence the name. A deleted row will makes the space available to the new row and the new data (row) is stored in the first free space slot found in the table and not necessarily at the end or bottom of it. So there is no definite order ultimately observed in the rows stored in such table; not even the order in which the rows were inserted. Further, those rows will come out (against a query) in a equally random order and depending upon other options being used (like parallel query or different optimizer mode, etc.), may come out in a different order with the same query. If you want to receive the rows in any particular order based on values of column(s) then use "order by" clause around such column(s) in the query. Most of the tables in the database are of this type as they satisfy the most of the application needs.

While the "Heap Organization" of these tables comes as a boon in re-utilization of the space released by deleted rows, hence effective on space utilization, the drawback is that, when the query has search criteria not based on a key (primary key) or unique column will require the database to perform a full table scan for locating the rows. This could be a serious resource bottleneck for tables containing huge number of rows, until appropriate indexes are created on the relevant column(s). So heap organized tables almost always to be associated with indexes on the search relevant column(s) and hence consume space.

Index-Organized Tables -
Tables stored in an index structure are called as "Index-Organized Tables" (IOT). Unlike heap-organized tables, where data (rows) gets stored randomly at any available free space, the index-organized tables store the data sorted on the primary key. This organization does not make any change in the SQL though.

Organization of a table in the form of index however requires lot of work to manage and maintain (with a volume of transaction and resulting frequent re-organization) since B-Tree organization of index is rather a complex structure (in the form of an inverted tree). This one aspect therefor justifies why each table (anyway in RDBMS would have primary key) is not created as IOT. Heap tables on the other hand are easy to manage given a transaction volume even with associated indexes that might take more space than IOT (which may not have a separate index) but much less or close to no requirement for re-organization. In short, response to a transaction is much slower in IOT along with more resources consumption in reorganization as compared to heap table. So that is the reason for not creating every table as IOT.

The implementations that beg for an IOT are where the lookup is based mostly on primary key for single row at time (or sequential fetch for multiple rows) and the tables do not have much of transactions (particularly deletes and inserts) but only selects. For example - in a train reservation system there may be a lookup table with train number and train name and some other data like starting point and termination. The data hardly changes and the lookup is always based on train number which is unique (PK). Similarly zip code to state may also be a good candidate for IOT.

Creating Index-Organized Tables -
You may create an IOT using the following command structure -
create table sampleIOT (col1 number primary key,
                        col2 varchar2(20),
                        coln <datatype>(<precision>))
organization index tablespace tbs1
pctthreshold 10
overflow tablespace tbs2);
Command Explanation -
Before we discuss about the different options used in the preceding "create table" statement there is one most important observation to be made - The primary key constraint is mandatory in the creation of the IOT since the index organization of IOT is based on primary key as aforesaid. You may not even disable the primary key constraint in such table. Now there are some key options used here -
  • organization index tells database system (Oracle) to create this table not as normal heap organized but an Index Organized.
  • pctthreshold specifies what % part of the row in IOT to be stored in "index block". This block contains the part of the row to go with primary key column (which includes those columns which are accessed frequently in the query)
  • overflow is the remaining part of the row which is made of columns which are not mentioned in the query frequently and to be treated as overflow and stored separately in specified separate tablespace (in overflow blocks). 

No comments:

Post a Comment