Wednesday 4 April 2012

What are Clustered Tables ?

For an introduction to the table types and list visit What are Index Organized Tables?
 

Clustered Tables -

A "Cluster" is a way to store a group of tables that share some common column(s) (related tables) in the same database blocks rather than at different locations. So in a way "cluster" is a way of storing the "related tables" in a "pre-joined" (not exactly joined) form at a common place. The tables are still accessible individually and the SQL obviously remains unaffected. This is DIFFERENT than the "clustered index" used by some other database systems (comments invited from users of such database systems).

In normal case, for example the "emp" and "dept" tables are stored as their own segments like -
Table : emp
empno enamejobdeptno
1wardclerk10
2allenanalyst20
3kingpresident10
4milleranalyst20
5scottaccountant10
Table :dept
deptnodname
10accounting
20research
30sales
40operations
50training
however in cluster named "empdept" the same tables will be stored thus -
Cluster empdept : block1
deptno dname
10accounting
empnoenamejob
1wardclerk
3kingpresident
5scottaccountant
Cluster empdept : block2
deptno dname
20research
empnoenamejob
2allenanalyst
4milleranalyst
.............
Now here observe that you are storing data together grouped by some column (on every block). For example, all of the employees working in department 10 will be stored in the same block (there may be multiple blocks for one particular department if all the employees are not accommodated in one block.). However it does not amount storing the data sorted, as is done in the Index Organized Table, but data is segregated according to the "cluster key" i.e. deptno in a heap on every block and also deptno 50 may be right next to deptno 10 and physically stored quite apart on the disk.

With that basic concept, now there are two types of clusters namely - Index Clustered Tables and Hash Clustered Tables. Also remember that, no matter what type, the existing tables can not be clustered, the process of creating clustered tables is to first create the cluster and then create the tables as part of the cluster. (Watch the syntax and steps below)

Index Clustered Tables -

Actually the above example is of this same type of table cluster only. This cluster is so called (Index Clustered Table) because you are required to create an index on the "cluster" based on the "cluster key" i.e. the column which is common to both the tables and helpful in grouping the rows on blocks. This index, called "cluster index" consists of block addresses (not row ids unlike regular table/column indexes) of the blocks that contain the data about cluster key values i.e each cluster key value points to a block address and that same is resolved from the "cluster index" when a user fires a query referring to a cluster key value. Steps and general syntax for creating the Index Clustered Tables is -

Creating the cluster -
Create cluster empdept (deptno number(3)) size 1024;
Explanation : the cluster name is "empdept", the cluster key is deptno, size 1024 indicates that we expect about 1024 bytes of data to be associated with each cluster key value.

Creating the "cluster index" -
Create index empdeptidx on cluster empdept;
Explanation : the index is created on "cluster key" however it is not allowed to specify it here in its create statement because this command may pick it up from definition of the cluster (metadata).

Creating the tables in the cluster -
Create table dept (deptno number(3) constraint pk_dept primary key,
                   dname  varchar2(10),
                   loc    varchar2(10))
cluster empdept (deptno);

Create table emp (empno number(4) constraint pk_emp primary key,
                  ename varchar2(10),
                    ........
                  deptno number(3) constraint fk_emp references dept (deptno))
cluster empdept (deptno);

Explanation : The non-bold portion of the create table statement is just a normal create table statement. The emboldened part of the statement tells Oracle to create these tables in cluster "empdept" and map the column "deptno" to the cluster key (also "deptno"). The column of the table which maps to the cluster key must have same datatype and precision specifications. Also remember that the constraint definitions are NOT mandatory.

Hash Clustered Tables -

The only difference from the index cluster is that the "cluster index" is replaced with a "hash function". The cluster key value is passed to a hash function (either internal or user defined) to produce a hash value in the output which points to the block location of the data corresponding to the value. Hence no index (and hence no index scan) is required, but with the limitation that, the hash function can take only one value at a time for input, that makes hash clusters unsuitable for range queries based on "cluster key" but suitable for "equality" queries and is more suitable for high cardinality (distinctly valued) of cluster key values. Steps and general syntax for creating "hash clustered tables" is -

Creating the cluster -
create cluster empdept (deptno number(3))
hashkeys 1000
hash is deptno
size 1024;
Explanation : The "hashkeys" indicate the number of possible distinct values of cluster key. "Hash is" is used to point to the hash function to be used - for internal hash function you have to specify simply the name of the hash key or omit this clause to default to it. The meaning of "size" is same as above.

Creating the tables -
There is no change in the syntax for creating the tables.

Why use "Clustered Tables"? -

Clusters help read "join" intensive queries on related tables with reduced I/O. However unless the size parameter and hashkeys parameter (in hash clusters) are not correctly estimated then the space utilization factor may be affected adversely also clusters slow down the DML.

Can user Cluster on Single table ? -

Yes, sometimes Hash Cluster may used for single table with the advantage of faster access in high cardinality cluster key (based on primary key). Such cluster may be called "Single Table" hash cluster and only requires "single table" clause to the above syntax. It then restricts you from creating a second table in the cluster. When there is one-to-one mapping between hash keys and data rows then the access is somewhat faster.

Dropping a Table Cluster -

You will have to drop all the tables first from the cluster using the normal "drop table" command or use "drop cluster <name> including tables" statement. Tables contain the data or empty does not matter for dropping the cluster.

Visit this if your are a Careerholic

2 comments:

  1. Boring.... Need simple to understand

    ReplyDelete
  2. Simple and terse explanation, especially from an Java or other language application developer's interview perspective

    ReplyDelete