Saturday, 31 March 2012

What are Tablespaces?

"Tablespace" is part of storage architecture of some of the commercial database systems like Oracle or DB2. Some of the database systems (like Microsoft SQL Server) are not known to use the concept of tablespace (I invite MS SQL Server or other database system users to elaborate on this in comments or provide links to a detail information if they know). Here we are considering Oracle as the model for learning the concept of tablespaces, which provides a layer of abstraction between physical and logical data. Rather, the cause of discussion about the tablespaces is that we are going to study the storage architecture of Oracle database system.

Oracle Storage Architecture -

We all know, there are two agencies at work when we come to storing and accessing the data in the databases - The OS (Operating System) and DBMS (Oracle). The calls are made for the data handling / manipulation from the DBMS in the form of (SQL) statements which requires a "view level" abstraction for referring to the data, whereas when it actually comes to read/write operation, it is the OS which plays that role, has to deal with "physical level" of the data storage. And thus the overall study of storage architecture would involve "Logical" and "Physical" perspective, the former from DBMS point of view and later from OS's side. Incidentally, we formally define the terms - "Physical" as something, the existence of which can be proved in terms of it occupying some space (area or volume) and "Logical" which exists in terms of an assumption or idea that does not have a shape.

As such the OS identifies files as the containers to hold or store the data. The files (datafiles) have an area and occupy a certain space on the storage device (hard drive), hence physical. "Tablespace" however is the logical storage as perceived by the DBMS and is well defined as "group of datafiles".

The concept of "group" itself is logical which exists in terms of its members (physical). For example your friend circle - each member is physical and the notion of belonging to the "circle" is there in the mind of each friend and has no physical existence. I have been insisting on this analogy for this would help you understand and digest the forthcoming discussion.

Why use Tablespaces? -

Having defined the tablespaces and accepting it as "group of datafiles", there are two main reasons, among others for using the tablespaces - Optimize the performance and provide virtually unlimited size of the segment (say Table) to grow.

Tablespace allows its data files to be located on different physical hard drives (do all you friends always move together?) and this arrangement comes as a great relief in reducing read/write contention in different parts of the segment in a multi-user environment, as the segment can spread within the imaginary boundary of the tablespace which contains (read groups) those data files. Simply put, a segment can not remain confined to a data file but to a tablespace. So the "chunks of spaces" (say extents) allocated to a segment scatter over all the data files belonging to a tablespace. For example consider a tablespace named "tbs1" has datafiles named "df1", "df2" and "df3" and they may be residing on hard drives "hd1", "hd2" and "hd3", then for some table "T" the extents will scatter over those datafiles (and hence on multiple drives) and as such will reduce the R/W contention. So for creating a tablespace of a size of say 500MB (just example), instead of creating a single file of that size to become part of this tablespace, have 5 files of 100MB each located on different drives. It has been observed some times that a lot of query optimization efforts go in vain if this basic tip about the tablespace layout is not followed.

Secondly, you may have virtually unlimited size of a segment (table) as it may no more remain confined to the maximum allowable size of the file. A segment can spread over all the files of a tablespace which may potentially have 1023 datafiles. Moreover partitioning of tables or indexes also allow them to be stored over multiple tablespaces with each partition being treated as a segment and hence can reside in different tablespace.

The other few advantages of tablespace are -
  • Database systems provide the facility of performing the backup / recovery at the tablespace level.
  • A tablespace may be taken offline for addressing any maintenance issues or reorganization of data files, so no need to stop entire database.
  • Tablespaces with "read only" status help reduce the backup time, size, resources and efforts as such tablespaces need not be backed up in every backup cycle (as they do not allow any change in the data in the segments stored in them).
  • Tablespaces allow tables / indexes with different access patterns to group them.

Tablespace Types -

Oracle has classified the tablespaces in different ways -
  • System & Non-System - This fundamental classification identifies the "system" tablespace as the first, mandatory and single tablespace with the same name as its type created in the database (created at the time of creation of the database, rather creating database summarily is creating a system tablespace at its minimum), which houses the data dictionary or database repository or catalog. Theoretically it is possible to have this only one tablespace in the database to also hold user's tables, but that is something discouraged in practice and hence a set of tablespaces created for holding users' data are called "non-system" tablespaces. There may be several of them in the database.
  • Permanent, Temporary and Undo- A "Permanent" tablespace is that contains "permanent" type of segments i.e. those which are created using an explicit "create ..." statement and dropped using an explicit "drop ..." statement. Such segments are for example Tables, Indexes, Clusters and Partitions which are created / dropped with an explicit command. The "Temporary" tablespaces are used to hold the "temporary" segments, which are used as scratchpad during a sorting event and only created implicitly whenever required as a SQL statement needs to sort (Query statements with distinct, group by or order by clauses). The "Undo" tablespace category was introduced in Oracle 9i to support automatic undo management feature i.e rollback segments and its contents and creation are managed implicitly by the system and also it supports Flashback Query (FBQ); a feature which allows to "view" the data as it was at particular time in past within the scope of "retention time". Prior to version 9i Oracle DBA's would store the rollback segments in some "permanent" type of tablespaces by creating them explicitly. The "temporary" tablespaces are said to contain "tempfiles" rather than data files, which have same format & structure as data files but only do not generate "redo" (history of changes to the data, required for future recovery of database) as the same is not required to be stored for a trivial event like sorting of the data that does not amount to any change to the database as such.
  • Dictionary Managed and Locally Managed - The later type was only introduced in 8i (version 8.1.5) as option and now in later versions used mostly as default type of tablespaces created in Oracle database. "Dictionary Managed" tablespaces had to "consult" the data dictionary for the account of allocation / de-allocation of the extents. This would involve some "recursive sql" to be fired (at data dictionary) to fetch and/or update the catalog and also would, in a volatile database, tend to serialize the operation causing overheads and delays in operations which particularly required extension of space (for segments). Another fact with the "Dictionary Managed" tablespaces was a marginal under-utilization of space due to "bubble free space" getting trapped in non-uniform sized extents of different segments. The problems are removed in "Locally Managed" tablespace by having a bitmap stored in headers of the datafiles to indicate vacancy or occupancy of the extents and also by uniformly sizing the extents or auto allocation feature(system will decide the size of extent) has eliminated the possibility of under utilization. However some DBAs have reservations about this new type as they opine that space management and growth of segments is their job as administrator and to that scope locally managed tablespaces are an encroachment on their role of space administration.

Data Files -

The discussion could be incomplete without considering the "Data Files" which are the "Physical" entities which make the tablespaces and it is here where the data is actually stored and read from. There is no free existence of the data file and it has to belong to some tablespace as Oracle refers to storage from its own stand point of view as we have discussed so far. Data files in Oracle are formatted according to Oracle's own block size as specified in db_block_size parameter. The standard allowable sizes are 2K, 4K, 8K, 16K, 32K. Normally only one of these block sizes should be used but as from Oracle version 9i onwards, Oracle allows multiple block sizes and hence there may be tablespaces with different block sizes created in latest of the databases mounted on these versions. These block sizes must be in multiples of OS block sizes as the data is ultimately read/written by the OS and for an abnormal size of Oracle block (not in multiple of OS block) there may be wastage of space and increased I/O for same amount of data.

How to determine the data block size -

The choice of the correct block size is a very critical decision. And there are certain guidelines to be followed to arrive at the most appropriate size. 

In general Oracle recommends 2K & 4K sizes for an OLTP type of database, because the applications which run from them usually has fair mix of all types of data access statements and deal with small amount of data for each operation. The larger range 16K & 32K are recommended for DSS systems (Decision Support Systems aka data warehouses) as the access pattern is mostly in the form of "select" statements with aggregate functions which require bulk of data to be read in the memory. And of course for mixed workload environment 8K is normally preferred.

The block size should also be large enough to accommodate entire size of the rows and the rows need not to be fragmented. The phenomenon is called "row chaining" and has to be avoided to reduce the I/O or otherwise the fragments have to be read from different blocks.

Small range of blocks are also considered to be suitable for random access for small average row sizes, whereas for sequential access pattern for the same small sized rows, larger range of block sizes may be recommended.

You have to also consider the volatility of the databases and density of operations, since small block size with high density of operations may cause contention and hence may not be suitable for highly volatile data tables.

Larger block sizes are obviously more suitable for LOB type columns. 

Conclusion -
The knowledge of the storage structure of the database is extremely crucial not only from the point of view of space management but also from performance optimization.

No comments:

Post a Comment