To a large extent, the choice between an indexed retrieval and a full table scan depends on the size of the table and the pattern of access to that table. For example, if the majority of a large table is being processed, a serial search, or full table scan, can actually be faster. If the rows being accessed are randomly dispersed throughout the table, processing them in logical (indexed) sequence might be quite slow. In addition to the disk head movement required to retrieve the records, remember that every read of a row requires an additional read of the index.

If every read of the index and of the row required its own physical disk read (the worst case) and the reads occurred in a contiguous disk-sector sequence, the break-even point for reading the entire table would be around 50% (one index read plus one table read per row). Because the System Global Area (SGA) cache satisfies many requests as logical rather than physical reads, the actual break-even point is not so clear-cut.
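The worst-case arithmetic above can be sketched as a toy cost model; the function names and row counts below are illustrative assumptions, not Oracle internals:

```python
# Toy cost model for the worst case described above: every indexed
# access costs one index read plus one table-row read, while a full
# table scan reads each row once.

def indexed_cost(selected_rows):
    # one index read + one table read per selected row
    return 2 * selected_rows

def full_scan_cost(total_rows):
    # one read per row in the table
    return total_rows

total = 10_000
# find the selectivity at which the two strategies cost the same
break_even = next(n for n in range(total + 1)
                  if indexed_cost(n) >= full_scan_cost(total))
print(break_even / total)  # -> 0.5, roughly the 50% figure quoted above
```

Under this simplistic model the crossover is exactly half the table; caching and multiblock reads push the real figure much lower, as the rest of the section explains.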

There has been a good deal of testing on the choice between a full table scan and an indexed retrieval, and the results show that the better choice depends on several factors. One is how many rows of the table fit into a single Oracle block: blocks are read, written, and cached in the SGA as whole units, so the more rows each block holds, the fewer physical reads a full table scan needs to cover the entire table. Conversely, the more widely consecutive (indexed) rows are dispersed throughout the table, and the fewer rows each Oracle block can hold, the lower the likelihood that the next row is already in the SGA cache.

If the only columns being selected are indexed columns or SQL pseudocolumns and aggregates over them (USER, SYSDATE, LEVEL, COUNT(*), etc.), an index read is always the most efficient: only the index needs to be read, and the physical rows are never retrieved. If column data must be fetched from each row, the break-even point depends on the percentage of the table that needs to be read, combined with at least some (and perhaps all) of the following elements:

  • Operating system block size
  • Oracle block size (e.g., DB_BLOCK_SIZE initialization parameter)
  • SGA and PGA memory sizing (e.g., the SHARED_POOL_SIZE or PGA_AGGREGATE_TARGET initialization parameters)
  • Oracle data read-ahead size (e.g., DB_FILE_MULTIBLOCK_READ_COUNT initialization parameter)
  • Operating system file system type (e.g., cooked, journaling, raw)
  • Disk subsystem type (e.g., RAID 0 ... RAID 5)
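As a rough illustration of how several of these elements interact, here is a hypothetical back-of-the-envelope estimate of the physical reads a full table scan would need. All figures (8 KB blocks, 15% per-block overhead, a read-ahead of 16 blocks) are assumptions chosen for illustration, not measured Oracle behavior:

```python
# Hypothetical estimate: physical reads for a full table scan, given
# block size, average row size, and the multiblock read-ahead count.
import math

def full_scan_io(total_rows, avg_row_bytes, block_bytes=8192,
                 multiblock_read_count=16, overhead=0.15):
    usable = block_bytes * (1 - overhead)        # assume 15% block overhead
    rows_per_block = max(1, int(usable // avg_row_bytes))
    blocks = math.ceil(total_rows / rows_per_block)
    # multiblock reads fetch several adjacent blocks per physical I/O
    return math.ceil(blocks / multiblock_read_count)

# A 100,000-row table: smaller rows pack more per block, so fewer reads.
print(full_scan_io(100_000, avg_row_bytes=800))  # wide rows   -> 782
print(full_scan_io(100_000, avg_row_bytes=80))   # narrow rows -> 72
```

The ten-fold difference in row width translates into roughly a ten-fold difference in scan I/O, which is why row size figures so prominently in the break-even discussion that follows.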

In the following figures, the leftmost column shows what percentage of the table (in number of records) met the SQL selection criteria. The first example shows that when approximately 52% of the table needs to be read, a full table scan performs better.

  • Figure 1

In the next figure, the break-even point for the same database setup and hardware configuration is only 15.5% of the table. The difference is attributable to the smaller physical size of each record and, consequently, the larger number of rows that fit in a single Oracle block. The more rows per block, the fewer reads a full table scan needs to cover the entire table.
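This effect can be sketched by extending the simple two-reads-per-row model so that the full scan pays per block rather than per row; the row counts and packing figures below are illustrative assumptions, not measurements:

```python
# Sketch: a full scan reads blocks, not rows, so packing more rows per
# block lowers its cost and pushes the indexed break-even point down.
import math

def break_even_fraction(total_rows, rows_per_block):
    scan_reads = math.ceil(total_rows / rows_per_block)
    # indexed access still costs ~2 physical reads per selected row
    selected = scan_reads / 2
    return selected / total_rows

print(break_even_fraction(10_000, 1))   # -> 0.5, one row per block
print(break_even_fraction(10_000, 10))  # -> 0.05, ten rows per block
```

Going from one row per block to ten drops the break-even point from 50% to 5% under this model, which is consistent in spirit with the 52% versus 15.5% results shown in the figures.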

  • Figure 2

Although some guidelines have been presented, remember to test each table separately; one rule does not fit all. For those who insist on a full table scan rule of thumb, it is safer to base the break-even point on the combination of the physical number of rows per Oracle block and the percentage of the table that requires reading.

Most people accept a break-even point of 15-20%; others go as low as 10%. The true answer is that it all depends, so test your applications well.