Everybody uses indexes in order to improve the performance of queries. However I have seen that just few people understand the concept of an Index in Oracle, even B-tree indexes that is the commonest index used in the industry. The most DBAs know just the following:

  1. Oracle B-Tree Index is a B-tree. But do you understand how does the B-tree work?
  2. Indexes improves my queries.
  3. Indexes decreases the performance of my DMLs. But do you understand why?
  4. Indexes uses space.

But there are a lot of concepts behind of the Oracle Indexes. Do you understand the redundant Indexes? do you know that there are more types of indexes? do you understand the difference between Global Index and Local Index? Why PCTUSED doesn't make sense on indexes? do you know that PCTFREE is ignored after the "CREATE INDEX" sentence? do you know that the Index Leaf Nodes also have pointers to the next and also to the previous Index Leaf Node? Well, I can continue asking you a lot of questions about indexes. The problem nowadays with Oracle Technology is that everything is changing, Oracle always is trying to get sophisticated, Oracle wants to make the smartest Oracle Database in the market. This results in a complex documentation for us, a lot of manuals, a lot of books to read (if we want to be updated). Since we have to read a lot of new and fresh information, usually we don't do a deep understanding of what we read. At the end of the day we usually know the "General Concept" of the things. Well, in this article I will try to teach you the "Basic Concept of B-trees", I will adapt this concept to Oracle because you are supposed to be learning about Oracle and finally we will apply our "Basic Concept" on an example while we are taking a look into internals. At the end of this article you will be able to explain how the Oracle B-Tree indexes works and you will find out that Oracle is easy, it is the pure basic concept working with memory, cpu, and storage...

Are you ready? (do you have your cup of coffee?) Let's start!

For this example I will use this table:

SQL> create table dgomez.t1(
id number,
value varchar2(1)); 2 3

Table created.

I will use the following index:

SQL> create unique index dgomez.btree on dgomez.t1(id) pctfree 99;

Index created.

Important note: Don't use PCTFREE 99, I am using this only for study reasons. While we are discussing the topics, I will be inserting rows in the table and after the inserts I have to recreate the index. Please don't forget this, because PCTFREE is only used in the index creation. Anyways I will reminder you when you have to recreate the index.

The Basic concept adapted to Oracle context:

An Oracle B-Tree is the same than the "B-Tree" that we learnt when we were students, the B-tree also is balanced, it has "height" which is the number of levels of the B-Tree. The values in the indexes are always sorted. Every Leaf Node is in the same level. it has two kind of nodes:

  1. Leaf nodes:A leaf node has rows or entries you can call it how you want. Every entry has two values:
    • The indexed columns (it must have at least 1).
    • The ROWID: This is only an address that says where is physically our information (Table row). We can call just for this article "The Value".

    Each Leaf node in an Oracle B-tree can save a lot of entries, this depends of the PCTFREE and of course of the Block's size. I have to say that because the Generic Concept of B-tree says that it has only a value or entry. Well, Oracle can put a lot of entries in a single Leaf Node. Plus this information, the Leaf Nodes also have two pointers more: The previous Leaf Node and the Next Leaf Node.  Leaf block rows hold the <KEY, KEYDATA> pairs stored by the B-tree. 

    1. Branch nodes: A Branch node has pointers to each Leaf Node. Few people know that the a Branch Node also has data! yes, Branch nodes also have the data of the indexed values, this values are used by Oracle in order to decide in which Branch/Leaf node has to go in order to get the information requested by the user. Branch block rows hold <separator key, kdba> pairs used to guide the B-tree search to a row in a leaf block.

    What about the Root Node? well, in Oracle is a Leaf Node. We can say that the "root node" is the first node in a B-tree and its type is "Leaf Node". and when the index grows up the Root Node becomes in a Branch Node. 

    Inserting our first entry:

    When we insert a row in the table, also the index of that table is updated. Also here is a good point to say that if you table is empty and you create an index on it your index will not have any Segment, your index will exist only in the dictionary. If you try to see the Index Segment by dump it you will receive the following message:

    ORA-00600: internal error code, arguments: [25027], [4], [1], [], [], [], [], [], [], [], [], []

    The Oracle's way to say "your index doesn't have any entry" Cool.

    After to insert your first row in your table you will see that the index finally have a segment and also the root node will be created. Since the root with 1 entry is also a Leaf Node. The previous and Next Leaf Node pointers will be null because there are no more Leaf nodes.

    How does our index look? (recreate the index).

    ----- begin tree dump
    leaf: 0x100008b 16777355 (0: nrow: 1 rrow: 1)
    ----- end tree dump

    Yes, There is only one Leaf node. But wait, isn't a root node? If you have already the answer for this question then you are doing well Wink

    Would you like to take a look into the Leaf Node?

    Leaf block dump
    ===============
    header address 139950941831268=0x7f48de2c8064
    kdxcolev 0 <--block level
    KDXCOLEV Flags = - - -
    kdxcolok 0 <--itl of service tx holding block lock
    kdxcoopc 0x80: opcode=0: iot flags=--- is converted=Y <--block lock op code 
    kdxconco 1<--number of columns
    kdxcosdc 0<--Split or Deleted count
    kdxconro 1<--Number of entries
    kdxcofbo 38=0x26 <--offset where the free space in this block starts
    kdxcofeo 8021=0x1f55 <--offset where the free space in this block finishes
    kdxcoavs 7983 <--8021-38 (kdxcofeo-kdxcofbo) The free space!
    kdxlespl 0<--space held by unlocked split holes
    kdxlende 0 <--entries deleted
    kdxlenxt 0=0x0 <--there is not pointer to Next Leaf Node
    kdxleprv 0=0x0 <--there is not pointer to Previous Leaf Node
    kdxledsz 6<--bytes used by rowed data (KEYDATA)
    kdxlebksz 8032<--usable space in the block.
    row#0[8021] flag: ------, lock: 0, len=11, data:(6): 01 00 00 85 00 00
    col 0; len 2; (2): c1 02
    ----- end of leaf block dump -----

    Basically this Leaf node has the following:

    Value of this Leaf Node: 1
    ROWID (The physical address to the Table Row): 01 00 00 85 00 00

    Would you like to confirm it?

    SQL> var n number
    SQL> exec dbms_stats.convert_raw_value('c102',:n);

    PL/SQL procedure successfully completed.

    SQL> print

    N
    ----------
    1

    Let's suppose that our Leaf Nodes can only have 1 entry and our Branch Nodes can only have 814 pointers.
    What happen when we insert the second row in our table which means that a second entry has to be inserted in the Index? Well, Since a Leaf Node can have only 1 entry (in this example) now a Branch node has to be created. The Branch Node will have 2 nodes. And now the type of the root node is...?...?... yes! you got it, now the type of the root node is "Branch Node".

    How does our index look? (recreate the index).

    ----- begin tree dump
    branch: 0x1000946 16779590 (0: nrow: 2, level: 1)
    leaf: 0x100008c 16777356 (-1: nrow: 1 rrow: 1)
    leaf: 0x100008d 16777357 (0: nrow: 1 rrow: 1)
    ----- end tree dump

    Cool! isn't it? When the "Basic Concepts" matches with the internals everything looks easy CoffeeCool

    But why a "PREV" entry? the "PREV" entry is used when the value that we are searching is lesser than the value in the first entry, for example, look a the last image, "PREV" is just a pointer, it's not an entry really.  The first entry  is "2"  (value,rowid), so when you are searchings a value, that value is compared with "2" and if your value is lesser than "2" oracle immediately go to the address where the pointer "PREV" is pointing.

    Well now let's suppose that we have inserted 814 rows in our table. Our index should have 814 entries and our B-tree should look this way:

    What happen id we insert another row in our table? do you remember that our Branch Nodes can have only 814 pointers (in this example)?
    Now we want to insert another Leaf Node and the current Branch Node is not able to have other pointer. In this case, Oracle creates "another level" in B-Tree. Now we have The root (a branch with 2 entries), 2 branch nodes (1 with 814 entries, 1 with 1 entry) and 815 Leaf Nodes. Our B-tree should look this way:

    How does our index look? (recreate the index).

    ----- begin tree dump
    branch: 0x100008b 16777355 (0: nrow: 2, level: 2)
    branch: 0x1000946 16779590 (-1: nrow: 814, level: 1)
    leaf: 0x100008c 16777356 (-1: nrow: 1 rrow: 1)
    leaf: 0x100008d 16777357 (0: nrow: 1 rrow: 1)
    leaf: 0x100008e 16777358 (1: nrow: 1 rrow: 1)
    leaf: 0x100008f 16777359 (2: nrow: 1 rrow: 1)
    leaf: 0x1000608 16778760 (3: nrow: 1 rrow: 1)
    ...
    leaf: 0x10006e8 16778984 (218: nrow: 1 rrow: 1)
    leaf: 0x10006e9 16778985 (219: nrow: 1 rrow: 1)
    leaf: 0x10006ea 16778986 (220: nrow: 1 rrow: 1)
    ...
    leaf: 0x1000942 16779586 (810: nrow: 1 rrow: 1)
    leaf: 0x1000943 16779587 (811: nrow: 1 rrow: 1)
    leaf: 0x1000944 16779588 (812: nrow: 1 rrow: 1)
    branch: 0x1000948 16779592 (0: nrow: 1, level: 1)
    leaf: 0x1000945 16779589 (-1: nrow: 1 rrow: 1)
    ----- end tree dump

    Would you like to take a look into the Root Node? Don't be shy!

    Branch block dump
    =================
    header address 139950945835596=0x7f48de699a4c
    kdxcolev 2  <---- Level
    KDXCOLEV Flags = - - -
    kdxcolok 0
    kdxcoopc 0x80: opcode=0: iot flags=--- is converted=Y
    kdxconco 1
    kdxcosdc 0
    kdxconro 1
    kdxcofbo 30=0x1e
    kdxcofeo 8048=0x1f70
    kdxcoavs 8018
    kdxbrlmc 16779590=0x1000946<--Address to Prev Node
    kdxbrsno 0<--Last index entry modified
    kdxbrbksz 8056<--usable space in the block.
    kdxbr2urrc 0
    row#0[8048] dba: 16779592=0x1000948 <--Pointer to the 2nd intermediate branch block
    col 0; len 3; (3): c2 09 10  <--first column (ID) value by which to navigate
    ----- end of branch block dump -----

    As you can see "kdxbrlmc" is our "PREV" and the pointer of the node which has the value 815 is "0x1000948" you can see it in "row#0". So this node has basically one entry with the following:

    Value=815
    Node address=0x1000948

    ah? come on! don't you believe me? let's confirm it:

    SQL> var n number
    SQL> exec dbms_stats.convert_raw_value('c20910', :n);

    PL/SQL procedure successfully completed.

    SQL> print

    N
    ----------
    815

    Yes, the value for the entry is 815!

    Would you like to take a look into the first Branch Node in the level 1?

    Branch block dump
    =================
    header address 139950945835596=0x7f48de699a4c
    kdxcolev 1
    KDXCOLEV Flags = - - -
    kdxcolok 0
    kdxcoopc 0x80: opcode=0: iot flags=--- is converted=Y
    kdxconco 1
    kdxcosdc 0
    kdxconro 813<------- #entries in the block
    kdxcofbo 1654=0x676
    kdxcofeo 1659=0x67b
    kdxcoavs 5
    kdxbrlmc 16777356=0x100008c
    kdxbrsno 0
    kdxbrbksz 8056
    kdxbr2urrc 0
    row#0[8049] dba: 16777357=0x100008d
    col 0; len 2; (2): c1 03
    row#1[8042] dba: 16777358=0x100008e
    col 0; len 2; (2): c1 04
    row#2[8035] dba: 16777359=0x100008f
    col 0; len 2; (2): c1 05
    row#3[8028] dba: 16778760=0x1000608
    col 0; len 2; (2): c1 06
    ....
    row#810[1675] dba: 16779586=0x1000942
    col 0; len 3; (3): c2 09 0d
    row#811[1667] dba: 16779587=0x1000943
    col 0; len 3; (3): c2 09 0e
    row#812[1659] dba: 16779588=0x1000944
    col 0; len 3; (3): c2 09 0f
    ----- end of branch block dump -----

    As you can see "kdxbrlmc" is our "PREV" which has the pointer of the node which has the value of "1", the address is "0x100008c". The row#0 has the address of the Node which has the value of "2". So this node has basically 813 entries, the PREV has the value 1 and the first entry will have the value 2:

    PREV entry:

    value=1
    node address=0x100008c

    First entry:

    value=2
    node address=0x100008d

    SQL> var n number
    SQL> exec dbms_stats.convert_raw_value('c103', :n);

    PL/SQL procedure successfully completed.

    SQL> print

    N
    ----------
    2

    Last entry:

    Value=814
    Node address=0x1000944

    SQL>exec dbms_stats.convert_raw_value('c2090f', :n);

    PL/SQL procedure successfully completed.

    SQL> print

    N
    ----------
    814


    Finally, let's take a look into the Leaf Node that has the value "3":


    Leaf block dump
    ===============
    header address 139950945835620=0x7f48de699a64
    kdxcolev 0
    KDXCOLEV Flags = - - -
    kdxcolok 0
    kdxcoopc 0x80: opcode=0: iot flags=--- is converted=Y
    kdxconco 1
    kdxcosdc 0
    kdxconro 1
    kdxcofbo 38=0x26
    kdxcofeo 8021=0x1f55
    kdxcoavs 7983
    kdxlespl 0
    kdxlende 0
    kdxlenxt 16777359=0x100008f
    kdxleprv 16777357=0x100008d
    kdxledsz 6
    kdxlebksz 8032
    row#0[8021] flag: ------, lock: 0, len=11, data:(6): 01 00 00 85 00 02
    col 0; len 2; (2): c1 04
    ----- end of leaf block dump -----

    Basically this Leaf node has the following:

    Previous Leaf Node:

    Value=2
    Node address=0x100008d

    Next Leaf Node:

    Value=4
    Node address=0x100008f

    Value of this Leaf Node: 3
    ROWID (The physical address to the Table Row)=01 00 00 85 00 02

    But, does this Leaf node really have the value "3"?

    SQL> var n number
    SQL> exec dbms_stats.convert_raw_value('c104', :n);

    PL/SQL procedure successfully completed.

    SQL> print;

    N
    ----------
    3

    If you continue inserting rows, a lot of rows you will have a B-tree alike the following:

    Also keep in mind that every leaf node has a pointer to next and previous Leaf Nodes I mean, the Leaf nodes are connected as well, look at the following image for a better idea:

    So, so far you know in which Node is the value "3". Which Nodes should you read in order to find the correct Leaf Node?, let's simulate the Searching process.

    1. Root node.

      if "3" >= "815" (c2 09 10) 
             go to the node 0x1000948
      else
             go to the node 0x1000946

    2. We are in node 0x1000946, First Branch Node at Level 1)

      if "3"<"2" (c1 03) then
             go to 0x100008c
      else
             if "3" ==2 (c1 03) then 
                    go to 0x100008d
             else
                    if "3"=="3" (c1 04) then
                           go to 0x100008e
                    else
                           (more conditions)....

    3. We are in the First Leaf Node at Level 2, this Node is in the First Branch Node at Level 1

      What is the ROWID? 
      The ROWID is=01 00 00 85 00 02

    4. Go to Read the table block where the ROWID is pointing

    Read the Table Row (Id, value) and show the result.

     

    How many reads did we do? 4, right? Well, let's confirm it.

    SQL> select id, value from dgomez.t1 where id=3;

    ID V
    ---------- -
    3 D


    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 408250987

    -------------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
    -------------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 1 | 15 | 2 (0)| 00:00:01 |
    | 1 | TABLE ACCESS BY INDEX ROWID| T1 | 1 | 15 | 2 (0)| 00:00:01 |
    |* 2 | INDEX UNIQUE SCAN | BTREE | 1 | | 2 (0)| 00:00:01 |
    -------------------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------

    2 - access("ID"=3)


    Statistics
    ----------------------------------------------------------
    0 recursive calls
    0 db block gets
    4 consistent gets
    0 physical reads
    0 redo size
    457 bytes sent via SQL*Net to client
    508 bytes received via SQL*Net from client
    1 SQL*Net roundtrips to/from client
    0 sorts (memory)
    0 sorts (disk)
    1 rows processed

    SQL>

    Hope you enjoyed this article, cheers Beer

    Update: 29 October 2014 - Continuation of this article: Oracle Bitmap Index: From the concept to Internals

    by Deiby Gómez.