Hello, you are not logged in.  Login or sign up
Community >> Blogs
Search Toad World Search

Blogs
Toad and Database Commentaries

 Toad World blogs are a mix of insightful how-tos from Quest experts as well as their commentary on experiences with new database technologies.

Do you have a topic that you'd like discussed?  We'd love to hear from you.  Send us your idea for a blog topic.

Always Bulk Collect
 
Location: Blogs Steven Feuerstein's Blog    
 StevenFeuersteinTW 6/23/2008 7:34 AM
You learn something new every day, right? Well, I certainly do (more or less). Even about PL/SQL, about which I am sure many people think I already know everything. Far from it.
 
In fact, I learned just last week from the PL/SQL Product Manager, Bryn Llewellyn, that his recommendation regarding cursor FOR loops and bulk collect is different from mine – and for a very good reason.
 
Several years ago, he informed me (and provided a script to verify the statement: 10g_optimize_cfl.sql in the demo.zip) that in Oracle10g, the compiler now optimizes every cursor FOR loop to execute at a level of performance similar to BULK COLLECT.
 
This is a wonderful and non-trivial optimization. Rather than have to go through a manual rewrite of all cursor FOR loops, you can leave them in place and reap the benefits of the BULK COLLECT (sort of).
 
Well, that's not quite true – and it's even less true than I thought.
 
First of all, if your cursor FOR loop contains any DML statements, you will still want to explicitly convert to BULK COLLECT. The reason is that while the optimizer will improve the performance of the query step, it will not optimize the DML statements to execute like FORALLs (the bulk processing analogue for updates, inserts and deletes).
 
So if you have code that looks like (cfl_to_bulk_0.sql in demo.zip):
 
PROCEDURE upd_for_dept (
   dept_in     IN   employees.department_id%TYPE
 , newsal_in   IN   employees.salary%TYPE
)
IS
   CURSOR emp_cur
   IS
      SELECT employee_id, salary, hire_date
        FROM employees
       WHERE department_id = dept_in;
BEGIN
   FOR rec IN emp_cur
   LOOP
      BEGIN
         INSERT INTO employee_history
                     (employee_id, salary, hire_date
                     )
              VALUES (rec.employee_id, rec.salary, rec.hire_date
                     );
 
         adjust_compensation (rec.employee_id, rec.salary);
 
         UPDATE employees
            SET salary = newsal_in
          WHERE employee_id = rec.employee_id;
      EXCEPTION
         WHEN OTHERS
         THEN
            log_error;
      END;
   END LOOP;
END upd_for_dept;
 
You will need to change the cursor FOR loop to BULK COLLECT so that you can fill up collections with data from the query, which can then be used in the FORALL statement. You will, sadly (due to increased complexity and program length) end up with something like this (cfl_to_bulk_5.sql in demo.zip):
 
PROCEDURE upd_for_dept (
   dept_in     IN   employees.department_id%TYPE
 , newsal_in   IN   employees.salary%TYPE
)
IS
   bulk_errors    EXCEPTION;
   PRAGMA EXCEPTION_INIT (bulk_errors, -24381);
 
   TYPE employee_tt IS TABLE OF employees.employee_id%TYPE
      INDEX BY BINARY_INTEGER;
 
   employee_ids   employee_tt;
 
   TYPE salary_tt IS TABLE OF employees.salary%TYPE
     INDEX BY BINARY_INTEGER;
 
   salaries       salary_tt;
 
   TYPE hire_date_tt IS TABLE OF employees.hire_date%TYPE
      INDEX BY BINARY_INTEGER;
 
   hire_dates     hire_date_tt;
 
   CURSOR employees_cur
   IS
      SELECT     employee_id, salary, hire_date
            FROM employees
           WHERE department_id = dept_in
      FOR UPDATE;
 
   PROCEDURE fetch_data_quickly (
      limit_in           IN       PLS_INTEGER
    , employee_ids_out   OUT      employee_tt
    , salaries_out       OUT      salary_tt
    , hire_dates_out     OUT      hire_date_tt
   )
   IS
   BEGIN
      FETCH employees_cur
      BULK COLLECT INTO employee_ids_out, salaries_out, hire_dates_out LIMIT limit_in;
   END fetch_data_quickly;
 
   PROCEDURE adj_comp_for_arrays (
      employee_ids_io   IN OUT   employee_tt
    , salaries_io       IN OUT   salary_tt
   )
   IS
   BEGIN
      FOR indx IN 1 .. employee_ids_io.COUNT
      LOOP
         adjust_compensation (employee_ids_io (indx), salaries_io (indx));
      END LOOP;
   END adj_comp_for_arrays;
 
   PROCEDURE insert_history
   IS
   BEGIN
      FORALL indx IN employee_ids.FIRST .. employee_ids.LAST SAVE EXCEPTIONS
         INSERT INTO employee_history
                     (employee_id, salary, hire_date
                     )
              VALUES (employee_ids (indx), salaries (indx), hire_dates (indx)
                     );
   EXCEPTION
      WHEN bulk_errors
      THEN
         FOR indx IN 1 .. SQL%BULK_EXCEPTIONS.COUNT
         LOOP
            -- Log the error
            log_error
                     (   'Unable to insert history row for employee '
                      || employee_ids
                                     (SQL%BULK_EXCEPTIONS (indx).ERROR_INDEX)
                    , SQL%BULK_EXCEPTIONS (indx).ERROR_CODE
                     );
            /*
            Communicate this failure to the update phase:
            Delete this row so that the update will not take place.
            */
            employee_ids.DELETE (SQL%BULK_EXCEPTIONS (indx).ERROR_INDEX);
         END LOOP;
   END insert_history;
 
   PROCEDURE update_employee
   IS
   BEGIN
      /*
        Use Oracle10g INDICES OF to avoid errors
        from a sparsely-populated employee_ids collection.
      */
      FORALL indx IN INDICES OF employee_ids SAVE EXCEPTIONS
         UPDATE employees
            SET salary = newsal_in
              , hire_date = hire_dates (indx)
          WHERE employee_id = employee_ids (indx);
   EXCEPTION
      WHEN bulk_errors
      THEN
         FOR indx IN 1 .. SQL%BULK_EXCEPTIONS.COUNT
         LOOP
            log_error
                     (   'Unable to update salary for employee '
                      || employee_ids
                                     (SQL%BULK_EXCEPTIONS (indx).ERROR_INDEX)
                    , SQL%BULK_EXCEPTIONS (indx).ERROR_CODE
                     );
         END LOOP;
   END update_employee;
BEGIN
   OPEN employees_cur;
 
   LOOP
      fetch_data_quickly (100, employee_ids, salaries, hire_dates);
      EXIT WHEN employee_ids.COUNT = 0;
      insert_history;
      adj_comp_for_arrays (employee_ids, salaries);
      update_employee;
   END LOOP;
END upd_for_dept;
 
Sorry, I know that I should probably explain this code but to be brutally honest my dad had heart bypass surgery and I am currently at the hospital and so you get the abbreviated version ....
 
Anyway, that is one scenario where you definitely need to convert from the cursor FOR loop.
 
But Bryn, to my surprise, recommends that you always convert explicitly to BULK COLLECT. Why is that? For performance reasons. Apparently, the cursor FOR loop optimization makes the code run faster, but not as quickly as BULK COLLECT.
 
In my tests (which you can reproduce in cfl_vs_bulkcollect.sql), I found that BULK COLLECT ran 33% faster than the cursor FOR loop:
  • Cursor For Loop Elapsed: 8.12 seconds.
  • Bulk Collect Elapsed: 5.46 seconds. 
So if you really want the best performance, take the time and make the effort to switch over to BULK COLLECT.
 
Permalink |  Trackback

Comments (4)   Add Comment
By ThomasKyte on 6/24/2008 12:24 PM
Steve,

I was looking at this - and while I agree that if you are going to read/update in a loop - using forall is the way to go (if you cannot do it in a single sql statement, which you many times can - no code) - and therefore using bulk collect with a reasonable limit would be the right approach.

However, when you say "always convert", I'm not so sure.

I reviewed your test case (cfl_vs_bulkcollect) and my notes are:

a) you bulk collect over a million records. That is about 620mb of pga for that array alone. I would be very upset at the development staff that needs a half a gigabyte of RAM for an array.

b) you would use the simple cursor for loop when you wanted to process row by row - your example stuffs them all into an array, you would not do that in real life. In real life, you would just have "for x in (select) loop process; end loop;"

c) you measure elapsed time, elapsed time is very variable depending on what is going on, I would like to measure cpu and elapsed time.

So, I took your example and reimplemented it as follows

ops$tkyte%ORA11GR1> CREATE TABLE employees_big
2 AS SELECT e1.* FROM hr.employees e1, hr.employees e2, hr.employees e3
3 /

Table created.

ops$tkyte%ORA11GR1>
ops$tkyte%ORA11GR1> create or replace procedure simple( p_iters in number default 1 )
2 as
3 l_start_cpu number := dbms_utility.get_cpu_time;
4 l_start_ela number := dbms_utility.get_time;
5 begin
6 for ITERS in 1 .. p_iters
7 loop
8 for x in ( select * from employees_big )
9 loop
10 if ( x.employee_id+0.1 = x.employee_id )
11 then
12 dbms_output.put_line( 'doh' );
13 end if;
14 end loop;
15 dbms_output.put_line
16 ( iters || '-simple ' ||
17 ' cpu = ' || (dbms_utility.get_cpu_time-l_start_cpu) ||
18 ' ela = ' || (dbms_utility.get_time-l_start_ela) );
19 end loop;
20 end;
21 /

Procedure created.

ops$tkyte%ORA11GR1>
ops$tkyte%ORA11GR1> create or replace procedure complex( p_iters in number default 1 )
2 as
3 l_start_cpu number := dbms_utility.get_cpu_time;
4 l_start_ela number := dbms_utility.get_time;
5 type array is table of employees_big%rowtype;
6 l_data array;
7 cursor c is select * from employees_big;
8 begin
9 for ITERS in 1 .. p_iters
10 loop
11 open c;
12 loop
13 fetch c bulk collect into l_data limit 100;
14 for i in 1 .. l_data.count
15 loop
16 if ( l_data(i).employee_id+0.1 = l_data(i).employee_id )
17 then
18 dbms_output.put_line( 'doh' );
19 end if;
20 end loop;
21 exit when c%notfound;
22 end loop;
23 close c;
24 dbms_output.put_line
25 ( iters || '-complex ' ||
26 ' cpu = ' || (dbms_utility.get_cpu_time-l_start_cpu) ||
27 ' ela = ' || (dbms_utility.get_time-l_start_ela) );
28 end loop;
29 end;
30 /

Procedure created.


that compares apples to apples - we both want to process all of the rows in the table - we inspect values in the rows and do something. They both do equivalent work. First I was curious "how much memory" would that original approach without limit take?

[continued below]

By ThomasKyte on 6/24/2008 12:25 PM
ops$tkyte%ORA11GR1> connect /
Connected.
ops$tkyte%ORA11GR1> column value new_val V
ops$tkyte%ORA11GR1> select a.name, b.value
2 from v$statname a, v$mystat b
3 where a.statistic# = b.statistic#
4 and a.name = 'session pga memory max';

NAME VALUE
------------------------------ ----------
session pga memory max 1841420

ops$tkyte%ORA11GR1> set echo on
ops$tkyte%ORA11GR1> declare
2 type array is table of employees_big%rowtype;
3 l_data array;
4 begin
5 select * bulk collect into l_data from employees_big;
6 end;
7 /

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR1> select a.name, to_char(b.value-&V)/1024/1024 || ' mb'
2 from v$statname a, v$mystat b
3 where a.statistic# = b.statistic#
4 and a.name = 'session pga memory max';
old 1: select a.name, to_char(b.value-&V)/1024/1024 || ' mb'
new 1: select a.name, to_char(b.value- 1841420)/1024/1024 || ' mb'

NAME TO_CHAR(B.VALUE-1841420)/1024/1024||'MB'
------------------------------ -------------------------------------------
session pga memory max 618.6875 mb



that is way more than I am willing to give anyone for a single variable like that - get two or three of those going and bam.

Then, I wanted to see the relative performance:

ops$tkyte%ORA11GR1> connect /
Connected.
ops$tkyte%ORA11GR1> exec complex(5)
1-complex cpu = 543 ela = 596
2-complex cpu = 1075 ela = 1127
3-complex cpu = 1606 ela = 1658
4-complex cpu = 2142 ela = 2194
5-complex cpu = 2677 ela = 2730

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR1>
ops$tkyte%ORA11GR1> connect /
Connected.
ops$tkyte%ORA11GR1> exec simple(5)
1-simple cpu = 525 ela = 525
2-simple cpu = 1049 ela = 1049
3-simple cpu = 1572 ela = 1573
4-simple cpu = 2096 ela = 2097
5-simple cpu = 2622 ela = 2623

PL/SQL procedure successfully completed.


Here, the simple approach was *marginally* faster - not enough to be significant - but - the simple code is not "much slower" than the complex code - they are just about the same (similar results in 10g - and if you run it often enough sometimes complex is a LITTLE faster than simple and vice versa - they are about the same)


Thomas Kyte
http://asktom.oracle.com/

By StevenFeuersteinTW on 7/4/2008 9:53 AM
Thanks, Tom! I ran your code and reproduced similar results. I have asked Bryn to also take a look. I definitely would be happy to not feel compelled to convert to bulk collect at all times. But the two of you will need to "duke it out"....

:-)

SF

By StevenFeuersteinTW on 7/23/2008 8:37 AM
I have put together a more complete script to compare cursor for loop, unlimited bulk collect, and bulk collect with various limit clauses. Here are the results I get:

"Bulk Collect with LIMIT 100" Elapsed CPU time: 347
"Bulk Collect with LIMIT 500" Elapsed CPU time: 335
"Bulk Collect with LIMIT 10000" Elapsed CPU time: 323
"Cursor For Loop" Elapsed CPU time: 586
"Bulk Collect" Elapsed CPU time: 461

"Cursor For Loop" Elapsed CPU time: 584
"Bulk Collect" Elapsed CPU time: 461
"Bulk Collect with LIMIT 100" Elapsed CPU time: 331
"Bulk Collect with LIMIT 500" Elapsed CPU time: 312
"Bulk Collect with LIMIT 10000" Elapsed CPU time: 327

This shows that at least for the sort of code shown below, bulk collect with a limit clause works very efficiently, even faster than the unlimited bulk collect - which in this case likely has most to do with memory issues (this is a table of 1.19M rows). I also saw a consistent but small improvement will a limit of 500 over 100, but no reliable improvement in performance at the 10000 limit.

I cannot explain why these results are at odds with Tom's code.

My script:

DROP TABLE employees_big
/
CREATE TABLE employees_big
AS SELECT e1.* FROM employees e1, employees e2, employees e3
/

CREATE OR REPLACE PROCEDURE cfl_vs_bulkcollect (
iterations_in IN PLS_INTEGER := 1
)
IS
TYPE employee_tt IS TABLE OF employees_big%ROWTYPE
INDEX BY BINARY_INTEGER;

CURSOR big_cur
IS
SELECT *
FROM employees_big;

l_start_time PLS_INTEGER;

PROCEDURE start_timing
IS
BEGIN
DBMS_SESSION.free_unused_user_memory;
l_start_time := DBMS_UTILITY.get_cpu_time;
END start_timing;

PROCEDURE show_elapsed_time (str IN VARCHAR2)
IS
BEGIN
DBMS_OUTPUT.put_line ( '"'
|| str
|| '" Elapsed CPU time: '
|| TO_CHAR ( DBMS_UTILITY.get_cpu_time
- l_start_time
)
);
END show_elapsed_time;

PROCEDURE cfl
IS
l_employees employee_tt;
BEGIN
start_timing ();

FOR indx IN 1 .. iterations_in
LOOP
FOR rec IN (SELECT *
FROM employees_big)
LOOP
l_employees (NVL (l_employees.LAST, 0) + 1) := rec;
END LOOP;
END LOOP;

show_elapsed_time ('Cursor For Loop');
END cfl;

PROCEDURE bulk_collect
IS
l_employees employee_tt;
BEGIN
start_timing ();

FOR indx IN 1 .. iterations_in
LOOP
SELECT *
BULK COLLECT INTO l_employees
FROM employees_big;
END LOOP;

show_elapsed_time ('Bulk Collect');
END bulk_collect;

PROCEDURE bulk_with_limit (limit_in IN PLS_INTEGER)
IS
l_employees employee_tt;
BEGIN
start_timing ();

FOR indx IN 1 .. iterations_in
LOOP
OPEN big_cur;

LOOP
FETCH big_cur
BULK COLLECT INTO l_employees LIMIT limit_in;

EXIT WHEN l_employees.COUNT = 0;
END LOOP;

CLOSE big_cur;
END LOOP;

show_elapsed_time ('Bulk Collect with LIMIT ' || limit_in);
END bulk_with_limit;

PROCEDURE run_all (order_in IN VARCHAR2)
IS
BEGIN
CASE order_in
WHEN 'bulk first'
THEN
bulk_with_limit (100);
bulk_with_limit (500);
bulk_with_limit (10000);
cfl ();
bulk_collect ();
WHEN 'bulk last'
THEN
cfl ();
bulk_collect ();
bulk_with_limit (100);
bulk_with_limit (500);
bulk_with_limit (10000);
END CASE;
END run_all;
BEGIN
run_all ('bulk first');
run_all ('bulk last');
END;
/


Comment:
Add Comment   Cancel 
Search Blog Entries
 
Blogger and Topic List
 

 

All Recent Entries
 

 

Johannes Ahrends
Unicode

Steven Feuerstein
Oracle PL/SQL

Daniel Norwood
Toad for Data Analysts
John Pocknell
Toad for Oracle
Bert Scalzo
Toad for Oracle, Data Modeling, Benchmarking
Jeff Smith
Toad product family
Richard To
SQL Optimization
Jim Wankowski
DB2 - LUW and z/OS
John Weathington
Compliance
Doug Williams
Database Musings
  Henrik "Mauritz" Johnson
Toad Tips & Tricks on the "other" Toads
  Toad World Editor
Toad World issues

  Toad Data Modeler Opens in a new window
Data Modeling