Quick Hint: Using LOBS with PL/SQL – Performance Problems

Published in Database Tips, PL/SQL, Quick Tips, Tuning by gpike Monday May 12, 2008

I just recently authored a PL/SQL Server Page designed to funnel large amounts of data via the htp.p function to a web application. Because the volume of data could be very large, I wanted to minimize the individual calls to the htp.p function (which can only take a maximum of 32767 characters per call).

The answer was obviously CLOBS! I proceeded to accumulate the full multi-megabyte response in a CLOB by concatenating the individual VARCHAR2 results line by line as they came out of the database.

Now I’ll be the first to admit that my experience with LOBs in Oracle has been very limited. And with the onset of SecureFiles in 11g, maybe I will never really become very proficient with LOBs. But in my LOB naivety, I assumed that if it COULD be done, it would probably work fine. You would think that after 15 years of Oracle development, I would know better…nope.

DECLARE
    bigLob CLOB;
    CURSOR getVarchars2 IS
         SELECT singleColumn FROM table;
BEGIN
FOR v_rec IN getVarchars2 LOOP
    bigClob := bigClob||v_rec.singleRow||linebreak;
END LOOP;
    doStuffWithTheCLOB(bigCLOB);
END;

It looks so innocent. Append the VARCHAR2 to the end of the CLOB. Let the database do the data type conversion from VARCHAR2 to CLOB. No problema! 10 rows, 100 rows, 1000 rows, all working just fine. Then came the request for 44K rows of data. The database became very angry and would not cooperate (“Stop asking for 44k rows!”). The user community became very angry and would not cooperate (“We need our 44k rows!”).

The solution is of course found in an obscure part of the Oracle documentation.

Here are the relevant entries:
• When Possible, Read/Write Large Data Chunks at a Time: Because LOBs are big, you can obtain the best performance by reading and writing large pieces of a LOB value at a time. This helps in several respects:
    o Writing to the LOB creates a new version of the LOB chunk. Therefore, writing small amounts at a time will incur the cost of a new version for each small write. If logging is on, then the chunk is also stored in the redo log.

• If you use the newly provided enhanced SQL semantics functionality in your applications, then there will be many more temporary LOBs created silently in SQL and PL/SQL than before. Ensure that temporary tablespace for storing these temporary LOBs is large enough for your applications. In particular, these temporary LOBs are silently created when you use the following:
     o SQL functions on LOBs
     o PL/SQL built-in character functions on LOBs
     o Variable assignments from VARCHAR2/RAW to CLOBs/BLOBs, respectively.
     o Perform a LONG-to-LOB migration

I highly recommend looking at this chapter in the LOB documentation prior to working with LOBs in SQL and PL/SQL.


Greg Pike

Greg Pike
PIOCON Technologies Website

Oracle Web Development: Pagination with a Single Query

Published in Complex SQL, SQL, Tuning by gpike Sunday May 13, 2007

Note: This previous article was enhanced with additional information regarding read-consistent pagination.

Found at the top and bottom of almost any web page that displays long lists of information are seemingly-innocuous little widgets commonly called “pagination” objects. They allow the user to advance to the next or previous page of information as well as jumping to a specific page. Although they may be a small part of a web page’s functionality, they can be a resource hog to produce if not developed with proper techniques.

Java applications seem to be a conspicuous culprit in producing poorly-conceived pagination objects. The scenario plays out in the following way:

1) Select the complete list of items to be displayed in the proper order; perhaps 1000’s of rows are selected.

2) March through the ordered list in a loop and accumulate a count of the items. When the appropriate section of the result set is found, accumulate these details into a display object.

3) Continue through the rest of the list to get full a count of the total items to feed the pagination widget.

Alternate scenarios may use two queries, one to count the total items for the entire result set and the other to get just those items to be displayed on the web page. This is an improvement over digging through the entire result set, but its still inefficient.

The appropriate solution is a Single Query. The goal is to produce a query that returns BOTH the total of count of the items in the full result set (to support creating the pagination object) and only those rows that will be displayed on the web page.

For our example, consider a table of PRODUCTS and a web page that displays any product with names based on a requested search criteria, 25 to a page. This is not an exercise in appropriate search conditions, so for our example we assume the user enters a keyword and we use % symbols on each end of the text fragment to find matching rows. The uninspired query looks like this:

SELECT product_name,
       product_details
FROM   products
WHERE product_name like '%Television%'
ORDER BY product_id;

The inefficient program first issues the query and then counts the rows in the result set, pulls out just the appropriate rows for display and crafts the pagination object all in a loop construct.

But the database can do most of the work for you. The following query retrieves only the rows to be displayed using in-line views and the total number of rows for the entire result set using the magic of the PARTITION BY clause.

SELECT product_name,
       product_details,
       total_rows
FROM   (
        SELECT product_name,
               product_details,
               total_rows,
               rownum row_counter
        FROM   (
                SELECT product_name,
                       product_details,
                       count(*) OVER () total_rows
                FROM   products
                WHERE product_name like '%Television%'
                ORDER BY product_name
               )
       )
WHERE row_counter between v_start_row and v_end_row;

Here’s how the query works. In the innermost in-line view, the entire result set is selected and the the count(*) OVER () column makes sure that every row contains a column with the total number of rows in the query. No matter what subset of rows are ultimately selected, the count of the entire inner result set is always available. This inner in-line view also sorts the data.

In the next in-line view, all the rows are selected from the innermost in-line view with an additional column added to identify each row by an incrementing number. Perhaps you are thinking to yourself, “Why didn’t he just add the rownum column to innermost in-line view?” The answer is that rownum is added to a result set PRIOR to an ORDER BY clause. It is necessary to first sort the result set, then add the rownum later in the process.

In the final step, our outermost query actually selects only the rows of interest by specifying BETWEEN a start and end position (provided here by the v_start_row and v_end_row variables). Most applications that use pagination probably are passed a variables for the page to display and a variable for the number of lines to display per page. In that case, the start and end positions are calculated:

v_end_row = page to display X number of rows per page
v_start_row = ((page to display - 1) X number of rows per page) + 1

The pagination object for the new page is created by capturing the total row count from any of the individual rows that come out of the query. In the loop that processes these rows, the total can be assigned to a variable for later use by the pagination constructor.

An Alternative

If you are sorting by a unique, numeric column (say a product_id), you can add use a row_number() function to create the row numbers in the innermost in-line view and simplify the query:

SELECT product_name,
       product_details,
       total_rows
FROM   (
        SELECT product_id,
               product_name,
               product_details,
               COUNT(*) OVER () total_rows,
               ROW_NUMBER() OVER (ORDER BY product_id) row_counter
        FROM   products
        ORDER BY product_id
       )
WHERE  row_counter between v_start_row and v_end_row;

Preserving Read-Consistency with Pagination

If you have rapidly-changing source tables, pagination becomes more complicated since each request to the server for a different page of information will potentially yield different data sets. For example, if a page of 25 results are returned and during the time the user views the page, a new item is inserted into the database that would have appeared in position 10 on that page. When the user requests the next 25 entries, item 25 from the previous page now is shown as item 1 on the new page. The results are even worse if the old item 10 is deleted, since the item that should have been displayed as item 1 on on the new page has now dropped to item 25 of the previous page. The user never saw it! To eliminate this behavior, you can use the System Change Number and flashback query:

CREATE TABLE products (product_id NUMBER, product_name VARCHAR2(20));
INSERT INTO products VALUES (1,'Widget A');
INSERT INTO products VALUES (2,'Widget B');
INSERT INTO products VALUES (4,'Widget D');
INSERT INTO products VALUES (5,'Widget E');
COMMIT;

SELECT DBMS_FLASHBACK.GET_SYSTEM_CHANGE_NUMBER FROM dual;

GET_SYSTEM_CHANGE_NUMBER
------------------------
                58961080

SELECT product_name,
       total_rows
FROM   (
        SELECT product_id,
               product_name,
               COUNT(*) OVER () total_rows,
               ROW_NUMBER() OVER (ORDER BY product_id) row_counter
        FROM   products
        ORDER BY product_id
       )
WHERE  row_counter between 2 and 4;

PRODUCT_NAME         TOTAL_ROWS
-------------------- ----------
Widget B                      4
Widget D                      4
Widget E                      4

INSERT INTO products VALUES (3,'Widget C');
COMMIT;

SELECT product_name,
       total_rows
FROM   (
        SELECT product_id,
               product_name,
               COUNT(*) OVER () total_rows,
               ROW_NUMBER() OVER (ORDER BY product_id) row_counter
        FROM   products as of scn 58961080 --  Flashback query!
        ORDER BY product_id
       )
WHERE  row_counter between 2 and 4;

PRODUCT_NAME         TOTAL_ROWS
-------------------- ----------
Widget B                      4
Widget D                      4
Widget E                      4

Note that the new row for Widget C did not appear and the total count remains 4. If this query is run without the as of scn clause, the expected result set is obtained:

PRODUCT_NAME         TOTAL_ROWS
-------------------- ----------
Widget B                      5
Widget C                      5
Widget D                      5

When creating pagination web objects, simply include the system change number in the web page to enable flashback query. More information on this topic can be found in the Oracle forums here.

So there you have it. A Single Query that can dramatically shorten the amount of work for loops and cursors in web applications. As an added benefit, the bigger the initial result set, the more time savings the application will experience.


Greg Pike

Greg Pike
PIOCON Technologies Website

Monitoring Oracle Parallel Queries: PARALLEL Hint Part 2

Published in Complex SQL, Hints, SQL, Tuning by gpike Thursday May 3, 2007

<-- Back to Part 1: Using the /*+ PARALLEL */ Hint

Monitoring Parallel Processes

While using parallel processing is a key advantage when handling extremely large data sets, it can be confusing to understand exactly what is occurring in the database during the execution of a parallel query. Fortunately, Oracle provides a whole host of V$ views that allow the developer and DBA alike to monitor parallel query execution and take proactive steps to rewrite ineffective code or kill resource-intensive processes.

The V$ Views

The queries in the following examples require access to several v_$ views that are owned by SYS. Typically, v_$ views are exposed to individual users through public synonyms with a naming convention that omits the underscore. For example, the SYS.V_$PX_SESSION view is typically referred to as V$PX_SESSION by individuals. The easiest (and least secure!) way to grant access to the entire suite of V_$ views is to GRANT SELECT ON ANY TABLE TO . However, be advised that V$ synonyms for all of these views sometimes do not exist. It has been this authors experience that the typical Oracle 10G database will already have these synonyms in place, but there are exceptions.

Basic Overview of Parallel Query Processes

If you took advantage of the link to the Oracle documentation from part 1 (or click here) you may have learned a little more about parallel processing. In general, when a parallel query executes, a query coordinator session acts as the ultimate parent for the parallel servers. For queries, two sets of parallel servers is used during execution. The first set of processes (called producers) typically handle full table scans while the second set (called consumers) performs operations (joins, hashes, etc.) upon the data retrieved from the first set. The degree of parallelism for a particular query governs how many parallel server processes will be found in each server set, not a total for the query. Lets use the following query to illustrate:

SELECT /*+ parallel(s 2) */ DISTINCT time_of_day
from sales s;

As this query executes, we can simultaneously execute the following query in another session to examine its parallel processes:

SELECT   qcsid,
         sid,
         NVL(server_group,0) server_group,
         server_set,
         degree,
         req_degree
FROM     SYS.V_$PX_SESSION
ORDER BY qcsid,
         NVL(server_group,0),
         server_set;

This query produces the following result (only during the time the parallel query is active):

     QCSID        SID SERVER_GROUP SERVER_SET     DEGREE REQ_DEGREE
---------- ---------- ------------ ---------- ---------- ----------
       390        390            0
       390        368            1          1          2          2
       390        353            1          1          2          2
       390        332            1          2          2          2
       390        330            1          2          2          2

Please note that this query, with a requested parallelism degree of 2, actually used 5 total processes to execute. Row one represents the query coordinator (QC). Rows 2 and 3 represent a set of parallel processes (server_set=1) that performed a full table scan of the SALES table while rows 4 and 5 (server_set=2) represent the parallel processes that would perform additional work on the data from SALES if necessary.

If we increase the degree of parallelism to 4 in a database that has a maximum allowable degree of 3, I would see something like this:

     QCSID        SID SERVER_GROUP SERVER_SET     DEGREE REQ_DEGREE
---------- ---------- ------------ ---------- ---------- ----------
       390        390            0
       390        351            1          1          3          4
       390        342            1          1          3          4
       390        330            1          1          3          4
       390        332            1          2          3          4
       390        353            1          2          3          4
       390        339            1          2          3          4

Although the requested degree of parallelism was 4 (REQ_DEGREE), the degree actually used was in fact only 3 (DEGREE) which explains why we have 3 processes in each server set.

Monitoring the Processes By Physical Reads

There are several methods for monitoring the status of parallel processes. When the volume of expected data is well known, physical reads represent a reasonable method for estimating the time for the query to complete. Physical reads may also be an indicator if a query is not retrieving the desired result:

SELECT   a.qcsid,
         a.sid,
         a.server_group,
         a.server_set,
         substr(b.name,1,20) operation,
         a.value
FROM     v$px_sesstat a,
         v$statname b
WHERE    a.statistic# = b.statistic#
AND      UPPER(b.name) = 'PHYSICAL READS'
ORDER BY a.qcsid,
         a.server_group,
         a.server_set;

This query produces the following result set only during the execution of a query with a parallelism of 2 and including complex joining or sorting operations combined with full table scans:

     QCSID        SID SERVER_GROUP SERVER_SET OPERATION             VALUE
---------- ---------- ------------ ---------- -------------------- ------
       332        330            1          1 physical reads         4632
       332        351            1          1 physical reads         4697
       332        333            1          2 physical reads         4554
       332        339            1          2 physical reads         4605
       332        332                         physical reads       168005

This query indicates that at the time of execution, both parallel server sets have performed work. SID 332 is the query coordination session (the session that actually initiated the query), and its physical read count will continually increase until the session is disconnected.

Monitoring Long Operations with V$SESSION_LONGOPS

The V$SESSION_LONGOPS table is used by the Oracle database to store the status of operations that are expected to take at least 6 seconds. Full table scans, hash joins and sort merges are just some of the operations that may log themselves to this table during the execution of a query. A nice feature of long operations is a estimated completion time in seconds which allows developers and DBAs monitor the status of a query. for queries that run in parallel, each parallel server will log entries to V$SESSION_LONGOPS in some cases:

SELECT sl.sid,
       sl.serial#,
       substr(sl.opname,1,30),
       sl.sofar,
       TO_CHAR(sl.start_time,'DD-MON-YYYY:HH24:MI:SS') start_time,
       sl.elapsed_seconds elapsed,
       sl.time_remaining remaining
FROM   v$session_longops sl,
       v$session s
WHERE  s.sid = sl.sid
AND    s.serial#=sl.serial#
AND    s.username='&username'
ORDER BY sl.start_time DESC,
         sl.time_remaining ASC;

During the execution a long, parallel query, you might see the following results from the long operations query:

   SID    SERIAL# OPERATION   START_TIME               ELAPSED  REMAINING
------ ---------- ----------- ---------------------- --------- ----------
   332      18956 Table Scan  03-MAY-2007:14:26:11          42         18
   333      18957 Table Scan  03-MAY-2007:14:26:13          40         20

It is important to note that V$SESSION_LONGOPS is not purely a tool to monitor parallel queries, but rather any query that includes long operations like table scans. Also, it has been this author’s experience that different versions of the database and even different queries affect the regularity with which this view is updated (or used at all!).

There are many other advanced tools and views for monitoring queries (V$SESSION_WAIT is a favorite of mine), but the basic tools presented here are a good starting point for the developer using parallel processing. Happy coding!


Greg Pike

Greg Pike
PIOCON Technologies Website

Oracle 10g Parallel Execution and the /*+ PARALLEL */ hint

Published in Complex SQL, SQL, Tuning by gpike Tuesday March 27, 2007

The /*+ PARALLEL(table degree) */ hint may actually be my favorite query tuning tool. Actually, my favorite tool is the one that makes whatever query I’m tuning run faster, but I pull this hint out of the tool chest all the time. The PARALLEL hint is powerful, but potentially dangerous if used inappropriately. With most hints, usually the worst you can do is make your own query run faster (or slower!), but with PARALLEL you can significantly and adversely affect an entire database if you’re not careful.

The PARALLEL hint is just one way to take advantage of Oracle parallel execution. During parallel processing, large database operations are split into multiple pieces for multi-threaded execution. Parallel execution can be used in queries as well as DML (i.e. INSERT and UPDATE), DDL (i.e. CREATE INDEX) and in other ways not covered here. The Oracle documentation does a good job of fully explaining the inner sanctum of parallel execution, so we will not spend to much time on that topic. Let’s take a closer look at PARALLEL as a hint (it can also be used as a clause in a SQL statement) and what it can do for you…and to you!

The once-named Parallel Query Option (PQO) was first introduced in Oracle 7 and is primarily concerned with multi-threading large SQL operations. For our purposes, a large SQL operation typically means full table scans, sorts and hash joins, although index range scans on partitioned tables can occur in parallel with the PARALLEL_INDEX hint. Nested loops (index look ups) are not by themselves executable in parallel but these operations can be done in conjunction with a full table scans in parallel. So only certain types of queries will benefit from a PARALLEL hint. Queries that already include one or more full table scans are the most likely candidates. But, as we will see, queries that exclusively use nested loops (index look ups) may also benefit with a PARALLEL hint if the query is also asked to substitute full table scans for the nested loops.

Assuming that the parallel execution is available in your database (the DBA can set the maximum and minimum size of the process pool as well as many other parameters that affect parallel execution), you can take advantage of parallel processing in a number of ways. If a table is created or altered to include a DEGREE of parallelism, queries will automatically take advantage of parallel execution when large SQL operations are involved. However, it has been my experience that tables are rarely created with a parallelism DEGREE, since the ramifications for the database can be significant. As an example, query execution plans that once used indexes might decide to use full table scans and hash joins if the underlying table provides an inherent DEGREE of parallelism. Actually, the query does not decide anything - the Cost Based Optimizer decides; but that is a topic for another day. DEGREE, most simply, is the number a parallel processes that the database will spawn to execute the large SQL operation.

Let’s take a look at a “slow” query and how the PARALLEL can help it along. In this example, the “really_big_table” table has 10M rows and the “lookup table” has 100 rows. Of special note is lookup_id = 1 which has 5M rows in the really_big_table (50% of the table) while the other 99 lookup_ids are evenly distributed (about 50K rows a piece).

SELECT lut.lookup_name, count(*)
FROM   really_big_table rbt,
          lookup_table  lut
WHERE rbt.lookup_id = lut.lookup_id
GROUP BY lookup_name
ORDER BY lookup_name;

In our Oracle 10G database, the query execution plan looks like this:

parallel_no_parallel.jpg

This is the expected result. We need to look at every row in the really_big_table and join this to the lookup_table to get the lookup name. After the query completes the table scan, it hash joins the results and sorts the output to get the count(*) and sorts again to get us the final order. Its an efficient plan, but its also a single-threaded approach and if this query is running on a 16 processor machine, we can do better. If we add the PARALLEL hint with the table name (if the table is aliased, you may use the alias instead of the name) and the number of parallel threads (DEGREE of 2 in this case) we want to use, we get the following query with hint:

SELECT /*+ PARALLEL(rbt 2) */ lut.lookup_name, count(*)
FROM   really_big_table rbt,
          lookup_table  lut
WHERE rbt.lookup_id = lut.lookup_id
GROUP BY lookup_name
ORDER BY lookup_name;

Please note that the arguments to the PARALLEL hint are separated by a space, not a comma as if so often mistaken. A comma will actually cause the hint to be ignored in 9i, but in 10g it works both ways. This query produces a dramatically different query execution plan:

parallel.jpg

For those of you not used to parallel processing, this execution plan may look a little strange. All the extra lines here represent the operations necessary to take a serialized query, perform parallel execution, and finally gather the results back into a final result set (we promise to discuss the intricacies of these plans in more detail in a future post). In this particular case, the new, parallelized query ran 40% faster than the original query although the execution plan really didn’t change much. If you look closely, you will see that this query still performed 2 table scans, 1 hash join and some sorting. Its magic! Without making a single change to the actual query (except adding a PARALLEL hint), we have conjured a 40% improvement. Of course its not magic, we just commandeered more of the hardware resources. I performed these tests on a small windows box with two processors. As the following CPU usage diagrams show, the first query used one processor only while the parallelized query used both processors in parallel.

CPU without a PARALLEL hint CPU Usage with a PARALLEL 2
No Parallel Hint Parallel execution with a degree of 2

If two processes were, good 4, 8 or even 16 must be all that much better. Obviously, this will depend entirely on the amount of processing power and disk I/O your hardware can muster. The amount of time savings will degrade as system resources are exhausted. Ultimately, queries that use too many parallel slaves (or too many parallel queries running at the same time) will degrade system performance for all other users. So care must be taken in choosing the proper degree of parallelism. For maximum effectiveness, 2 X number of available processors is a good rule of thumb. Even if you have a dedicated server, at some point adding parallel query processors will not appreciably increase the speed of query execution. The following CPU diagram shows the same query (on my small windows box) executed with 4, 8 and 16 degrees of parallelism (see the three big CPU bumps).


Too much parallelism!

As it turns out, the CPU usage and query execution time did not change at all! Since there was available excess CPU time, this query likely was limited by disk I/O. But for enterprise quality hardware configurations, the PARALLEL hint will gobble up whatever resources it can get away with. Of course, there are many ways the DBA can limit the available resources, but in environments where parallel execution is not used such governors may not be in place. The result: A greedy PARALLEL hint can bring a database server to its knees. However, it has been this author’s experience that the PARALLEL hint can significantly improve query performance using resources that may be underutilized anyway. The PARALLEL hint can make you a hero!

Part 2: Monitoring Parallel Query Execution –>

Click here to ask SingleQuery.com a question.


Greg Pike

Greg Pike
PIOCON Technologies Website

53 queries. 0.562 seconds.
Powered by Wordpress
theme by cmoanz