Pivot and Unpivot in 11g

Published in Oracle 11g, SQL by malam Wednesday December 19, 2007

Pivot

The Pivot clause is a new feature that has been introduced with the introduction of 11g.The simple logic behind the Pivot clause is that it enables the user to rotate rows into columns in the output of a query and at the same time enable the user to run aggregate functions on the data.

Unpivot

Unpivot clause is also a new built in Analytical function that is introduced with the release of 11g,unpivot clause lets the user rotate columns into rows in the output from a query.

Examples to illustrate the use of Pivot clause

If you can log in to the Scott schema of the database then skip step 2 and 3 to execute the query directly

However , If you want to create the table in your database and then run the query please follow steps 1 through 3

1.


CREATE TABLE emp ( empno NUMBER(4,0), ename VARCHAR2(10 BYTE), job VARCHAR2(9 BYTE), mgr NUMBER(4,0), hiredate DATE, sal NUMBER(7,2), comm NUMBER(7,2), deptno NUMBER(2,0) );

2.Insert Script

The script below can be used to insert data into the table created in step 1.


INSERT INTO emp (empno,ename,job,mgr,hiredate,sal,comm,deptno)VALUES ('7369','SMITH','CLERK','7902',TO_DATE('17-Dec-80','DD-MON-RR'),'800','0','30');
INSERT INTO emp (empno,ename,job,mgr,hiredate,sal,comm,deptno)VALUES ('7499','ALLEN','SALESMAN','7698',TO_DATE('20-Feb-81','DD-MON-RR'),'1600','300','30');
INSERT INTO emp (empno,ename,job,mgr,hiredate,sal,comm,deptno)VALUES ('7521','WARD','SALESMAN','7698',TO_DATE('22-Feb-81','DD-MON-RR'),'1250','500','30');
INSERT INTO emp (empno,ename,job,mgr,hiredate,sal,comm,deptno)VALUES ('7525','JACK','SALESMAN','7690',TO_DATE('26-Sep-81','DD-MON-RR'),'1550','500','20');
INSERT INTO emp (empno,ename,job,mgr,hiredate,sal,comm,deptno)VALUES ('7527','JONES','CLERK','7622',TO_DATE('22-Feb-81','DD-MON-RR'),'1000','300','20');
INSERT INTO emp (empno,ename,job,mgr,hiredate,sal,comm,deptno)VALUES ('7521','KANYE','PRESIDENT','',TO_DATE('28-Feb-81','DD-MON-RR'),'5000','500','20');
INSERT INTO emp (empno,ename,job,mgr,hiredate,sal,comm,deptno)VALUES ('7500','SAM','SALESMAN','7689',TO_DATE('23-Feb-81','DD-MON-RR'),'1257','500','10');
INSERT INTO emp (empno,ename,job,mgr,hiredate,sal,comm,deptno)VALUES ('7400','JOE','MANAGER','7607',TO_DATE('22-Feb-81','DD-MON-RR'),'1540','300','10');
INSERT INTO emp (empno,ename,job,mgr,hiredate,sal,comm,deptno)VALUES ('7521','WARD','SALESMAN','7698',TO_DATE('22-Feb-81','DD-MON-RR'),'1250','500','10');
COMMIT;
/
3.Pivot Query
SELECT *
FROM (SELECT deptno,sal
      FROM emp
      WHERE deptno > 0
)
PIVOT (SUM(sal) AS SUM_AMOUNT FOR (deptno) IN (10 AS ACCOUNTING ,20 AS RESEARCH,30 AS SALES,40 AS OPERATIONS));

/*
General Form of query using PIVOT clause
SELECT * FROM (INNER QUERY)
PIVOT (
AGGREGATE_FUNCTION
FOR (COLUMN TO BE PIVOTED) IN (LIST OF VALUES));
*/

A single Pivot clause can also support multiple Aggregate Functions.
PIVOT CLAUSE provides great support to businesses which would like to do a detailed analysis on their Sales methodology and help them make future decisions on their sales and marketing strategies.

Using the UNPIVOT clause

The UNPIVOT clause rotates columns into rows.UNPIVOT clause is useful when a user has to execute a query that returns many columns , using the UNPIVOT clause the user can view those columns as rows.

Lets Illustrate the concept discussed above

1.Create a table pivot_emp_data using the query in step 3 as

CREATE TABLE pivot_emp_data AS
SELECT *
FROM (SELECT deptno,sal
              FROM emp
              WHERE deptno > 0
)
PIVOT (SUM(sal) AS SUM_AMOUNT
FOR (deptno)
IN (10 AS ACCOUNTING ,20 AS RESEARCH,30 AS SALES,40 AS OPERATIONS);

2.Once the table has been created we can run the query below

SELECT *
  FROM pivot_emp_data
  UNPIVOT
(sal FOR deptno
IN(ACCOUNTING_SUM_AMOUNT,RESEARCH_SUM_AMOUNT,SALES_SUM_AMOUNT,OPERATIONS_SUM_AMOUNT))

The Query rotates the pivoted Data .Also , do realize that If you are working with pivoted data, an UNPIVOT operation cannot reverse any aggregations that have been made by PIVOT or any other means.

Oracle Web Development: Pagination with a Single Query

Published in Complex SQL, SQL, Tuning by gpike Sunday May 13, 2007

Note: This previous article was enhanced with additional information regarding read-consistent pagination.

Found at the top and bottom of almost any web page that displays long lists of information are seemingly-innocuous little widgets commonly called “pagination” objects. They allow the user to advance to the next or previous page of information as well as jumping to a specific page. Although they may be a small part of a web page’s functionality, they can be a resource hog to produce if not developed with proper techniques.

Java applications seem to be a conspicuous culprit in producing poorly-conceived pagination objects. The scenario plays out in the following way:

1) Select the complete list of items to be displayed in the proper order; perhaps 1000’s of rows are selected.

2) March through the ordered list in a loop and accumulate a count of the items. When the appropriate section of the result set is found, accumulate these details into a display object.

3) Continue through the rest of the list to get full a count of the total items to feed the pagination widget.

Alternate scenarios may use two queries, one to count the total items for the entire result set and the other to get just those items to be displayed on the web page. This is an improvement over digging through the entire result set, but its still inefficient.

The appropriate solution is a Single Query. The goal is to produce a query that returns BOTH the total of count of the items in the full result set (to support creating the pagination object) and only those rows that will be displayed on the web page.

For our example, consider a table of PRODUCTS and a web page that displays any product with names based on a requested search criteria, 25 to a page. This is not an exercise in appropriate search conditions, so for our example we assume the user enters a keyword and we use % symbols on each end of the text fragment to find matching rows. The uninspired query looks like this:

SELECT product_name,
       product_details
FROM   products
WHERE product_name like '%Television%'
ORDER BY product_id;

The inefficient program first issues the query and then counts the rows in the result set, pulls out just the appropriate rows for display and crafts the pagination object all in a loop construct.

But the database can do most of the work for you. The following query retrieves only the rows to be displayed using in-line views and the total number of rows for the entire result set using the magic of the PARTITION BY clause.

SELECT product_name,
       product_details,
       total_rows
FROM   (
        SELECT product_name,
               product_details,
               total_rows,
               rownum row_counter
        FROM   (
                SELECT product_name,
                       product_details,
                       count(*) OVER () total_rows
                FROM   products
                WHERE product_name like '%Television%'
                ORDER BY product_name
               )
       )
WHERE row_counter between v_start_row and v_end_row;

Here’s how the query works. In the innermost in-line view, the entire result set is selected and the the count(*) OVER () column makes sure that every row contains a column with the total number of rows in the query. No matter what subset of rows are ultimately selected, the count of the entire inner result set is always available. This inner in-line view also sorts the data.

In the next in-line view, all the rows are selected from the innermost in-line view with an additional column added to identify each row by an incrementing number. Perhaps you are thinking to yourself, “Why didn’t he just add the rownum column to innermost in-line view?” The answer is that rownum is added to a result set PRIOR to an ORDER BY clause. It is necessary to first sort the result set, then add the rownum later in the process.

In the final step, our outermost query actually selects only the rows of interest by specifying BETWEEN a start and end position (provided here by the v_start_row and v_end_row variables). Most applications that use pagination probably are passed a variables for the page to display and a variable for the number of lines to display per page. In that case, the start and end positions are calculated:

v_end_row = page to display X number of rows per page
v_start_row = ((page to display - 1) X number of rows per page) + 1

The pagination object for the new page is created by capturing the total row count from any of the individual rows that come out of the query. In the loop that processes these rows, the total can be assigned to a variable for later use by the pagination constructor.

An Alternative

If you are sorting by a unique, numeric column (say a product_id), you can add use a row_number() function to create the row numbers in the innermost in-line view and simplify the query:

SELECT product_name,
       product_details,
       total_rows
FROM   (
        SELECT product_id,
               product_name,
               product_details,
               COUNT(*) OVER () total_rows,
               ROW_NUMBER() OVER (ORDER BY product_id) row_counter
        FROM   products
        ORDER BY product_id
       )
WHERE  row_counter between v_start_row and v_end_row;

Preserving Read-Consistency with Pagination

If you have rapidly-changing source tables, pagination becomes more complicated since each request to the server for a different page of information will potentially yield different data sets. For example, if a page of 25 results are returned and during the time the user views the page, a new item is inserted into the database that would have appeared in position 10 on that page. When the user requests the next 25 entries, item 25 from the previous page now is shown as item 1 on the new page. The results are even worse if the old item 10 is deleted, since the item that should have been displayed as item 1 on on the new page has now dropped to item 25 of the previous page. The user never saw it! To eliminate this behavior, you can use the System Change Number and flashback query:

CREATE TABLE products (product_id NUMBER, product_name VARCHAR2(20));
INSERT INTO products VALUES (1,'Widget A');
INSERT INTO products VALUES (2,'Widget B');
INSERT INTO products VALUES (4,'Widget D');
INSERT INTO products VALUES (5,'Widget E');
COMMIT;

SELECT DBMS_FLASHBACK.GET_SYSTEM_CHANGE_NUMBER FROM dual;

GET_SYSTEM_CHANGE_NUMBER
------------------------
                58961080

SELECT product_name,
       total_rows
FROM   (
        SELECT product_id,
               product_name,
               COUNT(*) OVER () total_rows,
               ROW_NUMBER() OVER (ORDER BY product_id) row_counter
        FROM   products
        ORDER BY product_id
       )
WHERE  row_counter between 2 and 4;

PRODUCT_NAME         TOTAL_ROWS
-------------------- ----------
Widget B                      4
Widget D                      4
Widget E                      4

INSERT INTO products VALUES (3,'Widget C');
COMMIT;

SELECT product_name,
       total_rows
FROM   (
        SELECT product_id,
               product_name,
               COUNT(*) OVER () total_rows,
               ROW_NUMBER() OVER (ORDER BY product_id) row_counter
        FROM   products as of scn 58961080 --  Flashback query!
        ORDER BY product_id
       )
WHERE  row_counter between 2 and 4;

PRODUCT_NAME         TOTAL_ROWS
-------------------- ----------
Widget B                      4
Widget D                      4
Widget E                      4

Note that the new row for Widget C did not appear and the total count remains 4. If this query is run without the as of scn clause, the expected result set is obtained:

PRODUCT_NAME         TOTAL_ROWS
-------------------- ----------
Widget B                      5
Widget C                      5
Widget D                      5

When creating pagination web objects, simply include the system change number in the web page to enable flashback query. More information on this topic can be found in the Oracle forums here.

So there you have it. A Single Query that can dramatically shorten the amount of work for loops and cursors in web applications. As an added benefit, the bigger the initial result set, the more time savings the application will experience.


Greg Pike

Greg Pike
PIOCON Technologies Website

Using Oracle SQL to Convert Numbers to Words and Back Again

Published in Complex SQL, SQL by gpike Friday May 11, 2007

As a infrequent poster (but constant lurker) on the Oracle forums, I often see questions about converting numbers to their English language equivalent. Recently, the PL/SQL and SQL forum was asked the opposite question: How does one convert words (i.e. “ONE HUNDRED TWENTY THREE”) to a number (i.e. 123). After the talented members of the forum came up with several solutions, I decided the whole topic was deserving of a SingleQuery posting.

Converting Numbers to Words

First the easy part. This is an old-school Oracle trick that’s been around forever:

SELECT TO_CHAR(TO_DATE(123,'J'),'JSP') to_words
FROM   dual;

TO_WORDS
------------------------
ONE HUNDRED TWENTY-THREE

Pretty straight forward. The TO_DATE(123,’J') creates the 123rd Julian Day, which is defined as the 123rd day after Monday, January 1, 4713 BC. Click here to learn more about the Julian Day if you feel obligated. The TO_CHAR with the ‘JSP’ format mask converts the Julian Day into the words “ONE HUNDRED TWENTY-THREE”.

Unfortunately, there are some limitations with this little trick. Oracle cannot compute the Julian Day past 31-DEC-9999 so that means we can only convert numbers to English up to 5,373,484 (the number of days between the date the pyramids were constructed and the date that the Chicago Cubs will win the World Series again). Also, this method only works in English; I have seen no solution to do this in other languages.

Converting Words to Numbers

Now the hard part. As previously mentioned, one of the Oracle forums recently discussed how one can convert from the English language to numbers. There were lots of interesting alternatives (click here to see the forum thread) with many focused upon procedural solutions. But since my specialty is a Single Query solution, I wanted to take a stab at it and came up with the following that works on 9i and 10g:

SELECT LEVEL wordasint
FROM   dual
WHERE  TO_CHAR(TO_DATE(LEVEL,'J'), 'JSP') = 'ONE HUNDRED TWENTY-THREE'
CONNECT BY TO_CHAR(TO_DATE(LEVEL-1,'J'),'JSP') != 'ONE HUNDRED TWENTY-THREE'
AND    LEVEL < 10001;

 WORDASINT
----------
       123

This is a rather strange CONNECT BY construct that you may not have seen before and a good explanation of what this does can be found here. It works, although with the similar maximum limitation of 5,373,484 (”FIVE MILLION THREE HUNDRED SEVENTY-THREE THOUSAND FOUR HUNDRED EIGHTY-FOUR”) because of the Julian Day. Also, the larger the number requested, the slower this query gets since it performs more and more CONNECT BYs as the requested number gets bigger. The last clause “AND LEVEL < 10001" is designed to stop typos (i.e. "ONES HUNDERD") from causing this query to execute until it exceeds the maximum. Simply provide the largest number+1 you will ever expect to receive and this query will return no rows if it is not found (but will not fail)!

Here is another slick solution for Oracle 10g (provided by Sundar M) that uses the MODEL clause, but it unfortunately stops functioning at 86399:

SELECT sp,
       n
FROM   (
        SELECT 'EIGHTY-SIX THOUSAND THREE HUNDRED NINETY-NINE' sp
        FROM dual
       )
   MODEL
   DIMENSION BY (1 dim)
   MEASURES (0 n, sp)
   RULES ITERATE (86400) UNTIL (TO_CHAR(DATE '0001-01-01' +
         (ITERATION_NUMBER/86400),'SSSSSSP')=sp[1])
         ( n[1]=ITERATION_NUMBER );

SP                                                     N
--------------------------------------------- ----------
EIGHTY-SIX THOUSAND THREE HUNDRED NINETY-NINE      86399

At 86400:

SP                                        N
-------------------------------- ----------
EIGHTY-SIX THOUSAND FOUR HUNDRED      86399  <--  Whoops!

Words to Numbers for Oracle 8/8i

The prior queries include functionality not available for Oracle 8/8i databases, but this one should work. I will admit, I don’t have an 8i database laying around, so this is untested:


SELECT theword,
       thenum
FROM
       (
        SELECT 'ZERO' theword,
               0      thenum
               FROM   dual
        UNION  ALL
        SELECT /*+ parallel (rbt 2) */
               TO_CHAR(TO_DATE(ROWNUM,'J'), 'JSP')  theword,
               ROWNUM
        FROM   really_big_table rbt
        WHERE ROWNUM <100001
       )
WHERE  theword  = 'TEN THOUSAND FIVE HUNDRED THIRTY-FOUR';

The limitations for this query are typical; you can’t generate numbers larger than 5,373,484 and you need a really_big_table to use as the seed table for generating rows (as a replacement for the CONNECT BY). Also, the bigger the maximum number (in this case 100,000), the longer this query will take to execute. However, this query can be modified for speed if specific numeric ranges are known in advance.

Dollars and Cents From Words

Oracle 8/8i (or 9i and 10g as well of course)i:


SELECT /*+ ordered all_rows */
       TO_CHAR(temp1.thenum+(temp2.thenum/100),'$9,999,999.99') amount
FROM
       (
        SELECT thenum,
               SUBSTR('ONE HUNDRED TWENTY-THREE AND TWELVE CENTS',
                 INSTR('ONE HUNDRED TWENTY-THREE AND TWELVE CENTS',' AND')+5,
                 INSTR('ONE HUNDRED TWENTY-THREE AND TWELVE CENTS',' CENTS')
                 - INSTR
                 ('ONE HUNDRED TWENTY-THREE AND TWELVE CENTS',' AND')
                 -5) thecents
        FROM
               (
                SELECT 'ZERO' theword,
                0      thenum
                FROM   dual
                UNION ALL
                SELECT TO_CHAR(TO_DATE(ROWNUM,'J'), 'JSP')  theword,
                       ROWNUM thenum
                FROM   really_big_table
                WHERE  ROWNUM < 100000
               )
        WHERE  theword=SUBSTR('ONE HUNDRED TWENTY-THREE AND TWELVE CENTS',1,
                   INSTR('ONE HUNDRED TWENTY-THREE AND TWELVE CENTS',' AND')-1)
       ) temp1,
       (
        SELECT 'ZERO' theword,
               0 thenum
        FROM DUAL
        UNION ALL
        SELECT TO_CHAR(TO_DATE(ROWNUM,'J'), 'JSP'),
               ROWNUM
        FROM   really_big_table
        WHERE  ROWNUM <101
       ) temp2
WHERE  temp1.thecents = temp2.theword;

AMOUNT
--------------
       $123.12

Oracle 9i/10g (formatted to fit in this page):

SELECT /*+ ordered all_rows */
       TO_CHAR(temp1.thenum+(temp2.thenum/100),'$9,999,999.99') amount
FROM
       (
        SELECT LEVEL thenum,
               SUBSTR('ONE HUNDRED TWENTY-THREE AND TWELVE CENTS',
                 INSTR('ONE HUNDRED TWENTY-THREE AND TWELVE CENTS',' AND')+5,
                 INSTR('ONE HUNDRED TWENTY-THREE AND TWELVE CENTS',' CENTS')
                 - INSTR
                 ('ONE HUNDRED TWENTY-THREE AND TWELVE CENTS','
                 AND')-5) thecents
        FROM    dual
        WHERE  TO_CHAR(TO_DATE(LEVEL,'J'), 'JSP') =
                 SUBSTR('ONE HUNDRED TWENTY-THREE AND TWELVE CENTS',1,
                 INSTR('ONE HUNDRED TWENTY-THREE AND TWELVE CENTS',' AND')-1)
        CONNECT BY TO_CHAR(TO_DATE(LEVEL-1,'J'),'JSP') !=
                     SUBSTR('ONE HUNDRED TWENTY-THREE AND TWELVE CENTS',1,
                     INSTR
                     ('ONE HUNDRED TWENTY-THREE AND TWELVE CENTS','
                     AND')-1)
        AND LEVEL < 100001
       ) temp1,
       (
        SELECT LEVEL thenum,
               TO_CHAR(TO_DATE(LEVEL,'J'), 'JSP') theword
        FROM   dual
        CONNECT BY 1 = 1
        AND LEVEL < 101
       ) temp2
WHERE  temp1.thecents = temp2.theword;

AMOUNT
--------------
       $123.12

Complete Solutions

Probably the most complete solution (but not a Single Query) was provided by a forum frequent poster that goes by the name of Volder. His combination of a query and a supporting function are only viable in Oracle 10g, but allow word descriptions of numbers up to 65 digits to be very quickly converted! For the sake of space, I am not reprinting his solution. Please refer to the forum thread for details.

If you have any other bright ideas on how to turn words into numbers, please don’t hesitate to post a comment. Happy coding!


Greg Pike

Greg Pike
PIOCON Technologies Website

Monitoring Oracle Parallel Queries: PARALLEL Hint Part 2

Published in Complex SQL, Hints, SQL, Tuning by gpike Thursday May 3, 2007

<-- Back to Part 1: Using the /*+ PARALLEL */ Hint

Monitoring Parallel Processes

While using parallel processing is a key advantage when handling extremely large data sets, it can be confusing to understand exactly what is occurring in the database during the execution of a parallel query. Fortunately, Oracle provides a whole host of V$ views that allow the developer and DBA alike to monitor parallel query execution and take proactive steps to rewrite ineffective code or kill resource-intensive processes.

The V$ Views

The queries in the following examples require access to several v_$ views that are owned by SYS. Typically, v_$ views are exposed to individual users through public synonyms with a naming convention that omits the underscore. For example, the SYS.V_$PX_SESSION view is typically referred to as V$PX_SESSION by individuals. The easiest (and least secure!) way to grant access to the entire suite of V_$ views is to GRANT SELECT ON ANY TABLE TO . However, be advised that V$ synonyms for all of these views sometimes do not exist. It has been this authors experience that the typical Oracle 10G database will already have these synonyms in place, but there are exceptions.

Basic Overview of Parallel Query Processes

If you took advantage of the link to the Oracle documentation from part 1 (or click here) you may have learned a little more about parallel processing. In general, when a parallel query executes, a query coordinator session acts as the ultimate parent for the parallel servers. For queries, two sets of parallel servers is used during execution. The first set of processes (called producers) typically handle full table scans while the second set (called consumers) performs operations (joins, hashes, etc.) upon the data retrieved from the first set. The degree of parallelism for a particular query governs how many parallel server processes will be found in each server set, not a total for the query. Lets use the following query to illustrate:

SELECT /*+ parallel(s 2) */ DISTINCT time_of_day
from sales s;

As this query executes, we can simultaneously execute the following query in another session to examine its parallel processes:

SELECT   qcsid,
         sid,
         NVL(server_group,0) server_group,
         server_set,
         degree,
         req_degree
FROM     SYS.V_$PX_SESSION
ORDER BY qcsid,
         NVL(server_group,0),
         server_set;

This query produces the following result (only during the time the parallel query is active):

     QCSID        SID SERVER_GROUP SERVER_SET     DEGREE REQ_DEGREE
---------- ---------- ------------ ---------- ---------- ----------
       390        390            0
       390        368            1          1          2          2
       390        353            1          1          2          2
       390        332            1          2          2          2
       390        330            1          2          2          2

Please note that this query, with a requested parallelism degree of 2, actually used 5 total processes to execute. Row one represents the query coordinator (QC). Rows 2 and 3 represent a set of parallel processes (server_set=1) that performed a full table scan of the SALES table while rows 4 and 5 (server_set=2) represent the parallel processes that would perform additional work on the data from SALES if necessary.

If we increase the degree of parallelism to 4 in a database that has a maximum allowable degree of 3, I would see something like this:

     QCSID        SID SERVER_GROUP SERVER_SET     DEGREE REQ_DEGREE
---------- ---------- ------------ ---------- ---------- ----------
       390        390            0
       390        351            1          1          3          4
       390        342            1          1          3          4
       390        330            1          1          3          4
       390        332            1          2          3          4
       390        353            1          2          3          4
       390        339            1          2          3          4

Although the requested degree of parallelism was 4 (REQ_DEGREE), the degree actually used was in fact only 3 (DEGREE) which explains why we have 3 processes in each server set.

Monitoring the Processes By Physical Reads

There are several methods for monitoring the status of parallel processes. When the volume of expected data is well known, physical reads represent a reasonable method for estimating the time for the query to complete. Physical reads may also be an indicator if a query is not retrieving the desired result:

SELECT   a.qcsid,
         a.sid,
         a.server_group,
         a.server_set,
         substr(b.name,1,20) operation,
         a.value
FROM     v$px_sesstat a,
         v$statname b
WHERE    a.statistic# = b.statistic#
AND      UPPER(b.name) = 'PHYSICAL READS'
ORDER BY a.qcsid,
         a.server_group,
         a.server_set;

This query produces the following result set only during the execution of a query with a parallelism of 2 and including complex joining or sorting operations combined with full table scans:

     QCSID        SID SERVER_GROUP SERVER_SET OPERATION             VALUE
---------- ---------- ------------ ---------- -------------------- ------
       332        330            1          1 physical reads         4632
       332        351            1          1 physical reads         4697
       332        333            1          2 physical reads         4554
       332        339            1          2 physical reads         4605
       332        332                         physical reads       168005

This query indicates that at the time of execution, both parallel server sets have performed work. SID 332 is the query coordination session (the session that actually initiated the query), and its physical read count will continually increase until the session is disconnected.

Monitoring Long Operations with V$SESSION_LONGOPS

The V$SESSION_LONGOPS table is used by the Oracle database to store the status of operations that are expected to take at least 6 seconds. Full table scans, hash joins and sort merges are just some of the operations that may log themselves to this table during the execution of a query. A nice feature of long operations is a estimated completion time in seconds which allows developers and DBAs monitor the status of a query. for queries that run in parallel, each parallel server will log entries to V$SESSION_LONGOPS in some cases:

SELECT sl.sid,
       sl.serial#,
       substr(sl.opname,1,30),
       sl.sofar,
       TO_CHAR(sl.start_time,'DD-MON-YYYY:HH24:MI:SS') start_time,
       sl.elapsed_seconds elapsed,
       sl.time_remaining remaining
FROM   v$session_longops sl,
       v$session s
WHERE  s.sid = sl.sid
AND    s.serial#=sl.serial#
AND    s.username='&username'
ORDER BY sl.start_time DESC,
         sl.time_remaining ASC;

During the execution a long, parallel query, you might see the following results from the long operations query:

   SID    SERIAL# OPERATION   START_TIME               ELAPSED  REMAINING
------ ---------- ----------- ---------------------- --------- ----------
   332      18956 Table Scan  03-MAY-2007:14:26:11          42         18
   333      18957 Table Scan  03-MAY-2007:14:26:13          40         20

It is important to note that V$SESSION_LONGOPS is not purely a tool to monitor parallel queries, but rather any query that includes long operations like table scans. Also, it has been this author’s experience that different versions of the database and even different queries affect the regularity with which this view is updated (or used at all!).

There are many other advanced tools and views for monitoring queries (V$SESSION_WAIT is a favorite of mine), but the basic tools presented here are a good starting point for the developer using parallel processing. Happy coding!


Greg Pike

Greg Pike
PIOCON Technologies Website

Generating Multiple Artificial Rows in Oracle 10g Using a Query

Published in Database Tips, Quick Tips, SQL by jweicher Tuesday April 24, 2007

Have you ever been in a situation that required you to generate a set of multiple “artificial rows”? Perhaps you needed it for a report. Typically, you would need to do this in a PL/SQL loop of some kind. How else does one generate a >1 set of rows dynamically, from nothing? With a single query of course!

Our Scenario:

For a report of some kind, or perhaps for use in a much larger query, you need to create a rowset that consists of the dates of the 365 days previous to today, each along with a random number between 1 and 20 (pointless I agree, but useful as an example).

There are a couple ways to tackle this.

 

Method 1 - Using DUAL and a CONNECT BY
This technique actually came to us late in the drafting of this topic by way of one of our readers, Rose. But it is so simple, elegant and ingenious that it immediately became our chosen solution for most situations. Our hat tip to Rose.

Many people I’m sure are already familiar with Oracle’s CONNECT BY clause, which allows one to construct queries that select rows from a table in a hierarchical ordering that may be present in the data, by specifying the conditions that identify a parent-child relationship between rows. But for those that aren’t, a brief example is in order, though greater detail will be omitted.

Consider the following Employee table:

emp_id emp_name mgr_id
1 M. Vranicar null
2 J. Simmons 1
3 J. Weicher 2
4 R. Rolek 2

The CONNECT BY clause (and its associated optional keywords) allow us to cleverly query the table, ordering the rows hierarchically, using a relationship we know is defined through columns in the table itself, in this case, an employee to his manager:


> SELECT  LEVEL  – oracle keyword available in CONNECT BY queries
          LPAD(’ ‘, LEVEL*2) || emp_name name
  FROM employee
  START WITH mgr_id is null
  CONNECT BY PRIOR emp_id = mgr_id

Connect By Example

Without going into greater detail, Oracle retrieves the records using the CONNECT BY PRIOR condition to in essence “filter” the records to determine the next appropriate “child” of the record last retrieved, when determining the retrieval order.

Returning to our example scenario, we can use this functionality to generate the 365 row result set that we are looking for by omitting the PRIOR keyword, and using only a condition based on the LEVEL. This is doable as the CONNECT BY PRIOR clause really just acts as a filter, determining which record is the next child (or in this case, simply when to stop selecting more “child” rows!):


> SELECT  sysdate - LEVEL thedate,
          mod(abs(dbms_random.random), 20) + 1 thevalue
  FROM dual
  CONNECT BY LEVEL <= 365;

THEDATE     THEVALUE
--------- ----------
23-APR-07         20
22-APR-07         19
21-APR-07         10
20-APR-07          5
19-APR-07         13
18-APR-07          9
17-APR-07         12
16-APR-07         14
15-APR-07         19
14-APR-07         14
...
28-APR-06          9
27-APR-06          7
26-APR-06          4
25-APR-06         10
24-APR-06          8

365 rows selected.


What is happening?
This technique works because the CONNECT BY clause causes Oracle to execute the query by taking the following steps:

1) Select the first date and random number using the single record from dual.

2) Next, “connect back” to select the next appropriate “child” record, again from dual, using the CONNECT BY condition as a “filter”. Dual only has one record, and the only “filter” is that LEVEL be <= 365. The result is the single record being selected from dual again, though this time LEVEL is automatically incremented by 1.

3) Step 2 will be continually repeated, as the CONNECT BY condition/filter will continue to find the single eligible record in dual with every "connect back", until one is retrieved where LEVEL has been incremented beyond 365!

Kudos again to Rose for this one.

 

Method 2 - Using DUAL and a WITH Clause
Another technique is available, which also uses DUAL, but relies on a WITH clause instead of CONNECT BY. It should be considered more of a “brute force” method. The goal is to build an “empty” rowset of size 2n that is equal or greater than our target rowset size.

We begin by selecting anything from DUAL twice, and using a UNION ALL to preserve the duplication:


> SELECT 1 FROM DUAL UNION ALL SELECT 1 FROM DUAL;

         1
----------
         1
         1

Next, let’s use this query in a WITH clause for efficiency, and select from it multiple times without a join condition- a Cartesian Product!

> WITH tworows AS
( SELECT 1 FROM dual UNION ALL SELECT 1 FROM dual)
SELECT	*
FROM	tworows	t1, /* produces 2 rows */
	tworows t2, /* produces 4 rows */
	tworows t3, /* produces 8 rows */
	tworows t4, /* produces 16 rows */
	tworows t5, /* produces 32 rows */
	tworows t6, /* produces 64 rows */
	tworows t7, /* produces 128 rows */
	tworows t8, /* produces 256 rows */
	tworows t9  /* produces 512 rows */

What we see produced is a result set of 512 rows and 9 columns:


         1          1          1          1          1          1
---------- ---------- ---------- ---------- ---------- ---------- ...
         1          1          1          1          1          1
         1          1          1          1          1          1
         1          1          1          1          1          1
         1          1          1          1          1          1
         1          1          1          1          1          1
         1          1          1          1          1          1
         1          1          1          1          1          1
...
         1          1          1          1          1          1
         1          1          1          1          1          1
         1          1          1          1          1          1

512 rows selected.

Next, we can extend our query by actually selecting the data we are interested in by utilizing ROWNUM: the past 365 days and a random number:


> WITH tworows AS
( SELECT 1 FROM dual union all SELECT 1 FROM dual)
SELECT	sysdate - ROWNUM thedate,
	mod(abs(dbms_random.random), 20) + 1 thevalue
FROM	tworows	t1, /* produces 2 rows */
	tworows t2, /* produces 4 rows */
	tworows t3, /* produces 8 rows */
	tworows t4, /* produces 16 rows */
	tworows t5, /* produces 32 rows */
	tworows t6, /* produces 64 rows */
	tworows t7, /* produces 128 rows */
	tworows t8, /* produces 256 rows */
	tworows t9  /* produces 512 rows */

And again, we see that this produces a rowset of 512 rows, this time with only values we are interested in:


THEDATE     THEVALUE
--------- ----------
12-APR-07          2
11-APR-07         18
10-APR-07         12
09-APR-07         18
08-APR-07         19
07-APR-07          2
...
22-NOV-05          9
21-NOV-05         17
20-NOV-05          5
19-NOV-05          1
18-NOV-05         19
17-NOV-05          8

512 rows selected.

Finally, we can limit the number of rows down to our target size with a filter on “rownum”:


> WITH tworows AS
( SELECT 1 FROM dual union all SELECT 1 FROM dual)
SELECT	sysdate - rownum thedate,
	mod(abs(dbms_random.random), 20) + 1 thevalue
FROM	tworows	t1, /* produces 2 rows */
	tworows t2, /* produces 4 rows */
	tworows t3, /* produces 8 rows */
	tworows t4, /* produces 16 rows */
	tworows t5, /* produces 32 rows */
	tworows t6, /* produces 64 rows */
	tworows t7, /* produces 128 rows */
	tworows t8, /* produces 256 rows */
	tworows t9  /* produces 512 rows */
WHERE	rownum <= 365

And we now have the result set we were looking for:


THEDATE     THEVALUE
--------- ----------
12-APR-07         16
11-APR-07         20
10-APR-07         12
09-APR-07         12
08-APR-07         11
07-APR-07         17
06-APR-07          5
...
17-APR-06         13
16-APR-06          7
15-APR-06          7
14-APR-06          2
13-APR-06         18

365 rows selected.

 

So Which is Better?
You guessed it: it depends. In our testing, we’ve found that while Method 2 is a bit less efficient in terms of database activities (gets, sorts, etc.), the timing differences are negligible at worst when dealing with smaller result set sizes. However, this does not remain the case should you be generating significantly larger numbers of records. With large rowsets, Method 2 wins out in the end.

For purposes of illustration, consider a rowset of ~1.7 million records (224).

If we run our query using Method 1, we see that the execution time is a little over 10 seconds (~10.3):


> SELECT count(*) from (
    SELECT  sysdate - rownum thedate,
	    mod(abs(dbms_random.random), 20) + 1 thevalue
    FROM dual
    CONNECT BY LEVEL <= 16777216
  )

  COUNT(*)
----------
  16777216

Elapsed: 00:00:10.29

Running this query using Method 2 however, takes about 7 seconds (~6.9), a savings of about 3 seconds:


> WITH tworows AS
( SELECT 1 FROM dual union all SELECT 1 FROM dual)
select count(*) from (
SELECT	sysdate - rownum thedate,
	mod(abs(dbms_random.random), 20) + 1 thevalue
FROM	tworows	t1,  /* produces 2 rows */
	tworows t2,  /* produces 4 rows */
	tworows t3,  /* produces 8 rows */
	tworows t4,  /* produces 16 rows */
	tworows t5,  /* produces 32 rows */
	tworows t6,  /* produces 64 rows */
	tworows t7,  /* produces 128 rows */
	tworows t8,  /* produces 256 rows */
	tworows t9,  /* produces 512 rows */
	tworows t10,  /* produces 1024 rows */
	tworows t11,  /* produces 2048 rows */
	tworows t12,  /* produces 4096 rows */
	tworows t13,  /* produces 8192 rows */
	tworows t14,  /* ... */
	tworows t15,
	tworows t16,
	tworows t17,
	tworows t18,
	tworows t19,
	tworows t20,
	tworows t21,
	tworows t22,
	tworows t23,
	tworows t24   /* produces 16777216 rows */
)

  COUNT(*)
----------
  16777216

Elapsed: 00:00:06.90

The real savings for large results found with Method 2 however, comes from the ability to “tune” it. Specifically, there is nothing stopping us from increasing the the number of records initially produced in the WITH clause subquery from 2, to say 4. Due to Oracle’s efficiency in MERGE JOIN CARTESIANS, the dominant database operation resulting from Method 2, the savings can be significant:


> WITH fourrows AS
( SELECT 1 FROM dual
  union all
  SELECT 1 FROM dual
  union all
  SELECT 1 FROM dual
  union all
  SELECT 1 FROM dual
)
select count(*) from (
SELECT	sysdate - rownum thedate,
	mod(abs(dbms_random.random), 20) + 1 thevalue
FROM	fourrows t1,  /* produces 4 rows */
	fourrows t2,  /* produces 16 rows */
	fourrows t3,  /* produces 64 rows */
	fourrows t4,  /* produces 256 rows */
	fourrows t5,  /* produces 1024 rows */
	fourrows t6,  /* produces 4096 rows */
	fourrows t7,  /* produces 16384 rows */
	fourrows t8,  /* produces 65536 rows */
	fourrows t9,  /* produces 262144 rows */
	fourrows t10, /* produces 1048576 rows */
	fourrows t11, /* produces 4194304 rows */
	fourrows t12  /* produces 16777216 rows */
)

  COUNT(*)
----------
  16777216

Elapsed: 00:00:03.54

 

Conclusion
For all but the largest record sets, Method 1 is ideally suited for dynamically generating artificial rowsets using a query.

 
Direct E-mail John Weicher
PIOCON Technologies Logo

The Art of Writing a Complex Oracle 10g SQL Query: Part 1

Published in Complex SQL, SQL by gpike Thursday April 19, 2007

“One is all you need!” Its the SingleQuery.com slogan and my personal philosophy: In the Oracle database, just about anything can (and should) be done with a Single Query. Sure, there are clear cases where a Single Query is not the best option; for example, when you need to perform detailed row-level auditing or error handling logic. And placing complex logic into queries may yield an almost unreadable jumble of DECODES, CASE statements, PARTITION BYs, sub-queries and in-line views. Extensive programmer comments are a must for the complex Single Query; even this author cannot remember what his last Single Query is doing without some reminders.

When speed is of the essence (when isn’t it!), a Single Query is the only way to go. Show me just about any PL/SQL cursor FOR LOOP (or a Java looping structure or whatever else you can dream up) and I’ll write you a Single Query that will dramatically outperform the loop every time. Why? Because the Oracle database is a highly-efficient, multi-threaded computational engine just itching for a tough query to solve.

Now that I’ve hopefully convinced you to attempt a Single Query, how do we go about crafting this monster. Oracle SQL gives us lots of tools to choose from; we need only decide what tools best meet our needs:

  • Result set selectivity is accomplished with good old fashioned inner joins and WHERE clauses. Parallel processing (see our post on the PARALLEL hint) provides the power to handle large data sets.

  • Dealing with incomplete or sparse data requires the use of outer joins and, optionally, Cartesian products.

  • Conditional logic is handled with DECODEs and CASE statements.

  • Complex, set-based calculations are performed with analytic functions combined with GROUP BY or PARTITION BY clauses.

  • Inter-row data sharing (row 10 knowing about a column in row 11) is handled by windowing functions.

  • Spreadsheet-style calculations are enabled with the MODEL clause.

  • Step-by-step logic, complex operations and query readability requires liberal use of in-line views.

This last point is critical to writing the Single Query. An in-line view is really just a query of its own with its result set exposed as a virtual table to a higher-level query. In the strictest sense, I suppose everything cannot be done in a single query, because complex queries will include the SELECT keyword many times. Argue semantics with me if you must; I say if it produces a single result set, its a Single Query.

In Part 2, I will construct a query that begins with a single-column table and produces a complex result set. This practical example is designed to illustrate a repeatable process of constructing a complex Single Query step by step.

Continue to part 2 –>


The Pickle
Greg Pike - gpike@piocon.com

The Art of Writing a Complex Oracle 10g SQL Query: Part 2

Published in Complex SQL, SQL by gpike Thursday April 19, 2007

<-- Back to Part 1

Learning By Example: The Store Transactions Query

Let’s move on to an example of building a complex query from scratch. In this example, a store captures the date and time for each sales transaction. Our goal is to determine which three-hour interval during the typical day has the highest average transactions (the most customers) so the store manager can determine when to increase the sales staff.

Our query will return the total transactions that occurred in each hour of the day, the three-hour moving average (and the rank compared to the other three-hour time periods) as well as the difference from the average transactions/hour for the entire day. For those that might want to try this query at home, the raw data and INSERT statements can be found by clicking here.

The simple source table contains only one column: time_of_day (DATE type). Every row in the table represents the date and time that a single sales transaction occurred (51 rows). From the SALES table, we will produce the following complex result set with a SingleQuery:

 Time of    Txns  3 Hour 3  Hour   Daily    Diff. from
   Day             Avg.  Ranking  Average   Daily Avg.
--------------------------------------------------------
 8:00 A.M.    1    2.0      8       5.1       -61%
 9:00 A.M.    3    4.7      6       5.1       - 8%
10:00 A.M.   10    6.7      4       5.1       +31%
11:00 A.M.    7    5.7      5       5.1       +12%
12:00 A.M.    0    8.7      1       5.1       +71% <-- No rows in the table!
 1:00 P.M.   19    8.3      2       5.1       +63%
 2:00 P.M.    4    8.0      3       5.1       +57%
 3:00 P.M.    1    2.3      7       5.1       -55%
 4:00 P.M.    2    2.3      7       5.1       -55%
 5:00 P.M.    4    2.0      8       5.1       -61%

I know what you might be thinking, “This would be a breeze with a CURSOR FOR LOOP and a couple of PL/SQL tables or VARRAYS. I could step through all of the rows in the table, add them to the appropriate array buckets, do a few division operations and out comes the result!” Well, that method may work, but its not the preferred solution for SingleQuery.com.

Creating the SingleQuery:

To construct any complex query, always start with the basics and iterate; add additional parts to the query and continually assure yourself that the query is functioning properly as it is built. Attempting to create a complex query with all logic included from the beginning will quickly prove a frustrating and perhaps futile experience. For this query, the first, obvious step is to determine the total number of transactions by hour since these are the first two columns. We will leave this portion of the query in blue throughout the rest of the example:

SELECT   TRUNC(time_of_day, 'HH24') hour_of_day,
         count(*) total_txns
FROM     sales
GROUP BY TRUNC(time_of_day, 'HH24');

Our desired result set includes 10 rows (see above), but when this query is executed it produces only 9 rows since no transactions occurred during the noon hour. But we need that zero-transaction row for our report! Even with zero transactions, the noon hour had the highest three-hour average for the day. For our store manager, the noon hour is right in the middle of the busy time. Next step: Get back the missing row.

Adding an In-Line Views, a Cartesian Product and an Outer Join

Since any hour might have zero transactions, we need to construct a “model day” (a set of rows that contains all of the possible store operating hours) and Outer Join this to the real data to manufacture the “missing” hours. This cannot be accomplished at the individual transaction level (pre-GROUP BY) since the 51 individual transaction rows have not been aggregated into hours. Enter the In-Line view and a Cartesian Product. To construct a model day, create a query that will always return the distinct store operating hours (assume the store is open from 8:00 AM to 6:00 PM everyday). We will leave this portion of the query in red throughout the rest of the example:

SELECT  days.day+(hours.hour/24) model_day_hours
FROM     (
          SELECT 7+rownum hour
          FROM   all_objects
          WHERE  rownum <11
         ) hours,
         (
          SELECT DISTINCT trunc(time_of_day) day
          FROM sales
         ) days;

The HOURS In-Line view gets the numbers 8 through 17 (the operating hours in HH24 format) and this is Cartesian joined (no join condition) to the DAYS In-Line view which returns a distinct set of the each day where even a single transaction occurred.

——
Note #1: Although our sample data includes only one day, the query should work for any number of days.
Note #2: I used the ALL_OBJECTS view here since we can be assured that this data dictionary view will always contain at least 10 rows. There are other ways to manufacture data sets of any size and you can learn about those here at SingleQuery.com.
——

To manufacture the missing zero-transaction row into the result set, outer join the “model day” query to the real data set using another set of In-Line Views (aliased as MDL and TXN). We will leave the new portions of the query in green throughout the rest of the example:

SELECT   mdl.model_day_hours,
         nvl(txn.total_txns,0) total_txns
FROM     
         (
          SELECT   TRUNC(time_of_day, ‘HH24′) hour_of_day,
                   count(*) total_txns
          FROM     sales
          GROUP BY TRUNC(time_of_day, ‘HH24′)
         ) txn,
         (
          SELECT  days.day+(hours.hour/24) model_day_hours
          FROM     (
                    SELECT 7+rownum hour
                    FROM   all_objects
                    WHERE  rownum <11
                   ) hours,
                   (
                    SELECT DISTINCT trunc(time_of_day) day
                    FROM sales
                   ) days
         ) mdl
WHERE    txn.hour_of_day(+) = mdl.model_day_hours;

This is starting to look a little complex! We have now introduced 4 in-line views with a Cartesian Product and an outer join. Of course, if you had an HOURS_OF_OPERATION table, you could have substituted it in place of the entire MDL In-Line view. But that’s not fun since we don’t learn anything about constructing the complex SingleQuery.

Adding the Daily and 3-hour Averages:

Although the result set from the above query contains the correct hourly rows, we have not much furthered our goal to calculate average transactions for the day as well as the three-hour averages. These calculations are preformed by using Oracle’s advanced analytic functions:

Daily Avg:     AVG(nvl(txn.total_txns,0))
               OVER (PARTITION BY day) daily_avg

3-hour Avg:    AVG(nvl(txn.total_txns,0))
               OVER (PARTITION BY day
                     ORDER BY mdl.model_day_hours
                     ROWS BETWEEN 1 PRECEDING
                     AND 1 FOLLOWING) hr_avg

The Daily Average is a relatively straightforward calculation that simply produces the average transactions per hour for the day. Each and every row of the result set will include this daily value. Notice the nvl(txn.total_txns,0) which is required to turn outer-joined rows from NULLs into usable zeros. The 3-hour average is calculated by “virtually ordering” the table by the hour of day and then applying a moving 3 hour window. The ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING clause automatically handles the ends of the days when the average is calculated using only two rows. These types of calculations only scratch the surface of what is available with advanced analytics; a full discussion of the nuances of the PARTITION_BY clause is beyond the scope of this article (but coming soon). These two additional columns, plus the time of day are added to the above query in green:

SELECT   mdl.model_day_hours,
         to_char(mdl.model_day_hours,'HH24') model_time,   <-- new column
         nvl(txn.total_txns,0) total_txns,
         AVG(nvl(txn.total_txns,0))                      <-- new column
           OVER (PARTITION BY day
               ORDER BY mdl.model_day_hours
               ROWS BETWEEN 1 PRECEDING
               AND 1 FOLLOWING) hr_avg,
         AVG(nvl(txn.total_txns,0))               <-- new column
           OVER (PARTITION BY day) daily_avg
FROM     
         (
          SELECT   TRUNC(time_of_day, ‘HH24′) hour_of_day,
                   count(*) total_txns
          FROM     sales
          GROUP BY TRUNC(time_of_day, ‘HH24′)
         ) txn,
         (
          SELECT  days.day+(hours.hour/24) model_day_hours
          FROM     (
                    SELECT 7+rownum hour
                    FROM   all_objects
                    WHERE  rownum <11
                   ) hours,
                   (
                    SELECT DISTINCT trunc(time_of_day) day
                    FROM sales
                   ) days
         ) mdl
WHERE    txn.hour_of_day(+) = mdl.model_day_hours;

Adding in the Rank and Returning the Final Result Set:

The last missing columns are the RANK() of the three-hour averages and the percentage difference of the current three-hour average to the daily average. Due to limitations of the RANK() analytic function, it is required to first calculate the three-hour average and then determine the ranks; calculating a RANK() on a PARTITION_BY will yield an Oracle error. To eliminate this problem, we employ yet one more in-line view and finally order the final result set:

SELECT   model_day_hours,
         model_time,
         total_txns,
         hr_avg,
         RANK() OVER ORDER BY hr_avg DESC) daily_rank,
         daily_avg,
         (hr_avg - daily_avg)/daily_avg*100 percent_diff
FROM
  (
   SELECT mdl.model_day_hours,
          to_char(mdl.model_day_hours,'HH24') model_time,
          nvl(txn.total_txns,0) total_txns,
          AVG(nvl(txn.total_txns,0))
            OVER (PARTITION BY day
                  ORDER BY mdl.model_day_hours
                  ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
                 ) hr_avg,
          AVG(nvl(txn.total_txns,0))
            OVER (PARTITION BY day) daily_avg
   FROM
     (
      SELECT   TRUNC(time_of_day, 'HH24') hour_of_day,
               count(*) total_txns
      FROM     sales
      GROUP BY TRUNC(time_of_day, 'HH24')
     ) txn,
     (
      SELECT days.day,
             days.day+(hours.hour/24) model_day_hours
      FROM
        (
         SELECT 7+rownum hour
         FROM   all_objects
         WHERE  rownum <11
        ) hours,
        (
         SELECT DISTINCT trunc(time_of_day) day
         FROM sales
        ) days
     ) mdl
   WHERE    txn.hour_of_day(+) = mdl.model_day_hours
  )
ORDER BY model_day_hours;

This truly is a complex SingleQuery but taken in its respective parts, not that difficult to understand or create. As with anything, creating a complex SingleQuery takes patience and practice. Might I also add that this query has not been optimized in any way. In the shorter part three, we will take this query and modify it yet again to run efficiently.

Continue To Part 3–>

The Pickle

The Art of Writing a Complex Oracle 10g SQL Query: Part 3

Published in Complex SQL, SQL by gpike Thursday April 19, 2007

<-- Back to Part 2

Finishing the SingleQuery: Adding Efficiency Alternatives

The query from Part II is a powerful tool to extract our desired result set, but it may not be the best way to perform the query in real-life circumstances. A SingleQuery must be both accurate and efficient to be a viable solution. The query (repeated below), presents several opportunities for improvement and this is the final step in the exercise.

Adding a WITH clause to eliminate multiple passes over the source table:

In the following query, please note the highlighted areas that both access the SALES table. In the first section, the query is grouping all of the individual transactions into days and includes totals. The second section simply creates the list of distinct days from SALES table to facilitate creation of the “model day” explained in Part II. An explain plan of this query shows that the SALES table will need two full table scans to complete, however the information needs for the second section is simply a subset of the information pulled for the first section:

SELECT   model_day_hours,
         model_time,
         total_txns,
         hr_avg,
         RANK() OVER (ORDER BY hr_avg DESC) daily_rank,
         daily_avg,
         (hr_avg - daily_avg)/daily_avg*100 percent_diff
FROM
  (
   SELECT mdl.model_day_hours,
          to_char(mdl.model_day_hours,'HH24') model_time,
          nvl(txn.total_txns,0) total_txns,
          AVG(nvl(txn.total_txns,0))
            OVER (PARTITION BY day
                  ORDER BY mdl.model_day_hours
                  ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
                 ) hr_avg,
          AVG(nvl(txn.total_txns,0))
            OVER (PARTITION BY day) daily_avg
   FROM
     (
      SELECT   TRUNC(time_of_day, ‘HH24′) hour_of_day,
               count(*) total_txns
      FROM     sales
      GROUP BY TRUNC(time_of_day, ‘HH24′)
     ) txn,
     (
      SELECT days.day,
             days.day+(hours.hour/24) model_day_hours
      FROM
        (
         SELECT 7+rownum hour
         FROM   all_objects
         WHERE  rownum <11
        ) hours,
        (
         SELECT DISTINCT trunc(time_of_day) day
         FROM sales
        ) days
     ) mdl
   WHERE    txn.hour_of_day(+) = mdl.model_day_hours
  )
ORDER BY model_day_hours;

As an alternative, the WITH clause (often refered to as Subquery Factoring) can be employed to select from SALES one time and use this aliased query throughout the rest of the statement. In the following example, the WITH clause and references to the new alias are indicated in red:


WITH txn AS
    (SELECT    /*+ parallel (s 2) */ TRUNC(time_of_day, ‘HH24′) hour_of_day,
               count(*) total_txns
      FROM     sales s
      GROUP BY TRUNC(time_of_day, ‘HH24′)
      )
SELECT   model_day_hours,
         model_time,
         total_txns,
         hr_avg,
         rank() OVER (ORDER BY hr_avg DESC) daily_rank,
         daily_avg,
         (hr_avg - daily_avg)/daily_avg*100 percent_diff
FROM
  (
   SELECT mdl.model_day_hours,
          to_char(mdl.model_day_hours,’HH24′) model_time,
          nvl(txn.total_txns,0) total_txns,
          AVG(nvl(txn.total_txns,0))
            OVER (PARTITION BY day
                  ORDER BY mdl.model_day_hours
                  ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
                 ) hr_avg,
          AVG(nvl(txn.total_txns,0))
            OVER (PARTITION BY day) daily_avg
   FROM txn,
     (
      SELECT days.day,
             days.day+(hours.hour/24) model_day_hours
      FROM
        (
         SELECT 7+rownum hour
         FROM   all_objects
         WHERE  rownum <11
        ) hours,
        (
         SELECT DISTINCT trunc(hour_of_day) day
         FROM txn
        ) days
     ) mdl
   WHERE    txn.hour_of_day(+) = mdl.model_day_hours
  )
ORDER BY 1;

Unfortunately, as efficiencies are added to queries, it can make them more difficult to decipher and it becomes critical to maintain proper comments for future developers (and for the author!).

Other Query Changes and Alternative Tuning Techniques:

There are several possible changes that an be made to improve the response time of this SingleQuery. In tests with 10 million rows, response times of less than 5 seconds were obtained using a combination of the following techniques:

1) Replace the calculation of the model day with a table of operating hours.
2) Replace the selection of distinct days with a materialzied view.
3) Add a /*+ parallel */ hint to improve the full table scan of the SALES table.
4) Cache the SALES table into memory.

The Pickle

Dealing with Oracle 10g’s PARTITION BY and Bitmap Indexes Bug

Published in Database Tips, Quick Tips, SQL by jweicher Thursday March 29, 2007

While being an incredibly powerful and efficient way of indexing certain types of data, it is now relatively well accepted that Oracle’s Bitmap Indexes still have some kinks and sometimes cause erratic behavior in queries, particularly when dealing with parallel queries.

Unfortunately, this erratic behavior can even crop up when parallelism has been set to zero both on a given table, and in a query itself, if you are trying to execute even simple queries that contain PARTITION BY clauses. Unfortunately, there often appears to be no rhyme or reason as to when queries will exhibit this behavior when querying one bitmap-indexed table versus another.

Nonetheless, you may have found yourself faced with the following situation, and simply need a workaround.

Consider:

A fact table, ORDER_ITEM_F of the following general structure:

ORDER_ITEM_F
record_pk NUMBER NOT NULL
loc_id NUMBER NOT NULL
prod_id NUMBER NOT NULL
quantity NUMBER NOT NULL
cost NUMBER NOT NULL

This table has numerous records already in it:


> SELECT count(*) FROM order_item_f;

  COUNT(*)
----------
    657802

Additionally, among others, a bitmap index has been created on the location ID field (loc_id).

> CREATE BITMAP INDEX order_item_f_b_idx_11 ON order_item_f (loc_id)

Next, for some internal reporting you would like to query this table to find counts of order items per sales location, while attaching a total for the set to each row (perhaps you are using this query in an Oracle Report and doing some nice formatting).

A simple query could be as follows:


> SELECT  distinct loc_id,
          count(*) over (partition by loc_id) loc_total,
          count(*) over () tbl_total
  FROM    order_item_f
  ORDER
  BY      loc_id;

What you expect to receive from such a query is a row for each LOC_ID value, which includes a count of records having that LOC_ID, and an extra column containing the total number of records in the table (this value will be the same for each row returned).

Unfortunately, what you receive instead falls into one of the two scenarios below:

Scenario 1: Inaccurate counts based on full table count query (example above):


    LOC_ID  LOC_TOTAL  TBL_TOTAL
---------- ---------- ----------
        17         12         89
        18         12         89
        19          8         89
        20          9         89
        21          1         89
        22          3         89
        23          4         89
        24          6         89
        25          6         89
        26          1         89
        27         11         89
        28          6         89
        29          4         89
        30          3         89
        31          1         89
        44          1         89
        45          1         89

Scenario 2: ORA-0600 - The Dread Internal Error


ERROR at line 1:
ORA-00600: internal error code, arguments: [kxfqupp_bad_cvl], [7215], [4],
 [0], [], [], [], []

Enabling autotrace (>SET AUTOTRACE ON) and re-running the query reveals that the database attempted to retrieve your results using the bitmap index:


Execution Plan
----------------------------------------------------------
Plan hash value: 2270429775

-------------------------------------------------------------------------
| Id  | Operation                      | Name                  | Rows
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                       |    16
|   1 |  SORT UNIQUE                   |                       |    16
|   2 |   WINDOW SORT                  |                       |    16
|   3 |    WINDOW BUFFER               |                       |    16
|   4 |     BITMAP INDEX FAST FULL SCAN| ORDER_ITEM_F_B_IDX_11 |   657K
-------------------------------------------------------------------------

An inspection of the plan shows that ~657K records are in fact accounted for by the fast full scan of the bitmap index. Yet in the case of Scenario 1 above, the counts coming back clearly do not reflect this total (if they come back at all!).

The PARALLEL/BITMAP bug strikes!

While a much more in-depth discussion of the cause for this bug could follow, sometimes you just need a practical fix:

The Fix: Full Table Scan Hint

To quickly resolve your problem, a simple hint can be added to your SELECT query, to force a full table scan instead in place of the bitmap index scan:


> SELECT  /*+ FULL(order_item_f) */
          distinct loc_id,
          count(*) over (PARTITION by loc_id) loc_total,
          count(*) over () tbl_total
  FROM    order_item_f
  ORDER
  BY      loc_id;

    LOC_ID  LOC_TOTAL  TBL_TOTAL
---------- ---------- ----------
        17     127674     657802
        18     136815     657802
        19      49540     657802
        20      51782     657802
        21       3908     657802
        22      11463     657802
        23      33415     657802
        24      29190     657802
        25      34141     657802
        26       1901     657802
        27      97483     657802
        28      31405     657802
        29      32619     657802
        30      15248     657802
        31          1     657802
        44       1090     657802
        45        127     657802

17 rows selected.

 
Direct E-mail John Weicher
PIOCON Technologies Logo

Oracle 10g Parallel Execution and the /*+ PARALLEL */ hint

Published in Complex SQL, SQL, Tuning by gpike Tuesday March 27, 2007

The /*+ PARALLEL(table degree) */ hint may actually be my favorite query tuning tool. Actually, my favorite tool is the one that makes whatever query I’m tuning run faster, but I pull this hint out of the tool chest all the time. The PARALLEL hint is powerful, but potentially dangerous if used inappropriately. With most hints, usually the worst you can do is make your own query run faster (or slower!), but with PARALLEL you can significantly and adversely affect an entire database if you’re not careful.

The PARALLEL hint is just one way to take advantage of Oracle parallel execution. During parallel processing, large database operations are split into multiple pieces for multi-threaded execution. Parallel execution can be used in queries as well as DML (i.e. INSERT and UPDATE), DDL (i.e. CREATE INDEX) and in other ways not covered here. The Oracle documentation does a good job of fully explaining the inner sanctum of parallel execution, so we will not spend to much time on that topic. Let’s take a closer look at PARALLEL as a hint (it can also be used as a clause in a SQL statement) and what it can do for you…and to you!

The once-named Parallel Query Option (PQO) was first introduced in Oracle 7 and is primarily concerned with multi-threading large SQL operations. For our purposes, a large SQL operation typically means full table scans, sorts and hash joins, although index range scans on partitioned tables can occur in parallel with the PARALLEL_INDEX hint. Nested loops (index look ups) are not by themselves executable in parallel but these operations can be done in conjunction with a full table scans in parallel. So only certain types of queries will benefit from a PARALLEL hint. Queries that already include one or more full table scans are the most likely candidates. But, as we will see, queries that exclusively use nested loops (index look ups) may also benefit with a PARALLEL hint if the query is also asked to substitute full table scans for the nested loops.

Assuming that the parallel execution is available in your database (the DBA can set the maximum and minimum size of the process pool as well as many other parameters that affect parallel execution), you can take advantage of parallel processing in a number of ways. If a table is created or altered to include a DEGREE of parallelism, queries will automatically take advantage of parallel execution when large SQL operations are involved. However, it has been my experience that tables are rarely created with a parallelism DEGREE, since the ramifications for the database can be significant. As an example, query execution plans that once used indexes might decide to use full table scans and hash joins if the underlying table provides an inherent DEGREE of parallelism. Actually, the query does not decide anything - the Cost Based Optimizer decides; but that is a topic for another day. DEGREE, most simply, is the number a parallel processes that the database will spawn to execute the large SQL operation.

Let’s take a look at a “slow” query and how the PARALLEL can help it along. In this example, the “really_big_table” table has 10M rows and the “lookup table” has 100 rows. Of special note is lookup_id = 1 which has 5M rows in the really_big_table (50% of the table) while the other 99 lookup_ids are evenly distributed (about 50K rows a piece).

SELECT lut.lookup_name, count(*)
FROM   really_big_table rbt,
          lookup_table  lut
WHERE rbt.lookup_id = lut.lookup_id
GROUP BY lookup_name
ORDER BY lookup_name;

In our Oracle 10G database, the query execution plan looks like this:

parallel_no_parallel.jpg

This is the expected result. We need to look at every row in the really_big_table and join this to the lookup_table to get the lookup name. After the query completes the table scan, it hash joins the results and sorts the output to get the count(*) and sorts again to get us the final order. Its an efficient plan, but its also a single-threaded approach and if this query is running on a 16 processor machine, we can do better. If we add the PARALLEL hint with the table name (if the table is aliased, you may use the alias instead of the name) and the number of parallel threads (DEGREE of 2 in this case) we want to use, we get the following query with hint:

SELECT /*+ PARALLEL(rbt 2) */ lut.lookup_name, count(*)
FROM   really_big_table rbt,
          lookup_table  lut
WHERE rbt.lookup_id = lut.lookup_id
GROUP BY lookup_name
ORDER BY lookup_name;

Please note that the arguments to the PARALLEL hint are separated by a space, not a comma as if so often mistaken. A comma will actually cause the hint to be ignored in 9i, but in 10g it works both ways. This query produces a dramatically different query execution plan:

parallel.jpg

For those of you not used to parallel processing, this execution plan may look a little strange. All the extra lines here represent the operations necessary to take a serialized query, perform parallel execution, and finally gather the results back into a final result set (we promise to discuss the intricacies of these plans in more detail in a future post). In this particular case, the new, parallelized query ran 40% faster than the original query although the execution plan really didn’t change much. If you look closely, you will see that this query still performed 2 table scans, 1 hash join and some sorting. Its magic! Without making a single change to the actual query (except adding a PARALLEL hint), we have conjured a 40% improvement. Of course its not magic, we just commandeered more of the hardware resources. I performed these tests on a small windows box with two processors. As the following CPU usage diagrams show, the first query used one processor only while the parallelized query used both processors in parallel.

CPU without a PARALLEL hint CPU Usage with a PARALLEL 2
No Parallel Hint Parallel execution with a degree of 2

If two processes were, good 4, 8 or even 16 must be all that much better. Obviously, this will depend entirely on the amount of processing power and disk I/O your hardware can muster. The amount of time savings will degrade as system resources are exhausted. Ultimately, queries that use too many parallel slaves (or too many parallel queries running at the same time) will degrade system performance for all other users. So care must be taken in choosing the proper degree of parallelism. For maximum effectiveness, 2 X number of available processors is a good rule of thumb. Even if you have a dedicated server, at some point adding parallel query processors will not appreciably increase the speed of query execution. The following CPU diagram shows the same query (on my small windows box) executed with 4, 8 and 16 degrees of parallelism (see the three big CPU bumps).


Too much parallelism!

As it turns out, the CPU usage and query execution time did not change at all! Since there was available excess CPU time, this query likely was limited by disk I/O. But for enterprise quality hardware configurations, the PARALLEL hint will gobble up whatever resources it can get away with. Of course, there are many ways the DBA can limit the available resources, but in environments where parallel execution is not used such governors may not be in place. The result: A greedy PARALLEL hint can bring a database server to its knees. However, it has been this author’s experience that the PARALLEL hint can significantly improve query performance using resources that may be underutilized anyway. The PARALLEL hint can make you a hero!

Part 2: Monitoring Parallel Query Execution –>

Click here to ask SingleQuery.com a question.


Greg Pike

Greg Pike
PIOCON Technologies Website

105 queries. 0.890 seconds.
Powered by Wordpress
theme by cmoanz