Contact SingleQuery

Published in Uncategorized by gpike Tuesday December 19, 2006

Using Oracle’s WITH Clause (Subquery Factoring)

Published in SQL by gpike Monday December 18, 2006

The Oracle WITH clause is an incredibly powerful tool available since Oracle9i R2 that enables the user to create virtual views that become reusable via an alias throughout the main body of a query. A WITH clause (officially called a Subquery Factoring Clause) is pre-pended to a SELECT statement. The power of the WITH clause becomes evident in situations where a subquery, or indeed any portion of a query, is used in more than one location throughout a SELECT statement. Since the results from a WITH clause are calculated only once, dramatic performance improvements may be realized. As an added benefit, WITH clauses tend to simplify the look of a query since repeated sections are centralized and aliased. The basic syntax for a SELECT statement using a WITH clause may look a little strange at first and some tools (i.e. Oracle Reports in some cases) may be unable to properly parse this construct:

WITH
 alias_name         -- Alias to use in the main query
AS
 (insert a query here)
SELECT...            -- Beginning of the query main body

It should be noted that multiple aliases can be defined in the WITH clause:

WITH
 alias_name1
AS
 (query1)
 aleas_name2
AS
 (query2)
SELECT...

As previously indicated, the intended use of a WITH clause is to increase efficiency by eliminating repeated query sections in a SELECT statement. The following query (which we realize could be written in a more efficient manner!), returns customers from a fictitious SALES table that were found in the top ten for both January and February. This query is constructed using two in-line views, one for the January calculation and one for February. Notice the duplication in the in-line views aliased as “jan” and “feb”; only the date ranges are different:

For this exercise, we employ the simple table Sales
 with three columns:
    customer_name  VARCHAR2;
    sales_date         DATE;
    customer_sales   NUMBER;

SELECT  top_10_jan.customer_name,
        top_10_jan.total_sales_jan,
        top_10_feb.total_sales_feb
FROM    (
         SELECT customer_name,
                total_sales_jan
         FROM   (
                 SELECT customer_name,
                        SUM(customer_sales) total_sales_jan
                 FROM   sales
                 WHERE  sales_date
                       BETWEEN '01-jan-06' AND '31-JAN-06'
                 GROUP BY customer_name
                 ORDER BY 2 DESC
                ) jan
         WHERE rownum <11
        ) top_10_jan,
        (
         SELECT customer_name,
                total_sales_feb
         FROM   (
                 SELECT customer_name,
                        SUM(customer_sales) total_sales_feb
                 FROM   sales
                 WHERE  sales_date
                     BETWEEN '01-FEB-06' AND '28-FEB-06'
                 GROUP BY customer_name
                 ORDER BY 2 DESC
                ) feb
         WHERE rownum <11
        ) top_10_feb
WHERE top_10_jan.customer_name = top_10_feb.customer_name;

By substituting WITH clause in place of the two in-line views, the above query can be rewritten in a more efficient manner:

WITH
tot_sales
AS
(
SELECT customer_name,
                sales_date,
                total_sales,
                RANK() OVER (PARTITION BY sales_date
                       ORDER BY total_sales DESC) month_rank
         FROM   (
                 SELECT customer_name,
                        TRUNC(sales_date,'MONTH') sales_date,
                        SUM(customer_sales) total_sales
                 FROM   sales
                 WHERE  sales_date >= '01-JAN-06'
                 AND      sales_date <= '28-FEB-06'
                 GROUP BY customer_name,
                 TRUNC(sales_date,'MONTH')
                ) jan
)
SELECT tot_jan.customer_name,
           tot_jan.total_sales,
           tot_feb.total_sales
FROM    tot_sales tot_jan,
           tot_sales tot_feb
WHERE  tot_jan.sales_date = '01-JAN-06'
AND      tot_feb.sales_date = '01-FEB-06'
AND      tot_jan.month_rank <11
AND      tot_feb.month_rank <11
AND      tot_jan.customer_name = tot_feb.customer_name;

Just how much more efficient is our new query? Explain plans for each query using Oracle 10G R2 and a table with 25000 rows yields the following query costs:

Query method 1: In-line views

Query Method 2: WITH Clause

The query using in-line views shows a cost of 1449, but re-writing the query to use a WITH clause shows a drop to 861! This big savings comes from the elimination of a second full table scan of the SALES table. A WITH clause calculates its result set only once and re-uses it throughout the query. In our simulations, this query actually resulted in a 40% improvement in execution time. Not a bad little tuning trick!

We realize of course that this query could have been written without the need for in-line views OR the WITH clause and perhaps even more efficiently. Take for example the following query, which has a cost of 727 (but in practice was actually slower than the WITH query in terms of execution time):

SELECT customer_name,
       jan_sales,
       feb_sales
FROM
    (
     SELECT  customer_name,
             jan_sales,
             feb_sales,
             RANK() OVER (ORDER BY jan_sales DESC) jan_rank,
              RANK() OVER (ORDER BY feb_sales DESC) feb_rank
      FROM
         (
          SELECT customer_name,
                 SUM(DECODE(TRUNC(sales_date,'MONTH'),
                 '01-JAN-06', customer_sales,0)) jan_sales,
                 SUM(DECODE(TRUNC(sales_date,'MONTH'),
                 '01-FEB-06', customer_sales,0)) feb_sales
          FROM   sales
          WHERE  sales_date >= '01-JAN-06'
          AND      sales_date <= '28-FEB-06'
          GROUP BY customer_name
         )
     )
WHERE jan_rank <11
AND     feb_rank <11;

Clearly, there are many ways to solve a problem, and Subquerey Factoring adds another powerful tool to facilitate writing both efficient and elegant queries. Use it, but remember that the WITH clause comes with limitations and sometimes returns unexpected results. But more on that in a different post.

Pivot table Techniques

Published in SQL by gpike Thursday December 14, 2006

In an effort to gain the storage efficiencies of the First Normal Form (1NF), one often finds the resulting table less then ideal for reporting purposes. To eliminate the need for complex join conditions against 1NF tables without utilizing denormalization, use the “pivot table or view”. As demonstrated, the data is both more readable and reportable. For purposes of this discussion, assume an employee compensation table (EMP) in the 1NF:

Employee ID Comp Type Amount
1 SALARY $35,000
1 BONUS $5,000
2 SALARY $55,000
3 SALARY $100,000
3 BONUS $10,000

Assumptions and company business rules:

1. There are only two types of compensation: BONUS and SALARY.
2. All employees have a SALARY but only some have BONUS.

This table seems ideal for efficient storage of the compensation data. For reporting purposes, however, this table is less that perfect. A table with one row per employee makes visualization and summary rollup of the data more feasible:

Employee ID Salary Bonus Total Comp.
1 $35,000 $5,000 $40,000
2 $55,000 $0 $55,000
3 $100,000 $10,000 $110,000
Totals: $190,000 $115,000 $205,000

The Self-Join Technique:

The self-join technique consists of joining the source table to itself once for each non-derived attribute that appears in the summary table:

SELECT   sal.id Id,
         sal.amount Salary,
         nvl(bon.amount,0) Bonus,
         sum(sal.amount + nvl(bon.amount,0)) Com_Sum
FROM     emp sal,
         emp bon
WHERE    sal.id = bon.id(+)
AND      ‘SALARY’ = sal.comp_type
AND      ‘BONUS’ = bon.comp_type(+)
GROUP BY sal.id,
         sal.amount,
         nvl(bon.amount,0);

In the self-join technique, the EMP table is outer-joined to itself. Since all employees are required to posses one and only one salary, the first instance of the EMP table (aliased as “sal” in the query) will produce one row for each employee. Without joins to the “sal” table, the second instance of EMP (aliased as “bon”) will produce rows only for those employees that receive bonuses. To obtain a row for each employee regardless of compensation type, “bon” is outer-joined to “sal” with “sal” acting as the driving table.

The single-table, decode Technique:

This method is especially useful in cases where multiple non-derived attributes are necessary resulting in an inordinate number of self-joined source tables.

SELECT sal.id Id,
            sum(decode,comp_type,'SALARY',sal.amount,0) Salary,
            sum(decode,comp_type,'BONUS',sal.amount,0) Bonus
FROM emp sal,
GROUP BY sal.id;

The Union All Technique:

This method replaces the decode constructs from the previous method with a in-line view. The Union All technique creates one logical table for each non-derived attribute and aggregates the data via an in-line view and a GROUP BY:

SELECT id,
            sum(sal) Salary,
            sum(nvl(bon,0)) Bonus,
            sum(sal+nvl(bon,0)) Com_Sum
FROM       (
            SELECT id,
                   amount sal,
                   0 bon
            FROM   emp
            WHERE  comp_type = 'SALARY'
            UNION ALL
            SELECT id,
                   0,
                   amount
            FROM   emp
            WHERE  comp_type = 'BONUS'
           )
GROUP BY id;

This query simulates the outer join from the self-join method. Every row of interest from the EMP table is selected in the in-line view with each UNION ALL query extracting the primary key plus one non-derived attribute. The global query aggregates the data by grouping over the primary key. Please note that each UNION ALL query must provide all of the possible columns. In this case, the first query selects the salary and populates a NULL for the bonus while the second query populates a NULL for salary and selects the bonus.

The in-line view produces the following result set:

Employee ID Salary Bonus
1 $35,000 $0
1 $0 $5,000
2 $55,000 $0
3 $100,000 $0
3 $0 $10,000

Welcome to SingleQuery.com

Published in About SingleQuery.com by singlequery Tuesday December 12, 2006

Welcome to SingleQuery.com, your one-stop source for understanding the nuances of advanced Oracle query and database development techniques. Your SingleQuery.com moderators are Greg Pike and Jeremy Simmons, Principal consultants with PIOCON Technologies. As long-time technology consultants, we have witnessed first-hand the rise of the Tools Era in database development. Although the query writing experience is much simplified with advanced GUIs and code generators, we believe developers are forgetting their roots. SingleQuery.com invites you to join us as we re-learn the fine art of Query Authoring and problem solving.

Greg Pike
gpike@piocon.com

Jeremy Simmons
jsimmons@piocon.com

About

Published in Uncategorized by singlequery Thursday December 7, 2006

This is an example of a WordPress page, you could edit this to put information about yourself or your site so readers know where you are coming from. You can create as many pages like this one or sub-pages as you like and manage all of your content inside of WordPress.

62 queries. 0.535 seconds.
Powered by Wordpress
theme by cmoanz