Using Bind Variables in Filters for Oracle 10g Remote Queries

Published in ETL Tips by jweicher Wednesday February 28, 2007

A common task when building ETL (Extract-Transform-Load) processes is the extraction of data from remote data sources. In many cases these sources are non-Oracle based, such as Microsoft SQL Server. Typically, the extraction operations are performed through queries using database links to the remote source, which themselves are built atop an underlying Oracle Heterogeneous Services(HS) connection (or, “Generic Connectivity”) .

Whether due to concerns of network bandwidth utilization or the minimization of impact and potential locking of source system tables (depending on the system), it is also a very common requirement in such situations to limit the amount of data actually being pulled across the link, and to limit long-running queries. The obvious solution is to do as much filtering as is possible on the source data before pulling it across the link.

Unfortunately, when using database links to remote non-Oracle systems, some types of filtering can be difficult because of incompatibilities between the Oracle database and the source system; specifically, differing lists of supported SQL functions.

Consider the following simplified example:

You have an automated ETL operation that runs once a day, every day. The operation pulls transactional data from a remote SQL Server system into a local staging schema for further processing later on in the ETL pipeline. The source table has the following structure:

TRANSACT
Field: Type:
tran_pk NUMBER NOT NULL
tran_type VARCHAR2(10) NOT NULL
tran_amt NUMBER NOT NULL
tran_date DATE NOT NULL

A simple task on its face, the concerns mentioned above impose the added requirement that only the last 45 days worth of transactional data should be extracted, as updates to records are allowed up until that point. As this ETL runs every day, the 45-day window is obviously rolling.

A first attempt at this process would be to use a simple query similar to the following:

> SELECT * FROM transact@sserver WHERE tran_date >= (sysdate - 45);

Unfortunately, a look at the explain plan for this query shows that what is actually sent to the source system for execution does not include the filter using sysdate:


> EXPLAIN PLAN FOR SELECT * FROM transact@sserver
          WHERE "tran_date" >= (sysdate - 45);

Explained.

> SELECT plan_id,
         operation,
         object_type,
         other,
         id,
         parent_id,
         filter_predicates
 from    plan_table;

explain_03.jpg

SQL Server does not have a function sysdate()! Therefore, Oracle is forced to execute the query on the source side without the filter, pulling the entire result set across the link, and apply the filter after! We can see this by looking at the “OTHER” column of the explain plan table (which is the query actually passed to the source system).


The Solution: A “Remote Parameter” Table

As the query is actually parsed and dispatched from the Oracle database, we can’t simply substitute the SQL Server equivalent of the sysdate function. We can however, force Oracle to instead use a bind variable to pass the actual filter date in the query at execution time. We can do this with a “Remote Parameter” table.

Consider a table with the following structure:

REMOTE_PARAM_DATES
Field: Type:
param_name VARCHAR2(25) NOT NULL
param_value DATE NOT NULL

The purpose of this table is to store any slowly changing date values used in remote queries. An example of this table’s content could be:

PARAM_NAME PARAM_VALUE
tran_cutoff 10-JAN-07 12:00:00

An updated version of the extraction query can now join in the filtering value from this table, instead of relying on it to be calculated on the source-system side:


> SELECT  *
  FROM    transact@sserver t,
          remote_param_dates r
  WHERE   t.tran_date >= r.param_value
  AND     r.param_name = 'tran_cutoff';

And another look at the explain plan confirms the effect (only a few columns shown below for space):

explain_04.jpg

Notice the filter now being applied to the query that is actually being sent to the source side; a filter which uses a bind variable into which Oracle will put the actual date value used to filter the source data.

Admittedly, there might be those who would consider this a less than ideal solution. However, if filtering of source data is required in cases where the filter predicates must be dynamically generated using functions not supported by the source system, this method works well. Values in any remote parameter tables can be easily updated before query execution, using a pre-mapping process (another common ETL procedure), and then used in the queries themselves.

Finally, this is of course not the only option, such as creating a view on the source side that does this same filtering using the source system’s native function set. However, in large-scale ETL projects where modifications to source systems is either discouraged, not feasible or not allowed, this technique can be a good substitution.

 
Direct E-mail John Weicher
PIOCON Technologies Logo

SingleQuery welcomes our newest author John Weicher

Published in About SingleQuery.com by gpike Tuesday February 27, 2007
PIOCON stock photo SingleQuery is proud to announce the addition of PIOCON Technologies, Inc. Staff Consultant John Weicher as our newest author. As an expert with Oracle Warehouse Builder (OWB) as well as many other Oracle technologies, John will be expanding the scope of SingleQuery into uncharted territory. As an added benefit, John is the spitting image of Maurice Gibb which is of course why PIOCON hired him in the first place!

Read the alert log with SQL

Published in Database Tips by cdawes Tuesday February 27, 2007

There are often times when you want to know what has been recorded in the database’s alert log, but may not want to or be able to log into the database server and view the file. An SQL interface to the alert log can offer several advantages – you don’t need an account on the database server and you can read the alert log from SQL and PLSQL programs (which can be incorporated into other programs). This posting will show you how to create a usable SQL interface to the alert log. Then using this interface you can write simple SQL to show the alert log entries for the past hour or all the alert log entries that resulted from errors so far this month.

External tables, introduced in database release 9.0 give us the ability to access the alert log using SQL. The IGNORE NULLS option to the LAST analytic function, introduced in database release 10g, allows us to really make use of the dates we find in the alert log.

In this example, we'll create all the schema objects in the SYSTEM schema and run the commands as user SYS (we can’t grant ourselves privileges, so we can’t run all of the statements as SYSTEM)

The first block of code below does three things, first is to create a directory for the bdump_dest – the directory location of the alert log. We call it appropriately BDumpDir. Next, we need to grant the read privilege to SYSTEM, so this directory can be used for the external table. Finally, we create the external table on the file alert_sid.log:


DECLARE
BDumpDir VARCHAR2(200);
SID VARCHAR2(16);
ObjectExists EXCEPTION;
PRAGMA EXCEPTION_INIT(ObjectExists,-955);
BEGIN
-- get the bdump dir
SELECT value
INTO BDumpDir
FROM v$parameter
WHERE name='background_dump_dest';
-- create the directory for the bdump dir
EXECUTE IMMEDIATE 'CREATE OR REPLACE DIRECTORY bdump_dir AS '''||
BDumpDir||'''';
-- grant the necessary privileges
EXECUTE IMMEDIATE 'GRANT READ ON DIRECTORY bdump_dir TO system';
-- get the SID
SELECT instance_name INTO SID FROM v$instance;
-- create the external table
EXECUTE IMMEDIATE 'CREATE TABLE system.ALERT_LOG_EXTERNAL
(TEXT VARCHAR2(255)
) ORGANIZATION EXTERNAL
(TYPE ORACLE_LOADER
DEFAULT DIRECTORY BDUMP_DIR
ACCESS PARAMETERS
(records delimited by newline
nobadfile
nologfile
)
LOCATION (''alert_'||SID||'.log'')
)
REJECT LIMIT UNLIMITED'
;
-- ignore ORA-955 errors (object already exists)
EXCEPTION WHEN ObjectExists THEN NULL;
END;
/

Now we can query the external table to read the alert log.

SELECT * FROM system.alert_log_external
WHERE ROWNUM < 20;

TEXT
------------------------------------------------------------------------------
Thu Jan 25 14:58:36 2007
Thread 1 advanced to log sequence 227
Current log# 1 seq# 227 mem# 0: C:\INFO\ORACLE\ORADATA\ORA102\REDO01.LOG
Dump file c:\info\oracle\product\10.2.0\admin\ora102\bdump\alert_ora102.log
Fri Jan 26 09:18:36 2007
ORACLE V10.2.0.1.0 - Production vsnsta=0
vsnsql=14 vsnxtr=3
Windows XP Version V5.1 Service Pack 2
CPU : 2 - type 586
Process Affinity : 0×00000000
Memory (Avail/Total): Ph:671M/2046M
Fri Jan 26 09:18:36 2007
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Picked latch-free SCN scheme 2
Using LOG_ARCHIVE_DEST_10 parameter default value as USE_DB_RECOVERY_FILE_DEST
Autotune of undo retention is turned on.
IMODE=BR

19 rows selected.

This is a first step, but the log file isn’t really very usable - the entries are all timestamped, but the timestamp appears inline with the entries. We need to associate the timestamp with each line in the file, so we can query the alert log based on timestamp. We know the timestamp appears in a consistent format, so we can look for these timestamps and convert them to a date datatype. We create a function to do this for us, so we can trap and ignore any errors related to invalid dates. The alert_log_date function looks for strings in the expected timestamp format and converts them to a date. If the line in the alert log does not appear in a timestamp format (ORA-1846 is raised), we ignore it.


CREATE OR REPLACE FUNCTION system.alert_log_date( text IN VARCHAR2 )
RETURN DATE
IS
InvalidDate EXCEPTION;
PRAGMA EXCEPTION_INIT(InvalidDate, -1846);
BEGIN
RETURN TO_DATE(text,'Dy Mon DD HH24:MI:SS YYYY'
,'NLS_DATE_LANGUAGE=AMERICAN');
EXCEPTION
WHEN InvalidDate THEN RETURN NULL;
END;
/

So now, we can get the alert log entry and the timestamp, if present:

SELECT ROWNUM row_num ,system.alert_log_date(text) alert_date, text
FROM system.alert_log_external
WHERE ROWNUM < 20
/

Error Log Query, Image 1

This is progress, but we really want that datestamp carried down so each line of text is marked with the datestamp. This is where the IGNORE NULLS option to the LAST function comes into play. We use our working query, but then wrap another query around it.


SELECT row_num
,LAST_VALUE(alert_date IGNORE NULLS) OVER(ORDER BY row_num
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) alert_date
,alert_text
FROM (SELECT ROWNUM row_num
,system.alert_log_date(text) alert_date
,text alert_text
FROM system.alert_log_external
)
WHERE ROWNUM < 20
/

Alert Log, Image 2

So, now we get the line number, date stamp and text for each alert log entry. Hm.. It’s starting to look more like a good old table. Let’s add the starting line number for each alert log entry, so we can extract these entries easier. Again, we’ll build on the query we have by wrapping it with another. We get the starting line number based on the presence of a timestamp in the base table.


SELECT row_num
,LAST_VALUE(low_row_num IGNORE NULLS)
OVER(ORDER BY row_num ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW) start_row
,LAST_VALUE(alert_date IGNORE NULLS)
OVER(ORDER BY row_num ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW) alert_date
,alert_text
FROM (SELECT ROWNUM row_num
,NVL2(system.alert_log_date(text),ROWNUM,NULL) low_row_num
,system.alert_log_date(text) alert_date
,text alert_text
FROM system.alert_log_external
)
WHERE ROWNUM < 20
/

image_03.jpg

Finally, we put this nested query into a view to hide its complexity and create a public synonym on it. You can decide who you allow to read the alert log by granting them SELECT on the alert_log view.


CREATE OR REPLACE FORCE VIEW system.alert_log as
SELECT row_num
,LAST_VALUE(low_row_num IGNORE NULLS)
OVER(ORDER BY row_num ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW) start_row
,LAST_VALUE(alert_date IGNORE NULLS)
OVER(ORDER BY row_num ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW) alert_date
,alert_text
FROM (SELECT ROWNUM row_num
,NVL2(system.alert_log_date(text),ROWNUM,NULL) low_row_num
,system.alert_log_date(text) alert_date
,text alert_text
FROM system.alert_log_external
)
;

DECLARE
ObjectExists EXCEPTION;
PRAGMA EXCEPTION_INIT(ObjectExists,-955);
BEGIN
EXECUTE IMMEDIATE 'CREATE PUBLIC SYNONYM alert_log FOR system.alert_log';
-- If the synonym exists, drop and recreate it
EXCEPTION WHEN ObjectExists THEN
EXECUTE IMMEDIATE 'DROP PUBLIC SYNONYM alert_log';
EXECUTE IMMEDIATE 'CREATE PUBLIC SYNONYM alert_log FOR system.alert_log';
END;
/


Now with our usable view built, we can look for alert log entries with SQL. For example; let’s see the alert log entries for the past hour:

SELECT row_num, alert_text
FROM alert_log
WHERE alert_date > SYSDATE - 1/24
/

image_04.jpg

Let’s see all the alert log entries for the past month that resulted from errors (contain the text ORA-).


SELECT row_num, alert_text
FROM alert_log
WHERE start_row IN (SELECT start_row
FROM alert_log
WHERE REGEXP_LIKE(alert_text,'ORA-')
)
AND alert_date > TRUNC(SYSDATE,'MON')
/

image_05.jpg

The next step, that I’ll leave to you, is to write the SQL interface to read these trace files…
Happy coding!

SingleQuery welcomes Chip Dawes - our new contibuting Author

Published in About SingleQuery.com by gpike Thursday February 22, 2007

You probably have Chip Dawes to thank for getting you out of that tough spot a few years back. Perhaps you pulled out your handy Oracle PL/SQL Language Pocket Reference to figure out that syntax you couldn’t quite remember. Or maybe you aced your recent certification exam after you studied with the Ocp: Oracle 10g Certification guide. Chip authored these books and many others including:

OCP: Oracle 10g Certification Kit (1Z0-042 and 1Z0-043) Amazon
OCA: Oracle 10g Workshop I Study Guide Amazon
OCA/OCP: Introduction to Oracle9I SQL Study Guide Amazon
OCP: Oracle9I Certification Kit Amazon
OCP: Oracle8I Dba SQL and Pl/SQL Study Guide Exam 1Z0-001 Amazon
Oracle PL/SQL Language Pocket Reference Amazon
Oracle PL/SQL Built-ins Pocket Reference Amazon

SingleQuery welcomes Chip Dawes as our newest contributing Author. We don’t really know what gems of knowledge Chip will be offering up here at SingleQuery, but you would be wise to listen up. Because when we say “Chip wrote the book on that”…we mean it!

Caching The Results in Oracle 11g with /*+ result_cache */

Published in Hints, Oracle 11g, Quick Tips by jsimmons Wednesday February 21, 2007

select /*+ result_cache */ * from my_table

New for Oracle 11g, the result_cache hint caches the result set of a select statement. This is similar to:

alter table my_table cache

But as you can imagine adding columns or predicates makes /*+ result_cache */ considerably more powerful by caching a subset of larger tables and common queries.

select /*+ result_cache */ colA, colB, colC
from my_table where colA := myBindVar

Cartesian Products: Don’t Fear this Reaper

Published in Quick Tips, SQL by gpike Saturday February 3, 2007

Myth: From the Oracle® Database 10g SQL Reference “A Cartesian product always generates many rows and is rarely useful.”

Fact: The Cartesian product is a powerful tool that can help in situations where it is known in advance that every combination of every row in two or more tables is needed.

The Cartesian product (named for the French philosopher and mathematician Rene Descartes, but there appears to be evidence that he is not the actual inventor) is generated when two tables in a query contain no join condition. The Cartesian product produces a result set with all possible ordered pairs from the two (or more) tables. For example, a Cartesian product with two tables containing 10 rows each yields a result set of 100 rows.

Typically, when a Cartesian product pops up in a query, it’s a mistake that produces those irritating 50M row result sets. Oooops! But there are also very valid situations that can benefit from the strategically-placed Cartesian product. Use Cartesian products where sparse data requires an entry (usually a 0 or a NULL). These situations may arise in Business Intelligence or other applications where the results are rendered graphically.

Data warehouse example:

TABLE DATES_D
    date_id    NUMBER
    full_date  DATE

TABLE PRODUCT_D
    product_id    NUMBER
    product_name  VARCHAR2

Table ORDER_F:
    date_id      NUMBER
    product_id   NUMBER
    total_orders NUMBER

To ensure that any query against the ORDERS table will produce a row for every date and product combination, a Cartesian product is used. Please note that the Cartesian product is placed in an in-line view purely for emphasis.

SELECT full_date,
       product_name,
       nvl(total_orders,0) -- Make the "manufactured"
                           -- rows zeros
FROM
       (
        SELECT date_id,
               full_date,
               product_id,
               product_name
        FROM   dates_d d,  -- Woah, a Cartesian product!
               product_d p -- Now we have a complete
                           -- set of every possible product
                           -- and every possible day.
       ) temp,
       order_f o
WHERE  o.date_id    = temp.date_id(+)
AND    o.product_id = temp.product_id(+)
-- The outer joins ensure a row even for products that
-- do not have any sales on a given day

There are endless situations where a Cartesian product can be a valuable tool. Of course, it is important to remember that creating a Cartesian product may not be in your database’s best interest. If the DATE_D table in the above example contained every day since the Pyramids were constructed and the PRODUCT_D table listed every item found in your local grocery store, you might find this Cartesian to be a very, very bad idea. But in situations where the cardinality is known, don’t fear the much-maligned Cartesian product.

74 queries. 0.537 seconds.
Powered by Wordpress
theme by cmoanz