Using Bind Variables in Filters for Oracle 10g Remote Queries

Published in ETL Tips by jweicher Wednesday February 28, 2007

A common task when building ETL (Extract-Transform-Load) processes is the extraction of data from remote data sources. In many cases these sources are non-Oracle based, such as Microsoft SQL Server. Typically, the extraction operations are performed through queries using database links to the remote source, which themselves are built atop an underlying Oracle Heterogeneous Services(HS) connection (or, “Generic Connectivity”) .

Whether due to concerns of network bandwidth utilization or the minimization of impact and potential locking of source system tables (depending on the system), it is also a very common requirement in such situations to limit the amount of data actually being pulled across the link, and to limit long-running queries. The obvious solution is to do as much filtering as is possible on the source data before pulling it across the link.

Unfortunately, when using database links to remote non-Oracle systems, some types of filtering can be difficult because of incompatibilities between the Oracle database and the source system; specifically, differing lists of supported SQL functions.

Consider the following simplified example:

You have an automated ETL operation that runs once a day, every day. The operation pulls transactional data from a remote SQL Server system into a local staging schema for further processing later on in the ETL pipeline. The source table has the following structure:

TRANSACT
Field: Type:
tran_pk NUMBER NOT NULL
tran_type VARCHAR2(10) NOT NULL
tran_amt NUMBER NOT NULL
tran_date DATE NOT NULL

A simple task on its face, the concerns mentioned above impose the added requirement that only the last 45 days worth of transactional data should be extracted, as updates to records are allowed up until that point. As this ETL runs every day, the 45-day window is obviously rolling.

A first attempt at this process would be to use a simple query similar to the following:

> SELECT * FROM transact@sserver WHERE tran_date >= (sysdate - 45);

Unfortunately, a look at the explain plan for this query shows that what is actually sent to the source system for execution does not include the filter using sysdate:


> EXPLAIN PLAN FOR SELECT * FROM transact@sserver
          WHERE "tran_date" >= (sysdate - 45);

Explained.

> SELECT plan_id,
         operation,
         object_type,
         other,
         id,
         parent_id,
         filter_predicates
 from    plan_table;

explain_03.jpg

SQL Server does not have a function sysdate()! Therefore, Oracle is forced to execute the query on the source side without the filter, pulling the entire result set across the link, and apply the filter after! We can see this by looking at the “OTHER” column of the explain plan table (which is the query actually passed to the source system).


The Solution: A “Remote Parameter” Table

As the query is actually parsed and dispatched from the Oracle database, we can’t simply substitute the SQL Server equivalent of the sysdate function. We can however, force Oracle to instead use a bind variable to pass the actual filter date in the query at execution time. We can do this with a “Remote Parameter” table.

Consider a table with the following structure:

REMOTE_PARAM_DATES
Field: Type:
param_name VARCHAR2(25) NOT NULL
param_value DATE NOT NULL

The purpose of this table is to store any slowly changing date values used in remote queries. An example of this table’s content could be:

PARAM_NAME PARAM_VALUE
tran_cutoff 10-JAN-07 12:00:00

An updated version of the extraction query can now join in the filtering value from this table, instead of relying on it to be calculated on the source-system side:


> SELECT  *
  FROM    transact@sserver t,
          remote_param_dates r
  WHERE   t.tran_date >= r.param_value
  AND     r.param_name = 'tran_cutoff';

And another look at the explain plan confirms the effect (only a few columns shown below for space):

explain_04.jpg

Notice the filter now being applied to the query that is actually being sent to the source side; a filter which uses a bind variable into which Oracle will put the actual date value used to filter the source data.

Admittedly, there might be those who would consider this a less than ideal solution. However, if filtering of source data is required in cases where the filter predicates must be dynamically generated using functions not supported by the source system, this method works well. Values in any remote parameter tables can be easily updated before query execution, using a pre-mapping process (another common ETL procedure), and then used in the queries themselves.

Finally, this is of course not the only option, such as creating a view on the source side that does this same filtering using the source system’s native function set. However, in large-scale ETL projects where modifications to source systems is either discouraged, not feasible or not allowed, this technique can be a good substitution.

 
Direct E-mail John Weicher
PIOCON Technologies Logo

One Response to “Using Bind Variables in Filters for Oracle 10g Remote Queries”

  1. agha Says:

    what is exact substitute data type instead of varchar2

Leave a Reply

27 queries. 0.393 seconds.
Powered by Wordpress
theme by cmoanz