
external tables in oracle 9i

http://www.oracle-developer.net/display.php?id=204
External tables enable us to read flat-files (stored on the O/S) using SQL. They were
introduced in Oracle 9i as an alternative to SQL*Loader. External tables are essentially stored
SQL*Loader control files, but because they are defined as tables, we can access our flat-file data
using all available read-only SQL and PL/SQL operations. We can also read flat-files in parallel
and join files to other files, tables, views and so on.
This article is a short introduction to external tables and how we can benefit from them.

creating an external table


We will begin by creating a simple external table that mimics the EMP table. Before we can
create the table, however, we require an Oracle directory. A directory (also new in 9i) is an
Oracle object that "points" to a filesystem location where our source flat-files are stored (as an
aside, note that directory objects can also be used to replace utl_file_dir in UTL_FILE read-write
operations). In the following example, we will create a directory named XT_DIR.
SQL> CREATE DIRECTORY xt_dir AS 'd:\oracle\dir';
Directory created.

Note that to create a directory, we require the CREATE ANY DIRECTORY system privilege.
Directories are system-wide, so if we cannot get the privilege, a DBA can create the directory
and grant READ or WRITE to our users as required (READ for external table users and
READ/WRITE as appropriate for UTL_FILE users).
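
For example, the DBA-side setup might look like the following sketch (the user and directory names are illustrative):

```sql
-- Run as a privileged user; SCOTT is an illustrative grantee
CREATE DIRECTORY xt_dir AS 'd:\oracle\dir';

GRANT READ ON DIRECTORY xt_dir TO scott;         -- sufficient for external table reads
GRANT READ, WRITE ON DIRECTORY xt_dir TO scott;  -- for UTL_FILE read-write work
```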
We will reference this directory in our external table definition as follows. Some explanation of
the new syntax follows the table creation.
SQL> CREATE TABLE emp_xt
  2  ( empno      NUMBER(4)
  3  , ename      VARCHAR2(10)
  4  , job        VARCHAR2(9)
  5  , mgr        NUMBER(4)
  6  , hiredate   DATE
  7  , sal        NUMBER(7,2)
  8  , comm       NUMBER(7,2)
  9  , deptno     NUMBER(2)
 10  )
 11  ORGANIZATION EXTERNAL
 12  (
 13     TYPE ORACLE_LOADER
 14     DEFAULT DIRECTORY xt_dir
 15     ACCESS PARAMETERS
 16     (
 17        RECORDS DELIMITED by NEWLINE
 18        BADFILE 'emp_xt.bad'
 19        LOGFILE 'emp_xt.log'
 20        NODISCARDFILE
 21        FIELDS TERMINATED BY ','
 22        ( empno
 23        , ename
 24        , job
 25        , mgr
 26        , hiredate CHAR(20) DATE_FORMAT DATE MASK "DD/MM/YYYY"
 27        , sal
 28        , comm
 29        , deptno
 30        )
 31     )
 32     LOCATION ('emp.dat')
 33  )
 34  REJECT LIMIT UNLIMITED;

Table created.

We have now created an external table and we can see a wide range of new and extended syntax.
In particular, we can see something that looks similar to (but not quite the same as) a
SQL*Loader control file. Some points to note are as follows.

Line 11: the ORGANIZATION EXTERNAL clause tells Oracle that we are creating an
external table;
Line 13: the ORACLE_LOADER driver is a new type that "powers" external tables;

Line 14: we can set a DEFAULT DIRECTORY once for the entire table if we wish. This
states that, unless we tell Oracle otherwise, all files created or read as part of this table
will reside in this named directory. In most cases, we will wish to write
log/bad/discardfiles to a logging directory and read our incoming data files from a data
directory. For simplicity, we have used a single XT_DIR in our examples for all files;

Lines 15-31: the ACCESS PARAMETERS clause contains the SQL*Loader-style
reference to enable Oracle to parse the flat-file into rows and columns. At this time,
external tables do not offer the extensive range of parse options that SQL*Loader
provides, yet still cater for most loading requirements. Note that if we have made any
syntactical errors in our ACCESS PARAMETERS, we can still create the external table.
Access parameters themselves are not parsed until we issue a SELECT against the
external table;

Lines 22-30: in this example, we have listed each of the fields in the file and table. Like
SQL*Loader, the ORACLE_LOADER external table driver assumes that all incoming
fields are CHAR(255) unless we state otherwise. In the EMP_XT table, we have a DATE
column that needs to be converted as on line 26, hence the need to list all fields.
Otherwise, we can simply use the default parsing specified in the FIELDS clause, which
in our simple example only lists the field delimiter;

Line 32: the LOCATION clause is where we specify the input file(s). Note that we don't
have a file named "emp.dat" at the time of table creation; this is not necessary (we shall
discuss this later); and

Line 34: similar to the ERRORS= clause of SQL*Loader, we can specify a REJECT
LIMIT for an external table. This is the number of bad records we will tolerate before the
load is failed.

There are numerous options beyond those in the example table we created above. It is not possible to
describe all of the various permutations (they are in the online documentation), but other options
include: altering global field-parsing properties; altering field-specific properties (as we saw with
the DATE formatting above); LOAD WHEN clauses/DISCARD files; specifying
characterset/endianness; specifying delimiters and changing field trimming behaviour.
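As a rough sketch only (not verified against a database), an ACCESS PARAMETERS clause exercising some of these options might look like the following:

```sql
ACCESS PARAMETERS
(
   RECORDS DELIMITED BY NEWLINE CHARACTERSET WE8ISO8859P1
   LOAD WHEN (deptno != BLANKS)       -- filter records before loading
   BADFILE 'emp_xt.bad'
   DISCARDFILE 'emp_xt.dsc'           -- receives records failing LOAD WHEN
   LOGFILE 'emp_xt.log'
   FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LRTRIM
)
```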
External tables are read-only. We cannot create indexes on them, nor can we issue DML against
them (although they can be sourced in DML statements, they cannot be the target).

using external tables


Now we have created our EMP_XT external table, we can use it as follows. Remember that at this
stage we don't have the emp.dat file that the table expects (as per the LOCATION clause). We
can generate a simple csv-file from the existing EMP table as follows (using the
oracle-developer.net DATA_DUMP utility).
SQL> BEGIN
  2     data_dump( query_in        => 'SELECT * FROM emp',
  3                file_in         => 'emp.dat',
  4                directory_in    => 'XT_DIR',
  5                nls_date_fmt_in => 'DD/MM/YYYY' );
  6  END;
  7  /

PL/SQL procedure successfully completed.


SQL> host dir d:\oracle\dir\emp.dat
 Volume in drive D is USER
 Volume Serial Number is 7476-8930

 Directory of d:\oracle\dir

04/08/2002  18:57               633 emp.dat
               1 File(s)            633 bytes
               0 Dir(s)  26,696,744,960 bytes free

Now we are ready to select from our external table.


SQL> SELECT * FROM emp_xt;

     EMPNO ENAME      JOB              MGR HIREDATE          SAL       COMM     DEPTNO
---------- ---------- --------- ---------- ---------- ---------- ---------- ----------
      7369 SMITH      CLERK           7902 17/12/1980        800                    20
      7499 ALLEN      SALESMAN        7698 20/02/1981       1600        300         30
      7521 WARD       SALESMAN        7698 22/02/1981       1250        500         30
      7566 JONES      MANAGER         7839 02/04/1981       2975                    20
      7654 MARTIN     SALESMAN        7698 28/09/1981       1250                    30
      7698 BLAKE      MANAGER         7839 01/05/1981       2850                    30
      7782 CLARK      MANAGER         7839 09/06/1981       2450                    10
      7788 SCOTT      ANALYST         7566 09/12/1982       3000                    20
      7839 KING       PRESIDENT            17/11/1981       5000                    10
      7844 TURNER     SALESMAN        7698 08/09/1981       1500                    30
      7876 ADAMS      CLERK           7788 12/01/1983       1100                    20
      7900 JAMES      CLERK           7698 03/12/1981        950                    30
      7902 FORD       ANALYST         7566 03/12/1981       3000                    20
      7934 MILLER     CLERK           7782 23/01/1982       1300       1400         40

14 rows selected.

It is clear from the above that if we can query a flat-file via an external table, then we have
fundamentally changed the way in which we can load external data into our database! In fact, we
may not even need to load the file at all as we can query it directly (why bother to stage data that
will be used once and replaced every day?).
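
For example, a daily file could feed a report without ever being staged. The following sketch assumes a standard DEPT table alongside our EMP_XT external table:

```sql
-- Join the flat-file directly to a database table (a sketch; DEPT is assumed)
SELECT d.dname
,      SUM(x.sal) AS total_sal
FROM   emp_xt x
,      dept   d
WHERE  x.deptno = d.deptno
GROUP  BY d.dname;
```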

external table metadata


We can see information about our EMP_XT table in several dictionary views as follows (note
that XXX is a placeholder for USER, ALL and DBA).

XXX_TABLES;
XXX_ALL_TABLES;

XXX_EXTERNAL_TABLES; and

XXX_EXTERNAL_LOCATIONS.

The last two are specific to external tables only. For general information on our external table we
can query USER_EXTERNAL_TABLES, which is structured as follows.

SQL> desc USER_EXTERNAL_TABLES

 Name                                Null?    Type
 ----------------------------------- -------- -----------------
 TABLE_NAME                          NOT NULL VARCHAR2(30)
 TYPE_OWNER                                   CHAR(3)
 TYPE_NAME                           NOT NULL VARCHAR2(30)
 DEFAULT_DIRECTORY_OWNER                      CHAR(3)
 DEFAULT_DIRECTORY_NAME              NOT NULL VARCHAR2(30)
 REJECT_LIMIT                                 VARCHAR2(40)
 ACCESS_TYPE                                  VARCHAR2(7)
 ACCESS_PARAMETERS                            VARCHAR2(4000)

As its name suggests, the XXX_EXTERNAL_LOCATIONS views list the file(s) that an external
table is currently "pointed to". We can read multiple files via a single external table and these can
be in different directories. Currently, our EMP_XT location is as follows.
SQL> SELECT *
  2  FROM   user_external_locations
  3  WHERE  table_name = 'EMP_XT';

TABLE_NAME   LOCATION     DIRECTORY_OWNER   DIRECTORY_NAME
------------ ------------ ----------------- ----------------
EMP_XT       emp.dat      SYS               XT_DIR

1 row selected.

modifying location
Remember that the emp.dat file did not exist at the time we created the EMP_XT table. If we try
to query an external table that has an incorrect location clause, we receive the following error.
SQL> host del d:\oracle\dir\emp.dat

SQL> host dir d:\oracle\dir\emp.dat
 Volume in drive D is USER
 Volume Serial Number is 7476-8930

 Directory of d:\oracle\dir

File Not Found
SQL> SELECT * FROM emp_xt;
SELECT * FROM emp_xt
*
ERROR at line 1:
ORA-29913: error in executing ODCIEXTTABLEOPEN callout
ORA-29400: data cartridge error
KUP-04040: file emp.dat in XT_DIR not found
ORA-06512: at "SYS.ORACLE_LOADER", line 14
ORA-06512: at line 1

Our EMP_XT external table currently has a single incoming flat-file (emp.dat) and this doesn't
exist. In operational systems, we are more likely to receive new files every day and these are

often uniquely named to distinguish one day's delivery from another (e.g. with a business date of
some format in the file name). The location clause of an external table can be modified to cater
for this without invalidating any dependencies (such as views or packages).
In the following example, we will create an emp_20020804.dat file and modify the EMP_XT
table to reference this new file. We will complete the example by selecting from it.
SQL> BEGIN
  2     data_dump( query_in        => 'SELECT * FROM emp',
  3                file_in         => 'emp_20020804.dat',
  4                directory_in    => 'XT_DIR',
  5                nls_date_fmt_in => 'DD/MM/YYYY' );
  6  END;
  7  /

PL/SQL procedure successfully completed.


SQL> ALTER TABLE emp_xt LOCATION ('emp_20020804.dat');
Table altered.
SQL> SELECT * FROM emp_xt;

     EMPNO ENAME      JOB              MGR HIREDATE          SAL       COMM     DEPTNO
---------- ---------- --------- ---------- ---------- ---------- ---------- ----------
      7369 SMITH      CLERK           7902 17/12/1980        800                    20
      7499 ALLEN      SALESMAN        7698 20/02/1981       1600        300         30
      7521 WARD       SALESMAN        7698 22/02/1981       1250        500         30
      7566 JONES      MANAGER         7839 02/04/1981       2975                    20
      7654 MARTIN     SALESMAN        7698 28/09/1981       1250                    30
      7698 BLAKE      MANAGER         7839 01/05/1981       2850                    30
      7782 CLARK      MANAGER         7839 09/06/1981       2450                    10
      7788 SCOTT      ANALYST         7566 09/12/1982       3000                    20
      7839 KING       PRESIDENT            17/11/1981       5000                    10
      7844 TURNER     SALESMAN        7698 08/09/1981       1500                    30
      7876 ADAMS      CLERK           7788 12/01/1983       1100                    20
      7900 JAMES      CLERK           7698 03/12/1981        950                    30
      7902 FORD       ANALYST         7566 03/12/1981       3000                    20
      7934 MILLER     CLERK           7782 23/01/1982       1300       1400         40

14 rows selected.

investigating errors
External tables can optionally generate various log files in the same manner as SQL*Loader. In
our EMP_XT table, remember that we opted to create a logfile and badfile but didn't need a
discardfile (as we do not have a LOAD WHEN clause). For developers who are unfamiliar with
these three SQL*Loader-style files, their short descriptions are:

logfile: contains information on how the input files were parsed, the positions and error
messages for any rejected records and some other general information on the load, such as
the number of rows successfully read;
badfile: contains the rejected records (i.e. the records that couldn't be loaded for the
reasons given in the logfile); and
discardfile: contains the records that failed the LOAD WHEN clause (this clause is a
simple filter to prevent records with certain data characteristics from being loaded).

To investigate errors with an external table read or load, therefore, we have the same information
available to us as we did with SQL*Loader. It is likely that, when putting together access
parameters for the first few times, we will make mistakes. Oracle will not parse the parameters
when we create the table; rather they will be invoked when we try to read from the external table
(i.e. SELECT from it). Any syntactical errors will show up as a KUP-% message and will need to
be investigated in line with the online documentation (link provided at the end of this article).
Errors with the data, however, can be investigated by reference to the output files.
Note that logfiles are appended on each select from an external table so in a regular batch system
we might wish to uniquely name each output file for support and diagnostic purposes. Oracle
provides two switches to append to the file names in the LOGFILE, BADFILE and
DISCARDFILE clauses. These are %p (process ID) and %a (agent number for parallel query)
and are specified in the access parameters (e.g. LOGFILE 'emp_xt_%p.log'). Assuming each
select from the external table is performed in a new session, each output file is generated with a
new process ID.
It is unfortunate that we cannot use ALTER TABLE directly to modify the output file names as
we are more likely to want to append a date stamp to them (we can't do this because the file
names are part of the ACCESS PARAMETERS clause). We can, however, take advantage of the
fact that we can alter the access parameters to build a utility based somewhat loosely on the
following example. This is a very rough demonstration of the principles we can adopt to make a
log/bad/discardfile naming utility (obviously a "real" utility would be far more flexible and
parameterised). The following anonymous block finds the existing logfile name and replaces it
with "today's" logfile before using the external table.
SQL> DECLARE
  2
  3     v_access  VARCHAR2(32767);
  4     v_replace VARCHAR2(128);
  5     v_oldfile VARCHAR2(128);
  6     v_newfile VARCHAR2(128) := 'emp_20020804.log';
  7
  8  BEGIN
  9
 10     SELECT access_parameters INTO v_access
 11     FROM   user_external_tables
 12     WHERE  table_name = 'EMP_XT';
 13
 14     IF INSTR(v_access,'NOLOGFILE') > 0 THEN
 15        v_access := REPLACE(
 16                       v_access, 'NOLOGFILE',
 17                       'LOGFILE ''' || v_newfile || ''''
 18                       );
 19     ELSE
 20        v_oldfile := SUBSTR(
 21                        v_access,
 22                        INSTR(v_access,'LOGFILE')+8,
 23                        (INSTR(v_access,'.log')+4) - (INSTR(v_access,'LOGFILE')+7)
 24                        );
 25        v_replace := REPLACE(REPLACE(v_oldfile, '"', ''''), '''');
 26        v_access  := REPLACE(v_access, v_replace, v_newfile);
 27     END IF;
 28
 29     EXECUTE IMMEDIATE
 30        'ALTER TABLE emp_xt ACCESS PARAMETERS (' || v_access || ')';
 31
 32  END;
 33  /

PL/SQL procedure successfully completed.
PL/SQL procedure successfully completed.
SQL> SELECT access_parameters
  2  FROM   user_external_tables
  3  WHERE  table_name = 'EMP_XT';

ACCESS_PARAMETERS
------------------------------------------------------------------------
RECORDS DELIMITED by NEWLINE
BADFILE 'emp_xt.bad'
LOGFILE 'emp_20020804.log'
NODISCARDFILE
FIELDS TERMINATED BY ','
( empno
, ename
, job
, mgr
, hiredate CHAR(20) DATE_FORMAT DATE MASK "DD/MM/YYYY"
, sal
, comm
, deptno
)
1 row selected.

SQL> SELECT COUNT(*) FROM emp_xt;

  COUNT(*)
----------
        14

1 row selected.

SQL> host dir d:\oracle\dir\emp*
 Volume in drive D is USER
 Volume Serial Number is 7476-8930

 Directory of d:\oracle\dir

04/08/2002  19:09               633 emp_20020804.dat
04/08/2002  21:54             2,558 emp_20020804.log
04/08/2002  19:04            11,850 emp_xt.log
               3 File(s)         15,041 bytes
               0 Dir(s)  26,696,704,000 bytes free

advantages over sql*loader


We have seen some examples of external table syntax but have not yet explored why we might
use them over SQL*Loader. It is the case that SQL*Loader can parse and load almost any
flat-file we wish to throw at it. External tables, on the other hand, cater for the more common
processing requirements. Despite this, their advantages over SQL*Loader are numerous, some of
which are as follows:

ease of use: external tables can be selected, sorted, filtered, joined, intersected, minused,
unioned and so on using SQL, the language most familiar to Oracle developers.
Anything we can do in a SELECT statement with a "normal" table can be done with an
external table. This makes working with external flat-files very simple;
performance (1): reads/loads involving external tables can be parallelised. When
combined with direct path (in the case of INSERTs), this dramatically outperforms
SQL*Loader which has to load in serial mode. Parallelism can be set at the external table
level or more selectively in hints;

performance (2): as seen above, loading a table from an external table is faster than
SQL*Loader due to parallel query and DML. In addition to this, ETL processes that read
directly from external tables rather than pre-loaded staging tables are faster because they
do not incur the SQL*Loader step (i.e. data is only read once). For example, if an ETL
process takes 10 minutes in SQL*Loader and 10 minutes in SQL or PL/SQL loading,
using external tables directly in the latter process can eradicate up to 50% of the ETL
time;

disk space: external tables do not require any database space; only the space consumed
by the flat-files on the filesystem. SQL*Loader requires two copies of each file (one
inside the database in a staging table), so external tables are "cheaper" on disk;

error-trapping: SQL*Loader returns different codes based on its outcome. For batch
systems, this can be tricky because the error code is sometimes ambiguous. For example,
exit code 2 means "a bad file has been created". Of course, this may or may not be a
cause for concern. If we allow 50 bad records (errors=50), then the fact that a bad file
contains 1-49 bad records should not signal a batch failure. We therefore need to write
some clever code to interrogate the badfile or logfile to determine whether this is a "bad
code 2" or an "acceptable code 2". With external tables, it is much simpler. Until we
reach the REJECT LIMIT, the SQL statement continues or completes successfully. If we
reach this limit, the statement fails;

debugging and support: external tables are equally useful as debugging and support
aids. For example, we can create further external tables over logfiles and badfiles to
investigate errors easily with SQL. DBAs can also create external tables over critical files
such as the alert log.
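
As a rough sketch of the alert log idea (the directory path, instance name and column sizing are illustrative and untested):

```sql
-- Expose the alert log to SQL; path and file name are assumptions
CREATE DIRECTORY bdump_dir AS 'd:\oracle\admin\orcl\bdump';

CREATE TABLE alert_log_xt
( text_line VARCHAR2(4000) )
ORGANIZATION EXTERNAL
(
   TYPE ORACLE_LOADER
   DEFAULT DIRECTORY bdump_dir
   ACCESS PARAMETERS
   (
      RECORDS DELIMITED BY NEWLINE
      NOBADFILE NOLOGFILE NODISCARDFILE
      FIELDS
      MISSING FIELD VALUES ARE NULL
      ( text_line CHAR(4000) )
   )
   LOCATION ('alert_orcl.log')
)
REJECT LIMIT UNLIMITED;
```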

We can see that the advantages of external tables are compelling. In our batch systems, we can
use them in one of two ways and still achieve their benefits. First, and for maximum benefit, we
can completely remove the need for staging tables altogether and use the tables directly in our
ETL code. Second, and for medium benefit, we can use them as a straight replacement for
SQL*Loader and therefore load our staging tables in parallel direct-path.
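The second option might be sketched as follows (EMP_STG is an assumed staging table and the hint degrees are illustrative):

```sql
-- Parallel direct-path load from the external table into a staging table
ALTER SESSION ENABLE PARALLEL DML;

INSERT /*+ APPEND PARALLEL(s,4) */ INTO emp_stg s
SELECT /*+ PARALLEL(x,4) */ *
FROM   emp_xt x;
```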
There are, however, a few issues to be aware of that might limit the scope of how we use these
tables. For example, external tables can only have one location set at a time (i.e. different
sessions cannot share a table but set different locations). This means that their use for loading is
serialised for the length of time a specific location is required and in use (and probably should be
protected as such). If multiple sessions need to share the same table but load different files at the
same time, then either multiple tables must be created or some form of locking will be required.
For the latter scenario, the length of time a table is locked should be reduced to a minimum to
maximise concurrency. For this reason, multi-user external tables will probably not figure in
long-running batch processes.
In addition to this, external tables cannot be indexed. For most staging tables, this won't be an
issue as we would prefer hash joins for ETL batch queries, but in some cases indexes might be
required. For these, it would probably be sensible to use external tables to load the physical
staging tables rather than use them directly in the ETL queries.

external tables and the cbo


Because external tables are likely to be used in multi-table queries (possibly involving other
external tables in addition to base tables, lookup tables etc), it is critical that the optimizer knows
something of their contents. Fortunately, we can compute statistics using DBMS_STATS
(ANALYZE will not work). Note that the use of the word "compute" in the previous sentence
was deliberate as we must gather statistics on the entire flat-file(s) and not a sample of records.

The following example shows the "before and after" effects of gathering statistics on our
EMP_XT external table. Note how we instruct a full computation in DBMS_STATS (using the
estimate_percent parameter).
SQL> set autotrace traceonly explain
SQL> SELECT * FROM emp_xt;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE
   1    0   EXTERNAL TABLE ACCESS (FULL) OF 'EMP_XT'

SQL> BEGIN
  2     DBMS_STATS.GATHER_TABLE_STATS(user,'EMP_XT',estimate_percent=>NULL);
  3  END;
  4  /

PL/SQL procedure successfully completed.

SQL> SELECT * FROM emp_xt;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=14 Bytes=518)
   1    0   EXTERNAL TABLE ACCESS (FULL) OF 'EMP_XT' (Cost=2 Card=14 Bytes=518)

SQL> SELECT num_rows, blocks, last_analyzed
  2  FROM   user_tables
  3  WHERE  table_name = 'EMP_XT';

  NUM_ROWS     BLOCKS LAST_ANALYZED
---------- ---------- -------------
        14          1 04/08/2002

1 row selected.

Most flat-files will remain reasonably static in size or perhaps grow slowly over time. It is
probably sufficient in most cases to gather statistics on a representative file (or set of files) for
each external table once. These can then be locked until such time that a re-gather is deemed
necessary.
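
Note that statistics locking arrived after 9i; from Oracle 10g onwards the gathered statistics could be pinned with DBMS_STATS, for example:

```sql
-- 10g onwards: prevent future gathers from overwriting these statistics
BEGIN
   DBMS_STATS.LOCK_TABLE_STATS(user, 'EMP_XT');
END;
/
```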

generating from sql*loader


Earlier in this article, a number of advantages of external tables over SQL*Loader were listed. It
was also noted that in some cases we might use external tables as a replacement for SQL*Loader
but continue to use staging tables. A new feature of SQL*Loader helps us to develop this
quickly; this is the EXTERNAL_TABLE option. This option generates an external table "load"
script from an existing SQL*Loader control file. Once we have this, the script can be cleaned

and modified as required. For existing loads, this dramatically removes the development time of
external tables and removes much of the time spent learning the new syntax through the
inevitable "trial and error" phases with the access parameters.
In the following example, we'll create a small SQL*Loader control file to load a staging table
named EMP_STG. We will use the new option EXTERNAL_TABLE=GENERATE_ONLY in
the control file and run it to generate a SQL script. Note that invoking SQL*Loader for this file
will not load any data. First, we can see the control file.
options (external_table=generate_only)
load data
infile 'd:\oracle\dir\emp.dat'
badfile 'd:\oracle\dir\emp.bad'
truncate
into table emp_stg
fields terminated by ','
trailing nullcols
(
empno
, ename
, job
, mgr
, hiredate date "dd/mm/yyyy"
, sal
, comm
, deptno
)

Note the options clause. The external_table=generate_only clause makes SQL*Loader run
"silently" and generate a logfile only. SQL*Loader is invoked as normal. The resulting logfile
contains the following SQL statements based on the contents of the control file. These statements
support an external table load to replace our SQL*Loader job.
CREATE DIRECTORY statements needed for files
------------------------------------------------------------------------
CREATE DIRECTORY SYS_SQLLDR_XT_TMPDIR_00000 AS 'd:\oracle\dir\'

CREATE TABLE statement for external table:
------------------------------------------------------------------------
CREATE TABLE "SYS_SQLLDR_X_EXT_EMP_STG"
(
EMPNO NUMBER(4),
ENAME VARCHAR2(10),
JOB VARCHAR2(9),
MGR NUMBER(4),
HIREDATE DATE,
SAL NUMBER(7,2),
COMM NUMBER(7,2),
DEPTNO NUMBER(2)
)
ORGANIZATION external
(

TYPE oracle_loader
DEFAULT DIRECTORY SYS_SQLLDR_XT_TMPDIR_00000
ACCESS PARAMETERS
(
RECORDS DELIMITED BY NEWLINE CHARACTERSET WE8MSWIN1252
BADFILE 'SYS_SQLLDR_XT_TMPDIR_00000':'emp.bad'
LOGFILE 'emp.log_xt'
READSIZE 1048576
FIELDS TERMINATED BY "," LDRTRIM
MISSING FIELD VALUES ARE NULL
REJECT ROWS WITH ALL NULL FIELDS
(
EMPNO CHAR(255)
TERMINATED BY ",",
ENAME CHAR(255)
TERMINATED BY ",",
JOB CHAR(255)
TERMINATED BY ",",
MGR CHAR(255)
TERMINATED BY ",",
HIREDATE CHAR(255)
TERMINATED BY ","
DATE_FORMAT DATE MASK "dd/mm/yyyy",
SAL CHAR(255)
TERMINATED BY ",",
COMM CHAR(255)
TERMINATED BY ",",
DEPTNO CHAR(255)
TERMINATED BY ","
)
)
location
(
'emp.dat'
)
)REJECT LIMIT UNLIMITED
INSERT statements used to load internal tables:
------------------------------------------------------------------------
INSERT /*+ append */ INTO EMP_STG
(
EMPNO,
ENAME,
JOB,
MGR,
HIREDATE,
SAL,
COMM,
DEPTNO
)
SELECT
EMPNO,
ENAME,
JOB,
MGR,
HIREDATE,

SAL,
COMM,
DEPTNO
FROM "SYS_SQLLDR_X_EXT_EMP_STG"
statements to cleanup objects created by previous statements:
------------------------------------------------------------------------
DROP TABLE "SYS_SQLLDR_X_EXT_EMP_STG"
DROP DIRECTORY SYS_SQLLDR_XT_TMPDIR_00000

Oracle has done the work for us and provided a script to create our EMP_XT external table. We
will probably choose to clean up certain elements of the generated code (such as the object
names, for example), but the hard work of converting a SQL*Loader load to an external table
load is done.

more on performance
All external table reads are direct path and with direct path inserts and parallel DML, external
tables will usually be quicker than SQL*Loader (which is serial). We've also seen that ETL
processes involving external tables can be faster because they do not require the "lead-in" time of
loading a staging table first. However, we have not yet looked at external tables compared with
"internal" tables (note that this is how Oracle describes such tables in the SQL*Loader logfile
above). In the following simple example, we will generate 1 million records in a heap table and a
flat-file and compare the time taken to scan this data.
First we will create a large table based on 1 million EMP records as follows.
SQL> CREATE TABLE million_emps
  2  NOLOGGING
  3  AS
  4     SELECT e1.*
  5     FROM   emp e1
  6     ,      emp, emp, emp, emp, emp
  7     WHERE  ROWNUM <= 1000000;

Table created.

We will write these records to a flat-file for our existing EMP_XT table to use.
SQL> BEGIN
  2     data_dump( query_in        => 'SELECT * FROM million_emps',
  3                file_in         => 'million_emps.dat',
  4                directory_in    => 'XT_DIR',
  5                nls_date_fmt_in => 'DD/MM/YYYY' );
  6  END;
  7  /

PL/SQL procedure successfully completed.


SQL> ALTER TABLE emp_xt LOCATION ('million_emps.dat');

Table altered.

Using the wall-clock and autotrace, we will compare a simple fetch of the data from both the
MILLION_EMPS and the EMP_XT tables, starting with the "internal" table.
SQL> set timing on
SQL> set autotrace traceonly statistics
SQL> SELECT * FROM million_emps;

1000000 rows selected.

Elapsed: 00:00:18.05

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      72337  consistent gets
       5675  physical reads
          0  redo size
   13067451  bytes sent via SQL*Net to client
     733825  bytes received via SQL*Net from client
      66668  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
    1000000  rows processed

Next we can test the EMP_XT external table.


SQL> SELECT * FROM emp_xt;

1000000 rows selected.

Elapsed: 00:00:22.08

Statistics
----------------------------------------------------------
        192  recursive calls
          0  db block gets
       6282  consistent gets
          0  physical reads
          0  redo size
   13067451  bytes sent via SQL*Net to client
     733825  bytes received via SQL*Net from client
      66668  SQL*Net roundtrips to/from client
          6  sorts (memory)
          0  sorts (disk)
    1000000  rows processed

We can see that the external table is approximately 22% slower than the internal table on a single
read. If we re-run the fetches several times, similar timings are recorded. Note that autotrace does
not show any physical reads in its statistics for the external table (this is possibly a bug).

projection
Finally for this introduction, we will look at column projection and how it can affect the results
of an external table read. In the following example, we will modify the EMP_XT table to
shrink the maximum size of the ENAME column. When we select from the table, this modified
column will be the source of some bad records (as the data will be too wide). We will then select
from the EMP_XT table but without the "bad column" and compare the record counts.
First we will modify the ENAME column and set the location of EMP_XT to use the original
small flat-file.
SQL> ALTER TABLE emp_xt MODIFY ename VARCHAR2(5);
Table altered.
SQL> ALTER TABLE emp_xt LOCATION ('emp_20020804.dat');
Table altered.

We will now attempt to read all of the data from the flat-file as follows.
SQL> SELECT * FROM emp_xt;

     EMPNO ENAME JOB              MGR HIREDATE          SAL       COMM     DEPTNO
---------- ----- --------- ---------- ---------- ---------- ---------- ----------
      7369 SMITH CLERK           7902 17/12/1980        800                    20
      7499 ALLEN SALESMAN        7698 20/02/1981       1600        300         30
      7521 WARD  SALESMAN        7698 22/02/1981       1250        500         30
      7566 JONES MANAGER         7839 02/04/1981       2975                    20
      7698 BLAKE MANAGER         7839 01/05/1981       2850                    30
      7782 CLARK MANAGER         7839 09/06/1981       2450                    10
      7788 SCOTT ANALYST         7566 09/12/1982       3000                    20
      7839 KING  PRESIDENT            17/11/1981       5000                    10
      7876 ADAMS CLERK           7788 12/01/1983       1100                    20
      7900 JAMES CLERK           7698 03/12/1981        950                    30
      7902 FORD  ANALYST         7566 03/12/1981       3000                    20

11 rows selected.

Knowing the EMP table as well as we all do, we can see that 3 records are missing (the bad and
log files verify this as being related to ENAME data). What is interesting about external tables is
that we have seemingly erratic results depending on how we fetch from them. For example, if we
run a COUNT(*) as follows, we see that we have 14 records in our file as expected and not 11 as
fetched in the previous example.
SQL> SELECT COUNT(*) FROM emp_xt;

  COUNT(*)
----------
        14

1 row selected.

These supposedly erratic results are not, in fact, erratic at all. This behaviour is related to column
projection. Oracle will not bother to project data unless it is actually required (it would be wasted
effort). With EMP_XT in its current format, if we don't ask Oracle to project the ENAME
column, we won't encounter any bad data.
We must also consider parsing and the difference between a field and a column (we have both in
an external table). In our example, Oracle can still parse the bad data. This is because our access
parameters clause defaults the ENAME field in the file to CHAR(255). This is easily wide
enough for all ENAME values in our flat-file. The ENAME column, however, is now defined as
VARCHAR2(5), so it is only when Oracle moves from parsing into reading and projecting that
errors are encountered. In other words, Oracle can parse every record of the ENAME field into a
CHAR(255), but it can't squeeze every record into a VARCHAR2(5) during projection.
To continue, as COUNT(*) is a special case, we should validate the projection behaviour with a
specific column. We will use COUNT(empno) as follows (EMPNO does not have any bad data).
SQL> SELECT COUNT(empno) FROM emp_xt;

COUNT(EMPNO)
------------
          14

1 row selected.

Finally, if we project the ENAME column, we see the effects of the bad data once more.
SQL> SELECT COUNT(ename) FROM emp_xt;

COUNT(ENAME)
------------
          11

1 row selected.

Incidentally, we haven't seen the effects of a violated reject limit as yet in this article. Seeing as
we have manufactured some bad data for the projection example, we can easily demonstrate this.

In the final example, we will set a 0 reject limit for EMP_XT and issue a SELECT against the
table.
SQL> ALTER TABLE emp_xt REJECT LIMIT 0;
Table altered.
SQL> SELECT * FROM emp_xt;
SELECT * FROM emp_xt
*
ERROR at line 1:
ORA-29913: error in executing ODCIEXTTABLEFETCH callout
ORA-30653: reject limit reached
ORA-06512: at "SYS.ORACLE_LOADER", line 14
ORA-06512: at line 1

Rather than a KUP error message, this time we get a meaningful, precise Oracle error that tells us
the issue with our external table read.

further reading
For more information on external tables, see Part III of the Utilities guide in the online
documentation.

source code
The source code for the examples in this article can be downloaded from here.