14 Feb 2016.
Oracle Articles
Designing Efficient ETL Processes in Oracle:
Introduction:
Data conversion (the process of migrating application data) is a major component of many
database-related projects. This process essentially entails three sub-processes: 1) Extracting data
from the source system, 2) Transforming data to the format required by the target system and
3) Loading data into the target system; hence the abbreviation ETL. Oracle offers a variety of
database features for implementing ETL processes. Additionally, there is the relatively new Oracle
Warehouse Builder, a full-featured ETL tool. There are also other tools by third-party vendors, such
as Informatica. However, despite this recent proliferation of ETL tools, many Oracle-based data
migration projects continue to use home-grown programs (mainly PL/SQL) for ETL. This is mainly
because smaller projects cannot justify the cost and learning time required for effective use of
commercial ETL tools. Besides, doing it oneself offers much more control over the process.
As with any programming project, there are efficient and (infinitely more) inefficient ways to design
and implement ETL processes. This article offers some tips for the design of efficient ETL programs
in an Oracle environment. The tips are listed in no particular order. I hope you find them helpful in
your projects.
Tip 1: Use external tables to load data from external flat files
The traditional workhorse for loading external (flat file) data into Oracle is the SQL*Loader utility.
Since version 9i, external tables offer an easier, and almost as performant, route to load data.
Essentially, external tables enable us to access data in an external file via an Oracle table. Here's a
simple example:
The following data is available in a text file called test.txt:
scott,cio
kailash,dba
jim,developer
We want to pull this data into Oracle using an external table. The first step is to create a directory
object in Oracle, which points to the operating system directory in which the above file resides.
Assume this is an operating system directory called c:\test. The SQL to make this directory
accessible from Oracle is:
create or replace directory test_dir as 'c:\test';
The user must have the create any directory privilege in order to execute this SQL. We then define
an external table based on our text file:
create table ext_test (
  name varchar2(10),
  job varchar2(10))
organization external
  (type oracle_loader
   default directory test_dir
   access parameters (fields terminated by ',')
   location (test_dir:'test.txt'))
reject limit 0;
It would take us too far afield to go into each of the clauses in the above statement. Check the
Oracle docs for further details. The relevant book is the Oracle Utilities Guide. There are also
several third-party articles available online; do a Google search on "oracle external table" to locate
some of these.
To access the external data we simply issue the following SQL statement from within Oracle:
select * from ext_test;
There are several access parameter options available to handle more complicated record formats;
check the Oracle documentation for details.
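As a rough sketch of how a few of those access parameters fit together, the definition below adds explicit record and error-file handling to the example above and then copies the rows into an ordinary table. The names ext_test2, test.bad, test.log and emp_jobs are hypothetical, chosen only for illustration; check the Utilities Guide for the exact options your version supports.

```sql
-- Hypothetical variant of ext_test with more explicit access parameters.
-- Rejected records go to test.bad, the load log to test.log, and records
-- with missing trailing fields yield NULL columns instead of being rejected.
create table ext_test2 (
  name varchar2(10),
  job  varchar2(10))
organization external
  (type oracle_loader
   default directory test_dir
   access parameters
     (records delimited by newline
      badfile test_dir:'test.bad'
      logfile test_dir:'test.log'
      fields terminated by ','
      missing field values are null)
   location ('test.txt'))
reject limit unlimited;

-- Hypothetical target table; the load itself is a single set-based statement.
create table emp_jobs (
  name varchar2(10),
  job  varchar2(10));

insert into emp_jobs (name, job)
select
  trim(name),
  lower(trim(job))
from
  ext_test2;

commit;
```

Because the external table behaves like any other row source, light cleansing (here trim and lower) can be done in the same statement that loads the data.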
Tip 2: Use set-based operations to process data
Most ETL processes deal with large numbers of records. Each record must be cleaned at the field
level and then loaded into the target system. Programmers unfamiliar with SQL tend to use
procedural methods ("for" loops, for example) to deal with such situations. With some thought,
however, it is often possible to construct an SQL statement that is equivalent to the procedural
technique. The method using SQL should be preferred, as SQL processes the entire batch of records
as a single set. Provided the statement isn't too complex, the SQL statement will generally run
faster than the equivalent procedural code. Here's an (admittedly contrived) example to illustrate
the point:
-- execute from SQL*Plus
set timing on
create table test (
  id number,
  name varchar2(128));
alter table test add constraint pk_test primary key (id);
-- procedural version: one insert per row fetched from the cursor
declare
  cursor c_test is
    select
      object_id,
      object_name
    from
      all_objects;
begin
  for r_test in c_test loop
    insert into test
      (id, name)
    values
      (r_test.object_id, lower(r_test.object_name));
  end loop;
  commit;
end;
.
/
truncate table test;
-- set-based version: a single insert ... select handles all rows
declare
begin
  insert into test (id, name)
  select
    object_id,
    lower(object_name)
  from
    all_objects;
  commit;
end;
.
/
You may need a larger number of rows to see an appreciable difference between the two methods.
The advantage of the SQL-based technique becomes apparent with greater data volumes.
Here are some tips that may help in converting procedural code to SQL:
1. Replace procedural control statements with SQL decode or case: Often a procedural statement
such as if....then....else can be replaced by a single SQL statement that uses the decode
function or a case expression. For simple conditionals the SQL option will almost always
outperform procedural code. The decode function is useful when your branching criterion is
based on discrete values, for example "Male" or "Female". The case expression would be
preferred if the criterion is based on a continuous range, e.g. any criterion that uses > or <.
Here's an example of the two in action:
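-- DECODE: classify by gender
select
  ename, decode(gender, 'M', 'Male', 'F', 'Female', 'Unknown')
from
  employees;
-- CASE: is the current day a weekend or a weekday?
-- The day abbreviation is tested rather than the 'D' day number, because the
-- numbering of 'D' depends on the session's NLS_TERRITORY setting.
select
  case when to_char(sysdate, 'DY', 'nls_date_language=english') in ('SAT', 'SUN') then
    'Weekend'
  else
    'Weekday'
  end as day_type
from
  dual;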
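To illustrate the range-based case mentioned above, here is a minimal sketch that bands salaries in the EMP table of the SCOTT demo schema. The thresholds and band names are arbitrary assumptions made for the example.

```sql
-- Searched CASE on a continuous range; thresholds are illustrative only.
-- The NULL test comes first because comparisons with NULL are never true,
-- so a NULL salary would otherwise fall through to the ELSE branch.
select
  ename,
  sal,
  case
    when sal is null then 'Unknown'
    when sal < 1500 then 'Low'
    when sal < 3000 then 'Medium'
    else 'High'
  end as sal_band
from
  emp;
```

The branches are evaluated in order, so each row lands in the first band whose condition it satisfies.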
2. Pivoting: A common requirement of ETL processes is to pivot a table about a particular column,
i.e. to convert rows to columns based on values in a specific column. This is easily achieved
using the decode function. Consider the following example: we want to flatten out data in a
table called SALES based on the MONTH column (which contains numbers ranging from 1 to
12). The SALES table has the following columns: REGION, YEAR, MONTH and SALES_DOLLARS
(primary key REGION, YEAR and MONTH). The sales data is to be flattened out, with each
month's dollars appearing in a separate column: JAN_SALES, FEB_SALES .... DEC_SALES.
Here's how this can be done:
select
  region,
  year,
  min(decode(month, 1, sales_dollars)) jan_sales,
  min(decode(month, 2, sales_dollars)) feb_sales,
  min(decode(month, 3, sales_dollars)) mar_sales,
  min(decode(month, 4, sales_dollars)) apr_sales,
  min(decode(month, 5, sales_dollars)) may_sales,
  min(decode(month, 6, sales_dollars)) jun_sales,
  min(decode(month, 7, sales_dollars)) jul_sales,
  min(decode(month, 8, sales_dollars)) aug_sales,
  min(decode(month, 9, sales_dollars)) sep_sales,
  min(decode(month, 10, sales_dollars)) oct_sales,
  min(decode(month, 11, sales_dollars)) nov_sales,
  min(decode(month, 12, sales_dollars)) dec_sales
from
  sales
group by
  region,
  year;
The grouping is required to collapse twelve records for the region / year into a single record.
Note that you can use any grouping function (AVG, MAX, SUM, for example) if all key columns
other than MONTH are to be preserved in the flattened table, because each group then holds at
most one row per month. If this isn't so, you will have to use SUM, as you will be summing over
several records for a fixed month. In either case you should be sure you understand what
you're getting!
3. "Unpivoting": This is the converse of the above. Here we want to move data from several
columns into a single column, based on some criterion. Say we have a table called FLAT_SALES
with columns REGION, YEAR, JAN_SALES, .... DEC_SALES (produced by the above query). An
ETL process requires us to get the data in normalised format, i.e. columns REGION, YEAR,
MONTH, SALES_DOLLARS, as in the original SALES table. Here's a query that will achieve this:
select
  t1.region region,
  t1.year year,
  t2.r month,
  decode(t2.r,
    1, jan_sales,
    2, feb_sales,
    3, mar_sales,
    4, apr_sales,
    5, may_sales,
    6, jun_sales,
    7, jul_sales,
    8, aug_sales,
    9, sep_sales,
    10, oct_sales,
    11, nov_sales,
    12, dec_sales) sales_dollars
from
  flat_sales t1,
  (select
    rownum r
  from
    all_objects
  where
    rownum <= 12) t2;
The table t2 serves to produce 12 rows for each original row via a cartesian product (no joins).
The query uses a decode on rownum to pick out the sales value for each month from the
flattened table. Note that we could have used any table or view with 12 or more rows in the
definition of t2. I chose ALL_OBJECTS as this view is generally available to all schemas, and
contains a large number of rows.
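As an alternative sketch of the row source t2, a widely used idiom generates the twelve rows from DUAL with a hierarchical query instead of reading ALL_OBJECTS. This is a swapped-in technique, not the one used above, and it assumes a reasonably recent Oracle release (some older releases restrict it); the rest of the query is unchanged.

```sql
-- Same unpivot as above, but t2 is generated from DUAL with CONNECT BY LEVEL
-- rather than being read from ALL_OBJECTS.
select
  t1.region region,
  t1.year year,
  t2.r month,
  decode(t2.r,
    1, t1.jan_sales,
    2, t1.feb_sales,
    3, t1.mar_sales,
    4, t1.apr_sales,
    5, t1.may_sales,
    6, t1.jun_sales,
    7, t1.jul_sales,
    8, t1.aug_sales,
    9, t1.sep_sales,
    10, t1.oct_sales,
    11, t1.nov_sales,
    12, t1.dec_sales) sales_dollars
from
  flat_sales t1,
  (select
    level r
  from
    dual
  connect by
    level <= 12) t2
order by
  t1.region,
  t1.year,
  t2.r;
```

The generator does not depend on how many objects happen to be visible to the schema, which makes the intent (exactly twelve rows) explicit.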
4. Using built-in functions: One can achieve fairly complex data transformations using Oracle's
character SQL functions. The nice thing about these functions is that they can be embedded in
SQL, so you get the full advantage of set-based operations. The usual suspects include:
LENGTH, LOWER, LPAD, REPLACE, RPAD, SUBSTR, UPPER, (L)(R)TRIM. Some "lesser known"
functions I have found useful in my ETL efforts are:
CHR: Returns the character associated with the specified ASCII code. This is useful when
you want to remove non-printable characters such as carriage return (CR) and linefeed
(LF) from input strings. You would do this using CHR in tandem with REPLACE like so:
REPLACE(input_string, CHR(13)||CHR(10)), 13 and 10 being the ASCII codes for CR and
LF respectively.
INITCAP: This capitalizes the first letter in each word of the input string. Every other letter
is lowercased.
INSTR: Returns the location of a specified string within the input string.
All these functions have a range of invocation options. Check the Oracle SQL documentation for
full details.
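A quick way to get a feel for these three functions is to run them against literals from DUAL. The sketch below uses made-up input strings; the expected results are noted in the comments.

```sql
select
  -- replace each CR/LF pair with a single space: 'line one line two'
  replace('line one' || chr(13) || chr(10) || 'line two',
          chr(13) || chr(10), ' ') as no_crlf,
  -- first letter of each word upper case, the rest lower case:
  -- 'Oracle Warehouse Builder'
  initcap('oRACLE warehouse BUILDER') as init_cap,
  -- position of the first comma: 6
  instr('scott,cio', ',') as comma_pos,
  -- INSTR combined with SUBSTR to pull out the first field: 'scott'
  substr('scott,cio', 1, instr('scott,cio', ',') - 1) as first_field
from
  dual;
```

Note that REPLACE with only two arguments removes the search string, while the three-argument form used here substitutes a replacement for it.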
5. Analytics: Since version 8i, Oracle has introduced analytical extensions to SQL. These enable
one to do a range of procedural operations using SQL (yes, you read that right: procedural
operations using SQL). Some operations possible using analytical SQL include: subgrouping
(different levels of grouping within a statement), ranking and other statistics over subsets of
data, comparing values in different rows, and presenting summary and detailed data using a
single SQL statement. Check out my article on analytic SQL to find out more.
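As a small sketch of the operations listed above, the query below runs against the EMP table of the SCOTT demo schema. The choice of columns and aliases is mine, purely for illustration.

```sql
select
  deptno,
  ename,
  sal,
  -- ranking within a subset of the data (each department)
  rank() over (partition by deptno order by sal desc) as sal_rank,
  -- summary and detail in the same row: department total next to each salary
  sum(sal) over (partition by deptno) as dept_total,
  -- comparing values in different rows: gap to the next lower salary in the
  -- department (NULL for the lowest-paid employee)
  sal - lag(sal) over (partition by deptno order by sal) as gap_to_next_lower
from
  emp
order by
  deptno,
  sal desc;
```

Doing the same procedurally would need a cursor loop with variables to remember the previous row and running totals per department.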
Tom Kyte's site is a great resource for tips on converting procedural code to SQL; try searching on
the keywords "ETL" and "analytic functions" for a start.
OK, after extolling the virtues of using SQL we have to admit that there are situations in which
procedural code becomes unavoidable. In such cases you can still take advantage of set-based
processing by using bulk binding within your procedural code. Sections of the Oracle documentation
on bulk binding are available here and here (free registration required). An introductory article on
bulk binding is available here (dbasupport.com).
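For orientation, here is a minimal sketch of bulk binding applied to the TEST table from the example earlier in this tip. It assumes TEST is empty when the block runs; the batch size of 1000 and the variable names are arbitrary choices for the example.

```sql
declare
  type t_ids   is table of test.id%type   index by pls_integer;
  type t_names is table of test.name%type index by pls_integer;
  l_ids   t_ids;
  l_names t_names;
  cursor c_test is
    select
      object_id,
      object_name
    from
      all_objects;
begin
  open c_test;
  loop
    -- fetch up to 1000 rows per round trip instead of one row at a time
    fetch c_test bulk collect into l_ids, l_names limit 1000;
    exit when l_ids.count = 0;
    -- stand-in for transformation logic that has to be procedural
    for i in 1 .. l_ids.count loop
      l_names(i) := lower(l_names(i));
    end loop;
    -- send the whole batch to the SQL engine in a single call
    forall i in 1 .. l_ids.count
      insert into test (id, name)
      values (l_ids(i), l_names(i));
  end loop;
  close c_test;
  commit;
end;
/
```

The limit clause keeps memory use bounded; without it, bulk collect would load every row of the cursor into the collections at once.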
Tip 3: TRUNCATE tables when deleting all data
Deleting all data from a table is best done using the truncate statement. A truncate is a very quick
operation: it runs faster than a delete because it simply resets the table's high water mark to zero.
Indexes are also truncated along with the table. A truncate is DDL and therefore cannot be rolled
back; be sure you consider this before using truncate. Note that tables referenced by enabled
foreign key constraints cannot be truncated unless the keys are disabled first. Here's a block of
code that uses truncate to clean out the SALES table (which is referenced by enabled foreign key
constraints):
declare
  -- foreign key constraints (type 'R') that reference a key on SALES
  cursor c_referenced_by is
    select
      t1.constraint_name constraint_name,
      t1.table_name table_name
    from
      user_constraints t1,
      user_constraints t2
    where
      t1.constraint_type = 'R'
    and
      t1.r_constraint_name = t2.constraint_name
    and
      t2.table_name = 'SALES';
begin
  for r_referenced_by in c_referenced_by loop
    execute immediate
      'alter table ' || r_referenced_by.table_name || ' disable constraint '
      || r_referenced_by.constraint_name;
  end loop;
  execute immediate
    'truncate table sales';
end;
/
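The block above leaves the referencing constraints disabled. A sketch of the matching re-enable step follows; it is meant to be run once SALES has been reloaded, since enabling a foreign key fails if the child tables hold rows with no matching parent. It also assumes that every disabled constraint it finds was disabled by the block above, which may not hold in your schema.

```sql
declare
  -- disabled foreign key constraints that reference a key on SALES
  cursor c_referenced_by is
    select
      t1.constraint_name constraint_name,
      t1.table_name table_name
    from
      user_constraints t1,
      user_constraints t2
    where
      t1.constraint_type = 'R'
    and
      t1.r_constraint_name = t2.constraint_name
    and
      t2.table_name = 'SALES'
    and
      t1.status = 'DISABLED';
begin
  for r_referenced_by in c_referenced_by loop
    execute immediate
      'alter table ' || r_referenced_by.table_name || ' enable constraint '
      || r_referenced_by.constraint_name;
  end loop;
end;
/
```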
Tip 4: Use direct path inserts for loading data
A direct path insert offers a quick way to load large volumes of data into a table. It does this by
bypassing the buffer cache and writing directly to datafiles. A direct path insert is done using the
insert into ... select ... idiom, together with an append hint. The append hint causes the rows to be
inserted above the table's high water mark; any free space below the high water mark is not used.
Here is the syntax for a direct path insert into SALES from SALES_STAGE:
insert /*+ append */ into
  sales
  (region,
  year,
  month,
  sales_dollars)
select
  region,
  year,
  month,
  sales_dollars
from
  sales_stage;
A direct path insert by itself is faster than a conventional insert. Just how much mileage one gets
depends on the volume of data loaded. The performance of direct path inserts can be further
enhanced by doing the following before the load:
1. Bypassing constraint checks by disabling all constraints. Remember to re-enable these after the
load is done.
2. Suppressing redo generation by putting the table in nologging mode. If your table has a large
number of non-unique indexes, you might also consider setting these to unusable state. This
will suppress redo generation associated with index maintenance. The indexes can then be
rebuilt after the load. Note that unique indexes should not be set unusable because the load will
fail with an ORA-26026: unique index initially in unusable state error. Unique indexes can,
however, be dropped before and recreated after the load. This is, in effect, what happens when
the primary key constraint is disabled (as discussed in the previous point): the unique index
associated with the primary key is dropped.
An important consequence of suppressing redo is that the operation is unrecoverable. Be sure to
coordinate such operations with your DBA so that she can schedule a backup of the database or
relevant tablespaces after the load.
Here's a block of code that bypasses constraint checks and suppresses redo generation for the
above insert:
declare
  cursor c_constraints is
    select
      table_name,
      constraint_name
    from
      user_constraints
    where
      table_name = 'SALES';
  cursor c_indexes is
    select
      index_name
    from
      user_indexes
    where
      table_name = 'SALES'
    and
      uniqueness <> 'UNIQUE';
begin
  for r_constraints in c_constraints loop
    execute immediate
      'alter table ' || r_constraints.table_name || ' disable constraint '
      || r_constraints.constraint_name;
  end loop;
  -- optional: truncate to clean out the table
  -- disable referencing constraints, if needed (see tip 3)
  execute immediate
    'truncate table sales';
  for r_indexes in c_indexes loop
    execute immediate
      'alter index ' || r_indexes.index_name || ' unusable';
  end loop;
  execute immediate
    'alter session set skip_unusable_indexes=true';
  execute immediate
    'alter table sales nologging';
  insert /*+ append */ into
    sales
    (region,
    year,
    month,
    sales_dollars)
  select
    region,
    year,
    month,
    sales_dollars
  from
    sales_stage;
  execute immediate
    'alter session set skip_unusable_indexes=false';
  -- the rebuild is DDL, so it implicitly commits the direct path insert
  for r_indexes in c_indexes loop
    execute immediate
      'alter index ' || r_indexes.index_name || ' rebuild nologging';
  end loop;
  execute immediate 'alter table sales logging';
  for r_constraints in c_constraints loop
    execute immediate
      'alter table ' || r_constraints.table_name || ' enable constraint '
      || r_constraints.constraint_name;
  end loop;
  commit;
end;
/
For small data volumes, the overhead of disabling constraints and indexes, and making the table
nologging, will swamp the benefits gained. In general, the larger the load, the greater the benefit of
the foregoing actions. As always, benchmark before implementation in your loads.
The above code uses a lot of dynamic SQL, so it can be heavy on database resources. The use of
dynamic SQL is unavoidable because we need to perform the operations within a module (PL/SQL
block, procedure or package). However, this should not cause performance problems for business
users because ETL batch processes normally run during off-peak hours.
Tip 5: Use the MERGE command for upserts
A common requirement is the need to perform an "upsert", i.e. update a row if it exists, insert it if it
doesn't. In Oracle (versions 9i and later) this can be done in one step using the MERGE statement.
Here's an example that uses the EMP table of the SCOTT schema. The example uses a table
EMP_STAGE that holds updated and new records that are to be upserted into EMP. You need to be
logged into the SCOTT schema (or any other schema with a copy of SCOTT.EMP) in order to run the
example:
-- create EMP_STAGE
create table emp_stage as select * from emp where 1=2;
-- insert update records in EMP_STAGE
insert into emp_stage values
(7369,'SMITH','CLERK',7902,to_date('17-12-1980','dd-mm-yyyy'),1800,NULL,20);
insert into emp_stage values
(7499,'ALLEN','SALESMAN',7698,to_date('20-2-1981','dd-mm-yyyy'),2200,300,30);
insert into emp_stage values
(7521,'WARD','SALESMAN',7698,to_date('22-2-1981','dd-mm-yyyy'),1250,500,30);
insert into emp_stage values
(7839,'KING','PRESIDENT',NULL,to_date('17-11-1981','dd-mm-yyyy'),9500,NULL,10);
insert into emp_stage values
(7782,'CLARK','MANAGER',7839,to_date('9-6-1981','dd-mm-yyyy'),8500,NULL,10);
-- insert new records in EMP_STAGE
insert into emp_stage values
(7940,'WEBSTER','DBA',7782,to_date('23-1-1985','dd-mm-yyyy'),7000,NULL,10);
insert into emp_stage values
(7945,'HAMILL','DEVELOPER',7782,to_date('21-5-1985','dd-mm-yyyy'),6000,NULL,10);
insert into emp_stage values
(7950,'PINCHON','ANALYST',7782,to_date('20-10-1985','dd-mm-yyyy'),6000,NULL,10);
commit;
-- MERGE records into EMP
merge into
  emp e
using
  emp_stage es
on
  (e.empno = es.empno)
when matched then
  update
  set
    e.sal = es.sal
when not matched then
  insert
    (e.empno,
    e.ename,
    e.job,
    e.mgr,
    e.hiredate,
    e.sal,
    e.comm,
    e.deptno)
  values
    (es.empno,
    es.ename,
    es.job,
    es.mgr,
    es.hiredate,
    es.sal,
    es.comm,
    es.deptno);
commit;
Three rows are inserted and five existing rows are updated by the above.
In Oracle 10g one can also add conditional clauses to the insert and update portions of the merge
statement. For example:
-- MERGE records into EMP, except for DEVELOPERS
merge into
  emp e
using
  emp_stage es
on
  (e.empno = es.empno)
when matched then
  update
  set
    e.sal = es.sal
when not matched then
  insert
    (e.empno,
    e.ename,
    e.job,
    e.mgr,
    e.hiredate,
    e.sal,
    e.comm,
    e.deptno)
  values
    (es.empno,
    es.ename,
    es.job,
    es.mgr,
    es.hiredate,
    es.sal,
    es.comm,
    es.deptno)
  where
    es.job <> 'DEVELOPER';
In this case the record for employee 7945 is not inserted.
The conditional clause can be in the update portion as well. In 10g it is also possible to delete rows
from the destination table based on conditional criteria. Check the documentation for details.
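As a sketch of what those 10g options look like together, the statement below adds a condition to the update and a delete clause to the earlier merge. The conditions themselves (only apply raises, drop clerks) are invented for illustration; try it in a test schema first.

```sql
-- MERGE with a conditional update and a delete clause (10g and later)
merge into
  emp e
using
  emp_stage es
on
  (e.empno = es.empno)
when matched then
  update
  set
    e.sal = es.sal
  where
    es.sal > e.sal          -- update only if the staged salary is higher
  delete where
    e.job = 'CLERK'         -- then remove updated rows that are clerks
when not matched then
  insert
    (e.empno, e.ename, e.job, e.mgr,
    e.hiredate, e.sal, e.comm, e.deptno)
  values
    (es.empno, es.ename, es.job, es.mgr,
    es.hiredate, es.sal, es.comm, es.deptno)
  where
    es.job <> 'DEVELOPER';
```

Only rows that were actually updated by the merge are candidates for the delete clause; matched rows that fail the update condition are left untouched.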
Tip 6: Use heterogeneous services to access data in non-Oracle relational databases
Database links are often used to transfer small volumes of data between Oracle databases. It is
less well known that database links can also be set up between Oracle and non-Oracle databases.
This is a useful feature, as ETL processes often need to access and transfer data to Oracle from
third-party databases such as MS SQL Server. The standard way to do this is by exporting data from
the non-Oracle database to a flat file, and then importing the data into Oracle via SQL*Loader or
external tables. Oracle Heterogeneous Services provides a single-step option to achieve the transfer.
Heterogeneous Services come in two flavours:
1. Generic Heterogeneous Services: This option, which is bundled with the Oracle Server, uses
ODBC to connect to the non-Oracle database. A tutorial on accessing SQL Server using Generic
Heterogeneous Services is available here.
2. Transparent Gateways: These are extra-cost options that are optimised for specific databases.
They offer better performance compared to the Generic option because they are designed to
exploit optimisations and features specific to particular databases. The mechanics of setting up
Transparent Gateways are quite similar to the Generic option. Check the documentation that
comes with the specific Transparent Gateway for further details. The documentation is hidden
away in the relevant gateway installation directory. For example, the documentation for the MS
SQL Server gateway sits in ORACLE_HOME/tg4msql. A tutorial on accessing SQL Server using
Transparent Gateways is available here.
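Once either flavour has been configured, the non-Oracle database is reached through an ordinary database link. The sketch below assumes that the listener and Heterogeneous Services configuration are already in place; the link name, credentials, TNS alias, and table and column names are all hypothetical.

```sql
-- Hypothetical link to a SQL Server database; 'MSSQL_ALIAS' stands for a TNS
-- alias that points at the Heterogeneous Services agent or gateway.
create database link mssql_link
  connect to "etl_user" identified by "etl_password"
  using 'MSSQL_ALIAS';

-- Remote names are quoted because the non-Oracle side is usually case-sensitive.
select
  count(*)
from
  "customers"@mssql_link;

-- Single-step transfer into a (hypothetical) Oracle staging table.
insert into customers_stage
  (customer_id,
  customer_name)
select
  "customer_id",
  "customer_name"
from
  "customers"@mssql_link;

commit;
```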
A warning on data transfer techniques that use database links: the efficiency of the transfer
depends on your network bandwidth and the quantity of data to be transferred. I have used it (with
great success) to transfer small to moderate quantities of data (~100000 rows, row size ~1000
bytes) over co-located machines within corporate networks.
Closing Remarks
ETL processes present a technical challenge as they entail complex transformations and loads of
large quantities of data within ever-shrinking time windows. As developers we need to use all the
tricks in the book to speed up our ETL procedures. In this article I've outlined some of the
techniques that I have used, with success, in several projects. I hope you find them useful in your
work.
This page last modified on: 02/14/2016 10:15:27
Copyright: Kailash Awati & Arati Apte, 2000-2015.