Student Guide
Trademarks
The product or products described in this book are licensed products of Teradata Corporation or its affiliates.

Teradata, Applications-Within, Aster, BYNET, Claraview, DecisionCast, Gridscale, MyCommerce, QueryGrid, SQL-MapReduce, Teradata Decision Experts, "Teradata Labs" logo, Teradata ServiceConnect, Teradata Source Experts, WebAnalyst, and Xkoto are trademarks or registered trademarks of Teradata Corporation or its affiliates in the United States and other countries.

Adaptec and SCSISelect are trademarks or registered trademarks of Adaptec, Inc.

Amazon Web Services, AWS, [any other AWS Marks used in such materials] are trademarks of Amazon.com, Inc. or its affiliates in the United States and/or other countries.

AMD Opteron and Opteron are trademarks of Advanced Micro Devices, Inc.

Apache, Apache Avro, Apache Hadoop, Apache Hive, Hadoop, and the yellow elephant logo are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.

Apple, Mac, and OS X are registered trademarks of Apple Inc.

Axeda is a registered trademark of Axeda Corporation. Axeda Agents, Axeda Applications, Axeda Policy Manager, Axeda Enterprise, Axeda Access, Axeda Software Management, Axeda Service, Axeda ServiceLink, and Firewall-Friendly are trademarks, and Maximum Results and Maximum Support are servicemarks, of Axeda Corporation.

CENTOS is a trademark of Red Hat, Inc., registered in the U.S. and other countries.

Cloudera, CDH, [any other Cloudera Marks used in such materials] are trademarks or registered trademarks of Cloudera Inc. in the United States and in jurisdictions throughout the world.

Data Domain, EMC, PowerPath, SRDF, and Symmetrix are registered trademarks of EMC Corporation.

GoldenGate is a trademark of Oracle.

Hewlett-Packard and HP are registered trademarks of Hewlett-Packard Company.

Hortonworks, the Hortonworks logo, and other Hortonworks trademarks are trademarks of Hortonworks Inc. in the United States and other countries.

Intel, Pentium, and XEON are registered trademarks of Intel Corporation.

IBM, CICS, RACF, Tivoli, and z/OS are registered trademarks of International Business Machines Corporation.

Linux is a registered trademark of Linus Torvalds.

LSI is a registered trademark of LSI Corporation.

Microsoft, Active Directory, Windows, Windows NT, and Windows Server are registered trademarks of Microsoft Corporation in the United States and other countries.

NetVault is a trademark or registered trademark of Dell Inc. in the United States and/or other countries.

Novell and SUSE are registered trademarks of Novell, Inc., in the United States and other countries.

Oracle, Java, and Solaris are registered trademarks of Oracle and/or its affiliates.

QLogic and SANbox are trademarks or registered trademarks of QLogic Corporation.

Quantum and the Quantum logo are trademarks of Quantum Corporation, registered in the U.S.A. and other countries.

Red Hat is a trademark of Red Hat, Inc., registered in the U.S. and other countries. Used under license.

SAP is the trademark or registered trademark of SAP AG in Germany and in several other countries.

SAS and SAS/C are trademarks or registered trademarks of SAS Institute Inc.

SPARC is a registered trademark of SPARC International, Inc.

Symantec, NetBackup, and VERITAS are trademarks or registered trademarks of Symantec Corporation or its affiliates in the United States and other countries.

Unicode is a registered trademark of Unicode, Inc. in the United States and other countries.

UNIX is a registered trademark of The Open Group in the United States and other countries.

Other product and company names mentioned herein may be the trademarks of their respective owners.

The information contained in this document is provided on an "as-is" basis, without warranty of any kind, either express or implied, including the implied warranties of merchantability, fitness for a particular purpose, or non-infringement. Some jurisdictions do not allow the exclusion of implied warranties, so the above exclusion may not apply to you. In no event will Teradata Corporation be liable for any indirect, direct, special, incidental, or consequential damages, including lost profits or lost savings, even if expressly advised of the possibility of such damages.

The information contained in this document may contain references or cross-references to features, functions, products, or services that are not announced or available in your country. Such references do not imply that Teradata Corporation intends to announce such features, functions, products, or services in your country. Please consult your local Teradata Corporation representative for those features, functions, products, or services available in your country.

Information contained in this document may contain technical inaccuracies or typographical errors. Information may be changed or updated without notice. Teradata Corporation may also make improvements or changes in the products or services described in this information at any time without notice.
Combining the strengths of both companies, our customers will see integrated offerings of SAS software
and Teradata.
We're combining the industry's best analytics with the industry leader in data warehousing to:
• deliver value to business users through improved and extended use of analytics
• improve the return on IT investment.
Ultimately, our customers will experience broader use and better performance.
[Figure: side-by-side flows for the Traditional and In-Database Analytic Environments. Each shows Model Development (analytical data preparation → ADS → modeling) and Model Deployment (data preparation → scoring ADS → scoring); in the in-database flow, data preparation and scoring run in the Teradata Data Warehouse, with model translation bridging development and deployment.]
The “Traditional Analytic Environment” illustrates a common process and architecture used by many
businesses to develop and deploy analytic technology. In this environment, data is extracted from a
variety of sources ranging from enterprise data warehouses to data marts across multiple lines of
business. This data is aggregated, transformed and integrated into a development analytic data set. This
is typically a large flat data structure, such as a flat file, containing hundreds of variables where each
row represents an observation. This data is used to build analytic models within a SAS environment.
Once the model is developed, tested and validated, it is then exported into the scoring environment
which is typically based on production or operational data. For scoring purposes, the data is again
extracted and prepared based on model requirements into the “Scoring ADS” (also sometimes referred
as the score table).This table typically has 10 to 20 variables but may also contain millions of records to
be scored. Scoring is done on the scoring server.
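The scoring step itself is conceptually simple: apply the developed model to every row of the scoring ADS. A minimal sketch in Python for illustration, with a made-up logistic model; the variable names (recency, frequency) and weights are hypothetical:

```python
import math

def score_row(row, weights, intercept):
    """Apply a (hypothetical) logistic model to one observation of the scoring ADS."""
    z = intercept + sum(weights[name] * row[name] for name in weights)
    return 1.0 / (1.0 + math.exp(-z))  # probability score in (0, 1)

# Toy scoring ADS: a couple of the 10-20 variables a score table might hold.
weights = {"recency": -0.8, "frequency": 0.5}
scoring_ads = [
    {"cust_id": 1, "recency": 2.0, "frequency": 5.0},
    {"cust_id": 2, "recency": 9.0, "frequency": 1.0},
]
scores = {r["cust_id"]: score_row(r, weights, intercept=0.1) for r in scoring_ads}
```

In the traditional environment this loop runs on the scoring server over extracted data; in the in-database environment the equivalent computation runs inside Teradata, next to the rows.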
The “In-Database Analytic Environment,” offered through the SAS and Teradata Analytic Advantage
program, leverages the Teradata database for data processing and scoring and the SAS analytic platform
for model development. The Teradata EDW provides a single environment for both the development
environment and production or operational data. Optionally, a Teradata appliance can be used as a
separate development environment. External data can be loaded into an analytic sandbox that provides a
development environment that is logically segregated from the production environment in order to
preserve the integrity of the production level data, while allowing for the flexibility to load untested data
for development. The EDW data is explored, aggregated, transformed and derived to create the
development ADS without incurring unnecessary data movement. Once the data is prepared, a sample is
extracted to SAS Enterprise Miner for analytic modeling. SAS Enterprise Miner provides the breadth
and depth of analytic techniques for additional exploration, model specific transformation, analytic
modeling and testing required to complete the development process. Once completed, the model is
exported to the SAS Scoring Accelerator Publishing agent. The SAS Scoring Accelerator for Teradata then deploys the published model so that scoring runs inside the Teradata database, next to the data.
• There are a number of components within the framework. Starting at the bottom there’s the data
integration layer…then the analytics component…the reporting component…and on top solutions
that will make a real difference to your business.
• SAS Data Integration forms a solid data foundation with the capability for accessing enterprise data
across systems and platforms. It provides integrated data quality, which is critical to providing
accurate, consistent information; and an interactive, visual data integration development environment
that enables collaboration and easy reusability across your organization, all with a single point of IT
administration.
• SAS Analytics provides an integrated environment for predictive and descriptive modeling, data
mining, text analytics, forecasting, optimization, simulation, experimental design and more.
The functionality of SAS is built around the four data-driven tasks common to virtually any application:
data access, data management, data analysis, and data presentation.
The first bullet, graphical user interface, can reference Enterprise Guide or Management Console.
The bulleted items were pulled from the Base SAS fact sheet:
http://www.sas.com/technologies/bi/appdev/base/factsheet.pdf
DATA steps are typically used to create SAS tables. The DATA step provides a powerful and fast 4GL data management programming language.
[Figure: Raw Data → DATA Step → SAS table → PROC Step → Report / SAS table.]
SAS procedure (PROC) steps are typically used to process SAS tables (that is, generate reports and graphs, manage data, and sort data). SAS procedures encapsulate distinct business analysis approaches.
data work.NewSalesEmps;
length First_Name $ 12
Last_Name $ 18 Job_Title $ 25;
infile 'newemps.csv' dlm=',';
input First_Name $ Last_Name $
Job_Title $ Salary;
run;
Let me briefly explain what each step is doing on the next three slides. I want you to have an
understanding of what the step is accomplishing; we aren't discussing the syntax. This DATA step …
On this INFILE statement, we are referring to the raw data file NEWEMPS.CSV. How you refer to this
raw data file is dependent on your operating environment. In your course notes on page 2-4, you can see
how you refer to the file if you are using our classroom computers.
data work.NewSalesEmps;
length First_Name $ 12
Last_Name $ 18 Job_Title $ 25;
infile 'newemps.csv' dlm=',';
input First_Name $ Last_Name $
Job_Title $ Salary;
run;
Variable names: First_Name, Last_Name, Job_Title, Salary
Variable values:
   Satyakam   Denny        Sales Rep. II   26780
   Monica     Kletschkus   Sales Rep. IV   30890
   Kevin      Lyon         Sales Rep. I    26955
   Petrea     Soltau       Sales Rep. II   27440
A SAS date value is stored as the number of days between January 1, 1960, and a specific date; for example, the stored values -365, 0, and 366 display as dates.
A character missing value is displayed as a blank. A numeric missing value is displayed as a period.
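The same day-count arithmetic can be sketched in Python for illustration (SAS performs this internally):

```python
from datetime import date

SAS_EPOCH = date(1960, 1, 1)

def sas_date_value(d):
    """Days between January 1, 1960, and a given date - how SAS stores a date."""
    return (d - SAS_EPOCH).days

# -365, 0, and 366 are the stored values behind three displayed dates.
assert sas_date_value(date(1959, 1, 1)) == -365
assert sas_date_value(date(1960, 1, 1)) == 0
assert sas_date_value(date(1961, 1, 1)) == 366  # 1960 is a leap year
```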
SAS provides a variety of library engines that give access to various types of data (SAS data, PC file formats, database files, and so on) through the SAS data library concept.
When a SAS session starts, SAS automatically creates one temporary and at least one permanent SAS
data library that you can access.
UNIX
libname orion '/users/userid';
z/OS (OS/390)
libname orion 'userid.workshop.sasdata';
For UNIX, the actual program uses a period between quotes to refer to the default location. For z/OS,
the userid is not used in the actual programs.
SAS/ACCESS interfaces are out-of-the-box solutions that provide enterprise data access and integration
between SAS and third-party databases.
Data are stored in tables, which are accessed using SQL (Structured Query Language). Teradata SQL is ANSI compliant, with extensions that enable further data manipulation.
Database technology scales in all dimensions; the AMP is the unit of parallelism.
Node
PE – Parsing Engine. This is the interface that talks to the client on one side and, via the BYNET, to the AMPs on the other side. The PE decomposes SQL statements into steps and returns the result sets to the client application.
Parsing Engine is a component that interprets SQL requests and sends the request (along with the input
records and data) to the AMPs through BYNET technology.
The Parsing Engine interprets the SQL command and converts the data record from the host into an
AMP message.
...
Node
The Message Passing Layer handles the internal communication of the Teradata RDBMS. It is a
combination of the Teradata PDE (Parallel Database Extensions) software, the BYNET software, and
the BYNET interconnect itself.
BYNET – a networking layer. It can be either software only (cheaper) or hardware and software (more expensive). The BYNET does more than just networking: it also merges and sorts the data passed back from the AMPs. Multi-node environments have more than one BYNET, which aids redundancy and makes the environment more robust.
Node
The AMP, or Access Module Processor, is the virtual processor responsible for reading and writing data. It is the heart, soul, and workhorse of Teradata. AMPs use the BYNET to receive messages. AMPs control database management functions such as sorting, performing aggregations, and formatting data.
AMP – is a vproc that controls access to the disk subsystem. In other words, it controls access to a
subset of the data. Each table in Teradata is spread out amongst all the AMPs. This is where Teradata
gets its parallelism.
• AMP (Access Module Processor) is the unit of parallelism in Teradata. AMPs are designed to operate on only one portion of the database, so they must operate in parallel to accomplish their intended results.
• AMPs do all of the physical work associated with generating an answer set, including sorting, aggregating, formatting, and converting.
The AMP formats the row and writes it to its associated disks.
Node
...
A node is a computer, made up of hardware and software, that contains CPUs, system disk, memory, and adapters, and runs a copy of the operating system and the Teradata software.
NODE – a node is a computer that participates as part of a Teradata server. Adding nodes to an environment increases the performance of the environment.
Each table in Teradata is spread out amongst all the AMPs. This is where Teradata gets its parallelism.
• A NoPI table does not have a primary index. The chief purpose of NoPI tables is to enhance the
performance of data loading operations.
• When a table has no primary index, its rows can be dispatched to any given AMP arbitrarily, so
the system can load data into a staging table faster and more efficiently.
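The two placement strategies can be sketched as a toy in Python for illustration. The hash here is Python's built-in, not Teradata's hashing algorithm, and NUM_AMPS is an arbitrary choice:

```python
NUM_AMPS = 4

def amp_for_pi(primary_index_value):
    """PI table: hashing the primary index value picks the owning AMP, so the
    same value always lands on the same AMP (toy hash, not Teradata's)."""
    return hash(primary_index_value) % NUM_AMPS

def round_robin(rows):
    """NoPI table: rows may be dispatched to any AMP - e.g. round-robin - so a
    load can skip hashing and redistribution and run faster."""
    return {i: [r for j, r in enumerate(rows) if j % NUM_AMPS == i]
            for i in range(NUM_AMPS)}

# PI placement is deterministic; NoPI placement depends only on arrival order.
assert amp_for_pi("cust_42") == amp_for_pi("cust_42")
placement = round_robin(list(range(12)))
assert all(len(rows) == 3 for rows in placement.values())  # evenly spread
```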
A Teradata database is a collection of tables, views, macros, triggers, stored procedures, join indexes, hash indexes, UDFs, space limits, and access rights, and is used for administration and security (comparable to a schema in other systems).
A Teradata user is a collection of tables, views, macros, triggers, stored procedures, join indexes, hash indexes, UDFs, and access rights. A user represents a logon point within the hierarchy, and access rights apply only to users. Further, users are granted rights to access other database(s).
Other database(s) containing views and macros, which in turn are granted rights to access the corporate
production tables
A view looks like a table, but has no data of its own, and therefore takes up no storage space
except for its definition.
Views are used to simplify query requests, to limit access to data, and to allow different users to look at the same data from different perspectives.
A view is a window that accesses selected portions of a database. Views can show parts of one
table (single-table view), more than one table (multi-table view), or a combination of tables and
other views.
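The same behavior can be demonstrated in any SQL database; here is a minimal sketch using SQLite (the table, columns, and data are invented for the example):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp (name TEXT, dept TEXT, salary INT)")
con.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                [("Ann", "Sales", 50), ("Bob", "IT", 60), ("Carla", "Sales", 55)])

# The view stores only its definition - no rows of its own - yet it limits
# which columns and rows its users can see (a single-table view).
con.execute("CREATE VIEW sales_emp AS "
            "SELECT name, salary FROM emp WHERE dept = 'Sales'")
rows = con.execute("SELECT name FROM sales_emp ORDER BY name").fetchall()
```

Querying sales_emp always reflects the current contents of the base table, which is why a view takes up no storage space except for its definition.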
Gaining real-time access to derived information in the virtual data mart layers from the normalized and physical EDW layer.
[Figure: EDW layers, bottom to top — staging and temporary physical tables layer; normalized EDW physical table layer; security and exploitation layer (views) > virtual data mart layer, containing the user sandbox, star schemas, and analytical base tables.]
Perm Space is the maximum amount of storage assigned to a user or database for holding table rows, stored procedures, UDFs, and permanent journals.
Spool Space is work space acquired automatically by the system and used for intermediate and final results of Teradata SQL statements.
Temporary Space is space acquired automatically by the system when users materialize Teradata temporary tables.
[Figure: SAS process → SAS/ACCESS to Teradata → Teradata SQL.]
With SAS 9.2, a uniquely extended SAS/ACCESS interface:
• extended support of the latest export and load utilities (incl. TPT)
   proc freq data=TD.credit_data;
      table state*credit;
   run;
Uses SQL SELECT, UPDATE, DELETE statements, and leverages Teradata bulk load and export
interfaces
Note the SAS STRIP() function gets mapped to Teradata TRIM() by PROC SQL implicit pass-through (IP) and SAS/ACCESS.
SAS Company Confidential – the information contained herein must not be revealed to third parties.
Certain SAS DI transformations that are conducive to in-database processing are being evaluated for further Teradata integration (e.g., data quality functions).
Ad-hoc query and analysis, for example:
   proc freq data=TDLib.Table;
      ...; run;
   proc sql;
      select * from TDLib.Table ...; quit;
The relational database and SAS environments use slightly different terms to describe very similar
concepts.
Throughout this course and during your day-to-day work activities, you will likely hear some of these
used interchangeably. It will be useful to keep these comparisons in mind as they relate to your work
activities.
Teradata: A name must start with a letter unless enclosed in double quotation marks.
SAS: A name must start with a letter or underscore (_). A name cannot be enclosed in double quotation marks.

Teradata: A name must be from 1 to 30 characters long.
SAS: A name must be from 1 to 32 characters long.

Teradata: A name can contain the letters A through Z, the digits 0 through 9, the underscore (_), $, and #.
SAS: A name can contain the letters A through Z, the digits 0 through 9, and the underscore (_).

Teradata: A name, even when enclosed in double quotation marks, is not case sensitive.
SAS: A name is not case sensitive; e.g., CUSTOMER is the same as customer.

Teradata: A name cannot be a Teradata reserved word such as COMMIT or SELECT.
SAS: A name can be a word such as COMMIT or SELECT, because SAS does not have reserved words.

Teradata: A name must be unique between objects; a view and a table in the same database cannot have the same name.
SAS: A name does not need to be unique between object types, with the exception of a data table and a view in the same SAS data library.
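The unquoted-name rules can be expressed as simple patterns; a Python sketch for illustration (the Teradata reserved-word check is omitted from this toy):

```python
import re

# Rules as stated in the comparison, for unquoted names only.
TERADATA_NAME = re.compile(r"^[A-Za-z][A-Za-z0-9_$#]{0,29}$")   # 1-30 chars
SAS_NAME      = re.compile(r"^[A-Za-z_][A-Za-z0-9_]{0,31}$")    # 1-32 chars

def valid_teradata(name):
    return bool(TERADATA_NAME.match(name))

def valid_sas(name):
    return bool(SAS_NAME.match(name))

assert valid_sas("_total") and not valid_teradata("_total")    # leading underscore
assert valid_teradata("budget$") and not valid_sas("budget$")  # $ only in Teradata
assert not valid_teradata("a" * 31) and valid_sas("a" * 31)    # 30- vs 32-char limit
```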
Teradata                                SAS
CHAR(n), VARCHAR(n), LONG VARCHAR       Character
DATE, TIME(n), TIMESTAMP(n)             Numeric
• SAS users can have SAS generate the SQL (Implicit SQL Pass-Thru) via a SAS PROC SQL
statement
• SAS users can leverage all the power of Teradata by writing their own Teradata SQL (Explicit
SQL Pass-Thru)
• Once you have the power to access and process data from the Teradata database, the workflow processes that you used prior to accessing Teradata should be carefully reconsidered for optimal performance.
• To fully use the power of Teradata, think about how you currently perform your workflow processes and consider the alternative suggestions provided in the following chapters.
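To see the difference concretely: with implicit pass-through, SAS builds the SQL from your step; with explicit pass-through, you write the Teradata SQL yourself. A toy Python sketch of the kind of SQL generation the engine performs (greatly simplified, not the actual SAS/ACCESS logic; the table and column names are invented):

```python
def implicit_sql(table, keep=None, where=None):
    """Toy version of implicit pass-through: build the SELECT from the step's
    KEEP= columns and WHERE clause, so the database does the filtering."""
    cols = ", ".join(keep) if keep else "*"
    sql = f"SELECT {cols} FROM {table}"
    if where:
        sql += f" WHERE {where}"
    return sql

# Explicit pass-through corresponds to the user writing this string directly.
assert implicit_sql("order_fact", keep=["Customer_ID", "Quantity"],
                    where="Quantity > 10") == \
    "SELECT Customer_ID, Quantity FROM order_fact WHERE Quantity > 10"
```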
Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-1
Objectives
• Review the purpose of the LIBNAME statement.
• Submit a SAS/ACCESS LIBNAME statement to the Teradata DBMS.
• Discuss SAS/ACCESS LIBNAME statement options for Teradata.
• Define SQL implicit pass-through.
• Use the SASTRACE option to determine the SQL commands that are being passed from
SAS to Teradata.
Module 2 – Querying Teradata Using SAS LIBNAME
and Implicit SQL Pass-Through
The LIBNAME Statement (Review)
The LIBNAME statement does the following:
• establishes a library reference, or libref, which acts
as an alias, or nickname, to a collection of data sets (SAS data library).
• references data sets by a two-level name. The first level is the libref, and the second
level is the data set name.
• removes operating-system-specific references in the program code.
• enables data sets to be read and updated.
The LIBNAME Statement (Review)
You can use the LIBNAME statement to assign a libref to a SAS data library.
Assigning a Libref (Review)
When you refer to the SAS file in your program, you use the two-level name:
libref.filename
[Figure: a physical data storage location is assigned the libref mydata; SAS data sets A and B in that location are referenced as mydata.a and mydata.b.]
   proc print data=mydata.a;
   run;
   proc print data=mydata.b;
   run;
The SAS/ACCESS LIBNAME Statement
The SAS/ACCESS LIBNAME statement does the following:
• establishes a libref, which acts as an alias, or nickname, to Teradata
• permits a Teradata DBMS table to be referenced by a two-level name, allowing the
Teradata table to be read as easily as a SAS data set
• enables the Teradata table to be updated if the proper authority already exists
• allows the use of the SAS/ACCESS LIBNAME statement options to specify how
Teradata objects are processed by SAS
• enables you to customize how to connect to Teradata.
SAS/ACCESS LIBNAME Statement
General form of the SAS/ACCESS LIBNAME statement:
   LIBNAME libref engine-name <connection-options> <LIBNAME-options>;
When you submit a SAS/ACCESS LIBNAME statement, a connection is made between a libref in SAS
and the database.
The SAS/ACCESS LIBNAME Statement
The DBMS table is referenced using a two-level name, enabling the DBMS table to be read
as easily as a SAS data set.
libref.DBMS-Table-name
[Figure: a Teradata database is assigned the libref mytera; Teradata tables A and B are referenced as mytera.a and mytera.b.]
   proc print data=mytera.a;
      where Gender='M';
   run;
   proc print data=mytera.b;
      where lastname='Smith';
   run;
SAS/ACCESS LIBNAME Statement to Teradata
Connection information for Teradata:
   libname teralib teradata   /* database engine    */
      server=tera5500         /* Teradata server    */
      user=edutest            /* Teradata user ID   */
      pw=edutest1             /* Teradata password  */
      database=saseduc;       /* Teradata database  */
LIBNAME Statement Connection Options
To avoid stating user and password credentials explicitly in the SAS program, beginning with SAS 9.2 you may instead use the AUTHDOMAIN= LIBNAME option.
With this option, the appropriate Teradata credentials are retrieved automatically from the SAS Metadata user's account information at run time, so credentials need not appear in program code.
Using the AUTHDOMAIN= option you can retrieve USER= and PASSWORD= information from an
authentication domain stored in your SAS Metadata Server. To the engine, it appears that the USER=
and PASSWORD= options were specified on the LIBNAME statement.
Reading Teradata Tables into SAS
Default Teradata Data Types conversion into SAS
Default SAS data type and SAS formats assigned to Teradata data types.
Teradata Data Type      Default SAS Data Type – SAS Format
CHAR(n)                 Character – $n. (n <= 32,767)
CHAR(n)                 Character – $32767. (n > 32,767) 1
VARCHAR(n)              Character – $n. (n <= 32,767)
VARCHAR(n)              Character – $32767. (n > 32,767) 1
LONG VARCHAR(n)         Character – $32767. 1
BYTE(n)                 Character – $HEXn. (n <= 32,767)
BYTE(n)                 Character – $HEX32767. (n > 32,767) 1
VARBYTE(n)              Character – $HEXn. (n <= 32,767)
VARBYTE(n)              Character – $HEX32767. (n > 32,767) 1
INTEGER                 Numeric – 11.0
SMALLINT                Numeric – 6.0
BYTEINT                 Numeric – 4.0
DECIMAL(n, m) 2         Numeric – (n+2).(m)
FLOAT                   Numeric – none
DATE 3                  Numeric – DATE9.
TIME(n) 4               Numeric – for n=0, TIME8.; for n>0, TIME9+n.n
TIMESTAMP(n) 4          Numeric – for n=0, DATETIME19.; for n>0, DATETIME20+n.n
Teradata SQL expression                       SAS function
TRIM(LEADING FROM c)                          LEFT(c)
CHARACTER_LENGTH(TRIM(TRAILING FROM c))       LENGTH(c)
(v MOD d)                                     MOD(v,d)
TRIM(TRAILING FROM c)                         TRIMN(c)
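A toy lookup in Python showing how such a mapping table drives SQL generation (illustrative only; the real engine rewrites these expressions inside full queries):

```python
# Mapping from the slide: SAS expression -> Teradata SQL equivalent.
SAS_TO_TERADATA = {
    "LEFT(c)":   "TRIM(LEADING FROM c)",
    "LENGTH(c)": "CHARACTER_LENGTH(TRIM(TRAILING FROM c))",
    "MOD(v,d)":  "(v MOD d)",
    "TRIMN(c)":  "TRIM(TRAILING FROM c)",
}

def translate(expr):
    """Return the Teradata SQL text implicit pass-through would emit for a
    mapped SAS expression; unknown expressions pass through unchanged."""
    return SAS_TO_TERADATA.get(expr, expr)

assert translate("TRIMN(c)") == "TRIM(TRAILING FROM c)"
```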
1. When reading Teradata data into SAS, DBMS columns that exceed 32,767 bytes are truncated. The
maximum size for a SAS character column is 32,767 bytes.
2. If the DECIMAL number is extremely large, SAS can lose precision. For details, see the topic
"Numeric Data".
3. See the topic "Date/Time Data" for how SAS/ACCESS handles dates that are outside the valid SAS
date range.
4. TIME and TIMESTAMP are supported for Teradata Version 2, Release 3 and later. The TIME with
TIMEZONE, TIMESTAMP with TIMEZONE, and INTERVAL types are presented as SAS
character strings, and thus are harder to use.
Using the SAS/ACCESS Libname Engine
The SAS/ACCESS engine writes SQL code on the user's behalf from this PROC PRINT step
and DATA step that is passed implicitly to Teradata.
Examining SQL Implicit Pass-Through Code
Behind the scenes, the SAS/ACCESS engine writes SQL code that is passed implicitly to
Teradata, causing as much work to be done in the database as possible.
By default, there is no indication of whether the SAS/ACCESS engine succeeded in generating SQL code that is passed to Teradata.
To determine the success or failure of implicit pass-through for a query, you must examine the SQL that the SAS/ACCESS engine submits to Teradata by using the SASTRACE= SAS system option.
The SASTRACE= SAS System Option
SASTRACE=
   ',,,d'   specifies that all SQL statements sent to the DBMS are sent to the log.
   ',,,s'   specifies that a summary of timing information for calls made to the DBMS is sent to the log.
SASTRACELOC= STDOUT | SASLOG | FILE 'path-and-filename'
   prints SASTRACE information to a specific location:
   STDOUT – writes trace messages to the default output location for your operating environment.
   SASLOG – writes trace information to the SAS log window.
   FILE 'path-and-filename' – writes trace information to a file.
NOSTSUFFIX
   limits the amount of information displayed in the log.
In general, the SQL implicit pass-through mechanism is a silent optimization, in part because it cannot be guaranteed. If the optimization succeeds in passing a query (or parts of a query) directly to a DBMS, SAS does not report that it was successful. If it fails to pass a query through to a DBMS, the query is processed in SAS through the standard SAS engine interfaces. There is normally no indication surfaced to the user of implicit pass-through failures or successes.
To determine the success or failure of implicit pass-through for a query, one must examine the SQL that the engine submits to the database. The primary mechanism for showing what is actually passed to an underlying DBMS by a SAS/ACCESS engine is the SASTRACE= SAS system option. This option causes all of the SQL or internal API call information that a SAS/ACCESS engine passes to the underlying DBMS to be displayed in the SASLOG output. It also causes any DBMS return codes and messages returned from the execution to be shown in the SASLOG output.
To enable this level of tracing, specify the following options in your SAS program code:
   options sastrace=',,,d' sastraceloc=saslog nostsuffix;
The Fullstimer SAS System Option (Optional)
To print additional resource utilization information in the SAS log, specify the following
option:
FULLSTIMER tracks usage of additional resources. This option is ignored unless STIMER or MEMRPT is in effect. It can also be specified by the alias FULLSTATS.
MSGLEVEL=I tracks SAS index usage information.
OPTIONS FULLSTIMER
MSGLEVEL=I ;
SQL Implicit Pass-Through Code
The SASTRACE= option shows the SQL statements that were sent to Teradata. In this case,
the SAS/ACCESS engine writes SQL code on the user's behalf from this PROC PRINT step
that is passed implicitly to Teradata.
at02a01
SASTRACE Messages in the Log
Fullstimer resource utilization info
Examining the Data Extract Size
While SAS by default extracts all the rows and columns from a table before executing a
SAS program step
SAS does not extract
– all columns where the variables for selection and analysis are explicitly specified
– All rows where the number of rows are constrained by the use of a WHERE clause
or other constraining options.
[Figure: SAS process → SAS/ACCESS to Teradata → Teradata client → Teradata; what is the size of the extract produced by the Teradata SQL?]
While SAS by default extracts all the rows and columns from a table before executing a PROC, SAS does not extract all columns when the variables for analysis are explicitly specified using, for example, a WHERE, VAR, TABLES, or MODEL statement in the PROC.
Likewise, the number of rows extracted can be constrained by the use of a WHERE clause in a PROC
that: (a) SAS will recognize, and (b) send to Teradata in the SQL request that it generates. Recall from
Module 5, that the WHERE clause cannot contain functions that are not recognized by SAS/ACCESS to
Teradata.
Also, the use of an OBS= data set option with SAS v9 and higher will force SQL to be generated and
passed with the Teradata SAMPLE clause.
Data Extract Size – Limiting Columns
SAS does not extract all columns when the variables for analysis are explicitly specified.
• Generically, this can be achieved in any SAS program step by using data set options on a Teradata table reference.
• Furthermore, SAS procedures offer specific statements specifying the columns to be analyzed, such as the VAR, TABLES, and MODEL statements.
Data Extract Size – Limiting Columns
The following data set options can be used to select columns to read or write to
tables.
at02d09
Data Extract Size – Limiting Rows
SAS does not extract all rows where the number of rows are constrained
• by the use of a WHERE clause in SAS Procedure Steps or Data Steps.
• or other constraining options: for example making use of Data Set Options
to a Teradata Table reference like obs=# rows
Data Extract Size – Limiting Rows
The following data set options can be used to limit the rows read from or written to
tables.
data _null_;
   set TDOrion.order_fact (keep=Customer_ID Quantity
                           firstobs=50 obs=100);
run;
NOTE: There were 51 observations read from the data set TDORION.order_fact.
Selected SAS/ACCESS Data Set Options
SAS users can temporarily rename column references for the duration of a
selected task.
RENAME=(old-col-name=new-col-name)    enables you to rename columns in output data sets.
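A sketch of the option in use (names taken from the earlier example):

```sas
/* Customer_ID is read from Teradata but is known as CustID
   in the output data set; KEEP= is applied before RENAME=,
   so it uses the original column names. */
data work.orders;
   set TDOrion.order_fact (keep=Customer_ID Quantity
                           rename=(Customer_ID=CustID));
run;
```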
at02d07b
Querying Teradata Tables Using
SAS Procedures and DATA Step
Programs
This demonstration illustrates accessing Teradata
tables using a LIBNAME statement.
Exercise
This exercise reinforces the concepts discussed
previously.
Module 2 – Querying Teradata Using SAS Libname
and Implicit SQL Pass-Through
SQL Implicit Pass-Through
• The use of the SAS/ACCESS LIBNAME statement not only enables
communication between Teradata and SAS.
• SAS, via the SAS/ACCESS engine, determines the SQL query that is implicitly
passed to Teradata on the user's behalf and converts the user's SAS code to
Teradata-specific SQL.
The purpose of Implicit Pass-Through is to have SAS through the SAS/ACCESS engine construct the
SQL in such a way that as much work as possible is performed in Teradata.
Using PROC SQL Implicit Pass-Through
A PROC SQL query may be considered for implicit
pass-through in any of the following cases:
• The referenced tables in the query all use the same SAS/ACCESS engine.
• The query contains the SELECT DISTINCT keyword.
• The query contains an SQL aggregate function.
• The query uses SAS language functions that are mapped to DBMS functions.
Selected functions:
Using PROC SQL Implicit Pass-Through
• The query contains a GROUP BY clause.
• The query contains a HAVING clause.
• The query performs an SQL join.
• The query contains an ORDER BY clause.
• The query involves a SET operation other than OUTER UNION.
• The query with a WHERE clause contains a subquery.
SQL Implicit Pass-Through
Implicit Pass-Through (IP) is the result of a collaborative effort between PROC SQL and
SAS/ACCESS.
Trigger for implicit pass-through
Rules for SQL Implicit Pass-Through
There are rules and code elements that disqualify the use of implicit pass-through.
• If you have multiple SAS libraries, then the LIBNAME statement connection options must
match (USER=, PASSWORD=, ACCOUNT=, and SERVER=).
Multiple Teradata librefs with different DATABASE= values are OK.
• Data set options used in PROC SQL
• Mixing with CONNECTION TO statements (explicit pass-through)
• ANSI MISS/NOMISS outer joins / NULL value handling
• Unmapped SAS functions
• One or more truncated comparisons
• CREATE VIEW statements
• ORDER BY differences
• Remerging
• INTO clause
Validating SQL Implicit Pass-Through
Using the SAS PROC SQL NOEXEC option in conjunction with tracing options
enables validation of the generated pass-through SQL code before
execution.
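A sketch of such a validation run (the query is illustrative; the trace settings are the commonly documented ones):

```sas
/* Write the SQL that would be sent to Teradata into the SAS log,
   but do not execute the query (NOEXEC). */
options sastrace=',,,d' sastraceloc=saslog nostsuffix;

proc sql noexec;
   select Customer_ID, count(*) as Order_Count
      from TDOrion.order_fact
      group by Customer_ID;
quit;
```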
Governing SQL Implicit Pass-Through
While the SAS implicit pass-through implementation attempts to push down as
much SQL as possible, some customers might want to control the workload
issued to their database system.
SAS provides system, LIBNAME, and procedure options to control the amount
and type of SQL being pushed to the database.
• Examples are DIRECT_EXE, DIRECT_SQL, IPASSTHRU,
DBIDIRECTEXEC, and SQLREDUCEPUT.
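As a hedged sketch of where two of these options are set (server and connection values are placeholders):

```sas
/* LIBNAME option: DIRECT_SQL= controls which statement types may be
   passed down to the database. */
libname tdlib teradata server=tdserv user=myuser password=mypass
        direct_sql=yes;

/* System option: allow PROC SQL to pass CREATE TABLE ... AS SELECT
   statements directly to the database. */
options dbidirectexec;
```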
Governing SQL Implicit Pass-Through
The SQL procedure option IPASSTHRU | NOIPASSTHRU controls whether
SQL pass-through to a database is enabled.
• It is enabled by default.
GOOD: COUNT(*) function passed to DBMS!
When SAS performs the count it must read the entire contents of a variable in order to count the rows.
In this example, COMPLAINTS_13_24_MTHS_CNT just happens to be the first column in the table. If
it contains NULL values then SAS may return a different count.
Governing SQL Implicit Pass-Through
When the NOIPASSTHRU option is set, SQL pass-through is disabled.
BAD: COUNT(*) function NOT passed to DBMS!
Providing Data to End-Users Using Views
Like database systems, SAS provides access to data via the concept of views
(PROC SQL and DATA step views).
While views are commonly used in Teradata systems, SAS views can add
additional benefits
– by hiding the LIBNAME connection options as part of the view
– by enriching a database table or view with SAS attributes such as labels and formats.
What is a PROC SQL View?
A PROC SQL view
• is a stored query
• contains no actual data
• can be derived from one or more tables or views
• extracts underlying data each time that it is used, and accesses the most
current data
• can be referenced in SAS programs in the same way as a data table
• cannot have the same name as a data table stored in the same SAS library.
Views are sometimes referred to as virtual tables because they are referenced in SAS programs in the
same manner as actual data tables, but they are not physical data tables. They contain no actual data but
instead store the instructions required to retrieve and present the data to which they refer.
Creating a PROC SQL View
General form of the CREATE VIEW statement:
PROC SQL;
CREATE VIEW view-name AS
SELECT column-1, column-2,…column-n
FROM table-1<,table-n>
…;
Creating a PROC SQL View
When you submit the code to create the view, you get a message in the log that the view has been
created. At this point, nothing is passed implicitly to Teradata.
Caution:
When you use this method to create a view, the LIBNAME statement must always be in effect during
your SAS session if you want to execute the PROC SQL view to retrieve the rows of data. The library
reference is hard-coded in the FROM clause in the view definition.
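A concrete sketch of the pattern (server values and the table name are illustrative; the work.CUST_CITY view referenced later in this module follows this shape):

```sas
libname TDOrion teradata server=tdserv user=myuser password=mypass;

proc sql;
   /* The view stores only the query text; the TDOrion libref must be
      assigned whenever the view is executed. */
   create view work.cust_city as
      select Customer_ID, Customer_City
         from TDOrion.customer_dim;
quit;
```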
Using a PROC SQL View
You can reference the PROC SQL view the same way that you reference a SAS data set.
SASTRACE Information
Partial SAS Log
When the FREQ procedure code is executed, the instructions to retrieve the data are also executed. The
SAS/ACCESS engine passed the SQL statements to Teradata to retrieve the rows. The work was done
on the Teradata side and the rows were then returned to SAS to be displayed.
Creating Views with Embedded LIBNAME Statements
You can create a PROC SQL view that embeds the LIBNAME statement with a
USING clause. The embedded LIBNAME statement has the following
characteristics:
• is defined in a USING clause within a PROC SQL view
• is assigned when the view begins executing
• can contain connection information
• uses the LIBNAME engine to pass joins to Teradata
• can store label, format, and alias information
• is de-assigned when the view stops executing
IG Note: The last bullet is a BIG point. A LIBNAME statement issued with the USING
clause is de-assigned when the view stops executing, unlike a libref assigned to Teradata
outside of the PROC SQL step: a LIBNAME statement executed in the SAS session stays
assigned, and thus connected to the DBMS, until the end of the SAS session.
Creating a PROC SQL View with the
Embedded LIBNAME Statement
General form of the CREATE VIEW statement with the USING statement:
PROC SQL;
CREATE VIEW view-name AS
SELECT column-list
FROM Teradata-table-name
USING LIBNAME-statement;
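A sketch with an embedded LIBNAME statement (connection values and the table name are placeholders):

```sas
proc sql;
   create view work.cust_city_v as
      select Customer_ID, Customer_City
         from tdtmp.customer_dim
      /* The libref tdtmp is assigned when the view begins executing
         and de-assigned when it stops. */
      using libname tdtmp teradata server=tdserv
            user=myuser password=mypass;
quit;
```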
Embedded LIBNAME Statements
Embedded LIBNAME Statements
Use the view in your program.
SASTRACE Messages in the Log
The SASTRACE messages show that the view instructions were passed directly to Teradata
and selection happened on the original table.
Dynamic SAS Programs – SAS Macro Language
The SAS Macro Facility
Using the macro language, you can write SAS programs that are dynamic, or
capable of self-modification.
Specifically, the macro language enables you to
• create and resolve macro variables anywhere in a SAS program
• write special programs (macros) that generate tailored SAS code.
SAS Macro Language (Example)
Recall the work.CUST_CITY view we created
Generated Teradata SQL
Creating Dynamic SAS Views (optional)
Frequently you want to dynamically pass part of the view query at runtime as a parameter.
You can use SAS Macro language parameters and the SYMGET function to resolve the SAS
macro variable at runtime.
Note: SQL does not perform automatic data conversion. You must use the INPUT function to
convert the macro variable value to numeric if it is compared to a numeric variable.
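A sketch of the pattern (view, table, column, and macro variable names are illustrative):

```sas
proc sql;
   /* SYMGET resolves the macro variable each time the view executes;
      INPUT converts its character value for the numeric comparison. */
   create view work.city_age as
      select Customer_ID, Customer_Age
         from TDOrion.customer_dim
         where Customer_Age > input(symget('age_min'), 8.);
quit;

%let age_min = 30;        /* set the parameter at run time */
proc print data=work.city_age;
run;
```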
Creating Dynamic SAS Views (optional)
Continuing from the city_age example
Using SAS PROC SQL and the
SAS/ACCESS Libname Engine
for Teradata
This demonstration illustrates using SAS PROC SQL
for querying Teradata tables.
Exercise
This exercise reinforces the concepts discussed
previously.
Module 2 – Querying Teradata Using SAS Libname
and Implicit SQL Pass-Through
Options for Optimizing the Teradata Query
A series of SAS options enables further optimization of queries passed to Teradata.
Usage of SAS-specific code elements in WHERE clauses and SQL statements often breaks
the pass-through in the first place. A lot of these cases can be addressed by
– enabling mapping of functions
– further optimizing WHERE clause push-down by making use of the reduce-put facility.
[Figure: the SAS process connecting through the Teradata client to Teradata, annotated with the questions: constants resolved? functions mapped? reduce-put applied?]
Implicit Pass-Through – Temporal functions
To use a non-default temporal SAS format when reading Teradata tables, or to
prevent date type mismatches, use the SASDATEFMT= option in these
circumstances:
• during input operations, to convert DBMS date values to the correct SAS
DATE, TIME, or DATETIME values
• during output operations, to convert SAS DATE, TIME, or DATETIME values to
the correct DBMS date values.
SASDATEFMT=(date-column="SAS-date-format")    changes the Teradata date values to a SAS date or datetime format.
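A sketch of the option in use (the column name Order_Date is assumed):

```sas
/* Read the Teradata date column Order_Date using the SAS DATE9. format. */
proc print data=TDOrion.order_fact (sasdatefmt=(Order_Date='DATE9.'));
run;
```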
Implicit Pass-Through – Temporal functions
While temporal values in SAS are stored as numeric, it is possible to query (and calculate)
using their numeric values in SAS. SAS stores dates as the count of days since 1 January 1960.
For example, 22 February 1995 becomes
'1995-02-22' - '1960-01-01' = 12836
To avoid ambiguity, it is recommended to use temporal literals in SAS, which are
transformed into the correct database equivalent values:
DATE: '22Feb1995'd
TIME: '18:30:55.12345't
DATETIME: '22Feb1995:18:30:55.12345'dt
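The day count above can be verified directly in SAS:

```sas
data _null_;
   d = '22Feb1995'd;      /* SAS date literal                     */
   put d=;                /* raw value: d=12836 (days since 1960) */
   put d= date9.;         /* formatted: d=22FEB1995               */
run;
```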
Writing
This example creates a Teradata table and assigns the SAS TIME8. format to the TRXTIME0 column.
Teradata creates the TRXTIME0 column as the equivalent Teradata data type, TIME(0), with the value
of 12:30:55.
libname mylib teradata user=testuser password=testpass;
data mylib.trxtimes;
   format trxtime0 time8.;
   trxtime0 = '12:30:55't;
run;
This example creates a Teradata column that holds very precise time values. The data type TIME(5) is
specified for the TRXTIME5 column.
Reading
When SAS reads this TIME(5) column, it assigns the equivalent SAS format TIME14.5.
libname mylib teradata user=testuser password=testpass;
proc sql noerrorstop;
   connect to teradata (user=testuser password=testpass);
   execute (create table trxtimes (trxtime5 time(5))) by teradata;
   execute (commit) by teradata;
   execute (insert into trxtimes values (cast('12:12:12' as time(5)))) by teradata;
   execute (commit) by teradata;
quit;

/* You can print the value that is read with SAS/ACCESS. */
proc print data=mylib.trxtimes;
run;
SAS might not preserve more than four digits of fractional precision for Teradata TIMESTAMP.
This next example creates a Teradata table and specifies a simple timestamp column with no digits of
precision. Teradata stores the value 2000-01-01 00:00:00. SAS assigns the default format
DATETIME19. to the TRSTAMP0 column generating the corresponding SAS value
of 01JAN2000:00:00:00.
This example creates a Teradata table and assigns the SAS format DATETIME23.3 to
the TSTAMP3 column, generating the value 13APR1961:12:30:55.123. Teradata
creates the TSTAMP3 column as the equivalent data type TIMESTAMP(3) with the
value 1961-04-13 12:30:55.123.
This next example illustrates how the SAS engine passes the literal value for
TIMESTAMP in a WHERE statement to Teradata for processing. Note that the value
is passed without being rounded or truncated so that Teradata can handle the rounding
or truncation during processing. This example would also work in a DATA step.
proc sql;
   select *
      from trlib.flytime
      where col1 = '22Aug1995:12:30:00.557'dt;
quit;
Implicit Pass-Through – Temporal functions
Example: Accounts opened on 22nd Feb 1995
Implicit Pass-Through – Date/Time Functions
Options enable resolving of SAS DATE, TIME, DATETIME, and TODAY function calls into
constant values before the SQL push-down code generation.
• Avoids potential conflicts with function resolution
• Alternative: system option SQLCONSTDATETIME (enabled by default)
GOOD: WHERE clause passed to Teradata!
Using "put(today(),date9.)" on the right side of the equal sign has the potential to harm query
performance, as there is a good possibility that the function call will prevent the statement from being
passed to the DBMS.
Interaction: If both the CONSTDATETIME option and the REDUCEPUT= option are specified,
PROC SQL replaces the DATE, TIME, DATETIME, and TODAY functions with their respective values
in order to determine the PUT function value before the query executes.
Tip: Alternatively, you can set the SQLCONSTDATETIME system option. If specified, the PROC
SQL CONSTDATETIME option takes precedence over the SQLCONSTDATETIME system option.
In that case, the WHERE clause is not passed to the database. This means that the entire contents
of the DBMS table are passed to SAS, which then reads all the data and tests the WHERE
condition. For large database tables this is very inefficient.
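A sketch of the good case (table and column names assumed): with SQLCONSTDATETIME in effect, TODAY() is replaced by a constant before SQL generation, so the WHERE clause can be passed down:

```sas
proc sql;
   /* today() is resolved to a date constant before the Teradata SQL
      is generated; the WHERE clause therefore runs in the database. */
   select count(*)
      from TDOrion.order_fact
      where Order_Date = today();
quit;
```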
Implicit Pass-Through – Function Mapping
The SAS/ACCESS SQL generation engine maps specific functions used in the WHERE
clause of any procedure or DATA step, or used in PROC SQL programs, into their database
equivalents.
In SAS 9.1.3, this list of automatically mapped SAS functions is static, as the functions are
compiled into SAS/ACCESS for Teradata:
With SAS 9.2, this default list has been enhanced; furthermore, the list is no longer static and
can be customized (see the section in Chapter 5).
Implicit Pass-Through – Function Mapping
Implicit Pass-Through – SAS Formats
SAS formats and formatting functions are commonly used by SAS users; however,
especially when used in a WHERE clause, they disqualify SQL pass-through in the first place.
This can be addressed with the reduce-put optimization or by leveraging SAS formats in
Teradata (see Section 4.3).
Note – No WHERE clause pushed to Teradata!
SAS by Example – SAS Formats
A SAS FORMAT can be described as an internal rule for mapping data values to
label values/formatted values.
SAS provides standard formats (currencies, date and time, …), but users can
define custom formats using the FORMAT procedure.
Using formats in SAS programs
• Provides a flexible way to dynamically apply different labeling rules to data
values
• Reduces storage space
• Leverages a fast lookup technique and avoids joins or merges for fetching
lookup values
• Enables analyzing aggregated values by applying formats to detail values
SAS by Example – SAS Formats
Assigning Temporary Formats
p111d06
Implicit Pass-Through – SAS Formats
The reduce-put optimization resolves the PUT function before generating the database
query.
• options SQLREDUCEPUT=ALL | NONE | DBMS (default)
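A sketch (column name assumed): with SQLREDUCEPUT=DBMS, PROC SQL evaluates the PUT expression and rewrites it before generating the database query:

```sas
options sqlreduceput=dbms;   /* the default setting */

proc sql;
   /* Without reduce-put, the PUT function would disqualify
      pass-through; with it, the condition is rewritten into a form
      Teradata can evaluate. */
   select count(*)
      from TDOrion.order_fact
      where put(Order_Date, year4.) = '2007';
quit;
```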
Differences in Query Behavior
It is imperative to understand the differences in how SAS and Teradata store
and process data.
Behavioral differences between how SAS and Teradata store and process data
are found in these three areas:
• NULL values and MISSING values
• Physical ordering of data, or lack thereof
• Native data type support
SAS Missing and Teradata NULL Values
Teradata (like other RDBMS) and SAS possess different mechanisms for
indicating the absence of a data value.
• In SAS, data absence is represented using the MISSING value (. for numeric
and ' ' for character data), which is conceptually, but not exactly, analogous to a
relational database NULL value.
• SAS/ACCESS translates a SAS MISSING value into a relational database
NULL when inserting into or updating a database table; conversely, a database
NULL is translated into a SAS MISSING value when a database table is
queried from within SAS.
SAS Missing and Teradata NULL Values
Some differences in processing missing or NULL values are
• Within SAS, missing values are treated as valid values; hence, even joins that match
key column values of missing are a valid operation.

  Table A    Table B    Join result
  1  vvv     1  aaa     1  aaa  vvv
  .  www     .  bbb     .  bbb  www
  .  xxx     3  ccc     .  bbb  xxx
  3  yyy                3  ccc  yyy
  3  zzz                3  ccc  zzz

Within the database, ' ' values are valid values, so when SAS passes SQL to Teradata to look
for these values, it changes the syntax to look for NULLs as well. This is demonstrated on the
next slide with the teralib.employee_pay table, where there are ' ' values in the last_name column.
SAS Missing and Teradata NULL Values
Same Output
Table Order and Sorting in SAS and Teradata
The row order in stored tables can be described as follows:
• SAS tables are stored in the order the observations are written. By use of the
SORT procedure, stored data sets can have a specific sort order.
• Teradata tables are not internally ordered and cannot be internally ordered by
use of any utility. The order of rows is determined at query time.
Sorting data is a resource-intensive operation and should only be done when you
need the data in a specific order.
Efficient sorts can maximize the performance of jobs.
While it is usually more efficient to pass the sort to the relational database, whether
leveraging the database sort is appropriate will depend on the specific use case.
Table Order and Sorting – Teradata Sort
A database sort is automatically initiated through the SAS/ACCESS interface in the
following situations:
• If the BY statement in any DATA step (or any PROC step) or the ORDER BY clause
in PROC SQL is specified, an ORDER BY clause for that variable will automatically
be generated.
• PROC SORT will also pass the sort to the database using the ORDER BY clause unless
the SORTPGM option is set to something other than BEST.
The ORDER BY clause causes a sort of the data to occur in Teradata before the DATA step
or PROC step uses the data in SAS.
Note: sorting by columns that are included in database indexes can be much faster than
sorting by columns that are not indexed. Therefore, if some of the columns to be sorted
by are indexed and others are not, sort first by the indexed columns.
Sort stability, meaning that the ordering of the observations in the BY statement is exactly the same
every time the sort is run, is not guaranteed when you query data stored in a relational database.
Because the data in the relational database might not be static data, the same query issued at different
times might return the data in different order.
If you require sort stability of the data, sort on a unique key, or place your database data into a SAS data
set and then sort it.
Note: Do not use PROC SORT to sort data from SAS back to the relational database. Doing so has no
effect on the order of the data in the database and only impedes performance.
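A sketch of a sort passed to the database (table and column names assumed):

```sas
/* SAS/ACCESS generates an ORDER BY Customer_ID clause, so the sort
   itself runs in Teradata; SAS receives the rows already ordered. */
proc sort data=TDOrion.order_fact out=work.orders_sorted;
   by Customer_ID;
run;
```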
Table Order and Sorting – Teradata Sort
Example – PROC TABULATE procedure BY processing
Table Order and Sorting – Sort Stability
Some SAS analytics require that data set observations be retrieved in the same
order when a data set is read multiple times. An analysis delivers incorrect
results if observation order varies.
An example is the SAS/ETS time-series analysis.
A typical first step in time series analysis is sorting source data on a temporal (for
example, date or datetime) column, creating a new SAS data set with old
dates first and new dates last.
This ensures analyses can read and re-read the sorted data set with no BY
clause, because SAS retrieves data set observations in the order in which they
were originally written.
How can this case be addressed with database data?
Another example is the data set ‘MODIFY’ statement. For a non-unique BY key, the MODIFY
statement updates the first row retrieved with a matching key. However, you cannot predict the first row
when a database retrieves multiple rows with the same key. Therefore, you will probably generate
erroneous results by applying the MODIFY to a database table with a non-unique BY key.
For a simple example of unpredictable ordering with a non-unique BY key and adding another BY
variable to ensure predictability, see “Using a BY Clause to Order Query Results” in the SAS/ACCESS
for Teradata online documentation:
(http://support.sas.com/91doc/getDoc/acreldb.hlp/a001399962.htm#a001399973).
Table Order and Sorting – Sort Stability
A static, repeatable order of rows for multiple executions of the same database
query may not be guaranteed, depending on the sort method used.
• Sort stability is not guaranteed when you query data stored in a database
(explicit ORDER BY query or implicit ordering using a SAS BY statement).
• Further, the data in the relational database might not be static, due to
updates to the data within the database.
If you require sort stability of the data, sort on a unique key, or place your
database data into a SAS data set and then sort it. Because the key is unique,
there are no rows with identical keys whose retrieval order could vary for each
read.
Database tables do not retrieve rows in the order in which they were originally written. Without ORDER
BY asserted in the database SQL, rows are returned in random order each time a table is read.
Table Order and Sorting – Spooling (Optional)
Multiple-pass processing occurs when a SAS procedure requests that data be
made available for multiple-pass reading; in most cases a static, repeatable order of
rows is required.
In the context of tables residing in a database, to meet the data requirements for
multiple-pass processing, SAS creates temporary spool files containing the data
extracts.
The SAS Option SPOOL=YES|NO controls the use of spooling
– NO – requires SAS/ACCESS to issue the identical SELECT statement to
Teradata twice.
– YES – spools all rows to a temporary SAS file on the first pass of the data. On
subsequent passes, SAS will read the row data from the spool file.
NOTE: When Two-Pass Processing occurs, disk space and resource requirements may increase.
Making a SAS Copy of Teradata Data
There might be cases where making a local temporary SAS copy of Teradata data
is a good solution:
• when a static order of rows is required for repeated access in the same
analysis
• when multiple-pass processing is required but proves to be inefficient
• when the same table or view has to be referenced repeatedly in a dynamic
environment, it may be preferable to create a static SAS copy of the data.
Note: SAS is very efficient in handling staging or temporary tables and for
memory- or compute-intensive processing.
Making a SAS Copy of Teradata Data
Comparing Direct Access versus Copying Data
[Figure: direct access – each request is sent to the Teradata table, and data is returned for every request]
Making a SAS Copy of Teradata Data
Comparing Direct Access versus Copying Data
[Figure: direct access – repeated requests each retrieve data from the Teradata table]
Making a SAS Copy of Teradata Data
Comparing Direct Access versus Copying Data
[Figure: copying data – a subset of the Teradata table is extracted once into a SAS data set, which then serves subsequent requests]
Linguistic Order and Sorting (Optional)
Although there are recognized standards for collation, the way people look at
data in "sorted" order differs a lot.
• German collation is different from French, and a Danish one is again
different from both—just to name a few.
• Even within a language community, there can be subtle differences: a
German phone book sort is different from a dictionary sort, traditional
Spanish sort order is different from the modern one, and so on.
• Users of languages based on alphabetic writing systems that make a
distinction between upper- and lowercase letters, might want to sort
uppercase before lowercase or vice versa or do a case insensitive sort.
http://support.sas.com/resources/papers/linguistic_collation.pdf
Sorting is often called "alphabetization," though collation is not limited to ordering letters of an
alphabet. For non-alphabetic writing systems as used in Asian languages, collation can be either
phonetic or based on the number of pen strokes or simply on the position of the characters within an
encoding (for example, Japanese kanji are usually sorted in the order of their Shift-JIS codes).
Nevertheless, people are free to choose: For example, most Japanese customers expect the Shift-JIS
order instead of the UCA.
Invocation of linguistic collation with PROC SORT is quite simple. The only requirement is the
specification of LINGUISTIC as the value to the SORTSEQ procedure option:
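A minimal sketch of that invocation (data set and variable names are hypothetical):

```sas
/* Collate linguistically, using the collating sequence that the ICU
   associates with the current system LOCALE setting. */
proc sort data=work.names out=work.names_sorted sortseq=linguistic;
   by name;
run;
```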
Synonymously, one can specify SORTSEQ=UCA. This causes the SORT procedure to collate
linguistically, in accordance with the current system LOCALE setting. The collating sequence used is
the default provided by the ICU for the given locale. Options that modify the collating sequence can be
specified in parentheses following the LINGUISTIC or UCA keywords. Generally, it is not necessary to
specify option settings because the ICU associates option defaults with the various languages and
locales. PROC SORT currently allows only a subset of the ICU options to be specified. These options
include STRENGTH, CASE_FIRST, COLLATION, and NUMERIC_COLLATION. In addition, a
LOCALE option is available to instruct SORT to use a collating sequence that is associated with a
locale other than the current locale.
Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-83
CLASS processing does not order or group data linguistically, nor is it sensitive to an
existing linguistic collation sequence of a data set. CLASS processing can produce
results that are different from those obtained using BY processing, because BY
processing is now sensitive to collating sequences.
For example, with the SUMMARY procedure, class processing is normally performed
by grouping formatted values of a class variable (or raw values, if the
GROUPINTERNAL option is specified). If a data set is sorted, the ORDER=DATA
option can be used to preserve the order in which class levels are output for the
NWAY type. However, if the data is sorted linguistically, classification boundaries are
still determined by a binary difference in the formatted (or unformatted) class variable
values. For example, if a case-insensitive linguistic collating sequence was used (that
is, STRENGTH=2), changes in character case still denote a new level in the NWAY
type.
Linguistic Order and Sorting (Optional)
To implement linguistic collation, SAS has adopted the International Components for
Unicode (ICU). The ICU and its implementation of the Unicode Collation Algorithm (UCA)
have become a de facto standard.
Linguistic Sorts within SAS
Unsorted    Linguistic sort
Alice       Adam
John        Alice
Adam        Ethan
Ethan       John
Zack        Zack
Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-84
Implicit Pass-Through – Considerations
To optimize SAS implicit pass-through behavior, keep the initial integration
considerations in mind:
How large is the extract data size?
How often will the data be accessed?
For what purpose is the data being accessed?
Which options can be used to optimize the approach?
…..
Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-85
Using SAS Options for More
Specific Teradata Query Use
Cases
This demonstration illustrates how to use SAS options
to optimize Teradata query behavior from SAS.
Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-86
Exercise
This exercise reinforces the concepts discussed
previously.
Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-87
Copyright © 2009 by SAS Institute Inc. and Teradata Corporation. All Rights Reserved.
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA
and other countries. Teradata and the names of products and services of Teradata Corporation are registered trademarks or trademarks
of Teradata Corporation in the USA and other countries. ® indicates USA registration. Other brand and product names are registered
trademarks or trademarks of their respective companies.
Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-88
Rules for SQL Implicit Pass-Through
SQL function mapping is only one of many factors that influence whether SQL is passed
down to the DBMS…
• “SAS-isms” – SAS SQL extensions not supported by DBMS
> Will cause SAS to retrieve all rows from the DBMS
> For example the “?” (CONTAINS) condition:
proc sql;
select count(*) from cdr_usage where usage_type ? "W";
quit;
TERADATA: trforc: COMMIT WORK
ERROR: Teradata prepare: Syntax error, expected
something like an 'IN' keyword between the word
'USAGE_TYPE' and the 'contains' keyword. SQL
statement was: select COUNT(*) from "cdr_usage"
where "cdr_usage"."USAGE_TYPE" contains 'W'.
Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-89
Module 3
Querying Teradata Using SAS Explicit SQL
Pass-Through
Further detailed documentation is available (for example, the SAS/ACCESS 9.2 documentation).
Teradata-specific SQL is passed by SAS directly to the Teradata system through the SAS/ACCESS
engine.
As opposed to Implicit SQL Pass-Thru, SAS does not examine, parse, translate, or manipulate this SQL
prior to passing it on to Teradata for execution.
[Diagram: A Foundation SAS query request (proc sql) is passed through the SAS/ACCESS to Teradata engine connection, executed as a native query in Teradata, and the results are returned to SAS.]
SELECT select-expression
FROM CONNECTION TO TERADATA | alias
(Teradata-query)
AS alias2 (col-name, col-name,...);
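A hedged sketch of this syntax (server, credentials, database, and column names hypothetical):

```sas
proc sql;
   connect to teradata as td (user=myuser password=XXXX server=tdprod);
   select * from connection to td
      (select region, sum(sales_amt) as total_sales
       from salesdb.daily_sales
       group by region)
      as t (region, total_sales);
   disconnect from td;
quit;
```

The inner query is sent to Teradata untouched; only the outer SELECT is processed by SAS.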
Note: The query plan needs to be interpreted and this is not straightforward but nevertheless it
gives useful information.
Using the macro language, you can write SAS programs that are dynamic, or capable of self-
modification.
Note: The same SQL statement might perform differently in each mode based on the previous
considerations, but no functionality is inhibited by the choice of mode.
All transactions in ANSI mode are considered explicit; that is, they require an
explicit COMMIT command to complete.
When using the SAS/ACCESS Interface to Teradata and running in ANSI mode,
any SQL request that modifies the database (SQL statements that create,
update, modify, or drop Teradata tables) must be issued in conjunction with an
explicit COMMIT statement.
proc sql;
connect to teradata (user=testuser pw=XXX
server=XYZ mode=teradata);
...
quit;
No commit requests anymore
ANSI mode must be used with caution because it can potentially cause deadlock
collisions in multi-user environments.
A query band is a tag for transactions and sessions that Teradata can use for
different purposes, for example, to manage task priorities and track system usage.
• can be used as a workload and security classification criteria
• enables all requests coming from a single logon to be classified into different
workloads
Note: Teradata Version 12 and later support Query Banding. SAS 9.2 M2
supports Teradata query banding options.
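For illustration, a query band might be set on the connection with the QUERY_BAND= libname option (server, credentials, and name/value pairs hypothetical; check availability in your SAS/ACCESS release):

```sas
libname td teradata server=tdprod user=myuser password=XXXX
        query_band="AppName=SASJob;Dept=Marketing;";
```

Teradata workload management can then classify all requests from this session by those name/value pairs.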
SAS also supports implicit use of Teradata's FastExport utility for mass
extraction.
– This is most suitable for multi-million-row extracts.
– Availability may depend on your Teradata system configuration and usage
restrictions.
With SAS 9.2, the Teradata Parallel Transporter (TPT) utilities can be used for fast-exporting
data by specifying the FASTEXPORT=YES data set or library option together with TPT=YES
(which specifies that the TPT API is used to read data from a Teradata table).
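A minimal sketch of those options on a libname (connection values and table name hypothetical):

```sas
/* Use the TPT API / FastExport for mass extraction */
libname td teradata server=tdprod user=myuser password=XXXX
        fastexport=yes tpt=yes;

data work.extract;
   set td.big_table;   /* large reads use FastExport if available */
run;
```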
NOTE: SAS/ACCESS figures out the number of partitions with Oracle and uses that number of threads
(connections).
NOTE: SAS/ACCESS Interface to Teradata is NOT available on UNIX with SAS V9.0. We are
expecting it to be available on UNIX with SAS
V9.1.
Documentation: FASTEXPORT=YES specifies that the SAS/ACCESS engine uses the Teradata Parallel
Transporter (TPT) API to read data from a Teradata table.
This option allows opening of Teradata connections in the specified mode. Connections that are opened
with MODE=TERADATA use Teradata mode rules for all SQL requests that are passed to the Teradata
DBMS. This impacts transaction behavior and can cause case insensitivity when processing data.
During data insertion, not only is each inserted row committed implicitly, but rollback is not possible
when the error limit is reached if you also specify ERRLIMIT=. Any update or delete that involves a
cursor does not work.
ANSI mode is recommended for all features that SAS/ACCESS supports, while Teradata mode is
recommended only for reading data from Teradata.
*SAS 9.4 M4
PROC FREQ
• Uses SQL to pre-aggregate data in Teradata
• Leverages SAS_PUT in-database formatting
• Leverages PROC SQL implicit pass-through
PROC SUMMARY
• Multi-dimensional aggregation
• Complex aggregation options (MLF)
• Shared summary subsystem
PROC RANK
• Does not aggregate data – expands it
• Can be done completely with SQL – no post-processing
• Can execute completely in Teradata
Globally controlled by the SQLGENERATION option:
DBMS = permission to generate SQL if the data source is a supported RDBMS. Incompatibilities
are not reported unless MSGLEVEL=I is set.
DBMUST = eligible PROCs MUST generate SQL, to prevent drawing all the rows out of the
database.
Local control at the libname level uses the same syntax.
MSGLEVEL=I allows PROCs to emit informational notes, such as why pass-through did or did not occur.
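A sketch of the global and local settings described above (connection values hypothetical):

```sas
/* Global control, with informational notes about pass-through decisions */
options sqlgeneration=(DBMS) msglevel=i;

/* Local control at the libname level, same syntax */
libname td teradata server=tdprod user=myuser password=XXXX
        sqlgeneration=DBMS;
```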
Note: As long as USER=, PASSWORD=, ACCOUNT= and SERVER= information in SAS Teradata
Libnames with different databases (schemas) are the same, the join is passed to Teradata.
DBCOMMIT=n causes a Teradata "checkpoint" after each group of n rows is transmitted.
Provides transparent access to Teradata tables.
The DATA and PROC step syntax is unchanged.
Knowledge of Teradata-specific SQL is unnecessary because the SAS/ACCESS engine can
convert the SAS code to Teradata-specific SQL and send the query to be processed by
Teradata.
Teradata can optimize all table joins.
Teradata-specific functions and utilities can be used.
You can combine SAS features and Teradata-specific features in your query.
If you have multiple SAS libraries, the LIBNAME statement connection options must
match (ENGINE, USER=, PASSWORD=, ACCOUNT=, and SERVER=).
data _null_;
set TDOrion.order_fact (DBSLICEPARM=(ALL));
run;
TERADATA_1: Prepared: on connection 2
SELECT CAST("Customer_ID" AS FLOAT), ..
FROM sasorion."order_fact"
/* Implicit Read Partitioning with the DBSLICEPARM Libname or Data Set Option */
• DBSLICEPARM=ALL automatically invokes Teradata FastExport, if available.
• Otherwise, threaded reads use an autopartitioning mechanism based on the MOD function.
Note: this requires specific column types (BYTEINT, SMALLINT, INTEGER, DATE, DECIMAL).
• Support for FastExport by SAS is not limited to SAS on Windows.
Log:
sasiotra/trautogn(): SAS-supplied sasaxsm access module not found on your system
sasiotra/trautogn(): Cannot FastExport. Reverting to MOD slicing.
Check that FastExport is in the environment's path (Windows: add the Fexp.exe and SasAxsm paths to the end of PATH).
Combined, these subsets add up to exactly the result set for your original single SQL statement.
NOTE: Rows with an EVEN value for INTCOL are retrieved by the first thread. Rows with an
ODD value for INTCOL are retrieved by the second thread. Distribution of rows across the two
threads is optimal if INTCOL has a 50/50 distribution of EVEN and ODD values.
Scope of usage
• THREADED_APPS: thread support only for SAS thread-enabled applications
(procedures)
• ALL: enables threading on all read-only engine requests, including DATA step reads,
and implicitly uses Teradata FastExport if available.
• DBI: enables threading on all read-only engine requests, including DATA step reads.
THREADS | MAXTHREADS – maximum number of threads supported by the engine
DEFAULT- (THREADED_APPS,2)
THREADED_APPS makes threaded SAS procedures eligible for threaded reads.
(THREADED_APPS,2) is the default value.
ALL attempts to “autopartition” the table and threaded reads are automatically attempted.
<, MAXTHREADS> specifies the maximum number of threads determined by an integral column of a
DBMS table.
DBI forces SAS/ACCESS to Teradata ONLY to generate partitioning WHERE clauses for you. ***
NOTE: If performance is slow on the DBMS side, it has nothing to do with SAS, and questions
regarding this need to be handled by the DBMS vendor. However, if pulling data from the DBMS into
SAS is slow, then using MAXTHREADS can improve performance. You may need to experiment with this
value to find an optimal value for your system/site, similar to using the BUFFSIZE option.
NOTE: SAS/ACCESS Interface to Teradata is not supported on UNIX SAS V9.0 but is expected to be
with SAS V9.1.
NOTE: Default values for MAXTHREADS are ORACLE=2, TERADATA=2, DB2=3, ODBC=3 and
SYBASE=3
NOTE: DBSLICEPARM=DBI for Teradata essentially means "ALL, but using modulo, not
FastExporting".
The Teradata extract request transforms into three parallel query requests:
..
select * from salesdata where(mod(num,3))=0;
select * from salesdata where(mod(num,3))=1;
select * from salesdata where(mod(num,3))=2;
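The three MOD-partitioned queries above correspond to a threaded read such as this sketch (connection values and table name hypothetical):

```sas
libname td teradata server=tdprod user=myuser password=XXXX;

data work.sales;
   /* DBI = modulo partitioning, 3 threads: the engine generates
      the mod(num,3)=0/1/2 WHERE clauses shown above */
   set td.salesdata (dbsliceparm=(DBI,3));
run;
```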
NOTE: THREADS does affect SAS/ACCESS. Here are some test results to help with answering
questions, tested with SAS V9.0 as of June 26, 2002.
When PreFetch is enabled, the first time you run your SAS job, SAS/ACCESS identifies and selects
statements with a high execution cost. SAS/ACCESS then stores (caches) the selected SQL statements
to one or more Teradata macros that it creates.
On subsequent runs of the job, when PreFetch is enabled, SAS/ACCESS extracts statements from the
cache and submits them to Teradata in advance. The rows selected by these SQL statements are
immediately available to SAS/ACCESS because Teradata 'prefetches' them. Your SAS job runs faster
because PreFetch reduces the wait for SQL statements with a high execution cost. However, PreFetch
improves elapsed time only on subsequent runs of a SAS job. During the first run, SAS/ACCESS only
creates the SQL cache and stores selected SQL statements; no prefetching is performed.
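PreFetch is enabled through the PREFETCH= libname option; a sketch (store name and session count hypothetical; check the exact argument syntax in your SAS/ACCESS documentation):

```sas
/* 'my_sql_cache' names the Teradata macro cache for prefetched statements */
libname td teradata server=tdprod user=myuser password=XXXX
        prefetch='my_sql_cache,2';
```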
REMERGE works!
To satisfy some queries SAS must read data twice. This can be very time consuming. Setting
REMERGE allows SAS to execute queries which require a remerge. Keep in mind, this can be very
resource intensive and time consuming. If you are dealing with DBMS tables the remerge operation can
be very slow. You may want to turn it off by using NOREMERGE.
REMERGE|NOREMERGE
Specifies whether PROC SQL can process queries that use remerging of data. The remerge feature of
PROC SQL makes two passes through a table, using data in the second pass that was created in the first
pass, in order to complete a query. When the NOREMERGE system option is set, PROC SQL cannot
process remerging of data. If remerging is attempted when the NOREMERGE option is set, an error is
written to the SAS log.
Default:REMERGE
Tip: Alternatively, you can set the SQLREMERGE system option. The value that is specified in the
SQLREMERGE system option is in effect for all SQL procedure statements, unless the PROC SQL
REMERGE option is set. The value of the REMERGE option takes precedence over the
SQLREMERGE system option. The RESET statement can also be used to set or reset the REMERGE
option. However, changing the value of the REMERGE option does not change the value of the
SQLREMERGE system option. For more information, see the SQLREMERGE system option in the
SAS Language Reference: Dictionary.
options nosqlremerge;
proc sql noremerge;
select region, account_num, account_balance,
(account_balance - avg(account_balance)) as diff_from_avg
from account
group by region;
quit;
SQL statements requiring remerging fail!
NOREMERGE is an option that is used primarily to protect you from DBMS queries which take a lot
of time to execute. Specifying NOREMERGE will cause the SQL statement to fail. You will see an error
in the log which states that the query requires remerging but cannot execute because the NOREMERGE
option is in effect. If you really want to execute this query you could set REMERGE.
If you are querying a DBMS, you may want to figure out why the query isn't being passed to the
database.
Teradata Client
Process Flow
• SAS client submits a PROC FOO Step
• Again, the Proc generates a SQL view
• The view references sas_sscp() to compute the matrix inside the DBMS
• Proc FOO reads the matrix and completes the analysis
Proc PRINCOMP
Principal Components Analysis
Proc VARCLUS
Variable Clustering
Proc REG
Model Selection
Proc SCORE
Generates SQL code for the given model
Runs without any data extraction
• Example: with 100 columns and 5 million rows, the transfer is reduced from roughly 500 million values to about 5,000 (the crossproducts matrix).
• -or-
PROC SORT DATA=samp_YourUserID OUT=Work.B;
BY Account_NBR Account_NBR_Modifier;
Run;
DATA LocalLib.MyMergedData;
MERGE TDLIB.Agreement_100K Work.B;
BY Account_NBR Account_NBR_Modifier;
RUN;
• Note: The second block of code is not only more efficient, it also eliminates
unnecessary code and takes full advantage of Teradata's massively parallel
processing engine to perform the sort of the Teradata table for the MERGE.
Subsetting by key values
Note: In this example no index was used; all rows were extracted
from the table. Be cautious using the DBINDEX option.
Note: The Teradata query behavior is the same in this DATA step lookup example;
however, because there is no loop around the second SET statement, each row of the SAS
table is combined with only the first lookup-value row. The output table has as many
rows as the SAS input table. This is appropriate only with unique-index lookup values.
Note: SAS constructs one or more IN clauses containing the unique key values. SAS
passes the IN clause to the database and retrieves the matching rows. If the
number of unique key values is greater than 4500, the processing is performed
in SAS. The IN clause is passed whether or not an index exists, and multiple IN
clauses can cause multiple full table scans if the join key is not indexed. Make
sure your join key is indexed on the database.
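This IN-clause join optimization is controlled by the MULTI_DATASRC_OPT= libname option; a sketch (connection values, table, and column names hypothetical):

```sas
libname td teradata server=tdprod user=myuser password=XXXX
        multi_datasrc_opt=in_clause;

proc sql;
   /* unique key values from the SAS table are passed as an IN clause */
   select t.*
   from td.transactions as t,
        work.keylist as k
   where t.account_id = k.account_id;
quit;
```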
proc sql;
Select databasenamei, ABS(permspace) as abs_permspace
from td1.dbase
WHERE abs(permspace)>100000;
quit;
*Derived aggregates do not appear in the function map SQL Dictionary, they are built into
the PROC SQL IP processing.
The “SQL_FUNCTIONS” Libname options provide control over the function map (also called the
SQL Dictionary) and provide a means to make it extensible:
• SQL_FUNCTIONS=ALL Add optional functions
• SQL_FUNCTIONS_COPY= Copy function list to SAS data set
• SQL_FUNCTIONS=“EXTERNAL_APPEND=Lib.Tab” Add user-supplied functions
• SQL_FUNCTIONS=“EXTERNAL_REPLACE=Lib.Tab” Install custom map
Where normally the SAS function is used, with SQL_FUNCTIONS=ALL enabled, the default
behavior can be overruled and additional functions are mapped to their Teradata equivalents:
SAS Function Name   Teradata Function Name
TODAY               CURRENT_DATE
DATE                CURRENT_DATE
Note: In order to get the function to execute in Teradata and in this example get the server time, make sure to
disable DATE and TIME constant folding (resolving the SAS function to a constant before the pass-through).
When the NOCONSTDATETIME option is set, PROC SQL evaluates these functions in a query each time it
processes an observation.
Note – This is a SAS SQL function… only functions known to SAS SQL will be mapped
In this example, the scalar function for CHARACTER_LENGTH() is added to the SQL Dictionary:
proc sql;
create table WORK.tera_func_append as select * from WORK.tera_func_list;
delete from WORK.tera_func_append;
insert into WORK.tera_func_append
values("LENGTHC", 7, "CHARACTER_LENGTH", 16, " ", " ", 0, 0, " ", .);
quit;
In this example, the SQRT() function is replaced with a UDF in a specific database "MyDB":
proc sql;
create table WORK.tera_func_replace as select * from WORK.tera_func_list;
update WORK.tera_func_replace
set DBMSFUNCNAME = """MyDB"".""sqrt2""", DBMSFUNCNAMELEN=16
where SASFUNCNAME = "SQRT";
quit;
Creating a custom format
SAS formats are basically mapping functions. They change an element of data from one format to
another. For example, there are SAS formats to change numeric values to various currency formats and
date/time formats. Furthermore, it is possible for a SAS programmer to define a custom format. Let’s
make our credit score example more interesting by imagining the user wants to map customer states into
geographic regions (Northeast, Southeast, Central, Pacific, etc.) or map countries into regions such as
central and eastern Europe. This can be done by creating a custom SAS format that turns state
abbreviations like those found in our input table into region codes. Using the rules of SAS
programming, this format will be called $REGION and can be added to our PROC FREQ program with
one more line of SAS code:
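A sketch of the $REGION format and that one extra line (data set and column names hypothetical, and only a few states shown):

```sas
proc format;
   value $region
      'CT','MA','ME','NH','NY','RI','VT' = 'Northeast'
      'AL','FL','GA','NC','SC','TN'      = 'Southeast'
      'CA','OR','WA'                     = 'Pacific'
      other                              = 'Other';
run;

proc freq data=td.customers;
   tables state;
   format state $region.;   /* the one extra line of SAS code */
run;
```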
PROC FREQ is a very simple example, but not everything can be represented in SQL; some items
are dynamically generated, or SAS functions may be required. Because the $REGION format must be
applied to the state column in every row of our input table, we need a way to export the
definition of the $REGION format to Teradata and make it usable as part of an SQL query.
We need to teach the SAS engine to pass format references to Teradata. Remember
that we previously taught PROC FREQ to create a SAS SQL view that performs basic
data summarization. SAS SQL already processes SAS formats but does so using
syntax that is slightly different than the syntax we enabled in Teradata SQL:
We have the SAS engine convert the PUT function reference to sas_udf_put so that
Teradata recognizes that it needs to call the SAS put routine.
Applying the formatting function via UDF does take a bit longer in Teradata, but
11 seconds is still much better than the 151 seconds that the traditional
method takes.
Adaptive Processing
BASE PROCs are sensitive to but not dependent on embedded formats.
If a format is not found in the database, raw value processing is substituted and
formatting is deferred until the results are returned to SAS.
In this way, customers can still create and use formats on the fly AND have in-
database processing.
Business critical, high value formats can be published to provide an extra lift.
A SAS FORMAT can be described as an internal rule for mapping data values to label
values/formatted values.
SAS provides standard formats (currencies, date and time, and so on), but users can define
custom formats using the FORMAT procedure.
%indtd_publish_formats(fmtcat=FMTLIB.formats, ACTION=REPLACE,
FMTTABLE=SASFormatsRegistryTable);
The SQLMAPPUTTO= option can be used to identify a specific database where the SAS_PUT()
function is located:
option sqlmapputto = (MyDB.sas_put) ;
option sql_ip_trace = source ;
proc sql;
select put(service, $1.) as service_prefix, count(*)
from td.intr_seed
group by service_prefix;
quit;
No SAS observation index
Creating table
Inserting sample data
Creating table
SAMPLE n, where n is a decimal value between 0.00 and 1.00 expressing the fraction of rows
relative to the total rows in the Teradata table being sampled.
Note: Both SELECT statements above work only as part of SAS explicit SQL
pass-through and are not supported by implicit SQL pass-through.
proc sql;
connect to teradata (user=uid pw=xxxxxxxx server=server_name);
Select * from connection to teradata
(select column_name, sampleid
from DataBaseName.TableName
sample .25, .25, .50
order by sampleid);
disconnect from teradata;
quit;
proc sql;
connect to teradata (user=uid pw=xxxxxxxx server=server_name);
Select * from connection to teradata
(select column1, ...
from DataBaseName.TableName
sample with replacement
when <conditions> end
order by column1, ...);
disconnect from teradata;
quit;
In SAS and Teradata, simple random sampling is based upon pseudo-random number generation using
the uniform(0,1) distribution
proc sql;
connect to teradata (user=uid pw=xxxxxxxx server=server_name);
Select * from connection to teradata
(select column1, ...
from DataBaseName.TableName
sample with replacement randomized allocation
when <conditions> end
order by column1, ...);
disconnect from teradata;
quit;
proc sql;
connect to teradata (user=instructor pw=instructor server=barbera);
select * from connection to teradata
(SELECT COL1 ,COL2, STR_COL
FROM DataBaseName.TableName
SAMPLE WHEN STR_COL < x THEN (% or n)
WHEN COL2 BETWEEN x+1 AND y THEN (% or n)
WHEN COL2 > y THEN (% or n)
END
ORDER BY STR_COL);
disconnect from teradata;
quit;
CHARACTER
• CHARACTER specifies a character data value.
• The length can be from 1 to 32,767 characters or bytes.
NUMERIC
• NUMERIC specifies a double-precision, floating-point binary number
• Capped at 8 bytes of storage.
• Dates are stored as numeric values and represent the number of days
between January 1, 1960, and the date value. For example, January 2, 1960,
is stored as 1.
1 When reading Teradata data into SAS, DBMS columns that exceed 32,767 bytes
are truncated. The maximum size for a SAS character column is 32,767 bytes.
3 The SAS range for dates is from A.D. 1582 through A.D. 20,000. If a date is out
of this range, SAS/ACCESS returns an error message and displays the date as
a missing value.
4 DATE9 DDMMMYYYY, e.g. 18MAR2000
TIME8 HH:MM:SS, e.g. 14:45:32
TIME9 H:MM:SS AM/PM, e.g. 2:45:32 PM
NUMBER
• Represents a numeric value with a maximum precision of 38. The scale indicates the
maximum number of digits allowed to the right of the decimal point.
• The NUMBER data type is not supported with SAS/ACCESS Interface to Teradata.
If used, data truncation might occur, which causes a data integrity issue.
• The Teradata NUMBER data type has a higher level of precision than SAS® 9.2M2
supports. If you have to use the NUMBER data type columns in calculations, you
can use the CAST() function to change to the CHAR type. The CAST() function can
be placed in Teradata views.
The character data type, in turn, can hold a maximum of 32,767 characters or bytes.
1. If none of your BIGINT columns will ever contain a number that is greater than 15 digits, then you can set the
environment variable TRUNCATE_BIGINT=YES, which enables BIGINT support with truncation. When this
environment variable is set to YES, all of your BIGINT columns are truncated to 15 digits.
2. If none of your BIGINT columns are used for computations and are used only for character fields (such as ID
numbers), then you can use the DBSASTYPE= option to specify what data type should be used when data is
read into SAS. For BIGINT data in SAS, you typically use a character data type because SAS does not have a
data type that can show such large numbers.
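The second workaround can be sketched with the DBSASTYPE= data set option (table and column names hypothetical):

```sas
/* Read a BIGINT ID column as character to avoid 15-digit truncation */
data work.accounts;
   set td.accounts (dbsastype=(account_id='CHAR(20)'));
run;
```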
HSBC – Background to the move from DB2 to Teradata and the requirements/restrictions for
Phase 1, scale of migration, timescales, etc.
Migration Issues:
BIGINT issues (read/write required)
• HSBC – overview: why BIGINT is required, maximum numbers involved and joining with character
data (SAS data sets) outside Teradata
• Using views and casting to DOUBLE
• Using views and casting to DECIMAL
• Issues with using a coding workaround
• Potential issues with large numbers of casts
• Potential issues with FASTLOAD
• VIEWTABLE issue
• Opening large Teradata tables
• SAS response and proposed next steps
Creating, Updating, and Loading Teradata Tables from SAS Slide 6-1
Module 6 – Creating, Updating, and Loading Teradata
Tables from SAS
Creating, Updating, and Loading Teradata Tables from SAS Slide 6-2
Creating, Updating, and Loading Teradata Tables from SAS Slide 6-3
Creating and Loading Teradata Tables from SAS
Writing Data to Teradata from SAS
There are several use cases in which a SAS user may
want to create a new Teradata table, or insert into or update an existing table.
Use case examples:
SAS Data Integration flows (ETLT/ELT processing)
SAS power users or analysts using the Teradata Data Lab concept as a development area
SAS applications creating tables as part of their inherent or optional tuning capabilities
Note: A Teradata user must have sufficient access permissions assigned to his/her Teradata
user ID to perform these types of table manipulations in the target Teradata database.
NoPI is going to be a great feature for sites that have two general categories of use: 1) ELT, and 2)
sandboxes or user/app-created tables. Both of these have been mentioned previously in this chain. I
would expect that in both cases the default for an organization will be to use NoPI.
For me, this is a defensive feature which prevents runtime breakage. It prevents our DBA team from
having to deal with exceptions and for our development staff to deal with a class of boundary conditions
which have no real solution on TD. Runtime exceptions are the most expensive to deal with from a staff
perspective, particularly when your batch runs in the middle of the night...
Creating, Updating, and Loading Teradata Tables from SAS Slide 6-4
Teradata Tables from SAS
SAS makes it very easy to copy data sets into the Teradata server. However, this can be very
dangerous and jeopardize the effectiveness of the table usage.
What would happen if you had authorization to create tables in Teradata and you submit the
following SAS program?
Which Teradata data types would be used for the columns in the new table?
What would be the primary index for the table?
You copy the data from the SAS data set orion.Order_fact into the Teradata table CustomerOrders.
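The program referred to above is not reproduced on this slide; a minimal copy step of that kind might look like this sketch (libref names and connection values hypothetical):

```sas
libname tdlib teradata server=tdprod user=myuser password=XXXX;

/* Creates the Teradata table with default data types and
   the first column as the primary index */
data tdlib.CustomerOrders;
   set orion.Order_fact;
run;
```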
Creating, Updating, and Loading Teradata Tables from SAS Slide 6-5
Creating Teradata Tables from SAS
Default Output Teradata Data Types
SAS/ACCESS assigns default Teradata data types according to SAS data types and SAS formats
during output processing.
SAS Data Type  SAS Format                   Teradata Data Type
Character      $w. $CHARw. $VARYINGw.       CHAR[w]
Character      $HEXw.                       BYTE[w]
Numeric        a date format                DATE
Numeric        TIMEw.d                      TIME(d)1
Numeric        DATETIMEw.d                  TIMESTAMP(d)1
Numeric        w. (w <= 2)                  BYTEINT
Numeric        w. (w 3-4)                   SMALLINT
Numeric        w. (w 5-9)                   INTEGER
Numeric        w. (w >= 10)                 FLOAT
Numeric        w.d                          DECIMAL(w-1,d)
Numeric        all other numeric formats    FLOAT
To display Teradata columns that contain SAS times and datetimes properly, you must explicitly assign
the appropriate SAS time or datetime display format to the column.
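For example, attaching a DATETIME format to a numeric SAS variable makes it load as a Teradata TIMESTAMP per the table above (data set and variable names hypothetical):

```sas
data td.events;
   set work.events;
   format event_ts datetime26.6;   /* DATETIMEw.d maps to TIMESTAMP(6) */
run;
```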
Creating, Updating, and Loading Teradata Tables from SAS Slide 6-6
Teradata Basics – Primary Indexes (Review)
Primary Index on Teradata Tables
A required Index on one or multiple columns used to determine the distribution of table rows
across the Teradata nodes and AMPS (parallel processing units on Teradata nodes).
A hashing algorithm is applied to the primary index columns values to determine the Hash-ID,
the information about which AMP owns a specific table row.
Hence the primary index determines how evenly data is distributed among the AMPS.
Optimal performance results because even distribution allows the AMPs to work in parallel and
complete their processing about the same time.
To define a Teradata table it is necessary to choose a column or set of columns as the primary index.
This index is passed through a hashing algorithm to determine which AMP owns the data
To retrieve a row, the primary index value is again passed to the hash algorithm, which generates the
two hash values, AMP and Hash-ID. These values are used to immediately determine which AMP owns
the row and where the data are stored.
One dramatic side-effect of using the hashing algorithm as an indexing mechanism is the absence of a
user-defined order
Hash partitioning of primary index values allows rows from different tables with high affinities to be
placed on the same node. This co-location reduces the inter-connect traffic that cross-node joins
necessitate
It is very important to choose the right primary index: data needs to be distributed evenly across the
system for better performance (to take full advantage of the system's parallelism).
Creating, Updating, and Loading Teradata Tables from SAS Slide 6-7
Teradata Basics – Primary Indexes (Review)
Primary Index on Teradata Tables
When the Primary Index column(s) values for a table are “sufficiently unique”, the rows in that
table are evenly distributed across all AMPs.
However, if data is not evenly distributed across all AMPs “Skewed Data”, the slowest AMP
becomes a bottleneck. That is, a given query or operation will only run as fast as the slowest
AMP involved.
"Hot" AMP due to skewed data
Creating, Updating, and Loading Teradata Tables from SAS Slide 6-8
Creating Teradata Tables from SAS
Copying a SAS Data Set into a Teradata Table
SAS makes it very easy to copy data into the Teradata server.
What would be the primary index for the table created with this sample program?
By default, SAS/ACCESS selects the first column of the SAS table as the primary
index column for the new Teradata table.
Of course, this can be very dangerous and jeopardize the effectiveness of the primary index.
A query is only as fast as its slowest AMP: if one AMP is overloaded because the workload
is not distributed evenly, processing will move as slowly as that overworked AMP.
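The sample program referenced above is not reproduced here; a minimal sketch of such a copy step (libref, connection details, and data set names are illustrative) is:

```sas
/* Assign a Teradata libref, then copy a SAS data set into a new table */
libname teralib teradata user=myuser password=XXXXX server=dbc;

data teralib.checking_account_new;
   set work.checking_account;   /* source SAS data set is illustrative */
run;
```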
Creating, Updating, and Loading Teradata Tables from SAS Slide 6-9
Creating Teradata Tables from SAS
Determining Primary Index Chosen by Teradata
If you have loaded a SAS data set into a Teradata table, you can use the SQL procedure to pass a
SHOW TABLE statement to Teradata to confirm the primary index that was chosen by Teradata for
the table.
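A sketch of this check using explicit pass-through (connection details and table names are illustrative):

```sas
/* Return the DDL Teradata stored for the table, including the
   PRIMARY INDEX clause it chose */
proc sql;
   connect to teradata (user=myuser password=XXXXX server=dbc);
   select * from connection to teradata (
      show table INSTRUCTOR.checking_account_new
   );
   disconnect from teradata;
quit;
```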
Creating, Updating, and Loading Teradata Tables from SAS Slide 6-10
Creating Teradata Tables from SAS
Teradata SHOW TABLE Results
The primary index cust_id was the first column defined in the table.
Request Text
CREATE MULTISET TABLE INSTRUCTOR.checking_account_new
, NO FALLBACK
, NO BEFORE JOURNAL
, NO AFTER JOURNAL
, CHECKSUM = DEFAULT
, DEFAULT MERGEBLOCKRATIO
(
cust_id INTEGER
, acct_nbr CHAR(16) CHARACTER SET LATIN CASESPECIFIC
, minimum_balance INTEGER
, per_check_fee DECIMAL(9,2)
, account_active CHAR(1) CHARACTER SET LATIN CASESPECIFIC
, acct_start_date DATE FORMAT 'YY/MM/DD'
, acct_end_date DATE FORMAT 'YY/MM/DD'
, starting_balance DECIMAL(9,2)
, ending_balance DECIMAL(9,2))
PRIMARY INDEX ( cust_id );
Creating, Updating, and Loading Teradata Tables from SAS Slide 6-11
Creating Teradata Tables from SAS
Teradata TABLE Types
Teradata distinguishes the following table types:
MULTISET – tables that do allow duplicate rows.
This is the default type for CREATE TABLE operations from SAS;
it can be requested explicitly with the data set option SET=NO.
A “MULTISET table” allows duplicate rows of data to be loaded into it, something that a “SET table”
(the type that FASTLOAD creates) does not allow.
Teradata was designed to adhere to strict relational rules, one of which is that you cannot have duplicate
rows in a table. Over the course of time this restriction was loosened. Now you can create a table that
will allow duplicate rows. This is called a multiset table. Unfortunately you cannot use FastLoad to load
these tables if the load data contains and requires duplicate rows.
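As a sketch (libref and data set names are illustrative), the SET= data set option controls which table type SAS creates:

```sas
/* SET=YES requests a SET table (no duplicate rows allowed) instead of
   the default MULTISET table created from SAS */
data teralib.checking_account_set (set=yes);
   set work.checking_account;
run;
```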
Creating, Updating, and Loading Teradata Tables from SAS Slide 6-12
Creating Teradata Tables from SAS
Governing SQL Implicit Pass-Through
The system option DBIDIRECTEXEC governs whether CREATE TABLE AS SELECT and DELETE
statements are passed directly through to the database.
The DBIDIRECTEXEC option tells SAS to optimize CREATE TABLE AS statements and have the
DBMS execute the SQL statement. This is faster because SAS doesn’t need to read and insert data into
the table – the DBMS does it all.
Examining the SQL passed to the database without this option, we see that in the Get Data stage all of
the data in ZZZ is read by SAS. In fact, data is also written to a temporary table, which can be
expensive in terms of I/O. The Insert Data stage is also expensive, as data is moved back into the database.
The criteria for passing SQL statements to the DBMS are the same as those for passing joins. When
these criteria are met, a DBMS can process the CREATE TABLE <table-name> AS SELECT statement
in a single step. If multiple librefs point to different data sources, the statement is processed normally
regardless of how you set this option.
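A sketch of the single-step pattern (libref, table, and column names are illustrative):

```sas
/* With DBIDIRECTEXEC set, the CREATE TABLE AS SELECT below is passed to
   Teradata and executed entirely inside the database; SASTRACE shows the
   SQL that was passed */
options dbidirectexec sastrace=',,,d' sastraceloc=saslog;

proc sql;
   create table teralib.acct_summary as
   select cust_id, sum(ending_balance) as total_balance
   from teralib.checking_account_new
   group by cust_id;
quit;
```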
Creating, Updating, and Loading Teradata Tables from SAS Slide 6-13
Creating Teradata Tables from SAS
Governing SQL Implicit Pass-Through
When DBIDIRECTEXEC is set, a CREATE TABLE or DELETE statement is passed directly
through to the database.
GOOD: The
DBMS does it all!
Take a look at the SQL statement that is being passed to the database. Using the SASTRACE=‘,,,d’
option you can clearly see that the DBMS is handling the CREATE TABLE AS processing. This is
good.
This statement doesn’t move data from the DBMS to SAS and from SAS back into the DBMS. The data
stays in the DBMS.
Creating, Updating, and Loading Teradata Tables from SAS Slide 6-14
Creating Teradata Tables from SAS
SAS/ACCESS can automatically create DBMS tables.
If you want to create tables with, for example, different data types than the default ones,
use the DBTYPE= data set option.
To set any DBMS-specific table creation option, use the DBCREATE_TABLE_OPTS= data
set option, whose value is appended to the CREATE TABLE statement.
Hence you can specify the primary index when you create the Teradata table.
data teralib.checking_account_new
     (DBCREATE_TABLE_OPTS='primary index (acct_nbr)'
      DBTYPE=(acct_start_date='date' acct_end_date='date'));
   set work.checking_account;   /* source data set is illustrative */
run;
Creating, Updating, and Loading Teradata Tables from SAS Slide 6-15
Creating Teradata Tables from SAS
Example creating a new table with custom primary Index
If the table already exists in Teradata, you cannot REPLACE it, because the SAS/ACCESS Teradata
engine does not support the REPLACE option. You would have to drop the table first and then
recreate it. If you try to recreate it without dropping it, you get the following ERROR message:
ERROR: The TERADATA table CustomerOrders has been opened for OUTPUT. This table
already exists, or there is a name conflict with an existing object. This table will not be replaced.
This engine does not support the REPLACE option.
NOTE: The SAS System stopped processing this step because of errors.
If you don't have authorization on the DBMS to drop tables, the PROC SQL code will not work.
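A sketch of the drop-and-recreate pattern via explicit pass-through (connection details and names are illustrative, and the DROP requires the appropriate DBMS authorization):

```sas
proc sql;
   connect to teradata (user=myuser password=XXXXX server=dbc);
   execute (drop table CustomerOrders) by teradata;
   execute (commit work) by teradata;
   disconnect from teradata;
quit;

/* Recreate the table, this time choosing the primary index explicitly */
data teralib.CustomerOrders
     (DBCREATE_TABLE_OPTS='primary index (cust_id)');
   set work.customer_orders;
run;
```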
Creating, Updating, and Loading Teradata Tables from SAS Slide 6-16
Creating Teradata Tables from SAS
Determining Primary Index Distribution of a Table
Use a query to predict the distribution of the primary index values for the columns to be
chosen.
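One way to do this is Teradata's standard row-distribution query, which maps each candidate index value to the AMP it would hash to. A sketch using explicit pass-through (table, column, and connection names are illustrative):

```sas
proc sql;
   connect to teradata (user=myuser password=XXXXX server=dbc);
   /* Count the rows that would land on each AMP if acct_nbr
      were the primary index; a skewed result warns against it */
   select * from connection to teradata (
      select hashamp(hashbucket(hashrow(acct_nbr))) as amp_no,
             count(*) as row_cnt
      from INSTRUCTOR.checking_account_new
      group by 1
      order by 2 desc
   );
   disconnect from teradata;
quit;
```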
Creating, Updating, and Loading Teradata Tables from SAS Slide 6-17
Loading Data into Teradata Tables
Overview of loading data into existing Teradata tables
SAS can insert rows into a Teradata table using Data Steps, PROC SQL or PROC
APPEND steps.
• By default, SAS inserts rows into the Teradata table one row at a time in a sequential
process. This process is slow because rows are inserted using one unit of
parallelism.
• SAS LIBNAME options can be used to tune standard SAS insert operations into
Teradata tables (multi-row inserts, etc.).
Teradata provides specific utilities for fastest load operations, which can be used directly
through SAS.*
• Teradata distinguishes operation for loading to empty tables, appending to existing
tables and more and provides specific utilities for those (FastLoad, MultiLoad,
TPUMP, TPT, …)
• However, depending on site restrictions you will require permission to leverage those
utilities.
Creating, Updating, and Loading Teradata Tables from SAS Slide 6-18
Loading Data into Teradata Tables
Overview of loading data into existing Teradata tables
The default load methods for loading SAS data into Teradata tables use Teradata
libnames and
– corresponding SAS DATA step syntax,
– SAS PROC SQL insert into syntax, or
– SAS PROC APPEND syntax.
SAS default behavior issues a series of single-row inserts committed in blocks
sequentially (default 1000).
This can be tuned using DBCOMMIT and MULTISTMT options.
However, this process will most likely always be slow compared to using
Teradata’s load utilities, because rows are inserted using only one unit of
parallelism.
Creating, Updating, and Loading Teradata Tables from SAS Slide 6-19
Loading Data into Teradata Tables
Using SAS PROC SQL, insert syntax to load data from SAS to Teradata:
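A minimal sketch of this pattern (libref and data set names are illustrative):

```sas
/* Insert rows from a SAS data set into an existing Teradata table;
   by default SAS issues single-row inserts committed in blocks */
proc sql;
   insert into teralib.checking_account_new
      select * from work.checking_account;
quit;
```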
Creating, Updating, and Loading Teradata Tables from SAS Slide 6-20
Loading Data into Teradata Tables
Using SAS PROC SQL, insert syntax to load data from SAS to Teradata – Teradata Query-Log.
Creating, Updating, and Loading Teradata Tables from SAS Slide 6-21
Loading Data into Teradata Tables
DBCOMMIT SAS Load Option
Causes an automatic COMMIT after a specified number of rows have been
processed.
DBCOMMIT= affects update, delete, and insert processing. The number of
rows that are processed includes rows that are not processed successfully.
Default value: 1000 when inserting rows into a DBMS table; 0 when updating
a DBMS table
If you set DBCOMMIT=0, COMMIT is issued only once, after the procedure or DATA
step completes.
If you explicitly set the DBCOMMIT= option, SAS/ACCESS fails any update with a WHERE clause.
Note: If you specify both DBCOMMIT= and ERRLIMIT= and these options collide during processing,
COMMIT is issued first and ROLLBACK is issued second. Because COMMIT is issued (through the
DBCOMMIT= option) before ROLLBACK (through the ERRLIMIT= option), DBCOMMIT=
overrides ERRLIMIT=.
Teradata: See the FastLoad capability description for the default behavior of this option. DBCOMMIT=
and ERRLIMIT= are disabled for MultiLoad to prevent any conflict with the ML_CHECKPOINT= data set
option.
Creating, Updating, and Loading Teradata Tables from SAS Slide 6-22
Loading Data into Teradata Tables
DBCOMMIT SAS Load Option – Example
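A sketch of the option in use (names are illustrative):

```sas
/* Commit after every 10,000 inserted rows instead of the default 1000,
   reducing the number of commit round trips */
data teralib.checking_account_new (dbcommit=10000);
   set work.checking_account;
run;
```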
Creating, Updating, and Loading Teradata Tables from SAS Slide 6-23
Loading Data into Teradata Tables
MULTISTMT SAS Load Option
Specifies whether INSERT statements are sent to Teradata one at a time or in a group (multi-row
inserts).
When you need to insert large volumes of data, you can significantly improve performance by
using MULTISTMT= instead of single-row inserts.
SAS first determines how many INSERT statements it can send to Teradata:
how many SQL INSERT statements can fit in a 64K buffer,
how many data rows can fit in the 64K data buffer, and
how many inserts the Teradata server chooses to accept.
The SAS/ACCESS engine to Teradata supports a MULTISTMT option that causes the engine to
generate a multi-row insert, using the same mechanism as the Teradata TPump utility.
Significant performance gains can be obtained when compared to single-row inserts when large
volumes of data are inserted
Examples
Here is an example of how you can send insert statements one at a time to Teradata.
libname user teradata user=zoom pw=XXXXXX server=dbc;
proc delete data=user.testdata; run;
data user.testdata(DBTYPE=(I="INT") MULTISTMT=YES);
   do i=1 to 50; output; end;
run;
In the next example, DBCOMMIT=100, so SAS issues a commit after every 100 rows, so it sends only
100 rows at a time.
libname user teradata user=zoom pw=XXXXX server=dbc;
proc delete data=user.testdata; run;
data user.testdata(MULTISTMT=YES DBCOMMIT=100);
   do i=1 to 1000; output; end;
run;
In the next example, DBCOMMIT=1000, which is much higher than in the previous example. In this
example, SAS sends as many rows as it can fit in the buffer at a time (up to 1000) and issues a commit
after every 1000 rows. If only 600 can fit, 600 are sent to the database, followed by the remaining 400
(the difference between 1000 and the initial 600 that were already sent), and then all rows are
committed.
Creating, Updating, and Loading Teradata Tables from SAS Slide 6-24
libname user teradata user=zoom pw=XXXXX server=dbc;
proc delete data=user.testdata; run;
data user.testdata(MULTISTMT=YES DBCOMMIT=1000);
   do i=1 to 10000; output; end;
run;
This next example sets CONNECTION=GLOBAL for all tables, creates a global
temporary table, and stores the table in the current database schema.
libname user teradata user=zoom pw=XXXXX server=dbc connection=global;
proc delete data=user.temp1; run;
proc sql;
   connect to teradata (user=zoom pw=XXXXXXX server=dbc connection=global);
   execute (CREATE GLOBAL TEMPORARY TABLE temp1 (col1 INT)
            ON COMMIT PRESERVE ROWS) by teradata;
   execute (COMMIT WORK) by teradata;
quit;
data work.test;
   do col1=1 to 1000; output; end;
run;
proc append data=work.test base=user.temp1(multistmt=yes);
run;
Loading Data into Teradata Tables
MULTISTMT SAS Load Option – Example
Creating, Updating, and Loading Teradata Tables from SAS Slide 6-25
Loading Data into Teradata Tables
MULTISTMT SAS Load Option – Example
Creating, Updating, and Loading Teradata Tables from SAS Slide 6-26
Creating Teradata Tables and
Loading Data from SAS
This demonstration illustrates how to control table
creation and column definitions from SAS and how to
load data into a Teradata table from SAS.
Creating, Updating, and Loading Teradata Tables from SAS Slide 6-27
Exercise
This exercise reinforces the concepts discussed
previously.
Creating, Updating, and Loading Teradata Tables from SAS Slide 6-28
Module 6 – Creating, Updating and Loading Teradata
Tables from SAS
Creating, Updating, and Loading Teradata Tables from SAS Slide 6-29
Leveraging Teradata Load Utilities
Improving Teradata Load Performance
SAS/ACCESS to Teradata supports the following native Teradata CLI bulk-
loading utilities, which greatly improve performance when inserting rows of data
into Teradata tables:
FastLoad utility for bulk loading empty tables
MultiLoad for bulk-appending to existing tables
TPUMP* utility for continuous real-time data loads and updates without the
typical table locking effects during bulk-loads
Teradata Parallel Transporter (TPT) provides even more parallel-processing-
enabled and stream-based versions of the classic utilities.
TPT is now the default; SAS reverts to the classic utilities as needed.
NOTE: MULTILOAD is an option available with SAS version 9.1.3 and higher (and requires
that the “SAS/ACCESS to Teradata” module be installed on your SAS platform). The FASTLOAD
option is available with SAS version 8.2 and higher (and requires that the “SAS/ACCESS to Teradata”
module be installed on your SAS platform).
Creating, Updating, and Loading Teradata Tables from SAS Slide 6-30
Leveraging Teradata Load Utilities
The Teradata classics and their differences
FastLoad (FASTLOAD=YES):
– Can be used to load data to empty tables only.
– Cannot load duplicate rows into any tables.
– Can be used with the LIBNAME statement or as a data set option.
MultiLoad (MULTILOAD=YES):
– Can be used to load empty tables or tables that contain data.
– Can be used to load duplicate rows into multiset tables.
– Can only be used as a data set option.
Creating, Updating, and Loading Teradata Tables from SAS Slide 6-31
FastLoading Empty Teradata Tables
Assumptions/Features
Must be creating a new table (in any given logical Teradata database) or loading
into an empty table.
Each row loaded must be unique; duplicate rows (bit-for-bit duplicates) will be
rejected.
Log files can be captured from the load to determine the source of the problem.
Restrictions
The target table must have no secondary indexes, join indexes, or hash
indexes defined on it.
The target table must have no triggers defined on it. (Triggers fire on data
modifications – for example, insert, delete, or update.)
The table must have no standard referential integrity or batch referential integrity
defined on it.
Creating, Updating, and Loading Teradata Tables from SAS Slide 6-32
FASTLOADing Empty Teradata Tables
Enabling Teradata FASTLOADING
• Can be used by any SAS program step that creates and loads a table in one
step, or by steps that load into empty tables
• The SAS LIBNAME and data set option FASTLOAD=YES (synonym
BULKLOAD=YES) enables Teradata's utility if it is available
• Further options might be helpful when using FASTLOAD
– SESSIONS= specifies how many Teradata sessions to log on when
using FastLoad, FastExport, or MultiLoad
SESSIONS=4 When reading data with FastExport or loading data with FastLoad and MultiLoad, you
can request multiple sessions to increase throughput. Using large values might not necessarily increase
throughput due to the overhead associated with session management. Check whether your site has any
recommended value for the number of sessions to use. See your Teradata documentation for details
about using multiple sessions.
Creating, Updating, and Loading Teradata Tables from SAS Slide 6-33
FASTLOADing Empty Teradata Tables
Example – Creating and FastLoading a Teradata table using SAS DATA Step
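A sketch of this pattern (libref, data set, and option values are illustrative):

```sas
/* Create a new, empty Teradata table and FastLoad it in one DATA step */
data teralib.checking_account_fl (fastload=yes sessions=4);
   set work.checking_account;
run;
```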
Creating, Updating, and Loading Teradata Tables from SAS Slide 6-34
FASTLOADing empty Teradata Tables
Example – Creating and FastLoading a Teradata table using SAS PROC SQL
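A sketch of the same load driven through PROC SQL (names are illustrative); the data set option goes in parentheses after the target table name:

```sas
proc sql;
   create table teralib.checking_account_fl (fastload=yes) as
   select * from work.checking_account;
quit;
```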
Creating, Updating, and Loading Teradata Tables from SAS Slide 6-35
FASTLOADing empty Teradata Tables
Example – Creating and FastLoading a Teradata table using two separate, independent steps.
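A sketch of the two-step variant (names are illustrative): first create the empty table with the desired primary index, then FastLoad into it while it is still empty:

```sas
/* Step 1: create the empty table (obs=0 copies structure only) */
data teralib.checking_account_fl
     (DBCREATE_TABLE_OPTS='primary index (acct_nbr)');
   set work.checking_account (obs=0);
run;

/* Step 2: FastLoad the rows into the still-empty table */
proc append base=teralib.checking_account_fl (fastload=yes)
            data=work.checking_account;
run;
```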
Creating, Updating, and Loading Teradata Tables from SAS Slide 6-36
Using FASTLOAD for Append Operations
Using a FastLoad in a Three-Step Table Append Process
1. Write new data to an “intermediate Teradata table”, using the FASTLOAD
option (if so advised by your Teradata DBA)
2. Use Explicit SQL Pass-Thru to insert rows into the “target Teradata table”
from the “intermediate Teradata table” often referred to by Teradata users as
an “INSERT … SELECT”
3. Then, drop the “intermediate Teradata table”
Recall that use of the FASTLOAD option will result in duplicate rows being
dropped when creating your intermediate table, (if duplicate rows exist within
your intermediate data set)
Creating, Updating, and Loading Teradata Tables from SAS Slide 6-37
Using FASTLOAD for Append Operations
Example – FastLoad an intermediate table in a multi-step append operation.
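The three steps described above can be sketched as follows (table names and connection details are illustrative):

```sas
/* Step 1: FastLoad the new rows into an intermediate Teradata table */
data teralib.checking_stage (fastload=yes);
   set work.checking_new_rows;
run;

proc sql;
   connect to teradata (user=myuser password=XXXXX server=dbc);

   /* Step 2: INSERT ... SELECT from the intermediate table into the target */
   execute (insert into checking_account_new
            select * from checking_stage) by teradata;
   execute (commit work) by teradata;

   /* Step 3: drop the intermediate table */
   execute (drop table checking_stage) by teradata;
   execute (commit work) by teradata;
   disconnect from teradata;
quit;
```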
Creating, Updating, and Loading Teradata Tables from SAS Slide 6-38
MULTILOADing Teradata Tables
MultiLoad is the parallel load utility used by Teradata to insert rows of data into both
empty and existing Teradata tables.
Requirements for using the MultiLoad utility:
The target table must have no unique secondary, join, or hash indexes on it.
The target table must have no triggers defined on it. (Triggers fire on data
modifications – for example, insert, delete, or update.)
The table must have no standard referential integrity or batch referential integrity
defined on it.
The MultiLoad input file must have data to qualify all columns defined in the
primary index of the target table.
MultiLoad will allow duplicate rows to be loaded.
You must drop these items on target tables before the load:
unique secondary indexes, foreign key references, and join indexes.
Creating, Updating, and Loading Teradata Tables from SAS Slide 6-39
MULTILOADing Teradata Tables
Enabling Teradata MultiLoad with SAS
• Can be used by any SAS program step that creates and loads a Teradata table.
Most commonly, MultiLoad is used with SAS PROC APPEND steps.
• Invoked by the data set option MULTILOAD=YES, if the utility is available and
permission to use it has been granted
• Further options might be helpful when using MultiLoad
– SESSIONS= specifies how many Teradata sessions to log on when
using FastLoad, FastExport, or MultiLoad
– LOGDB= specifies an alternative database for writing MultiLoad log files
– BL_LOG= specifies a nonstandard name for the MultiLoad log files
Creating, Updating, and Loading Teradata Tables from SAS Slide 6-40
MULTILOADing Teradata Tables
Example – Writing to Teradata from SAS Table Append
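A sketch of this pattern (names are illustrative):

```sas
/* Append new rows to an existing Teradata table using MultiLoad */
proc append base=teralib.checking_account_new (multiload=yes sessions=4)
            data=work.checking_new_rows;
run;
```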
Creating, Updating, and Loading Teradata Tables from SAS Slide 6-41
Leveraging Teradata Load Utilities
Teradata Parallel Transporter Utilities (TPT)
The TPT API provides a consistent interface for FastLoad, MultiLoad, and Multi-Statement insert. TPT
API documentation refers to FastLoad as the load driver, MultiLoad as the update driver, and Multi-
Statement insert as the stream driver.
By using the TPT API, you can load data into a Teradata table without working directly with such stand-
alone Teradata utilities as FastLoad, MultiLoad, or TPump. When TPT=YES, SAS uses the TPT API load
driver for FastLoad, the update driver for MultiLoad, and the stream driver for Multi-Statement insert.
When TPT=YES, sometimes SAS cannot use the TPT API due to an error or because it is not installed
on the system. When this happens, SAS does not produce an error, but it still tries to load data using the
requested load method (FastLoad, MultiLoad, or Multi-Statement insert). To check whether SAS used
the TPT API to load data, look for a similar message to this one in the SAS log:
libname tera teradata user=testuser pw=testpw TPT=YES;
/* Create data */
data testdata;
   do i=1 to 100; output; end;
run;
/* Load using MultiLoad TPT. This note appears in the SAS log if SAS uses TPT:
   NOTE: Teradata connection: TPT MultiLoad has inserted 100 row(s). */
data tera.testdata(MULTILOAD=YES);
   set testdata;
run;
/* Verification */
Make sure that LD_LIBRARY_PATH has the TPT library directory embedded. Then run the following
code to verify that TPT is being used in SAS.
Then look at the trace messages for each row and you should see something like
TERADATA: trtpt_insert() …
Creating, Updating, and Loading Teradata Tables from SAS Slide 6-42
Teradata Load Utilities – TPT FastLoad
TPT FastLoad Supported Features and Restrictions
SAS/ACCESS Interface to Teradata supports the TPT API for FastLoad, also known as the
load driver. SAS/ACCESS works by interfacing with the load driver through the TPT API, which
in turn uses the Teradata FastLoad protocol for loading data.
If SAS cannot find the Teradata modules that are required for the TPT API or TPT=NO, then
SAS/ACCESS uses the old method of FastLoad.
SAS/ACCESS can restart FastLoad from checkpoints when FastLoad uses the TPT API.
Data errors are logged in Teradata tables. Error recovery can be difficult if you do not set
TPT_CHECKPOINT_DATA= to enable restart from the last checkpoint. To find the error that
corresponds to the code that is stored in the error table, see your Teradata documentation.
You can restart a failed job from the last checkpoint by following the instructions in the SAS error
log.
The SAS/ACCESS FastLoad facility using the TPT API is similar to the native Teradata FastLoad
utility. They share these limitations.
• FastLoad can load only empty tables. It cannot append to a table that already contains data. If you try
to use FastLoad when appending to a table that contains rows, the append step fails.
• FastLoad does not load duplicate rows (those where all corresponding fields contain identical data)
into a Teradata table. If your SAS data set contains duplicate rows, you can use other load methods.
Creating, Updating, and Loading Teradata Tables from SAS Slide 6-43
Teradata Load Utilities – TPT FastLoad
TPT FastLoad Example
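A sketch of this pattern (names are illustrative):

```sas
/* FastLoad an empty table through the TPT API (the "load driver") */
data teralib.checking_account_fl (fastload=yes tpt=yes);
   set work.checking_account;
run;
```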
Creating, Updating, and Loading Teradata Tables from SAS Slide 6-44
Teradata Load Utilities – TPT MultiLoad
TPT MultiLoad Supported Features and Restrictions
SAS/ACCESS Interface to Teradata supports the TPT API for MultiLoad, also known as the
update driver. SAS/ACCESS works by interfacing with the update driver through the TPT
API. This API then uses the Teradata MultiLoad protocol for loading data.
If SAS cannot find the Teradata modules that are required for the TPT API or TPT=NO,
then SAS/ACCESS uses the old method of MultiLoad.
SAS/ACCESS supports only insert operations and loading only one target table at a time.
SAS/ACCESS can restart MultiLoad from checkpoints when MultiLoad uses the TPT
API.
Errors are logged to Teradata tables. Error recovery can be difficult if you do not set
TPT_CHECKPOINT_DATA= to enable restart from the last checkpoint. You can restart
a failed job from the last checkpoint by following the instructions in the SAS error log.
The SAS/ACCESS MultiLoad facility loads both empty and existing Teradata tables.
The SAS/ACCESS MultiLoad facility using the TPT API is similar to the native Teradata MultiLoad
utility. A common limitation that they share is that you must drop these items on target tables before the
load: unique secondary indexes, foreign key references, and join indexes.
Creating, Updating, and Loading Teradata Tables from SAS Slide 6-45
Teradata Load Utilities – TPT MultiLoad
TPT MultiLoad Example
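A sketch of this pattern (names are illustrative):

```sas
/* MultiLoad through the TPT API (the "update driver"), appending to an
   existing table */
proc append base=teralib.checking_account_new (multiload=yes tpt=yes)
            data=work.checking_new_rows;
run;
```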
Creating, Updating, and Loading Teradata Tables from SAS Slide 6-46
Teradata Basics – Teradata TPUMP
TPump is a Teradata utility designed to continuously move data from data sources into Teradata
tables without locking the affected table.
TPump provides near-real-time data into data warehouses.
TPump can be used to insert, update, and delete data in the Teradata database.
TPump uses Teradata row hash locks, meaning users can run queries while it’s updating the
Teradata Warehouse.
http://en.wikipedia.org/wiki/Tpump
TPump uses standard Teradata SQL to achieve moderate to high data loading rates to the Teradata
RDBMS. Multiple sessions and multi-statement request are typically used to increase throughput.
TPump provides an alternative to MultiLoad for the low volume batch maintenance of large databases
under control of a Teradata system. Instead of updating Teradata databases overnight, or in batches
throughout the day, TPump updates information in real time, acquiring every bit of data from the client
system with low processor utilization. It does this through a continuous feed of data into the data
warehouse, rather than the traditional batch updates. Continuous updates result in more accurate,
timely data.
And, unlike most load utilities, TPump uses row hash locks rather than table level locks. This allows
users to run queries while TPump is running. This also means that TPump can be stopped
instantaneously.
TPump also provides a dynamic throttling feature that enables it to run “all out” during batch windows,
but within limits when it may impact other business uses of the Teradata RDBMS. Operators can specify
the number of statements run per minute, or may alter throttling minute-by-minute, if necessary.
Creating, Updating, and Loading Teradata Tables from SAS Slide 6-47
and network slowdowns. Jobs can restart with absolutely no intervention.
• Flexible data management – accepts a wide variety of data forms from any
number of data sources, including direct feeds from other databases.
TPump is also able to transform that data on the fly before sending it to Teradata.
SQL statements and conditional logic are usable within the utility, making it
unnecessary to write wrapper jobs around it.
Features
• Fast, scalable continuous data loads
• Row hash lock enables concurrent queries
• Dynamic throttling feature
• Best for small data volumes
Teradata Load Utilities – TPT TPUMP
TPT Multi-Statement Insert – Features and Restrictions
SAS supports the TPT API for Multi-Statement insert, also known as the stream driver.
SAS/ACCESS works by interfacing with the stream driver through the TPT API, which in turn uses
the Teradata Multi-Statement insert (TPump) protocol for loading data.
If SAS cannot find the Teradata modules that are required for the TPT API or TPT=NO, then
SAS/ACCESS uses the old method of Multi-Statement insert.
SAS/ACCESS can restart Multi-Statement insert from checkpoints when Multi-Statement
insert uses the TPT API.
The SAS/ACCESS Multi-Statement insert facility loads both empty and existing Teradata
tables. SAS/ACCESS supports only insert operations and loading only one target table at
a time.
Errors are logged to Teradata tables. Error recovery can be difficult if you do not set
TPT_CHECKPOINT_DATA= to enable restart from the last checkpoint. You can restart a failed
job from the last checkpoint by following the instructions in the SAS error log.
Creating, Updating, and Loading Teradata Tables from SAS Slide 6-48
Teradata Load Utilities – TPT TPUMP
TPT TPUMP Example
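A sketch of this pattern (names are illustrative):

```sas
/* Multi-Statement insert through the TPT API (the "stream driver",
   using the TPump protocol) */
data teralib.checking_account_new (multistmt=yes tpt=yes);
   set work.checking_new_rows;
run;
```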
Creating, Updating, and Loading Teradata Tables from SAS Slide 6-49
Use Fastest Loading Methods
to Load Data into Teradata
Tables
This demonstration illustrates how to use Teradata bulk
loading utilities from SAS programs.
Creating, Updating, and Loading Teradata Tables from SAS Slide 6-50
Exercise
This exercise reinforces the concepts discussed
previously.
Creating, Updating, and Loading Teradata Tables from SAS Slide 6-51
Module 6 – Creating, Updating, and Loading Teradata
Tables from SAS
Creating, Updating, and Loading Teradata Tables from SAS Slide 6-52
Updating Teradata Tables from SAS
Row updates to database tables from SAS
SAS supports row updates via PROC SQL UPDATE statements.
Updates can also be driven by transaction tables, used with PROC SQL or with a
DATA step and the MODIFY statement.
Creating, Updating, and Loading Teradata Tables from SAS Slide 6-53
Updating Teradata Tables from SAS
Row Updates using PROC SQL implicit pass-through
Issues a single update request for each row
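A sketch of such an update (libref, table, and column names are illustrative); the WHERE clause is passed to Teradata, but qualifying rows are updated one at a time:

```sas
proc sql;
   update teralib.PayrollMasterUP
      set Salary = Salary * 1.05
      where Jobcode = 'PT2';
quit;
```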
Creating, Updating, and Loading Teradata Tables from SAS Slide 6-54
Updating Teradata Tables from SAS
Row Updates using Proc SQL and transaction tables in SAS
The code below identifies a SAS customer process that did not work well with PROC SQL running
against the DBMS.
The code example will produce multiple selects and updates of the data to satisfy the query,
resulting in multiple passes of the data.
Given the size of the data, this could degrade performance.
/* Update Case 3 - the same SQL-based update done inside Teradata: for each
   empid in the transaction table, a subquery per variable to be updated */
proc sql;
   update myTDLib.PayrollMasterUP pmu
      set Gender      = (select Gender from MyTDLib.Payrollchanges pc
                         where pmu.empid = pc.empid),
          Jobcode     = (select Jobcode from MyTDLib.Payrollchanges pc
                         where pmu.empid = pc.empid),
          Salary      = (select Salary from MyTDLib.Payrollchanges pc
                         where pmu.empid = pc.empid),
          DATEOFBIRTH = (select DATEOFBIRTH from MyTDLib.Payrollchanges pc
                         where pmu.empid = pc.empid),
          DATEOFHIRE  = (select DATEOFHIRE from MyTDLib.Payrollchanges pc
                         where pmu.empid = pc.empid)
      where pmu.empid in (select empid from MyTDLib.Payrollchanges pc);
   /* alternative:
      where exists
         (select 1 from MyTDLib.Payrollchanges as pc2
          where pmu.empid = pc2.empid); */
quit;
/* Trace log excerpt:
TERADATA_83: Executed: on connection 7
SELECT "EMPID","GENDER","JOBCODE","SALARY","DATEOFBIRTH","DATEOFHIRE" FROM
saseduc."PayrollMasterUP" FOR CURSOR
TERADATA: trget - rows to fetch: 1036
TERADATA: trqacol- No casting. Raw row size=4, Casted size=4,
CAST_OVERHEAD_MAXPERCENT=20%
*/
Creating, Updating, and Loading Teradata Tables from SAS Slide 6-55
Creating, Updating, and Loading Teradata Tables from SAS Slide 6-56
Upserting Teradata Tables from SAS
Upsert Processing using explicit SQL pass-through
In the following example, payroll records stored in a transaction table are matched against the
payroll master table. If a match is found, you want to update the existing master record. If no
match is found, you want to append the transaction record (a new employee) to the master table.
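A sketch of one way to do this for a single transaction record, using Teradata's atomic UPSERT form (UPDATE ... ELSE INSERT) through explicit pass-through; table, column, and connection details are illustrative:

```sas
proc sql;
   connect to teradata (user=myuser password=XXXXX server=dbc);
   /* Update the master record if empid exists; otherwise insert it */
   execute (
      update PayrollMasterUP
         set Salary = 52000
       where empid = 1001
      else insert into PayrollMasterUP (empid, Salary)
           values (1001, 52000)
   ) by teradata;
   execute (commit work) by teradata;
   disconnect from teradata;
quit;
```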
Creating, Updating, and Loading Teradata Tables from SAS Slide 6-57
Upserting Teradata Tables from SAS
Upsert Processing using explicit SQL pass-through
Creating, Updating, and Loading Teradata Tables from SAS Slide 6-58
Upserting Using Teradata Load Utilities
MultiLoad Supported Features and Restrictions
The UPSERT load option, which simultaneously updates and inserts during a load, is also available
with Teradata MultiLoad. TPT also supports upsert processing.
proc append base = TSSASBOX.PRDTD_MASTER
(multiload=yes upsert=yes
upsert_where=(PRODUCT, COUNTRY, YEAR, MONTH)
keep = COUNTRY ACTUAL ... MONTH)
data = sashelp.prdsale (
keep = COUNTRY ACTUAL ... MONTH) force;
run;
.BEGIN IMPORT MLOAD TABLES sandbox."PRDTD_MASTER"
   WORKTABLES SAS_ML_WT_1758253940123
   ERRORTABLES SAS_ML_ET_175825394012 SAS_ML_UT_17582539401230429826
   NOTIFY HIGH EXIT SASMLNE.DLL TEXT '4308 ';
.LAYOUT SASLAYOUT INDICATORS;
.FIELD "COUNTRY" * CHAR(10);
…
.DML LABEL SASDML DO INSERT FOR MISSING UPDATE ROWS;
UPDATE sandbox."PRDTD_MASTER"
   SET "ACTUAL"=:"ACTUAL", "PREDICT"=:"PREDICT",
       "PRODTYPE"=:"PRODTYPE", "QUARTER"=:"QUARTER"
   WHERE "COUNTRY"=:"COUNTRY" AND "PRODUCT"=:"PRODUCT"
     AND "YEAR"=:"YEAR" AND "MONTH"=:"MONTH";
INSERT sandbox."PRDTD_MASTER" ("COUNTRY","ACTUAL",…,"MONTH")
   VALUES (:"COUNTRY",:"ACTUAL",…,:"MONTH");
.IMPORT INFILE DUMMY AXSMOD SASMLAM.DLL '4308 4308 4308 '
   FORMAT UNFORMAT LAYOUT SASLAYOUT APPLY SASDML;
.END MLOAD;
NOTE: MultiLoad Inserts : 0 MultiLoad Updates : 17280
NOTE: Procedure used: APPEND - (Total process time): real time 14.84 seconds.
Updating Teradata Tables from SAS Programs
This demonstration illustrates how to use SAS programming techniques to update Teradata tables.
Exercise
This exercise reinforces the concepts discussed
previously.
Copyright © 2009 by SAS Institute Inc. and Teradata Corporation. All Rights Reserved.
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA
and other countries. Teradata and the names of products and services of Teradata Corporation are registered trademarks or trademarks
of Teradata Corporation in the USA and other countries. ® indicates USA registration. Other brand and product names are registered
trademarks or trademarks of their respective companies.
Module 7
Best Practices for Advanced Integration
Use Cases
NoPI is going to be a great feature for sites that have two general categories of use: 1) ELT, and 2)
sandboxes or user/application-created tables. Both of these have been mentioned previously in this
chain. I would expect that in both cases the default for an organization will be to use NoPI.
For me, this is a defensive feature that prevents runtime breakage. It saves our DBA team from
having to deal with exceptions, and our development staff from dealing with a class of boundary
conditions that have no real solution on Teradata. Runtime exceptions are the most expensive to deal
with from a staffing perspective, particularly when your batch runs in the middle of the night...
Due to a Teradata limitation, FastLoad and FastExport do not support use of temporary tables at this
time.
When accessing a volatile table with a LIBNAME statement, it is recommended that you do not use
the DATABASE= or SCHEMA= options.
If you do use either DATABASE= or SCHEMA=, you must specify DBMSTEMP=YES in the LIBNAME
statement to denote that all tables accessed through it, and all tables that it creates, are volatile tables.
DBMSTEMP= also causes all table names not to be fully qualified for either SCHEMA= or
DATABASE=. In this case, you should use the LIBNAME statement only to access tables--either
permanent or volatile--within your default database or schema.
Examples
The following example shows how to use a temporary table:
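A minimal sketch of temporary (volatile) table use through a LIBNAME; the server, credentials, and table names are illustrative:

```sas
/* DBMSTEMP=YES makes tables created through this libref
   Teradata volatile tables; CONNECTION=GLOBAL keeps all
   steps on the same session, where the table is visible */
libname tdtemp teradata server=tdserv user=myuser password=mypass
        connection=global dbmstemp=yes;

data tdtemp.classtemp;       /* volatile table, lives for the session */
   set sashelp.class;
run;

proc print data=tdtemp.classtemp;
run;

libname tdtemp clear;        /* session ends; volatile table is dropped */
```

CONNECTION=GLOBAL matters here because a volatile table is visible only to the session that created it; without a shared session, each step would open a new connection and fail to see the table.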
SELECT
   col1,
   …,
   SUM( sales )                   /* aggregate function             */
      OVER (                      /* window function                */
         PARTITION BY product     /* window grouping                */
         ORDER BY sales DESC      /* window ordering                */
         ROWS BETWEEN             /* window boundaries              */
            UNBOUNDED PRECEDING   /* default boundary               */
            AND
            UNBOUNDED FOLLOWING   /* spans the PARTITION BY group   */
      ) AS SalesSum
FROM table;
A window is specified by the OVER() phrase, which can include the following clauses inside the
parentheses:
Note:
• The window is defined as all rows; there is no PARTITION specified.
• The final column represents the total of all rows.
• The default title of the last column indicates that this is a Group function.
Note:
• The Group Sum reflects the total for each account type.
• Rows are ordered by ending_balance in descending order within the acct_type partition group.
Note:
• The Group Sum reflects the total for each customer (cust_id).
• Rows are ordered by ending_balance in descending order within the cust_id partition group.
• The ROWS clause specifies the default window and could have been omitted.
Note:
• The Cumulative Sum reflects the sequential aggregation of all rows.
• The default title of the last column indicates that this is a Cumulative function.
Note:
• The 1st and 2nd rows compute their sums from one and two rows, respectively.
• The default title of the last column indicates that this is a Moving function.
• Each row is thus the sum of the ending balance of that row and the two previous rows.
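The cumulative and moving sums described in these notes can be sketched in Teradata SQL; the table and column names are illustrative:

```sql
SELECT cust_id,
       month_id,
       ending_balance,
       SUM(ending_balance) OVER (
          PARTITION BY cust_id
          ORDER BY month_id
          ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
       ) AS MovingSum
FROM account_history;
```

For the first and second rows of each partition, fewer than three rows are available, so the moving sum covers one and two rows, respectively.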
Analysis based upon ranking is particularly efficient in Teradata because of its massively parallel
architecture, which spreads the work across its many units of parallelism.
Note:
• The PARTITION BY clause defines the scope of the ranking (“rank within”).
• In this case, the ranking is by ending_balance across the different account types for each customer.
• Without PARTITION BY, the scope would default to ending_balance across all customers.
• The QUALIFY clause limits the results to the top two balance amounts for each customer, and the
sort sequence of the balance amount is descending.
• Due to PARTITION BY, the sort is by sales (DESC) for each customer.
• No aggregation takes place in this query.
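A ranking query of the kind described above can be sketched as follows; the table and column names are illustrative:

```sql
SELECT cust_id,
       acct_type,
       ending_balance,
       RANK() OVER (
          PARTITION BY cust_id
          ORDER BY ending_balance DESC
       ) AS bal_rank
FROM account_balances
QUALIFY bal_rank <= 2;
```

QUALIFY filters on the window-function result after it is computed, returning only the top two balances per customer.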
• A CASE statement can be easily included in your Explicit Pass-Thru SQL, within a
SELECT statement.
• Transformed variables can be part of your output result set, and also can be used
in subsequent variable specifications in the same SELECT statement.
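A sketch of a CASE expression inside explicit pass-through; the connection options and all table and column names are illustrative:

```sas
proc sql;
   connect to teradata (server=tdserv user=myuser password=mypass);
   create table work.bal_tiers as
   select * from connection to teradata (
      select cust_id,
             ending_balance,
             case
                when ending_balance >= 100000 then 'HIGH'
                when ending_balance >= 10000  then 'MED'
                else 'LOW'
             end as bal_tier
      from account_balances
   );
   disconnect from teradata;
quit;
```

The derived column bal_tier is computed in Teradata and returned to SAS as part of the result set.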
Reduce strain on SAS server disk, memory, and processor resources by using
Teradata SQL to perform:
• Sampling
• Data exploration and quality checks
• Data summarization and aggregation
• Variable creation and transformation
For explicit SQL pass-through, recall that you can and should also use EXPLAIN to
see how efficient your SQL will be.
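An EXPLAIN can be requested through the same pass-through connection; the connection options and the query are illustrative:

```sas
proc sql;
   connect to teradata (server=tdserv user=myuser password=mypass);
   /* Teradata returns the optimizer's plan as rows of text */
   select * from connection to teradata (
      explain
      select cust_id, sum(ending_balance)
      from account_balances
      group by cust_id
   );
   disconnect from teradata;
quit;
```

This lets you inspect the estimated plan and cost before running the real query.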
Sampling
• SAS sampling versus Teradata sampling
– Using SAS sampling functions, the entire raw data set must be downloaded
before the sample is taken
– The best strategy is Teradata sampling via explicit SQL pass-through (more in the next
session)
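Teradata-side sampling via explicit pass-through can be sketched as follows; the connection options and table name are illustrative:

```sas
proc sql;
   connect to teradata (server=tdserv user=myuser password=mypass);
   create table work.cust_sample as
   select * from connection to teradata (
      select *
      from account_balances
      sample 0.01   /* 1% random sample drawn inside Teradata */
   );
   disconnect from teradata;
quit;
```

Only the sampled rows cross the network, instead of the entire table.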
Discuss with the DBA the best strategy for statistics collection.
Statistics should be collected for all tables that are frequently used, especially when
joins are involved.
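Statistics collection is Teradata DDL; a sketch, with illustrative database, table, and column names:

```sql
COLLECT STATISTICS ON sandbox.account_balances COLUMN (cust_id);
COLLECT STATISTICS ON sandbox.account_balances COLUMN (acct_type);
```

Collecting statistics on join and predicate columns gives the optimizer accurate demographics for plan selection.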
Using the AUTHDOMAIN= option, you can retrieve USER= and PASSWORD= information from an
authentication domain stored in your SAS Metadata Server. To the engine, it appears that the USER=
and PASSWORD= options were specified on the LIBNAME statement.
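A sketch of AUTHDOMAIN= on a LIBNAME; the libref, server, and domain names are illustrative:

```sas
/* Credentials are retrieved from the SAS Metadata Server
   under the authentication domain named TeradataAuth */
libname tdlib teradata server=tdserv authdomain="TeradataAuth";
```

No USER= or PASSWORD= values appear in the program, which keeps credentials out of code and logs.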
Protected Execution
– Functions can run in two different execution modes, PROTECTED and NOT
PROTECTED. Protected mode grants greater isolation at the cost of slower
performance.
– Use PROTECTED mode to gain confidence in the solution during testing. Then measure
the performance difference, plan deployment based on application performance
requirements, and run in NOT PROTECTED mode.
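Switching a UDF between the two modes is Teradata DDL; the database and function names below are illustrative:

```sql
/* Run in protected mode (isolated process) while testing */
ALTER FUNCTION sandbox.my_score_udf EXECUTE PROTECTED;

/* After validation, switch to unprotected mode for performance */
ALTER FUNCTION sandbox.my_score_udf EXECUTE NOT PROTECTED;
```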
Both approaches require coordination and defined processes between analysts and DBAs.
An iterative function deployment methodology can provide confidence and quiet concerns about
potential problems. It:
• Helps “prove” to Teradata DBAs the validity of SAS embedded function capabilities.
• Helps prove to SAS analysts that correct statistical results are produced, before running on
millions, billions, or trillions of rows.
• Satisfies the need for in-database performance testing, and informed decisions about UDF execution
mode, before installing into the production data warehouse.