
Integration Skills Training from SAS and Teradata

Version 9.4.0

37534
Student Guide
Trademarks
The product or products described in this book are licensed products of Teradata Corporation or its affiliates.

Teradata, Applications-Within, Aster, BYNET, Claraview, DecisionCast, Gridscale, MyCommerce, QueryGrid, SQL-MapReduce, Teradata Decision Experts, "Teradata Labs" logo, Teradata ServiceConnect, Teradata Source Experts, WebAnalyst, and Xkoto are trademarks or registered trademarks of Teradata Corporation or its affiliates in the United States and other countries.
Adaptec and SCSISelect are trademarks or registered trademarks of Adaptec, Inc.
Amazon Web Services, AWS, [any other AWS Marks used in such materials] are trademarks of Amazon.com, Inc. or its affiliates in the United States and/or other countries.
AMD Opteron and Opteron are trademarks of Advanced Micro Devices, Inc.
Apache, Apache Avro, Apache Hadoop, Apache Hive, Hadoop, and the yellow elephant logo are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.
Apple, Mac, and OS X all are registered trademarks of Apple Inc.
Axeda is a registered trademark of Axeda Corporation. Axeda Agents, Axeda Applications, Axeda Policy Manager, Axeda Enterprise, Axeda Access, Axeda Software Management, Axeda Service, Axeda ServiceLink, and Firewall-Friendly are trademarks and Maximum Results and Maximum Support are servicemarks of Axeda Corporation.
CENTOS is a trademark of Red Hat, Inc., registered in the U.S. and other countries.
Cloudera, CDH, [any other Cloudera Marks used in such materials] are trademarks or registered trademarks of Cloudera Inc. in the United States, and in jurisdictions throughout the world.
Data Domain, EMC, PowerPath, SRDF, and Symmetrix are registered trademarks of EMC Corporation.
GoldenGate is a trademark of Oracle.
Hewlett-Packard and HP are registered trademarks of Hewlett-Packard Company.
Hortonworks, the Hortonworks logo and other Hortonworks trademarks are trademarks of Hortonworks Inc. in the United States and other countries.
Intel, Pentium, and XEON are registered trademarks of Intel Corporation.
IBM, CICS, RACF, Tivoli, and z/OS are registered trademarks of International Business Machines Corporation.
Linux is a registered trademark of Linus Torvalds.
LSI is a registered trademark of LSI Corporation.
Microsoft, Active Directory, Windows, Windows NT, and Windows Server are registered trademarks of Microsoft Corporation in the United States and other countries.
NetVault is a trademark or registered trademark of Dell Inc. in the United States and/or other countries.
Novell and SUSE are registered trademarks of Novell, Inc., in the United States and other countries.
Oracle, Java, and Solaris are registered trademarks of Oracle and/or its affiliates.
QLogic and SANbox are trademarks or registered trademarks of QLogic Corporation.
Quantum and the Quantum logo are trademarks of Quantum Corporation, registered in the U.S.A. and other countries.
Red Hat is a trademark of Red Hat, Inc., registered in the U.S. and other countries. Used under license.
SAP is the trademark or registered trademark of SAP AG in Germany and in several other countries.
SAS and SAS/C are trademarks or registered trademarks of SAS Institute Inc.
SPARC is a registered trademark of SPARC International, Inc.
Symantec, NetBackup, and VERITAS are trademarks or registered trademarks of Symantec Corporation or its affiliates in the United States and other countries.
Unicode is a registered trademark of Unicode, Inc. in the United States and other countries.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Other product and company names mentioned herein may be the trademarks of their respective owners.

The information contained in this document is provided on an "as-is" basis, without warranty of any kind, either express or implied, including the implied warranties of merchantability, fitness for a particular purpose, or non-infringement. Some jurisdictions do not allow the exclusion of implied warranties, so the above exclusion may not apply to you. In no event will Teradata Corporation be liable for any indirect, direct, special, incidental, or consequential damages, including lost profits or lost savings, even if expressly advised of the possibility of such damages.

The information contained in this document may contain references or cross-references to features, functions, products, or services that are not announced or available in your country. Such references do not imply that Teradata Corporation intends to announce such features, functions, products, or services in your country. Please consult your local Teradata Corporation representative for those features, functions, products, or services available in your country.

Information contained in this document may contain technical inaccuracies or typographical errors. Information may be changed or updated without notice. Teradata Corporation may also make improvements or changes in the products or services described in this information at any time without notice.

Copyright © 2007-2019 by Teradata. All rights reserved.


Table of Contents

Integration Skills Training from SAS and Teradata
Version 9.4.0

Module 1 – Introduction and Overview to SAS and Teradata Integration
Module Objectives ....................................................................................................................... 1-2
The SAS and Teradata Strategic Partnership ............................................................................... 1-5
The SAS and Teradata Partnership Delivers ............................................................................... 1-6
SAS and Teradata Architecture ................................................................................................... 1-7
SAS In-Database Principles ......................................................................................................... 1-8
The SAS and Teradata Partnership Delivers ............................................................................... 1-9
SAS and Teradata Together ....................................................................................................... 1-10
SAS and Teradata Integration – User Roles .............................................................................. 1-11
SAS – The Platform for Business Analytics .............................................................................. 1-13
What Is Foundation SAS?.......................................................................................................... 1-15
SAS by Example – SAS Enterprise Guide ................................................................................ 1-17
SAS by Example – Display Manager ........................................................................................ 1-18
SAS by Example – Batch Mode ................................................................................................ 1-19
SAS by Example – SAS Programs ............................................................................................ 1-20
SAS by Example – SAS Table / SAS Data Set ......................................................................... 1-23
SAS by Example – SAS Data Libraries..................................................................................... 1-29
SAS by Example – Two-Level SAS Filenames ......................................................... 1-33
What Is SAS/ACCESS Interface Software? .............................................................................. 1-34
SQL Support within SAS ........................................................................................................... 1-35
What Is Teradata? ...................................................................................................................... 1-37
Teradata Architecture – Parallelism and Scalability .................................................................. 1-38
Teradata Architecture – Overview ............................................................................................. 1-40
Teradata Architecture – Parsing Engine .................................................................................... 1-41
Teradata Architecture – Message Passing Layer ....................................................................... 1-42
Teradata Architecture – Access Module Process....................................................................... 1-43
Teradata Architecture – Node .................................................................................................... 1-44
Teradata Architecture – Teradata Tables ................................................................................... 1-45
Teradata Architecture – Table Data Distribution....................................................................... 1-46
Teradata Architecture – Primary Index and Performance ......................................................... 1-47
Teradata Architecture – Skewed Data Distribution ................................................................... 1-48
Teradata Basics – Accessing Data Without Indexes.................................................................. 1-49
Teradata Architecture – Teradata NoPI Tables ......................................................................... 1-50
Teradata Basics – System Space Organization .......................................................................... 1-51
Teradata Basics – Database Objects .......................................................................................... 1-52
Teradata Basics – Views as Teradata Objects ........................................................................... 1-53
Teradata Basics – Views as Teradata Objects ............................................................ 1-54
Teradata Basics – System Space Types ..................................................................................... 1-55
Module 1 – Introduction and Overview to SAS and Teradata Integration ................................ 1-56
How Do SAS and Teradata Integrate? ....................................................................................... 1-57
SAS/ACCESS to Teradata – Overview ..................................................................................... 1-59
SAS/ACCESS Libname Engine for Teradata ............................................................................ 1-61
PROC SQL Implicit Pass-through (“IP”) .................................................................................. 1-62
PROC SQL Explicit Pass-through ............................................................................................. 1-63
SAS Data Integration – Teradata Integration............................................................................. 1-64
SAS Reporting – Teradata Integration ....................................................................................... 1-65
Similarities and Differences ....................................................................................................... 1-66
Teradata and SAS – Similar Concepts ....................................................................................... 1-67
Teradata and SAS – Naming Conventions ................................................................................ 1-68
Teradata and SAS – Data Types ................................................................................................ 1-69
Integrate SAS and Teradata General Guidelines ....................................................................... 1-70

Module 2 – Querying Teradata Using SAS Libname and Implicit SQL Pass-Through
Objectives .................................................................................................................................... 2-2
The LIBNAME Statement (Review) ........................................................................................... 2-5
Assigning a Libref (Review) ........................................................................................................ 2-7
The SAS/ACCESS LIBNAME Statement .................................................................................. 2-8
SAS/ACCESS LIBNAME Statement .......................................................................................... 2-9
The SAS/ACCESS LIBNAME Statement ................................................................................ 2-10
SAS/ACCESS LIBNAME Statement to Teradata ..................................................................... 2-11
LIBNAME Statement Connection Options ............................................................................... 2-12
Reading Teradata Tables into SAS ............................................................................................ 2-13
Module 2 – Querying Teradata Using SAS Libname and Implicit SQL Pass-Through ............ 2-14
Using the SAS/ACCESS Libname Engine ................................................................................ 2-15
Examining SQL Implicit Pass-Through Code ........................................................................... 2-16
The SASTRACE= SAS System Option .................................................................................... 2-17
The Fullstimer SAS System Option (Optional) ......................................................................... 2-18
SQL Implicit Pass-Through Code .............................................................................................. 2-19
SASTRACE Messages in the Log ............................................................................................. 2-20
Examining the Data Extract Size ............................................................................................... 2-21
Data Extract Size – Limiting Columns ...................................................................................... 2-22
Data Extract Size – Limiting Rows ........................................................................................... 2-24
Selected SAS/ACCESS Data Set Options ................................................................................. 2-26
Querying Teradata Tables Using SAS Procedures and DATA Step Programs ......................... 2-27
Exercise ...................................................................................................................................... 2-28
Module 2 – Querying Teradata Using SAS Libname and Implicit SQL Pass-Through ............ 2-29
SQL Implicit Pass-Through ....................................................................................................... 2-30
Using PROC SQL Implicit Pass-Through ................................................................................. 2-31
SQL Implicit Pass-Through ....................................................................................................... 2-33
Rules for SQL Implicit Pass-Through ....................................................................................... 2-35
Validating SQL Implicit Pass-Through ..................................................................................... 2-36
Governing SQL Implicit Pass-Through ..................................................................................... 2-37
Providing Data to End-Users Using Views ............................................................................... 2-40
What is a PROC SQL View? ..................................................................................................... 2-41
Creating a PROC SQL View ..................................................................................................... 2-42
Using a PROC SQL View.......................................................................................................... 2-44
SASTRACE Information ........................................................................................................... 2-45
Creating Views with Embedded LIBNAME Statements........................................................... 2-46
Creating a PROC SQL View with the Embedded LIBNAME Statement ................................. 2-47
Embedded LIBNAME Statements ............................................................................................. 2-48
SASTRACE Messages in the Log ............................................................................................. 2-50
Dynamic SAS Programs – SAS Macro Language ..................................................................... 2-51
SAS Macro Language (Example) .............................................................................................. 2-52
Creating Dynamic SAS Views (optional) .................................................................................. 2-53
Using SAS PROC SQL and the SAS/ACCESS Libname Engine for Teradata ........................ 2-55
Exercise ...................................................................................................................................... 2-56
Module 2 – Querying Teradata Using SAS Libname and Implicit SQL Pass-Through ............ 2-57
Options for Optimizing the Teradata Query .............................................................................. 2-58
Implicit Pass-Through – Temporal Functions ............................................................ 2-59
Implicit Pass-Through – Date/Time Functions .......................................................................... 2-62
Implicit Pass-Through – Function Mapping .............................................................................. 2-63
Implicit Pass-Through – SAS Formats ...................................................................................... 2-65
SAS by Example – SAS Formats .............................................................................................. 2-66
Implicit Pass-Through – SAS Formats ...................................................................................... 2-68
Differences in Query Behavior .................................................................................................. 2-69
SAS Missing and Teradata NULL Values ................................................................................. 2-70
Table Order and Sorting in SAS and Teradata .......................................................................... 2-73
Table Order and Sorting – Teradata Sort ................................................................................... 2-74
Table Order and Sorting – Sort Stability ................................................................................... 2-76
Table Order and Sorting – Spooling (Optional) ........................................................................ 2-78
Making a SAS Copy of Teradata Data ...................................................................................... 2-79
Linguistic Order and Sorting (Optional) .................................................................................... 2-83
Implicit Pass-Through – Considerations .................................................................................... 2-85
Using SAS Options for More Specific Teradata Query Use Cases ........................................... 2-86
Exercise ...................................................................................................................................... 2-87
Rules for SQL Implicit Pass-Through ....................................................................................... 2-89
Module 3 – Querying Teradata Using SAS Explicit SQL Pass-Through
SAS Explicit SQL Pass-Through – Overview ............................................................................. 3-4
Explicit Pass-Through – Using Queries ....................................................................................... 3-8
Explicit Pass-Through – Connecting to Teradata ...................................................................... 3-10
Explicit Pass-Through – SELECT Statement ............................................................................ 3-12
Explicit Pass-Through – Closing the Connection ...................................................................... 3-14
Explicit Pass-Through – Query Output in SAS ......................................................................... 3-16
Explicit Pass-Through – Using SAS Features ........................................................................... 3-17
Explicit Pass-Through – Using Teradata Features..................................................................... 3-20
Using Teradata Features – Teradata Explain ............................................................................. 3-22
Explicit Pass-Through – SAS SQL Views................................................................................. 3-24
Explicit Pass-Through – Creating a SAS View ......................................................... 3-26
Explicit Pass-Through – Using a SAS View ............................................................................. 3-27
Dynamic SAS Programs – SAS Macro Language ..................................................................... 3-28
Explicit Pass-Through – Using Macro Variables ...................................................................... 3-29
Querying Teradata Using SAS Explicit SQL Pass-Through ..................................................... 3-32
Exercise ...................................................................................................................................... 3-33
Module 3 – Querying Teradata Using SAS Explicit SQL Pass-Through .................. 3-34
Explicit Pass-Through – Execute Requests ............................................................................... 3-35
Teradata SQL Modes – ANSI versus Teradata.......................................................................... 3-40
Teradata SQL Modes – ANSI Mode .......................................................................... 3-41
Teradata SQL Modes – Teradata Mode ..................................................................... 3-42
Explicit Pass-Through – Teradata SQL Modes ......................................................................... 3-43
Explicit Pass-Through – Teradata Modes .................................................................................. 3-45
Teradata SQL Mode – Use Cases .............................................................................................. 3-46
Teradata Features – Query Banding (Optional) ......................................................................... 3-47
Teradata Features – Statistics (Optional) ................................................................................... 3-49
Using SQL Explicit Pass-Through to Execute Non-query Requests in Teradata ...................... 3-50
Exercise ...................................................................................................................................... 3-51
Module 4 – Advanced: Querying Teradata Using Implicit SQL Pass-Through
Fast-Exporting Data from Teradata to SAS ................................................................................. 4-4
Fast Extracts – Teradata FastExport ............................................................................................ 4-5
Optimizing Teradata Queries from SAS ...................................................................................... 4-6
Using SAS LIBNAME Options to Optimize SAS Extract Behavior .......................................... 4-8
Exercise ........................................................................................................................................ 4-9
Module 4 – Advanced: Querying Teradata Using Implicit SQL Pass-Through ........................ 4-10
SAS Procedures and DBMS Tables (Review) ........................................................................... 4-11
Example (Proc Chart) ................................................................................................................ 4-12
Example (Proc Chart with WHERE clause) .............................................................................. 4-13
SAS Procedure SQL Push-down for Teradata ........................................................................... 4-14
SAS Procedure SQL- and Format-Push Down .......................................................................... 4-21
Reading Data into SAS from Teradata ...................................................................................... 4-24
Reading Data into SAS from Teradata (Example) .................................................................... 4-25
Using SAS Procedures SQL Push-down ................................................................................... 4-26
Exercise ...................................................................................................................................... 4-27
Module 4 – Advanced: Querying Teradata Using Implicit SQL Pass-Through ........................ 4-28
Combining Tables in SAS and Teradata .................................................................................... 4-29
Combining Data Horizontally – Comparison ............................................................................ 4-30
Combining Data Vertically – Comparison ................................................................................ 4-32
Combining Tables – From Teradata Only ................................................................................. 4-34
Combining Tables – From Mixed Sources ................................................................................ 4-37
Loading Data into Teradata (FASTLOAD) ............................................................................... 4-38
Loading Data into Teradata (MULTILOAD) ............................................................................ 4-39
Combining SAS and Teradata Tables ........................................................................................ 4-40
Exercise ...................................................................................................................................... 4-41
Module 4 – Advanced: Querying Teradata Using Implicit SQL Pass-Through ........................ 4-42
Accessing Teradata Tables (Review)......................................................................................... 4-43
The SAS/ACCESS LIBNAME Statement (Review)................................................................. 4-44
SQL Procedure Pass-Through Facility (Review) ...................................................................... 4-45
Comparing Pass-Through Facilities ........................................................................................... 4-46
Fast Extracts – Teradata FastExport .......................................................................................... 4-49
Fast Extracts – SAS Threaded Reads......................................................................................... 4-50
Fast Extracts – General Options (Optional) ............................................................................... 4-56
Fast Extracts – Recommendations ............................................................................................. 4-57
Optimizing Teradata Queries from SAS .................................................................................... 4-58
SAS Analytics Procedure Push-Down ....................................................................................... 4-61
Combining Tables – From Mixed Sources ................................................................. 4-64
Combining Tables – Key-Lookups (optional) ........................................................................... 4-68
Module 5 – Best Practices for Query Integration Use Cases
SAS to Teradata Function Mapping ............................................................................................ 5-4
Enabling Dynamic Function Mapping ......................................................................................... 5-9
Enabling Dynamic Function Mapping… ................................................................................... 5-13
Enabling Dynamic Function Mapping ....................................................................................... 5-14
Using Dynamic Function Mapping ............................................................................................ 5-20
Teradata Basics – User Defined Functions ................................................................................ 5-21
Using Extended Function Mapping Capabilities ....................................................................... 5-24
Exercise ...................................................................................................................................... 5-25
Module 5 – Best Practices for Query Integration Use Cases ..................................................... 5-26
SAS by Example – SAS Formats (Review) ............................................................................... 5-27
Implicit Pass-Through – SAS Formats (Review) ...................................................................... 5-28
Using SAS Formats in Teradata ................................................................................................ 5-29
Using the SAS Format Library .................................................................................................. 5-33
Exercise ...................................................................................................................................... 5-34
Module 5 – Best Practices for Query Integration Use Cases ..................................................... 5-35
Extracting Samples from Teradata Tables ................................................................................. 5-36
Reading Sample Data from Database using SAS code .............................................................. 5-37
Reading Sample Data from Database using Teradata SQL (Explicit SQL pass-through) ........ 5-38
Creating Sample Data from Database using SAS code (DATA STEP) .................................... 5-39
Creating Sample Data from Database using SAS code (SURVEY SELECT) .......................... 5-40
Teradata Sampling – Sample Size ............................................................................................. 5-41
Teradata Sampling – Multiple Samples ..................................................................................... 5-42
Teradata Sampling – Sampling Method ..................................................................... 5-44
Teradata Sampling – Sampling Method ..................................................................... 5-45
Teradata Sampling – Sampling Method ..................................................................... 5-46
Teradata Sampling – Considerations .......................................................................... 5-48
Teradata Sampling – Best Practices ........................................................................................... 5-49
Extract Samples from Teradata Tables ...................................................................................... 5-50
Exercise ...................................................................................................................................... 5-51
Module 5 – Best Practices for Query Integration Use Cases ..................................................... 5-52
Naming Data Objects ................................................................................................................. 5-53
Data Types ................................................................................................................................. 5-54
Enabling Dynamic Function Mapping ....................................................................................... 5-61
Handling Specific Teradata Data Types .................................................................................... 5-62
Handling Teradata Large Numeric Values ................................................................................ 5-63
Handling Specific Teradata Data Types .................................................................................... 5-65
Exercise ...................................................................................................................................... 5-66
Using Dynamic Function Mapping ............................................................................................ 5-68
Using SAS Formats in Teradata ................................................................................................ 5-69
Teradata Sampling Function - Overview ................................................................................... 5-70
Module 6 – Creating, Updating, and Loading Teradata Tables from SAS
Creating and Loading Teradata Tables from SAS ....................................................................... 6-4
Teradata Tables from SAS ........................................................................................................... 6-5
Creating Teradata Tables from SAS ............................................................................................ 6-6
Teradata Basics – Primary Indexes (Review) .............................................................................. 6-7
Creating Teradata Tables from SAS ............................................................................................ 6-9
Loading Data into Teradata Tables ............................................................................................ 6-18
Creating Teradata Tables and Loading Data from SAS ............................................................ 6-27
Exercise ...................................................................................................................................... 6-28
Module 6 – Creating, Updating and Loading Teradata Tables from SAS................................. 6-29
Leveraging Teradata Load Utilities ........................................................................................... 6-30
FastLoading Empty Teradata Tables ......................................................................................... 6-32
FASTLOADing Empty Teradata Tables ................................................................................... 6-33
FASTLOADing empty Teradata Tables .................................................................................... 6-35
Using FASTLOAD for Append Operations .............................................................................. 6-37
MULTILOADing Teradata Tables ............................................................................................ 6-39
Leveraging Teradata Load Utilities ........................................................................................... 6-42
Teradata Load Utilities – TPT FastLoad ................................................................................... 6-43
Teradata Load Utilities – TPT MultiLoad ................................................................................. 6-45
Teradata Basics – Teradata TPUMP .......................................................................................... 6-47
Teradata Load Utilities – TPT TPUMP ..................................................................................... 6-48
Use Fastest Loading Methods to Load Data into Teradata Tables ............................................ 6-50
Exercise ...................................................................................................................................... 6-51
Module 6 – Creating, Updating, and Loading Teradata Tables from SAS................................ 6-52
Updating Teradata Tables from SAS ......................................................................................... 6-53
Upserting Teradata Tables from SAS ........................................................................................ 6-57
Upserting Using Teradata Load Utilities ................................................................................... 6-59
Updating Teradata Tables from SAS programs ......................................................................... 6-60
Exercise ...................................................................................................................................... 6-61

Module 7 – Best Practices for Advanced Integration Use Cases


Creating Staging or Temporary Teradata Tables ......................................................................... 7-4
Using Teradata NoPI Tables ........................................................................................................ 7-5
Using Teradata Temporary Tables............................................................................................... 7-7
Using Temporary Teradata Tables............................................................................................... 7-9
Using Teradata Temporary Tables from SAS ........................................................................... 7-11
Exercise ...................................................................................................................................... 7-12
Module 7 - Best Practices for Advanced Integration Use Cases ............................................... 7-13
Teradata Analytical Functions - Overview ................................................................................ 7-14
Teradata Analytical Functions - Syntax ..................................................................................... 7-18
Teradata Analytical Functions - Example.................................................................................. 7-19
Teradata Analytical Functions - Windows ................................................................................ 7-20
Example 1 - Group Sum Window Function .............................................................................. 7-21
Example 2 – Cumulative Sum Window..................................................................................... 7-24
Example 3 – Moving Sum Window........................................................................................... 7-25
Teradata Analytical Functions – Ranking.................................................................................. 7-26
Example 4 – Teradata Rank Window Function ......................................................................... 7-27
Using Teradata Ordered Analytical Functions .......................................................................... 7-28
Exercise ...................................................................................................................................... 7-29
Module 7 - Best Practices for Advanced Integration Use Cases ............................................... 7-30
Preparing Data in Teradata for Use within SAS ........................................................................ 7-31
Preparing Data in Teradata ........................................................................................................ 7-32
Preparing Data in Teradata – Transformation ........................................................................... 7-34
Preparing Data in Teradata (Reference) .................................................................................... 7-38
Use Teradata SQL to Prepare Data for Use in SAS................................................................... 7-39
Exercise ...................................................................................................................................... 7-40
Module 7 - Best Practices for Advanced Integration Use Cases ............................................... 7-41
General Areas for Improvement ................................................................................................ 7-42
SAS and Teradata Workload Considerations ............................................................................ 7-43
Finding Bottlenecks ................................................................................................................... 7-44
Avoiding Bottlenecks................................................................................................................. 7-45
Note on Teradata Statistics ........................................................................................................ 7-48
Module 7 - Best Practices for Advanced Integration Use Cases ............................................... 7-52
Aspects of Security and Administration .................................................................................... 7-53
SAS/ACCESS to Teradata Connections Options ...................................................................... 7-54
Using Teradata Query Banding with SAS ................................................................................. 7-56
Using SAS Functions in a Teradata EDW ................................................................................. 7-58
Deployment Locations for SAS Functions ................................................................................ 7-60
Deployment Process for SAS Functions .................................................................................... 7-61
Module 1
Introduction and Overview to SAS and
Teradata Integration

Introduction and Overview to SAS and Teradata Integration Slide 1-1


Module Objectives
 Understand the SAS and Teradata integration partnership
 Become familiar with the basics of the SAS and Teradata platforms
 Understand the concept of integration between SAS and Teradata

Introduction and Overview to SAS and Teradata Integration Slide 1-2


Module 1

• Section 1.1 – SAS and Teradata Overview


• Section 1.2 – SAS Basics
• Section 1.3 – Teradata Basics
• Section 1.4 – Introduction to Integration Techniques

Introduction and Overview to SAS and Teradata Integration Slide 1-3


Module 1

• Section 1.1 – SAS and Teradata Overview


• Section 1.2 – SAS Basics
• Section 1.3 – Teradata Basics
• Section 1.4 – Introduction to Integration Techniques

Introduction and Overview to SAS and Teradata Integration Slide 1-4


The SAS and Teradata Strategic Partnership
 Corporate commitment from the highest levels
 Teradata is SAS’ strategic database partner for its “In-Database” initiative
 Joint product roadmap with dedicated R&D teams
 Joint products and services
 The SAS and Teradata Center of Excellence (CoE)
 Strong customer and prospect interest

Introduction and Overview to SAS and Teradata Integration Slide 1-5


The SAS and Teradata Partnership Delivers
• A compelling and robust business analytics and data warehousing environment from two industry
leaders
• Enabling advanced database integration features, based on the extended usage of each
system’s capabilities
• Leveraging SAS In-Database processing by extending Teradata’s capabilities with SAS
functionality deployed into the database
• Driven by two ideas:
• Moving SAS analytics to the data, not the data to the analytics
• Using the power of the Teradata database where it increases the speed and performance of the
analysis or the business process

Combining the strengths of both companies, our customers will see integrated offerings of SAS software
and Teradata.

We’re combining the industry’s best analytics with the industry leader in data warehousing to:
• Deliver value to the business users with improved and extended use of analytics,
• While improving the return on IT investment.

Ultimately, our customers will experience broader use and better performance.

Introduction and Overview to SAS and Teradata Integration Slide 1-6


SAS and Teradata Architecture
Traditional Architecture vs. In-Database Architecture

[Diagram: In the traditional architecture, data extracts from the data warehouse feed
analytical data preparation and a modeling ADS for SAS model development, while scoring
data preparation builds a scoring ADS for SAS scoring during model deployment. In the
in-database architecture, the Teradata data warehouse (with a sandbox) hosts data
preparation, the modeling and scoring ADS, and in-database scoring; the SAS model is
translated and deployed into the database after model development.]

The “Traditional Analytic Environment” illustrates a common process and architecture used by many
businesses to develop and deploy analytic technology. In this environment, data is extracted from a
variety of sources ranging from enterprise data warehouses to data marts across multiple lines of
business. This data is aggregated, transformed and integrated into a development analytic data set. This
is typically a large flat data structure, such as a flat file, containing hundreds of variables where each
row represents an observation. This data is used to build analytic models within a SAS environment.
Once the model is developed, tested and validated, it is then exported into the scoring environment
which is typically based on production or operational data. For scoring purposes, the data is again
extracted and prepared based on model requirements into the “Scoring ADS” (also sometimes referred
to as the score table). This table typically has 10 to 20 variables but may contain millions of
records to be scored. Scoring is done on the scoring server.

The “In-Database Analytic Environment,” offered through the SAS and Teradata Analytic Advantage
program, leverages the Teradata database for data processing and scoring and the SAS analytic platform
for model development. The Teradata EDW provides a single environment for both the development
environment and production or operational data. Optionally, a Teradata appliance can be used as a
separate development environment. External data can be loaded into an analytic sandbox that provides a
development environment that is logically segregated from the production environment in order to
preserve the integrity of the production level data, while allowing for the flexibility to load untested data
for development. The EDW data is explored, aggregated, transformed and derived to create the
development ADS without incurring unnecessary data movement. Once the data is prepared, a sample is
extracted to SAS Enterprise Miner for analytic modeling. SAS Enterprise Miner provides the breadth
and depth of analytic techniques for additional exploration, model specific transformation, analytic
modeling and testing required to complete the development process. Once completed, the model is
exported to the SAS Scoring Accelerator Publishing agent. The SAS Scoring Accelerator for Teradata
converts the SAS Enterprise Miner models into embedded functions inside the Teradata database
that are callable by SQL programs.

Introduction and Overview to SAS and Teradata Integration Slide 1-7
SAS In-Database Principles
• Reduce Data Movement
• Push data-intensive work to database
• Make use of database resources: disks and CPUs
• Generate optimized SQL
• Preserve SAS User Experience
• SAS language skills
• SAS procedures experience
• SAS environment knowledge
• Maintain SAS Standards
• Numerical accuracy and precision
• Statistical integrity
• Software quality

Introduction and Overview to SAS and Teradata Integration Slide 1-8


The SAS and Teradata Partnership Delivers
• Solutions that enable companies to focus on higher value business opportunities
• Expanded use of analytics to increase competitive advantage

• A reduction in the complexity and cost for decision making


• Reduced data movement, redundancy and latency issues
• Improved data quality and data consistency
• Lower total cost of ownership and investment protection


Introduction and Overview to SAS and Teradata Integration Slide 1-9


SAS and Teradata Together
From the user perspective …..
• SAS users gain access to a formal RDBMS system that is
• Fast (massively parallelized architecture)
• Flexible (any query, any time)
• Comprehensive (detail view of all the data)
• Decision-support oriented (many analytical functions, high user concurrency)
• A system platform capable of engaging in operational decision processes.

• SAS users keep (and Teradata users gain)


• A multi-platform, multi-operating system analytic and data processing system
• The largest set of analysis and reporting capabilities on the market
• The ability to code in the SAS language or SQL, with access to the powerful SAS function set
• Flexible data source import and export capability

Introduction and Overview to SAS and Teradata Integration Slide 1-10


SAS and Teradata Integration – User Roles
Who should know about SAS and Teradata integration techniques?

• SAS Administrator – Preparing Teradata database connections and libraries; designing
  user administration and security integration
• SAS Power User – Ad hoc analysis and reporting with SAS Enterprise Guide or the SAS
  windowing environment (Foundation)
• SAS BI Developer – Designing SAS Information Maps for web reports and dashboards;
  developing stored processes
• SAS Data Integration Developer – Developing data management jobs leveraging
  ELT processing using SAS Data Integration Studio or the SAS windowing environment
  (Foundation)
• SAS Analytics User – Executing statistical analysis using SAS Enterprise Guide or the
  SAS windowing environment (Foundation); data mining analysts developing predictive
  models
• Teradata Administrator – Preparing databases and security for SAS users
• Teradata Power Users – Advanced querying users; managing sandbox databases

[Diagram: SAS processes and SAS programs generate Teradata SQL that runs in Teradata.]

Introduction and Overview to SAS and Teradata Integration Slide 1-11


Module 1

• Section 1.1 – SAS and Teradata Overview


• Section 1.2 – SAS Basics
• Section 1.3 – Teradata Basics
• Section 1.4 – Introduction to Integration Techniques

Introduction and Overview to SAS and Teradata Integration Slide 1-12


SAS – The Platform for Business Analytics
The SAS platform for Business Analytics is a framework that enables organizations to address
their most critical business issues by providing data integration, analytics, reporting, and business
solutions. It provides the flexibility to start with the capabilities that you need now, and you can add
new functionality incrementally over time.

• There are a number of components within the framework. Starting at the bottom there’s the data
integration layer…then the analytics component…the reporting component…and on top solutions
that will make a real difference to your business.
• SAS Data Integration forms a solid data foundation with the capability to access enterprise data
across systems and platforms. It provides integrated data quality, which is critical to providing
accurate, consistent information; and an interactive, visual data integration development environment
that enables collaboration and easy reusability across your organization, all with a single point of IT
administration.
• SAS Analytics provides an integrated environment for predictive and descriptive modeling, data
mining, text analytics, forecasting, optimization, simulation, experimental design and more.

Introduction and Overview to SAS and Teradata Integration Slide 1-13


SAS – The Platform for Business Analytics
The Business Analytics framework
 application environment for managing
and developing data integration flows,
enabling a variety of reporting interfaces
and designing and executing
sophisticated business analysis
 while these framework components get enriched with business
content into vertical and horizontal business solutions
 they are at the same time based on SAS Foundation modules
 they generate SAS program code, to be executed in batch or
interactively by framework SAS servers and services.

Introduction and Overview to SAS and Teradata Integration Slide 1-14


What Is Foundation SAS?
Foundation SAS is the base for all Business Analytics framework
components and SAS solutions.
It is a highly flexible and integrated software environment that
can be used in virtually any setting to access, manipulate,
manage, store, analyze, and report on data.

The functionality of SAS is built around the four data-driven tasks common to virtually any application:
data access, data management, data analysis, and data presentation.

Introduction and Overview to SAS and Teradata Integration Slide 1-15


What Is Foundation SAS?
Foundation SAS provides the following:
• a graphical user interface for administering SAS tasks
• a highly flexible and extensible programming language
• a rich library of prewritten, ready-to-use SAS procedures
• the flexibility to run on all major operating environments such as Windows, UNIX, and z/OS
(OS/390)
• the access to virtually any data source such as DB2, Oracle, SYBASE, Teradata, SAP, and
Microsoft Excel
• the support for most widely used character encodings for globalization

The first bullet, graphical user interface, can reference Enterprise Guide or Management Console.

The bulleted items were pulled from the Base SAS fact sheet:
http://www.sas.com/technologies/bi/appdev/base/factsheet.pdf

Introduction and Overview to SAS and Teradata Integration Slide 1-16


SAS by Example – SAS Enterprise Guide
SAS Enterprise Guide provides a point-and-click interface for ad hoc data access, analysis, and
generation of reports. SAS code is generated behind the scenes and executed by a SAS server. It is
now commonly used as a power user’s first client interface.

Introduction and Overview to SAS and Teradata Integration Slide 1-17


SAS by Example – Display Manager

The SAS windowing environment is used to write, include, and submit SAS programs and to
examine SAS results and log information.

For the class, we will be using the SAS Windowing Environment ….

Introduction and Overview to SAS and Teradata Integration Slide 1-18


SAS by Example – Batch Mode
Batch mode is a method of running SAS programs in the background on any operating system: you
prepare a program file that contains SAS statements plus any necessary operating system
control statements (like a shell script) and submit the file to the operating system.

Partial z/OS (OS/390) Example (the appropriate JCL is placed before the SAS statements):

//jobname JOB accounting info,name
// EXEC SAS
//SYSIN DD *
data work.NewSalesEmps;
   length First_Name $ 12
          Last_Name $ 18 Job_Title $ 25;
   infile '.workshop.rawdata(newemps)' dlm=',';
   input First_Name $ Last_Name $
         Job_Title $ Salary;
run;

Introduction and Overview to SAS and Teradata Integration Slide 1-19


SAS by Example – SAS Programs
A SAS program is a sequence of steps that the user submits for execution.

DATA steps are typically used to create SAS tables. The DATA step provides a powerful
and fast 4GL data management programming language.

SAS procedure (PROC) steps are typically used to process SAS tables (that is, generate
reports and graphs, manage data, and sort data). SAS procedures encapsulate distinct
business analysis

[Diagram: raw data or a SAS table feeds a DATA step, which creates a SAS table; a PROC
step then processes that table to produce a report.]

Introduction and Overview to SAS and Teradata Integration Slide 1-20


SAS by Example – SAS Programs
This DATA step creates a temporary SAS data set named Work.NewSalesEmps by reading four
fields from a raw data file.

data work.NewSalesEmps;
length First_Name $ 12
Last_Name $ 18 Job_Title $ 25;
infile 'newemps.csv' dlm=',';
input First_Name $ Last_Name $
Job_Title $ Salary;
run;

proc print data=work.NewSalesEmps;


run;

proc means data=work.NewSalesEmps;


class Job_Title;
var Salary;
run;

Let me briefly explain what each step is doing on the next three slides. I want you to have an
understanding of what the step is accomplishing; we aren’t discussing the syntax. This DATA step …
On this INFILE statement, we are referring to the raw data file NEWEMPS.CSV. How you refer to this
raw data file is dependent on your operating environment. In your course notes on page 2-4, you can see
how you refer to the file if you are using our classroom computers.

Introduction and Overview to SAS and Teradata Integration Slide 1-21


SAS by Example – SAS Programs
This PROC MEANS step creates a summary report of the Work.NewSalesEmps data set with statistics
for the variable Salary for each value of Job_Title.

data work.NewSalesEmps;
length First_Name $ 12
Last_Name $ 18 Job_Title $ 25;
infile 'newemps.csv' dlm=',';
input First_Name $ Last_Name $
Job_Title $ Salary;
run;

proc print data=work.NewSalesEmps;


run;

proc means data=work.NewSalesEmps;


class Job_Title;
var Salary;
run;

Introduction and Overview to SAS and Teradata Integration Slide 1-22


SAS by Example – SAS Table / SAS Data Set
A SAS data set is a table file structure that SAS creates and processes.

Descriptor portion:
   Data Set Name   WORK.NEWSALESEMPS
   Engine          V9
   Created         Fri, Feb 08, 2008 01:40 PM
   Observations    71
   Variables       4
   ...

Data portion:
   First_Name   Last_Name    Job_Title       Salary
   $ 12         $ 18         $ 25            N 8
   Satyakam     Denny        Sales Rep. II   26780
   Monica       Kletschkus   Sales Rep. IV   30890
   Kevin        Lyon         Sales Rep. I    26955
   Petrea       Soltau       Sales Rep. II   27440
Introduction and Overview to SAS and Teradata Integration Slide 1-23


SAS by Example – SAS Table / SAS Data Set
The descriptor portion of a SAS data set contains the following:
• general information about the SAS data set (such as data set name and number of
observations)
• variable information (such as name, type, and length)

General information:
   Data Set Name   WORK.NEWSALESEMPS
   Engine          V9
   Created         Fri, Feb 08, 2008 01:40 PM
   Observations    71
   Variables       4
   ...

Variable information:
   First_Name   Last_Name   Job_Title   Salary
   $ 12         $ 18        $ 25        N 8

Introduction and Overview to SAS and Teradata Integration Slide 1-24


SAS by Example – SAS Table / SAS Data Set
The data portion of a SAS data set is a rectangular table of character and/or numeric data values.

Variable names:    First_Name   Last_Name    Job_Title       Salary
Variable values:   Satyakam     Denny        Sales Rep. II   26780
                   Monica       Kletschkus   Sales Rep. IV   30890
                   Kevin        Lyon         Sales Rep. I    26955
                   Petrea       Soltau       Sales Rep. II   27440

First_Name, Last_Name, and Job_Title contain character values; Salary contains numeric
values.

Introduction and Overview to SAS and Teradata Integration Slide 1-25


SAS by Example – SAS Table / SAS Data Set
There are two types of variables:

Character – Contain any value: letters, numbers, special characters, and blanks.
            Character values are stored with a length of 1 to 32,767 bytes.
            One byte equals one character.

Numeric   – Stored as floating-point numbers in 8 bytes of storage by default.
            Eight bytes of floating-point storage provide space for 16 or 17
            significant digits. You are not restricted to 8 digits.
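The two storage types above can be seen in a tiny DATA step (a hypothetical snippet, using values from this module’s examples):

```sas
data work.types_demo;
   length Name $ 12;     /* character variable, fixed length of 12 bytes  */
   Name   = 'Satyakam';  /* character value                               */
   Salary = 26780;       /* numeric value, 8-byte floating-point storage  */
run;
```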

Introduction and Overview to SAS and Teradata Integration Slide 1-26


SAS by Example – SAS Table / SAS Data Set
SAS stores date values as numeric values.

              01JAN1959    01JAN1960    01JAN1961
stored as:      -365            0           366
displayed as: 01/01/1959   01/01/1960   01/01/1961

A SAS date value is stored as the number of days between January 1, 1960, and a specific date.
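The offsets shown above can be checked with a short DATA step (a hypothetical snippet, not part of the course data); date constants are written with SAS date-literal syntax:

```sas
data _null_;
   d1 = '01JAN1959'd;      /* stored as -365 */
   d2 = '01JAN1960'd;      /* stored as 0    */
   d3 = '01JAN1961'd;      /* stored as 366  */
   put d1= d2= d3=;        /* raw numeric date values */
   put d3= mmddyy10.;      /* displayed as 01/01/1961 */
run;
```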

Introduction and Overview to SAS and Teradata Integration Slide 1-27


SAS by Example – SAS Table / SAS Data Set
A value must exist for every variable for each observation. Missing values are valid values in a
SAS data set.

First_Name   Last_Name    Job_Title       Salary
Satyakam     Denny        Sales Rep. II   26780
Monica       Kletschkus   Sales Rep. IV       .
Kevin        Lyon         Sales Rep. I    26955
Petrea       Soltau                       27440

A character missing value is displayed as a blank. A numeric missing value is displayed
as a period.

Introduction and Overview to SAS and Teradata Integration Slide 1-28


SAS by Example – SAS Data Libraries
A SAS data library is a collection of SAS files within a file system path that are recognized as
a unit by SAS.

Directory-based System A SAS data library is a directory.

Windows Example: s:\workshop

UNIX Example: /users/userid

SAS provides a variety of library engines to provide access to various types of data (SAS data, PC
file formats, database files, …) within the SAS data library concept.

Introduction and Overview to SAS and Teradata Integration Slide 1-29


SAS by Example – SAS Data Libraries
Assigning a Libref
Regardless of which host operating system you use, you identify SAS data libraries by assigning a
library reference name (libref) to each library.


When a SAS session starts, SAS automatically creates one temporary and at least one permanent SAS
data library that you can access.

 Work – temporary library
 Sasuser – permanent library

Introduction and Overview to SAS and Teradata Integration Slide 1-30


SAS by Example – SAS Data Libraries
You can use the LIBNAME statement to assign a library reference name (libref) to a SAS data
library.

General form of the LIBNAME statement:

LIBNAME libref 'SAS-data-library' <options>;

Rules for naming a libref:


• The name must be 8 characters or less.
• The name must begin with a letter or underscore.
• The remaining characters must be letters, numerals, or underscores.

Introduction and Overview to SAS and Teradata Integration Slide 1-31


SAS by Example – SAS Data Libraries

Windows libname orion 's:\workshop';

UNIX
libname orion '/users/userid';

z/OS (OS/390)
libname orion 'userid.workshop.sasdata';

For UNIX, the actual program uses a period between quotes to refer to the default location. For z/OS,
the userid is not used in the actual programs.

Introduction and Overview to SAS and Teradata Integration Slide 1-32


SAS by Example – Two-Level SAS Filenames
Every SAS file has a two-level name: libref.filename
The data set orion.sales is a SAS file in the orion library.

• The first name (libref) refers to the library.
• The second name (filename) refers to the file in the library.

[Diagram: the Work, Sasuser, and orion libraries; the Sales data set resides in the orion
library.]
Introduction and Overview to SAS and Teradata Integration Slide 1-33


What Is SAS/ACCESS Interface Software?
SAS/ACCESS Interfaces enable connectivity between
Foundation SAS and other databases for read, write, and
update access to database tables.
• Most commonly, the access interface for a specific
database provides a SAS Library Engine, leveraging
the database systems native client interfaces.
• It supports fast multi-threaded extracts, implicit SQL pass-through (generating native
database SQL behind the scenes), and bulk loading utilities.

SAS/ACCESS interfaces are out-of-the-box solutions that provide enterprise data access and integration
between SAS and third-party databases.
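A minimal sketch of the library-engine approach for Teradata (server, user, password, and table names here are placeholders, not course defaults):

```sas
/* Assign a libref through the SAS/ACCESS Teradata engine */
libname tdlib teradata server='tdserv' user='myuser' password='mypwd'
        database='sandbox';

/* Implicit SQL pass-through: SAS generates native Teradata SQL
   behind the scenes for this query */
proc sql;
   select count(*) from tdlib.sales_table;
quit;

libname tdlib clear;
```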

Introduction and Overview to SAS and Teradata Integration Slide 1-34


SQL Support within SAS
SQL is supported in SAS and can be called through the procedure interface of PROC SQL
 It follows most of the guidelines set by the American National Standards Institute (ANSI
SQL2 [1992]).
 However, it is not fully compliant with the current ANSI Standard for SQL.
 Focusing on SQL as a query language.
 Omitting ROLLBACK, COMMIT, and so on, as SAS has no transactional focus in data processing
 Extending with many SAS features (functions, formats, operators, expressions, …)

For details search “SQL and ANSI” under support.sas.com


http://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/a002473705.htm
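A brief sketch of PROC SQL mixing ANSI-style SQL with SAS extensions, using the Work.NewSalesEmps table created earlier in this module (the FORMAT= column modifier is one such SAS extension):

```sas
proc sql;
   select Job_Title,
          count(*)    as Employees,
          avg(Salary) as AvgSalary format=comma10.2  /* SAS format extension */
   from work.NewSalesEmps
   group by Job_Title
   order by AvgSalary desc;
quit;
```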

Introduction and Overview to SAS and Teradata Integration Slide 1-35


Module 1

• Section 1.1 – SAS and Teradata Overview


• Section 1.2 – SAS Basics
• Section 1.3 – Teradata Basics
• Section 1.4 – Introduction to Integration Techniques

Introduction and Overview to SAS and Teradata Integration Slide 1-36


What Is Teradata?
The Teradata Database is a Relational Database Management System (RDBMS) that drives a
company’s data warehouse. The Teradata Database is an open system, compliant with industry
ANSI standards.

Data are stored in tables which are accessed using SQL (Structured Query Language). Teradata
SQL is ANSI compliant, and there are extensions which enable further data manipulation.

Introduction and Overview to SAS and Teradata Integration Slide 1-37


Teradata Architecture – Parallelism and Scalability
Unconditional SQL-Parallelism
• Parallelism is automatic
• Cost based query optimizer is parallel aware
• Rewrites built-in
• No single threaded operation

• Each SQL-query step is completely “parallelized”:


Scans, Joins, Index access, Aggregation, Sort, Insert,
Update, Delete, ….

• Basis for the parallelism is the indexing and parallel


distribution of stored data

Introduction and Overview to SAS and Teradata Integration Slide 1-38


Teradata Architecture – Parallelism and Scalability

Database
Technology
scales in all
dimensions

Introduction and Overview to SAS and Teradata Integration Slide 1-39


Teradata Architecture – Overview
Teradata RDBMS has a shared nothing architecture:
 each unit of parallelism has its own memory,
 manages its own disk space,
 executes independently of other units

unit of parallelism

Introduction and Overview to SAS and Teradata Integration Slide 1-40


Teradata Architecture – Parsing Engine
The parsing engine is a virtual processor that talks to clients via BYNET technology. It is the
component that interprets the SQL request and sends the request to the AMPs, and then returns
the results to the client application via the Message Passing Layer.

Message Passing Layer (PDE and BYNET)

AMP AMP AMP AMP AMP AMP AMP AMP AMP

Node

PE – Parsing Engine. This is the interface that talks to the client on one side and, via the BYNET, to
the AMPs on the other side. The PE decomposes SQL statements into steps and returns the result sets to
the client application.

Parsing Engine is a component that interprets SQL requests and sends the request (along with the input
records and data) to the AMPs through BYNET technology.

The Parsing Engine interprets the SQL command and converts the data record from the host into an
AMP message.

Introduction and Overview to SAS and Teradata Integration Slide 1-41


Teradata Architecture – Message Passing Layer
The Message Passing Layer is the networking layer made up of hardware and software that
enables high-speed communication for transferring data inside and between the nodes. It is also
responsible for sorting the data that is passed from the AMPs.

Message Passing Layer (PDE and BYNET)

AMP AMP AMP AMP AMP AMP AMP AMP AMP

...
Node

The Message Passing Layer handles the internal communication of the Teradata RDBMS. It is a
combination of the Teradata PDE (Parallel Database Extensions) software, the BYNET software, and
the BYNET interconnect itself.

BYNET – a networking layer. It can be either software only (cheaper) or hardware and software (more
expensive). The BYNET does more than just networking; it also sorts the data that is passed from the
AMPs. Multi-node environments have more than one BYNET. This aids in redundancy and makes the
environment more robust.

Introduction and Overview to SAS and Teradata Integration Slide 1-42


Teradata Architecture – Access Module Process
The Access Module Process is the virtual processor responsible for reading and writing data.
AMPs use BYNET technology to receive messages and control database management functions
such as sorting, sub-setting, performing aggregations, and formatting data.

Message Passing Layer (PDE and BYNET)

AMP AMP AMP AMP AMP AMP AMP AMP AMP

Node

The AMP, or Access Module Process, is the virtual processor responsible for reading and writing data.
It is the heart, soul, and workhorse of Teradata. AMPs use BYNET technology to receive messages. AMPs
control database management functions such as sorting, performing aggregations, and formatting of the
data.

AMP – is a vproc that controls access to the disk subsystem. In other words, it controls access to a
subset of the data. Each table in Teradata is spread out amongst all the AMPs. This is where Teradata
gets its parallelism.

• AMP (Access Module Process) is the unit of parallelism in Teradata. AMPs are designed to operate
on only one portion of the database so they must operate in parallel to accomplish their intended
results
• AMPs do all of the physical work associated with generating an answer set including, sorting,
aggregating, formatting and converting

The AMP formats the row and writes it to its associated disks.

Introduction and Overview to SAS and Teradata Integration Slide 1-43


Teradata Architecture – Node
A node is a computer that is part of a Teradata server. It is made up of hardware and software that
contains CPUs, system disks, memory, and adaptors. Adding more nodes increases the
performance of the Teradata environment.

Message Passing Layer (PDE and BYNET)

AMP AMP AMP AMP AMP AMP AMP AMP AMP

Node

...

A Node is a computer made up of hardware and software that contains CPUs, system disks,
memory, and adapters, and runs a copy of the operating system and the Teradata software.

NODE – A node is a computer that participates as part of a Teradata Server. Adding nodes to an
environment increases the performance of the environment.

Introduction and Overview to SAS and Teradata Integration Slide 1-44


Teradata Architecture – Teradata Tables
Tables are stored in the Teradata relational database. Each table in Teradata is spread out among
all the AMPs via hashing of the primary index.

Message Passing Layer (PDE and BYNET)

AMP AMP AMP AMP AMP AMP AMP AMP AMP

Each table in Teradata is spread out amongst all the AMPs. This is where Teradata gets its parallelism.

The disk holds the row for subsequent access

Introduction and Overview to SAS and Teradata Integration Slide 1-45


Teradata Architecture – Table Data Distribution
Table Data is distributed across the AMPs using a specific index, the primary index.
The primary index uses a sophisticated hashing algorithm on the index's column values to
determine which AMP stores a data row.
To retrieve a row, the primary index value is again passed to the hash algorithm to determine
which AMP owns the row.

AMP AMP AMP AMP AMP AMP

There are two kinds of Teradata Indexes

• Primary – determine how data is distributed among the AMPs


• Secondary – along with primary indexes, are used to locate data rows without searching the whole
table. (Such a search is called a full table scan.)
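As a hedged sketch (database, table, and column names are made up), the primary index is declared when the table is created, and a secondary index can be added afterwards:

```sql
/* Rows are distributed across AMPs by hashing cust_id */
CREATE TABLE sandbox.customer
  ( cust_id    INTEGER NOT NULL
  , last_name  VARCHAR(30)
  , region     CHAR(2)
  )
UNIQUE PRIMARY INDEX (cust_id);

/* A secondary index helps locate rows without a full table scan */
CREATE INDEX (last_name) ON sandbox.customer;
```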

Introduction and Overview to SAS and Teradata Integration Slide 1-46


Teradata Architecture – Primary Index and
Performance
When the Primary Index column(s) values for a table are “sufficiently unique”, the rows in that
table are evenly distributed across all AMPs.
Optimal performance results because even distribution allows the AMPs to work in parallel and
complete their processing about the same time.

AMP AMP AMP AMP AMP AMP

Evenness of distribution = parallelism = scalability

Introduction and Overview to SAS and Teradata Integration Slide 1-47


Teradata Architecture – Skewed Data Distribution
If data is not evenly distributed across all AMPs, the slowest AMP becomes a bottleneck. That is, a
given query or operation will only run as fast as the slowest AMP involved.

“Hot” AMP due to skewed data

AMP AMP AMP AMP AMP AMP
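One common way to check distribution for skew (a sketch, not part of the original course materials; the table and column are hypothetical) is to count stored rows per AMP with Teradata's hash functions:

```sql
/* Rows stored per AMP for a candidate primary index column;
   large differences between the counts indicate skew */
SELECT HASHAMP(HASHBUCKET(HASHROW(cust_id))) AS amp_no,
       COUNT(*)                              AS row_cnt
FROM   sandbox.customer
GROUP BY 1
ORDER BY 2 DESC;
```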

Introduction and Overview to SAS and Teradata Integration Slide 1-48


Teradata Basics – Accessing Data Without Indexes
In Teradata, you can access data from any column or combination of columns – whether those
columns represent an index or not.
• This implies that sometimes the Parsing Engine (PE) cannot access the requested data
without having each AMP scan all the data it controls; in that case, a full table scan is executed.
• Full table scans require looking at all the rows of a table, accessing all AMPs in parallel.
Whereas full table scans are impractical, or even disallowed, on some other commercial
database systems, Teradata routinely permits such SQL requests.

AMP AMP AMP AMP AMP AMP
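For illustration (the table is hypothetical), a query on a non-indexed column is perfectly legal; prefixing it with EXPLAIN shows that Teradata plans an all-AMP, all-rows scan:

```sql
/* last_name is not an index column, so every AMP scans its rows;
   the EXPLAIN output reports an all-AMPs retrieve by way of an all-rows scan */
EXPLAIN
SELECT *
FROM   sandbox.customer
WHERE  last_name = 'Smith';
```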

Introduction and Overview to SAS and Teradata Integration Slide 1-49


Teradata Architecture – Teradata NoPI Tables
• NoPI (No Primary Index) Tables have been introduced with Teradata 13.0

• A NoPI table does not have a primary index. The chief purpose of NoPI tables is to enhance the
performance of data loading operations.

• When a table has no primary index, its rows can be dispatched to any given AMP arbitrarily, so
the system can load data into a staging table faster and more efficiently.
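A minimal sketch of such a staging table (names and columns are invented):

```sql
/* Rows are dispatched to AMPs arbitrarily - no hashing on a primary index,
   which speeds up the initial load into this staging table */
CREATE TABLE sandbox.stage_cdr
  ( cust_id   INTEGER
  , call_dur  DECIMAL(10,2)
  )
NO PRIMARY INDEX;
```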

Introduction and Overview to SAS and Teradata Integration Slide 1-50


Teradata Basics – System Space Organization
A Teradata Database system is mainly structured into databases and users

 Teradata database is a collection of tables, views, macros, triggers, stored procedures, join
indexes, hash indexes, UDFs, space limits, and access rights, used for administration and
security (comparable to a schema in other systems).

 Teradata user is a collection of tables, views, macros, triggers, stored procedures, join indexes,
hash indexes, UDFs, and access rights. A User represents a logon point within the hierarchy and
Access Rights apply only to Users. Further, Users are granted rights to access other
database(s).

 In Teradata, a Database and a User are essentially the same,
with the difference that a user can log on to the RDBMS.

Other database(s) containing views and macros, which in turn are granted rights to access the corporate
production tables

Introduction and Overview to SAS and Teradata Integration Slide 1-51


Teradata Basics – Database Objects
The following objects can be stored in a database/user
• Tables – rows and columns of data
• Views – predefined subsets of existing tables
• Macros – predefined, stored SQL statements
• Triggers – SQL statements associated with a table
• Stored Procedures – program stored within Teradata
• User-Defined Function (UDF) – function (C program) to provide additional SQL functionality
• Vendor Defined Functions – UDF released by vendors (e.g. SAS) to provide additional specific
capabilities
• Join and Hash Indexes – separate index structures stored as objects within a database
• Permanent Journals – table used to store before and/or after images for recovery

These objects are created, maintained,
and deleted using Structured Query
Language (SQL).

Object definitions are stored in the DD/D
(Data Dictionary/Directory).
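As a small illustration of one of these object types (database and object names are hypothetical), a Teradata macro stores predefined SQL in the DD/D and is run with EXECUTE:

```sql
/* Create a macro with one parameter; the definition is stored in the DD/D */
CREATE MACRO sandbox.get_customer (id INTEGER) AS
  ( SELECT * FROM sandbox.customer WHERE cust_id = :id; );

/* Run the stored SQL with an argument */
EXECUTE sandbox.get_customer (1001);
```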

Introduction and Overview to SAS and Teradata Integration Slide 1-52


Teradata Basics – Views as Teradata Objects
Views are an alternate way of organizing and presenting information. A view, like a table, has
rows and columns. However, the rows and columns of a view are not stored directly but are
derived from the rows and columns of tables whenever the view is referenced.

A view looks like a table, but has no data of its own, and therefore takes up no storage space
except for its definition.

Views are used to simplify query requests, to limit access to data, and to allow different users to
look at the same data from different perspectives

A view is a window that accesses selected portions of a database. Views can show parts of one
table (single-table view), more than one table (multi-table view), or a combination of tables and
other views.

Introduction and Overview to SAS and Teradata Integration Slide 1-53


Teradata Basics – Views as Teradata objects
In the Teradata Enterprise Data Warehouse, views leverage a key concept.

Gaining real-time access to derived information in the virtual data mart layers from the normalized
and physical EDW layer.

(Diagram: Teradata Enterprise Data Warehouse layers, from bottom to top – staging and
temporary physical tables layer; normalized EDW physical table layer; virtual data mart
layer (views): star schemas, user sandbox, and analytical base tables; security and
exploitation layer (views). Caption: sandbox + analytical base tables.)

Introduction and Overview to SAS and Teradata Integration Slide 1-54


Teradata Basics – System Space Types
A Database or User area consists of different space types

 Perm Space is the maximum amount of storage assigned to a user or database for
holding table rows, stored procedures, UDFs, and permanent journals.

 Spool Space is acquired automatically by the system and used as work space
and for answer sets holding intermediate and final results of Teradata SQL statements.

 Temporary Space is space acquired automatically by the system when users
make use of and materialize Teradata temporary tables.
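A hedged sketch of how these space types are assigned when a database or user is created (names, password, and sizes are invented):

```sql
/* Carve a database out of the parent's perm space */
CREATE DATABASE sandbox FROM dbc AS
  PERM = 10e9 ;          /* bytes of permanent space for tables etc.   */

/* A user is the same kind of object, but can log on */
CREATE USER analyst FROM sandbox AS
  PASSWORD = temp123,
  PERM     = 1e9,
  SPOOL    = 5e9 ;       /* ceiling for intermediate-result work space */
```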


Introduction and Overview to SAS and Teradata Integration Slide 1-55


Module 1 – Introduction and Overview to SAS and
Teradata Integration

• Section 1.1 – SAS and Teradata Overview


• Section 1.2 – SAS Basics
• Section 1.3 – Teradata Basics
• Section 1.4 – Introduction to Integration Techniques

Introduction and Overview to SAS and Teradata Integration Slide 1-56


How Do SAS and Teradata Integrate?
SAS and Teradata integrate with each other through
• the SAS/ACCESS Interfaces and
• the Teradata Client interfaces and utilities.
SAS/ACCESS interfaces enable you
• to read data from a Teradata system, and use those
files as input data for SAS programs.
• Further, if you have appropriate authority, they enable
you to write and update data in Teradata.

(Diagram: a SAS process running
Libname TDLib Teradata ..;
Proc SQL;
Select *
from TDLIB.Table
Where ...;
quit;
goes through SAS/Access to Teradata and the Teradata Client,
which submit Teradata SQL to the Teradata system.)

Introduction and Overview to SAS and Teradata Integration Slide 1-57


How Do SAS and Teradata Integrate?
There are two SAS/ACCESS Interface options
• SAS/ACCESS to Teradata (optimal)
• Uses Teradata’s native Call Level Interface (CLI)
• Read and write access directly to Teradata
• Raw read performance is up to 15% faster than via ODBC
• Bulk extract is available with FASTEXPORT capability for large data volumes
• With FASTLOAD or MULTILOAD features enabled write operations to Teradata are much
faster than via ODBC
• SAS/ACCESS to ODBC (works, but is non-optimal)
• Available for all platforms supported by the Teradata ODBC driver
• This option takes 15% or more additional time in raw read performance
compared to the native access interface
• Raw write performance is even worse.

Introduction and Overview to SAS and Teradata Integration Slide 1-58


SAS/ACCESS to Teradata – Overview
Mature SAS/ACCESS Interface
• Supporting a SAS Library Engine for Teradata
• Supports explicit and implicit SQL pass-thru
• Supports Teradata Load Utilities since SAS 8.2

With SAS 9.2, a uniquely extended SAS/ACCESS interface
• Extended support of the latest
Export and Load Utilities (incl. TPT)
• Teradata Mode
• Query Banding Support*
• Specific In-Database enhancements
– SAS Formats Library on Teradata
– SAS BASE Procedure SQL Push-Down*

*SAS 9.2 TS2 M2

(Diagram: a SAS process running
Proc Freq
Data=TD.credit_data;
table state*credit;
Run;
goes through SAS/Access to Teradata and the Teradata Client; in Teradata the query runs as
Select
count(*), state, credit
from credit_data
group by state, credit )

FASTLOAD – Bulkload; fastest way to add many records


MULTILOAD – Fast update in batch mode
TPUMP – Fast load for trickle (few records at a time) updates
Upsert
TPT Option – Newer interface; API
• Must allow users to select as an option
• Engine falls back to utilities for older versions of Teradata
• Not all features are supported yet in the TPT layer

Introduction and Overview to SAS and Teradata Integration Slide 1-59


SAS/ACCESS to Teradata – Overview
SAS/ACCESS Libname Engine for Teradata
• Any SAS Program may reference data residing in Teradata
• SAS/ACCESS simulates SAS I/O for SAS procedures
• A SQL select statement is generated for retrieval, which may contain where-clauses, column
selections, sample options
• Options to leverage Teradata bulk load and export interfaces
PROC SQL Implicit Pass-through
• SAS PROC SQL is neutral SQL Code (ANSI SQL2[1992] w/extensions), which if run based
on Teradata tables, triggers translation to native Teradata SQL and pass-thru for execution
into the Teradata system
• Objective is to implicitly pass-thru as much SQL as possible
• May be generated PROC SQL code or manually coded
PROC SQL Explicit Pass-through
• Interface to code any Teradata SQL
• May be generated PROC SQL code or manually coded
• Commonly used in optimized Teradata installations

Uses SQL SELECT, UPDATE, DELETE statements, and leverages Teradata bulk load and export
interfaces

Introduction and Overview to SAS and Teradata Integration Slide 1-60


SAS/ACCESS Libname Engine for Teradata
SAS/ACCESS emulates SAS I/O using SQL requests
WHERE clause pushed down whenever possible, function mapping will occur in WHERE clause

Libname tera TERADATA server=tdpid user=uid password=pwd;

proc print data=tera.cdr_agg1;


where substr(cust_id,1,2) = "04";
run;

TERADATA_2: Executed: on connection 1


SELECT "CUST_ID","START_CALL_DT","usage_type","call_dur_sum" FROM "cdr_agg1"
WHERE (SUBSTR("CUST_ID", 1, 2) = '04' )

TERADATA: trget – rows to fetch: 720

Introduction and Overview to SAS and Teradata Integration Slide 1-61


PROC SQL Implicit Pass-through (“IP”)
Implicit PassThrough (IP) is the result of a collaborative effort between PROC SQL and
SAS/ACCESS
libname tera teradata server=tdpid user=uid password=pwd;
proc sql;
select cust_id, start_call_dt,
STRIP(usage_type) as usage_type,
sum(call_dur) as call_dur_sum
from tera.cdr_usage
where start_call_dt between "01JAN09"d and "30JUN09"d
group by cust_id, start_call_dt, usage_type;
quit;

SQL_IP_TRACE: passed down query:


select "cdr_usage"."CUST_ID", "cdr_usage"."START_CALL_DT",
TRIM("cdr_usage"."USAGE_TYPE") as "usage_type",
SUM("cdr_usage"."CALL_DUR") as "call_dur_sum" from
"cdr_usage" where "cdr_usage"."START_CALL_DT"
between DATE'2009-01-01' and DATE'2009-06-30'
group by
"cdr_usage"."CUST_ID", "cdr_usage"."START_CALL_DT", "cdr_usage"."USAGE_TYPE"

 Note the SAS STRIP() function gets mapped to Teradata TRIM() by PROC SQL IP and
SAS/ACCESS.

Introduction and Overview to SAS and Teradata Integration Slide 1-62


PROC SQL Explicit Pass-through
proc sql;
connect to teradata (server=tdpid, user=uid, password=pwd);
execute (
create table uid.cdr_agg1 as
(select cust_id, start_call_dt,
TRIM(BOTH from usage_type) as usage_type ,
sum(call_dur) as call_dur_sum
from prod.cdr_usage
where start_call_dt between DATE '2009-01-01' and DATE '2009-06-30'
group by 1,2,3
) with data primary index (cust_id)
) by teradata;
execute (commit) by teradata;
create table work.cdr_agg1 as select * from connection to teradata
( select * from uid.cdr_agg1 );
disconnect from teradata;
quit;

For hand-written or optimized processes that will access only Teradata


(“get the syntax past the SQL parser”).
SAS does not parse the SQL or map functions for explicit pass-through
requests – passes to Teradata directly.

Introduction and Overview to SAS and Teradata Integration Slide 1-63


SAS Data Integration – Teradata Integration
SAS Data Integration Studio is enabled for Teradata
• Developing ELT/ETLT processes leveraging automatic implicit and explicit SQL push-down
• Table and transformation push-down indication

DBMS Push-down Indicator

DBMS Table Indicator


• Staging Tables as Teradata Temporary Tables
• Managing Teradata Load Jobs implicitly using Teradata Parallel Transporter utilities from
SAS


Certain SAS DI transformations that are conducive to in-database processing are evaluated for
further Teradata integration (e.g., data quality functions)

Introduction and Overview to SAS and Teradata Integration Slide 1-64


SAS Reporting – Teradata Integration
(Diagram: SAS reporting clients – Dashboards, Web Reports, Information Maps,
ad-hoc query and analysis, Enterprise Guide, and Stored Processes – all querying Teradata.)

Example stored process code:
%stpbegin;
proc freq data=TDLib.Table;
...; run;
proc sql;
select * from TDLib.Table
..; quit;
...
%stpend;

• Mapping optimized relational BI data models or ROLAP cubes:
implicit SQL code generation and automatic SQL push-down,
star schemas, join indexes, …
• Stored Process framework for developing non-standard reports
and non-standard queries, incl. explicit pass-through queries

Introduction and Overview to SAS and Teradata Integration Slide 1-65


Similarities and Differences
While the SAS libname engine for Teradata allows Teradata tables to be used like SAS
data sets, there are many differences which should be considered:

• size of tables and data volumes acted on

• column type and conversion

• availability of functions to be used

• naming of tables and columns

• …..

Introduction and Overview to SAS and Teradata Integration Slide 1-66


Teradata and SAS – Similar Concepts
Concept | SAS Nomenclature | Teradata Equivalent
Description for where data resides | Library | Database (user)
Data storage structure | Data set | Table
Contents of a column | Variable | Column
Record in the data set | Observation | Row
Table order (as table is stored) | Permanent | – (depends on the query)
Sorting data | SORT procedure, ORDER BY | ORDER BY
Combining data sets/tables | MERGE, join, other, … | Join
How to aggregate data | PROC MEANS/SUMMARY, PROC SQL, other procedures, DATA step | SQL with GROUP BY clause
Programming language | SAS language (incl. SQL) | SQL
Baseline date for date manipulation | January 1, 1960 | January 1, 1900

The relational database and SAS environments use slightly different terms to describe very similar
concepts.

Throughout this course and during your day-to-day work activities, you will likely hear some of these
used interchangeably. It will be useful to keep these comparisons in mind as they relate to your work
activities.

Introduction and Overview to SAS and Teradata Integration Slide 1-67


Teradata and SAS – Naming Conventions
The data objects that you can name in SAS and Teradata include tables, views, columns, indexes,
and macros.

Teradata:
• A name must start with a letter unless enclosed in double quotation marks.
• A name must be from 1 to 30 characters long.
• A name can contain the letters A through Z, the digits 0 through 9, the underscore (_), $, and #.
• A name, even when enclosed in double quotation marks, is not case sensitive.
• A name cannot be a Teradata reserved word such as COMMIT or SELECT.
• The name must be unique between objects; a view and table in the same database cannot have the same name.

SAS:
• A name must start with a letter or underscore (_). A name cannot be enclosed in double quotation marks.
• A name must be from 1 to 32 characters long.
• A name can contain the letters A through Z, the digits 0 through 9, and the underscore (_).
• A name is not case sensitive; e.g., CUSTOMER is the same as customer.
• A name can be words such as COMMIT or SELECT, because SAS does not have reserved words.
• A name does not need to be unique between object types, with the exception of a data table and view in the same SAS data library.

Introduction and Overview to SAS and Teradata Integration Slide 1-68


Teradata and SAS – Data Types
Every column in a Teradata table and a SAS data set has a column data type. There are several
categories of data types in Teradata. SAS only has two data types: character and numeric.

Teradata                               SAS

BYTE(n), VARBYTE(n),                   Character
CHAR(n), VARCHAR(n),
LONG VARCHAR

DATE, TIME(n), TIMESTAMP(n),           Numeric
BYTEINT, DECIMAL, FLOAT,
INTEGER, SMALLINT, BIGINT, …

When reading Teradata data into
SAS, Teradata data types are
converted to SAS data types.

Introduction and Overview to SAS and Teradata Integration Slide 1-69


Integrate SAS and Teradata General Guidelines
All activities initiated from SAS to run on Teradata are executed in SQL

• Some of these mechanisms are more optimal than others

• SAS users can have SAS generate the SQL (Implicit SQL Pass-Thru) via a SAS PROC SQL
statement

• SAS users can leverage all the power of Teradata by writing their own Teradata SQL (Explicit
SQL Pass-Thru)

• Once you have the power to access and process data from the Teradata database, the
workflow processes that you used prior to accessing Teradata should be carefully
reconsidered for optimal performance

• In order to fully use the power of Teradata think about how you currently perform your workflow
processes and consider the alternative suggestions provided in the following chapters.

Introduction and Overview to SAS and Teradata Integration Slide 1-70


Copyright © 2009 by SAS Institute Inc. and Teradata Corporation. All Rights Reserved.
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA
and other countries. Teradata and the names of products and services of Teradata Corporation are registered trademarks or trademarks
of Teradata Corporation in the USA and other countries. ® indicates USA registration. Other brand and product names are registered
trademarks or trademarks of their respective companies.

Introduction and Overview to SAS and Teradata Integration Slide 1-71


Module 2
Querying Teradata Using SAS Libname and
Implicit SQL Pass-Through

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-1
Objectives
• Review the purpose of the LIBNAME statement.
• Submit a SAS/ACCESS LIBNAME statement to the Teradata DBMS.
• Discuss SAS/ACCESS LIBNAME statement options for Teradata.
• Define SQL implicit pass-through.
• Use the SASTRACE option to determine the SQL commands that are being passed from
SAS to Teradata.

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-2
Module 2 – Querying Teradata Using SAS LIBNAME
and Implicit SQL Pass-Through

• Section 2.1 – Querying Teradata Using the SAS/Access


Libname Engine
• Section 2.2 – Accessing Teradata Using SAS Procedures and
DATA Step Programs
• Section 2.3 – Querying Teradata Using the Implicit SQL Pass-
Through Facility
• Section 2.4 – Using Options for Querying Teradata

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-3
Module 2 – Querying Teradata Using SAS LIBNAME
and Implicit SQL Pass-Through

• Section 2.1 – Querying Teradata Using the SAS/ACCESS


Libname Engine
• Section 2.2 – Accessing Teradata Using SAS Procedures and
DATA Step Programs
• Section 2.3 – Querying Teradata Using the Implicit SQL Pass-
Through Facility
• Section 2.4 – Using Options for Querying Teradata

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-4
The LIBNAME Statement (Review)
The LIBNAME statement does the following:
• establishes a library reference, or libref, which acts
as an alias, or nickname, to a collection of data sets (SAS data library).
• references data sets by a two-level name. The first level is the libref, and the second
level is the data set name.
• removes operating-system-specific references in the program code.
• enables data sets to be read and updated.

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-5
The LIBNAME Statement (Review)
You can use the LIBNAME statement to assign a libref to a SAS data library.

General form of the LIBNAME statement:

LIBNAME libref 'SAS-data-library ' <options>;

Rules for naming a libref:


• The name must be eight characters or less.
• The name must begin with a letter or underscore.
• The remaining characters must be letters, numbers, or underscores.

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-6
Assigning a Libref (Review)
When you refer to the SAS file in your program, you use the two-level name:
libref.filename

libname mydata 'SAS-data-library';

Physical Data
Storage Location libref

SAS Data
Set A proc print data=mydata.a;
run;
SAS
Data
proc print data=mydata.b;
Set B
run;

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-7
The SAS/ACCESS LIBNAME Statement
The SAS/ACCESS LIBNAME statement does the following:
• establishes a libref, which acts as an alias, or nickname, to Teradata
• permits a Teradata DBMS table to be referenced by a two-level name, allowing the
Teradata table to be read as easily as a SAS data set
• enables the Teradata table to be updated if the proper authority already exists
• allows the use of the SAS/ACCESS LIBNAME statement options to specify how
Teradata objects are processed by SAS
• enables you to customize how to connect to Teradata.

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-8
SAS/ACCESS LIBNAME Statement
General form of the SAS/ACCESS LIBNAME statement:

LIBNAME libref SAS/ACCESS-engine-name


<SAS/ACCESS-engine-connection-options>
<SAS/ACCESS-engine-LIBNAME-options>;

When you submit a SAS/ACCESS LIBNAME statement, a connection is made between a libref in SAS
and the database.

libname mytera RDBMS RDBMS-options;

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-9
The SAS/ACCESS LIBNAME Statement
The DBMS table is referenced using a two-level name, enabling the DBMS table to be read
as easily as a SAS data set.
libref.DBMS-Table-name

libname mytera RDBMS RDBMS-options;

Teradata

Database libref
Teradata
Table A
proc print data=mytera.a;
where Gender='M';
run;
Teradata proc print data=mytera.b;
Table B
where lastname='Smith';
run;

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-10
SAS/ACCESS LIBNAME Statement to Teradata
Connection information for Teradata:

libname teralib teradata    /* database engine    */
server=tera5500             /* Teradata server    */
user=edutest                /* Teradata user ID   */
pw=edutest1                 /* Teradata password  */
database=saseduc;           /* Teradata database  */

Note: If a database is not specified, the connection is established
to the user's default database.

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-11
LIBNAME Statement Connection Options
To avoid stating user and password credentials explicitly within the
SAS program, beginning with SAS 9.2 you may instead use the AUTHDOMAIN=
libname option.
 With this new option, the appropriate Teradata credentials are automatically
retrieved from the SAS Metadata Server's account information for the user at runtime.
 Thus, specification of credentials in program code can be avoided.

LIBNAME mytera TERADATA SERVER=teraserver


AUTHDOMAIN=TERAAUTH ;

To the engine it appears that the


USER= and PASSWORD= options
were on the original LIBNAME
statement.

Using the AUTHDOMAIN= option you can retrieve USER= and PASSWORD= information from an
authentication domain stored in your SAS Metadata Server. To the engine, it appears that the USER=
and PASSWORD= options were specified on the LIBNAME statement.

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-12
Reading Teradata Tables into SAS
Default Teradata Data Types conversion into SAS
Default SAS data type and SAS formats assigned to Teradata data types.
Teradata Data Type Default SAS Data Type – SAS Format
CHAR(n ) Character – $n (n<= 32,767)
CHAR(n ) Character – $32767.(n>32,767) 1
VARCHAR(n ) Character – $n (n<= 32,767)
VARCHAR(n ) Character – $32767.(n> 32,767) 1
LONG VARCHAR(n ) Character – $32767. 1
BYTE(n ) Character – $HEXn. (n<= 32,767)
BYTE(n )1 Character – $HEX32767.(n> 32,767)
VARBYTE(n ) Character – $HEXn. (n<= 32,767)
VARBYTE(n ) Character – $HEX32767.(n> 32,767)
INTEGER Numeric – 11.0
SMALLINT Numeric – 6.0
BYTEINT Numeric – 4.0
DECIMAL(n, m )2 Numeric – (n+2 ).(m )
FLOAT Numeric – none
DATE3 Numeric – DATE9.
TIME(n)4 Numeric – for n=0, TIME8. for n>0, TIME9+n.n
TIMESTAMP(n)4 Numeric – for n=0, DATETIME19. for n>0,
DATETIME20+n.n

SAS function   Teradata equivalent
LEFT(c)        TRIM(LEADING FROM c)
LENGTH(c)      CHARACTER_LENGTH(TRIM(TRAILING FROM c))
MOD(v,d)       (v MOD d)
TRIMN(c)       TRIM(TRAILING FROM c)

1. When reading Teradata data into SAS, DBMS columns that exceed 32,767 bytes are truncated. The
maximum size for a SAS character column is 32,767 bytes.
2. If the DECIMAL number is extremely large, SAS can lose precision. For details, see the topic
"Numeric Data".
3. See the topic "Date/Time Data" for how SAS/ACCESS handles dates that are outside the valid SAS
date range.
4. TIME and TIMESTAMP are supported for Teradata Version 2, Release 3 and later. The TIME with
TIMEZONE, TIMESTAMP with TIMEZONE, and INTERVAL types are presented as SAS
character strings, and thus are harder to use.
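As a minimal sketch of checking these conversions (the server name, authentication domain, libref, and table name are hypothetical), PROC CONTENTS shows the SAS type, length, and format that the engine assigned to each Teradata column:

```sas
/* Hypothetical connection and table names */
libname tdlib teradata server=teraserver authdomain=TERAAUTH;

/* Lists each column with its assigned SAS type and format,
   e.g. a Teradata DATE column appears as Numeric with DATE9. */
proc contents data=tdlib.order_fact;
run;
```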

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-13
Module 2 – Querying Teradata Using SAS Libname
and Implicit SQL Pass-Through

• Section 2.1 – Querying Teradata Using the SAS/ACCESS


Libname Engine
• Section 2.2 – Accessing Teradata Using SAS Procedures
and DATA Step Programs
• Section 2.3 – Querying Teradata Using the Implicit SQL Pass-
Through Facility
• Section 2.4 – Using Options for Querying Teradata

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-14
Using the SAS/ACCESS Libname Engine
The SAS/ACCESS engine writes SQL code on the user's behalf from this PROC PRINT step
and DATA step that is passed implicitly to Teradata.

How do you know what happens behind the scenes?

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-15
Examining SQL Implicit Pass-Through Code
Behind the scenes, the SAS/ACCESS engine writes SQL code that is passed implicitly to
Teradata, causing as much work to be done in the database as possible.
By default, there is no indication regarding the success or failure of the SAS/ACCESS
engine to generate SQL code that is passed to Teradata.
To determine the success or failure of implicit pass-through for a query, you must examine
the SQL that the SAS/ACCESS engine submits to Teradata by using the SASTRACE=
SAS system option.

OPTIONS SASTRACE=',,,d' SASTRACELOC=SASLOG NOSTSUFFIX;

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-16
The SASTRACE= SAS System Option
SASTRACE=',,,d'   specifies that all SQL statements sent to the DBMS are
                  sent to the log.
SASTRACE=',,,s'   specifies that a summary of timing information for calls
                  made to the DBMS is sent to the log.
SASTRACELOC=STDOUT | SASLOG | FILE 'path-and-filename'
                  prints SASTRACE information to a specific location:
                  STDOUT – writes trace messages to the default output
                  location for your operating environment.
                  SASLOG – writes trace information to the SAS log window.
                  FILE 'path-and-filename' – writes trace information to
                  a file.
NOSTSUFFIX        limits the amount of information displayed in the log.

In general, the SQL implicit pass-through mechanism is a silent optimization, in part because it
cannot be guaranteed. If the optimization succeeds in passing a query (or parts of a query) directly to a
DBMS, it does not report that it was successful. If it fails to pass a query through to a DBMS, the
query is processed in SAS through the standard SAS engine interfaces. There is normally
no indication surfaced to the user for implicit pass-through failures or successes.

To determine the success or failure of implicit pass-through for a query, one must examine the SQL that
the engine submits to the database. The primary mechanism for showing what is actually passed to an
underlying DBMS by a SAS/ACCESS engine is the SASTRACE= SAS system option. Use of this
option causes all of the SQL or internal API call information that a SAS/ACCESS engine passes to the
underlying DBMS to be displayed in the SASLOG output. It also causes any DBMS return
codes and messages that are returned from the execution to be shown in the SASLOG output.
To enable this level of tracing, specify the following options in your SAS program code:

OPTIONS SASTRACE=',,,d' NOSTSUFFIX SASTRACELOC=SASLOG;

To disable the tracing, clear the SASTRACE specification as follows:

OPTIONS SASTRACE=OFF;

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-17
The Fullstimer SAS System Option (Optional)
To print additional resource utilization information in the SAS log, specify the following
option:
 FULLSTIMER tracks usage of additional resources. This option is ignored unless
STIMER or MEMRPT is in effect. It can also be specified by the alias FULLSTATS.
 MSGLEVEL=I tracks SAS index usage information.

OPTIONS FULLSTIMER
MSGLEVEL=I ;

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-18
SQL Implicit Pass-Through Code
The SASTRACE= option shows the SQL statements that were sent to Teradata. In this case,
the SAS/ACCESS engine writes SQL code on the user's behalf from this PROC PRINT step
that is passed implicitly to Teradata.

at02a01

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-19
SASTRACE Messages in the Log

Fullstimer resource
utilization info

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-20
Examining the Data Extract Size
 While SAS by default extracts all the rows and columns from a table before executing a
SAS program step,
 SAS does not extract
– all columns when the variables for selection and analysis are explicitly specified
– all rows when the number of rows is constrained by the use of a WHERE clause
or other constraining options.
[Diagram: a SAS process running "Libname TDLib Teradata ..;" and SAS program code connects through SAS/ACCESS to Teradata and the Teradata client, which submit Teradata SQL to the database – how large is the extract returned to SAS?]
Coding Tips to Reduce Data Extract Size


When accessing Teradata directly from a SAS PROC, without the use of special statements in the PROC
to limit data extraction, SAS extracts the entire raw data table to a temporary SAS data set and then
executes the PROC. The extracted data is no longer available to SAS after the procedure has completed.

While SAS, by default, extracts all the rows and columns from a table before executing a PROC, SAS
does not extract all columns when the variables for analysis are explicitly specified using, for example,
a WHERE, VAR, TABLES, or MODEL statement in the PROC.

Likewise, the number of rows extracted can be constrained by the use of a WHERE clause in a PROC
that (a) SAS will recognize, and (b) send to Teradata in the SQL request that it generates. Recall from
Module 5 that the WHERE clause cannot contain functions that are not recognized by SAS/ACCESS to
Teradata.

Also, the use of an OBS= data set option with SAS v9 and higher will force SQL to be generated and
passed with the Teradata SAMPLE clause.

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-21
Data Extract Size – Limiting Columns
SAS does not extract all columns when the variables for analysis are explicitly specified.
• In general, this can be achieved in any SAS program step by applying data set
options to a Teradata table reference.
• Furthermore, SAS procedures offer specific statements for naming the columns to be
analyzed, such as the VAR, TABLES, and MODEL statements, among others.

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-22
Data Extract Size – Limiting Columns
The following data set options can be used to select columns to read or write to
tables.

DROP=column-1 column-2 … column-n   lists the column names to exclude from
                                    processing or from writing to output tables.

KEEP=column-1 column-2 … column-n   lists the column names to include in
                                    processing or for writing to output tables.

at02d09
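As a minimal sketch of the KEEP= data set option (the libref and column names are assumed from the course data), only the named columns appear in the generated SELECT list, so Teradata returns just those columns:

```sas
/* KEEP= limits the generated SELECT list to two columns */
data work.subset;
   set tdlib.order_fact (keep=Customer_ID Quantity);
run;
```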

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-23
Data Extract Size – Limiting Rows
SAS does not extract all rows when the number of rows is constrained
• by the use of a WHERE clause in SAS procedure steps or DATA steps
• or by other constraining options: for example, by applying data set options
such as OBS=n to a Teradata table reference.


Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-24
Data Extract Size – Limiting Rows
The following data set options can be used to limit the rows that are read from or
written to tables.

OBS=n        specifies the number of the last observation to process.
             By default, OBS=MAX, which is the largest integer represented
             by your operating system.

FIRSTOBS=n   causes SAS to begin reading at the specified observation or
             record. By default, FIRSTOBS=1.


56 data _null_;
57 set TDOrion.order_fact (keep=Customer_ID Quantity
58 firstobs=50 obs=100);
59 run;

NOTE: There were 51 observations read from the data set TDORION.order_fact.

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-25
Selected SAS/ACCESS Data Set Options
SAS users can rename column references temporarily for the duration of a
selected task.

RENAME=(old-col-name=new-col-name)   enables you to rename columns in
                                     output data sets.

at02d07b
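A minimal sketch of the RENAME= data set option (libref and column names assumed from the course data). Note that KEEP= is applied before RENAME=, so KEEP= must use the original column name:

```sas
/* Quantity is kept under its original name, then renamed to Qty
   in the output data set */
data work.qty;
   set tdlib.order_fact (keep=Customer_ID Quantity
                         rename=(Quantity=Qty));
run;
```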

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-26
Querying Teradata Tables Using
SAS Procedures and DATA Step
Programs
This demonstration illustrates accessing Teradata
tables using a LIBNAME statement.

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-27
Exercise
This exercise reinforces the concepts discussed
previously.

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-28
Module 2 – Querying Teradata Using SAS Libname
and Implicit SQL Pass-Through

• Section 2.1 – Querying Teradata Using the SAS/ACCESS


Libname Engine
• Section 2.2 – Accessing Teradata Using SAS Procedures and
DATA Step Programs
• Section 2.3 – Querying Teradata Using the Implicit SQL
Pass-Through Facility
• Section 2.4 – Using Options for Querying Teradata

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-29
SQL Implicit Pass-Through
• The use of the SAS/ACCESS LIBNAME statement not only enables
communication between Teradata and SAS.
• SAS, via the SAS/ACCESS engine, also determines the SQL query that is implicitly
passed to Teradata on the user’s behalf and converts the user’s SAS code to
Teradata-specific SQL.

The purpose of implicit pass-through is to have SAS, through the SAS/ACCESS engine, construct the
SQL in such a way that as much work as possible is performed in Teradata.

(cf. SAS Global Forum papers 296-2008 and 309-2009).

Improved textualization of these types of aliases:


• Symbol aliases
• Table aliases
• View aliases
• Inline view aliases

Improved textualization of deeply nested queries

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-30
Using PROC SQL Implicit Pass-Through
A PROC SQL query may be considered for implicit
pass-through in any of the following cases:
• The referenced tables in the query all use the same SAS/ACCESS engine.
• The query contains the SELECT DISTINCT keyword.
• The query contains an SQL aggregate function.
• The query uses SAS language functions that are mapped to DBMS functions.
Selected functions:

AVG LOG MONTH SUM


COUNT LOG10 SECOND TAN
DAY LOWCASE SQRT TIMEPART
EXP MAX STRIP (TRIM) UPCASE
HOUR MIN SUBSTR YEAR

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-31
Using PROC SQL Implicit Pass-Through
• The query contains a GROUP BY clause.
• The query contains a HAVING clause.
• The query performs an SQL join.
• The query contains an ORDER BY clause.
• The query involves a SET operation other than OUTER UNION.
• The query with a WHERE clause contains a subquery.
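The trigger conditions above can be combined in a single query; a sketch (libref and column names assumed from the course data) that qualifies on several of them at once:

```sas
/* Aggregate functions, GROUP BY, and ORDER BY all make this query
   a candidate for implicit pass-through to Teradata */
proc sql;
   select Customer_ID,
          count(*) as order_count,
          sum(Quantity) as total_qty
      from tdlib.order_fact
      group by Customer_ID
      order by total_qty desc;
quit;
```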

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-32
SQL Implicit Pass-Through
Implicit PassThrough (IP) is the result of a collaborative effort between PROC SQL and
SAS/ACCESS
Trigger for
implicit pass-
thru

Generated Teradata SQL

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-33
SQL Implicit Pass-Through
Implicit PassThrough (IP) is the result of a collaborative effort between PROC SQL and
SAS/ACCESS

Trigger for
implicit pass-
thru

Generated Teradata SQL

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-34
Rules for SQL Implicit Pass-Through
There are rules and code elements that disqualify the use of implicit pass-through:
• If you have multiple SAS libraries, the LIBNAME statement connection options must
match (USER=, PASSWORD=, ACCOUNT=, and SERVER=).
Multiple Teradata librefs with different DATABASE= values are OK.
• Data set options used in PROC SQL
• Mixing with CONNECTION TO statements (explicit pass-through)
• ANSI MISS/NOMISS outer joins / NULL value handling
• Unmapped SAS functions
• One or more truncated comparisons
• CREATE VIEW statements
• ORDER BY differences

Remerging ?
INTO clause ?

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-35
Validating SQL Implicit Pass-Through
Using the SAS PROC SQL NOEXEC option in conjunction with tracing options
enables validation of the generated pass-through SQL code before it is
executed.
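A sketch of this validation pattern (libref and table name assumed from the course data):

```sas
/* Surface the generated SQL in the log ... */
options sastrace=',,,d' sastraceloc=saslog nostsuffix;

/* ... while NOEXEC prevents the query from actually running */
proc sql noexec;
   select count(*)
      from tdlib.order_fact;
quit;
```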

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-36
Governing SQL Implicit Pass-Through
 While the SAS implicit pass-through implementation attempts to push down as
much SQL as possible, some customers might want to control the workload
issued to their database system.
 SAS provides system, LIBNAME, and procedure options to control the amount
and type of SQL being pushed to the database.
• Examples are DIRECT_EXE, DIRECT_SQL, IPASSTHRU,
DBIDIRECTEXEC, and SQLREDUCEPUT.

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-37
Governing SQL Implicit Pass-Through
The SQL procedure option IPASSTHRU | NOIPASSTHRU controls whether
SQL pass-through to a database is enabled.
• It is enabled by default.

GOOD:
COUNT(*)
function passed
to DBMS!

When SAS performs the count it must read the entire contents of a variable in order to count the rows.
In this example, COMPLAINTS_13_24_MTHS_CNT just happens to be the first column in the table. If
it contains NULL values then SAS may return a different count.
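A sketch of the contrast (libref and table name assumed; IPASSTHRU is the default and is spelled out only for emphasis):

```sas
/* Default: COUNT(*) is pushed down and executed inside Teradata */
proc sql ipassthru;
   select count(*) from tdlib.order_fact;
quit;

/* Disabled: SAS fetches the rows and performs the count itself */
proc sql noipassthru;
   select count(*) from tdlib.order_fact;
quit;
```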

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-38
Governing SQL Implicit Pass-Through
When the NOIPASSTHRU option is set, SQL pass-through is disabled.

BAD: COUNT(*)
function NOT
passed to
DBMS!

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-39
Providing Data to End-Users Using Views
 Like database systems, SAS provides access to data via views
(PROC SQL and DATA step views).
 While views are commonly used in Teradata systems, SAS views can add
additional benefits:
– hiding the LIBNAME connection options as part of the view
– enriching a database table/view with SAS options such as labels, formats, etc.

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-40
What is a PROC SQL View?
A PROC SQL view
• is a stored query
• contains no actual data
• can be derived from one or more tables or views
• extracts underlying data each time that it is used, and accesses the most
current data
• can be referenced in SAS programs in the same way as a data table
• cannot have the same name as a data table stored in the same SAS library.

Views are sometimes referred to as virtual tables because they are referenced in SAS programs in the
same manner as actual data tables, but they are not physical data tables. They contain no actual data but
instead store the instructions required to retrieve and present the data to which they refer.

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-41
Creating a PROC SQL View
General form of the CREATE VIEW statement:

PROC SQL;
CREATE VIEW view-name AS
SELECT column-1, column-2,…column-n
FROM table-1<,table-n>
…;

The underlying tables in the view can be


Teradata tables.
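A sketch of this form against a Teradata table (the libref, table, and column names are assumed; work.cust_city is the view name used in the later examples):

```sas
/* The view stores only the query; rows are fetched from Teradata
   each time the view is used */
proc sql;
   create view work.cust_city as
      select Customer_ID, Customer_Name, Customer_City
         from tdlib.customer_dim;
quit;
```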

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-42
Creating a PROC SQL View

When you submit the code to create the view, you get a message in the log that the view has been
created. At this point, nothing is getting passed implicitly to Teradata.

Caution:
When you use this method to create a view, the LIBNAME statement must always be in effect during
your SAS session if you want to execute the PROC SQL view to retrieve the rows of data. The library
reference is hard-coded in the FROM clause in the view definition.

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-43
Using a PROC SQL View
You can reference the PROC SQL view the same way that you reference a SAS data set.

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-44
SASTRACE Information
Partial SAS Log

When the FREQ procedure code is executed, the instructions to retrieve the data are also executed. The
SAS/ACCESS engine passed the SQL statements to Teradata to retrieve the rows. The work was done
on the Teradata side and the rows were then returned to SAS to be displayed.

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-45
Creating Views with Embedded LIBNAME Statements
You can create a PROC SQL view that embeds the LIBNAME statement with a
USING clause. The embedded LIBNAME statement has the following
characteristics:
• is defined in a USING clause within a PROC SQL view
• is assigned when the view begins executing
• can contain connection information
• uses the LIBNAME engine to pass joins to Teradata
• can store label, format, and alias information
• is de-assigned when the view stops executing

IG Note: The last bullet is a big point. When you issue the LIBNAME statement with the USING
clause, the libref is de-assigned when the view stops executing, unlike a libref assigned to Teradata
outside of the PROC SQL step. When a LIBNAME statement is executed in the SAS session, it stays
assigned until the end of the SAS session, thus staying connected to the DBMS.

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-46
Creating a PROC SQL View with the
Embedded LIBNAME Statement
General form of the CREATE VIEW statement with the USING statement:

PROC SQL;
CREATE VIEW view-name AS
SELECT column-list
FROM Teradata-table-name
USING LIBNAME-statement;

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-47
Embedded LIBNAME Statements

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-48
Embedded LIBNAME Statements
Use the view in your program.

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-49
SASTRACE Messages in the Log
The SASTRACE messages show that the view instructions were passed directly to Teradata
and selection happened on the original table.

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-50
Dynamic SAS Programs – SAS Macro Language
The SAS Macro Facility
Using the macro language, you can write SAS programs that are dynamic, or
capable of self-modification.
Specifically, the macro language enables you to
• create and resolve macro variables anywhere in a SAS program
• write special programs (macros) that generate tailored SAS code.

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-51
SAS Macro Language (Example)
Recall the work.CUST_CITY view we created

Generated Teradata
SQL

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-52
Creating Dynamic SAS Views (optional)
Frequently you want to dynamically pass part of the view query at runtime as a parameter.
You can use SAS Macro language parameters and the SYMGET function to resolve the SAS
macro variable at runtime.
Note: SQL does not perform automatic data conversion. You must use the INPUT
function to convert the macro variable value to numeric if it is compared to a
numeric variable.
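A sketch of such a parameterized view (macro variable names, libref, and columns are assumed; work.city_age matches the example name on the next slide). SYMGET resolves the macro variable each time the view executes, and INPUT handles the numeric comparison:

```sas
%let city=Chicago;
%let min_age=30;

proc sql;
   create view work.city_age as
      select Customer_ID, Customer_Age, Customer_City
         from tdlib.customer_dim
         where Customer_City = symget('city')
           and Customer_Age > input(symget('min_age'), 8.);
quit;
```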

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-53
Creating Dynamic SAS Views (optional)
Continuing from the city_age example

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-54
Using SAS PROC SQL and the
SAS/ACCESS Libname Engine
for Teradata
This demonstration illustrates using SAS PROC SQL
for querying Teradata tables.

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-55
Exercise
This exercise reinforces the concepts discussed
previously.

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-56
Module 2 – Querying Teradata Using SAS Libname
and Implicit SQL Pass-Through

• Section 2.1 – Querying Teradata Using the SAS/ACCESS


Libname Engine
• Section 2.2 – Accessing Teradata Using SAS Procedures and
Data Step Programs
• Section 2.3 – Querying Teradata Using the Implicit SQL Pass-
Through Facility
• Section 2.4 – Common Query Use Cases and Using
Options for Querying Teradata

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-57
Options for Optimizing the Teradata Query
 A series of SAS options enables further optimization of queries passed to Teradata
 Usage of SAS-specific code elements in WHERE clauses and SQL statements often breaks
the pass-through in the first place. Many of these cases can be addressed by
– enabling the mapping of functions
– further optimizing WHERE clause push-down by making use of the
reduce-PUT facility

[Diagram: SAS program code with "Libname TDLib Teradata ..;" passes through SAS/ACCESS to Teradata and the Teradata client; whether the WHERE clause, constants, mapped functions, and reduce-PUT survive determines the Teradata SQL that is generated.]


Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-58
Implicit Pass-Through – Temporal functions
To use a non-default temporal SAS format when reading Teradata tables or to
prevent date type mismatches, use the SASDATEFMT= option in these
circumstances:
 during input operations to convert DBMS date values to the correct SAS
DATE, TIME, or DATETIME values
 during output operations to convert SAS DATE, TIME, or DATETIME values to
the correct DBMS date values.
SASDATEFMT=(date-column="SAS-date-format")   changes the Teradata date values
                                             to a SAS date or datetime format.
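A sketch of the option as a data set option on input (libref, table, and column names are assumed):

```sas
/* Read the Teradata column open_date with a non-default SAS format */
proc print data=tdlib.accounts (sasdatefmt=(open_date='DATE9.'));
run;
```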

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-59
Implicit Pass-Through – Temporal functions
While temporal values in SAS are stored as numeric values, it is possible to query (and calculate)
using those numeric values in SAS. SAS stores dates as the count of days since 1 January 1960.
For example, 22 February 1995 becomes
'1995-02-22' - '1960-01-01' = 12836
To avoid ambiguity, it is recommended to use temporal literals in SAS, which are
transformed into the correct database equivalent values:

DATE: '22Feb1995'd;
TIME: '18:30:55.12345't;
DATETIME: '22Feb1995:18:30:55.12345'dt;

Writing
This example creates a Teradata table and assigns the SAS TIME8. format to the TRXTIME0 column.
Teradata creates the TRXTIME0 column as the equivalent Teradata data type, TIME(0), with the value
of 12:30:55.

libname mylib teradata user=testuser password=testpass;

data mylib.trxtimes;
   format trxtime0 time8.;
   trxtime0 = '12:30:55't;
run;

This example creates a Teradata column that specifies very precise time values. The format TIME(5) is
specified for the TRXTIME5 column.

Reading
When SAS reads this TIME(5) column, it assigns the equivalent SAS format TIME14.5.

libname mylib teradata user=testuser password=testpass;

proc sql noerrorstop;
   connect to teradata (user=testuser password=testpass);
   execute (create table trxtimes (trxtime5 time(5))) by teradata;
   execute (commit) by teradata;
   execute (insert into trxtimes values (cast('12:12:12' as time(5)))) by teradata;
   execute (commit) by teradata;
quit;

/* You can print the value that is read with SAS/ACCESS. */
proc print data=mylib.trxtimes;
run;

SAS might not preserve more than four digits of fractional precision for Teradata TIMESTAMP.

This next example creates a Teradata table and specifies a simple timestamp column with no digits of
precision. Teradata stores the value 2000-01-01 00:00:00. SAS assigns the default format

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-60
DATETIME19. to the TRSTAMP0 column generating the corresponding SAS value
of 01JAN2000:00:00:00.

proc sql noerrorstop;
   connect to teradata (user=testuser password=testpass);
   execute (create table stamps (tstamp0 timestamp(0))) by teradata;
   execute (commit) by teradata;
   execute (insert into stamps values (cast('2000-01-01 00:00:00' as
            timestamp(0)))) by teradata;
   execute (commit) by teradata;
quit;

This example creates a Teradata table and assigns the SAS format DATETIME23.3 to
the TSTAMP3 column, generating the value 13APR1961:12:30:55.123. Teradata
creates the TSTAMP3 column as the equivalent data type TIMESTAMP(3) with the
value 1961-04-13 12:30:55.123.

libname mylib teradata user=testuser password=testpass;

data mylib.stamps;
   format tstamp3 datetime23.3;
   tstamp3 = '13apr1961:12:30:55.123'dt;
run;

This next example illustrates how the SAS engine passes the literal value for
TIMESTAMP in a WHERE statement to Teradata for processing. Note that the value
is passed without being rounded or truncated so that Teradata can handle the rounding
or truncation during processing. This example would also work in a DATA step.

proc sql;
   select * from trlib.flytime
      where col1 = '22Aug1995:12:30:00.557'dt;
quit;
Implicit Pass-Through – Temporal functions
Example: Accounts opened on 22nd Feb 1995
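The slide screenshot is not reproduced here; a sketch of the query it shows (libref, table, and column names are assumed), where the DATE literal is converted to the Teradata equivalent so the WHERE clause is passed through:

```sas
/* '22Feb1995'd becomes DATE '1995-02-22' in the generated Teradata SQL */
proc sql;
   select *
      from tdlib.accounts
      where account_open_date = '22Feb1995'd;
quit;
```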

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-61
Implicit Pass-Through – Date/Time Functions
Options enable resolving of SAS DATE, TIME, DATETIME and TODAY fucntion calls into
constant values before the SQL Push-down code generation.
• Avoids potential conflicts with function resolution
• Alternative system option SQLCONSTDATETIME (enabled Default);

GOOD: Where
Clause Passed to
Teradata!

Placing “put(today(),date9.)” on the right side of the equal sign has the potential to harm query
performance, as there is a good possibility that the function call will prevent the statement from being
passed to the DBMS.

CONSTDATETIME | NOCONSTDATETIME specifies whether PROC SQL replaces references to


the DATE, TIME, DATETIME, and TODAY functions in a query with their equivalent constant values
before the query executes. Computing these values once ensures consistency of results when the
functions are used multiple times in a query or when the query executes the functions close to a date or
time boundary. PROC SQL evaluates these functions in a query each time it processes an observation.
Default value is CONSTDATETIME.

Interaction: If both the CONSTDATETIME option and the REDUCEPUT= option are specified,
PROC SQL replaces the DATE, TIME, DATETIME, and TODAY functions with their respective values
in order to determine the PUT function value before the query executes.

Tip: Alternatively, you can set the SQLCONSTDATETIME system option. If specified, the PROC
SQL CONSTDATETIME option takes precedence over the SQLCONSTDATETIME system option.

Notice that the WHERE clause is not being passed to the database. This means that the entire contents
of the DBMS table is being passed to SAS. SAS will go through all the data and test the where
condition. For large database tables this is very inefficient.
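A sketch of the favorable case (libref and column names assumed; SQLCONSTDATETIME is on by default and is spelled out only for emphasis):

```sas
options sqlconstdatetime;   /* the default */

/* TODAY() is replaced by a constant before code generation,
   so the WHERE clause can be passed to Teradata */
proc sql;
   select count(*)
      from tdlib.order_fact
      where Order_Date = today();
quit;
```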

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-62
Implicit Pass-Through – Function Mapping
The SAS/ACCESS SQL generation engine maps specific functions used in the WHERE
clause in any procedure/DATA step or used in PROC SQL programs into their database
equivalent.
 In SAS 9.1.3, this list of SAS functions is static, as the functions are
compiled into SAS/ACCESS for Teradata:

ABS LOWCASE > LOWER SUBSTR


AVG MAX TODAY > CURRENT_DATE
EXP MIN UPCASE > UPPER
LOG > LN SQRT SUM
LOG10 STRIP > TRIM COUNT

 With SAS 9.2 this default list has been enhanced; further, the list is no longer static and can
be customized (see the section in Chapter 5).

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-63
Implicit Pass-Through – Function Mapping

Note: The SAS STRIP() function gets mapped to Teradata TRIM() by


PROC SQL IP and SAS/Access:
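A sketch of mapped functions in a WHERE clause (libref and column names assumed), using mappings from the list above:

```sas
/* UPCASE maps to UPPER and STRIP to TRIM in the generated Teradata SQL,
   so the WHERE clause can be passed through */
proc sql;
   select count(*)
      from tdlib.customer_dim
      where upcase(strip(Customer_Name)) like 'SM%';
quit;
```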

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-64
Implicit Pass-Through – SAS Formats
SAS formats and formatting-functions are commonly used by SAS user’s, however
especially when used in where-clause disqualify the SQL pass-through in the first place
 This can be addressed with the Reduce-PUT optimization or by leveraging SAS formats in
Teradata (s 4.3)

Note – No WHERE
clause pushed to
Teradata!

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-65
SAS by Example – SAS Formats
A SAS FORMAT can be described as an internal rule for mapping data values to
label values/formatted values.
SAS provides standard formats (currencies, date and time, …), but users can
define custom formats using the FORMAT procedure.
Using formats in SAS programs
• Provides a flexible way to dynamically apply different labeling-rules of data
values
• Reduces storage space
• Leverages a fast Lookup-technique and avoids joins or merges for catching
lookup values.
• Enables analyzing aggregated values by applying formats to detail values

We talked about numerous formats earlier …

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-66
SAS by Example – SAS Formats
Assigning Temporary Formats

p111d06

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-67
Implicit Pass-Through – SAS Formats
The reduce-PUT optimization resolves the PUT function before generating the database
query.
• options SQLREDUCEPUT=ALL | NONE | DBMS (default)
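A sketch of a query that benefits (libref and column names assumed; the option value shown is the default):

```sas
options sqlreduceput=dbms;   /* the default */

/* The PUT of a variable compared to a constant can be reduced to a
   simple predicate on Order_Date that is passed to the database */
proc sql;
   select count(*)
      from tdlib.order_fact
      where put(Order_Date, year4.) = '2010';
quit;
```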

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-68
Differences in Query Behavior
 It is imperative to understand the differences in how SAS and Teradata store
and process data.
 Behavioral differences between how SAS and Teradata store and process data
are found in these three areas:
• NULL values and MISSING values
• Physical ordering of data, or lack thereof
• Native data types support

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-69
SAS Missing and Teradata NULL Values
Teradata (like other RDBMSs) and SAS possess different mechanisms for
indicating the absence of a data value.
• In SAS, data absence is represented using the MISSING value (. for numeric
and ‘ ‘ for character data), which is conceptually, but not exactly, analogous to a
relational database NULL value.
• SAS/ACCESS translates a SAS MISSING value into a relational database
NULL when inserting or updating a database table; conversely, a database NULL
is translated into a SAS MISSING value when a database table is queried
from within SAS.

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-70
SAS Missing and Teradata NULL Values
Some differences in processing missing or NULL values are:
• Within SAS, missing values are treated as valid values, so even joining or
merging on key columns whose values are missing is a valid operation
(missing matches missing)

A        B        SAS merge     Teradata join
1 vvv    1 aaa    1 aaa vvv     1 aaa vvv
. www    . bbb    . bbb www     3 ccc yyy
. xxx    3 ccc    . bbb xxx     3 ccc zzz
3 yyy             3 ccc yyy
3 zzz             3 ccc zzz

Within the database, ' ' (blank) values are valid values, so when SAS passes SQL to
Teradata to look for these values, it adjusts the syntax to look for NULLs as well. This is
demonstrated on the next slide with the teralib.employee_pay table, where the last_name
column contains ' ' values.
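A sketch of this rewriting (employee_id is an assumed column; the exact generated text can vary by release):

```sas
proc sql;
   select employee_id, last_name
   from teralib.employee_pay
   where last_name = ' ';
quit;
/* Implicit pass-through generates a predicate similar to:
   WHERE ("last_name" = ' ' OR "last_name" IS NULL)       */
```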

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-71
SAS Missing and Teradata NULL Values

The employee pay table

Same Output

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-72
Table Order and Sorting in SAS and Teradata
The row order in stored tables can be described as follows:
 SAS tables are stored in the order in which the observations are written. By use
of the SORT procedure, stored data sets can be given a specific sort order.
 Teradata tables are not internally ordered, and cannot be internally ordered by
use of any utility. The order of rows is determined at query time.
Sorting data is a resource-intensive operation and should only be done when you
need the data in a specific order.
 Efficient sorts can maximize the performance of jobs.
 While it is usually more efficient to pass the sort to the relational database it will
depend on a specific use case if leveraging the database sort is appropriate.

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-73
Table Order and Sorting – Teradata Sort
A database sort is automatically initiated through the SAS/ACCESS interface in the
following situations:
 If the BY statement in a DATA step (or any PROC step) or the ORDER BY clause
in PROC SQL is specified, an ORDER BY clause for that variable will automatically
be generated.
 PROC SORT will also pass the sort to the database using the ORDER BY clause unless
the SORTPGM option is set to something other than BEST.

The ORDER BY clause will cause a sort of the data to occur in Teradata before the DATA
step or PROC step uses the data in SAS.
 Note: sorting by columns that are included in database indexes can be much faster than
sorting by columns that are not indexed. Therefore, if some of the columns to be sorted
by are indexed and others are not, sort first by the indexed columns.
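For example (the output data set and column names are assumptions), this PROC SORT step is pushed down as a SELECT with an ORDER BY clause:

```sas
proc sort data=teralib.employee_pay out=work.emp_sorted;
   by department_id last_name;
run;
/* SAS/ACCESS submits SQL similar to:
   SELECT ... FROM "employee_pay"
   ORDER BY "department_id", "last_name"  */
```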

Sorting in the Relational Database


Sorting data is a resource-intensive operation and should only be done when you need the data in a
specific order. Efficient sorts can maximize the performance of your jobs.

It is usually more efficient to pass the sort to the relational database.


• If the BY statement in the DATA step or the ORDER BY clause in PROC SQL is specified,
SAS/ACCESS will automatically generate an ORDER BY clause for that variable.
• PROC SORT will also pass the sort to the database using the ORDER BY clause unless the
SORTPGM option is set to something other than BEST.
• The ORDER BY clause will cause a sort of the data to occur on the relational database before the
data set or PROC step uses the data in SAS.

Sort stability, meaning that the ordering of the observations in the BY statement is exactly the same
every time the sort is run, is not guaranteed when you query data stored in a relational database.
Because the data in the relational database might not be static data, the same query issued at different
times might return the data in different order.

If you require sort stability of the data, sort on a unique key, or place your database data into a SAS data
set and then sort it.

Note: Do not use PROC SORT to sort data from SAS back to the relational database. Doing so has no
effect on the order of the data in the database and only impedes performance.

Also, sorting by columns that are included in database indexes can be much faster than sorting by
columns that are not indexed. Therefore, if some of the columns to be sorted by are indexed and
others are not, sort first by the indexed columns.

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-74
Table Order and Sorting – Teradata Sort
Example – PROC TABULATE procedure BY processing

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-75
Table Order and Sorting – Sort Stability
Some SAS analytics require that data set observations be retrieved in the same
order each time a data set is read. An analysis delivers incorrect results if
observation order varies.
 An example is SAS/ETS time-series analysis.
A typical first step in time-series work is sorting the source data on a
temporal column (for example, date or datetime), creating a new SAS data
set with old dates first and new dates last.
 Analyses can then read and re-read the sorted data set with no BY clause,
because SAS retrieves data set observations in the order in which they
were originally written.
 How can this case be addressed with database data?

Another example is the data set ‘MODIFY’ statement. For a non-unique BY key, the MODIFY
statement updates the first row retrieved with a matching key. However, you cannot predict the first row
when a database retrieves multiple rows with the same key. Therefore, you will probably generate
erroneous results by applying the MODIFY to a database table with a non-unique BY key.

For a simple example of unpredictable ordering with a non-unique BY key and adding another BY
variable to ensure predictability, see “Using a BY Clause to Order Query Results” in the SAS/ACCESS
for Teradata online documentation:

(http://support.sas.com/91doc/getDoc/acreldb.hlp/a001399962.htm#a001399973).

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-76
Table Order and Sorting – Sort Stability
A static, repeatable order of rows across multiple executions of the same database
query is not guaranteed, depending on the sort method used:
 Sort stability is not guaranteed when you query data stored in a database
(whether via an explicit ORDER BY query or implicitly via a SAS BY statement).
 Furthermore, the data in the relational database might not be static, due to
updates to the data within the database.
If you require sort stability of the data, sort on a unique key, or place your
database data into a SAS data set and then sort it. Because the key is unique,
there are no rows with identical keys whose retrieval order could vary for each
read.
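A sketch of both remedies (table and column names are assumptions):

```sas
/* Stable: transaction_id is a unique key, so no two rows
   can swap places between reads */
proc sort data=teralib.transactions out=work.trans_sorted;
   by transaction_id;
run;

/* Alternative: take a static SAS copy first, then sort it */
data work.trans_copy;
   set teralib.transactions;
run;

proc sort data=work.trans_copy;
   by posting_date;
run;
```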

Database tables do not retrieve rows in the order in which they were originally written. Without ORDER
BY asserted in the database SQL, rows are returned in random order each time a table is read.


Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-77
Table Order and Sorting – Spooling (Optional)
Multiple-pass processing occurs when a SAS procedure requests that data be
made available for multiple-pass reading; in most cases a static, repeatable order of
rows is required.
 In the context of tables residing in a database, to meet the data requirements for
multiple pass processing, SAS creates temporary spool files containing the data
extracts.
 The SAS Option SPOOL=YES|NO controls the use of spooling
– NO – requires SAS/ACCESS to issue the identical SELECT statement to
Teradata twice.
– YES – spools all rows to a temporary SAS file on the first pass of the data. On
subsequent passes, SAS will read the row data from the spool file.

NOTE: When two-pass processing occurs (for example, in PROC SORT), disk space
and resource requirements may increase.
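A minimal sketch of controlling the option:

```sas
/* Default: spool rows to a temporary SAS file on the first pass,
   then re-read the spool file on later passes */
options spool;

/* NOSPOOL: SAS/ACCESS issues the identical SELECT to Teradata
   for every pass over the data */
options nospool;
```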

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-78
Making a SAS Copy of Teradata Data
There are cases where making a local, temporary SAS copy of Teradata data
is a good solution:
 When a static order of rows is required for repeated access in the same
analysis
 When multiple-pass processing is required but proves to be inefficient
 When the same table or view has to be referenced repeatedly in a dynamic
environment, it may be preferable to create a static SAS copy of the data.

Note: SAS is very efficient in handling staging or temporary tables and for
memory- or compute-intensive processing.
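A sketch of taking such a copy in one round trip (table, column, and filter value are assumptions):

```sas
/* Extract once; subsequent steps read the local SAS copy */
proc sql;
   create table work.cust_west as
   select *
   from teralib.customer
   where region = 'WEST';
quit;
```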

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-79
Making a SAS Copy of Teradata Data
Comparing Direct Access versus Copying Data

[Diagram: with direct access, each request from Foundation SAS is sent to the
Teradata table, and the data is returned to SAS for every request.]
Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-80
Making a SAS Copy of Teradata Data
Comparing Direct Access versus Copying Data

[Diagram: with direct access, repeated requests against the Teradata table may
return different data if the data changed since the previous step.]

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-81
Making a SAS Copy of Teradata Data
Comparing Direct Access versus Copying Data

A subset of the DBMS table data can be copied to a SAS data set to reduce processing.
It can be more efficient to read a SAS data set than to access a DBMS table repeatedly.

[Diagram: a subset of the Teradata table is copied once into a SAS data set;
subsequent requests read the SAS data set instead of the Teradata table.]

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-82
Linguistic Order and Sorting (Optional)
Although there are recognized standards for collation, the way people expect to
see data in "sorted" order differs widely.
• German collation is different from French, and Danish collation is different
from both, to name just a few.
• Even within a language community there can be subtle differences: a
German phone book sort is different from a dictionary sort, traditional
Spanish sort order is different from the modern one, and so on.
• Users of languages based on alphabetic writing systems that distinguish
upper- and lowercase letters might want to sort uppercase before lowercase,
or vice versa, or perform a case-insensitive sort.

http://support.sas.com/resources/papers/linguistic_collation.pdf
Sorting is often called "alphabetization," though collation is not limited to ordering letters of an
alphabet. For non-alphabetic writing systems as used in Asian languages, collation can be either
phonetic or based on the number of pen strokes or simply on the position of the characters within an
encoding (for example, Japanese kanji are usually sorted in the order of their Shift-JIS codes).

Nevertheless, people are free to choose: For example, most Japanese customers expect the Shift-JIS
order instead of the UCA.

Invocation of linguistic collation with PROC SORT is quite simple. The only requirement is the
specification of LINGUISTIC as the value of the SORTSEQ procedure option:

proc sort data=foo sortseq=linguistic; *LINGUISTIC=UCA; by x; run;

Synonymously, one can specify SORTSEQ=UCA. This causes the SORT procedure to collate
linguistically, in accordance with the current system LOCALE setting. The collating sequence used is
the default provided by the ICU for the given locale. Options that modify the collating sequence can be
specified in parentheses following the LINGUISTIC or UCA keywords. Generally, it is not necessary to
specify option settings because the ICU associates option defaults with the various languages and
locales. PROC SORT currently allows only a subset of the ICU options to be specified. These options
include STRENGTH, CASE_FIRST, COLLATION, and NUMERIC_COLLATION. In addition, a
LOCALE option is available to instruct SORT to use a collating sequence that is associated with a
locale other than the current locale.
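A sketch of modifying the collating sequence with these options (the data set name and locale value are assumptions):

```sas
/* Case-insensitive linguistic sort under the German locale */
proc sort data=work.names
          sortseq=linguistic(strength=2 locale=de_DE);
   by name;
run;
```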

CLASS processing does not order or group data linguistically nor is it sensitive to an existing linguistic

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-83
collation sequence of a data set. CLASS processing can produce results that are
different from those obtained using BY processing because BY processing is now
sensitive to collating sequences.

For example, with the SUMMARY procedure, class processing is normally performed
by grouping formatted values of a class variable (or raw values, if the
GROUPINTERNAL option is specified). If a data set is sorted, the ORDER=DATA
option can be used to preserve the order in which class levels are output for the
NWAY type. However, if the data is sorted linguistically, classification boundaries are
still determined by a binary difference in the formatted (or unformatted) class variable
values. For example, if a case-insensitive linguistic collating sequence was used (that
is STRENGTH=2), changes in character case still denotes a new level in the NWAY
type.
Linguistic Order and Sorting (Optional)
To implement linguistic collation, SAS has adopted the International Components for
Unicode (ICU). The ICU and its implementation of the Unicode Collation Algorithm (UCA)
have become a de facto standard.
 Linguistic Sorts within SAS

Unsorted    Sorted
Alice       Adam
John        Alice
Adam        Ethan
Ethan       John
Zack        Zack

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-84
Implicit Pass-Through – Considerations
To optimize SAS implicit pass-through behavior, be aware of the integration
questions discussed earlier:
 How large is the extract data size?
 How often will the data be accessed?
 For what purpose is the data being accessed?
 Which options can be used to optimize the approach?
 …..

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-85
Using SAS Options for More
Specific Teradata Query Use
Cases
This demonstration illustrates how to use SAS options
to optimize Teradata query behavior from SAS.

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-86
Exercise
This exercise reinforces the concepts discussed
previously.

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-87
Copyright © 2009 by SAS Institute Inc. and Teradata Corporation. All Rights Reserved.
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA
and other countries. Teradata and the names of products and services of Teradata Corporation are registered trademarks or trademarks
of Teradata Corporation in the USA and other countries. ® indicates USA registration. Other brand and product names are registered
trademarks or trademarks of their respective companies.

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-88
Rules for SQL Implicit Pass-Through
SQL function mapping is only one of many factors that influence whether SQL is passed
down to the DBMS…
• “SAS-isms” – SAS SQL extensions not supported by the DBMS
> Will cause SAS to retrieve all rows from the DBMS
> For example, the “?” (CONTAINS) condition:

proc sql;
select count(*) from cdr_usage where usage_type ? 'W';
quit;
TERADATA: trforc: COMMIT WORK
ERROR: Teradata prepare: Syntax error, expected
something like an 'IN' keyword between the word
'USAGE_TYPE' and the 'contains' keyword. SQL
statement was: select COUNT(*) from "cdr_usage"
where "cdr_usage"."USAGE_TYPE" contains 'W'.
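One workaround, sketched here, is to rewrite the condition with a predicate Teradata supports, such as LIKE, which implicit pass-through can push down:

```sas
proc sql;
   select count(*)
   from cdr_usage
   where usage_type like '%W%';
quit;
```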

Querying Teradata Using SAS Libname and Implicit SQL Pass-Through Slide 2-89
Module 3
Querying Teradata Using SAS Explicit SQL
Pass-Through

Querying Teradata Using SAS Explicit SQL Pass-Through Slide 3-1


Module 3 – Querying Teradata using SAS Explicit
SQL Pass-Through

• Section 3.1 – Creating and using SQL procedure explicit pass-


through queries
• Section 3.2 – Using SQL procedure explicit pass-through to
execute non-query statements

Querying Teradata Using SAS Explicit SQL Pass-Through Slide 3-2


Module 3 – Querying Teradata using SAS Explicit
SQL Pass-Through

• Section 3.1 – Creating and using SQL procedure explicit


pass-through queries
• Section 3.2 – Using SQL procedure explicit pass-through to
execute non-query statements

Querying Teradata Using SAS Explicit SQL Pass-Through Slide 3-3


SAS Explicit SQL Pass-Through – Overview
What is Explicit Pass-Through?
• Teradata-specific SQL is passed by SAS directly to the Teradata system
through the SAS/ACCESS engine.
• Teradata parses and optimizes the query; there is no SAS intervention such
as error validation.
• Explicit pass-through is leveraged using SAS PROC SQL.

Further and detailed documentation is available (e.g. Documentation for SAS/ACCESS 9.2)

Teradata-specific SQL is passed by SAS directly to the Teradata system through the SAS/ACCESS
engine.

As opposed to Implicit SQL Pass-Thru, SAS does not examine, parse, translate, or manipulate this SQL
prior to passing it on to Teradata for execution.

SAS users pass Teradata-specific SQL directly to the Teradata DBMS:


• Explicit pass-through is performed using SAS PROC SQL.
• It allows users to leverage the full power of Teradata’s massively parallel
processing capability and the wide range of mathematical, statistical, and
data reorganization functions.
• It requires knowledge of Teradata SQL, but performing basic operations is
relatively easy.

Querying Teradata Using SAS Explicit SQL Pass-Through Slide 3-4


SAS Explicit SQL Pass-Through – Overview
Advantages of Explicit Pass-Through

Explicit Pass-Through allows users


• to leverage full power of Teradata’s massively parallel processing capability
and the wide range of mathematical, statistical, and data reorganization
functions available, via usage of native Teradata SQL
• to combine SAS programming features* and Teradata-specific features in your
query.
• to save the results of a query as a SAS data set or a PROC SQL pass-
through view

*The SAS Macro language enables you to create dynamic Teradata SQL queries.

Note: Explicit pass-through cannot manipulate data across different Teradata servers.

Querying Teradata Using SAS Explicit SQL Pass-Through Slide 3-5


SAS Explicit SQL Pass-Through – Overview
Two types of SQL components that you can submit to Teradata are as follows:
• SELECT statements, which produce some output to SAS software (SQL
procedure pass-through queries)
• EXECUTE statements, which perform all other non-query SQL statements
that do not produce output (for example, GRANT, UPDATE, COMMIT, or
ROLLBACK)

Querying Teradata Using SAS Explicit SQL Pass-Through Slide 3-6


SAS Explicit SQL Pass-Through – Overview

Native
Foundation SAS Query
Request

SAS/ACCESS to Teradata
Connection
Execute
proc sql... Native

Engine
RESULTS Query
Engine
Teradata

Data

Querying Teradata Using SAS Explicit SQL Pass-Through Slide 3-7


Explicit Pass-Through – Using Queries
In a SAS PROC SQL step, you can specify the following:
• the Teradata database to connect to
• connection information, such as the Teradata user, password, or database
• a SAS SELECT statement that wraps a native Teradata SQL SELECT
statement enclosed in parentheses
• a statement to close the connection to Teradata

Select SAS SELECT-EXPRESSION


from connection to Teradata
(TERADATA-QUERY EXPRESSION);

Querying Teradata Using SAS Explicit SQL Pass-Through Slide 3-8


Explicit Pass-Through – Using Queries
Relevant PROC SQL Statements

Statement or Component Purpose

CONNECT statement establishes a connection to


Teradata.

CONNECTION TO retrieves data directly from


component (in the FROM Teradata.
clause of the PROC SQL
SELECT statement)

DISCONNECT statement terminates the connection to


Teradata.

Querying Teradata Using SAS Explicit SQL Pass-Through Slide 3-9


Explicit Pass-Through – Connecting to Teradata
The CONNECT statement
• specifies Teradata as the name of the database
• provides other Teradata-specific connection information
• establishes the connection to Teradata
• specifies an alias for the connection (optional).

General form of the CONNECT statement:

CONNECT TO TERADATA <AS alias> (options);

Querying Teradata Using SAS Explicit SQL Pass-Through Slide 3-10


Explicit Pass-Through – Connecting to Teradata
Example: Connect to a Teradata server.

USER= provides the Teradata user name.


PW= provides the Teradata password associated
with the Teradata user name.
AUTHDOMAIN= Can be used to avoid User and password
credentials in the program.
(See previous chapter for details)
SERVER= provides the Teradata server name as
defined in the HOSTS file.

Querying Teradata Using SAS Explicit SQL Pass-Through Slide 3-11


Explicit Pass-Through – SELECT Statement
The PROC SQL SELECT statement consists of the following:
 a PROC SQL SAS SELECT expression
 a FROM CONNECTION TO component
 a native Teradata-SQL code to be passed to Teradata

General form of the SELECT statement:

SELECT select-expression
FROM CONNECTION TO TERADATA | alias
(Teradata-query)
AS alias2 (col-name, col-name,...);
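Putting the statements together (server, credentials, and column names are illustrative assumptions):

```sas
proc sql;
   connect to teradata as tera
      (user=testuser pw=XXXXX server=terasrv);
   select *
      from connection to tera
         (select employee_id, last_name
            from employee_pay
            where last_name is not null);
   disconnect from tera;
quit;
```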

Querying Teradata Using SAS Explicit SQL Pass-Through Slide 3-12


Explicit Pass-Through – SELECT Statement
The query that is passed to Teradata follows these rules:
 uses Teradata-specific SQL
 must reference Teradata column names
 is enclosed in parentheses

Note: The embedded Teradata query is not validated


by SAS but by Teradata only. Teradata sends any
error message to the SAS log.

Querying Teradata Using SAS Explicit SQL Pass-Through Slide 3-13


Explicit Pass-Through – Closing the Connection
You can close the connection to Teradata by using one of the following methods:
 submitting a DISCONNECT statement
 terminating the SQL procedure, for example, with
a QUIT statement

General form of the DISCONNECT statement:

DISCONNECT FROM TERADATA | alias;

Querying Teradata Using SAS Explicit SQL Pass-Through Slide 3-14


Explicit Pass-Through – Closing the Connection

Querying Teradata Using SAS Explicit SQL Pass-Through Slide 3-15


Explicit Pass-Through – Query Output in SAS
Partial PROC SQL Output

Querying Teradata Using SAS Explicit SQL Pass-Through Slide 3-16


Explicit Pass-Through – Using SAS Features
 While the inner SELECT expression in the FROM CONNECTION TO clause
represents the explicit pass-through query whose result set Teradata returns to
PROC SQL,
 you can use SAS features such as SAS functions, titles, footnotes, aliases,
formats, and labels in the outer SAS SELECT statement of your SQL procedure
pass-through query.

Select SELECT-EXPRESSION (SAS Features)


from connection to Teradata
(TERADATA-QUERY EXPRESSION);

Querying Teradata Using SAS Explicit SQL Pass-Through Slide 3-17


Explicit Pass-Through – Using SAS Features

Querying Teradata Using SAS Explicit SQL Pass-Through Slide 3-18


Explicit Pass-Through – Using SAS Features
Partial PROC SQL Output

Querying Teradata Using SAS Explicit SQL Pass-Through Slide 3-19


Explicit Pass-Through – Using Teradata Features
Examples for use cases leveraging Teradata features
 Use of the Teradata Explain function to show the Teradata query execution plan
 Queries using Teradata functions, User- or Vendor defined functions, Teradata
Stored Procedures, Teradata Macros, …
 Queries using SQL-features, currently not supported by SAS like “Ordered
Analytical SQL Functions”
 Extracting samples leveraging Teradata’s sampling function capabilities

Note: Examples will be discussed in more details


in later sections.

Querying Teradata Using SAS Explicit SQL Pass-Through Slide 3-20


Explicit Pass-Through – Using Teradata Features
Examples for use cases leveraging Teradata features
 Using Ordered Analytical Functions in Teradata

Querying Teradata Using SAS Explicit SQL Pass-Through Slide 3-21


Using Teradata Features – Teradata Explain
Teradata Explain functionality
• Teradata SQL provides the function EXPLAIN for obtaining estimates of the impact of a
query on Teradata’s system resources.
• When you run an EXPLAIN, the query will not be executed but will be evaluated by
Teradata’s query planner and query optimizer.
• To run an EXPLAIN type the keyword EXPLAIN at the beginning of the SQL that will be
submitted to Teradata and the query plan is returned in your SAS Output window.

Note: The query plan needs to be interpreted and this is not straightforward but nevertheless it
gives useful information.
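A sketch of wrapping a query in EXPLAIN via explicit pass-through (connection details and table name are assumptions):

```sas
proc sql;
   connect to teradata as tera
      (user=testuser pw=XXXXX server=terasrv);
   select *
      from connection to tera
         (explain select count(*) from employee_pay);
   disconnect from tera;
quit;
```

The query is not executed; the optimizer's plan text is returned to the SAS Output window.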

Querying Teradata Using SAS Explicit SQL Pass-Through Slide 3-22


Using Teradata Features – Teradata Explain
After running the following code, you should see the Teradata query plan, with anticipated
resource usage, in your SAS Output Window.
 Use EXPLAIN liberally in your Explicit Pass-Through syntax, especially when submitting new
queries.

Querying Teradata Using SAS Explicit SQL Pass-Through Slide 3-23


Explicit Pass-Through – SAS SQL Views
You can create a SAS SQL view that stores an explicit pass-through query. SAS
views with embedded Teradata queries show the following characteristics; they
 contain no data (similar to Teradata views)
 permit transparent access to Teradata tables and views using custom
queries
 can be used just like a SAS data set
 store Teradata connection information
 hide connection information from view consumers.

Note: A SAS View IS NOT a Teradata View.

Querying Teradata Using SAS Explicit SQL Pass-Through Slide 3-24


Explicit Pass-Through – SAS SQL Views
General form of creating a SAS view using the SQL Procedure Pass-Through
Facility:

CREATE VIEW view-name AS


SELECT select-expression
FROM CONNECTION TO TERADATA | alias
(Teradata-SQL-query)
AS alias2 (col-name, col-name,...);
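For example (credentials and column names are assumptions), a SAS view that stores a pass-through query:

```sas
proc sql;
   connect to teradata as tera
      (user=testuser pw=XXXXX server=terasrv);
   create view work.emp_v as
      select *
         from connection to tera
            (select employee_id, last_name, hire_date
               from employee_pay);
quit;
```

The view can afterwards be read like any SAS data set; each read re-executes the embedded Teradata query.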

Querying Teradata Using SAS Explicit SQL Pass-Through Slide 3-25


Explicit Pass-Through – Creating a SAS View

Avoid using the ORDER BY clause when creating a view.


The ORDER BY clause will be executed each time the view is used.

Querying Teradata Using SAS Explicit SQL Pass-Through Slide 3-26


Explicit Pass-Through – Using a SAS View
Example: Use the SAS data view from the previous example to list only those
employees hired after January 1, 1990.

Querying Teradata Using SAS Explicit SQL Pass-Through Slide 3-27


Dynamic SAS Programs – SAS Macro Language
Enabling dynamic SAS programs and Teradata queries
 SAS Macro variables allow substitution of differing character values,
enabling code to be dynamic rather than static.
 SAS Macros enable you to substitute text in a program (and to do many
other things). A SAS program can contain any number of macros, and you
can invoke a macro any number of times in a single program.
 SAS Macro Language and SAS Macro variables can be used in conjunction
with Explicit Pass Through to enable dynamic, flexible and reusable
Teradata query programs for all kinds of use cases.

Using the macro language, you can write SAS programs that are dynamic, or capable of self-
modification.

Specifically, the macro language enables you to


• create and resolve macro variables anywhere in a SAS program
• write special programs (macros) that generate tailored SAS code.

Querying Teradata Using SAS Explicit SQL Pass-Through Slide 3-28


Explicit Pass-Through – Using Macro Variables
Macro variable values have the following characteristics:
 they consist of all characters between the equal sign
and the semicolon
 they do not include any leading blanks
 they are always considered character strings
 they do not need to be enclosed in quotation marks, unless you want the
quotation marks to be part of the stored string

SAS macro variables are defined using the %LET statement.


%LET macro_variable=macro_value;

SAS macro variables are resolved in code using the &.


<some code> &macro_variable ….
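A sketch combining a macro variable with explicit pass-through (a numeric value sidesteps quoting issues; all names are assumptions):

```sas
%let minsal=50000;

proc sql;
   connect to teradata as tera
      (user=testuser pw=XXXXX server=terasrv);
   select *
      from connection to tera
         (select employee_id, salary
            from employee_pay
            where salary > &minsal);
   disconnect from tera;
quit;
```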

Querying Teradata Using SAS Explicit SQL Pass-Through Slide 3-29


Explicit Pass-Through – Using Macro Variables
Macro variables are preceded by an ampersand (&) when referenced.

Querying Teradata Using SAS Explicit SQL Pass-Through Slide 3-30


Explicit Pass-Through – Using Macro Variables
Partial SAS Log

Querying Teradata Using SAS Explicit SQL Pass-Through Slide 3-31


Querying Teradata Using
SAS Explicit SQL Pass-Through
This demonstration illustrates how to execute native
Teradata SQL queries using explicit SQL pass-through
programs.

Querying Teradata Using SAS Explicit SQL Pass-Through Slide 3-32


Exercise
This exercise reinforces the concepts discussed
previously.

Querying Teradata Using SAS Explicit SQL Pass-Through Slide 3-33


Module 3 – Querying Teradata using SAS Explicit
SQL Pass-Through

• Section 3.1 – Creating and using SQL procedure explicit pass-


through queries
• Section 3.2 – Using SQL procedure explicit pass-through
to execute non-query statements

Querying Teradata Using SAS Explicit SQL Pass-Through Slide 3-34


Explicit Pass-Through – Execute Requests
The SAS SQL Procedure Pass-Through Facility supports the EXECUTE
statement to submit non-query statements to Teradata for execution.
• Common tasks the EXECUTE statement is used for:
– create tables, views, or indexes
– alter table structures
– insert, update, and delete data rows
– drop tables, views, or indexes
– grant user privileges

Querying Teradata Using SAS Explicit SQL Pass-Through Slide 3-35


Explicit Pass-Through – Execute Requests
General form of the EXECUTE statement:
EXECUTE (SQL-statement)
BY TERADATA | alias;

SQL-statement is the statement passed to Teradata or the Teradata alias as specified


in a CONNECT TO statement.

By default, the Teradata connection is established in ANSI SQL mode; hence, after
each EXECUTE request (or series of requests), an explicit COMMIT request must be
submitted.

EXECUTE (SQL-statement) BY TERADATA;


EXECUTE (commit) BY TERADATA;
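A sketch of an EXECUTE request followed by the required COMMIT under the default ANSI mode (names and credentials are assumptions):

```sas
proc sql;
   connect to teradata
      (user=testuser pw=XXXXX server=terasrv);
   execute (create table customer_temp
              (cust_id integer, cust_name varchar(30)))
      by teradata;
   execute (commit) by teradata;
   disconnect from teradata;
quit;
```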

Querying Teradata Using SAS Explicit SQL Pass-Through Slide 3-36


Explicit Pass-Through – Execute Requests
Example: Use Teradata’s Fast-Path Insert to append data from a staging table into a
master-table.
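A sketch of such an INSERT … SELECT, run inside a PROC SQL step with an open Teradata connection (table names are assumptions):

```sas
execute (insert into cust_master
            select * from cust_staging)
   by teradata;
execute (commit) by teradata;
```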

Querying Teradata Using SAS Explicit SQL Pass-Through Slide 3-37


Explicit Pass-Through – Execute Requests
Example: Delete the Teradata table customer_temp.

Querying Teradata Using SAS Explicit SQL Pass-Through Slide 3-38


Explicit Pass-Through – Execute Requests
Example: Update a row in the Teradata table customer_temp and insert a new row.

Querying Teradata Using SAS Explicit SQL Pass-Through Slide 3-39


Teradata SQL Modes – ANSI versus Teradata
Teradata is an ANSI SQL-compliant product; however, Teradata provides specific
extensions to the language.
• Teradata SQL allows two different modes of session operation:
• ANSI mode
• Teradata (BTET) mode
• Choice of mode affects:
• Transaction protocol behavior
• Case sensitivity defaults
• Collating sequences
• Data conversions
• Display functions

Note: The same SQL statement might perform differently in each mode, based on the
previous considerations, but no functionality is inhibited by the choice of mode.

Querying Teradata Using SAS Explicit SQL Pass-Through Slide 3-40


Teradata SQL modes – ANSI Mode
ANSI mode is also referred to as COMMIT mode.
It automatically accumulates all requests until an explicit COMMIT is submitted.

All transactions in ANSI mode are considered explicit, that is, they require an
explicit COMMIT command to complete

Note: A transaction is a unit of work performed against one or more tables of a database. A transaction is an all-or-nothing proposition, i.e., it either succeeds in its entirety or is entirely rolled back. This is an assurance of data integrity.

Querying Teradata Using SAS Explicit SQL Pass-Through Slide 3-41


Teradata SQL modes – Teradata Mode
Teradata mode also is referred to as BTET mode, which stands for BEGIN
TRANSACTION / END TRANSACTION.

In Teradata mode, all individual requests are treated as single implicit transactions. If you need to aggregate requests into a single transaction, use the BEGIN and END TRANSACTION delimiters.

Character comparison is not case specific (unlike ANSI mode).
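As a sketch of the BTET delimiters via explicit pass-through (connection options and table names are illustrative, and the session must run in Teradata mode):

```sas
proc sql;
connect to teradata (user=testuser pw=XXX server=XYZ mode=teradata);
execute (bt) by teradata;   /* BEGIN TRANSACTION */
execute (delete from customer_temp where customer_id = 1002) by teradata;
execute (insert into customer_temp values (1003, 'Roe, Richard')) by teradata;
execute (et) by teradata;   /* END TRANSACTION */
disconnect from teradata;
quit;
```

Both requests between BT and ET succeed or roll back together.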

Querying Teradata Using SAS Explicit SQL Pass-Through Slide 3-42


Explicit Pass-Through – Teradata SQL Modes
The SAS/ACCESS engine interfaces to Teradata in ANSI mode by default.

When using the SAS/ACCESS Interface to Teradata and running in ANSI mode, any SQL request that modifies the database (SQL statements that create, update, modify, and drop Teradata tables) must be issued in conjunction with an explicit COMMIT statement.

The COMMIT statement is unnecessary for read requests (for example, SELECT).

Querying Teradata Using SAS Explicit SQL Pass-Through Slide 3-43


Explicit Pass-Through – Teradata SQL Modes
The Teradata SQL mode can be controlled via the SAS/ACCESS engine using the option
mode=Teradata

proc sql;
connect to teradata (user=testuser pw=XXX
server=XYZ mode=teradata);
...
quit;

No COMMIT statement is necessary in Teradata mode.

No COMMIT statement is necessary when using implicit SQL pass-through.

Teradata mode should be used when accessing a Teradata RDBMS.

Querying Teradata Using SAS Explicit SQL Pass-Through Slide 3-44


Explicit Pass-Through – Teradata Modes
Example: The previous example in Teradata-Mode.

No commit
requests anymore
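The screenshot is not reproduced here; the earlier update/insert example rewritten for Teradata mode might look like the following sketch (illustrative names) — note that no commit requests appear:

```sas
proc sql;
connect to teradata (user=testuser pw=XXX server=XYZ mode=teradata);
execute (update customer_temp
         set customer_name = 'Smith, Jane'
         where customer_id = 1001) by teradata;
execute (insert into customer_temp
         values (1002, 'Doe, John')) by teradata;
disconnect from teradata;
quit;
```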

Querying Teradata Using SAS Explicit SQL Pass-Through Slide 3-45


Teradata SQL Mode – Use Cases
Teradata Mode is suitable and recommended for the majority of decision support applications (no lock-blocking collision is possible in Teradata Mode).

ANSI Mode should be used if the transaction context should survive across successive SQL statements.

ANSI Mode must be used with caution, as it can potentially cause DEADLOCK collisions in a multi-user environment.

An explicit COMMIT in ANSI Mode has a supplementary execution cost in comparison to the implicit COMMIT in Teradata Mode.

Querying Teradata Using SAS Explicit SQL Pass-Through Slide 3-46


Teradata Features – Query Banding (Optional)
A Teradata query band is a set of name-value pairs that can be set on a session
or transaction to identify the originating source of a query.

It is a tag for transactions and sessions that can be used by Teradata for different purposes, e.g., to manage task priorities and track system usage.
• can be used as a workload and security classification criterion
• enables all requests coming from a single logon to be classified into different workloads

Note: Teradata Version 12 and later support Query Banding. SAS 9.2 M2
supports Teradata query banding options.

Querying Teradata Using SAS Explicit SQL Pass-Through Slide 3-47


Teradata Features – Query Banding (Optional)
Enabling Teradata Features “Query Banding”

proc sql _method;
connect to teradata (user=sas_usr …server="&TDIP" mode=teradata);
execute
(SET QUERY_BAND='user=Adrian;role=unknown;' for session)
by teradata;
execute (
update saseduc.PayrollMasterUP
set Salary=Salary *
case substr(JobCode,3,1)
when '1' then 1.05
else 1.08
end
where substr(jobcode, 1, 2)='PT'
) by teradata;
disconnect from Teradata;
quit;

Querying Teradata Using SAS Explicit SQL Pass-Through Slide 3-48


Teradata Features – Statistics (Optional)
If consecutive insert and read requests on the same indexed Teradata tables are submitted, it may be necessary to update table statistics in between.
 This can be achieved by issuing a COLLECT STATISTICS request as part of the explicit pass-through requests.
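A hedged sketch of such a request, reusing the course's saseduc.PayrollMasterUP table with a column name taken from the earlier query-band example:

```sas
proc sql;
connect to teradata (user=testuser pw=XXX server=XYZ mode=teradata);
execute (collect statistics on saseduc.PayrollMasterUP
         column (JobCode)) by teradata;
disconnect from teradata;
quit;
```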

Querying Teradata Using SAS Explicit SQL Pass-Through Slide 3-49


Using SQL Explicit Pass-
Through to Execute Non-query
Requests in Teradata
This demonstration illustrates how to execute non-query
requests in Teradata from a SAS program.

Querying Teradata Using SAS Explicit SQL Pass-Through Slide 3-50


Exercise
This exercise reinforces the concepts discussed
previously.

Querying Teradata Using SAS Explicit SQL Pass-Through Slide 3-51


Copyright © 2009 by SAS Institute Inc. and Teradata Corporation. All Rights Reserved.
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA
and other countries. Teradata and the names of products and services of Teradata Corporation are registered trademarks or trademarks
of Teradata Corporation in the USA and other countries. ® indicates USA registration. Other brand and product names are registered
trademarks or trademarks of their respective companies.

Querying Teradata Using SAS Explicit SQL Pass-Through Slide 3-52


Module 4
Advanced: Querying Teradata
Using Implicit SQL Pass-Through

Advanced: Querying Teradata Using Implicit SQL Pass-Through Slide 4-1


Module 4 – Advanced: Querying Teradata Using
Implicit SQL Pass-Through

• Section 4.1 – Using advanced options for querying Teradata


• Section 4.2 – Using extended SAS Procedure SQL push-down for
Teradata
• Section 4.3 – Combining Tables in SAS and Teradata
• Section 4.4 – Comparing SQL implicit and explicit pass-through
(optional)

Advanced: Querying Teradata Using Implicit SQL Pass-Through Slide 4-2


Module 4 – Advanced: Querying Teradata Using
Implicit SQL Pass-Through

• Section 4.1 – Using advanced options for querying Teradata


• Section 4.2 – Using extended SAS Procedure SQL push-down for
Teradata
• Section 4.3 – Combining Tables in SAS and Teradata
• Section 4.4 – Comparing SQL implicit and explicit pass-through
(optional)

Advanced: Querying Teradata Using Implicit SQL Pass-Through Slide 4-3


Fast-Exporting Data from Teradata to SAS
SAS supports multiple methods of fastest data exports
 SAS enables Threaded Reads, where a Teradata queries is split up into multiple parallel
subqueries
– This is a SAS default behavior and may help to speed up standard queries, especially
data queries delivering data to SAS threaded procedures

 SAS further supports the implicit usage of Teradata’s FastExport-utility for mass-
extractions.
– This is most suitable for multi-million row extracts
– Availability may depend on System configuration, your Teradata systems
configuration and usage restrictions.

Advanced: Querying Teradata Using Implicit SQL Pass-Through Slide 4-4


Fast Extracts – Teradata FastExport
FastExporting with Teradata Parallel Transporter (TPT)

With SAS 9.2, the Teradata Parallel Transporter utilities can be used for fast-exporting data
using the FASTEXPORT=YES data set or library option together with the TPT=YES option
(which specifies that the TPT API is used to read data from a Teradata table).

Documentation: FASTEXPORT=YES specifies that the SAS/ACCESS engine uses the Teradata
Parallel Transporter (TPT) API to read data from a Teradata table.
Advanced: Querying Teradata Using Implicit SQL Pass-Through Slide 4-5


Optimizing Teradata Queries from SAS
Enabling Teradata Mode
A LIBNAME option MODE=TERADATA | ANSI allows opening connections in
the specified mode.
– With MODE=TERADATA, SQL requests are passed through using the
Teradata mode rules.
This impacts transaction behavior (inserts, updates, and deletes) and enables
case insensitivity when processing data.
It avoids unnecessary single-statement commit requests on Teradata, which
in high-workload scenarios may impact performance.
– The use of Teradata mode is recommended primarily for reading data from
Teradata.

* Available with SAS 9.2 M2

This option allows opening of Teradata connections in the specified mode. Connections that are opened
with MODE=TERADATA use Teradata mode rules for all SQL requests that are passed to the Teradata
DBMS. This impacts transaction behavior and can cause case insensitivity when processing data.

During data insertion, not only is each inserted row committed implicitly, but rollback is not possible
when the error limit is reached if you also specify ERRLIMIT=. Any update or delete that involves a
cursor does not work.

ANSI mode is recommended for all features that SAS/ACCESS supports, while Teradata mode is
recommended only for reading data from Teradata.

Advanced: Querying Teradata Using Implicit SQL Pass-Through Slide 4-6


Optimizing Teradata Queries from SAS
Enabling Teradata Mode – Example

libname TDOrion teradata user=sas_usr pw=sas_usr server="&TDIP"


database=sasorion MODE=TERADATA ;
proc sql noprint;
select * from TDOrion.order_fact
where order_type=3 and customer_id=75654;
Quit;

TERADATA_5: Executed: on connection 1


SELECT CAST("Customer_ID" AS FLOAT), ...
FROM sasorion."order_fact"
WHERE ( ("Order_Type" = 3 ) AND ("Customer_ID" = 75654 ) )

TERADATA: trget - rows to fetch: 2


TERADATA: trforc: COMMIT WORK
15 quit;
The COMMIT WORK request will not be
explicitly issued with MODE=Teradata

Advanced: Querying Teradata Using Implicit SQL Pass-Through Slide 4-7


Using SAS LIBNAME Options
to Optimize SAS Extract
Behavior
This demonstration illustrates how to use SAS options to
optimize Teradata extract performance.

Advanced: Querying Teradata Using Implicit SQL Pass-Through Slide 4-8


Exercise
This exercise reinforces the concepts discussed previously.

Advanced: Querying Teradata Using Implicit SQL Pass-Through Slide 4-9


Module 4 – Advanced: Querying Teradata Using
Implicit SQL Pass-Through

• Section 4.1 – Using advanced options for querying Teradata


• Section 4.2 – Using extended SAS Procedure SQL push-down
for Teradata
• Section 4.3 – Combining Tables in SAS and Teradata
• Section 4.4 – Comparing SQL implicit and explicit pass-through
(optional)

Advanced: Querying Teradata Using Implicit SQL Pass-Through Slide 4-10


SAS Procedures and DBMS Tables (Review)
SAS/ACCESS Libname Engine
 Any SAS Program may reference data residing in a database
 SAS/Access simulates SAS I/O for SAS Procedures
 An SQL SELECT statement is generated for data retrieval, honoring column selections,
sample options, and WHERE clauses (incl. function mapping, etc.).
 Any further execution logic of the SAS procedures is not transformed to SQL and
pushed down for common DBMSs.

Advanced: Querying Teradata Using Implicit SQL Pass-Through Slide 4-11


Example (Proc Chart)

Advanced: Querying Teradata Using Implicit SQL Pass-Through Slide 4-12


Example (Proc Chart with WHERE clause)

Advanced: Querying Teradata Using Implicit SQL Pass-Through Slide 4-13


SAS Procedure SQL Push-down for Teradata
SAS/BASE Procedures SQL push-down
• Exclusively available with SAS/ACCESS to Teradata
• Automatic transformation of SAS procedure execution logic into an SQL request
pushed to Teradata for execution
(Diagram: SAS Process → SAS/ACCESS to Teradata → Teradata Client → Teradata)

Enabled SAS Procedures*: FREQ, TRANSPOSE, RANK, REPORT, SORT, SUMMARY,
TABULATE, MEANS

*SAS 9.4 M4

Before: Returned 9,000,000 rows; Processing time: 55 seconds
In-Teradata: Returned 51 rows; Processing time: 2 seconds

PROC FREQ
Uses SQL to pre-aggregate data in Teradata
Leverages SAS_PUT In-Database formatting
Leverages PROC SQL implicit pass-through
PROC SUMMARY
Multi-dimensional aggregation
Complex aggregation options (MLF)
Shared summary subsystem
PROC RANK
Does not aggregate data – expands it
Can be done completely with SQL – no post-processing
Can execute completely in Teradata

Globally controlled by the SQLGENERATION option:
DBMS = permission to generate SQL if the data source is a supported RDBMS. Incompatibilities
are not reported without MSGLEVEL=I.
DBMUST = eligible PROCs MUST generate SQL to prevent drawing all the rows out of the
database.
Local control at the LIBNAME level – same syntax.
MSGLEVEL=I allows PROCs to write informational notes telling why pass-through did (or did
not) occur.

Advanced: Querying Teradata Using Implicit SQL Pass-Through Slide 4-14


SAS Procedure SQL Push-down for Teradata
SAS/BASE Procedures SQL push-down explained …
PROC FREQ
Uses SQL to pre-aggregate data in Teradata
Leverages SAS_PUT In-Database formatting
Leverages PROC SQL Implicit Pass-through
PROC SUMMARY/MEANS
Multi-dimensional aggregation
Complex aggregation options (MLF)
Shared summary subsystem
PROC RANK
Does not aggregate data – expands it
Can be done completely with SQL, no post processing
Can execute completely in Teradata

Control over SQL push-down:

SAS option SQLGENERATION=DBMS | EXCLUDEPROC | NONE
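A sketch of setting the option globally and verifying push-down in the log (libref and table names are illustrative):

```sas
options sqlgeneration=dbms msglevel=i;  /* permit eligible procedures to generate SQL */

libname tdlib teradata user=testuser pw=XXX server=XYZ;

proc freq data=tdlib.order_fact;
   tables order_type;
run;
```

With MSGLEVEL=I, the log contains notes indicating whether the procedure's work was passed to Teradata.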

Advanced: Querying Teradata Using Implicit SQL Pass-Through Slide 4-15


SAS Procedure SQL Push-down for Teradata
Example 1 – SAS FREQ Procedure

Advanced: Querying Teradata Using Implicit SQL Pass-Through Slide 4-16


SAS Procedure SQL Push-down for Teradata
Example 2 – SAS MEANS/SUMMARY Procedure
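The example screenshot is not reproduced here; a minimal sketch (illustrative libref, table, and variable names):

```sas
libname tdlib teradata user=testuser pw=XXX server=XYZ;

proc means data=tdlib.order_fact mean sum maxdec=2;
   class order_type;
   var total_retail_price;
run;
```

With SQLGENERATION=DBMS in effect, the aggregation can be transformed into a GROUP BY query executed in Teradata, so only the summary rows return to SAS.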

Advanced: Querying Teradata Using Implicit SQL Pass-Through Slide 4-17


SAS Procedure SQL Push-down for Teradata
Example 3 – SAS RANK Procedure

NOTE: PROC RANK by itself produces no printed output.
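A hedged sketch of the ranking step (illustrative names); writing OUT= back to the Teradata libref lets the ranking execute completely in-database:

```sas
libname tdlib teradata user=testuser pw=XXX server=XYZ;

proc rank data=tdlib.order_fact out=tdlib.order_ranks
          descending ties=low;
   var total_retail_price;
   ranks price_rank;
run;
```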

Continued on next page


Result Table

Advanced: Querying Teradata Using Implicit SQL Pass-Through Slide 4-18


Continued from previous page

Advanced: Querying Teradata Using Implicit SQL Pass-Through Slide 4-19


SAS Procedure SQL Push-down for Teradata
Example Performance diagram, comparing PROC RANK performance calculated in SAS versus
pushed down to Teradata with different detail rowsets.

Advanced: Querying Teradata Using Implicit SQL Pass-Through Slide 4-20


SAS Procedure SQL- and Format-Push Down
SAS procedures support the use of formats to enable custom aggregation rules on-the-fly.
 SAS procedure SQL push-down supports on-the-fly aggregation using SAS formats.
– In this way, customers can still create and use formats on the fly AND have in-database
processing.
– Business-critical, high-value formats (extreme 1:n relations) can be published to Teradata to
provide an extra performance lift.
– If a format is not found in the database, raw value processing is substituted and formatting is
deferred until the results are returned to SAS.
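A sketch of a user-defined format driving the aggregation (format, libref, and variable names are illustrative):

```sas
proc format;
   value pricefmt low-<100  = 'Low'
                  100-<500  = 'Medium'
                  500-high  = 'High';
run;

proc freq data=tdlib.order_fact;
   tables total_retail_price;
   format total_retail_price pricefmt.;
run;
```

If the format has been published to Teradata (SAS_PUT), the binning can run in-database; otherwise raw values are aggregated and formatting is applied after the results return to SAS.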

Advanced: Querying Teradata Using Implicit SQL Pass-Through Slide 4-21


SAS Procedure SQL- and Format-Push Down
Example – SAS FREQ Procedure and Formats

Advanced: Querying Teradata Using Implicit SQL Pass-Through Slide 4-22


SAS Procedure SQL- and Format-Push Down
Example – SAS FREQ Procedure and Formats

Advanced: Querying Teradata Using Implicit SQL Pass-Through Slide 4-23


Reading Data into SAS from Teradata
Enabling the Teradata Parallel Transporter (TPT) API to read data
By using the TPT API, you can read data from a Teradata table without working directly with the
stand-alone Teradata FastExport utility.
To enable the TPT API, the FASTEXPORT option is used in the LIBNAME statement.
When FASTEXPORT=YES, SAS uses the TPT API export driver for bulk reads.

To check whether SAS uses the TPT API to read data, look for the following message in the SAS
output log:

NOTE: Teradata connection: TPT FastExport has read n row(s).

Advanced: Querying Teradata Using Implicit SQL Pass-Through Slide 4-24


Reading Data into SAS from Teradata (Example)
Enabling Teradata Parallel Transporter (TPT) API to read data
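The slide's screenshot is not reproduced here; a minimal sketch (illustrative libref and table):

```sas
libname tdlib teradata user=testuser pw=XXX server=XYZ fastexport=yes;

data work.orders;
   set tdlib.order_fact;   /* bulk read through the TPT export driver */
run;
```

A successful TPT read is confirmed by the "TPT FastExport has read n row(s)" note in the log.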

Advanced: Querying Teradata Using Implicit SQL Pass-Through Slide 4-25


Using SAS Procedures SQL
Push-down
This demonstration illustrates how to use the SAS SQL
procedure capabilities.

Advanced: Querying Teradata Using Implicit SQL Pass-Through Slide 4-26


Exercise
This exercise reinforces the concepts discussed
previously.

Advanced: Querying Teradata Using Implicit SQL Pass-Through Slide 4-27


Module 4 – Advanced: Querying Teradata Using
Implicit SQL Pass-Through

• Section 4.1 – Using advanced options for querying Teradata


• Section 4.2 – Using extended SAS Procedure SQL push-down for
Teradata
• Section 4.3 – Combining Tables in SAS and Teradata
• Section 4.4 – Comparing SQL implicit and explicit pass-through
(optional)

Advanced: Querying Teradata Using Implicit SQL Pass-Through Slide 4-28


Combining Tables in SAS and Teradata
SAS and Teradata provide techniques to
 Combine Data horizontally (rows by matching keys)
– SAS: Data Step-Merge, -Hash-Lookup, -Set Key-Lookup, SQL-Joins, …
– Teradata: SQL-Joins
 Combine Data vertically (rows below each other)
– SAS: Data Step (ordered) Set-concatenation, Append, SQL-Set Operations,
– Teradata: Fast-Path Insert, SQL-Set Operations, MultiLoad, …
 What are the differences between SAS and Teradata techniques, and which are most efficient,
when
– both tables reside inside Teradata, or when
– one table resides in SAS, the other one in Teradata?

Advanced: Querying Teradata Using Implicit SQL Pass-Through Slide 4-29


Combining Data Horizontally – Comparison
Differences between SAS techniques and Teradata SQL Joins
• SAS supports standard SQL joining techniques similar to Teradata. However, Teradata’s
SQL-Join processing will be far more efficient.
• A SAS Merge compared to an SQL Join can be best described as a row-by-row merge of
ordered input tables, with a default result set comparable to a Full Outer-Join.
• A SAS Data Step Hash-Lookup is an efficient in-memory lookup technique for large
master tables and lookup tables held in memory.
• A SAS Set Key-Lookup is an efficient lookup technique for index-based lookups with
small transaction tables and large indexed lookup tables.
(Diagram: Venn diagrams of Table A and Table B illustrating the Inner, Left Outer, Right
Outer, Full Outer, and Cross Joins.)

Advanced: Querying Teradata Using Implicit SQL Pass-Through Slide 4-30


Combining Data Horizontally – Comparison
Differences between SAS techniques and Teradata SQL Joins
Operation Description | SQL Concept | SAS MERGE Syntax | SQL Syntax

Inner Join – only rows that match on the unique key between all data sets are included in
the output data set.
SAS: DATA ab; MERGE a(IN=x) b(IN=y); BY common_key; IF x AND y; RUN;
SQL: SEL * FROM a INNER JOIN b ON a.common_key = b.common_key;

Outer Join (Left) – all rows from the left-hand side data set are retained in the final set
regardless of whether there is a match to the right-side data set. If there is a match, the
additional columns from the right side are populated; if not, NULL.
SAS: DATA ab; MERGE a(IN=left) b(IN=right); BY common_key; IF left; RUN;
SQL: SEL * FROM a LEFT OUTER JOIN b ON a.common_key = b.common_key;

Outer Join (Right) – all rows from the right-hand side data set are retained in the final set
regardless of whether there is a match to the left-side data set. If there is a match, the
additional columns from the left side are populated; if not, NULL.
SAS: DATA ab; MERGE a(IN=left) b(IN=right); BY common_key; IF right; RUN;
SQL: SEL * FROM a RIGHT OUTER JOIN b ON a.common_key = b.common_key;

Outer Join (Full) – all rows from both sets are retained regardless of match. If there is a
match, all columns hold valid data; if not, NULL values appear for the columns where no
match was obtained.
SAS: DATA ab; MERGE a(IN=left) b(IN=right); BY common_key; IF left OR right; RUN;
     /* or without the IF statement, as it is the default behaviour */
SQL: SEL * FROM a FULL OUTER JOIN b ON a.common_key = b.common_key;
Advanced: Querying Teradata Using Implicit SQL Pass-Through Slide 4-31


Combining Data Vertically – Comparison
Differences between SAS techniques and Teradata Set Operations
• SAS supports standard SQL set operation techniques similar to Teradata. However,
Teradata’s set operation processing will be far more efficient.
• Teradata supports further techniques for fast row inserts and utilities for bulk-loading
data into Teradata tables (MultiLoad, …).
• An SQL Self-Join is a useful technique that references the same table two or more times
in the FROM clause of a single SQL step.
• A SAS DATA Step (ordered) Set-concatenation can best be described as an (ordered)
SQL union operation over multiple input tables.
• A SAS APPEND Procedure step bulk-appends the rows of an input table to an existing
master table.
(Diagram: Venn diagrams of Set A and Set B illustrating the Union, Intersection, and
Except operations.)

• SAS: Data Step (ordered) Set-concatenation, Append, SQL-Set Operations, …
• Teradata: Fast-Path Insert, SQL-Set Operations, MultiLoad, …

Advanced: Querying Teradata Using Implicit SQL Pass-Through Slide 4-32


Combining Data Vertically – Comparison
Differences between SAS techniques and Teradata Set Operations

Operation Description | SQL Concept | SAS Syntax | SQL Syntax

Concatenate data sets/tables (UNION):
SAS:
  data table_name;
     set table_A table_B;
  run;
  /* ordered concatenation */
  data table_name;
     set table_A table_B;
     by common_var;
  run;
SQL:
  proc sql;
     select * from table_A
     union
     select * from table_B;

Perform operations on the current data set/table and merge the results back to the
original data set/table (JOIN):
SAS:
  data table_name;
     merge table_A table_B;
     by common_var;
  run;
SQL:
  proc sql;
     select A.*, B.*
     from table_A A
          join table_B B
          on A.common_var = B.common_var;
Advanced: Querying Teradata Using Implicit SQL Pass-Through Slide 4-33


Combining Tables – From Teradata Only
Combining multiple tables that all reside in Teradata is most efficient if processed
completely in Teradata.
 SAS PROC SQL implicit pass-through capabilities automatically push down SQL joins
for execution to Teradata and are therefore most efficient.
– This includes operations on tables from different Teradata databases.
 It may further be appropriate to use SAS SQL explicit pass-through capabilities to
address custom use cases.
 SAS Data Step merges or other techniques should not be used to combine data from
multiple Teradata tables.
– The data from all Teradata tables would get transferred to the SAS server and
processed there. This is regarded as a highly inefficient workflow for almost all use
cases.

Advanced: Querying Teradata Using Implicit SQL Pass-Through Slide 4-34


Combining Tables – From Teradata Only
Combining multiple tables using SAS PROC SQL implicit SQL pass-through.

Note: As long as USER=, PASSWORD=, ACCOUNT= and SERVER= information in SAS Teradata
Libnames with different databases (schemas) are the same, the join is passed to Teradata.
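The slide's code is not reproduced here; a hedged sketch of such a join across two librefs that share connection options (librefs, databases, and key columns are illustrative):

```sas
libname td1 teradata user=testuser pw=XXX server=XYZ database=sasorion;
libname td2 teradata user=testuser pw=XXX server=XYZ database=saseduc;

proc sql;
   create table work.joined as
   select a.customer_id, a.order_id, b.customer_name
   from td1.order_fact a
        inner join td2.customer_dim b
        on a.customer_id = b.customer_id;
quit;
```

Because USER=, PASSWORD=, and SERVER= match, the join itself can be passed down to Teradata.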

Advanced: Querying Teradata Using Implicit SQL Pass-Through Slide 4-35


Combining Tables – From Teradata Only
SAS Data Step Merges are NOT passed through to Teradata; only the tables’ select queries are issued
to Teradata.

Advanced: Querying Teradata Using Implicit SQL Pass-Through Slide 4-36


Combining Tables – From Mixed Sources
For the use case of combining tables from mixed sources, where some tables reside in SAS and
some in Teradata, multiple techniques can be used.
 A SAS Merge of a SAS and a Teradata table might be a straightforward approach
– or any other SAS technique, where the Teradata table is loaded to SAS prior to the
SAS merge/lookup/… processing.
 A SAS Set Key-Lookup is an efficient lookup technique for index-based lookups with
small SAS transaction tables and large indexed lookup tables in Teradata.
 Uploading the SAS table to Teradata* and processing the join in Teradata might be
especially useful when the Teradata table is much larger than the SAS table.

*Requires write-access to Teradata, will be covered in a later section.
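A hedged sketch of a Set Key-Lookup against a Teradata lookup table, assuming the DBKEY= data set option and illustrative libref, table, and column names:

```sas
data matched;
   set work.transactions;                       /* small SAS transaction table */
   set tdlib.customer_dim (dbkey=customer_id)   /* large Teradata lookup table */
       key=dbkey;
   if _iorc_ ne 0 then do;   /* no matching key found in the lookup table */
      _error_ = 0;
      delete;
   end;
run;
```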

Advanced: Querying Teradata Using Implicit SQL Pass-Through Slide 4-37


Loading Data into Teradata (FASTLOAD)
To significantly improve performance when loading data into Teradata, SAS/ACCESS supports the
following Teradata utilities.
FastLoad - Bulk-load capability that accelerates insertion of data into empty Teradata tables. The
SAS/ACCESS FastLoad facility is similar to the native Teradata FastLoad Utility. They share these
limitations:
• FastLoad can load only empty tables; it cannot append to a table that already contains data. If you
attempt to use FastLoad when appending to a table that contains rows, the append step fails.
• Both the Teradata FastLoad Utility and the SAS/ACCESS FastLoad facility log data errors to tables.
Error recovery can be difficult.
• FastLoad does not load duplicate rows (rows where all corresponding fields contain identical data) into a
Teradata table. If your SAS data set contains duplicate rows, you can use the normal insert (load)
process.
• To enable FastLoad via the LIBNAME statement or in the DATA step, set BULKLOAD=YES.
NOTE:
BL_LOG= specifies the names
of error tables that are created
when you use the SAS/ACCESS
FastLoad facility

DBCOMMIT=n causes a
Teradata "checkpoint" after each
group of n rows is transmitted.
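A minimal FastLoad sketch (illustrative libref, tables, and error-table name); the target table must be empty:

```sas
libname tdlib teradata user=testuser pw=XXX server=XYZ;

data tdlib.order_fact_new (bulkload=yes
                           bl_log=fl_errors
                           dbcommit=100000);
   set work.order_fact;
run;
```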

Advanced: Querying Teradata Using Implicit SQL Pass-Through Slide 4-38


Loading Data into Teradata (MULTILOAD)
Unlike FastLoad, which only loads empty tables, MultiLoad allows the loading of both empty and
existing tables. The SAS/ACCESS MultiLoad facility is similar to the native Teradata MultiLoad
Utility and hence shares the same limitations, due to which you must drop the following items on the
target tables before the load:
• Unique Secondary Indexes
• Foreign key references
• Join Indexes
Just like FastLoad, error recovery can be difficult, but the ability to restart from the last checkpoint
is possible with MultiLoad.
To enable MultiLoad via the LIBNAME statement or in the DATA step, set MULTILOAD=YES.
NOTE:
ML_LOG= specifies the names of temporary error
tables that are created when you use the SAS/ACCESS
MultiLoad facility

ML_CHECKPOINT=n: ’n’ can be between 1-59
(minutes after which a checkpoint is recorded).
If greater than 59, a checkpoint occurs after a
multiple of the specified rows is loaded.
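A minimal MultiLoad sketch for appending to an existing table (illustrative names):

```sas
proc append base=tdlib.order_fact (multiload=yes
                                   ml_log=ml_errors
                                   ml_checkpoint=15)
            data=work.order_fact_increment;
run;
```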

Advanced: Querying Teradata Using Implicit SQL Pass-Through Slide 4-39


Combining SAS and Teradata
Tables
This demonstration illustrates how to join multiple
Teradata tables and how best combine tables from
heterogeneous sources.

Advanced: Querying Teradata Using Implicit SQL Pass-Through Slide 4-40


Exercise
This exercise reinforces the concepts discussed
previously.

Advanced: Querying Teradata Using Implicit SQL Pass-Through Slide 4-41


Module 4 – Advanced: Querying Teradata Using
Implicit SQL Pass-Through

• Section 4.1 – Using advanced options for querying Teradata


• Section 4.2 – Using extended SAS Procedure SQL push-
down for Teradata
• Section 4.3 – Combining Tables in SAS and Teradata
• Section 4.4 – Comparing SQL implicit and explicit pass-
through (optional)

Advanced: Querying Teradata Using Implicit SQL Pass-Through Slide 4-42


Accessing Teradata Tables (Review)
You can access Teradata table data from within SAS through these methods:
• the SAS/ACCESS LIBNAME statement (implicit pass-through)
• the SQL Procedure (explicit pass-through facility)

Each method has benefits and costs.

Advanced: Querying Teradata Using Implicit SQL Pass-Through Slide 4-43


The SAS/ACCESS LIBNAME Statement (Review)
The SAS/ACCESS LIBNAME statement does the following:
• establishes a libref, which acts as an alias, or nickname, by which the Teradata table is
referenced by SAS
• permits a Teradata table to be referenced by a two-level name
• permits the Teradata table to be read as easily as a SAS data set
• enables the Teradata table to be updated if the proper authority already exists

Advanced: Querying Teradata Using Implicit SQL Pass-Through Slide 4-44


SQL Procedure Pass-Through Facility (Review)
The SQL procedure does the following:
• sends a query to be processed by Teradata (explicit pass-through)
• displays query results formatted by PROC SQL
• permits non-query (INSERT, UPDATE, GRANT, and so on) actions if authority already exists

Advanced: Querying Teradata Using Implicit SQL Pass-Through Slide 4-45


Comparing Pass-Through Facilities

SAS/ACCESS LIBNAME (Implicit Pass-Through):
• Provides transparent access to Teradata tables.
• The DATA and PROC step syntax is unchanged.
• Knowledge of Teradata-specific SQL is unnecessary because the SAS/ACCESS engine
can convert the SAS code to Teradata-specific SQL and send the query to be processed
by Teradata.

PROC SQL (Explicit Pass-Through):
• Teradata can optimize all table joins.
• Teradata-specific functions and utilities can be used.
• You can combine SAS features and Teradata-specific features in your query.

continued...

Advanced: Querying Teradata Using Implicit SQL Pass-Through Slide 4-46


Comparing Pass-Through Facilities

SAS/ACCESS LIBNAME (Implicit Pass-Through):
• Enables you to save data retrieval results as a SAS data set or SAS view.
• FastExport can be used with SAS threaded reads.
• Can pass PROC SQL queries that contain joins, GROUP BY, ORDER BY, WHERE, and
aggregate functions to Teradata.
• Selected SAS functions are passed to Teradata for processing.

PROC SQL (Explicit Pass-Through):
• Results of a query can be saved as a SAS data set or a PROC SQL pass-through SAS view.
• Can run Teradata stored procedures.

If you have multiple SAS libraries, then the LIBNAME statement connect options must
match (ENGINE, USER=, PASSWORD=, ACCOUNT=, and SERVER=).

Functions used in the code must exist in the DBMS.

No data set options can be used in the code.

Different SCHEMA= Values Are OK:


LIBNAME mytera1 TERADATA USER=tduser PASSWORD=tduser SERVER=tdtest
SCHEMA=tduser;
LIBNAME mytera2 TERADATA user=tduser PASSWORD=tduser SERVER=tdtest
SCHEMA=tduser2;

Different USER= Values Are NOT OK (JOIN not passed):


LIBNAME mytera1 TERADATA USER=tduser PW=tduser SERVER=tdtest SCHEMA=tduser;
LIBNAME mytera2 TERADATA USER=tduser2 PW=tduser SERVER=tdtest
SCHEMA=tduser2;

Advanced: Querying Teradata Using Implicit SQL Pass-Through Slide 4-47



Advanced: Querying Teradata Using Implicit SQL Pass-Through Slide 4-48


Fast Extracts – Teradata FastExport
For the use case of mass data extractions, SAS may implicitly use Teradata’s FastExport utility
when the SAS option DBSLICEPARM=ALL is applied and the FastExport utility has been set up
for use by SAS.
• SAS/ACCESS Interface to Teradata uses the FastExport utility, if available, automatically
• This enables the FastExport utility to direct the partitioning using the database partitions

data _null_;
set TDOrion.order_fact (DBSLICEPARM=(ALL));
run;

TERADATA_1: Prepared: on connection 2
SELECT CAST("Customer_ID" AS FLOAT), ..
FROM sasorion."order_fact"

TERADATA: tryottrm(): SELECT was processed with FastExport.

NOTE: There were 113216 observations read from the data set
TDORION.order_fact.

/* Implicit read partitioning with the DBSLICEPARM libname or data set option */
• DBSLICEPARM=ALL automatically invokes Teradata FastExport, if available.
• Otherwise, Threaded Reads use an autopartitioning mechanism based on the MOD function.
Note: this requires specific column types (BYTEINT, SMALLINT, INTEGER, DATE, DECIMAL).
• Support for FastExport by SAS is not limited to SAS on Windows.
• If FastExport cannot be used, the log shows:
sasiotra/trautogn(): SAS-supplied sasaxsm access module not found on your system
sasiotra/trautogn(): Cannot FastExport. Reverting to MOD slicing.
• Check that FastExport is in the environment’s path (Windows: add the Fexp.exe and SasAxsm
paths to the end of PATH).

Advanced: Querying Teradata Using Implicit SQL Pass-Through Slide 4-49


Fast Extracts – SAS Threaded Reads
A SAS threaded read occurs when an implicit SQL pass-through query is
passed through the SAS/ACCESS interface and certain preconditions are met.
During a threaded read request:
• A single SQL statement gets automatically or manually transformed into
multiple SQL requests (an additional WHERE clause is appended to the
original request)
• The number of parallel requests/threads is determined by options used or
system settings
• Each thread establishes a DBMS connection
• Teradata executes those multiple requests in parallel, which are “integrated”
on the SAS side or fed separately into parallel executing SAS procedure
threads.

Scope of Threaded Reads


• Faster Read Access to DBMS data
• Threaded Applications are SAS steps that are automatically eligible for a threaded read. They are
bottom-to-top fully threaded SAS procedures.
• Fully Threaded SAS Procedures perform data reads, numerical algorithms, and data analysis in
threads.
• SAS steps with a read-only libname table reference can become eligible for a threaded read.

Advanced: Querying Teradata Using Implicit SQL Pass-Through Slide 4-50


Fast Extracts – SAS Threaded Reads
Auto-Partitioning
SAS/ACCESS to Teradata uses an autopartitioning scheme based on the MOD
function.
• MOD is a mathematical function that produces the remainder of a division
operation
• Teradata tables must contain a column to which SAS can apply the MOD
  function: numeric columns constrained to integral values (BYTEINT,
  SMALLINT, INTEGER, DATE, and integral DECIMAL column types).
• Autopartitioning is enabled by using the DBSLICEPARM option
  – controls the scope of Teradata threaded reads and the number of
  Teradata connections.


Advanced: Querying Teradata Using Implicit SQL Pass-Through Slide 4-51


Fast Extracts – SAS Threaded Reads
Auto-Partitioning
On each thread, SAS appends a WHERE clause to the SQL that uses the MOD function with the
numeric column to create a subset of the result set.
For example, assume your original SAS-produced SQL is

SELECT CHR1, CHR2 FROM DBTAB

SAS creates two threads and issues:

SELECT CHR1, CHR2 FROM DBTAB


WHERE (MOD(INTCOL,2)=0)
and
SELECT CHR1, CHR2 FROM DBTAB
WHERE (MOD(INTCOL,2)=1)

Combined, these subsets add up to exactly the result set for your original single SQL statement.

NOTE: Rows with an EVEN value for INTCOL are retrieved by the first thread. Rows with an
ODD value for INTCOL are retrieved by the second thread. Distribution of rows across the two
threads is optimal if INTCOL has a 50/50 distribution of EVEN and ODD values.

Advanced: Querying Teradata Using Implicit SQL Pass-Through Slide 4-52


Fast Extracts – SAS Threaded Reads
Using Auto-Partitioning
The DBSLICEPARM option determines the scope of usage and enablement of
auto-partitioned extracts.

DBSLICEPARM= ( ALL | NONE | THREADED_APPS | DBI <, #THREADS >);

Scope of usage
• THREADED_APPS : thread support only for SAS thread-enabled applications
  (procedures)
• ALL : enables threading on all read-only engine requests, including DATA step
  reads, and implicitly uses Teradata FastExport if available.
• DBI : enables threading on all read-only engine requests, including DATA step reads.
#THREADS | MAXTHREADS – maximum number of threads supported by the engine

DEFAULT: (THREADED_APPS,2)
THREADED_APPS makes threaded SAS procedures eligible for threaded reads.
(THREADED_APPS,2) is the default value.

ALL attempts to “autopartition” the table and threaded reads are automatically attempted.

<, MAXTHREADS> specifies the maximum number of threads (connections) used for the
threaded read.

DBI forces SAS/ACCESS to Teradata to ONLY generate partitioning WHERE clauses for you.

NOTE: If performance is slow on the DBMS side, it has nothing to do with SAS, and questions
regarding this need to be handled by the DBMS vendor. However, if pulling data from the DBMS into SAS
is slow, then using MAXTHREADS can improve performance. You may need to experiment with this value
to find an optimal value for your system/site, similar to using the BUFFSIZE option.

NOTE: SAS/ACCESS Interface to Teradata was not supported on UNIX with SAS V9.0; support
was added with SAS V9.1.

NOTE: Default values for MAXTHREADS are ORACLE=2, TERADATA=2, DB2=3, ODBC=3 and
SYBASE=3

NOTE: DBSLICEPARM=DBI for Teradata essentially means "ALL, but using modulo, not
FastExporting".

Advanced: Querying Teradata Using Implicit SQL Pass-Through Slide 4-53


Fast Extracts – SAS Threaded Reads
DBSLICEPARM – Example
If you want to enable threaded reads with a SAS procedure or DATA step program, use
DBSLICEPARM as in the following example:

proc print data=dblib.salesdata (dbsliceparm=(DBI,3));
run;

The Teradata extract request transforms into three parallel query requests:

..
select * from salesdata where(mod(num,3))=0;
select * from salesdata where(mod(num,3))=1;
select * from salesdata where(mod(num,3))=2;

Advanced: Querying Teradata Using Implicit SQL Pass-Through Slide 4-54


Fast Extracts – SAS Threaded Reads
Explicit Partitioning
Slices of data which are read in parallel are determined explicitly by the Data Set Option
DBSLICE

…. TDLib.TABLEX (DBSLICE=("WHERE-clause-1"
                          "WHERE-clause-2"
                          "WHERE-clause-n"))

Each WHERE clause uses a separate connection (or thread)


• Provides user control over data slicing
• references a table and is only useful when you are familiar with the table data
• allows you to code your own WHERE clauses to partition data across threads
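As a sketch of explicit partitioning (the table salesdata and column region are hypothetical), three user-coded WHERE clauses could slice the read across three connections:

proc print data=TDLib.salesdata
    (DBSLICE=("region = 'NORTH'"
              "region = 'SOUTH'"
              "region NOT IN ('NORTH','SOUTH')"));
run;

The WHERE clauses should be mutually exclusive and together cover every row; otherwise rows are duplicated or lost in the combined result.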

Advanced: Querying Teradata Using Implicit SQL Pass-Through Slide 4-55


Fast Extracts – General Options (Optional)
General Options Affecting Threaded Read
There are two system options that will affect threaded reads using SAS/ACCESS:
• THREADS – A SAS system option which specifies that SAS use threaded
processing for SAS applications that support it. It takes advantage of multiple
CPUs by threading the processing and I/O operations.
Options NOTHREADS | THREADS;
THREADS is the default upon SAS invocation.
• CPUCOUNT – A SAS system option that specifies the number of processors that
the thread-enabled applications should have available for concurrent processing
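For example, both system options can be set before a threaded read (the values shown are illustrative only):

options threads cpucount=4;   /* allow thread-enabled procedures to use 4 CPUs */

With THREADS in effect, a thread-enabled procedure reading through SAS/ACCESS can combine CPUCOUNT-based processing with DBSLICEPARM-based threaded reads.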

NOTE: THREADS does affect SAS/ACCESS. Here are some test results to help with answering
questions, produced with SAS V9.0 as of June 26, 2002.

• If NOTHREADS and DBSLICEPARM=NONE or THREADED_APPS, then NO threading occurs
  within SAS/ACCESS.
• If NOTHREADS and DBSLICEPARM=ALL, then threading occurs ONLY with non-threaded apps
  (e.g., PROC PRINT) within SAS/ACCESS. With threaded apps (e.g., PROC REG), NO threading occurs.
• If THREADS and DBSLICEPARM=ALL on the OPTIONS statement, data set option, or LIBNAME option,
  the CPUCOUNT value is used for threaded applications (e.g., PROC REG).
• If THREADS and DBSLICEPARM=ALL on the OPTIONS statement, data set option, or LIBNAME option,
  the MAXTHREADS value is used for non-threaded applications (e.g., PROC PRINT).
• If THREADS and DBSLICEPARM=NONE on the OPTIONS statement, LIBNAME option, or data set option,
  no threading occurs with threaded or non-threaded applications with SAS/ACCESS.
• If THREADS, CPUCOUNT, and DBSLICEPARM=(THREADED_APPS,2) are set by default, then
  CPUCOUNT is used for threaded applications and NO threading occurs for non-threaded applications
  with SAS/ACCESS.

Advanced: Querying Teradata Using Implicit SQL Pass-Through Slide 4-56


Fast Extracts – Recommendations
 In general, you should use FastExport if it is available, especially for mass data
extractions.
– SAS threaded reads are innately less efficient than FastExport, because Teradata
must process separate SQL statements that vary in the WHERE clause. In contrast,
FastExport is optimal because only one SQL statement is transmitted to Teradata.
 However, Teradata places restrictions on the system-wide number of simultaneous
FastExport operations that are allowed, and your database administrator might be
concerned about large numbers of FastExport extractions.
– SAS threaded reads may support faster extractions when FastExport is
unavailable.
– SAS threaded reads may especially be considered with the use of SAS threaded
procedures.

Advanced: Querying Teradata Using Implicit SQL Pass-Through Slide 4-57


Optimizing Teradata Queries from SAS
Enabling Prefetching (optional)
 PreFetch is a SAS/ACCESS Interface to Teradata facility that speeds up a SAS job by
exploiting the parallel processing capability of Teradata. To obtain benefit from the facility,
your SAS job must run more than once and have these characteristics:
 use SAS/ACCESS to query Teradata DBMS tables
 should not contain SAS statements that create, update, or delete Teradata DBMS
tables
 run SAS queries that change infrequently or not at all.
>> the same queries executed multiple times

libname x teradata PREFETCH='NameSQLCacheforSAS[,#sessions,algorithm]';

How PreFetch Works


When reading DBMS tables, SAS/ACCESS submits SQL statements on your behalf to Teradata. Each
SQL statement that is submitted has an execution cost: the amount of time Teradata spends processing
the statement before it returns the requested data to SAS/ACCESS.

When PreFetch is enabled, the first time you run your SAS job, SAS/ACCESS identifies and selects
statements with a high execution cost. SAS/ACCESS then stores (caches) the selected SQL statements
to one or more Teradata macros that it creates.

On subsequent runs of the job, when PreFetch is enabled, SAS/ACCESS extracts statements from the
cache and submits them to Teradata in advance. The rows selected by these SQL statements are
immediately available to SAS/ACCESS because Teradata 'prefetches' them. Your SAS job runs faster
because PreFetch reduces the wait for SQL statements with a high execution cost. However, PreFetch
improves elapsed time only on subsequent runs of a SAS job. During the first run, SAS/ACCESS only
creates the SQL cache and stores selected SQL statements; no prefetching is performed.

Advanced: Querying Teradata Using Implicit SQL Pass-Through Slide 4-58


Optimizing Teradata Queries from SAS
Controlling SQL Remerging (optional)
To satisfy some queries, SAS must read data twice. This can be very time consuming,
especially if you are using database tables; you can control or avoid such an operation
using the PROC SQL REMERGE option or the SQLREMERGE system option.

proc sql REMERGE;


select region, account_num, account_balance,
(account_balance - avg(account_balance)) as diff_from_avg
from account
group by region;
quit;

REMERGE works!

To satisfy some queries SAS must read data twice. This can be very time consuming. Setting
REMERGE allows SAS to execute queries which require a remerge. Keep in mind, this can be very
resource intensive and time consuming. If you are dealing with DBMS tables the remerge operation can
be very slow. You may want to turn it off by using NOREMERGE.

REMERGE|NOREMERGE
Specifies whether PROC SQL can process queries that use remerging of data. The remerge feature of
PROC SQL makes two passes through a table, using data in the second pass that was created in the first
pass, in order to complete a query. When the NOREMERGE system option is set, PROC SQL cannot
process remerging of data. If remerging is attempted when the NOREMERGE option is set, an error is
written to the SAS log.

Default:REMERGE

Tip: Alternatively, you can set the SQLREMERGE system option. The value that is specified in the
SQLREMERGE system option is in effect for all SQL procedure statements, unless the PROC SQL
REMERGE option is set. The value of the REMERGE option takes precedence over the
SQLREMERGE system option. The RESET statement can also be used to set or reset the REMERGE
option. However, changing the value of the REMERGE option does not change the value of the
SQLREMERGE system option. For more information, see the SQLREMERGE system option in the
SAS Language Reference: Dictionary.

Advanced: Querying Teradata Using Implicit SQL Pass-Through Slide 4-59


Optimizing Teradata Queries from SAS
Controlling SQL Remerging (optional)
NOREMERGE is an option that is used primarily to protect you from DBMS queries
which take a lot of time to execute.

Options SQLREMERGE=0;
proc sql NOREMERGE;
select region, account_num, account_balance,
(account_balance - avg(account_balance)) as diff_from_avg
from account
group by region;
quit;

SQL statements requiring remerging fail!

NOREMERGE is an option that is used primarily to protect you from DBMS queries which take a lot
of time to execute. Specifying NOREMERGE will cause the SQL statement to fail. You will see an error
in the log which states that the query requires remerging but cannot execute because the NOREMERGE
option is in effect. If you really want to execute this query you could set REMERGE.

If you are querying a DBMS, you may want to figure out why the query is not being passed to the
database.

Advanced: Querying Teradata Using Implicit SQL Pass-Through Slide 4-60


SAS Analytics Procedure Push-Down
Push-down preprocessing for analytical SAS procedures
• Enables automatic push-down of data-heavy preprocessing, leveraging SAS Analytics
  C modules deployed in Teradata
• Exclusively available for Teradata
• Bundled as SAS Analytics Accelerator***
• Initially available for SAS/STAT procedures*: REG, PRINCOMP, VARCLUS, SCORE (generates SQL)
• Next wave of SAS/STAT procedures**: CORR, FACTOR, CANCORR, TIMESERIES

[Diagram: a SAS step such as "Proc REG Data=TD.table; Model ...; Run;" generates a SQL view
that references the sas_sscp() function (calculating the SSCP matrix); SAS/ACCESS to
Teradata submits it through the Teradata client, and the matrix is computed inside Teradata.]

* SAS 9.2 TS2 M2
** SAS 9.2 TS2 M3
*** Based on SAS/STAT and SAS/Access to Teradata

Process Flow
• SAS client submits a PROC FOO Step
• Again, the Proc generates a SQL view
• The view references sas_sscp() to compute the matrix inside the DBMS
• Proc FOO reads the matrix and completes the analysis

Proc PRINCOMP – Principal Components Analysis
Proc VARCLUS – Variable Clustering
Proc REG – Model Selection
Proc SCORE – Generates SQL code for the given model; runs without any data extraction

• Example: with 100 columns and 5 million rows, the transfer is reduced from 500 million
values to 5,000 (the SSCP matrix).

Advanced: Querying Teradata Using Implicit SQL Pass-Through Slide 4-61


SAS Analytics Procedure Push-Down
SAS creates a precompiled C code module that runs on the database to reuse SAS C code
for SSCP building. SAS modifies selected procedures to call the aggregate UDF through
the SQL interface.

[Diagram: the SQL Parse, Read Data, and Aggregate SSCP phases run inside the database;
only a small amount of SSCP data is sent back to SAS, where the Compute Function (linear
regression, principal components, variable clustering) produces the output reports.]
Advanced: Querying Teradata Using Implicit SQL Pass-Through Slide 4-62


SAS Analytics Procedure Push-Down
A sample analytical workflow leveraging the SAS procedure execution push-down minimizes
data transfer and duplication on the SAS side, while preserving the SAS user experience (minimal
code changes) and the high quality of SAS Analytics execution.

Step                   Function                           Data transfer
PROC contents …        List of variable attributes        results
PROC summary; means;   Exploration                        results
PROC freq;             Exploration                        results
PROC format;           Transformation                     none
PROC SQL; connect…     Transformation                     none
PROC princomp;         Dimension reduction, CORR output   SSCP
PROC varclus;          Variable reduction                 SSCP
PROC reg;              Model selection                    SSCP
PROC score;            Apply model; create OUT= table     none
PROC rank;             Assign order to rows               none
PROC SQL; connect…     Select high ranks                  none

Advanced: Querying Teradata Using Implicit SQL Pass-Through Slide 4-63


Combining Tables – From mixed sources
Choosing between SAS Merge (and others), Set-Key Lookup and upload and
join in the database depends on the individual use case.
 The key factors influencing the decision to choose the best technique are
– Relative size of the SAS tables versus the Teradata tables
– Whether a Teradata join column has an index or is used as the table's primary index
– Join key cardinality in the SAS and Teradata tables: the number of unique join
  key values and the number of matches that occur when joined
– Selectivity of WHERE clauses asserted against the Teradata tables

Advanced: Querying Teradata Using Implicit SQL Pass-Through Slide 4-64


Combining Tables – From mixed sources
Merge a Teradata Table and a SAS Data Set

PROC SORT DATA=TDLIB.Agreement_100K OUT=WORK.A;
   BY Account_NBR Account_NBR_Modifier;
RUN;
PROC SORT DATA=samp_YourUserID OUT=WORK.B;
   BY Account_NBR Account_NBR_Modifier;
RUN;
DATA LocalLib.MyMergedData;
   MERGE Work.A Work.B;
   BY Account_NBR Account_NBR_Modifier;
RUN;

• -or-
PROC SORT DATA=samp_YourUserID OUT=Work.B;
BY Account_NBR Account_NBR_Modifier;
Run;
DATA LocalLib.MyMergedData;
MERGE TDLIB.Agreement_100K Work.B;
BY Account_NBR Account_NBR_Modifier;
RUN;

• Note: The second block of code is not only more efficient, it also eliminates
unnecessary code and takes full advantage of Teradata's massively parallel
processing engine to perform the sort of the Teradata table for the MERGE.

Advanced: Querying Teradata Using Implicit SQL Pass-Through Slide 4-65


Combining Tables – From mixed sources
A SAS Set Key-Lookup is an efficient Lookup technique for Index-Based lookups with small SAS
transaction tables and large indexed lookup-tables in Teradata
 The join/lookup operation optimizes SAS processing by subsetting the Teradata table using an
index.
 Can be used in SAS DATA step or PROC SQL programs
 The data set options DBINDEX, DBKEY and
MULTI_DATASRC_OPT support the approach

[Diagram: a SAS program in the SAS process performs a key-lookup against a SAS table;
Teradata subsets the lookup table by the key values.]

Advanced: Querying Teradata Using Implicit SQL Pass-Through Slide 4-66


Combining Tables – From mixed sources
Supporting SAS options for Set Key-Lookup operations
 With DBINDEX, SAS searches for the appropriate database indexes, suitable to
process the query
– Cannot be used with Teradata primary indexes
 DBKEY causes SAS to build a SELECT with a specially constructed WHERE
clause to search for key matches in the database table using the key specified.
– Supports usage of Teradata primary indexes
 With MULTI_DATASRC_OPT SAS constructs one or more IN clauses containing
the unique key values. SAS passes the IN clause to Teradata and retrieves the
matching rows.

Note: An appropriate index might need to be unique and might
require that the key column be the only column in the index.

Advanced: Querying Teradata Using Implicit SQL Pass-Through Slide 4-67


Combining Tables – Key-Lookups (optional)
Set Key-Lookup processing example using DBINDEX

libname TDOrion teradata server=... database=sasorion;


proc sql;
select *
from SASLib.GermanCustomersTop50 sas,
TDOrion.Order_Fact (DBINDEX=yes) td
where sas.customer_ID=td.customer_ID;
quit;
TERADATA_3: Executed: on connection 2
SELECT IndexName,ColumnName FROM DBC.IndicesX WHERE IndexName is not NULL
and TableName='Order_Fact' and UPPER(DatabaseName)='SASORION' order by
ColumnPosition                                    <- Search for appropriate index.

TERADATA_13: Executed: on connection 2
SELECT CAST("Customer_ID" AS FLOAT), ...
FROM sasorion."Order_Fact" ...

TERADATA: trget - rows to fetch: 113216

Note: In this example no index was used; all rows were extracted
from the table. Be cautious using the DBINDEX option.

Advanced: Querying Teradata Using Implicit SQL Pass-Through Slide 4-68


Combining Tables – Key-Lookups (optional)
Set Key-Lookup processing example using DBKEY
libname TDOrion teradata server=... database=sasorion;
proc sql;
select *
from SASLib.GermanCustomersTop50 sas,
TDOrion.Order_Fact (DBKEY=customer_id) td
where sas.customer_ID=td.customer_ID;
quit;
INFO: DBINDEX influenced the strategy chosen for this query.
INFO: Index dbkey of SQL table TDORION.Order_Fact (alias = td)
selected for SQL WHERE clause (join) optimization.

USING ("customer_ID" DECIMAL(14))
SELECT CAST("Customer_ID" AS FLOAT), ...
FROM sasorion."Order_Fact"
WHERE ("customer_ID"=:"customer_ID" OR
("customer_ID" IS NULL AND :"customer_ID" IS NULL ))   <- Subquery issued for each key value
TERADATA: trget - rows to fetch: 11

Note: This technique might be efficient if there is a small SAS
table (a table with few lookup values) and a large indexed
Teradata table.

Advanced: Querying Teradata Using Implicit SQL Pass-Through Slide 4-69


Combining Tables – Key-Lookups (optional)
Set Key-Lookup processing example using DBKEY
libname TDOrion teradata server=... database=sasorion;
data WORK.Top50LookedUp;
set SASLib.GermanCustomersTop50 ;
set TDOrion.Order_Fact (dbkey=customer_ID) key=dbkey;
run;

INFO: DBINDEX influenced the strategy chosen for this query.
INFO: Index dbkey of SQL table TDORION.Order_Fact (alias = td)
selected for SQL WHERE clause (join) optimization.

USING ("customer_ID" DECIMAL(14))
SELECT CAST("Customer_ID" AS FLOAT), ...
FROM sasorion."Order_Fact"
WHERE ("customer_ID"=:"customer_ID" OR
("customer_ID" IS NULL AND :"customer_ID" IS NULL ))   <- Subquery issued for each key value
TERADATA: trget - rows to fetch: 11

Note: The Teradata query behavior is the same in this DATA step lookup example;
however, as there is no loop around the second SET statement, each row of the SAS
table is combined only with the first lookup-value row. The output table has as many
rows as the SAS input table. Only appropriate with unique index lookup values.

Advanced: Querying Teradata Using Implicit SQL Pass-Through Slide 4-70


Combining Tables – Key-Lookups (optional)
Set Key-Lookup processing with MULTI_DATASRC_OPT
libname TDOrion teradata server=... database=sasorion
MULTI_DATASRC_OPT=IN_CLAUSE;
proc sql;
select * from SASLib.GermanCustomersTop50 sas,
TDOrion.Order_Fact td
where sas.customer_ID=td.customer_ID;
quit;
SELECT CAST("Customer_ID" AS FLOAT), ...
FROM sasorion."Order_Fact"
WHERE ( ("Customer_ID" IN ( 483 , 3314 , 3520 , 7398 , 8489 , ..
... , 62960 , 64765 , 64821 , 65150 , 65315 , 65482 )
))                                                <- IN clause constructed by SAS
TERADATA: trget - rows to fetch: 900

Note: SAS constructs one or more IN clauses containing the unique key values. SAS
passes the IN clause to the database and retrieves the matching rows. If the
number of unique key values is greater than 4500, the processing is performed
in SAS. The IN clause is passed whether or not an index exists, and multiple IN
clauses can cause multiple full table scans if the join key is not indexed. Make
sure your join key is indexed on the database.

Advanced: Querying Teradata Using Implicit SQL Pass-Through Slide 4-71


Module 5
Best Practices for Query Integration
Use Cases

Best Practices for Query Integration Use Cases Slide 5-1


Module 5 – Best Practices for Query Integration Use
Cases

• Section 5.1 – Making use of extended SAS/Access function


mapping capabilities
• Section 5.2 – Using SAS Formats in Teradata
• Section 5.3 – Sampling from Teradata Tables
• Section 5.4 – Handling specific Teradata Data types

Best Practices for Query Integration Use Cases Slide 5-2




SAS to Teradata Function Mapping
The SAS/ACCESS SQL generation engine maps specific functions into their database equivalent
when they are used in
 a WHERE clause in any procedure or DATA step, or
 PROC SQL programs.
The list of default functions mapped has been extended in SAS 9.2 M2 and later.
The list is no longer static and can be customized
 To exclude functions from the mapping list or to include mapping of SAS functions
to custom Teradata UDFs
 Specific LIBNAME options enable dynamic allocation of functions:
SQL_FUNCTIONS, SQL_FUNCTIONS_COPY

Best Practices for Query Integration Use Cases Slide 5-4


SAS to Teradata Function Mapping
When and where functions are mapped …
Functions are mapped when PROC SQL implicit pass-through is triggered.
Functions are mapped in WHERE clauses (any SAS program step).
This is why it is important that SAS and Teradata functions be 100% compatible when IP is not
triggered and the same function exists in both the SELECT list and the WHERE clause:

ABS() Processed by SAS

proc sql;
Select databasenamei, ABS(permspace) as abs_permspace
from td1.dbase
WHERE abs(permspace)>100000;
quit;

TERADATA_26: Executed: on connection 1


SELECT "DatabaseNameI","CreatorName","PermSpace" FROM dbc."dbase"
WHERE ( ABS("PermSpace") > 100000 )

WHERE Processed by Teradata

Best Practices for Query Integration Use Cases Slide 5-5


SAS to Teradata Function Mapping
Functions that are mapped by default with 9.2M2 - Page 1
SAS Function Name Teradata Function Name Converted Parameters
ABS ABS
ACOS ACOS
ARCOSH ACOSH
ARSINH ASINH
ASIN ASIN
ATAN ATAN
ATAN2 ATAN2
COALESCE COALESCE
COS COS
COSH COSH
DAY DAY EXTRACT(DAY FROM "arg")
DTEXTDAY DTEXTDAY EXTRACT(DAY FROM "arg")
DTEXTMONTH DTEXTMONTH EXTRACT(MONTH FROM "arg")
DTEXTYEAR DTEXTYEAR EXTRACT(YEAR FROM "arg")
EXP EXP
HOUR HOUR EXTRACT(HOUR FROM "arg")
INDEX (str,pat) POSITION(pat IN str)

Best Practices for Query Integration Use Cases Slide 5-6


SAS to Teradata Function Mapping
Functions that are mapped by default with 9.2M2 - Page 2

SAS Function Name   Teradata Function Name   Converted Parameters
LENGTH              CHARACTER_LENGTH
LOG                 LOG
LOG10               LOG10
LOWCASE             LCASE
MINUTE              MINUTE                   EXTRACT(MINUTE FROM "arg")
MOD                 MOD
MONTH               MONTH                    EXTRACT(MONTH FROM "arg")
SECOND              SECOND                   CAST(EXTRACT(SECOND FROM "arg") AS BYTEINT)
SIN                 SIN
SINH                SINH
SQRT                SQRT
STRIP               TRIM
SUBSTR              SUBSTR
TAN                 TAN
TANH                TANH
TIMEPART            TIMEPART                 CAST(SUBSTR(CAST("arg" AS CHAR(26)),12,8) AS TIME(0))
TRIM                TRIM                     TRIM(TRAILING FROM "arg")

Best Practices for Query Integration Use Cases Slide 5-7


SAS to Teradata Function Mapping
Functions that are mapped by default with 9.2M2 - Page 3

SAS Function Name   Teradata Function Name   Converted Parameters
UPCASE              UPCASE
YEAR                YEAR
AVG *               AVG
COUNT *             COUNT
MAX *               MAX
MIN *               MIN
STD *               STDDEV_SAMP
SUM *               SUM
VAR *               VAR_SAMP

With SQL_FUNCTIONS=ALL active, for matching database behavior:

DATE *              DATE
DATETIME *          CURRENT_TIMESTAMP
LEFT (a)            LEFT (a)                 TRIM(leading from a)
SOUNDEX *           SOUNDEX
TIME () *           CURRENT_TIME
TODAY () *          CURRENT_DATE

* New Teradata functions introduced with the latest SAS release.

Best Practices for Query Integration Use Cases Slide 5-8


Enabling Dynamic Function Mapping
Functions that are additionally mapped by default with 9.2M2

New derived aggregates (not in the function map*)


SAS Function Name   Derivation
CSS(x)              VAR(x) * (COUNT(x)-1)
CV(x)               (CASE WHEN AVG(x) = 0 THEN NULL ELSE 100 * (STD(x) / AVG(x)) END)
NMISS(x)            COUNT(*) - COUNT(x)
RANGE(x)            MAX(x) - MIN(x)
STDERR(x)           STD(x) / SQRT(COUNT(x))

*Derived aggregates do not appear in the function map SQL Dictionary, they are built into
the PROC SQL IP processing.

Best Practices for Query Integration Use Cases Slide 5-9


Enabling Dynamic Function Mapping
Function mapping example for aggregate functions.

Best Practices for Query Integration Use Cases Slide 5-10


Enabling Dynamic Function Mapping
Some functions might not be 1:1 compatible and thus may return unexpected results depending
on their execution environment in SAS or Teradata.
An example could be calculations based on NULL/MISSING values or "empty values".

The “SQL_FUNCTIONS” Libname options provide control over the function map (also called the
SQL Dictionary) and provide a means to make it extensible:
• SQL_FUNCTIONS=ALL Add optional functions
• SQL_FUNCTIONS_COPY= Copy function list to SAS data set
• SQL_FUNCTIONS=“EXTERNAL_APPEND=Lib.Tab” Add user-supplied functions
• SQL_FUNCTIONS=“EXTERNAL_REPLACE=Lib.Tab” Install custom map

Best Practices for Query Integration Use Cases Slide 5-11


Enabling Dynamic Function Mapping
How To create a List of the SQL Dictionary
The SQL_FUNCTIONS_COPY option writes the function map associated with the LIBNAME to
a SAS data set or to a table

Best Practices for Query Integration Use Cases Slide 5-12


Enabling Dynamic Function Mapping…

Best Practices for Query Integration Use Cases Slide 5-13


Enabling Dynamic Function Mapping

With SQL_FUNCTIONS=ALL enabled, the default behavior (where normally the SAS function
is used) can be overruled and additional functions are mapped to their Teradata equivalent:

SAS Function Name   Teradata Function Name
TODAY               CURRENT_DATE
DATE                CURRENT_DATE

New “ALL” functions (not 100% compatible with SAS)


SAS Teradata Conversion
LENGTH(c) character_length(trim(trailing from c))
SOUNDEX SOUNDEX

New “ALL” constants


SAS Teradata Constant
TIME CURRENT_TIME
DATETIME CURRENT_TIMESTAMP

Best Practices for Query Integration Use Cases Slide 5-14


Enabling Dynamic Function Mapping
Example enabling SQL_FUNCTIONS=ALL

Note: In order to get the function to execute in Teradata and in this example get the server time, make sure to
disable DATE and TIME constant folding (resolving the SAS function to a constant before the pass-through).
When the NOCONSTDATETIME option is set, PROC SQL evaluates these functions in a query each time it
processes an observation.
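A sketch of such a setup (connection parameters elided; the table tera.accounts is hypothetical); with constant folding disabled, TODAY() is mapped to CURRENT_DATE and evaluated on the Teradata server:

options noconstdatetime;
libname tera teradata server=... user=... password=...
        sql_functions=ALL;

proc sql;
   select distinct today() as server_date format=date9.
   from tera.accounts;
quit;

The passed-down SQL would resemble:
   SELECT distinct CURRENT_DATE FROM "accounts"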

Best Practices for Query Integration Use Cases Slide 5-15


Enabling Dynamic Function Mapping
What if you wanted to map the SAS SQRT()-function to a custom User Defined Function called
“sqrt2”?
Libname tera teradata .. ;
proc sql;
select min(sqrt(bal)) from tera.accounts;
Quit;

SQL_IP_TRACE: pushdown attempt # 1


SQL_IP_TRACE: passed down query: select MIN("myDB"."sqrt2"("accounts"."bal")) from
"accounts"
TERADATA: trforc: COMMIT WORK
ACCESS ENGINE: SQL statement was passed to the DBMS for fetching data.

TERADATA_47: Executed: on connection 4

select MIN("myDB"."sqrt2"("accounts"."bal")) from "accounts"

TERADATA: trget - rows to fetch: 1


TERADATA: trforc: COMMIT WORK

Note – This is a SAS SQL function… only functions known to SAS SQL will be mapped

Best Practices for Query Integration Use Cases Slide 5-16


Enabling Dynamic Function Mapping
How to add new function mappings
The SQL_FUNCTIONS=EXTERNAL_APPEND option adds entries to the function mappings
• Must be a SAS data set, usually created from the default list
• Can be combined with the "COPY" operation to get the resulting list

In this example, the scalar function for CHARACTER_LENGTH() is added to the SQL Dictionary:

libname tera teradata server=tdpid user=uid password=pwd


sql_functions_copy=WORK.tera_func_list; /* to get table structure */

proc sql;
create table WORK.tera_func_append as select * from WORK.tera_func_list;
delete from WORK.tera_func_append;
insert into WORK.tera_func_append
values("LENGTHC", 7, "CHARACTER_LENGTH", 16, " ", " ", 0, 0, " ", .);
quit;

libname tera teradata server=tdpid user=uid password=pwd


sql_functions="EXTERNAL_APPEND=WORK.tera_func_append"
sql_functions_copy=WORK.tera_func_list;

Best Practices for Query Integration Use Cases Slide 5-17


Enabling Dynamic Function Mapping
How to replace function mappings
SQL_FUNCTIONS=EXTERNAL_REPLACE updates the entire function mapping list
Must be a SAS data set, usually created from default list

In this example, the SQRT() function is replaced with a UDF in a specific database “MyDB” :

libname tera teradata server=tdpid user=uid password=pwd


sql_functions_copy=WORK.tera_func_list; /* to get current list */

proc sql;
create table WORK.tera_func_replace as select * from WORK.tera_func_list;
update WORK.tera_func_replace
set DBMSFUNCNAME = """MyDB"".""sqrt2""", DBMSFUNCNAMELEN=16
where SASFUNCNAME = "SQRT";
quit;

libname tera teradata server=tdpid user=uid password=pwd


sql_functions="EXTERNAL_REPLACE=WORK.tera_func_replace"
sql_functions_copy=WORK.tera_func_list;

Best Practices for Query Integration Use Cases Slide 5-18


Enabling Dynamic Function Mapping
Using the User-Defined Function in a query
libname tera teradata ...
   sql_functions="EXTERNAL_APPEND=SASLIB.tera_func_append";
proc sql;
  select min(sqrt(bal)) from tera.accounts;
quit;

SQL_IP_TRACE: pushdown attempt # 1
SQL_IP_TRACE: passed down query: select MIN("MyDB"."sqrt2"("accounts"."bal")) from
"accounts"
TERADATA: trforc: COMMIT WORK
ACCESS ENGINE: SQL statement was passed to the DBMS for fetching data.

TERADATA_47: Executed: on connection 4

select MIN("MyDB"."sqrt2"("accounts"."bal")) from "accounts"

TERADATA: trget - rows to fetch: 1

TERADATA: trforc: COMMIT WORK

Best Practices for Query Integration Use Cases Slide 5-19


Using Dynamic Function Mapping
Notes on using custom function mappings
 There must always be a function recognized by SAS SQL in order to be
mapped (e.g. the SAS SQRT function in the previous example).
 Functions that are added or replaced using the SQL_FUNCTIONS options must
have exactly the same number and order of parameters as the corresponding
SAS function.
 Parameter conversion is supported for some specific functions in the
SAS/Access engine, with hard-coded rules. There is currently no interface
that allows users to build custom parameter conversions.

Best Practices for Query Integration Use Cases Slide 5-20


Teradata Basics – User Defined Functions
What are User Defined Functions?
 Extend the capabilities of the database to perform unique and complex
processes that are not standard DBMS functions
 Used in SQL along with built-in database capabilities
 Application providers like SAS use this feature to embed their specialized
technology in the Teradata Database

Best Practices for Query Integration Use Cases Slide 5-21


Teradata Basics – User Defined Functions
The three types of UDFs / Vendor Defined Functions
Scalar:
• Operates on a table and returns one result per input row
• Used wherever a built-in scalar function like ‘SUBSTR’ can be used, to transform or qualify
data
Examples: SAS_PUT for formats, Scoring functions
Aggregate:
• Operates on a table and returns one result per group of rows
• Used like a built-in aggregate function like ‘SUM’
Examples: SAS/STAT correlation matrix calculation (TD12)
Table:
• Instead of operating on a table it builds a table on which the rest of the SQL statement will
operate
• Invoked in the FROM clause
• Table Function is like a derived table, output is placed in spool
Examples: SAS/STAT correlation matrix calculation (TD13)

Best Practices for Query Integration Use Cases Slide 5-22


Teradata Basics – User Defined Functions
Why Do UDFs Cause Concern?
 Teradata Enterprise Data Warehouse systems are increasingly recognized as
essential, mission critical environments
 With increased dependence on the data warehouse comes increased oversight to
ensure high system availability
 Careful management of the Teradata environment to reduce risks associated with
change, including introduction of new technologies
 Deploying UDFs requires new procedures and processes
 Some Teradata installations may have never used UDFs in production applications

Best Practices for Query Integration Use Cases Slide 5-23


Using Extended Function
Mapping Capabilities
This demonstration illustrates how to enable extended
function mapping using the mapping SQL dictionary
with SAS/Access.

Best Practices for Query Integration Use Cases Slide 5-24


Exercise
This exercise reinforces the concepts discussed
previously.

Best Practices for Query Integration Use Cases Slide 5-25


Module 5 – Best Practices for Query Integration Use
Cases

• Section 5.1 – Making use of extended SAS/Access function


mapping capabilities
• Section 5.2 – Using SAS Formats in Teradata
• Section 5.3 – Sampling from Teradata Tables
• Section 5.4 – Handling specific Teradata Data types

Best Practices for Query Integration Use Cases Slide 5-26


SAS by Example – SAS Formats (Review)
A SAS FORMAT can be described as an internal rule for mapping data values to
formatted label values.
SAS provides standard formats (currencies, date and time, …), but users can define
custom formats using the FORMAT procedure.
Using formats in SAS programs
• Provides a flexible way to dynamically apply different labeling rules to data
values
• Reduces storage space
• Leverages a fast lookup technique and avoids joins or merges for fetching
lookup values
• Enables analyzing aggregated values by applying formats to detail values

We talked about numerous formats earlier …

Best Practices for Query Integration Use Cases Slide 5-27


Implicit Pass-Through – SAS Formats (Review)
The Reduce-PUT optimization resolves the PUT function before generating the database query.

(Figure callout: creating a custom format)

Best Practices for Query Integration Use Cases Slide 5-28


Using SAS Formats in Teradata
SAS Formats
 are a widely used concept among SAS users and SAS applications
 often disqualify implicit pass-through if used in WHERE clauses or SQL SELECT clauses, or if
applied in SAS procedures for the analysis of formatted values

Re-Enabling Implicit Pass-Through with SAS Formats
 The Reduce-PUT optimization addresses SAS Format usage in WHERE clauses for most
cases (exception: the format implies an extreme 1:n relation)
 In any other case, the SAS Formats Library for Teradata exclusively enables implicit
pass-through of SAS formatting into Teradata
– SAS formatting functions are supplied as UDFs
– SAS default formats as well as user-defined formats can be deployed into the database
 Push-down of SAS Format processing to the Teradata System!

Select put(KeyID, Namefmt.),
       put(IDno, Labelfmt.),
       count(*) as cellfreq
from table
group by 1,2;

(Diagram: SAS process  SAS/Access to Teradata on the Teradata client  sas_put() SAS Format UDF on the Teradata system)

SGF Paper - Publish SAS® Formats in Your Teradata Server Library

Change data from one form to another
• Numeric to currency or date/time format
• Leveraged by most SAS solutions
• User custom format
  Ex: Customer states into geographic regions: “$REGION”
  NY --> NorthEast

SAS formats are basically mapping functions. They change an element of data from one format to
another. For example, there are SAS formats to change numeric values to various currency formats and
date/time formats. Furthermore, it is possible for a SAS programmer to define a custom format. Let’s
make our credit score example more interesting by imagining the user wants to map customer states into
geographic regions (Northeast, Southeast, Central, Pacific, etc.) or map countries into regions such as
central and eastern Europe. This can be done by creating a custom SAS format that turns state
abbreviations like those found in our input table into region codes. Using the rules of SAS
programming, this format will be called $REGION and can be added to our PROC FREQ program with
one more line of SAS code:

proc freq data=customer.credit_data;
  format state $region.;
  table state * credit_score;
run;

PROC FREQ is a very simple example, but not everything can be represented in SQL; some items
are dynamically generated, or SAS functions may be required. Because the $REGION format must
be applied to the state column in every row of our input table, we need a way to export the
definition of the $REGION format to Teradata and to make it usable as part of an SQL statement.

Best Practices for Query Integration Use Cases Slide 5-29

We need to teach the SAS engine to pass format references to Teradata. Remember
that we previously taught PROC FREQ to create a SAS SQL view that performs basic
data summarization. SAS SQL already processes SAS formats but does so using
syntax that is slightly different than the syntax we enabled in Teradata SQL:

select count(*), put(state, "$region.") as region, min(state), credit_score,
       min(credit_score), max(credit_score)
from customer.credit_data
group by region, credit_score

We have the SAS engine convert the “put” to “sas_put” so that Teradata recognizes that it
needs to call the SAS formatting UDF.

Applying the formatting function via UDF does take a bit longer in Teradata, but
11 seconds is still much better than the 151 seconds that the traditional method
takes.

Adaptive Processing
BASE PROCs are sensitive to but not dependent on embedded formats.

If a format is not found in the database, raw value processing is substituted and
formatting is deferred until the results are returned to SAS.

In this way, customers can still create and use formats on the fly AND have in-
database processing.

Business critical, high value formats can be published to provide an extra lift.

Using SAS Formats in Teradata
Deploying SAS Formats to Teradata
 The SAS Formats Library for Teradata must be installed on all Teradata nodes, using the Teradata
Parallel Upgrade Tool (PUT)
 A SAS macro utility (%indtd_publish_formats) is provided for automating the process of
deploying the formats

• Behind the scenes

 During the publishing process, 5 different types of SAS UDFs named sas_put* are deployed
into a Teradata database (schema)
 Multiple instances of the sas_put* functions can co-exist in different databases
 Each instance represents a different SAS format catalog with SAS user-defined formats and all
standard formats
 The formats themselves are stored in Teradata as user-defined objects

%indtd_publish_formats(fmtcat=FMTLIB.formats, ACTION=REPLACE,
                       FMTTABLE=SASFormatsRegistryTable);

Best Practices for Query Integration Use Cases Slide 5-30


Using SAS Formats in Teradata
Deploying SAS Formats to Teradata - Example
%indtdpf;
%let indtdconn = server=TDSRV user=grotto password=XYZ;
%indtd_publish_formats( fmtcat = work.formats,
FMTTABLE=SASFormatsRegistryTable,
action=REPLACE,
mode=PROTECTED,
outdir=C:\publish\formats\outdir);

/* create format in SAS,
   default catalog is work.formats */
proc format;
value $REGION
  'CT', 'ME', … = 'NEW ENGLAND'
  'VA', 'WV', … = 'SOUTH'
  other = 'n.n.';
run;

Best Practices for Query Integration Use Cases Slide 5-31


Using SAS Formats in Teradata
Without further specification, the UDF is expected in the user‘s default database
proc sql;
SELECT distinct state_id FROM TDOrion.state
where put( State_Code, $REGION12.)='WEST' ;
quit;

SQL_IP_TRACE: pushdown attempt # 1
SQL_IP_TRACE: passed down query: select distinct TXT_1."State_ID" from
"sasorion"."state" TXT_1 where sas_put("State_Code", '$REGION12.') = 'WEST'

The SQLMAPPUTTO= option can be used to identify a specific database where the SAS_PUT()
function is located:
option sqlmapputto = (MyDB.sas_put) ;
option sql_ip_trace = source ;
proc sql;
select put(service, $1.) as service_prefix, count(*)
from td.intr_seed
group by service_prefix;
quit;

SQL_IP_TRACE: passed down query:

select cast(MyDB.sas_put("intr_seed"."SERVICE", '$1.0') as char(1)) as
"service_prefix", COUNT(*) from "intr_seed" group by cast(sas_put("intr_seed"."SERVICE",
'$1.0') as char(1))

Best Practices for Query Integration Use Cases Slide 5-32


Using the SAS Format Library
This demonstration illustrates how to publish and use
SAS Formats with the SAS Format Library for Teradata.

Best Practices for Query Integration Use Cases Slide 5-33


Exercise
This exercise reinforces the concepts discussed
previously.

Best Practices for Query Integration Use Cases Slide 5-34


Module 5 – Best Practices for Query Integration Use
Cases

• Section 5.1 – Making use of extended SAS/Access function


mapping capabilities
• Section 5.2 – Using SAS Formats in Teradata
• Section 5.3 – Sampling from Teradata Tables
• Section 5.4 – Handling specific Teradata Data types

Best Practices for Query Integration Use Cases Slide 5-35


Extracting Samples from Teradata Tables
SAS Sampling Technique: Current sampling methods can require large amounts of data
for DATA step processing or for PROCs, such as SURVEYSELECT.
 This can result in very slow response time, especially because of network bandwidth
limitations.
Teradata Alternative: Teradata has a robust and trustworthy sampling function based on a
pseudo-random number generator that is comparable to the SAS RANUNI() function.
 The Teradata sampling function further supports
– Sampling with or without replacement. The Sampling With Replacement option allows
users to perform various re-sampling techniques, such as bootstrap, jack-knife, etc.
– Stratified random sampling
 Executing sampling in Teradata means less data being transferred over the network
and samples with quicker response times.

Sampling supports stratified random sampling if required to over/under sample a subgroup.

Best Practices for Query Integration Use Cases Slide 5-36


Reading Sample Data from Database using SAS
code

Best Practices for Query Integration Use Cases Slide 5-37


Reading Sample Data from Database using Teradata
SQL (Explicit SQL pass-through)

(Figure callout: no SAS observation index)

Best Practices for Query Integration Use Cases Slide 5-38


Creating Sample Data from Database using SAS
code (DATA STEP)

(Figure callouts: creating table; inserting sample data)

Best Practices for Query Integration Use Cases Slide 5-39


Creating Sample Data from Database using SAS
code (SURVEY SELECT)

(Figure callout: creating table)

Best Practices for Query Integration Use Cases Slide 5-40


Teradata Sampling – Sample Size
The SAMPLE function is used to generate samples of data from a table or view. Sample sizes
may be specified in two ways.
 SAMPLE n - where n is an integer (e.g. 1000, 10000, 100000, etc.) - yields a sample of
n rows; if n is greater than the number of rows in the table, the sample consists of
exactly the number of rows in the table, because rows are not reused within the
same sample.

 SAMPLE n - where n is a decimal value between 0.00 and 1.00 - expresses the sample
size as a fraction of the total rows in the Teradata table being sampled.

Note: Both of the above SELECT statements only function as part of SAS explicit
SQL pass-through and are not supported by implicit SQL pass-through
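A minimal explicit pass-through sketch covering both forms of sample-size specification (the database and table names MyDB.MyTable are hypothetical placeholders):

```sas
proc sql;
connect to teradata (user=uid pw=xxxxxxxx server=tdpid);

/* Absolute sample size: 1000 rows (or fewer, if the table is smaller) */
create table work.sample_n as
select * from connection to teradata
  (select * from MyDB.MyTable sample 1000);

/* Fractional sample size: roughly 10 percent of the rows */
create table work.sample_pct as
select * from connection to teradata
  (select * from MyDB.MyTable sample 0.10);

disconnect from teradata;
quit;
```

Because the SAMPLE clause executes inside Teradata, only the sampled rows travel over the network to SAS.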

Best Practices for Query Integration Use Cases Slide 5-41


Teradata Sampling – Multiple Samples
Multiple samples (up to 16 per fraction description) can be requested in one SELECT
statement, with the following caveats:
• If you use integer numbers to specify sample sizes, the sum of those integer
values cannot exceed the row count in the table being sampled
– when using SAMPLE WITH REPLACEMENT, the sum of the integer values
can exceed the row count in the table being sampled – this can be useful for
implementing various resampling techniques
• If you specify percentages, expressed as decimals between 0.00 and 1.00 of
total rows, the sum of those values cannot exceed 1.00
• Membership in each sample is identified in your output by the value of a new
column, SAMPLEID, if it is specifically requested in the SELECT statement

Best Practices for Query Integration Use Cases Slide 5-42


Teradata Sampling – Multiple Samples
Multiple sample sets may be generated in a single query if desired.
 To identify the specific set, a tag called the SAMPLEID is made available for
association with each set. Membership in each sample is specified in your output
by the value of a new column SAMPLEID, if it is specifically requested in the
SELECT statement
 The column SAMPLEID may be selected, used for ordering, or used as a column
in a new table.

proc sql;
connect to teradata (user=uid pw=xxxxxxxx server=server_name);
Select * from connection to teradata
(select column_name, sampleid
from DataBaseName.TableName
sample .25, .25, .50

order by sampleid);
disconnect from teradata;
quit;

Best Practices for Query Integration Use Cases Slide 5-43


Teradata Sampling - Sampling Method
Teradata sampling With Replacement
 Sampling With Replacement option allows users to perform various re-sampling
techniques, such as bootstrap, jack-knife, etc.

proc sql;
connect to teradata (user=uid pw=xxxxxxxx server=server_name);
Select * from connection to teradata
(select column1, ...

from DataBaseName.TableName
sample with replacement
when <conditions> end
order by column1, ...);
disconnect from teradata;
quit;

Best Practices for Query Integration Use Cases Slide 5-44


Teradata Sampling – Sampling Method
The Teradata sampling functions supports three options to produce random samples
• Proportional Allocation Sampling (default method)
- Selects rows proportionally within each AMP, until the requested sample size is
achieved
- Supports options for stratified sampling
• Randomized Allocation Sampling
- Selects rows regardless of how rows are allocated to AMPs, until the requested
sample size is achieved (more mathematically robust randomization)
- Supports options for stratified sampling
• Hash Partition Sampling, uses HashPartition and HashRow functions (useful
samples from large tables)

In SAS and Teradata, simple random sampling is based upon pseudo-random number generation using
the uniform(0,1) distribution

Best Practices for Query Integration Use Cases Slide 5-45


Teradata Sampling - Sampling Method
Sampling with replacement and randomized allocation

proc sql;
connect to teradata (user=uid pw=xxxxxxxx server=server_name);
Select * from connection to teradata
(select column1, ...

from DataBaseName.TableName
sample with replacement randomized allocation
when <conditions> end
order by column1, ...);
disconnect from teradata;
quit;

Best Practices for Query Integration Use Cases Slide 5-46


Teradata Sampling - Sampling Method
Stratified random sampling is useful when you want to ensure that proportional representation of
sub-groups is present in your sample.
Stratified sampling permits differing sampling criteria to be applied to different sets of rows, all
within a single query.
 Stratified random sampling is specified using WHEN and ELSE clauses to identify the values of
and the sizes of your strata

proc sql;
connect to teradata (user=instructor pw=instructor server=barbera);
select * from connection to teradata
(SELECT COL1 ,COL2, STR_COL
FROM DataBaseName.TableName
SAMPLE WHEN STR_COL < x THEN (% or n)
WHEN STR_COL BETWEEN x+1 AND y THEN (% or n)
WHEN STR_COL > y THEN (% or n)
END

ORDER BY STR_COL);
disconnect from teradata;
quit;

Best Practices for Query Integration Use Cases Slide 5-47


Teradata Sampling - Considerations
Repeated use of the SAMPLE clause results in different samples
• if you create a “master lookup table” however, you can preserve any given
sample for repeated use

Hash partition samples change whenever a table is altered by an Insert, Delete,
or several other table operations
• if you create a “master lookup table” however, you can preserve any given
sample for repeated use
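One way to create such a preserved sample, sketched with explicit pass-through (MyDB.customers and MyDB.cust_sample are hypothetical names; the syntax assumes Teradata's CREATE TABLE ... AS ... WITH DATA form, and mode=teradata avoids an explicit commit):

```sas
proc sql;
connect to teradata (user=uid pw=xxxxxxxx server=tdpid mode=teradata);

/* Materialize a 10 percent sample once, then reuse it in later queries */
execute (create table MyDB.cust_sample as
           (select * from MyDB.customers sample 0.10)
         with data) by teradata;

disconnect from teradata;
quit;
```

Subsequent queries can then join or select against MyDB.cust_sample and will see the same sample every time.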

Best Practices for Query Integration Use Cases Slide 5-48


Teradata Sampling – Best Practices
Consider performing your sampling in Teradata
• significantly eases burden on your SAS server and on your network
• ease of doing so, very simple syntax
• speed, leverages Teradata’s massively parallel processing power

Derive and test samples before downloading them to SAS


• high confidence that your sample is representative
• reduce the amount of data that is downloaded to SAS

Use the Teradata Sample function from SAS


• The Teradata SAMPLE function can be used with Explicit SQL Pass-Through.

In SAS and Teradata, simple random sampling is based upon pseudo-random number generation using
the uniform(0,1) distribution

Best Practices for Query Integration Use Cases Slide 5-49


Extract Samples from Teradata
Tables
This demonstration illustrates how to efficiently extract
samples from Teradata tables using the Teradata
sample function from a SAS program.

Best Practices for Query Integration Use Cases Slide 5-50


Exercise
This exercise reinforces the concepts discussed
previously.

Best Practices for Query Integration Use Cases Slide 5-51


Module 5 – Best Practices for Query Integration Use
Cases

• Section 5.1 – Making use of extended SAS/Access function


mapping capabilities
• Section 5.2 – Using SAS Formats in Teradata
• Section 5.3 – Sampling from Teradata Tables
• Section 5.4 – Handling specific Teradata Data types

Best Practices for Query Integration Use Cases Slide 5-52


Naming Data Objects
SAS vs Teradata

Teradata:
• A name must start with a letter unless enclosed in double quotation marks.
• 1 to 128 characters long
• Letters: A-Z; digits: 0-9; signs: underscore (_), dollar ($), hash (#)
• Case insensitive
• A name cannot be a Teradata reserved word such as COMMIT or SELECT.
• The name must be unique between objects; a view and a table in the same
database cannot have the same name.

SAS:
• A name must start with a letter or underscore (_).
• 1 to 32 characters long
• Letters: A-Z; digits: 0-9; signs: underscore (_)
• Case insensitive
• A name can be a word such as COMMIT or SELECT, because SAS does not
have reserved words.
• A name does not need to be unique between object types, with the exception
of a data table and view in the same SAS data library.
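When Teradata names are valid only in quoted form, SAS/ACCESS can keep them accessible from SAS via LIBNAME options; a hedged sketch (server and credentials are placeholders):

```sas
/* Keep Teradata table and column names that are not valid SAS names,
   e.g. names containing $ or #, accessible as SAS name literals */
libname tera teradata server=tdpid user=uid password=pwd
        preserve_tab_names=yes preserve_col_names=yes;

options validvarname=any;  /* allow name literals such as 'Sales$'n */
```

With these options in effect, such columns are referenced in SAS code as name literals, for example 'Sales$'n.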

Best Practices for Query Integration Use Cases Slide 5-53


Data Types
SAS vs Teradata
Teradata data types mapped to the SAS Character type:
BYTE(n), VARBYTE(n), CHAR(n), VARCHAR(n)

Teradata data types mapped to the SAS Numeric type:
DATE, TIME(n), TIMESTAMP(n), BYTEINT, SMALLINT, INT, BIGINT, DECIMAL, FLOAT

Best Practices for Query Integration Use Cases Slide 5-54


Data Types
SAS Character and numeric Data Types

CHARACTER
• CHARACTER specifies a character data value.
• The length can be from 1 to 32,767 characters or bytes.

NUMERIC
• NUMERIC specifies a double-precision, floating-point binary number.
• Capped at 8 bytes of storage.
• Dates are stored as numeric values and represent the number of days
between January 1, 1960, and the date value. For example, January 2, 1960,
is stored as 1.
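The date epoch can be verified with a one-line DATA step (a sketch; the result is written to the SAS log):

```sas
data _null_;
  d = '02JAN1960'd;   /* a date literal resolves to a SAS date value */
  put d=;             /* writes d=1: one day after 01JAN1960 */
run;
```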

Best Practices for Query Integration Use Cases Slide 5-55


Data Types
Teradata to SAS (1)

Teradata Data Type     Default SAS format
CHAR(n)                Character – $n. (n <= 32,767)
CHAR(n)                Character – $32767. (n > 32,767) 1
VARCHAR(n)             Character – $n. (n <= 32,767)
VARCHAR(n)             Character – $32767. (n > 32,767) 1
LONG VARCHAR(n)        Character – $32767. 1
BYTE(n)                Character – $HEXn. (n <= 32,767)
BYTE(n)                Character – $HEX32767. (n > 32,767) 1
VARBYTE(n)             Character – $HEXn. (n <= 32,767)
VARBYTE(n)             Character – $HEX32767. (n > 32,767) 1

1
When reading Teradata data into SAS, DBMS columns that exceed 32,767 bytes
are truncated. The maximum size for a SAS character column is 32,767 bytes.

Best Practices for Query Integration Use Cases Slide 5-56


Data Types
Teradata to SAS (2)

Teradata Data Type     Default SAS format
INTEGER                Numeric – 11.0
SMALLINT               Numeric – 6.0
BYTEINT                Numeric – 4.0
DECIMAL(n, m)          Numeric – (n+2).(m) 2
FLOAT                  Numeric – none

2
DECIMAL specifies a packed-decimal number. n is the total number of digits
(precision); m is the number of digits to the right of the decimal point (scale).
The range for precision is 1 through 18 in Teradata, whereas in SAS it is limited
to 1 through 13.
For example, when SAS/ACCESS reads a Teradata column specified as
DECIMAL(18,18), it maintains only 13 digits of precision.

Best Practices for Query Integration Use Cases Slide 5-57


Data Types
Teradata to SAS (3)

Teradata Data Type     Default SAS format
DATE                   Numeric – DATE9. 3
TIME(n)                Numeric – for n=0, TIME8.; for n>0, TIME9+n.n 4
TIMESTAMP(n)           Numeric – for n=0, DATETIME19.; for n>0, DATETIME20+n.n 5

3 The SAS range for dates is from A.D. 1582 through A.D. 20,000. If a date is out
of this range, SAS/ACCESS returns an error message and displays the date as
a missing value.
DATE9.  DDMMMYYYY, e.g. 18MAR2000
4 TIME8.  HH:MM:SS, e.g. 14:45:32
TIME9.  H:MM:SS AM/PM, e.g. 2:45:32 PM

5 Presented as SAS character strings, and thus harder to use
DATETIME19.  DDMMMYYYY:HH:MM:SS, e.g. 18MAR2000:14:45:32
DATETIME20.  DDMMMYYYY:H:MM:SS AM/PM, e.g. 18MAR2000:2:45:32PM

Best Practices for Query Integration Use Cases Slide 5-58


Data Types
Teradata Data Types not supported in SAS

NUMBER
• Represents a numeric value with a maximum precision of 38 digits; the scale
indicates the maximum number of digits allowed to the right of the decimal point.
• The NUMBER data type is not supported with SAS/ACCESS Interface to Teradata.
If used, data truncation might occur, which causes a data integrity issue.
• The Teradata NUMBER data type has a higher level of precision than SAS® 9.2M2
supports. If you have to use NUMBER data type columns in calculations, you
can use the CAST() function to change them to the CHAR type. The CAST() function can
be placed in Teradata views.

The character data type, in turn, can hold a maximum of 32,767 characters or bytes.


Best Practices for Query Integration Use Cases Slide 5-59


Data Types
Teradata Data Types not supported in SAS
BIGINT
SAS can hold only 15 significant digits, whereas Teradata's BIGINT data type can hold 18 significant digits. As a
result, BIGINT data types cause a problem within SAS.
The best way to handle columns of type BIGINT depends upon several factors:

1. If none of your BIGINT columns will ever contain a number that is greater than 15 digits, then you can set the
environment variable TRUNCATE_BIGINT=YES, which enables BIGINT support with truncation. When this
environment variable is set to YES, all of your BIGINT columns are truncated to 15 digits.

2. If none of your BIGINT columns are used for computations and they are used only as character fields (such as ID
numbers), then you can use the DBSASTYPE= option to specify what data type should be used when data is
read into SAS. For BIGINT data in SAS, you typically use a character data type because SAS does not have a
numeric type that can represent such large numbers.
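A hedged sketch of the DBSASTYPE= approach (the tera libref, accounts table, and acct_id column are hypothetical names used for illustration):

```sas
libname tera teradata server=tdpid user=uid password=pwd;

/* Read the 18-digit BIGINT key as a character column so no precision
   is lost in SAS's 8-byte floating-point numeric */
data work.accounts;
  set tera.accounts (dbsastype=(acct_id='CHAR(20)'));
run;
```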

Best Practices for Query Integration Use Cases Slide 5-60


Enabling Dynamic Function Mapping
How to create a list of the SQL Dictionary
The SQL_FUNCTIONS_COPY= option writes the function map associated with the LIBNAME to a
SAS data set or to a table

Best Practices for Query Integration Use Cases Slide 5-61


Handling Specific Teradata Data Types
Conversion of Teradata large numeric and SAS numeric values
 The maximum reliable precision of a SAS float is 15 digits; large database numeric
columns are regarded as columns with 16 or more digits of precision.
– These large database numeric columns are capable of precision greater than the
maximum reliable precision of a SAS float.
– This can be especially important when working with database table keys.
 Whenever possible, constrain the definition of database key columns to the following
database types to ensure that the translation into a SAS float will match the key on the
database.
– Character
– Small integer (16-bit)
– Integer (32-bit)
– DECIMAL(1,0), DECIMAL(2,0), DECIMAL(14,0), DECIMAL(15,0)
– FLOAT (constrained to integral values)
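The 15-digit limit comes from the 8-byte IEEE floating-point representation SAS uses for all numerics. A quick DATA step sketch shows a 17-digit integer literal being rounded to the nearest representable double (the exact stored value is platform-determined, so no specific output is asserted here):

```sas
data _null_;
  /* 17 significant digits exceed the ~15 reliably held by a double */
  big = 12345678901234567;
  put big= best32.;   /* the printed value may differ from the literal */
run;
```

This is why 16-digit-or-wider database keys read into SAS floats may no longer match the key values stored in the database.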

Best Practices for Query Integration Use Cases Slide 5-62


Handling Teradata Large Numeric Values
Conversion of Teradata large numeric and SAS numeric values
• While key columns are typically integer values, database key columns can also, but
rarely, be large numeric columns or non-integer large numeric columns.
• Database types for large numeric key columns include:
– Big Integer (64-bit integer, 18 digits)
– DECIMAL(16,0), DECIMAL(17,0), DECIMAL(18,0), and so on
• Special considerations apply for the following tasks:
1) Reading database large numeric columns into SAS
2) Asserting WHERE clauses for database large numeric columns
3) Performing joins involving database large numeric key columns
4) Writing large numeric columns back to the database

Best Practices for Query Integration Use Cases Slide 5-63


Handling Teradata Large Numeric Values
Conversion of Teradata large numeric and SAS numeric values
options to address these use cases
 Transform in Teradata, before reading, into a CHAR column
– Use a Teradata transformation rule like
cast((phcy_claim_id (format '999999999999999999')) as CHAR(20))
(This allows the leading 0's to be displayed)
– You will need to use casting when using explicit SQL pass-through.
Or you can use Teradata views to cast BIGINT to CHAR so that you do not have to change your
existing SQL in SAS; just create a view with the BIGINT fields cast as CHAR and then use the
view in your SQL.
 Transform the large numeric column into a SAS CHAR column when reading/writing, using the
DBSASTYPE= option
 Specify WHERE clauses using the DBCONDITION= data set option when subsetting on large
numeric columns
 Joins on large numeric keys - transfer the tables to SAS and convert large numeric
keys to SAS characters, or use an explicit SQL pass-through SAS view.
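A hedged sketch of the DBCONDITION= approach (libref, table, and column names are hypothetical). The condition text is passed to Teradata verbatim, so the 18-digit comparison happens in the database rather than against a rounded SAS float:

```sas
libname tera teradata server=tdpid user=uid password=pwd;

/* Subset on a large numeric key inside Teradata */
data work.one_claim;
  set tera.claims (dbcondition="phcy_claim_id = 123456789012345678");
run;
```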

• HSBC – Background to the move from DB2 to Teradata and the requirements/restrictions for
Phase 1, scale of migration, timescales, etc.

Migration Issues:
BIGINT issues (read/write required)
• HSBC – overview: why BIGINT is required, maximum numbers involved and joining with character
data (SAS data sets) outside Teradata
• Using views and casting to DOUBLE
• Using views and casting to DECIMAL
• Issues with using a coding workaround
• Potential issues with large numbers of casts
• Potential issues with FASTLOAD
• VIEWTABLE issue
• Opening large Teradata tables
• SAS response and proposed next steps

Best Practices for Query Integration Use Cases Slide 5-64


Handling Specific Teradata Data
Types
This demonstration illustrates how to handle specific
Teradata data types when querying Teradata.

Best Practices for Query Integration Use Cases Slide 5-65


Exercise
This exercise reinforces the concepts discussed
previously.

Best Practices for Query Integration Use Cases Slide 5-66


Copyright © 2009 by SAS Institute Inc. and Teradata Corporation. All Rights Reserved.
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA
and other countries. Teradata and the names of products and services of Teradata Corporation are registered trademarks or trademarks
of Teradata Corporation in the USA and other countries. ® indicates USA registration. Other brand and product names are registered
trademarks or trademarks of their respective companies.

Best Practices for Query Integration Use Cases Slide 5-67


Using Dynamic Function Mapping
• Collection of Teradata UDFs for SAS
• Candidates have been identified for field delivery of a collection of ~17 User
Defined Functions that implement functions that are not in Teradata today
• Most of these functions will be added as intrinsic in future Teradata releases;
UDFs could be used in the interim
• Most of these functions come from existing UDF collections that are available
for download from Teradata.com

SAS function name (Teradata UDF name):
BAND (bitAND), BLSHIFT (leftShift), BNOT (bitNOT), BOR (bitOR),
BRSHIFT (rightShift), BXOR (bitXOR), BYTE (CHR), CEIL, COMPRESS (translate),
DTEXTJULDATE, DTEXTWEEKDAY, FLOOR, JULDATE, SIGN, TRANSLATE,
TRANWRD, WEEKDAY

Best Practices for Query Integration Use Cases Slide 5-68


Using SAS Formats in Teradata
Example Performance diagram, comparing SAS Format processing on detail values extracted to SAS
versus pushing SAS Format processing into the Teradata System.

Select put(KeyID, Namefmt.),


put(IDno, Labelfmt.),
count(*) as cellfreq
from table
group by 1,2;

Best Practices for Query Integration Use Cases Slide 5-69


Teradata Sampling Function - Overview
• The SAMPLE clause uses a pseudo-random number generator and the uniform(0,1)
distribution to select rows randomly.
• The general form of the SAMPLE clause is:

Best Practices for Query Integration Use Cases Slide 5-70


Module 6
Creating, Updating, and Loading Teradata
Tables from SAS

Creating, Updating, and Loading Teradata Tables from SAS Slide 6-1
Module 6 – Creating, Updating, and Loading Teradata
Tables from SAS

• Section 6.1 – Creating and Loading Teradata Tables from SAS


• Section 6.2 – Loading Data into Teradata Leveraging Teradata
Load Utilities from SAS
• Section 6.3 – Updating Teradata Tables from SAS

Creating, Updating, and Loading Teradata Tables from SAS Slide 6-2
Module 6 – Creating, Updating, and Loading Teradata
Tables from SAS

• Section 6.1 – Creating and Loading Teradata Tables from SAS


• Section 6.2 – Loading Data into Teradata Leveraging Teradata
Load Utilities from SAS
• Section 6.3 – Updating Teradata Tables from SAS

Creating, Updating, and Loading Teradata Tables from SAS Slide 6-3
Creating and Loading Teradata Tables from SAS
Writing Data to Teradata from SAS
There are several use cases in which a SAS user may
 want to create a new Teradata table, or insert into or update an existing table.
Use case examples
 SAS Data Integration flows (ETLT/ELT processing)
 SAS power users or analysts using the Teradata Data Lab concept as a development area
 SAS applications creating tables as part of their inherent or optional tuning capabilities

Note: A Teradata user must have sufficient access permissions assigned to his/her Teradata
user ID to perform these types of table manipulations in the target Teradata database.

Operation examples include

• Uploading data sets for merging/joining operations in Teradata
• Uploading new predictive score tables

NoPI is going to be a great feature for sites that have two general categories of use: 1) ELT, and 2)
sandboxes or user/application-created tables. Both of these have been mentioned previously in this
chain. I would expect that in both cases the default for an organization will be to use NoPI.

For me, this is a defensive feature that prevents runtime breakage. It prevents our DBA team from
having to deal with exceptions, and our development staff from dealing with a class of boundary
conditions that have no real solution on Teradata. Runtime exceptions are the most expensive to deal
with from a staffing perspective, particularly when your batch runs in the middle of the night...

Creating, Updating, and Loading Teradata Tables from SAS Slide 6-4
Teradata Tables from SAS
SAS makes it very easy to copy data sets into the Teradata server. However, this can be very
dangerous and jeopardize the effectiveness of the table usage.
 What would happen if you had authorization to create tables in Teradata and you submit the
following SAS program?

 Which Teradata data types would be used for the columns in the new table?
 What would be the primary index for the table?

You copy the data from the SAS data set orion.Order_fact into the Teradata table CustomerOrders.

Creating, Updating, and Loading Teradata Tables from SAS Slide 6-5
Creating Teradata Tables from SAS
Default Output Teradata Data Types
SAS/ACCESS assigns default Teradata data types according to SAS data types and SAS formats
during output processing.
SAS Data Type   SAS Format                      Teradata Data Type
Character       $w., $CHARw., $VARYINGw.        CHAR(w)
Character       $HEXw.                          BYTE(w)
Numeric         any date format                 DATE
Numeric         TIMEw.d                         TIME(d)
Numeric         DATETIMEw.d                     TIMESTAMP(d)
Numeric         w. (w <= 2)                     BYTEINT
Numeric         w. (w 3-4)                      SMALLINT
Numeric         w. (w 5-9)                      INTEGER
Numeric         w. (w >= 10)                    FLOAT
Numeric         w.d                             DECIMAL(w-1,d)
Numeric         all other numeric formats       FLOAT

To display Teradata columns that contain SAS times and datetimes properly, you must explicitly assign
the appropriate SAS time or datetime display format to the column.

Creating, Updating, and Loading Teradata Tables from SAS Slide 6-6
Teradata Basics – Primary Indexes (Review)
Primary Index on Teradata Tables
 A required index on one or multiple columns used to determine the distribution of table rows
across the Teradata nodes and AMPs (the parallel processing units on Teradata nodes).
 A hashing algorithm is applied to the primary index column values to determine the Hash-ID,
which identifies the AMP that owns a specific table row.
 Hence the primary index determines how evenly data is distributed among the AMPs.
 Optimal performance results because even distribution allows the AMPs to work in parallel and
complete their processing at about the same time.

AMP AMP AMP AMP AMP AMP

To define a Teradata table it is necessary to choose a column or set of columns as the primary index.
This index is passed through a hashing algorithm to determine which AMP owns the data

To retrieve a row, the primary index value is again passed to the hash algorithm, which generates the
two hash values, AMP and Hash-ID. These values are used to immediately determine which AMP owns
the row and where the data are stored.

One dramatic side-effect of using the hashing algorithm as an indexing mechanism is the absence of a
user-defined order

Hash partitioning of primary index values allows rows from different tables with high affinities to be
placed on the same node. This co-location reduces the inter-connect traffic that cross-node joins
necessitate

It is very important to choose the right primary index: data needs to be distributed evenly across the
system for better performance (to take full advantage of the system's parallelism).

Creating, Updating, and Loading Teradata Tables from SAS Slide 6-7
Teradata Basics – Primary Indexes (Review)
Primary Index on Teradata Tables
 When the Primary Index column(s) values for a table are “sufficiently unique”, the rows in that
table are evenly distributed across all AMPs.
 However, if data is not evenly distributed across all AMPs “Skewed Data”, the slowest AMP
becomes a bottleneck. That is, a given query or operation will only run as fast as the slowest
AMP involved.

“Hot” Amp
due to skewed
data

AMP AMP AMP AMP AMP AMP

Creating, Updating, and Loading Teradata Tables from SAS Slide 6-8
Creating Teradata Tables from SAS
Copying a SAS Data Set into a Teradata Table
SAS makes it very easy to copy data into the Teradata server.
 What would be the primary index for the table created with this sample program ?

By default, SAS/ACCESS selects the first column of the SAS table as the primary
index column for the new Teradata table.
Of course, this can be very dangerous and jeopardize the effectiveness of the primary index.

The system is only as fast as its slowest AMP. If the workload is not distributed evenly, the
overloaded AMP becomes the bottleneck while the underworked AMPs sit idle.

Creating, Updating, and Loading Teradata Tables from SAS Slide 6-9
Creating Teradata Tables from SAS
Determining Primary Index Chosen by Teradata
If you have loaded a SAS data set into a Teradata table, you can use the SQL procedure to pass a
SHOW TABLE statement to Teradata to confirm the primary index that was chosen by Teradata for
the table.
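As a sketch (connection parameters and the table name are placeholders), the SHOW TABLE statement can be sent via explicit SQL pass-through:

```sas
proc sql;
   connect to teradata (user=xxxxxx password=xxxxxx server=dbc);

   /* Ask Teradata to return the DDL it stored for the table,
      including the PRIMARY INDEX clause */
   select * from connection to teradata
      (show table instructor.checking_account_new);

   disconnect from teradata;
quit;
```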

Creating, Updating, and Loading Teradata Tables from SAS Slide 6-10
Creating Teradata Tables from SAS
Teradata SHOW TABLE Results
The primary index cust_id was the first column defined in the table.

Request Text
CREATE MULTISET TABLE INSTRUCTOR.checking_account_new
, NO FALLBACK
, NO BEFORE JOURNAL
, NO AFTER JOURNAL
, CHECKSUM = DEFAULT
, DEFAULT MERGEBLOCKRATIO (
cust_id INTEGER
, acct_nbr CHAR(16) CHARACTER SET LATIN CASESPECIFIC
, minimum_balance INTEGER
, per_check_fee DECIMAL(9,2)
, account_active CHAR(1) CHARACTER SET LATIN CASESPECIFIC
, acct_start_date DATE FORMAT 'YY/MM/DD'
, acct_end_date DATE FORMAT 'YY/MM/DD'
, starting_balance DECIMAL(9,2)
, ending_balance DECIMAL(9,2))
PRIMARY INDEX ( cust_id );

Creating, Updating, and Loading Teradata Tables from SAS Slide 6-11
Creating Teradata Tables from SAS
Teradata TABLE Types
Teradata distinguishes the following table types:
 MULTISET-Tables that do allow duplicate rows
 Default type used for create table operations from SAS.
 Explicitly used with the data set option SET=NO

 SET-Tables that do NOT allow duplicate rows


 Explicitly used with the data set option SET=YES
 SET tables should typically have unique indexes defined. If unique indexes are
not present, this will impact insert performance.

A “MULTISET table” allows duplicate rows of data to be loaded into it, something that a “SET table”
(the type that FastLoad creates) does not allow.

Teradata was designed to adhere to strict relational rules, one of which is that you cannot have
duplicate rows in a table. Over the course of time this restriction was loosened. Now you can create a
table that allows duplicate rows. This is called a multiset table. Unfortunately, you cannot use FastLoad
to load these tables if the load data contains duplicate rows that must be preserved.

Creating, Updating, and Loading Teradata Tables from SAS Slide 6-12
Creating Teradata Tables from SAS
Governing SQL Implicit Pass-Through
The system option DBIDIRECTEXEC governs whether CREATE TABLE or DELETE statements are
passed directly through to the database.

TERADATA_38: Executed: on connection 3
SELECT "cust_id","age","income" FROM "customer"               (Get Data)

TERADATA_39: Executed: on connection 5
CREATE MULTISET TABLE "customer_new_test"                     (Create Table)

TERADATA_55: Prepared: on connection 5
USING ("cust_id" CHAR (5),"age" INT,"income" FLOAT)           (Insert Data)
INSERT INTO "zzz_new" ("cust_id","age","income") VALUES
(:"cust_id",:"age",:"income")

The DBIDIRECTEXEC option tells SAS to optimize CREATE TABLE AS statements and have the
DBMS execute the SQL statement. This is faster because SAS doesn’t need to read and insert data into
the table – the DBMS does it all.

Examining the SQL passed to the database, we see that in the Get Data stage, all of the data in ZZZ is
being read by SAS. In fact, data is also being written to a temp table which can be expensive in terms of
IO. The Insert Data stage will also be expensive as data is moved back into the database.

DBIDIRECTEXEC | NODBIDIRECTEXEC indicates whether the Pass-Through facility optimizes


the handling of SQL statements by passing them directly to the DBMS for execution. Default is
NODBIDIRECTEXEC.

The criteria for passing SQL statements to the DBMS are the same as those for passing joins. When
these criteria are met, a DBMS can process the CREATE TABLE <table-name> AS SELECT statement
in a single step. If multiple librefs point to different data sources, the statement is processed normally
regardless of how you set this option.

Creating, Updating, and Loading Teradata Tables from SAS Slide 6-13
Creating Teradata Tables from SAS
Governing SQL Implicit Pass-Through
When DBIDIRECTEXEC is set, a CREATE TABLE or DELETE statement is passed directly
through to the database.

TERADATA_71: Executed: on connection 4


CREATE MULTISET TABLE "zzz_new"
as ( select "zzz"."x", "zzz"."y" from "zzz" ) WITH DATA

GOOD: The
DBMS does it all!

Take a look at the SQL statement that is being passed to the database. Using the SASTRACE=‘,,,d’
option you can clearly see that the DBMS is handling the CREATE TABLE AS processing. This is
good.

This statement doesn’t move data from the DBMS to SAS and from SAS back into the DBMS. The data
stays in the DBMS.

Creating, Updating, and Loading Teradata Tables from SAS Slide 6-14
Creating Teradata Tables from SAS
SAS/ACCESS can automatically create DBMS tables.

If you want to create tables with, for example, different data types than the default ones,
use the DBTYPE= data set option.

To set any DBMS specific table creation option use the DBCREATE_TABLE_OPTS data
set option, whose values get appended to the CREATE TABLE statement.

Hence you can specify the primary index when you create the Teradata table.

data teralib.checking_account_new
(DBCREATE_TABLE_OPTS='Primary index (acct_nbr)'
DBTYPE=(acct_start_date='date' acct_end_date='date')
);

Creating, Updating, and Loading Teradata Tables from SAS Slide 6-15
Creating Teradata Tables from SAS
Example creating a new table with custom primary Index

TERADATA_225: Executed: on connection 3 2034 1538642197 no_name 0


DATASTEP
CREATE MULTISET TABLE "checking_account_new" ("cust_id"
INTEGER,"acct_nbr" CHAR (16) CHARACTER SET LATIN,"minimum_balance"
INTEGER,"per_check_fee" DECIMAL(9,2),"account_active" CHAR (1)
CHARACTER SET LATIN,"acct_start_date" date,"acct_end_date"
date,"starting_balance" DECIMAL(9,2),"ending_balance" DECIMAL(9,2)) Primary
index (acct_nbr) ;COMMIT WORK

If the table already exists in Teradata, you cannot REPLACE it because the SAS/ACCESS Teradata
engine does not support the REPLACE option. You would have to drop the table first and then
recreate it. If you try to recreate it without dropping it, you get the following ERROR message:

ERROR: The TERADATA table CustomerOrders has been opened for OUTPUT. This table
already exists, or there is a name conflict with an existing object. This table will not be replaced.
This engine does not support the REPLACE option.

NOTE: The SAS System stopped processing this step because of errors.

NOTE: DATA statement used (Total process time):


real time 0.01 seconds
cpu time 0.01 seconds

If you don't have authorization on the DBMS to drop tables, the PROC SQL code will not work.

Creating, Updating, and Loading Teradata Tables from SAS Slide 6-16
Creating Teradata Tables from SAS
Determining Primary Index Distribution of a Table
 Use a query to predict the distribution of the primary index values for the columns to be
chosen.
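One common idiom (sketched here with the connection parameters as placeholders) hashes the candidate column(s) and counts rows per AMP; a heavily skewed count signals a poor primary index choice:

```sas
proc sql;
   connect to teradata (user=xxxxxx password=xxxxxx server=dbc);

   /* Predicted rows per AMP if acct_nbr were the primary index */
   select * from connection to teradata
      (select hashamp(hashbucket(hashrow(acct_nbr))) as amp_no,
              count(*)                               as row_count
         from checking_account_new
        group by 1
        order by 2 desc);

   disconnect from teradata;
quit;
```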

Creating, Updating, and Loading Teradata Tables from SAS Slide 6-17
Loading Data into Teradata Tables
Overview of loading data into existing Teradata tables
 SAS can insert rows into a Teradata table using Data Steps, PROC SQL or PROC
APPEND steps.
• By default, SAS inserts rows into the Teradata table one row at a time in a sequential
process. This process is slow because rows are inserted using one unit of
parallelism.
• SAS LIBNAME options can be used to tune standard SAS insert operations into
Teradata tables (multi-row inserts, etc.).
 Teradata provides specific utilities for the fastest load operations, which can be used
directly through SAS.*
• Teradata distinguishes operations for loading into empty tables, appending to
existing tables, and more, and provides specific utilities for those (FastLoad,
MultiLoad, TPUMP, TPT, …)
• However, depending on site restrictions, you will require permission to leverage
those utilities.

*Discussed in the following section

Creating, Updating, and Loading Teradata Tables from SAS Slide 6-18
Loading Data into Teradata Tables
Overview of loading data into existing Teradata tables
 The default load methods for loading SAS data into Teradata tables use Teradata
libnames and
– corresponding SAS DATA step syntax,
– SAS PROC SQL insert into syntax, or
– SAS PROC APPEND syntax.
 SAS default behavior issues a series of single-row inserts committed in blocks
sequentially (default 1000).
 This can be tuned using DBCOMMIT and MULTISTMT options.
 However, this process will most likely always be slow compared to using
Teradata’s load utilities, because rows are inserted using only one unit of
parallelism.

Creating, Updating, and Loading Teradata Tables from SAS Slide 6-19
Loading Data into Teradata Tables
Using SAS PROC SQL, insert syntax to load data from SAS to Teradata:


Creating, Updating, and Loading Teradata Tables from SAS Slide 6-20
Loading Data into Teradata Tables
Using SAS PROC SQL, insert syntax to load data from SAS to Teradata – Teradata Query-Log.

Creating, Updating, and Loading Teradata Tables from SAS Slide 6-21
Loading Data into Teradata Tables
DBCOMMIT SAS Load Option
Causes an automatic COMMIT after a specified number of rows have been
processed.
 DBCOMMIT= affects update, delete, and insert processing. The number of
rows that are processed includes rows that are not processed successfully.
 Default value: 1000 when inserting rows into a DBMS table; 0 when updating
a DBMS table
If you set it to 0, COMMIT is issued only once, after the procedure or DATA
step completes.

COMMIT (a permanent writing of data to the DBMS)

Default value: 1000 when inserting rows into a DBMS table; 0 when updating a DBMS table

If you explicitly set the DBCOMMIT= option, SAS/ACCESS fails any update with a WHERE clause.

Note: If you specify both DBCOMMIT= and ERRLIMIT= and these options collide during processing,
COMMIT is issued first and ROLLBACK is issued second. Because COMMIT is issued (through the
DBCOMMIT= option) before ROLLBACK (through the ERRLIMIT= option), DBCOMMIT=
overrides ERRLIMIT=.

Teradata: See the FastLoad capability description for the default behavior of this option. DBCOMMIT=
and ERRLIMIT= are disabled for MultiLoad to prevent any conflict with the ML_CHECKPOINT= data
set option.

Creating, Updating, and Loading Teradata Tables from SAS Slide 6-22
Loading Data into Teradata Tables
DBCOMMIT SAS Load Option – Example
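A minimal sketch (the libref and data set names are hypothetical) of raising the commit interval for a large insert:

```sas
/* Commit after every 10,000 inserted rows instead of the default 1,000 */
data teralib.orders_new (dbcommit=10000);
   set work.orders;
run;
```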


Creating, Updating, and Loading Teradata Tables from SAS Slide 6-23
Loading Data into Teradata Tables
MULTISTMT SAS Load Option
Specifies whether INSERT statements are sent to Teradata one at a time or in a group (multi-row
inserts).
 When you need to insert large volumes of data, you can significantly improve performance by
using MULTISTMT= instead of inserting only single rows.
 SAS first determines how many insert statements it can send to Teradata:
 how many SQL insert statements can fit in a 64K buffer,
 how many data rows can fit in the 64K data buffer, and
 how many inserts the Teradata server chooses to accept.

The SAS/ACCESS engine to Teradata supports a MULTISTMT option that causes the engine to
generate a multi-row insert, using the same mechanism as the Teradata TPump utility.

Significant performance gains can be obtained when compared to single-row inserts when large
volumes of data are inserted

Examples
Here is an example of how you can send insert statements one at a time to Teradata.

libname user teradata user=zoom pw=XXXXXX server=dbc;

proc delete data=user.testdata;
run;

data user.testdata(DBTYPE=(I="INT") MULTISTMT=YES);
   do i=1 to 50;
      output;
   end;
run;

In the next example, DBCOMMIT=100, so SAS issues a commit after every 100 rows, so it sends only
100 rows at a time.

libname user teradata user=zoom pw=XXXXX server=dbc;

proc delete data=user.testdata;
run;

data user.testdata(MULTISTMT=YES DBCOMMIT=100);
   do i=1 to 1000;
      output;
   end;
run;

In the next example, DBCOMMIT=1000, which is much higher than in the previous example. In this
example, SAS sends as many rows as it can fit in the buffer at a time (up to 1000) and issues a commit
after every 1000 rows. If only 600 can fit, 600 are sent to the database, followed by the remaining 400
(the difference between 1000 and the initial 600 that were already sent), and then all rows are
committed.

Creating, Updating, and Loading Teradata Tables from SAS Slide 6-24
libname user teradata user=zoom pw=XXXXX server=dbc;

proc delete data=user.testdata;
run;

data user.testdata(MULTISTMT=YES DBCOMMIT=1000);
   do i=1 to 10000;
      output;
   end;
run;

This next example sets CONNECTION=GLOBAL for all tables, creates a global
temporary table, and stores the table in the current database schema.

libname user teradata user=zoom pw=XXXXX server=dbc connection=global;

proc delete data=user.temp1;
run;

proc sql;
   connect to teradata(user=zoom pw=XXXXXXX server=dbc connection=global);
   execute (CREATE GLOBAL TEMPORARY TABLE temp1 (col1 INT)
            ON COMMIT PRESERVE ROWS) by teradata;
   execute (COMMIT WORK) by teradata;
quit;

data work.test;
   do col1=1 to 1000;
      output;
   end;
run;

proc append data=work.test base=user.temp1(multistmt=yes);
run;
Loading Data into Teradata Tables
MULTISTMT SAS Load Option – Example


Creating, Updating, and Loading Teradata Tables from SAS Slide 6-25
Loading Data into Teradata Tables
MULTISTMT SAS Load Option – Example

Creating, Updating, and Loading Teradata Tables from SAS Slide 6-26
Creating Teradata Tables and
Loading Data from SAS
This demonstration illustrates how to control table
creation and column definitions from SAS and how to
load data into a Teradata table from SAS.

Creating, Updating, and Loading Teradata Tables from SAS Slide 6-27
Exercise
This exercise reinforces the concepts discussed
previously.

Creating, Updating, and Loading Teradata Tables from SAS Slide 6-28
Module 6 – Creating, Updating and Loading Teradata
Tables from SAS

• Section 6.1 – Creating and Loading Teradata Tables from SAS


• Section 6.2 – Loading Data into Teradata Leveraging
Teradata Load Utilities from SAS
• Section 6.3 – Updating Teradata Tables from SAS

Creating, Updating, and Loading Teradata Tables from SAS Slide 6-29
Leveraging Teradata Load Utilities
Improving Teradata Load Performance
SAS/ACCESS to Teradata supports the following native Teradata CLI bulk-
loading utilities, which greatly improve performance when inserting rows of data
into Teradata tables:
 FastLoad utility for bulk-loading empty tables
 MultiLoad for bulk-appending to existing tables
 TPUMP* utility for continuous real-time data loads and updates without the
typical table-locking effects of bulk loads
 Teradata Parallel Transporter (TPT) provides even more parallel-processing-
enabled and stream-based versions of the classic utilities.

TPT is now the default; SAS reverts to the classic utilities as needed.

FASTLOAD option (SAS v8.2+) & MULTILOAD option (SAS v9.1.3+)

NOTE: MULTILOAD is an option available with version SAS version 9.1.3 and higher, (and requires
that the “SAS/ACCESS to Teradata” module be installed on your SAS platform). The FASTLOAD
option is available with SAS version 8.2 and higher, (and requires that the “SAS/ACCESS to Teradata”
module be installed on your SAS platform)

Creating, Updating, and Loading Teradata Tables from SAS Slide 6-30
Leveraging Teradata Load Utilities
The Teradata classics and their differences

FastLoad (FASTLOAD=YES)
• Can be used to load data into empty tables only.
• Cannot load duplicate rows into any table.
• Can be used with the LIBNAME statement or as a data set option.

MultiLoad (MULTILOAD=YES)
• Can be used to load empty tables or tables that contain data.
• Can be used to load duplicate rows into multiset tables.
• Can only be used as a data set option.

Creating, Updating, and Loading Teradata Tables from SAS Slide 6-31
FastLoading Empty Teradata Tables
Assumptions/Features
 Must be creating a new table (in any given logical Teradata database) or loading
into an empty table.
 Each row loaded must be unique; duplicate rows (bit-for-bit duplicates) will be
rejected.
 Log files can be captured from the load to determine the source of the problem.
Restrictions
 The target table must have no secondary indexes, join indexes, or hash
indexes defined on it.
 The target table must have no triggers defined on it. Triggers fire on data
modifications such as insert, delete, or update.
 The table must have no standard referential integrity or batch referential integrity
defined on it.

Creating, Updating, and Loading Teradata Tables from SAS Slide 6-32
FASTLOADing Empty Teradata Tables
Enabling Teradata FASTLOADING
• Can be used by any SAS program step that creates and loads an empty
table at the same time, or by steps that load into empty tables
• The SAS LIBNAME and data set options FASTLOAD=YES (synonym
BULKLOAD=YES) enable Teradata’s utility, if available
• Further options might be helpful when using FASTLOAD
– SESSIONS specifies how many Teradata sessions are logged on when
using FastLoad, FastExport, or MultiLoad

SESSIONS=4 When reading data with FastExport or loading data with FastLoad and MultiLoad, you
can request multiple sessions to increase throughput. Using large values might not necessarily increase
throughput due to the overhead associated with session management. Check whether your site has any
recommended value for the number of sessions to use. See your Teradata documentation for details
about using multiple sessions.

Creating, Updating, and Loading Teradata Tables from SAS Slide 6-33
FASTLOADing Empty Teradata Tables
Example – Creating and FastLoading a Teradata table using SAS DATA Step
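A minimal sketch (libref, connection parameters, and data set names are hypothetical) of creating and FastLoading a table in one DATA step:

```sas
libname teralib teradata user=xxxxxx password=xxxxxx server=dbc;

/* FASTLOAD=YES invokes the FastLoad protocol for this new, empty table */
data teralib.sales_new (fastload=yes);
   set work.sales;
run;
```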

Creating, Updating, and Loading Teradata Tables from SAS Slide 6-34
FASTLOADing empty Teradata Tables
Example – Creating and FastLoading a Teradata table using SAS PROC SQL
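A hedged PROC SQL sketch (libref and names hypothetical) of the same create-and-FastLoad operation:

```sas
proc sql;
   create table teralib.sales_new (fastload=yes) as
      select * from work.sales;
quit;
```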

Creating, Updating, and Loading Teradata Tables from SAS Slide 6-35
FASTLOADing empty Teradata Tables
Example – Creating and FastLoading a Teradata table using two separate, independent steps.
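A sketch of the two-step variant (names hypothetical): the table is created empty first, then FastLoaded in a separate step:

```sas
/* Step 1: create the empty table with the desired primary index */
data teralib.sales_new
     (dbcreate_table_opts='Primary index (sale_id)');
   set work.sales (obs=0);   /* obs=0 creates the structure only */
run;

/* Step 2: FastLoad the data into the (still empty) table */
proc append base=teralib.sales_new (fastload=yes)
            data=work.sales;
run;
```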

Creating, Updating, and Loading Teradata Tables from SAS Slide 6-36
Using FASTLOAD for Append Operations
Using a FastLoad in a Three-Step Table Append Process
1. Write new data to an “intermediate Teradata table”, using the FASTLOAD
option (if so advised by your Teradata DBA)
2. Use Explicit SQL Pass-Thru to insert rows into the “target Teradata table”
from the “intermediate Teradata table” often referred to by Teradata users as
an “INSERT … SELECT”
3. Then, drop the “intermediate Teradata table”

Recall that use of the FASTLOAD option will result in duplicate rows being
dropped when creating your intermediate table, (if duplicate rows exist within
your intermediate data set)

Creating, Updating, and Loading Teradata Tables from SAS Slide 6-37
Using FASTLOAD for Append Operations
Example – FastLoad an intermediate table in a multi-step append operation.
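A sketch of the three-step append (all names and connection parameters hypothetical):

```sas
/* Step 1: FastLoad the new rows into an intermediate Teradata table */
data teralib.stage_orders (fastload=yes);
   set work.new_orders;
run;

/* Steps 2 and 3: INSERT ... SELECT into the target table,
   then drop the intermediate table */
proc sql;
   connect to teradata (user=xxxxxx password=xxxxxx server=dbc);
   execute (insert into orders select * from stage_orders) by teradata;
   execute (drop table stage_orders) by teradata;
   disconnect from teradata;
quit;
```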

Creating, Updating, and Loading Teradata Tables from SAS Slide 6-38
MULTILOADing Teradata Tables
MultiLoad is the parallel load utility used by Teradata to insert rows of data into both
empty and existing Teradata tables.
Requirement for using the MultiLoad utility
 The target table must have no unique secondary, join, or hash indexes on it.
 The target table must have no triggers defined on it. Triggers fire on data
modifications such as insert, delete, or update.
 The table must have no standard referential integrity or batch referential integrity
defined on it.
 The MultiLoad input file must have data to qualify all columns defined in the
primary index of the target table.
 Will allow to load duplicate rows

You must drop these items on target tables before the load:
unique secondary indexes, foreign key references, join indexes.

Creating, Updating, and Loading Teradata Tables from SAS Slide 6-39
MULTILOADing Teradata Tables
Enabling Teradata MultiLoad with SAS
• Can be used by any SAS program step creating and loading a Teradata table.
Most commonly, MultiLoad is used with SAS PROC APPEND steps.
• Invoked by the data set option MULTILOAD=YES, if the utility is available and
permission to use it has been granted
• Further options might be helpful when using MultiLoad
– SESSIONS specifies how many Teradata sessions are logged on when
using FastLoad, FastExport, or MultiLoad
– LOGDB specifies an alternative database for writing MultiLoad log files
– BL_LOG specifies a nonstandard name for the MultiLoad log files

Creating, Updating, and Loading Teradata Tables from SAS Slide 6-40
MULTILOADing Teradata Tables
Example – Writing to Teradata from SAS Table Append
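A minimal sketch (libref and data set names hypothetical) of a MultiLoad append:

```sas
/* MULTILOAD=YES invokes the MultiLoad protocol;
   SESSIONS tunes the number of parallel logons */
proc append base=teralib.orders (multiload=yes sessions=4)
            data=work.new_orders;
run;
```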

Creating, Updating, and Loading Teradata Tables from SAS Slide 6-41
Leveraging Teradata Load Utilities
Teradata Parallel Transporter Utilities (TPT)

Enabled using the TPT=YES Data Set or Library Option


• If available and configured, SAS uses the TPT load, update, and stream drivers to
load data and the export driver to read data
• If the TPT API is not available, the classic FastLoad,
MultiLoad, etc. utilities will be used.

 If Teradata Parallel Transporter is enabled along with FastLoad or
MultiLoad using the TPT=YES option, the following message
appears in the SAS log: Teradata connection: TPT FastLoad has
inserted 100 row(s).

The TPT API provides a consistent interface for FastLoad, MultiLoad, and Multi-Statement insert. TPT
API documentation refers to FastLoad as the load driver, MultiLoad as the update driver, and Multi-
Statement insert as the stream driver.

By using the TPT API, you can load data into a Teradata table without working directly with such
stand-alone Teradata utilities as FastLoad, MultiLoad, or TPump. When TPT=YES, SAS uses the TPT
API load driver for FastLoad, the update driver for MultiLoad, and the stream driver for Multi-Statement
insert. Sometimes SAS cannot use the TPT API due to an error or because it is not installed on the
system. When this happens, SAS does not produce an error, but it still tries to load data using the
requested load method (FastLoad, MultiLoad, or Multi-Statement insert). To check whether SAS used
the TPT API to load data, look for a message similar to this one in the SAS log:

NOTE: Teradata connection: TPT FastLoad/MultiLoad/MultiStatement insert has read n row(s).


Example
In this example, SAS data is loaded into Teradata using the TPT API. This is the default method of
loading when FastLoad, MultiLoad, or Multi-Statement insert are requested. SAS still tries to load data
even if it cannot use the TPT API.

libname tera teradata user=testuser pw=testpw TPT=YES;

/* Create data */
data testdata;
   do i=1 to 100;
      output;
   end;
run;

/* Load using MultiLoad TPT. This note appears in the SAS log if SAS uses TPT:
   NOTE: Teradata connection: TPT MultiLoad has inserted 100 row(s). */
data tera.testdata(MULTILOAD=YES);
   set testdata;
run;

/* Verification */
Make sure that LD_LIBRARY_PATH has the TPT library directory embedded. Then run the following
code to verify that TPT is being used in SAS:

Creating, Updating, and Loading Teradata Tables from SAS Slide 6-42

libname x teradata user=…… pass=….. ;
options sastrace=",,d,d" sastraceloc=saslog;

data x.test (tpt=yes fastload=yes);
   do i=1 to 5;
      output;
   end;
run;

Then look at the trace messages for each row and you should see something like

TERADATA: trtpt_insert() …
Teradata Load Utilities – TPT FastLoad
TPT FastLoad Supported Features and Restrictions
 SAS/ACCESS Interface to Teradata supports the TPT API for FastLoad, also known as the
load driver. SAS/ACCESS works by interfacing with the load driver through the TPT API, which
in turn uses the Teradata FastLoad protocol for loading data.
 If SAS cannot find the Teradata modules that are required for the TPT API or TPT=NO, then
SAS/ACCESS uses the old method of FastLoad.
 SAS/ACCESS can restart FastLoad from checkpoints when FastLoad uses the TPT API.
 Data errors are logged in Teradata tables. Error recovery can be difficult if you do not set
TPT_CHECKPOINT_DATA= to enable restart from the last checkpoint. To find the error that
corresponds to the code that is stored in the error table, see your Teradata documentation.
You can restart a failed job from the last checkpoint by following the instructions in the SAS
error log.

The SAS/ACCESS FastLoad facility using the TPT API is similar to the native Teradata FastLoad
utility. They share these limitations.
• FastLoad can load only empty tables. It cannot append to a table that already contains data. If you try
to use FastLoad when appending to a table that contains rows, the append step fails.
• FastLoad does not load duplicate rows (those where all corresponding fields contain identical data)
into a Teradata table. If your SAS data set contains duplicate rows, you can use other load methods.
Creating, Updating, and Loading Teradata Tables from SAS Slide 6-43
Teradata Load Utilities – TPT FastLoad
TPT FastLoad Example
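The slide's code is not reproduced in this copy; the following is a minimal sketch of what a TPT FastLoad step typically looks like. The libref tera and the data set work.testdata are assumptions for illustration, and because FastLoad loads only empty tables, the target table must not yet contain rows:

```sas
/* Sketch: load an empty Teradata table through the TPT API load driver.
   The libref tera and data set work.testdata are assumptions. */
options sastrace=',,,d' sastraceloc=saslog;   /* trace DBMS activity */

data tera.testdata (TPT=YES FASTLOAD=YES);    /* TPT=YES is the default */
   set work.testdata;                         /* target must be empty   */
run;
```

When the TPT API is used, the SAS log should contain a note similar to "Teradata connection: TPT FastLoad has inserted n row(s)".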

Creating, Updating, and Loading Teradata Tables from SAS Slide 6-44
Teradata Load Utilities – TPT MultiLoad
TPT MultiLoad Supported Features and Restrictions
SAS/ACCESS Interface to Teradata supports the TPT API for MultiLoad, also known as the
update driver. SAS/ACCESS works by interfacing with the update driver through the TPT
API. This API then uses the Teradata MultiLoad protocol for loading data.
 If SAS cannot find the Teradata modules that are required for the TPT API, or if TPT=NO is
specified, then SAS/ACCESS uses the older MultiLoad method.
 SAS/ACCESS supports only insert operations and loading only one target table at a time.
 SAS/ACCESS can restart MultiLoad from checkpoints when MultiLoad uses the TPT
API.
 Errors are logged to Teradata tables. Error recovery can be difficult if you do not set
TPT_CHECKPOINT_DATA= to enable restart from the last checkpoint. You can restart
a failed job from the last checkpoint by following the instructions in the SAS error log.

The SAS/ACCESS MultiLoad facility loads both empty and existing Teradata tables.

The SAS/ACCESS MultiLoad facility using the TPT API is similar to the native Teradata MultiLoad
utility. A common limitation that they share is that you must drop these items on target tables before the
load:

• unique secondary indexes
• foreign key references
• join indexes

Creating, Updating, and Loading Teradata Tables from SAS Slide 6-45
Teradata Load Utilities – TPT MultiLoad
TPT MultiLoad Example
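A minimal sketch of the PROC APPEND step that could produce a log like the excerpt below; the target table TDSASBOX.PRDTD_MASTER is assumed to already exist:

```sas
/* Sketch: append a SAS data set to an existing Teradata table using the
   TPT API update driver (MultiLoad protocol). The libref TDSASBOX is an
   assumption for illustration. */
proc append base=TDSASBOX.PRDTD_MASTER (MULTILOAD=YES)
            data=sashelp.prdsale force;
run;
```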

NOTE: Appending sashelp.prdsale to TDSASBOX.PRDTD_MASTER.


NOTE: FORCE is specified, so dropping/truncating will occur.
NOTE: There were 1440 observations read from the data set sashelp.prdsale.
NOTE: 1440 observations added.
NOTE: The data set TDSASBOX.PRDTD_MASTER has . observations and 4 variables.
TERADATA: trforc: COMMIT WORK

NOTE: Teradata connection: TPT MultiLoad has inserted 1440 row(s).


NOTE: PROCEDURE APPEND used (Total process time):
real time 5.07 seconds
user cpu time 0.06 seconds
system cpu time 0.07 seconds
Memory 251k
OS Memory 7536k

Creating, Updating, and Loading Teradata Tables from SAS Slide 6-46
Teradata Basics – Teradata TPUMP
TPump is a Teradata utility designed to continuously move data from data sources into Teradata
tables without locking the affected table.
 TPump provides near-real-time data into data warehouses.
 TPump can be used to insert, update, and delete data in the Teradata database.
 TPump uses Teradata row hash locks, meaning users can run queries while it’s updating the
Teradata Warehouse.

http://en.wikipedia.org/wiki/Tpump

TPump uses standard Teradata SQL to achieve moderate to high data loading rates to the Teradata
RDBMS. Multiple sessions and multi-statement requests are typically used to increase throughput.

TPump provides an alternative to MultiLoad for the low volume batch maintenance of large databases
under control of a Teradata system. Instead of updating Teradata databases overnight, or in batches
throughout the day, TPump updates information in real time, acquiring every bit of data from the client
system with low processor utilization. It does this through a continuous feed of data into the data
warehouse, rather than the traditional batch updates. Continuous updates result in more accurate,
timely data.

And, unlike most load utilities, TPump uses row hash locks rather than table level locks. This allows
users to run queries while TPump is running. This also means that TPump can be stopped
instantaneously.

TPump also provides a dynamic throttling feature that enables it to run “all out” during batch windows,
but within limits when it may impact other business uses of the Teradata RDBMS. Operators can specify
the number of statements run per minute, or may alter throttling minute-by-minute, if necessary.

TPump’s main attributes are:


• Simple, hassle-free setup – doesn’t require staging of data, intermediary files, or special hardware.
• High-end portability – supports IBM mainframes; UNIX MP-RAS; AIX; HP-UX; Windows 98,
Windows NT, Windows 2000, and Windows XP; and Solaris SPARC.
• Efficient, time-saving operation – jobs can continue running in spite of database restarts, dirty data,
and network slowdowns. Jobs can restart with absolutely no intervention.

Creating, Updating, and Loading Teradata Tables from SAS Slide 6-47
• Flexible data management – accepts an infinite variety of data forms from an
infinite number of data sources, including direct feeds from other databases.
TPump is also able to transform that data on the fly before sending it to Teradata.
SQL statements and conditional logic are usable within the utility, making it
unnecessary to write wrapper jobs around it.

Features
• Fast, scalable continuous data loads
• Row hash lock enables concurrent queries
• Dynamic throttling feature
• Best for small data volumes
Teradata Load Utilities – TPT TPUMP
TPT Multi-Statement Insert – Features and Restrictions

SAS supports the TPT API for Multi-Statement insert, also known as the stream driver.
SAS/ACCESS works by interfacing with the stream driver through the TPT API, which in turn uses
the Teradata Multi-Statement insert (TPump) protocol for loading data.
 If SAS cannot find the Teradata modules that are required for the TPT API, or if TPT=NO is
specified, then SAS/ACCESS uses the older Multi-Statement insert method.
 SAS/ACCESS can restart Multi-Statement insert from checkpoints when Multi-Statement
insert uses the TPT API.
 The SAS/ACCESS Multi-Statement insert facility loads both empty and existing Teradata
tables. SAS/ACCESS supports only insert operations and loading only one target table at a
time.
 Errors are logged to Teradata tables. Error recovery can be difficult if you do not set
TPT_CHECKPOINT_DATA= to enable restart from the last checkpoint. You can restart a failed
job from the last checkpoint by following the instructions in the SAS error log.

This is the default Multi-Statement insert method.

Creating, Updating, and Loading Teradata Tables from SAS Slide 6-48
Teradata Load Utilities – TPT TPUMP
TPT TPUMP Example
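The slide's code is not reproduced in this copy; a minimal sketch of a Multi-Statement (stream driver) insert follows. The libref tera and data set work.testdata are assumptions for illustration:

```sas
/* Sketch: stream rows into an existing Teradata table using the TPT API
   Multi-Statement insert (TPump protocol). This works against tables
   that already contain data, unlike FastLoad. Names are assumptions. */
data tera.testdata (TPT=YES MULTISTMT=YES);
   set work.testdata;
run;
```

Because TPump uses row hash locks, such a step can run while other users are querying the target table.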

Creating, Updating, and Loading Teradata Tables from SAS Slide 6-49
Use Fastest Loading Methods
to Load Data into Teradata
Tables
This demonstration illustrates how to use Teradata bulk
loading utilities from SAS programs.

Creating, Updating, and Loading Teradata Tables from SAS Slide 6-50
Exercise
This exercise reinforces the concepts discussed
previously.

Creating, Updating, and Loading Teradata Tables from SAS Slide 6-51
Module 6 – Creating, Updating, and Loading Teradata
Tables from SAS

• Section 6.1 – Creating and Loading Teradata Tables from SAS


• Section 6.2 – Loading Data into Teradata Leveraging Teradata
Load Utilities from SAS
• Section 6.3 – Updating Teradata Tables from SAS

Creating, Updating, and Loading Teradata Tables from SAS Slide 6-52
Updating Teradata Tables from SAS
Row updates to database tables from SAS
 SAS enables PROC SQL updates
 Updates using transaction tables with PROC SQL or a DATA step with the
MODIFY clause

Upserts (simultaneous updates and inserts) are supported


 DATA Step with the MODIFY clause
 PROC SQL approaches instead (explicit pass-through)
 Teradata TPT mload utility as the corresponding Teradata mechanism for high-
volume upserts

* outdated with 9.2;


/* 3. Table Upsert Operation */
* SAS Access to Teradata doesn't support Data Step with Modify for table upserts.;
* Use a multi step Proc SQL approach instead.;
* Teradata MultiLoad Utility is the corresponding Teradata mechanism for high-volume upserts;

/* 4. Table Update and Delete Operation */


* With SAS 8.2, Proc SQL implicit pass-through (IPT) updates and deletes are committed one row
at a time; this is highly inefficient.;
* Better to use explicit pass-through (EPT) syntax for updates and deletes.;

Creating, Updating, and Loading Teradata Tables from SAS Slide 6-53
Updating Teradata Tables from SAS
Row Updates using PROC SQL implicit pass-through
 Issues a single update request for each row

SAS 8.2 committed one row at a time, which is highly inefficient
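A minimal sketch of such an implicit pass-through row update; the libref myTDLib and the table and column names are assumptions for illustration:

```sas
/* Sketch: PROC SQL update against a Teradata libref. With implicit
   pass-through SAS may open a cursor and issue one UPDATE per qualifying
   row, which is why this pattern performed poorly in SAS 8.2.
   Libref, table, and columns are assumptions. */
proc sql;
   update myTDLib.PayrollMasterUP
      set Salary = Salary * 1.03
      where Jobcode = 'PT1';
quit;
```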

Creating, Updating, and Loading Teradata Tables from SAS Slide 6-54
Updating Teradata Tables from SAS
Row Updates using Proc SQL and transaction tables in SAS
The code below identifies a SAS customer process that did not work well with PROC SQL running
against the DBMS.
 The code example will produce multiple selects and updates of the data to satisfy the query,
resulting in multiple passes of the data.
Given the size of the data, this could degrade performance.

Compare the following

/* Update Case 3 - the same SQL-based update performed inside Teradata: for each empid in the
transaction set, a query is issued for each variable to be updated */

proc sql;
   update myTDLib.PayrollMasterUP pmu
      set Gender=(select Gender from MyTDLib.Payrollchanges pc where pmu.empid=pc.empid),
          Jobcode=(select Jobcode from MyTDLib.Payrollchanges pc where pmu.empid=pc.empid),
          Salary=(select Salary from MyTDLib.Payrollchanges pc where pmu.empid=pc.empid),
          DATEOFBIRTH=(select DATEOFBIRTH from MyTDLib.Payrollchanges pc where pmu.empid=pc.empid),
          DATEOFHIRE=(select DATEOFHIRE from MyTDLib.Payrollchanges pc where pmu.empid=pc.empid)
      where pmu.empid in (select empid from MyTDLib.Payrollchanges pc);
      /* where exists
            (select 1 from MyTDLib.Payrollchanges as pc2
             where pmu.empid=pc2.empid); */
quit;
/*
TERADATA_83: Executed: on connection 7
SELECT "EMPID","GENDER","JOBCODE","SALARY","DATEOFBIRTH","DATEOFHIRE" FROM
saseduc."PayrollMasterUP" FOR CURSOR
TERADATA: trget - rows to fetch: 1036

Creating, Updating, and Loading Teradata Tables from SAS Slide 6-55
TERADATA: trqacol- No casting. Raw row size=4, Casted size=4,
CAST_OVERHEAD_MAXPERCENT=20%

TERADATA_85: Executed: on connection 8


SELECT "EMPID" FROM saseduc."Payrollchanges"
TERADATA: trget - rows to fetch: 6
TERADATA: trqacol- No casting. Raw row size=5, Casted size=5,
CAST_OVERHEAD_MAXPERCENT=20%

TERADATA_87: Executed: on connection 2


SELECT "GENDER","EMPID" FROM saseduc."Payrollchanges" WHERE
("EMPID" = '1221' )
TERADATA: trget - rows to fetch: 1
TERADATA: trqacol- No casting. Raw row size=7, Casted size=7,
CAST_OVERHEAD_MAXPERCENT=20%

TERADATA_89: Executed: on connection 3


SELECT "JOBCODE","EMPID" FROM saseduc."Payrollchanges" WHERE
("EMPID" = '1221' )
TERADATA: trget - rows to fetch: 1
TERADATA: trqacol- No casting. Raw row size=12, Casted size=12,
CAST_OVERHEAD_MAXPERCENT=20%

TERADATA_91: Executed: on connection 4


SELECT "SALARY","EMPID" FROM saseduc."Payrollchanges" WHERE
("EMPID" = '1221' )
TERADATA: trget - rows to fetch: 1
TERADATA: trqacol- No casting. Raw row size=8, Casted size=12,
CAST_OVERHEAD_MAXPERCENT=20%

TERADATA_93: Executed: on connection 5


SELECT "DATEOFBIRTH","EMPID" FROM saseduc."Payrollchanges" WHERE
("EMPID" = '1221' )
TERADATA: trget - rows to fetch: 1
TERADATA: trqacol- No casting. Raw row size=8, Casted size=12,
CAST_OVERHEAD_MAXPERCENT=20%

TERADATA_95: Executed: on connection 6


SELECT "DATEOFHIRE","EMPID" FROM saseduc."Payrollchanges" WHERE
("EMPID" = '1221' )
TERADATA: trget - rows to fetch: 1
TERADATA_96: Executed: on connection 7
USING ("GENDER" CHAR (1),"JOBCODE" CHAR (3),"SALARY"
FLOAT,"DATEOFBIRTH" DATE,"DATEOFHIRE"
DATE)UPDATE saseduc."PayrollMasterUP" SET
"GENDER"=:"GENDER","JOBCODE"=:"JOBCODE","SALARY"=:"SALARY","D
ATEOFBIRTH"=:"DATEOFBIRTH","DATEOFH
IRE"=:"DATEOFHIRE" WHERE CURRENT
....
TERADATA_98: Executed: on connection 2
SELECT "GENDER","EMPID" FROM saseduc."Payrollchanges" WHERE
("EMPID" = '1561' )
....
TERADATA_108: Executed: on connection 2
SELECT "GENDER","EMPID" FROM saseduc."Payrollchanges" WHERE
("EMPID" = '1065' )
....
TERADATA_118: Executed: on connection 2
SELECT "GENDER","EMPID" FROM saseduc."Payrollchanges" WHERE
("EMPID" = '1639' )
....TERADATA: trforc: COMMIT WORK
NOTE: 4 rows were updated in MYTDLIB.PayrollMasterUP.

TERADATA: trforc: COMMIT WORK

DBMS_TIMER: summary statistics


DBMS_TIMER: total SQL execution seconds were: 0
DBMS_TIMER: total SQL prepare seconds were: 0
DBMS_TIMER: dbiopen/dbiclose timespan was 1.
TERADATA: trforc: COMMIT WORK

DBMS_TIMER: summary statistics


DBMS_TIMER: total SQL execution seconds were: 0
DBMS_TIMER: total SQL prepare seconds were: 0
DBMS_TIMER: dbiopen/dbiclose timespan was 1.
TERADATA: trforc: COMMIT WORK

DBMS_TIMER: summary statistics


DBMS_TIMER: total SQL execution seconds were: 0
DBMS_TIMER: total SQL prepare seconds were: 0
DBMS_TIMER: dbiopen/dbiclose timespan was 1.
TERADATA: trforc: COMMIT WORK
DBMS_TIMER: summary statistics
DBMS_TIMER: total SQL execution seconds were: 0
DBMS_TIMER: total SQL prepare seconds were: 0
DBMS_TIMER: dbiopen/dbiclose timespan was 1.
TERADATA: trforc: COMMIT WORK

DBMS_TIMER: summary statistics


DBMS_TIMER: total SQL execution seconds were: 0
DBMS_TIMER: total SQL prepare seconds were: 0
DBMS_TIMER: dbiopen/dbiclose timespan was 1.
TERADATA: trforc: COMMIT WORK

DBMS_TIMER: summary statistics


DBMS_TIMER: total SQL execution seconds were: 0
DBMS_TIMER: total SQL prepare seconds were: 0
DBMS_TIMER: total SQL row update seconds were: 0
DBMS_TIMER: dbiopen/dbiclose timespan was 1.
TERADATA: trforc: COMMIT WORK

DBMS_TIMER: summary statistics


DBMS_TIMER: total SQL execution seconds were: 0
DBMS_TIMER: total SQL prepare seconds were: 0
DBMS_TIMER: dbiopen/dbiclose timespan was 0.
211 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 1.90 seconds
user cpu time 0.21 seconds
system cpu time 0.06 seconds
Memory 1512k
OS Memory 11892k
*/
Updating Teradata Tables from SAS
Row Updates using PROC SQL and transaction tables in SAS
TERADATA_2142: Executed: on connection 2
SELECT "EMPID","GENDER","JOBCODE","SALARY","DATEOFBIRTH","DATEOFHIRE" FROM
saseduc."PayrollMasterUP" FOR CURSOR
TERADATA: trget - rows to fetch: 1036

TERADATA_2143: Executed: on connection 2


USING ("GENDER" CHAR (1),"JOBCODE" CHAR (3),"SALARY" FLOAT,"DATEOFBIRTH"
DATE,"DATEOFHIRE" DATE)
UPDATE saseduc."PayrollMasterUP"
SET "GENDER"=:"GENDER", … , "DATEOFHIRE"=:"DATEOFHIRE"
WHERE CURRENT
TERADATA: trforc: COMMIT WORK
NOTE: 4 rows were updated in MYTDLIB.PayrollMasterUP.
NOTE: PROCEDURE SQL used (Total process time): real time 0.46 seconds

Creating, Updating, and Loading Teradata Tables from SAS Slide 6-56
Upserting Teradata Tables from SAS
Upsert Processing using explicit SQL pass-through
In the following example, payroll records stored in a transaction table are matched against the
payroll master table. If a match is found, you want to update the existing master record. If no
match is found, you want to append the transaction record (a new employee) to the master table.
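The slide's code is not reproduced in this copy; the following sketch shows one way to express the upsert with explicit pass-through. It updates the matching master rows first, then inserts the unmatched transaction rows; connection values and table names are assumptions for illustration:

```sas
/* Sketch: multi-step upsert via explicit SQL pass-through. Both
   statements execute entirely in Teradata. Credentials, server, and
   table names are assumptions. */
proc sql;
   connect to teradata (user=test pw=test server=boom);
   execute (                                 /* step 1: update matches */
      update pmu
      from saseduc.PayrollMasterUP as pmu, saseduc.Payrollchanges as pc
      set Salary = pc.Salary, Jobcode = pc.Jobcode
      where pmu.empid = pc.empid
   ) by teradata;
   execute (                                 /* step 2: insert non-matches */
      insert into saseduc.PayrollMasterUP
      select * from saseduc.Payrollchanges pc
      where not exists
            (select 1 from saseduc.PayrollMasterUP pmu
             where pmu.empid = pc.empid)
   ) by teradata;
   execute (commit work) by teradata;
   disconnect from teradata;
quit;
```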

Creating, Updating, and Loading Teradata Tables from SAS Slide 6-57
Upserting Teradata Tables from SAS
Upsert Processing using explicit SQL pass-through

Creating, Updating, and Loading Teradata Tables from SAS Slide 6-58
Upserting Using Teradata Load Utilities
MultiLoad Supported Features and Restrictions
The UPSERT load option, which simultaneously updates and inserts during a load, is also available
with Teradata MultiLoad. TPT also supports upserting.
proc append base=TDSASBOX.PRDTD_MASTER
               (multiload=yes upsert=yes
                upsert_where=(PRODUCT, COUNTRY, YEAR, MONTH)
                keep=COUNTRY ACTUAL ... MONTH)
            data=sashelp.prdsale (keep=COUNTRY ACTUAL ... MONTH) force;
run;
.begin import mload tables sandbox."PRDTD_MASTER" WORKTABLES SAS_ML_WT_1758253940123
ERRORTABLES SAS_ML_ET_175825394012 SAS_ML_UT_17582539401230429826
NOTIFY HIGH EXIT SASMLNE.DLL TEXT '4308 ';
.layout saslayout indicators;
.FIELD "COUNTRY" * CHAR (10); ….
.DML LABEL SASDML DO INSERT FOR MISSING UPDATE ROWS;
UPDATE sandbox."PRDTD_MASTER" SET "ACTUAL"=:"ACTUAL","PREDICT"=:"PREDICT","PRODTYPE"=:"PRODTYPE","QUARTER"=:"QUARTER"
WHERE "COUNTRY"=:"COUNTRY" AND "PRODUCT"=:"PRODUCT" AND "YEAR"=:"YEAR" AND "MONTH"=:"MONTH";
INSERT sandbox."PRDTD_MASTER"("COUNTRY","ACTUAL",…,"MONTH")
VALUES(:"COUNTRY",:"ACTUAL",…,:"MONTH");
.IMPORT INFILE DUMMY AXSMOD SASMLAM.DLL '4308 4308 4308 ' FORMAT UNFORMAT LAYOUT
SASLAYOUT APPLY SASDML;
.END MLOAD;
NOTE: MultiLoad Inserts : 0 MultiLoad Updates : 17280
NOTE: Procedure used: APPEND - (Total process time): real time 14.84 seconds.

proc append base=mytera.greeting (MULTILOAD=YES UPSERT=YES)


data=work.upsert;
run;

Creating, Updating, and Loading Teradata Tables from SAS Slide 6-59
Updating Teradata Tables from
SAS programs
This demonstration illustrates how to use SAS programming
techniques to update Teradata tables.

Creating, Updating, and Loading Teradata Tables from SAS Slide 6-60
Exercise
This exercise reinforces the concepts discussed
previously.

Creating, Updating, and Loading Teradata Tables from SAS Slide 6-61
Copyright © 2009 by SAS Institute Inc. and Teradata Corporation. All Rights Reserved.
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA
and other countries. Teradata and the names of products and services of Teradata Corporation are registered trademarks or trademarks
of Teradata Corporation in the USA and other countries. ® indicates USA registration. Other brand and product names are registered
trademarks or trademarks of their respective companies.

Creating, Updating, and Loading Teradata Tables from SAS Slide 6-62
Module 7
Best Practices for Advanced Integration
Use Cases

Best Practices for Advanced Integration Use Cases Slide 7-1


Module 7 - Best Practices for Advanced Integration
Use Cases

• Section 7.1 – Using Staging and Temporary Tables in Teradata


• Section 7.2 – Teradata Analytical Functions (Optional)
• Section 7.3 – Preparing Data in Teradata for Use within SAS
(Optional)
• Section 7.4 – SAS and Teradata Workload Considerations
(Optional)
• Section 7.5 – Aspects of Security and Administration (Optional)

Best Practices for Advanced Integration Use Cases Slide 7-2


Module 7 - Best Practices for Advanced Integration
Use Cases

• Section 7.1 – Using Staging and Temporary Tables in Teradata


• Section 7.2 – Teradata Analytical Functions (Optional)
• Section 7.3 – Preparing Data in Teradata for Use within SAS
(Optional)
• Section 7.4 – SAS and Teradata Workload Considerations
(Optional)
• Section 7.5 – Aspects of Security and Administration (Optional)

Best Practices for Advanced Integration Use Cases Slide 7-3


Creating Staging or Temporary Teradata Tables
Writing Data to Teradata from SAS

There are several use cases in which a SAS user may


 want to fast upload a staging table to Teradata table and/or make use of Teradata temporary
tables

Use cases examples


 SAS Data Integration flows (ETLT/ELT processing)
 SAS Power Users or Analyst using the Teradata Sandbox concept as a development area
 SAS Applications creating tables as part of their inherent or optional tuning capabilities

• incl. uploading data sets for merging/joining operations in Teradata

Operation examples include


• Uploading data sets for merging/joining operations in Teradata
• Uploading new predictive score tables

NoPI is going to be a great feature for sites which have two general categories of use: 1) ELT, and 2)
sandboxes or user/app-created tables. Both of these have been mentioned previously in this chain. I
would expect that in both cases the default for an organization will be to use NoPI.

For me, this is a defensive feature which prevents runtime breakage. It prevents our DBA team from
having to deal with exceptions and for our development staff to deal with a class of boundary conditions
which have no real solution on TD. Runtime exceptions are the most expensive to deal with from a staff
perspective, particularly when your batch runs in the middle of the night...

Best Practices for Advanced Integration Use Cases Slide 7-4


Using Teradata NoPI Tables
Teradata has No Primary Index (NoPI) Tables
• NoPI tables are particularly useful as staging tables for bulk data loads. Table loads
process even faster and more efficiently using FastLoad or TPUMP.
• NoPI tables are also useful as so-called sandbox tables when an appropriate primary index
has not yet been defined for the primary-indexed table they will eventually populate.
• Creating a NOPI-Table from SAS
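One way to create a NoPI table from SAS is to append NO PRIMARY INDEX to the generated CREATE TABLE statement with the DBCREATE_TABLE_OPTS= data set option; the libref tera and data set names below are assumptions for illustration:

```sas
/* Sketch: fast-load into a NoPI staging table created from SAS.
   DBCREATE_TABLE_OPTS= appends its text to the generated CREATE TABLE.
   Libref and data set names are assumptions. */
data tera.stage_scores (FASTLOAD=YES
        DBCREATE_TABLE_OPTS='NO PRIMARY INDEX');
   set work.scores;
run;
```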

Best Practices for Advanced Integration Use Cases Slide 7-5


Using Teradata NoPI Tables
Teradata No Primary Index (NoPI) Tables
Example – Uploading a SAS Table to Teradata to join it with a Teradata table.
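The slide's code is not reproduced in this copy; a sketch of the pattern follows. A small SAS table is fast-loaded into a NoPI staging table, and the join then executes entirely inside Teradata. All names and connection values are assumptions for illustration:

```sas
/* Sketch: upload a SAS table to a NoPI staging table, then join it with
   a large Teradata table in-database. Names are assumptions. */
libname tera teradata user=test pw=test server=boom;

data tera.cust_subset (FASTLOAD=YES
        DBCREATE_TABLE_OPTS='NO PRIMARY INDEX');
   set work.cust_subset;
run;

proc sql;
   create table work.result as
   select t.cust_id, t.ending_balance
   from tera.accounts t, tera.cust_subset s
   where t.cust_id = s.cust_id;        /* join is pushed to Teradata */
quit;
</imports>
```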

Best Practices for Advanced Integration Use Cases Slide 7-6


Using Teradata Temporary Tables
 Teradata supports two types of temporary tables, global and volatile.
– With the use of global temporary tables, the rows are deleted after the connection is
closed, but the table definition itself remains.
– Their definitions must be stored in the Teradata Dictionary (CREATE GLOBAL TEMPORARY
TABLE …) before they can be referenced. Their definition has general scope, but their
contents have session-local scope.
– Global temporary tables take up temporary (TEMP) space rather than spool space, unlike
volatile tables, so it is best to create global temporary tables if the user has limited spool
space to spare.
– With volatile temporary tables, the table (and all rows) are dropped when the connection is
closed.
– They are created implicitly in the session; their contents and definition have session-local
scope.
– Volatile tables occupy spool space.
 Temporary tables use and require TEMP Space of the database or user area they are created
in.
 There is little performance to be gained by using temporary tables. Additionally,
FastLoad and FastExport do not support use of temporary tables at this time.

Due to a Teradata limitation, FastLoad and FastExport do not support use of temporary tables at this
time.

When accessing a volatile table with a LIBNAME statement, it is recommended that you do not use
these options:

DATABASE= (as a LIBNAME option)


SCHEMA= (as a data set or LIBNAME option)

If you use either DATABASE= or SCHEMA=, you must specify DBMSTEMP=YES in the LIBNAME
statement to denote that all tables accessed through it and all tables that it creates are volatile tables.

DBMSTEMP= also causes all table names to not be fully qualified for either SCHEMA= or
DATABASE=. In this case, you should use the LIBNAME statement only to access tables--either
permanent or volatile--within your default database or schema.

Terminating a Temporary Table


You can drop a temporary table at any time, or allow it to be implicitly dropped when the connection is
terminated. Temporary tables do not persist beyond the scope of a single connection.

Examples
The following example shows how to use a temporary table:

/* Set global connection for all tables. */


libname x teradata user=test pw=test server=boom connection=global;

Best Practices for Advanced Integration Use Cases Slide 7-7


/* Create global temporary table & store in the current database schema. */
proc sql; connect to teradata(user=test pw=test server=boom connection=global);
execute (CREATE GLOBAL TEMPORARY TABLE temp1 (col1 INT ) ON
COMMIT PRESERVE ROWS) by teradata; execute (COMMIT WORK) by teradata;
quit;
/* Insert 1 row into the temporary table to surface the table. */
proc sql; connect to teradata(user=test pw=test server=boom connection=global);
execute (INSERT INTO temp1 VALUES(1)) by teradata; execute (COMMIT WORK)
by teradata; quit;
/* Access the temporary table through the global libref. */
data work.new_temp1; set x.temp1; run;
/* Access the temporary table through the global connection. */
proc sql; connect to teradata (user=test pw=test server=boom connection=global);
select * from connection to teradata (select * from temp1); quit;
/* Drop the temporary table. */
proc sql; connect to teradata(user=prboni pw=prboni server=boom
connection=global); execute (DROP TABLE temp1) by teradata; execute (COMMIT
WORK) by teradata; quit;

This example shows how to use a volatile table:

/* Set global connection for all tables. */


libname x teradata user=test pw=test server=boom connection=global;
/* Create a volatile table. */
proc sql; connect to teradata(user=test pw=test server=boom connection=global);
execute (CREATE VOLATILE TABLE temp1 (col1 INT) ON COMMIT PRESERVE
ROWS) by teradata; execute (COMMIT WORK) by teradata; quit;
/* Insert 1 row into the volatile table. */
proc sql; connect to teradata(user=test pw=test server=boom connection=global);
execute (INSERT INTO temp1 VALUES(1)) by teradata; execute (COMMIT WORK)
by teradata; quit;
/* Access the temporary table through the global libref. */
data _null_; set x.temp1; put _all_; run;
/* Access the volatile table through the global connection. */
proc sql; connect to teradata (user=test pw=test server=boom connection=global);
select * from connection to teradata (select * from temp1); quit;
/* Drop the connection & the volatile table is automatically dropped. */
libname x clear;
/* To confirm that it is gone, try to access it. */
libname x teradata user=test pw=test server=boom connection=global;
/* It is not there. */
proc print data=x.temp1; run;
Using Teradata Temporary Tables
Temporary Tables – Use Cases
 Loading SAS Data into temporary Teradata Tables
– If you need to load temporary data into Teradata but do not want to store and
manage those tables as permanent objects, you can use temporary tables.
– Upload a small SAS table to Teradata and join it with a large Teradata table. Using
NoPI tables might be more efficient for this use case.
 Intermediate Tables in Batch Jobs
– Recommended to use in batch jobs to transfer data from one SQL statement to
another (the usage of temporary tables decreases the cost of processing through
Teradata dictionary maintenance, locking logic and journaling reduction)
– Temporary tables should be used with caution because they cannot survive batch
or system failures: the temporary tables should not be used when the batch
automatic restart is required

Best Practices for Advanced Integration Use Cases Slide 7-8


Using Temporary Teradata Tables
Using Temporary Teradata Tables from SAS
 Teradata session logoff automatically destroys the contents of temporary tables. That’s why
temporary tables cannot be used without specifying CONNECTION=GLOBAL.
 Using that option prevents an automatic LOGON/LOGOFF at the beginning and at the end
of every PROC SQL step.

Best Practices for Advanced Integration Use Cases Slide 7-9


Using Temporary Teradata Tables
Using Temporary Teradata Tables from SAS
 When you specify DBMSTEMP=YES in the LIBNAME statement, all tables created will be
volatile tables. The TEMP space of the database specified in the LIBNAME statement is used for
storing the tables physically.
 When you specify CONNECTION=GLOBAL, you can reference a temporary table throughout a
SAS session.
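The two options above can be combined in a single LIBNAME statement; a minimal sketch follows, with connection values as assumptions for illustration:

```sas
/* Sketch: with CONNECTION=GLOBAL and DBMSTEMP=YES, every table created
   through this libref is a VOLATILE table, visible across steps for the
   whole SAS session. Credentials and server are assumptions. */
libname voltemp teradata user=test pw=test server=boom
        connection=global dbmstemp=yes;

data voltemp.scratch;            /* created as a volatile table */
   set sashelp.class;
run;

proc print data=voltemp.scratch; run;

libname voltemp clear;           /* logoff drops the volatile table */
```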

Best Practices for Advanced Integration Use Cases Slide 7-10


Using Teradata Temporary
Tables from SAS
This demonstration illustrates how to use Teradata
temporary and NoPI tables from SAS programs.

Best Practices for Advanced Integration Use Cases Slide 7-11


Exercise
This exercise reinforces the concepts discussed
previously.

Best Practices for Advanced Integration Use Cases Slide 7-12


Module 7 - Best Practices for Advanced Integration
Use Cases

• Section 7.1 – Using Staging and Temporary Tables in Teradata


• Section 7.2 – Teradata Analytical Functions (Optional)
• Section 7.3 – Preparing Data in Teradata for Use within SAS
(Optional)
• Section 7.4 – SAS and Teradata Workload Considerations
(Optional)
• Section 7.5 – Aspects of Security and Administration (Optional)

Best Practices for Advanced Integration Use Cases Slide 7-13


Teradata Analytical Functions - Overview
Business needs often require you to analyze and relate data from different
aggregation levels within the same report or query.
• Teradata SQL language has a variety of aggregation and univariate
statistical functions that can be used to aggregate and summarize detailed
data.
• Teradata Ordered analytical functions allow complex manipulations and
transformation to create suitable business metrics.

Best Practices for Advanced Integration Use Cases Slide 7-14


Teradata Analytical Functions - Overview
What Are Ordered Analytical Functions?
 Like traditional aggregate functions, window aggregate functions operate on
groups of rows and permit qualification and filtering of the group result. Unlike
aggregations, OLAP functions also return individual detail rows, not just
aggregations.

How They Work


 The window feature is ANSI SQL-99 compliant and provides a way to
dynamically define a subset of data, or window, in an ordered relational
database table.

Target different window of data rows


– All rows
– A specific number of rows
– Row immediately before/after current row
– Row position (7,13) before/after current row
– All rows before/after current row

Best Practices for Advanced Integration Use Cases Slide 7-15


Teradata Analytical Functions - Overview
Traditional SQL calculation compared to ordered analytical function calculations:

Best Practices for Advanced Integration Use Cases Slide 7-16


Teradata Analytical Functions - Overview
Teradata has numerous native functions that perform analysis based upon
ranking and ordering, including
• Moving Difference
• Moving Average
• Cumulative Sum
• Moving Linear Regression
• Quantiling
• Ranking
• Percentile Ranking
• and, many more …

Best Practices for Advanced Integration Use Cases Slide 7-17


Teradata Analytical Functions - Syntax
Ordered Analytical Function – Basic Example

SELECT
   col1,
   …,
   SUM( sales )                 /* aggregate function used as window function */
   OVER (
      PARTITION BY product      /* window grouping (partition by group)       */
      ORDER BY sales DESC       /* window ordering                            */
      ROWS BETWEEN
         UNBOUNDED PRECEDING    /* window boundaries (default boundary)       */
      AND
         UNBOUNDED FOLLOWING
   ) as SalesSum
FROM table;

A window is specified by the OVER() phrase, which can include the following clauses inside the
parentheses:

PARTITION BY, ORDER BY and ROWS

Best Practices for Advanced Integration Use Cases Slide 7-18


Teradata Analytical Functions - Example
SUM(qty)
OVER(
PARTITION BY month
ORDER BY day
ROWS BETWEEN
UNBOUNDED PRECEDING
AND
CURRENT ROW
) as cum_qty

Example - Window defined by


 PARTITION BY clause defining the “grouping” of data
 ORDER BY clause defining the sequence of data
 ROWS BETWEEN defines the window used for calculation
for example, following/preceding/current row unbounded or relative row numbers.

Best Practices for Advanced Integration Use Cases Slide 7-19


Teradata Analytical Functions - Windows
All window aggregate functions fall into one of 4 types:
 Group Window - Aggregates based on a grouping of rows.
– Rows clause: ROWS BETWEEN UNBOUNDED PRECEDING AND
UNBOUNDED FOLLOWING (default)
 Cumulative Window - Aggregates based on accumulation of rows.
– Rows clause: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
 Moving Window - Aggregates based on a moving window of rows.
– Rows clause: ROWS BETWEEN # PRECEDING AND CURRENT ROW
 Remaining Window - Aggregates based on the rows remaining outside of a defined
window.
– Rows clause: ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
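The four window types above can be contrasted in a single query; the table daily_sales and its columns (sales_date, qty) are assumptions for illustration:

```sql
/* Sketch: the four window types side by side over one ordered column.
   Table and column names are assumptions. */
SELECT sales_date, qty,
       SUM(qty) OVER (ORDER BY sales_date
                      ROWS BETWEEN UNBOUNDED PRECEDING
                               AND UNBOUNDED FOLLOWING) AS group_sum,
       SUM(qty) OVER (ORDER BY sales_date
                      ROWS BETWEEN UNBOUNDED PRECEDING
                               AND CURRENT ROW)         AS cum_sum,
       SUM(qty) OVER (ORDER BY sales_date
                      ROWS BETWEEN 2 PRECEDING
                               AND CURRENT ROW)         AS moving_sum_3,
       SUM(qty) OVER (ORDER BY sales_date
                      ROWS BETWEEN CURRENT ROW
                               AND UNBOUNDED FOLLOWING) AS remaining_sum
FROM daily_sales;
```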

Best Practices for Advanced Integration Use Cases Slide 7-20


Example 1 - Group Sum Window Function
The SUM Window function permits an aggregate to be computed across a defined group. The
group may be defined using the PARTITION BY clause. The absence of this clause indicates that
the group consists of all rows.
• Show balances across all account types for all customers and a grand total for the group

Note:
 The window is defined as all rows - there
is no PARTITION specified.
 The final column represents the total of
all rows.
 The default title of the last column
indicates that this is a Group function.

Best Practices for Advanced Integration Use Cases Slide 7-21


Example 1 - Group Sum Window Function
Include the PARTITION BY clause to explicitly group over selected categories.
• Show the totals for each account type for all customers

Note:
 Note that the Group Sum reflects the
total for account type
 Rows are ordered in descending order of
ending_balance within the acct_type
partition group.

Best Practices for Advanced Integration Use Cases Slide 7-22


Example 1 - Group Sum Window Function
Show the totals for each customer across all account types

Note:
 Note that the Group Sum reflects the
total for each customer (cust_id).
 Rows are ordered by ending_balance in
descending order within the cust_id
partition group.
 The ROWS clause specifies the default
and could have been omitted.

Best Practices for Advanced Integration Use Cases Slide 7-23


Example 2 – Cumulative Sum Window
Show the cumulative sum of balances for each row (beginning with the first row).

Note:
 The Cumulative Sum reflects the
sequential aggregation of all
rows.
 The default title of the last column
indicates this is a Cumulative
function.
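A cumulative sum sketch over a hypothetical daily_sales table:

```sql
SELECT tran_date, qty,
       SUM(qty) OVER (
           ORDER BY tran_date
           ROWS UNBOUNDED PRECEDING   -- from the first row up to the current row
       ) AS cumulative_sum
FROM daily_sales;
```

Each row's cumulative_sum equals the previous row's value plus its own qty.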

Best Practices for Advanced Integration Use Cases Slide 7-24


Example 3 – Moving Sum Window
Each row computes a moving sum based on itself and 2 preceding rows.

Note:
 The 1st and 2nd rows compute
their sums based on one and two
rows respectively.
 The default title of the last column
indicates this is a Moving
function.
 Each row is thus a sum of the
ending balance of that specific
row and the two previous rows
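A moving-sum sketch over a hypothetical daily_balance table:

```sql
SELECT tran_date, ending_balance,
       SUM(ending_balance) OVER (
           ORDER BY tran_date
           ROWS BETWEEN 2 PRECEDING AND CURRENT ROW  -- current row plus two preceding rows
       ) AS moving_sum
FROM daily_balance;
```

The first and second rows have fewer than two preceding rows, so their sums cover only the rows that exist, as the note above describes.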

Best Practices for Advanced Integration Use Cases Slide 7-25


Teradata Analytical Functions – Ranking
Ranking Ordered Analytical Functions
Analysis based upon ranking is typically performed for two primary reasons:
• To create an ordinal grouping of records, based upon values of some business
metric
• To assess differences in values of business metrics from ranked sets of records,
ordered sequentially by that metric, or by one or more other business metrics
Example: Rank the top three products in each store and the revenue generated
by them in each store

The Teradata RANK function


• The RANK function is a Teradata-specific function that assigns a ranking order to
rows in a qualified answer set.

Analysis based upon ranking is particularly efficient in Teradata because of its massively parallel
architecture, which spreads work across its many units of parallelism.

Best Practices for Advanced Integration Use Cases Slide 7-26


Example 4 – Teradata Rank Window Function
Example: Rank the top three products in each store and the revenue generated by them in each
store

Note:
 The PARTITION BY clause defines the scope
of the ranking (“rank within”).
In this case, the ranking is by ending_balance
on different account types per customer
 Without PARTITION BY, scope would default
to ending_balance for all customers.
 The QUALIFY clause limits the results to the
top two balance amounts for each customer
and the sort sequence of balance amount is
descending.
 Due to PARTITION BY, the sort is by
ending_balance (DESC) for each customer.
 No aggregation takes place in this query.
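The products-per-store ranking stated in the example title might be written as the following sketch (store_revenue and its columns are hypothetical names, not the course data set):

```sql
SELECT store_id, product_id, revenue
FROM store_revenue
QUALIFY RANK() OVER (
            PARTITION BY store_id   -- rank within each store
            ORDER BY revenue DESC   -- highest revenue gets rank 1
        ) <= 3;                     -- keep the top three products per store
```

QUALIFY filters on the ordered analytical function the way HAVING filters on aggregates, so no subquery is needed.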

Best Practices for Advanced Integration Use Cases Slide 7-27


Using Teradata Ordered
Analytical Functions
This demonstration illustrates how to use Teradata
ordered analytical functions in SAS.

Best Practices for Advanced Integration Use Cases Slide 7-28


Exercise
This exercise reinforces the concepts discussed
previously.

Best Practices for Advanced Integration Use Cases Slide 7-29


Module 7 - Best Practices for Advanced Integration
Use Cases

• Section 7.1 – Using Staging and Temporary Tables in Teradata


• Section 7.2 – Teradata Analytical Functions (Optional)
• Section 7.3 – Preparing Data in Teradata for Use within SAS
(Optional)
• Section 7.4 – SAS and Teradata Workload Considerations
(Optional)
• Section 7.5 – Aspects of Security and Administration (Optional)

Best Practices for Advanced Integration Use Cases Slide 7-30


Preparing Data in Teradata for Use within SAS
SAS Technique: SAS has an excellent set of simple and complex functions available
to create new variables or transform existing ones.
• SAS is very efficient at handling staging or temporary tables and at memory- or
compute-intensive transformations.

Teradata Alternative: Teradata is a decision-support oriented database that also has
a very deep set of functions available to create new variables or transform existing
ones.
• SAS users should become familiar with Teradata functions in order to leverage
them when it is applicable.
• Data can be prepared in Teradata using SQL to summarize, aggregate, and
transform existing values on tables that then can be used in SAS for further
analysis. This step leverages Teradata capabilities to deal with very large volumes
of data.

Best Practices for Advanced Integration Use Cases Slide 7-31


Preparing Data in Teradata
Using relatively simple SQL statements, functions can be applied to data in very large
tables leveraging Teradata parallelism and speed.

Data preparation in SQL can involve


• ranking and quantiling functions
• variable transformation / derivation functions
• complex analysis functions.

Use Explicit Pass-Through for these data manipulations

Best Practices for Advanced Integration Use Cases Slide 7-32


Preparing Data in Teradata
Some of the basic components of a univariate analysis on numeric columns can be
obtained using the following functions:
• COUNT(*) gives us N (the total row count)
• MIN(colx) gives us the minimum value
• MAX(colx) gives us the maximum value
• SUM(colx) gives us the sum
• AVG(colx) gives us the mean
• STDDEV_POP(colx) gives us the standard deviation (population value)
• STDDEV_SAMP(colx) gives us the standard deviation (sample estimate)
• VAR_POP(colx) gives us the variance (population value)
• VAR_SAMP(colx) gives us the variance (sample estimate)
• SKEW(colx) gives us the skewness
• KURTOSIS(colx) gives us the kurtosis
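These can be combined into a single profiling query pushed to Teradata; a sketch against a hypothetical table mytable and numeric column colx:

```sql
SELECT COUNT(*)          AS n,
       MIN(colx)         AS min_value,
       MAX(colx)         AS max_value,
       SUM(colx)         AS sum_value,
       AVG(colx)         AS mean_value,
       STDDEV_SAMP(colx) AS std_dev,    -- sample estimate
       VAR_SAMP(colx)    AS variance,   -- sample estimate
       SKEW(colx)        AS skewness,
       KURTOSIS(colx)    AS kurtosis
FROM mytable;
```

One pass over the table returns the whole univariate profile as a single row.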

Best Practices for Advanced Integration Use Cases Slide 7-33


Preparing Data in Teradata – Transformation
Data Manipulation in Teradata
• Using SELECT statements and simple IF-THEN-ELSE logic (expressed with CASE
expressions), Teradata allows you to perform:
– Variable Recoding and Transformation
– Dummy Coding or Contrast Coding
– Variable Rescaling
– Ordered Analytical Functions
• These data manipulation techniques can be used together with aggregation
techniques to prepare suitable data for a further analysis and manipulation using
SAS tools.

Best Practices for Advanced Integration Use Cases Slide 7-34


Preparing Data in Teradata – Transformation
Data Manipulation in Teradata
• In SAS, it is common to use sets of IF-THEN-ELSE statements to create these
derived variables.
• In SQL, a CASE expression is used to obtain the same result.
CASE WHEN logical_conditions THEN 1
     ELSE 0
END AS new_var_name

Creation of a numeric 1/0 flag

• A CASE statement can be easily included in your Explicit Pass-Thru SQL, within a
SELECT statement.
• Transformed variables can be part of your output result set, and also can be used
in subsequent variable specifications in the same SELECT statement.
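A sketch of such a pass-through step (the server, credentials, the accounts table, and the 10000 threshold are all placeholder assumptions):

```sas
proc sql;
   connect to teradata (server=teraserver user=myuser password=XXXXX);
   create table work.flagged as
   select * from connection to teradata (
      select cust_id,
             ending_balance,
             /* derive a 1/0 flag inside Teradata, before data reaches SAS */
             case when ending_balance > 10000 then 1
                  else 0
             end as high_balance_flag
      from accounts
   );
   disconnect from teradata;
quit;
```

The CASE expression runs entirely in Teradata; only the result set, including the derived flag, is returned to the SAS work library.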

Best Practices for Advanced Integration Use Cases Slide 7-35


Preparing Data in Teradata – Transformation
Data Manipulation in Teradata
Teradata has many analytical data transformation functions that are easy to use and
parameterize.
A few of the more commonly used SQL functions are shown here:
• LN() Natural Logarithm
• LOG() Base 10 Log
• SQRT() Square Root
• EXP() Exponentiation
There are also several aggregate statistical measures that can be calculated:
• CORR(x,y) Correlation coefficient (for variables x and y)
• COVAR_SAMP(x,y) / COVAR_POP(x,y) Covariance (sample / population)
• REGR_R2(y,x) Regression R-square (y regressed on x)
• REGR_SLOPE(y,x) Regression slope (y regressed on x)
• REGR_INTERCEPT(y,x) Regression intercept (y regressed on x)
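A sketch computing these measures for two hypothetical numeric columns x and y (note that Teradata spells the intercept function REGR_INTERCEPT and provides sample and population covariance variants):

```sql
SELECT CORR(x, y)           AS corr_xy,
       COVAR_SAMP(x, y)     AS covar_xy,
       REGR_R2(y, x)        AS r_square,   -- y regressed on x
       REGR_SLOPE(y, x)     AS slope,
       REGR_INTERCEPT(y, x) AS intercept
FROM mytable;
```

The REGR_* functions take the dependent variable first, so REGR_SLOPE(y, x) is the slope of y on x.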

Best Practices for Advanced Integration Use Cases Slide 7-36


Preparing Data in Teradata – Transformation
Data Manipulation in Teradata
Rescaling a continuous numeric variable is performed by specifying a new value for
its upper boundary and/or a new value for its lower boundary.
You then apply a simple linear rescaling function that modifies all values of that
variable so that:
1) The maximum value equals your new upper boundary, and/or the minimum
value equals your new lower boundary
2) Relative differences among values remain unchanged within your newly
defined scale (range).
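In symbols: new_value = new_lo + (x - min) * (new_hi - new_lo) / (max - min). As a sketch, rescaling a hypothetical income column to the range [0, 1] (table and column names are illustrative):

```sql
SELECT cust_id, income,
       /* min-max rescaling to [0, 1]; NULLIFZERO guards against a constant column */
       (income - min_inc) / NULLIFZERO(max_inc - min_inc) AS income_rescaled
FROM (
   SELECT cust_id, income,
          MIN(income) OVER () AS min_inc,   -- window functions avoid a second table scan
          MAX(income) OVER () AS max_inc
   FROM customers
) AS t;
```

Because the rescaling is linear, relative differences among values are preserved within the new range.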

Best Practices for Advanced Integration Use Cases Slide 7-37


Preparing Data in Teradata (Reference)
Two useful reference guides:
• Teradata 16 “SQL Reference: Functions and Operators”
• Teradata 16 “SQL Reference: Data Manipulation Statements”

The “SQL Reference: Functions and Operators” manual contains an exhaustive list of
Teradata functions, especially the sections entitled:
• Aggregate Functions
• Ordered Analytical Functions
• Arithmetic Operators and Functions / Trigonometric and Hyperbolic Functions

Best Practices for Advanced Integration Use Cases Slide 7-38


Use Teradata SQL to Prepare
Data for Use in SAS
This demonstration illustrates how to prepare data
using explicit SQL pass-through and Teradata SQL for
use in SAS.

Best Practices for Advanced Integration Use Cases Slide 7-39


Exercise
This exercise reinforces the concepts discussed
previously.

Best Practices for Advanced Integration Use Cases Slide 7-40


Module 7 - Best Practices for Advanced Integration
Use Cases

• Section 7.1 – Using Staging and Temporary Tables in Teradata


• Section 7.2 – Teradata Analytical Functions (Optional)
• Section 7.3 – Preparing Data in Teradata for Use within SAS
(Optional)
• Section 7.4 – SAS and Teradata Workload Considerations
(Optional)
• Section 7.5 – Aspects of Security and Administration (Optional)

Best Practices for Advanced Integration Use Cases Slide 7-41


General Areas for Improvement
Give SAS Teradata programs a “tune-up”
• Look for bottlenecks and move parts of process up/downstream
• Use Teradata for production data processing, reporting, and scoring functions

Re-architect and migrate processes to the appropriate platform


• Teradata for big data exploration, processing, and analytical work
• SAS for more complex, highly-iterative analytical work on smaller samples

Best Practices for Advanced Integration Use Cases Slide 7-42


SAS and Teradata Workload Considerations
If possible, join Teradata tables in Teradata and not in SAS
• the join happens faster in Teradata
• you will not unnecessarily take up disk, memory, and processor resources on the
SAS server

Reduce strain on SAS server disk, memory, and processor resources by using
Teradata SQL to perform:
• Sampling
• Data exploration and quality checks
• Data summarization and aggregation
• Variable creation and transformation

Make liberal use of EXPLAIN (interpret with caution)

Best Practices for Advanced Integration Use Cases Slide 7-43


Finding Bottlenecks
Examine SAS logs and use SAS diagnostics
The following SAS OPTIONS statement is useful:
OPTIONS DEBUG=DBMS_TIMERS              /* Returns timing and utilization from the database */
        SASTRACE=',,,d'                /* Requests that SAS generate a trace for all SQL steps */
        SASTRACELOC=SASLOG NOSTSUFFIX; /* Tells SAS to write the trace to the program log */

Look in log for:


• Short DBMS run times with long SAS step completion time (network contention)
• Huge amount of time spent sorting data (SAS doing sorting)
• Inadvertent extract of Teradata table to SAS for local processing (mixing Teradata and
SAS functions in PROC or DATA step)

For explicit SQL pass-thru recall that you can and should also use the EXPLAIN option to
see how efficient your SQL will be.

Best Practices for Advanced Integration Use Cases Slide 7-44


Avoiding Bottlenecks
Reducing Network Contention
• Reduce the amount of data being transferred, or
• Reduce the number of times you perform the transfer
• Choose the right processing “window”
• To reduce the amount of data being transferred
• Perform pre-processing, aggregation, column reduction, and sampling in
Teradata, bring only what is required to SAS
• Generate reporting answer sets in Teradata, format and present in SAS
• Push all production model scoring processes into Teradata
• For iterative analytical routines, sample and extract once into SAS data set
format, perform analysis on SAS data set

Best Practices for Advanced Integration Use Cases Slide 7-45


Avoiding Bottlenecks
Avoid Pulling all Data Back to SAS
• Inadvertently done by violating “rules of the road”
• Mistakenly done when users believe that they are building a local copy of data
set when they run a procedure
• Biggest offenders
– SAS data set options in the wrong place
– Mixing SAS options in implicit SQL pass-thru routines
– Lack of understanding of default merge behavior

Best Practices for Advanced Integration Use Cases Slide 7-46


Avoiding Bottlenecks
Merging (Join)
• If all tables are currently in Teradata, leave them there
• For other scenarios consider
– Final “resting place” of the data
– Network throughput
– Cost of sorting in SAS

Sampling
• SAS sampling versus Teradata sampling
– With the SAS sampling function, the entire raw data set needs to be downloaded
before the sample is taken
– Best strategy is Teradata sampling via explicit SQL pass-through (more in next
session)
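A sketch of Teradata-side sampling through explicit pass-through (server, credentials, and the bigtable name are placeholders; SAMPLE 0.10 requests a 10% random sample, while an integer such as SAMPLE 1000 would request a fixed row count):

```sas
proc sql;
   connect to teradata (server=teraserver user=myuser password=XXXXX);
   create table work.sample10 as
   select * from connection to teradata (
      select *
      from bigtable
      sample 0.10        /* 10% random sample, drawn inside Teradata */
   );
   disconnect from teradata;
quit;
```

Only the sampled rows cross the network, so SAS never has to hold the full table.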

Best Practices for Advanced Integration Use Cases Slide 7-47


Note on Teradata Statistics
The Teradata Optimizer plans an execution strategy for each query submitted to the
Teradata RDBMS. Among other things, the Optimizer determines:
• Access Path (use an index, table scan, dynamic bitmap, etc.)
• Join Method (how tables are joined – merge join, product join, hash join, nested
join)
• Join Order (the sequence of table joins)
The Optimizer chooses the optimum strategy using information on:
• Environment (such as the number of nodes and AMPs)
• Data Demographics (such as the number of rows in the table)

Best Practices for Advanced Integration Use Cases Slide 7-48


Note on Teradata Statistics
The best way to assure that the Optimizer (Parsing Engine) has all the information it
needs to generate optimum execution strategies is to COLLECT STATISTICS.
Statistics collection will improve performance when joining and working with Teradata
tables.
It is possible to collect statistics on indexes (combinations of columns).
Advantages include:
• PE plans to fetch the data required in the least costly way
• Higher confidence level of PE in proposed plan to fetch data
• Saves the overhead cost of Dynamic Amp sampling to access data

Best Practices for Advanced Integration Use Cases Slide 7-49


Note on Teradata Statistics
Syntax:

COLLECT STATISTICS ON tablename INDEX (columnname/s);


COLLECT STATISTICS ON tablename COLUMN (columnname/s);
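With hypothetical table and column names, the statements might look like:

```sql
COLLECT STATISTICS ON customer_accounts COLUMN (cust_id);
COLLECT STATISTICS ON customer_accounts INDEX (cust_id, acct_type);
```

The INDEX form collects multi-column statistics that help the Optimizer cost joins on that column combination.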

Discuss with the DBA the best strategy for statistics collection.

Statistics should be collected for all tables that are frequently used, especially when
joins are involved.

The process needs data governance.

Best Practices for Advanced Integration Use Cases Slide 7-50


Copyright © 2009 by SAS Institute Inc. and Teradata Corporation. All Rights Reserved.
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA
and other countries. Teradata and the names of products and services of Teradata Corporation are registered trademarks or trademarks
of Teradata Corporation in the USA and other countries. ® indicates USA registration. Other brand and product names are registered
trademarks or trademarks of their respective companies.

Best Practices for Advanced Integration Use Cases Slide 7-51


Module 7 - Best Practices for Advanced Integration
Use Cases

• Section 7.1 – Using Staging and Temporary Tables in Teradata


• Section 7.2 – Teradata Analytical Functions (Optional)
• Section 7.3 – Preparing Data in Teradata for Use within SAS
(Optional)
• Section 7.4 – SAS and Teradata Workload Considerations
(Optional)
• Section 7.5 – Aspects of Security and Administration (Optional)

Best Practices for Advanced Integration Use Cases Slide 7-52


Aspects of Security and Administration
Aspects of SAS and Teradata integration relevant to SAS/ACCESS to Teradata
architects and/or administrators include:
 Advanced SAS/ACCESS connection configuration
 Enabling Security and Single Sign-On features
 Usage and Deployment of SAS Functions in Teradata

Best Practices for Advanced Integration Use Cases Slide 7-53


SAS/ACCESS to Teradata Connections Options
Options changing the default connection behavior
 DEFER=NO|YES. Determines when the Teradata connection occurs.
– NO - The connection occurs when the libref is assigned.
– YES - The connection occurs when a table in the database is opened.
 DBPROMPT=NO|YES. Opens a window that prompts the user to enter connection
information before connecting to Teradata.
 CONNECTION=UNIQUE | SHAREDREAD. Specifies whether operations on a single
libref share a connection.
– UNIQUE - specifies that a separate connection is established every time a DBMS
table is accessed by your SAS application.
– SHAREDREAD – specifies that all READ operations that access DBMS tables in a
single libref share a single connection. A separate connection is established for every
table that is opened for update or output operations.
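Combining these options in one (hypothetical) LIBNAME statement:

```sas
libname mytera teradata server=teraserver user=myuser password=XXXXX
        defer=yes           /* connect only when a table is first opened */
        connection=unique;  /* a separate connection per table access    */
```

The defaults (DEFER=NO, CONNECTION=SHAREDREAD) suit most read-mostly workloads; override them when connection timing or isolation matters.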

Best Practices for Advanced Integration Use Cases Slide 7-54


SAS/ACCESS to Teradata Connections Options
To avoid stating user and password credentials explicitly within the SAS
program, beginning with SAS 9.2 you can instead use the AUTHDOMAIN= LIBNAME
option.
 With this option, the appropriate Teradata credentials are automatically
retrieved from the SAS Metadata Server at run time.
 Thus, specifying credentials in program code can be avoided.

LIBNAME mytera TERADATA SERVER=teraserver AUTHDOMAIN=TERAAUTH;

Using the AUTHDOMAIN= option you can retrieve USER= and PASSWORD= information from an
authentication domain stored in your SAS Metadata Server. To the engine, it appears that the USER=
and PASSWORD= options were specified on the LIBNAME statement.

Best Practices for Advanced Integration Use Cases Slide 7-55


Using Teradata Query Banding with SAS
Teradata Query Banding* is supported to enable more advanced security and workload-
management capabilities.
A Query Band is a custom string passed to the Teradata system containing, for example,
user name and role information.
This enables dynamic management of security permissions, roles, and workload
management within Teradata.

libname myTDLib teradata
   Query_Band='role=demorole;app=SAS;';
Proc SQL;
   select * from myTDLib.order_fact where employee_id=120444;
Quit;

TERADATA_1: Executed: on connection 1
SET QUERY_BAND='role=demorole; app=SAS;' FOR SESSION;

TERADATA_2: Prepared: on connection 1
SELECT * FROM sasorion."order_fact …..

* Available with SAS 9.2 M2

Best Practices for Advanced Integration Use Cases Slide 7-56


Using Teradata Query Banding with SAS
Teradata Query Banding* is supported to enable more advanced security and
workload-management capabilities.
Enhancements with Teradata 13
• The connection between SAS and Teradata can be established under a shared
user ID,
• However, by passing a Query-Band string to the Teradata system containing
user and role information, the execution of the query request is completely
switched to the user's context (security, workload specification, …)
• On the SAS side, you will need to securely prepare the trusted server connection
credentials and dynamically pass the Query-Band string depending on the
requesting client-user ID.

Best Practices for Advanced Integration Use Cases Slide 7-57


Using SAS Functions in a Teradata EDW
Teradata environments
– Teradata EDWs are increasingly recognized as mission-critical, highly available
environments.
SAS Vendor Defined Functions for Teradata
– The new In-Database processing functions provided by SAS represent features that
have been developed with the same high quality standards as all SAS tools and
solutions.
Deployment process for SAS Functions
– Deployment of new functions, such as SAS scoring models and SAS formats,
should be thoughtfully managed.
– Might be deployed to SYSLIB or usage specific databases.
– Limiting access to functions to a small class of users, or running functions in protected
mode, may be part of the deployment strategy.

Best Practices for Advanced Integration Use Cases Slide 7-58


Using SAS Functions in a Teradata EDW
Teradata Resource Management
– All User- and Vendor-Defined Functions run under the control of Teradata workload
management. Resources are controlled in both protected and not protected mode.
– User- and Vendor-Defined Functions can be thought of as subroutines that are run
within the context of any SQL request.

Protected Execution
– Functions support two different execution modes, PROTECTED and NOT
PROTECTED. Protected mode grants higher isolation at the cost of slower
performance.
– Use PROTECTED mode to gain confidence in the solution during testing. Then measure
the performance difference, and plan deployment based on application performance
requirements before running in NOT PROTECTED mode.

Best Practices for Advanced Integration Use Cases Slide 7-59


Deployment Locations for SAS Functions
Where should SAS functions be deployed to?
– SAS Scoring Model Functions created by SAS Enterprise Miner Analysts
should be published to the database where the application sessions run –
typically a views database.
– SAS Formats (and functions like SAS_PUT) created by SAS users should also be
published to the database where the application sessions run, or to a user's
database for their personal format library (like a sandbox environment).
– SAS Analytics Accelerator Functions as static SAS software components are
shared by all users on a system
so they should be installed in SYSLIB.

Best Practices for Advanced Integration Use Cases Slide 7-60


Deployment Process for SAS Functions
Phased Implementation Methodology
• Multiple Teradata environments for validating different phases of a new analytic
process
• Ideal for companies with carefully managed Production EDW instance

Sandbox Implementation Methodology


• Provide segregated database environments within the same Teradata system to
Business Analysts with advanced database privileges

Both approaches require coordination, and defined processes, between analysts and
DBAs.

Best Practices for Advanced Integration Use Cases Slide 7-61


Deployment Process for SAS Functions
Phased Implementation Methodology

DEV Environment (SAS Modeler)
– Limited amount of data, used only by SAS Analysts and database solution
developers for iterative R&D
– Deploy function from the analyst workspace
– Run in Protected mode, confirm valid result
– Include “negative” testing (end cases and exceptions)
– Run in Unprotected mode, confirm valid result

TEST Environment (SAS Modeler)
– More substantial amount of data, used to test for valid results and to measure
performance
– Deploy function from software source control
– Run in Protected mode, confirm valid result
– Run in Unprotected mode, confirm valid result
– Determine desired production execution mode based on performance results

PROD Environment (SAS DBA)
– Full production data volumes, used by all Operational Users
– Deploy function from software source control
– Run in selected mode, confirm valid result on test data
– Run production

An iterative function deployment methodology can provide confidence and quiet concerns about
potential problems.

Helps “prove” to Teradata DBAs the validity of SAS embedded function capabilities.

Helps prove to SAS Analysts that correct statistical results are produced before running on
millions/billions/trillions of rows.

No different from other application development projects.

Satisfies the need for in-database performance testing, and informed decisions about UDF execution
mode, before installing into the production data warehouse.

Best Practices for Advanced Integration Use Cases Slide 7-62


Deployment Process for SAS Functions
Sandbox Implementation Methodology
– Provide segregated database environments within the same Teradata system
– Business Analysts have increased DBMS privileges
– Condense the testing involved into 1-2 testing steps
– Publishing of SAS functions can be restricted initially to an Analytic Sandbox
area that exists on a combined DEV/TEST environment for testing and validation
(start PROTECTED)
– Functions can then be published, under Teradata DBA control, to the PROD
environment after testing has been completed

Best Practices for Advanced Integration Use Cases Slide 7-63


Deployment Process for SAS Functions
SAS functions have been created using the same thorough development process used for
all SAS tools and solutions.
Carefully plan the deployment of SAS In-Database functions, as you would for any new
application.
Deploy functions using an iterative install, test, run methodology.
– Starting out by exercising the components in a test environment.
– Move to production as new capabilities are validated.
Grant rights to deploy functions to a small class of analytic users who understand what
they do and the DBMS administration implications of installing, using, and maintaining
them.
Leverage the PROTECTED / NOT PROTECTED execution mode capability of the Teradata
UDF framework, balancing the performance needs of the application with the level of
testing required to validate the solution for your organization.

Best Practices for Advanced Integration Use Cases Slide 7-64
