
Teradata Warehouse Miner

User Guide - Volume 2

ADS Generation
Release 05.01.00
B035-2301-077A

Teradata Development Division,


Teradata Application Engineering
The product described in this book is a licensed product of Teradata, a division of NCR Corporation

NCR, Teradata and BYNET are registered trademarks of NCR Corporation.


TeraMiner is a trademark of NCR Corporation.
Intel, Pentium, and XEON are registered trademarks of Intel Corporation.
Linux is a registered trademark of Linus Torvalds.
Microsoft, Windows, Windows Server, Windows NT, Windows Vista, Visual Studio and Excel are either registered
trademarks or trademarks of Microsoft Corporation in the United States and/or other countries.
SAS and SAS/C are registered trademarks of SAS Institute Inc.
SPSS is a registered trademark of SPSS Inc.
STATISTICA and StatSoft are trademarks or registered trademarks of StatSoft, Inc.
Sun Microsystems, Sun Java, Solaris, SPARC, and Sun are trademarks or registered trademarks of Sun Microsystems,
Inc. in the U.S. or other countries.
Unicode is a registered trademark of Unicode, Inc.
UNIX is a registered trademark of The Open Group in the US and other countries.
Other product and company names mentioned herein may be the trademarks of their respective owners.

THE INFORMATION CONTAINED IN THIS DOCUMENT IS PROVIDED ON AN “AS-IS” BASIS, WITHOUT WARRANTY OF ANY KIND, EITHER
EXPRESS OR IMPLIED, INCLUDING THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR
NON-INFRINGEMENT. SOME JURISDICTIONS DO NOT ALLOW THE EXCLUSION OF IMPLIED WARRANTIES, SO THE ABOVE EXCLUSION
MAY NOT APPLY TO YOU. IN NO EVENT WILL NCR CORPORATION (NCR) BE LIABLE FOR ANY INDIRECT, DIRECT, SPECIAL,
INCIDENTAL OR CONSEQUENTIAL DAMAGES, INCLUDING LOST PROFITS OR LOST SAVINGS, EVEN IF EXPRESSLY ADVISED OF THE
POSSIBILITY OF SUCH DAMAGES.

The information contained in this document may contain references or cross references to features, functions, products, or
services that are not announced or available in your country. Such references do not imply that NCR intends to announce
such features, functions, products, or services in your country. Please consult your local NCR representative for those
features, functions, products, or services available in your country.

Information contained in this document may contain technical inaccuracies or typographical errors. Information may be
changed or updated without notice. NCR may also make improvements or changes in the products or services described
in this information at any time without notice.

To maintain the quality of our products and services, we would like your comments on the accuracy, clarity, organization,
and value of this document. Please e-mail: teradata-books@lists.ncr.com

Any comments or materials (collectively referred to as “Feedback”) sent to NCR will be deemed non-confidential. NCR will
have no obligation of any kind with respect to Feedback and will be free to use, reproduce, disclose, exhibit, display,
transform, create derivative works of and distribute the Feedback and derivative works thereof without limitation on a
royalty-free basis. Further, NCR will be free to use any ideas, concepts, know-how or techniques contained in such
Feedback for any purpose whatsoever, including developing, manufacturing, or marketing products or services
incorporating Feedback.

Copyright © 1999 - 2007


By NCR Corporation
Dayton, Ohio U.S.A.
All Rights Reserved
Preface

About This Manual

This publication describes how to use the features and functions of NCR’s Teradata Warehouse
Miner, Release 5. All information required to use the analytic functions in the Teradata
Warehouse Miner product is provided in this manual. Teradata Warehouse Miner is a set of
Microsoft® .NET™ interfaces and a multi-tier user interface that together help you understand the
quality of data residing in a Teradata® database, create analytic data sets, and build and score
analytic models directly in the Teradata database.

Who Should Read This Manual


This manual is written for users of the Teradata Warehouse Miner product. It guides the
end user through each analytic function available from the Teradata Warehouse Miner user
interface. You should be familiar with Teradata SQL, the operation and administration of the
Teradata RDBMS, and statistical techniques. Users should also be familiar with the
Microsoft® Windows® 2000 or Microsoft® Windows® XP operating environment and standard
Microsoft® Windows® operating techniques.

How This Manual is Organized



This manual is organized as follows:

Chapter 1 “Data Reorganization” describes the Denorm, Join, Merge, Partition, and
Sample analyses.

Chapter 2 “Analytic Data Sets” describes the Variable Creation, Variable


Transformation and Build Data Set analyses available with Teradata
Warehouse Miner.

Chapter 3 “Matrix Functions” describes how to use the Teradata Warehouse Miner
Matrix Functions to build and export a Correlation, Covariance, or Sums of
Squares and Cross Products matrix.

Chapter 4 “Scoring” describes how to use the Teradata Warehouse Miner Predictive
Model Markup Language (PMML) Scoring analysis.

Chapter 5 “Publishing” describes how to use the Teradata Warehouse Miner Publish
analysis to make the SQL representing models and analytic data sets
available to the Model Manager application.

Conventions Used in this Manual


The following typographical conventions are used in this guide:

Convention Description
Italic text Titles (especially screen names/titles); new terms introduced for emphasis
© 1999-2007 NCR Corporation, All Rights Reserved ii



Monospace Code samples and output
ALL CAPS Acronyms
Bold Screen items, especially items you click on or highlight when following a procedure

Related Documents and Other Sources of Information


Teradata documentation related to the use of Teradata Warehouse Miner, including
documentation for the Teradata ODBC Driver for Windows, is available from www.info.ncr.com.

Support Information

How to Get Support


For information regarding support program availability, please contact the local Account Team.
Telephone assistance may be obtained through NCR’s Teradata Solutions Global Support Center
(TSGSC) at either of the following numbers:

Americas RCCA

For Service Levels: Enhanced/Business Critical


1-800-531-2222 (PIN number required)

MSC – Atlanta

For Service Levels: Standard / None / Time & Materials / Unknown


1-800-262-7782

Table of Contents

ABOUT THIS MANUAL ........................................................................................................................... II


WHO SHOULD READ THIS MANUAL ........................................................................................................... II
HOW THIS MANUAL IS ORGANIZED ............................................................................................................ II
CONVENTIONS USED IN THIS MANUAL ....................................................................................................... II
RELATED DOCUMENTS AND OTHER SOURCES OF INFORMATION ............................................................... III
SUPPORT INFORMATION ............................................................................................................................. III
How to Get Support .............................................................................................................................. iii
TABLE OF CONTENTS ........................................................................................................................... IV

1. DATA REORGANIZATION .................................................................................................................. 8


DENORM ................................................................................................................................................... 10
Initiate a Denorm Analysis .................................................................................................................. 11
Denorm - INPUT - Data Selection ...................................................................................................... 12
Denorm - INPUT - Analysis Parameters............................................................................................. 12
Denorm - INPUT - Expert Options...................................................................................................... 14
Denorm - OUTPUT ............................................................................................................................. 14
Run the Denorm Analysis .................................................................................................................... 15
Results - Denorm ................................................................................................................................. 15
Denorm - RESULTS - Data ................................................................................................................. 16
Denorm - RESULTS - SQL .................................................................................................................. 16
Tutorial - Denorm................................................................................................................................ 16
JOIN .......................................................................................................................................................... 18
Initiate a Join Analysis ........................................................................................................................ 18
Join - INPUT - Data Selection ............................................................................................................ 19
Join - INPUT – Join Columns ............................................................................................................. 19
Join - INPUT - Analysis Parameters ................................................................................................... 20
Join - INPUT - Expert Options............................................................................................................ 20
Join - OUTPUT - Storage.................................................................................................................... 21
Join - OUTPUT - Primary Index ......................................................................................................... 22
Run the Join Analysis .......................................................................................................................... 22
Results - Join ....................................................................................................................................... 22
Join - RESULTS - Data ....................................................................................................................... 22
Join - RESULTS - SQL ........................................................................................................................ 23
Output Columns – Join Analysis ......................................................................................................... 23
Tutorial – Join Analysis....................................................................................................................... 23
MERGE ...................................................................................................................................................... 25
Initiate a Merge Analysis..................................................................................................................... 25
Merge - INPUT - Data Selection ......................................................................................................... 26
Merge - INPUT - Analysis Parameters ............................................................................................... 27
Merge - INPUT - Expert Options ........................................................................................................ 27
Merge - OUTPUT - Storage ................................................................................................................ 27
Merge - OUTPUT - Primary Index ..................................................................................................... 28
Run the Merge Analysis ....................................................................................................................... 29
Results - Merge.................................................................................................................................... 29
Merge - RESULTS - Data.................................................................................................................... 29
Merge - RESULTS - SQL..................................................................................................................... 29
Output Columns – Merge Analysis ...................................................................................................... 30
Tutorial – Merge Analysis ................................................................................................................... 30
PARTITION ................................................................................................................................................ 31

Initiate a Partition Analysis................................................................................................................. 31


Partition - INPUT - Data Selection ..................................................................................................... 32
Partition - INPUT - Analysis Parameters ........................................................................................... 33
Partition - INPUT - Expert Options .................................................................................................... 33
Partition - OUTPUT - Storage ............................................................................................................ 34
Partition - OUTPUT - Primary Index ................................................................................................. 34
Run the Partition Analysis ................................................................................................................... 35
Results - Partition Analysis ................................................................................................................. 35
Partition - RESULTS - Data................................................................................................................ 35
Partition - RESULTS - SQL................................................................................................................. 35
Output Columns – Partition Analysis .................................................................................................. 36
Tutorial - Partition Analysis................................................................................................................ 36
SAMPLE..................................................................................................................................................... 38
Initiate a Sample Analysis ................................................................................................................... 39
Sample - INPUT - Data Selection........................................................................................................ 40
Sample - INPUT - Analysis Parameters .............................................................................................. 40
Sample - INPUT - Expert Options ....................................................................................................... 41
Sample - OUTPUT - Storage............................................................................................................... 42
Sample - OUTPUT - Primary Index .................................................................................................... 43
Run the Sample Analysis...................................................................................................................... 43
Results – Sample Analysis ................................................................................................................... 43
Sample - RESULTS - Data................................................................................................................... 43
Sample - RESULTS - SQL ................................................................................................................... 45
Output Columns – Sample Analysis..................................................................................................... 45
Tutorial - Sample Analysis .................................................................................................................. 45
2. ANALYTIC DATA SETS...................................................................................................................... 50
VARIABLE CREATION................................................................................................................................ 50
Initiate a Variable Creation Function ................................................................................................. 55
Variable Creation - INPUT - Variables .............................................................................................. 56
Variable Creation - INPUT - Variables - SQL Elements .................................................................... 67
Variable Creation - INPUT - Variables - Dimensions ...................................................................... 151
Variable Creation - INPUT - dimensions.......................................................................................... 161
Variable Creation - INPUT - anchor table........................................................................................ 163
Variable Creation - INPUT – analysis parameters ........................................................................... 166
Variable Creation - INPUT - Expert Options.................................................................................... 168
Variable Creation - INPUT - Expert Options- SQL Elements........................................................... 168
Variable Creation - INPUT - Expert Options - Expert Clauses ........................................................ 169
Variable Creation - OUTPUT - storage............................................................................................ 169
Variable Creation - OUTPUT - Primary Index................................................................................. 171
Run the Variable Creation Analysis .................................................................................................. 171
Results - Variable Creation ............................................................................................................... 172
Variable Creation - RESULTS - Data ............................................................................................... 172
Variable Creation - RESULTS - SQL ................................................................................................ 172
Tutorial – Variable Creation ............................................................................................................. 172
VARIABLE TRANSFORMATION ................................................................................................................ 179
Introduction ....................................................................................................................................... 179
Initiate a Variable Transformation Function .................................................................................... 183
Variable Transformation - INPUT - Transformations ...................................................................... 184
Setting Properties - Variable Transformation ................................................................................... 190
Setting Default Properties - Variable Transformation ...................................................................... 191
Properties Dialog – Common Features............................................................................................. 192
Properties Dialog – Function-Specific Features............................................................................... 194
Variable Transformation - INPUT - Primary Key............................................................................. 197
Variable Transformation - INPUT – Analysis Parameters ............................................................... 198

Variable Transformation - INPUT - Expert Options......................................................................... 198


Variable Transformation - OUTPUT - Storage................................................................................. 199
Variable Transformation - OUTPUT - Primary Index...................................................................... 199
Run the Variable Transformation Analysis ....................................................................................... 200
Results - Variable Transformation .................................................................................................... 201
Variable Transformation - RESULTS - Data .................................................................................... 201
Variable Transformation - RESULTS - SQL ..................................................................................... 201
Tutorial – Variable Transformation Analysis.................................................................................... 201
BUILD ADS (ANALYTIC DATA SET) ....................................................................................................... 206
Initiate a Build ADS........................................................................................................................... 206
Build ADS - INPUT - Data Selection ................................................................................................ 207
Build ADS - INPUT – Anchor Table ................................................................................................. 208
Build ADS - INPUT - Expert Options................................................................................................ 208
Build ADS - OUTPUT - Storage........................................................................................................ 208
Build ADS - OUTPUT - Primary Index............................................................................................. 209
Run the Build ADS Analysis .............................................................................................................. 209
Results - Build ADS ........................................................................................................................... 210
Tutorial – Build ADS Analysis........................................................................................................... 210
REFRESH ................................................................................................................................................. 212
Initiate a Refresh Analysis ................................................................................................................. 212
Refresh - INPUT - Data Selection ..................................................................................................... 213
Run the Refresh Analysis ................................................................................................................... 214
Results - Refresh ................................................................................................................................ 214
Tutorial – Refresh.............................................................................................................................. 214
3. MATRIX FUNCTIONS....................................................................................................................... 218
OVERVIEW – MATRIX FUNCTIONS .......................................................................................................... 218
MATRIX ANALYSIS ................................................................................................................................. 219
Initiate a Matrix Function ................................................................................................................. 220
Matrix - INPUT - Data Selection....................................................................................................... 222
Matrix - INPUT - Analysis Parameters ............................................................................................. 222
Run the Matrix Analysis .................................................................................................................... 223
Results - Matrix ................................................................................................................................. 223
Tutorial - Matrix................................................................................................................................ 223
EXPORT MATRIX ..................................................................................................................................... 225
Initiate an Export Matrix Function.................................................................................................... 226
Export Matrix - INPUT - Data Selection........................................................................................... 227
Export Matrix - INPUT - Analysis Parameters ................................................................................. 227
Run the Export Matrix ....................................................................................................................... 228
Results – Export Matrix ..................................................................................................................... 228
Output Columns - Export Matrix....................................................................................................... 228
Tutorial - Export Matrix .................................................................................................................... 228
4. SCORING ............................................................................................................................................. 231
PMML SCORING..................................................................................................................................... 231
Initiate PMML Scoring ...................................................................................................................... 232
PMML Scoring - INPUT - Data Selection......................................................................................... 233
PMML Scoring - OUTPUT................................................................................................................ 234
Run the PMML Scoring Analysis....................................................................................................... 235
Results - PMML Scoring.................................................................................................................... 235
PMML Scoring Tutorials................................................................................................................... 236
Tutorial #1 - PMML Scoring ............................................................................................................. 236
Tutorial #2 - PMML Scoring ............................................................................................................. 237
Tutorial #3 - PMML Scoring ............................................................................................................. 238
Tutorial #4 - PMML Scoring ............................................................................................................. 239

Tutorial #5 - PMML Scoring ............................................................................................................. 239


5. PUBLISHING....................................................................................................................................... 241
PUBLISHING OVERVIEW .......................................................................................................................... 241
Initiate a Publish Analysis ................................................................................................................. 242
Publish - INPUT - Data Selection ..................................................................................................... 244
Preview the Publish Analysis............................................................................................................. 244
Run the Publish Analysis ................................................................................................................... 245
Results - Publish ................................................................................................................................ 245
Tutorial – Publish.............................................................................................................................. 245
REFERENCES ......................................................................................................................................... 251



Chapter One
Data Reorganization

1. Data Reorganization

The Data Reorganization functions provide the ability to join, merge, and denormalize
preprocessed results into a wide analytic data set, as well as to select a subset of the rows in a
table. The result is a new, restructured table built from one or more existing tables and/or
from a subset of the rows in a table.

The Sampling and Partitioning functions build a new table containing randomly selected rows
from an existing table or view. Sampling is useful when the volume of available data makes an
analytic process unwieldy, which is especially true for compute-intensive analytic modeling
tasks. Partitioning is similar to sampling, but it produces mutually exclusive, collectively
exhaustive subsets of the data that can be requested by separate processes.
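As a point of reference, both styles of row selection can be expressed with the Teradata SAMPLE clause. The sketch below is illustrative only — the table and column names (customer_history, cust_id) are hypothetical, and the Partition analysis itself selects rows with a hash key rather than the SAMPLE clause:

```sql
-- Fraction-based sample: roughly 10% of the rows.
SELECT *
FROM customer_history SAMPLE 0.10;

-- Size-based sample: 1000 rows (or fewer, if the table is smaller).
SELECT *
FROM customer_history SAMPLE 1000;

-- Partition-style request: two mutually exclusive samples covering the
-- whole table, distinguishable by the built-in SAMPLEID column.
SELECT cust_id, SAMPLEID
FROM customer_history SAMPLE 0.60, 0.40;
```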

In the case of the Data Reorganization functions, NULL values are passed back as NULL. A
special case is the Denorm analysis, which optionally converts NULL values to zero.

Note that identity columns, i.e., columns defined with the attribute "GENERATED … AS
IDENTITY", cannot be analyzed by the Data Reorganization functions.

The Teradata Warehouse Miner data reorganization functions include:

Denorm: Create a new table, denormalized by removing key column(s).
Join: Join tables or views by columns into a combined result table.
Merge: Merge tables or views by rows into a combined result table.
Partition: Select partition(s) from a table using a hash key.
Sample: Select sample(s) from a table by size or fraction.
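In SQL terms, Join combines tables column-wise on a shared key, while Merge stacks tables row-wise. A minimal sketch, with hypothetical table and column names:

```sql
-- Join: combine two tables column-wise on a shared key.
SELECT d.cust_id, d.income, s.avg_balance
FROM customer_demographics d
INNER JOIN account_summary s
  ON d.cust_id = s.cust_id;

-- Merge: combine two tables row-wise (compatible column layouts).
SELECT cust_id, tran_amount FROM transactions_2006
UNION ALL
SELECT cust_id, tran_amount FROM transactions_2007;
```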

To add a Data Reorganization analysis to a Teradata Warehouse Miner data mining
project, create a new analysis as described in Chapter 3. Select Reorganization from the
menu:


Double-click the desired analysis, or highlight it and click the OK button. Optionally, select an
existing analysis for incorporation into the current data mining project. Each of these
analyses is described in detail in the subsequent sections.


Denorm
The Denorm Analysis is provided to denormalize or “flatten out” (sometimes referred to as
“pivoting”) a table so it can be used as an analytic data set. This is done by removing part of a
multi-part index and replicating remaining columns based upon the unique values of the
removed index column.

Many analytical techniques from the statistical and artificial intelligence communities require
a denormalized table, or data set, as input. The Denorm function is provided to help analytical
modelers and database administrators save considerable time and effort when a denormalized
table needs to be constructed from data which exists in relational form in the data warehouse.
The aggregations typically used in the construction of a denormalized table (AVG, SUM,
MIN, MAX, and COUNT) are provided in the Denorm function as user-selectable options.

Analytical modelers typically refer to the rows of a denormalized table as “observations”, and
typically refer to the columns as “variables”.

Given a table name, the names of index columns to remove, the names of index columns to
retain, the names of remaining columns to denormalize, the values of the removed index
columns to denormalize, and finally the names of any already denormalized columns to
retain, the Denorm Analysis creates a new denormalized table. All columns other than the
retained key and denormalized columns are dropped from the new table, unless they are
specified as columns to retain; in that case they should already be denormalized, that is,
have a constant value over the values of the removed key columns.

New column names are formed by concatenating the prefix associated by the user with each
of the Values to Denormalize (which occur in the Index Remove Columns) and the alias or
name of the Denormalize Column.

An option is provided to specify an aggregation method for cases where a new column has
multiple values to choose from. A user-specified aggregation method (MIN, MAX, AVG,
SUM or COUNT) should only be used when there are non-unique index values or when a
part of the index is being ignored, that is, when part of the index is neither being retained
nor removed (denormalized by).

Finally, an option is also provided to specify zero instead of NULL for the value of those
denormalized columns for which the index is not defined.

Literal values entered for columns of type DATE must be entered in the format defined or
defaulted for the column in question. For example, if the date format of a key value being
removed is ’YYYYMMDD’, then a parameter for this key value might be entered as
“19990703”.
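The SQL that Denorm generates is not reproduced here, but the underlying pivot pattern (one aggregated CASE expression per selected value of the removed index column) can be sketched as follows. The sketch runs against SQLite purely for illustration, with made-up rows shaped like the twm_accounts tutorial table; the SQL Teradata Warehouse Miner actually emits will differ in detail.

```python
import sqlite3

# Illustration only: a miniature stand-in for the twm_accounts tutorial table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE twm_accounts "
            "(cust_id INT, acct_type TEXT, acct_nbr TEXT, ending_balance REAL)")
con.executemany("INSERT INTO twm_accounts VALUES (?, ?, ?, ?)", [
    (1363215, "SV", "0000000013632153", 2689.95),
    (1362793, "CC", "4561143213627934", 407.08),
    (1362700, "CK", "0000000013627002", 4494.03),
])

# One CASE expression per Value to Denormalize (CC, CK, SV), aggregated with
# the chosen method (MIN) and COALESCEd to zero when "Treat Undefined Index
# Values As" is set to Zero.  acct_nbr is a Retain Column, so it is simply
# carried along under the same aggregate.
rows = con.execute("""
    SELECT cust_id,
           MIN(acct_nbr) AS acct_nbr,
           COALESCE(MIN(CASE WHEN acct_type = 'CC' THEN ending_balance END), 0)
               AS CC_ending_balance,
           COALESCE(MIN(CASE WHEN acct_type = 'CK' THEN ending_balance END), 0)
               AS CK_ending_balance,
           COALESCE(MIN(CASE WHEN acct_type = 'SV' THEN ending_balance END), 0)
               AS SV_ending_balance
    FROM twm_accounts
    GROUP BY cust_id
    ORDER BY cust_id
""").fetchall()
```

Each input row contributes its balance to exactly one of the three new columns, which is why one account per type per customer collapses into a single wide row.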

The Denorm Analysis is parameterized by specifying the table and column(s) to analyze,
options unique to the Denorm Analysis, as well as specifying the desired results and SQL or
Expert Options.


Initiate a Denorm Analysis


Use the following procedure to initiate a new Denorm analysis in Teradata Warehouse Miner:

1. Click on the Add New Analysis icon in the toolbar:

2. In the resulting Add New Analysis dialog box, with Reorganization highlighted on the
left, double-click on the Denorm icon:

3. This will bring up the Denorm dialog in which you will enter INPUT and OUTPUT
options to parameterize the analysis as described in the next sections.


Denorm - INPUT - Data Selection


On the Denorm dialog click on INPUT and then click on data selection:

On this screen select:

Available Databases
All the databases which are available for the Denorm Analysis.
Available Tables
All the tables within the Source Database that are available for the Denorm Analysis.
Available Columns
All the columns within the selected table that are available for the Denorm Analysis.
Selected Columns
Select columns by highlighting and then either dragging and dropping into the
Selected Columns window, or click on the arrow button to move highlighted
columns into the Selected Columns window. Make sure the correct category is
highlighted:
Index Retain Columns
List of index columns to retain (not remove) in resultant denormalized table
(click to expand/highlight).
Index Remove Columns
List of index columns to remove (denormalize by values of these columns).
(Click to expand/highlight.)
Denormalize Columns
List of columns/aliases to denormalize (i.e. replicate for given values of removed
index columns). (Click to expand/highlight.)
Retain Columns
List of columns to retain which are already denormalized (i.e. have a constant
value over the selected values of removed key columns). (Click to
expand/highlight.)

Denorm - INPUT - Analysis Parameters


On the Denorm dialog click on INPUT and then click on analysis parameters:


On this screen select:

Values to Denormalize
A list of values and prefixes which are valid values in the column specified in Index
Remove Columns. Use the Add and Remove buttons to set values for:
Prefix
An optional string (must be a valid Teradata word) used to prefix the new
columns generated for the specified Value.
Value in <remove column>
A list of distinct values which the column specified in Index Remove Columns
takes on.

Add button
Both Prefix and Value in <remove column> can be specified manually by clicking
on the Add button and typing the required values.

Remove button
Remove the currently highlighted Prefix and Value in <remove column>.

Values…
Selecting the Values button brings up the following Denorm Values Wizard:

Once the Values button is selected, a status message indicating which column values
are being fetched appears. Once this is complete, the distinct values of the column
being removed are listed in the left-most column. These values can be dragged and
dropped into the right-most column, or selected via the Add button. Similarly, they

can be removed via the Remove button. Once the values to be denormalized are
moved to Selected values, click the Finish button to return to the Teradata Warehouse
Miner user interface or to continue to select the values of the next Denormalize
Column, if there is more than one.

When the Values load process is finished, a default value is generated for each
Column Prefix by concatenating the values of the Index Remove Columns, each
value followed by an underscore character. If the combination of the prefix and the
longest Denormalize Column name or alias will be greater than 30 characters in
length, the prefix is left blank, to be filled in by the user. Note that if the name of a
Denormalize Column or the values of the Index Remove Columns are long, it may
be necessary to specify a comparatively short alias for the Denormalize Column so
that automatic prefixes can be generated. Otherwise, it may be necessary to specify a
short prefix manually.

Aggregation Method
This parameter allows you to specify an aggregation method in the case where new
columns have multiple values to choose from. Valid user specified aggregation
methods, include MIN, MAX, AVG, SUM and COUNT. These should only be used
when there are non-unique indices or when a part of the index is being ignored, that
is, when part of the key is neither being retained nor removed (i.e. denormalized by).

Treat Undefined Index Values As:


This parameter allows you to specify zero instead of NULL for the value of those
denormalized columns for which the value of the removed column is not the target
value. The default is zero. If DATE or TIMESTAMP columns are specified in
Denormalize Columns, and ZERO is selected, an error will occur.

Compress undefined index values in output table


This parameter allows you to request data compression of either NULL or zero
values (depending on the Treat Undefined Index Values As option above) for those
denormalized columns for which the value of the removed column is not the target
value.
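As an aside, the default Column Prefix rule described under the Values wizard above can be restated as a small function. This is a hypothetical sketch of the documented behavior, not the product's actual code; default_prefix and its arguments are illustrative names.

```python
# Hypothetical restatement of the default Column Prefix rule: each removed-index
# value followed by an underscore, blanked out when the prefix plus the longest
# Denormalize Column name or alias would exceed 30 characters.
def default_prefix(removed_values, denorm_columns, limit=30):
    prefix = "".join(str(v) + "_" for v in removed_values)
    longest = max(len(name) for name in denorm_columns)
    return prefix if len(prefix) + longest <= limit else ""

print(default_prefix(["SV"], ["ending_balance"]))  # "SV_", well under the limit
```

A long column name such as a 32-character alias would push the combination past 30 characters, so the prefix comes back blank for the user to fill in, just as the text above describes.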

Denorm - INPUT - Expert Options


On the Denorm dialog click on INPUT and then click on expert options:

This screen provides the option to generate a SQL WHERE clause(s) to restrict rows selected
for analysis (for example: cust_id > 0).

Denorm - OUTPUT


Before running the analysis, specify Output options. On the Denorm dialog click on
OUTPUT:

This screen provides the following options:

Use the Teradata EXPLAIN feature to display the execution path for this analysis
Option to generate a SQL EXPLAIN SELECT statement, which returns a Teradata
Execution Plan.

Store the tabular output of this analysis in the database


Option to generate a Teradata TABLE or VIEW populated with the results of the
analysis. Once enabled, the following three fields must be specified:
Database Name
Text box to specify the name of the Teradata database where the resultant Table
or View will be created. By default, this is the “Result Database.”
Output Name
Text box to specify the name of the Teradata Table or View.
Output Type
Pull-down to specify Table or View.
Create Output table using the FALLBACK keyword
If a table is selected, it will be built with FALLBACK if this option is selected.
Create Output table using the MULTISET keyword
If a table is selected, it will be built as a MULTISET table if this option is selected.

Generate the SQL for this analysis, but do not execute it


If this option is selected, the analysis only generates and returns the SQL without
executing it.

Run the Denorm Analysis


After setting parameters on the INPUT and OUTPUT screens as described above, you are
ready to run the analysis. To run the analysis you can either:
• Click the Run icon on the toolbar, or
• Select Run <project name> on the Project menu, or
• Press the F5 key on your keyboard

Results - Denorm
The results of running the Denorm Analysis include the generated SQL itself, the results of
executing the generated SQL, and, if the Create Table (or View) option is chosen, a Teradata
table (or view). All of these results are outlined below.


Denorm - RESULTS - Data


On the Denorm dialog, click on RESULTS and then click on data (note that the RESULTS
tab will be grayed-out/disabled until after the analysis is completed):

Results of the completed query are returned in a Data Viewer page within the Results
Browser. This page has the properties of the Data View page discussed in the Chapter on
Using the Teradata Warehouse Miner Graphical User Interface. With the exception of the
Explain Select Result Option, these results will match the tables described below in the
Output Column Definition section, depending upon the parameters chosen for the analysis.

Denorm - RESULTS - SQL


On the Denorm dialog, click on RESULTS and then click on SQL (note that the RESULTS
tab will be grayed-out/disabled until after the analysis is completed):

The generated SQL is returned here as text which can be copied, pasted, or printed.

Tutorial - Denorm
Parameterize a Denorm Analysis as follows:

Available Tables twm_accounts


Index Retain Columns cust_id
Index Remove Columns acct_type
Denormalize Columns ending_balance
Retain Columns acct_nbr

Values to Denormalize
Value SV
CK
CC
Prefix SV_ (Value - SV)
CK_ (Value - CK)
CC_ (Value - CC)
Aggregation Method MIN
Treat undefined key values as Zero


For this example, the Denorm Analysis generated the following results. Note that the SQL is
not shown for brevity:

Data

Note – only the first 10 rows shown.

cust_id acct_nbr CC_ending_balance CK_ending_balance SV_ending_balance


1363215 0000000013632153 0.00 0.00 2689.95
1362654 0000000013626543 0.00 0.00 622.46
1362793 4561143213627934 407.08 0.00 0.00
1362666 0000000013626663 0.00 0.00 300.42
1362700 0000000013627002 0.00 4494.03 0.00
1363400 0000000013634003 0.00 0.00 137.85
1363374 4561143213633744 0.00 0.00 0.00
1362941 0000000013629413 0.00 0.00 877.14
1362586 0000000013625862 0.00 260.70 0.00
1362883 0000000013628833 0.00 0.00 149.74
… … … … …
… … … … …
… … … … …


Join
The Join analysis is useful in joining together tables and/or views into an intermediate or
final analytic data set. The Join Analysis provides a graphical user interface to several of the
most common, though certainly not all, join mechanisms in Teradata. Consequently, it should
not be thought of or used as a complete replacement for SQL approaches to executing any
generic Teradata join.

By default, an INNER join is performed on the given tables based on the given join columns.
This means that rows will be returned only for primary index column values that appear in all
selected tables. By option, a LEFT outer join can be requested, which returns rows for all
primary index column values found in the first table specified, and fills in any missing values
from the other tables with NULL values. Alternatively, a RIGHT outer join can be requested
to return all rows found in the last requested table, filling in any missing values from the first
table with NULL values (or from the incremental right outer joins preceding it if more than
two tables were selected). Finally, an option to perform a FULL outer join can be requested
which retains all primary index values from all selected tables with missing values set to
NULL.
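The join styles above follow standard SQL semantics. The sketch below, run against SQLite with two hypothetical account tables, shows the inner and left outer cases; right and full outer joins extend the same NULL-filling idea to the last table or to all tables.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE checking (cust_id INT, chk_bal REAL);
    CREATE TABLE savings  (cust_id INT, sav_bal REAL);
    INSERT INTO checking VALUES (1, 100.0), (2, 250.0);
    INSERT INTO savings  VALUES (2, 900.0), (3, 40.0);
""")

# INNER join (the default): rows only for cust_ids present in both tables.
inner = con.execute("""
    SELECT c.cust_id, c.chk_bal, s.sav_bal
    FROM checking c INNER JOIN savings s ON c.cust_id = s.cust_id
""").fetchall()

# LEFT outer join: every row of the first (anchor) table, with NULL filled
# in where the second table has no matching cust_id.
left = con.execute("""
    SELECT c.cust_id, c.chk_bal, s.sav_bal
    FROM checking c LEFT JOIN savings s ON c.cust_id = s.cust_id
""").fetchall()
```

Customer 1 has no savings row, so the inner join drops it while the left outer join keeps it with a NULL savings balance.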

The Join Analysis is parameterized by specifying the table and column(s) to analyze, options
unique to the Join Analysis, as well as specifying the desired results and SQL or Expert
Options.

Initiate a Join Analysis


Use the following procedure to initiate a new Join analysis in Teradata Warehouse Miner:

1. Click on the Add New Analysis icon in the toolbar:

2. In the resulting Add New Analysis dialog box, with Reorganization highlighted on the
left, double-click on the Join icon:


3. This will bring up the Join dialog in which you will enter INPUT and OUTPUT options to
parameterize the analysis as described in the next sections.

Join - INPUT - Data Selection


On the Join dialog click on INPUT and then click on data selection:

On this screen select:

Available Databases
All the databases which are available for the Join Analysis.
Available Tables
All the tables within the Source Database that are available for the Join Analysis.
Available Columns
All the columns within the selected table that are available for the Join Analysis.
Selected Columns
Select columns by highlighting and then either dragging and dropping into the
Selected Columns window, or click on the arrow button to move highlighted
columns into the Selected Columns window.

Join - INPUT – Join Columns


On the Join dialog click on INPUT and then click on join columns:


This screen is used to specify the columns on which to join together the tables or views
selected in this analysis. For tables, the primary index columns are displayed as default
values which may be changed. Join columns are matched for each table or view, one-for-one
in the order specified. Each table or view must therefore have the same number of join
columns specified. The screen contains these fields:

Available Tables
All tables specified under data selection Selected Columns.
Available Columns
All columns specified under data selection Selected Columns.
Selected Join Columns
Select columns by highlighting and then either dragging and dropping into the
Selected Join Columns window, or click on the arrow button to move highlighted
columns into the Selected Join Columns window.

Join - INPUT - Analysis Parameters


On the Join dialog click on INPUT and then click on analysis parameters:

On this screen select:

Anchor Table
For all join types, this is the first table, to which all of the other tables are
joined.
Join Style
Select the type of join to perform, either Inner, or Left, Right or Full outer join.

Join - INPUT - Expert Options


On the Join dialog click on INPUT and then click on expert options:


This screen provides the option to generate a SQL WHERE clause(s) to restrict rows selected
for analysis (for example: cust_id > 0).

It may be useful to note that if a WHERE clause condition is specified on the "inner" table of
a join (i.e. a table that contributes only matched rows to the results), the join is logically
equivalent to an Inner Join, regardless of whether an Outer type is specified. (In a Left Outer
Join, the left table is the "outer" table and the right table is the "inner" table.)
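The point about WHERE conditions on the inner table can be demonstrated in a small SQLite sketch (hypothetical tables; the same logic applies to the SQL the Join Analysis generates):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE cust (cust_id INT);
    CREATE TABLE acct (cust_id INT, bal REAL);
    INSERT INTO cust VALUES (1), (2), (3);
    INSERT INTO acct VALUES (1, 50.0), (2, 75.0);
""")

# The left outer join alone keeps all three customers; customer 3 gets NULL.
left = con.execute(
    "SELECT c.cust_id, a.bal "
    "FROM cust c LEFT JOIN acct a ON c.cust_id = a.cust_id"
).fetchall()

# A WHERE condition on the inner table's column is never true for the
# NULL-extended rows, so they are discarded and the result matches an
# inner join.
filtered = con.execute(
    "SELECT c.cust_id, a.bal "
    "FROM cust c LEFT JOIN acct a ON c.cust_id = a.cust_id "
    "WHERE a.bal > 0"
).fetchall()
```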

Join - OUTPUT - Storage


Before running the analysis, define Output options. On the Join dialog click on OUTPUT
and then click on storage:

On this screen select:

Use the Teradata EXPLAIN feature to display the execution path for this analysis
Option to generate a SQL EXPLAIN SELECT statement, which returns a Teradata
Execution Plan.

Store the tabular output of this analysis in the database


Option to generate a Teradata TABLE or VIEW populated with the results of the
analysis. Once enabled, the following three fields must be specified:
Database Name
Text box to specify the name of the Teradata database where the resultant Table
or View will be created. By default, this is the “Result Database.”
Output Name
Text box to specify the name of the Teradata Table or View.
Output Type
Pull-down to specify Table or View.
Create Output table using the FALLBACK keyword
If a table is selected, it will be built with FALLBACK if this option is selected.
Create Output table using the MULTISET keyword
If a table is selected, it will be built as a MULTISET table if this option is selected.

Generate the SQL for this analysis, but do not execute it


If this option is selected, the analysis only generates and returns the SQL without
executing it.

Join - OUTPUT - Primary Index


Before running the analysis, define Output options. On the Join dialog click on OUTPUT
and then click on primary index:

On this screen, select the columns which comprise the primary index of the output table.
Select:

Available Columns
A list of the columns that will be in the output table or result set.
Primary Index Columns
Select columns by highlighting and then either dragging and dropping into the
Primary Index Columns window, or click on the arrow button to move highlighted
columns into the Primary Index Columns window. These columns comprise the
primary index of the resultant table if an Output Type of Table is used.

Run the Join Analysis


After setting parameters on the INPUT and OUTPUT screens as described above, you are
ready to run the analysis. To run the analysis you can either:
• Click the Run icon on the toolbar, or
• Select Run <project name> on the Project menu, or
• Press the F5 key on your keyboard

Results - Join
The results of running the Teradata Warehouse Miner Join Analysis include the generated
SQL itself, the results of executing the generated SQL, and, if the Create Table (or View)
option is chosen, a Teradata table (or view). All of these results are outlined below.

Join - RESULTS - Data


On the Join dialog, click on RESULTS and then click on data (note that the RESULTS tab
will be grayed-out/disabled until after the analysis is completed):


The results of the completed query are returned in a Data Viewer page within the Results
Browser. This page has the properties of the Data View page discussed in the Chapter on
Using the Teradata Warehouse Miner Graphical User Interface. With the exception of the
Explain Select Result Option, these results will match the tables described below in the
Output Column Definition section, depending upon the parameters chosen for the analysis.

Join - RESULTS - SQL


On the Join dialog, click on RESULTS and then click on SQL (note that the RESULTS tab
will be grayed-out/disabled until after the analysis is completed):

The generated SQL is returned here as text which can be copied, pasted, or printed.

Output Columns – Join Analysis


When the “Store the Tabular Output” option is selected, the following table will be built.

Name      Type                 Definition

Columns   Same as input type   The Selected Columns from the joined tables. If a table is
                               created, those selected columns that are also selected on the
                               “Output” – “primary index” tab will comprise the primary
                               index of the created table.

Tutorial – Join Analysis

Join - Example #1

Parameterize a Join Analysis as follows:

Selected Columns TWM_CUSTOMER.cust_id


TWM_CHECKING_ACCT.ending_balance
(Rename to Chk_Bal)
TWM_CREDIT_ACCT.ending_balance
(Rename to Crd_Bal)
TWM_SAVINGS_ACCT.ending_balance
(Rename to Sav_Bal)

For this example, the Join Analysis generated the following results. Note that the SQL is not
shown for brevity:

Data

Note – only the first 10 rows shown.

cust_id Chk_Bal Crd_Bal Sav_Bal


1362759 4.62 835.19 79.52
1363088 769.47 739.86 1426.1
1362587 292 859.44 1596.17
1362952 473.45 782.77 232.76
1362771 141.86 1200 2361.62
1363316 947.22 1000 629.06
1363342 14.45 1600 57.96
1362700 4494.03 107.4 315.88
1363448 1961.87 1400 1763.31
1362936 109.31 244.47 792.11
… … … …
… … … …
… … … …


Merge
The Merge analysis merges together tables or views by performing an SQL UNION,
INTERSECT or MINUS operation. The merge operation brings together rows from two or
more tables, matching up the selected columns in the order they are selected. (This can be
contrasted with the Join function that brings together columns from multiple tables.) The
rows contained in the answer set are determined by the choice of Merge Style, which
selects whether the Union, Intersect or Minus operator is applied to each table after the
first table selected. An additional option is provided to determine whether duplicate rows, if any,
should be included in the answer set. You may also specify one or more optional SQL Where
Clauses to apply to selected tables (each Where Clause is applied to just one table).

When the Union merge style is selected, the union of the rows containing selected columns
from the first table and each subsequent table is performed using the SQL UNION operator.
The final answer table contains all the qualifying rows from each table. With the Union
merge style, an option is provided to add an identifying column to the answer set and to name
the column if desired. This column assumes an integer value from 1 to n, indicating which
input table each row in the answer set comes from.

When the Intersect merge style is selected, the intersection of the rows containing selected
columns from the first table and each subsequent table is performed using the SQL
INTERSECT operator. The final answer table contains all the qualifying rows that exist in
each of the tables being merged. (That is, if a row is not contained in each of the requested
tables, it is not included in the answer set.)

When the Minus merge style is selected, the rows containing selected columns from the first
table are included in the answer table provided they do not appear in any of the other selected
tables. This is achieved using the SQL MINUS operator for each table after the first. (The
MINUS operator is a Teradata specific SQL operator equivalent to the standard EXCEPT
operator.)
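The three merge styles map directly onto SQL set operators, as this SQLite sketch with two hypothetical one-column tables shows (SQLite, like standard SQL, spells MINUS as EXCEPT):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE t1 (cust_id INT);
    CREATE TABLE t2 (cust_id INT);
    INSERT INTO t1 VALUES (1), (2), (3);
    INSERT INTO t2 VALUES (2), (3), (4);
""")

# Union merge style with an identifying column: 1 marks rows from the first
# table, 2 marks rows from the second (UNION ALL retains duplicates).
union = con.execute("""
    SELECT 1 AS tableid, cust_id FROM t1
    UNION ALL
    SELECT 2, cust_id FROM t2
""").fetchall()

# Intersect merge style: only rows present in every selected table.
intersect = con.execute(
    "SELECT cust_id FROM t1 INTERSECT SELECT cust_id FROM t2"
).fetchall()

# Minus merge style: rows of the first table absent from the others.
minus = con.execute(
    "SELECT cust_id FROM t1 EXCEPT SELECT cust_id FROM t2"
).fetchall()
```

Whether UNION or UNION ALL is emitted corresponds to the Retain Duplicate Rows option in the analysis parameters.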

Initiate a Merge Analysis


Use the following procedure to initiate a new Merge analysis in Teradata Warehouse Miner:

1. Click on the Add New Analysis icon in the toolbar:

2. In the resulting Add New Analysis dialog box, with Reorganization highlighted on the
left, double-click on the Merge icon:


3. This will bring up the Merge dialog in which you will enter INPUT and OUTPUT options
to parameterize the analysis as described in the next sections.

Merge - INPUT - Data Selection


On the Merge dialog click on INPUT and then click on data selection:

On this screen select:

Available Databases
All the databases that are available for the Merge Analysis.
Available Tables
All the tables within the Source Database that are available for the Merge Analysis.
Available Columns
All the columns within the selected table that are available for the Merge Analysis.
Selected Columns
Select columns by highlighting and then either dragging and dropping into the
Selected Columns window, or click on the arrow button to move highlighted
columns into the Selected Columns window. Columns from the first selected table
may be renamed if desired by single-clicking on them.


Merge - INPUT - Analysis Parameters


On the Merge dialog click on INPUT and then click on analysis parameters:

On this screen, select:

Merge Style
Select the type of merge to perform, either Union, Intersect or Minus.
Retain Duplicate Rows
Select whether or not to include duplicate rows in the answer set.
Add Identifying Column (Union only)
Select whether or not to add an identifying column to the answer set. (This option is
available only when the Merge Style is Union.)
Column Name (Union only)
Specify the name of the identifying column to add to the answer set. (This option is
available only when the Merge Style is Union and Add Identifying Column is
selected.)

Merge - INPUT - Expert Options


On the Merge dialog click on INPUT and then click on expert options:

One or more optional Where Clauses may be entered on this screen. Each Where Clause
entered is applied only to the table currently selected on the screen. On this screen select:

Select table to associate WHERE clause with:


Select the table to associate an optional SQL Where Clause with.
Optional WHERE clause text:
Enter the optional SQL Where Clause text to be associated with the selected table,
restricting the selected rows. (Do not include the word "WHERE" at the beginning of
the text. It will be added automatically.)

Merge - OUTPUT - Storage


Before running the analysis, define Output options. On the Merge dialog click on OUTPUT
and then click on storage:


On this screen select:

Use the Teradata EXPLAIN feature to display the execution path for this analysis
Option to generate a SQL EXPLAIN SELECT statement, which returns a Teradata
Execution Plan.

Store the tabular output of this analysis in the database


Option to generate a Teradata TABLE or VIEW populated with the results of the
analysis. Once enabled, the following three fields must be specified:
Database Name
Text box to specify the name of the Teradata database where the resultant Table
or View will be created. By default, this is the “Result Database.”
Output Name
Text box to specify the name of the Teradata Table or View.
Output Type
Pull-down to specify Table or View.
Create Output table using the FALLBACK keyword
If a table is selected, it will be built with FALLBACK if this option is selected.
Create Output table using the MULTISET keyword
If a table is selected, it will be built as a MULTISET table if this option is
selected. (This option should be selected if duplicate rows are expected.)

Generate the SQL for this analysis, but do not execute it


If this option is selected, the analysis only generates and returns the SQL without
executing it.

Merge - OUTPUT - Primary Index


Before running the analysis, define Output options. On the Merge dialog click on OUTPUT
and then click on primary index:

On this screen, select the columns that comprise the primary index of the output table. Select:

Available Columns
A list of columns that will be in the output table or result set. Select columns by
highlighting and then either dragging and dropping into the Primary Index
Columns window, or click on the arrow button to move highlighted columns into
the Primary Index Columns window.
Primary Index Columns
A list of columns that comprise the index of the resultant table if an Output Type of
Table is used.
Create the index using the UNIQUE keyword
Select whether or not the primary index should be a unique primary index, i.e.
no two rows may have the same combination of primary index column values.

Run the Merge Analysis


After setting parameters on the INPUT and OUTPUT screens as described above, you are
ready to run the analysis. To run the analysis you can either:
• Click the Run icon on the toolbar, or
• Select Run <project name> on the Project menu, or
• Press the F5 key on your keyboard

Results - Merge
The results of running the Teradata Warehouse Miner Merge Analysis include the generated
SQL itself, the results of executing the generated SQL, and, if the Create Table (or View)
option is chosen, a Teradata table (or view). All of these results are outlined below.

Merge - RESULTS - Data


On the Merge dialog, click on RESULTS and then click on data (note that the RESULTS
tab will be grayed-out/disabled until after the analysis is completed):

The results of the completed query are returned in a Data Viewer page within the Results
Browser. This page has the properties of the Data View page discussed in the Chapter on
Using the Teradata Warehouse Miner Graphical User Interface. With the exception of the
Explain Select Result Option, these results will match the tables described below in the
Output Column Definition section, depending upon the parameters chosen for the analysis.

Merge - RESULTS - SQL


On the Merge dialog, click on RESULTS and then click on SQL (note that the RESULTS
tab will be grayed-out/disabled until after the analysis is completed):


The generated SQL is returned here as text which can be copied, pasted, or printed.

Output Columns – Merge Analysis


When the “Store the Tabular Output” option is selected, the following table will be built.
Those columns selected on the "primary index" tab of the OUTPUT panel will comprise the
Unique Primary Index (UPI).

Name Type Definition


The names and types of the result columns match those of the Selected Columns
from the first selected table (the user may rename selected columns as desired).

Tutorial – Merge Analysis

Merge - Example #1

Parameterize a Merge Analysis as follows:

Selected Columns twm_customer_dqa.cust_id


twm_customer_dqa.income (rename inc)
twm_customer_dqa.age
twm_customer.cust_id
twm_customer.income
twm_customer.age

Merge Style Minus

Retain Duplicate Rows No

Where Clause (both tables) cust_id < 1362490

For this example, the Merge Analysis generates the following results data.

cust_id income age


1360000 29592 29
1360001 27612 39
1360002 57612 49
1362480 33


Partition
The Partition analysis is one of two functions provided by Teradata Warehouse Miner to
sample data from a table or view. The Partition Analysis is distinguished from the Sample
Analysis in that it is repeatable and is based on the internal hash index encodings provided by
Teradata, rather than the statistically random selections provided by the Sample function.

Given a table, a list of columns to select and a list of columns to hash on, the Partition
Analysis generates a user-specified partition or range of partitions from the table using a hash
key. For example, the 3rd partition out of 10 might be requested, or partitions 1 through 3 out
of 10.

To select a specific partition, set start and end partition to the same selected value. If a range
of partitions is requested, the partition number is also returned as xpartid.

The Partition Analysis is parameterized by specifying the table and column(s) to analyze,
options unique to the Partition Analysis, as well as specifying the desired results and SQL or
Expert Options.

Initiate a Partition Analysis


Use the following procedure to initiate a new Partition analysis:

1. Click on the Add New Analysis icon in the toolbar:

2. In the resulting Add New Analysis dialog box, with Reorganization highlighted on the
left, double-click on the Partition icon:


3. This will bring up the Partition dialog in which you will enter INPUT and OUTPUT
options to parameterize the analysis as described in the next sections.

Partition - INPUT - Data Selection


On the Partition dialog click on INPUT and then click on data selection:

On this screen select:

Available Databases
All the databases which are available for the Partition Analysis.
Available Tables
All the tables within the Source Database that are available for the Partition
Analysis.
Available Columns
All the columns within the selected table that are available for the Partition Analysis.
Selected Columns
Select columns by highlighting and then either dragging and dropping into the
Selected Columns window, or click on the arrow button to move highlighted
columns into the Selected Columns window. Make sure the correct category is
highlighted:
Partition Columns


Column(s) to be in the partitioned result set.


Hash Columns
Column(s) on which to hash-partition the table. Hash partitioning is performed by
Teradata just as it is when distributing rows in the database, making use of the
Teradata SQL extensions HASHBUCKET and HASHROW.
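
Conceptually, the generated SQL selects a partition by bucketing the hash of the hash column(s), along the lines of the following sketch (the tool's actual SQL may differ; the MOD-based bucketing shown here is an illustration):

```sql
-- Sketch: select logical partition 1 of 10, hashing on cust_id.
-- HASHROW computes the row hash; HASHBUCKET maps it to a hash bucket,
-- which MOD folds into 10 logical partitions numbered 1..10.
SELECT cust_id, income, age
FROM twm_customer
WHERE (HASHBUCKET(HASHROW(cust_id)) MOD 10) + 1 = 1;
```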

Partition - INPUT - Analysis Parameters


On the Partition dialog click on INPUT and then click on analysis parameters:

On this screen select:

Number of Partitions
Number of partitions (1 to 65536) into which the table is logically split, and from
which the First through Last partitions are selected.
First Partition
First logical partition to select (must be in the range from 1 to Number of
Partitions).
Last Partition
Last logical partition to select (must be in the range from First Partition to Number
of Partitions).

Partition - INPUT - Expert Options


On the Partition dialog click on INPUT and then click on expert options:

This screen provides the option to generate a SQL WHERE clause(s) to restrict rows selected
for analysis (for example: cust_id > 0).


Partition - OUTPUT - Storage


Before running the analysis, define Output options. On the Partition dialog click on
OUTPUT and then click on storage:

On this screen select:

Use the Teradata EXPLAIN feature to display the execution path for this analysis
Option to generate a SQL EXPLAIN SELECT statement, which returns a Teradata
Execution Plan.

Store the tabular output of this analysis in the database


Option to generate a Teradata TABLE or VIEW populated with the results of the
analysis. Once enabled, the following three fields must be specified:
Database Name
Text box to specify the name of the Teradata database where the resultant Table
or View will be created. By default, this is the “Result Database.”
Output Name
Text box to specify the name of the Teradata Table or View.
Output Type
Pull-down to specify Table or View.
Create Output table using the FALLBACK keyword
If a table is selected, it will be built with FALLBACK if this option is selected.
Create Output table using the MULTISET keyword
If a table is selected, it will be built as a MULTISET table if this option is
selected.

Generate the SQL for this analysis, but do not execute it


If this option is selected the analysis will only generate SQL, returning it and
terminating immediately.

Partition - OUTPUT - Primary Index


On the Partition dialog click on OUTPUT and then click on primary index:


On this screen, select the columns which comprise the primary index of the output table.
Select:

Available Columns
A list of columns which comprise the index of the resultant table if an Output Table
is used.
Primary Index Columns
Select columns by highlighting and then either dragging and dropping into the
Primary Index Columns window, or click on the arrow button to move highlighted
columns into the Primary Index Columns window.

Run the Partition Analysis


After setting parameters on the INPUT and OUTPUT screens as described above, you are
ready to run the analysis. To run the analysis you can either:
• Click the Run icon on the toolbar, or
• Select Run <project name> on the Project menu, or
• Press the F5 key on your keyboard

Results - Partition Analysis


The results of running the Teradata Warehouse Miner Partition Analysis include the
generated SQL itself, the results of executing the generated SQL, and, if the Create Table (or
View) option is chosen, a Teradata table (or view). All of these results are outlined below.

Partition - RESULTS - Data


On the Partition dialog, click on RESULTS and then click on data (note that the RESULTS
tab will be grayed-out/disabled until after the analysis is completed):

The results of the completed query are returned in a Data Viewer page within the Results
Browser. This page has the properties of the Data View page discussed in the Chapter on
Using the Teradata Warehouse Miner Graphical User Interface. With the exception of the
Explain Select Result Option, these results will match the tables described below in the
Output Column Definition section, depending upon the parameters chosen for the analysis.

Partition - RESULTS - SQL


On the Partition dialog, click on RESULTS and then click on SQL (note that the RESULTS
tab will be grayed-out/disabled until after the analysis is completed):


The generated SQL is returned as text which can be copied, pasted, or printed.

Output Columns – Partition Analysis


When the “Store the Tabular Output” option is selected, the following table will be built.

Name      Type                  Definition

Columns   Same as input type    The selected Partition Columns from the “Input” – “data selection” tab. If a table is created, those selected columns that are also selected on the “Output” – “primary index” tab will comprise the primary index of the created table.
xpartid   SMALLINT              If multiple partitions are requested by making the First Partition parameter less than the Last Partition parameter, this column will be created with values matching the requested partition numbers, i.e., setting start = 3 and end = 5 will return xpartid = 3, 4 and 5.
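
When a range of partitions is requested, the partition number can be carried along as xpartid, roughly as in this sketch (illustrative SQL, not the tool's exact output):

```sql
-- Sketch: select logical partitions 3 through 5 of 10 and return the
-- partition number of each row as xpartid.
SELECT cust_id, income, age,
       CAST((HASHBUCKET(HASHROW(cust_id)) MOD 10) + 1 AS SMALLINT) AS xpartid
FROM twm_customer
WHERE (HASHBUCKET(HASHROW(cust_id)) MOD 10) + 1 BETWEEN 3 AND 5;
```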

Tutorial - Partition Analysis

Partition - Example #1

Parameterize a Partition Analysis as follows:

Available Tables TWM_CUSTOMER


Selected Columns and Aliases TWM_CUSTOMER.cust_id
TWM_CUSTOMER.income
TWM_CUSTOMER.age
TWM_CUSTOMER.years_with_bank
TWM_CUSTOMER.nbr_children
TWM_CUSTOMER.gender
TWM_CUSTOMER.marital_status
Selected Hash Columns TWM_CUSTOMER.cust_id
Number of Partitions 10
First Partition 1
Last Partition 1

For this example, the Partition Analysis generated the following results. Note that the SQL is
not shown for brevity:

Data

Note – only the first 10 rows shown.


cust_id income age years_with_bank nbr_children Gender marital_status


1362485 22690 25 4 0 F 1
1362550 0 15 0 0 M 1
1362564 14357 77 7 0 F 2
1362566 127848 54 4 1 M 2
1362570 20562 50 0 2 F 2
1362580 29363 36 6 3 F 4
1362586 24476 46 6 0 F 1
1362657 27946 39 8 0 F 1
1362661 19649 66 5 0 M 2
1362663 29030 44 4 0 M 1
… … … … … … …
… … … … … … …
… … … … … … …

Partition - Example #2

Parameterize a Partition Analysis as follows:

Available Tables TWM_CUSTOMER


Selected Columns and Aliases TWM_CUSTOMER.cust_id
TWM_CUSTOMER.income
TWM_CUSTOMER.age
TWM_CUSTOMER.years_with_bank
TWM_CUSTOMER.nbr_children
TWM_CUSTOMER.gender
TWM_CUSTOMER.marital_status
Selected Hash Columns TWM_CUSTOMER.cust_id
Number of Partitions 10
First Partition 1
Last Partition 3

For this example, the Partition Analysis generated the following results. Again, the SQL is
not shown:

Data

Note – only the first 10 rows shown.

cust_id income age years_with_bank nbr_children gender marital_status xpartid


1362527 17622 44 1 0 M 2 2
1363486 39942 41 1 5 F 4 3
1363442 144157 58 5 0 M 2 2
1363282 25829 29 8 0 F 1 3
1363238 5788 35 5 2 F 2 2
1363078 9622 46 6 1 M 2 3
1362830 10933 18 3 0 F 1 2
1362670 8877 26 5 0 F 1 3
1362626 15993 30 0 3 F 2 2
1363404 0 17 2 0 M 1 1
… … … … … … … …
… … … … … … … …
… … … … … … … …


Sample
The Sample analysis function randomly selects rows from a table or view, producing one or
more samples based on a specified number of rows or a fraction of the total number of rows.
The sampled rows may be stored in a single table, in a separate table for each sample, or in a
single table with a view created for each sample. When connected to a Teradata V2R5 or later
data source, options are provided for sampling with or without replacement of rows,
randomized allocation or proportional allocation by AMP, and stratified or simple random
sampling. When connected to an earlier Teradata release the default options are automatically
used. These options are described more fully below.

Sampling is performed without replacement by default. This means that each row sampled in
a request is unique and once sampled is not replaced in the sampling pool for that request.
Therefore, it is not possible to sample more rows than exist in the sampled table, and if
multiple samples are requested they are mutually exclusive. When sampling with replacement
is requested, each sampled row is immediately returned to the sampling pool and may
therefore be selected multiple times. If multiple samples are requested with replacement, the
samples are not necessarily mutually exclusive.

The default row allocation method is proportional, allocating the requested rows across the
Teradata AMPs as a function of the number of rows on each AMP. This is technically not a
simple random sample because it does not include all possible sample sets. It is however
much faster than randomized allocation, especially for large sample sizes, and should have
sufficient randomness for most applications. When randomized allocation is requested, row
selections are allocated across the AMPs by simulating simple random sampling, a process
that can be comparatively slow.

By default the Sample Analysis function performs simple random sampling. This means that
each possible set of the requested size has an equal probability of being selected (subject to
the limitations of proportional allocation noted above). An option is however provided for
stratified random sampling, wherein the available rows are divided into groups or strata based
on stated conditions prior to samples of a requested size or sizes being taken.
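
On Teradata V2R5 and later these options correspond to modifiers of the SQL SAMPLE clause, as in the following sketch (illustrative; the SQL actually generated may differ):

```sql
-- Sketch: draw 100 rows, allowing rows to be selected more than once
-- (WITH REPLACEMENT) and simulating simple random sampling across the
-- AMPs (RANDOMIZED ALLOCATION).
SELECT cust_id, income, age
FROM twm_customer
SAMPLE WITH REPLACEMENT RANDOMIZED ALLOCATION 100;
```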

The Sample Analysis is parameterized by specifying the table and column(s) to analyze,
options unique to Sample Analysis, as well as specifying the desired results and SQL or
Expert Options.


Initiate a Sample Analysis


Use the following procedure to initiate a new Sample analysis:

1. Click on the Add New Analysis icon in the toolbar:

2. In the resulting Add New Analysis dialog box, with Reorganization highlighted on the
left, double-click on the Sample icon:

3. This will bring up the Sample dialog in which you will enter INPUT and OUTPUT options
to parameterize the analysis as described in the next sections.


Sample - INPUT - Data Selection


On the Sample dialog click on INPUT and then click on data selection:

On this screen select:

Available Databases
All the databases which are available for the Sample Analysis.
Available Tables
All the tables within the Source Database that are available for the Sample Analysis.
Available Columns
All the columns within the selected table that are available for the Sample Analysis.
Selected Columns
Select columns by highlighting and then either dragging and dropping into the
Selected Columns window, or click on the arrow button to move highlighted
columns into the Selected Columns window.

Sample - INPUT - Analysis Parameters


On the Sample dialog click on INPUT and then click on analysis parameters:

On this screen select:

Sample Style

Basic - When this option is checked, simple random sampling without stratifying
conditions is performed.

Stratified - When this option is checked, the available rows are divided into groups
or strata based on stated conditions prior to samples of a requested size or sizes being
taken.

Sample Options

Sample with Replacement - When this option is checked, each sampled row is
immediately returned to the sampling pool and may therefore be selected multiple


times. If multiple samples are requested with replacement, the samples are not
necessarily mutually exclusive.

When this option is not checked, each row sampled in a request is unique, and once
sampled, is not replaced in the sampling pool for that request. Therefore, it is not
possible to sample more rows than exist in the sampled table, and if multiple samples
are requested they are mutually exclusive.

Sample with Randomized Allocation - When this option is checked, the requested
rows are allocated across the AMPs by simulating simple random sampling, a
process that can be comparatively slow.

When this option is not checked, requested rows are allocated across the Teradata
AMPs as a function of the number of rows on each AMP. This is technically not a
simple random sample because it does not include all possible sample sets. It is
however much faster than randomized allocation, especially for large sample sizes,
and should have sufficient randomness for most applications.

Sizes/Fractions separated by ‘,’ (only when Sample Style is Basic)

When the Sample Style is Basic, this option is used to enter a list of one or more
sample sizes or fractions, separated by the list separator for the current locale. If
sample sizes are entered (e.g. 10, 20, 30), they indicate the number of rows to be
returned in each sample. If fractions are entered (e.g. .01, .02, .03), they indicate the
approximate size of each sample as a fraction of the available rows in the table, and
as such must not add up to more than 1.
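
A Basic-style request with multiple fractions maps onto a multi-sample SAMPLE clause, roughly as follows (a sketch; Teradata's built-in SAMPLEID expression identifies the sample each row belongs to):

```sql
-- Sketch: three mutually exclusive samples of approximately 1%, 2% and
-- 3% of the rows; SAMPLEID (1, 2 or 3) tags each row with its sample.
SELECT cust_id, income, age, SAMPLEID AS xsampleid
FROM twm_customer
SAMPLE .01, .02, .03;
```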

Stratified Conditions (only when Sample Style is Stratified)

When the Sample Style is Stratified, this option is used to enter one or more
conditions along with corresponding sample sizes or fractions. (For an example of
stratified sampling, refer to Sample Example #5 in Tutorial – Sample Analysis.)

Condition

Each stratum in the sampling must be defined by a conditional expression, such as


gender = ‘M’ or channel IN (‘A’, ‘B’, ‘C’).

Sizes/Fractions

This field is used to enter sizes or fractions for one or more samples, separated by the
list separator for the current locale. If sample sizes are entered (e.g. 10, 20, 30), they
indicate the number of rows to be returned in each sample for the stratum. If fractions
are entered (e.g. .01, .02, .03), they indicate the approximate size of each sample as a
fraction of the available rows in the stratum, and as such must not add up to more
than 1.
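
Stratified conditions correspond to the WHEN ... THEN form of the SAMPLE clause, along these lines (an illustrative sketch, not the tool's exact output):

```sql
-- Sketch: three samples of sizes 1, 2 and 3 from the male stratum and
-- three samples of sizes 4, 5 and 6 from the female stratum.
SELECT cust_id, income, age, gender, SAMPLEID AS xsampleid
FROM twm_customer
SAMPLE WHEN gender = 'M' THEN 1, 2, 3
       WHEN gender = 'F' THEN 4, 5, 6
END;
```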

Sample - INPUT - Expert Options


On the Sample dialog click on INPUT and then click on expert options:


This screen provides the option to generate a SQL WHERE clause(s) to restrict rows selected
for analysis (for example: cust_id > 0). (Note that the use of this option may negatively
impact the performance of a Basic style Sample with default options.)

Sample - OUTPUT - Storage


Before running the analysis, specify Output options. On the Sample dialog click on
OUTPUT and then click on storage:

On this screen select:

Use the Teradata EXPLAIN feature to display the execution path for this analysis
Option to generate a SQL EXPLAIN SELECT statement, which returns a Teradata
Execution Plan.

Store the tabular output of this analysis in the database


Option to generate a Teradata TABLE or VIEW populated with the results of the
analysis. Once enabled, the following three fields must be specified:
Output Type
Pull-down to specify Table, Multiple Tables or Multiple Views.
Database Name
Text box to specify the name of the Teradata database where the resultant Table,
Tables and/or Views will be created. By default, this is the “Result Database.”
Table Name
Text box to specify the name of the Teradata table to create if the Output Type is
Table or the underlying base table to create if the Output Type is Multiple Views.
Table Names (n)
Text box to specify the names of the tables to create if the Output Type is
Multiple Tables. (The number of tables n is given in parentheses.)
View Names (n)
Text box to specify the names of the views to create if the Output Type is
Multiple Views. (The number of views n is given in parentheses.)
Create Output table using the FALLBACK keyword
If a table is selected, it will be built with FALLBACK if this option is selected.
Create Output table using the MULTISET keyword
If a table is selected, it will be built as a MULTISET table if this option is
selected.

Generate the SQL for this analysis, but do not execute it


If this option is selected the analysis will only generate SQL, returning it and
terminating immediately.

Sample - OUTPUT - Primary Index


On the Sample dialog click on OUTPUT and then click on primary index:

On this screen, select the columns which comprise the primary index of the output table.
Select:

Available Columns
A list of columns which comprise the index of the resultant table if an Output Table
or Tables are created.
Primary Index Columns
Select columns by highlighting and then either dragging and dropping into the
Primary Index Columns window, or click on the arrow button to move highlighted
columns into the Primary Index Columns window.

Run the Sample Analysis


After setting parameters on the INPUT and OUTPUT screens as described above, you are
ready to run the analysis. To run the analysis you can either:
• Click the Run icon on the toolbar, or
• Select Run <project name> on the Project menu, or
• Press the F5 key on your keyboard

Results – Sample Analysis


The results of running the Teradata Warehouse Miner Sample Analysis include the generated
SQL itself, the results of executing the generated SQL, and, if one of the Create options is
chosen, one or more Teradata tables (or views). All of these results are outlined below.

Sample - RESULTS - Data


On the Sample dialog, click on RESULTS and then click on data (note that the RESULTS
tab will be grayed-out/disabled until after the analysis is completed):


The results of the completed query are returned in a Data Viewer page within the Results
Browser. This page has the properties of the Data View page discussed in the Chapter on
Using the Teradata Warehouse Miner Graphical User Interface. With the exception of the
Explain Select Result Option, these results will match the tables described below in the
Output Column Definition section, depending upon the parameters chosen for the analysis.


Sample - RESULTS - SQL


On the Sample dialog, click on RESULTS and then click on SQL (note that the RESULTS
tab will be grayed-out/disabled until after the analysis is completed):

The generated SQL is returned as text which can be copied, pasted, or printed.

Output Columns – Sample Analysis


If the option to Store the tabular output of this analysis in the database is selected, one of the
following tables is built by the Sample Analysis, depending on the Output Type selected.

Table or Multiple Views

If one of these options is selected, a single table is built. If multiple values have been
specified in the Size or Fraction list, a column named xsampleid will be created
indicating which sample the row belongs to – a number from 1 to n for each distinct
value entered in the Size or Fraction list (depending on stratified sampling options).

When the Multiple Views option is selected, multiple views are created operating against
this table, selecting rows based on xsampleid, but not including xsampleid.

Name       Type                  Definition

Columns    Same as input type    The Selected Columns from the “Input” – “data selection” screen. If a table is created, those selected columns that are also selected on the “Output” – “primary index” screen will comprise the primary index of the created table.
xsampleid  SMALLINT              If multiple samples are requested in the Size or Fraction list, this column will be included in the created table with values starting at 1 and incrementing for each sample specified, i.e., setting size=10,10,10 will return xsampleid=1, 2, 3. (When a view is created for each sample, this column is not included in the view.)
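
With the Multiple Views output type, each generated view simply filters the combined base table on xsampleid and omits that column, roughly as in this sketch (using the object names from Sample Example #4 below; illustrative only):

```sql
-- Sketch: view exposing the first sample stored in a combined sample
-- table; xsampleid selects the sample but is not itself exposed.
CREATE VIEW Twm_Cust_Sample1_view AS
SELECT cust_id, income, age
FROM Twm_Cust_Sample
WHERE xsampleid = 1;
```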

Multiple Tables

If this option is selected, one table will be built for every value in the Size or Fraction list.

Name       Type                  Definition

Columns    Same as input type    The Selected Columns from the “Input” – “data selection” screen. Those selected columns that are also selected on the “Output” – “primary index” tab will comprise the primary index of the created tables.

Tutorial - Sample Analysis


Sample - Example #1

Parameterize a Sample Analysis as follows:

Available Tables TWM_CUSTOMER


Selected Columns TWM_CUSTOMER.cust_id
TWM_CUSTOMER.income
TWM_CUSTOMER.age
TWM_CUSTOMER.years_with_bank
TWM_CUSTOMER.nbr_children
TWM_CUSTOMER.gender
TWM_CUSTOMER.marital_status
Size or Fraction 10

For this example, the Sample Analysis generated the following results. Note that the SQL is
not shown for brevity, and that the specific rows returned will vary randomly.

Data

cust_id income age years_with_bank nbr_children gender marital_status


1362691 26150 46 5 1 M 2
1362548 44554 59 9 2 F 4
1362811 8011 82 2 0 F 2
1363402 24368 63 3 0 F 1
1363011 90248 55 5 0 M 4
1362826 0 15 0 0 F 1
1363018 46884 36 4 2 M 2
1362793 29120 36 6 2 M 3
1363410 28518 31 1 2 M 3
1362676 7230 62 2 0 M 2

Sample - Example #2

Parameterize a Sample Analysis as follows:

Available Tables TWM_CUSTOMER


Selected Columns TWM_CUSTOMER.cust_id
TWM_CUSTOMER.income
TWM_CUSTOMER.age
TWM_CUSTOMER.years_with_bank
TWM_CUSTOMER.nbr_children
TWM_CUSTOMER.gender
TWM_CUSTOMER.marital_status
Size or Fraction .1
.2
.3

For this example, the Sample Analysis generated the following results. Again, the SQL is not
shown, and the specific rows returned will vary randomly.

Data


Note – only the first 10 rows shown.

cust_id income age years_with_bank nbr_children gender marital_status xsampleid


1362691 26150 46 5 1 M 2 1
1362548 44554 59 9 2 F 4 1
1363160 18548 38 8 0 F 1 3
1363017 0 16 1 0 M 1 3
1362487 6605 71 1 0 M 2 2
1363486 39942 41 1 5 F 4 2
1363200 21015 18 3 0 M 1 3
1363282 25829 29 8 0 F 1 3
1362527 17622 44 1 0 M 2 3
1362609 1929 79 8 0 F 2 3
… … … … … … … …
… … … … … … … …
… … … … … … … …

Sample - Example #3

Parameterize a Sample Analysis as follows:

Available Tables TWM_CUSTOMER


Selected Columns TWM_CUSTOMER.cust_id
TWM_CUSTOMER.income
TWM_CUSTOMER.age
TWM_CUSTOMER.years_with_bank
TWM_CUSTOMER.nbr_children
TWM_CUSTOMER.gender
TWM_CUSTOMER.marital_status
Size or Fraction .1
.2
.3
Output Type Multiple Tables
Table Names (3) Twm_Cust_Sample1
Twm_Cust_Sample2
Twm_Cust_Sample3

For this example, the Sample Analysis generated the following results. Again, the SQL is not
shown, and the specific rows returned will vary randomly. The data page will have a Load
button which must be clicked to view the three results.

Sample - Example #4

Parameterize a Sample Analysis as follows:

Available Tables TWM_CUSTOMER


Selected Columns and Aliases TWM_CUSTOMER.cust_id
TWM_CUSTOMER.income
TWM_CUSTOMER.age
TWM_CUSTOMER.years_with_bank
TWM_CUSTOMER.nbr_children
TWM_CUSTOMER.gender

TWM_CUSTOMER.marital_status
Size or Fraction .1
.2
.3
Output Type Multiple Views
Table Name Twm_Cust_Sample
View Names (3) Twm_Cust_Sample1_view
Twm_Cust_Sample2_view
Twm_Cust_Sample3_view

For this example, the Sample Analysis generated the following results. Again, the SQL is not
shown, and the specific rows returned will vary randomly. The data page will have a Load
button which must be clicked to view the three results.

Sample - Example #5

Parameterize a Sample Analysis as follows:

Available Tables TWM_CUSTOMER


Selected Columns and Aliases TWM_CUSTOMER.cust_id
TWM_CUSTOMER.income
TWM_CUSTOMER.age
TWM_CUSTOMER.years_with_bank
TWM_CUSTOMER.nbr_children
TWM_CUSTOMER.gender
TWM_CUSTOMER.marital_status
Sample Style Stratified
Stratified Condition gender = ‘M’
Sizes/Fractions 1,2,3
Stratified Condition gender = ‘F’
Sizes/Fractions 4,5,6

For this example, the Sample Analysis generated the following results. Note that not all SQL
is shown for brevity, and that the specific rows returned will vary randomly.

Data

cust_id income age years_with_bank nbr_children gender marital_status xsampleid


1363462 9495 25 4 2 F 3 1
1363081 41876 37 7 0 F 1 1
1362611 24115 48 8 1 F 2 1
1362993 20702 30 0 1 F 3 1
1363066 3240 64 1 0 M 1 1
1363306 15576 46 6 0 F 1 2
1363197 19088 52 2 2 F 2 2
1363400 49258 49 9 0 F 2 2
1362730 12988 37 7 3 F 4 2
1362535 26548 46 5 4 F 4 2
1362999 29403 36 6 2 M 4 2
1363083 22680 64 4 0 M 1 2
1362697 5848 83 3 0 F 1 3
1362492 40252 40 0 5 F 3 3

1363039 0 15 7 0 F 1 3
1362548 44554 59 9 2 F 4 3
1362836 5920 66 6 0 F 3 3
1363266 20889 23 2 0 F 3 3
1363051 0 14 6 0 M 1 3
1362563 14711 73 3 0 M 2 3
1362962 2858 83 3 0 M 4 3


2. Analytic Data Sets


The most time-intensive part of the data mining process is arguably the creation of a data set
from which to build an analytic model. The data in a relational data warehouse is typically
not in a form suitable for input directly into a data mining algorithm. New variables may need
to be created using formulas, aggregations and/or expansions on specific values of a
dimensioning variable. The joining of tables and/or de-normalizing or flattening of relational
tables may also be needed. In addition, statistical transformations are often required,
depending on the type of algorithm to be used as well as the statistical properties of the data
itself. These capabilities are referred to simply as Analytic Data Sets.

Several types of analysis may be involved in building an analytic data set. A Variable
Creation analysis provides expression building and dimensioning to define new variable
columns and place them in a table or view. A Variable Transformation function applies
requested data mining transformation functions to the columns in a table and creates a
transformed table. A Build Data Set analysis joins together the tables or views created by one
or more Variable Creation and/or Variable Transformation functions, allowing column
selection and the application of expert where clause constraints. (It is largely the same as the
Join function in the Reorganization category of functions, but can operate on a single table.)

Note that Identity columns, i.e. columns defined with the attribute "GENERATE … AS
IDENTITY", cannot be analyzed by Analytic Data Set functions.

Variable Creation
The Variable Creation function makes possible the creation of variables as columns in a table
or view. The user creates each new variable as an expression by selecting various SQL
keywords and operators as well as table and column names. SQL keywords and operators
allowed include arithmetic and logical operators, date/time operators, the typical aggregation
functions, as well as the newer ordered analytical (windowed OLAP) functions. The only
typing normally required is the typing of names, descriptions and values (although some
automation is provided for names and values).

In addition to defining variables as expressions or formulas, the user may specify constraints
on the data, either for all the variables defined in a Variable Creation function, or on an
individual basis. Table level constraints defined for all variables result in WHERE, HAVING
or QUALIFY clauses in the generated SQL. Constraints defined for individual variables
result in the use of CASE clauses in order to allow for different constraints on different
variables in the same SQL statement. A feature to allow the creation of numerous similar
variables using constraints based on specific values of one or more ‘dimensioning’ columns is
also provided.
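
For example, dimensioning on specific values of a column results in CASE logic in the generated SQL, along these lines (an illustrative sketch, not the tool's exact output; the transaction table and column names here are hypothetical):

```sql
-- Sketch: one aggregate variable per value of a dimensioning column
-- ('channel' in a hypothetical transaction table), built with CASE so
-- that all the similar variables fit in a single SQL statement.
SELECT cust_id,
       SUM(CASE WHEN channel = 'A' THEN tran_amt ELSE 0 END) AS amt_channel_a,
       SUM(CASE WHEN channel = 'B' THEN tran_amt ELSE 0 END) AS amt_channel_b
FROM twm_transactions
GROUP BY cust_id;
```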

Any number of variables can be defined in a single Variable Creation function, provided they
conform to rules that allow them to be combined in the same table, and they do not exceed
the maximum number of columns allowed in a table by Teradata. Several variable properties
are used in determining which variables can be built in the same table. Some rules of
combining variables in the same Variable Creation function are given below.

• Variables derived in a single table must have the same aggregation type and level.
• A number of tables may be referenced by the variables defined in a single Variable
Creation function.


• Variables referenced by another variable must not be dimensioned.


• All the variables in a Variable Creation function share the same table level
constraints.
• The user may request at any time that the intermediate table created by a Variable
Creation function be validated using the Teradata EXPLAIN feature.

The standard result options are available with the Variable Creation function, namely Select,
Explain Select, Create Table and Create View. The choice depends primarily on whether this
analysis produces a final result or an intermediate result, and if so, whether the user wants to
create a permanent table or view for this intermediate result. If a permanent result is not
desired, the Select option can be used to view and verify results. (Even if this analysis
produces an intermediate result directly referred to by another analysis, the Select option can
still be used since a volatile table will automatically be created in this case to allow the
referring analysis to access the results.)

SQL Elements
The Variable Creation function allows the creation of new columns or variables as
SQL expressions or formulas based on the features, functions and operators outlined
below, depending on the release of Teradata in use at the time the variables are
defined:

1. Column(s) from one or more tables in one or more databases


2. Aggregation functions: MIN, MAX, SUM, AVG, COUNT, CORR,
COVAR_POP/SAMP, STDDEV_POP/SAMP, VAR_POP/SAMP, SKEW,
KURTOSIS, REGR_INTERCEPT/SLOPE/R2
3. Ordered analytical functions: AVG, COUNT, MAX, MIN, PERCENT_RANK,
RANK and SUM, equivalents of OLAP functions MDIFF and QUANTILE in
terms of the new ordered analytical functions, and the old OLAP function
MLINREG
4. Arithmetic operators: +, -, *, /, MOD, **
5. Arithmetic functions: ABS, EXP, LN, LOG, SQRT, RANDOM
6. Trigonometric functions: COS, SIN, TAN, ACOS, ASIN, ATAN, ATAN2
7. Hyperbolic functions: COSH, SINH, TANH, ACOSH, ASINH, ATANH
8. CASE expressions, both valued and searched types
9. Comparison operators: =, >, <, <>, <=, >=
10. Logical predicates: (NOT) BETWEEN…AND…, (NOT) IN (expression list), IS
(NOT) NULL, AND, OR, NOT, (NOT) LIKE ‘pattern expression’, ANY, ALL
11. Custom logical predicates: AND ALL, OR ALL (making it easier to connect a
number of conditional expressions with an AND or OR operator)
12. NULL operators: NULLIF, COALESCE, NULLIFZERO, ZEROIFNULL
13. Built-in functions: CURRENT_DATE, CURRENT_TIME,
CURRENT_TIMESTAMP
14. Date/Time functions: ADD_MONTHS and EXTRACT
15. Custom Date/Time differences and elapsed time functions
16. Calendar fields based on a specified date column with all Teradata Calendar
options.
17. String functions: LOWER, UPPER, POSITION, SUBSTRING, TRIM,
concatenate ( || )
18. Type conversion: CAST expression AS data type
19. Parentheses: open ‘(‘ and close ‘)’


20. Free SQL Text Entry


21. References to other variables

The same list applies to the creation of Dimensions, with the exclusion of all aggregation
functions and ordered analytical functions. Additionally, the Variable Creation analysis
also allows creation of WHERE, HAVING and QUALIFY clause constraints based
on the same list with the exclusion of aggregation functions (except with HAVING),
and ordered analytical functions (except with QUALIFY).
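
To make this concrete, a single variable may combine several of these elements, such as a
searched CASE expression inside an aggregation function. A variable defined that way would
generate a select list item conceptually like the following (the column name and values here
are purely illustrative, not part of the product):

     SUM(CASE WHEN tran_amt > 100 THEN 1 ELSE 0 END)

This would count, within each aggregation group, the transactions whose amount exceeds 100.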

Variable Properties
Each time a new variable is defined, the program keeps track of several attributes of the
variable that control how it is generated. Some of these attributes can be explicitly set by the
user and some are determined by the SQL verbs or clauses selected by the user.

The properties explicitly set by the user include:

• Column name—either provided by the user or defaulted to a system generated value


• Column type—either chosen by the user, inherited from another column, or set to a
default value (only the numerous Teradata character, numeric and date/time types are
allowed, not byte or graphic types)
• Description—a description of any size may be associated with a variable
• Division by zero protection—by option, divisors are automatically converted to
NULL when they are zero to avoid SQL failure
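
Conceptually, when division by zero protection is enabled, a division is generated with its
divisor wrapped so that a zero divisor yields a NULL result rather than a SQL failure,
roughly as follows (illustrative column names):

     tot_amount / NULLIFZERO(tran_count)

Because arithmetic involving NULL yields NULL, the protected variable is simply NULL
wherever the divisor is zero.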

Duplication By Dimension
Sometimes it is desirable to generate a number of similar variables at one time using data
constraints involving specific values or combinations of values from one or more columns in
the input table. These other columns can be thought of as dimensions upon which the new
variable is expanded or duplicated. For example, instead of creating a single variable
containing a customer’s average transaction amount, it may be desirable to create separate
variables for average transaction amount during each of the last 6 months, yielding 6
variables.

Duplication by dimension is performed at the time a variable is created with the Variable
Creation analysis. The user may dimension a variable on all or a subset of the dimension
values they define. Ordinarily, both the dimensioned and dimensioning variable reside in the
same input table. For example, both the transaction amount (variable being dimensioned) and
the transaction date (dimensioning variable) reside in the transaction table that is used as
input.

It is possible however to dimension a variable via a column in another table, such as a
hierarchy table. This requires that the table containing the dimensioning variable also
contains a column that can be matched to a column in the table that contains the column to be
dimensioned. For example, you can dimension the average transaction amount by
department where the table containing the transaction amount also contains a product code,
and the hierarchy table used for dimensioning contains both a product code and department
code. (In this case, the product code must be used in the "join path" between the transaction
and hierarchy table.)


Although variables duplicated by dimension are always implemented as aggregates by
necessity, the variables may or may not be summarized values. The example previously given
of average transaction amount is a summarized value where the individual dimension values
apply to multiple rows or observations. However, if the dimension values apply to specific
rows for each anchor key (see Join Paths, Anchor Tables and Anchor Keys below), then
duplicating by dimension amounts to picking out specific values rather than summarizing
over dimension values. An example of this might be dimensioning by month the values in a
table that summarizes transaction amounts by customer and month. In this case, dimensioning
by month simply selects the individual monthly sums or averages, creating a separate variable
for each. To do this the default aggregate function MIN is used.

Depending on the nature of the variable being dimensioned, the user may want to treat values
not applying to a particular dimension value as either NULL or 0. The use of NULL in this
case results in the possibility of the dimensioned variable being NULL if no data applies. The
use of 0 in this case simply gives a total of 0 if no data applies. An option is therefore
provided to the user to indicate that either NULL or 0 should be used when no data applies.
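
Conceptually, the choice corresponds to generating the dimensioned aggregate with or
without an ELSE 0 clause in the underlying CASE expression, roughly as follows (column
names are illustrative):

     SUM(CASE WHEN tran_code = 'CA' THEN tran_amt END)          -- NULL when no data applies
     SUM(CASE WHEN tran_code = 'CA' THEN tran_amt ELSE 0 END)   -- 0 when no data applies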

Applying Dimension Values


Consider the following example of defining dimension values based on a column called
tran_code in the input table twm_credit_tran from which a variable is being defined based on
another input column called tran_amt. The valid values of tran_code may be extracted
directly from the twm_credit_tran table using the Values button on the Variable Creation
input screen. (They could also be taken from the output of a previous run of the Data
Explorer or Frequency analyses.) At this point the user might select the tran_code values
‘CA’, ‘CG’, and ‘PM’ as dimension values, and the combination of ‘CA’ and ‘CG’ as a
fourth dimension value. A name is given to each of these dimension values to be used in
conjunction with variable names in naming any variables dimensioned by this dimension
value. A descriptive string may also be associated with each of the dimension values.

The dimension information is shown below for conceptual purposes in the form of two tables.
Note that the Dimension Values table targets the dimension values of tran_code in a
particular table. Notice that the conditions comprising the elements of the dimension may
overlap. That is, they do not need to be mutually exclusive in value.

Dimension Values:

Dimension Value             Name               Full Description

tran_code = ‘CA’            tran_code_CA       Cash advance
tran_code = ‘CG’            tran_code_CG       Charge
tran_code = ‘PM’            tran_code_PM       Payment
tran_code IN (‘CA’,‘CG’)    tran_code_CA_CG    Advance or charge

Suppose the above dimension values are applied to a new variable, AVG(tran_amt), with
abbreviation Amt. The select list items for the AVG(tran_amt) dimensioned by these
dimension values would produce 4 variables:

Variable Name              Full Description

tran_code_CA_Amt           Average Tran Amount for Cash advances
tran_code_CG_Amt           Average Tran Amount for Charges
tran_code_PM_Amt           Average Tran Amount for Payments
tran_code_CA_CG_Amt        Average Tran Amount for Advances or Charges
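
Conceptually, the select list items generated for these four dimensioned variables resemble
the following sketch (the exact SQL emitted by the product may differ in detail):

     AVG(CASE WHEN tran_code = 'CA' THEN tran_amt END) AS tran_code_CA_Amt,
     AVG(CASE WHEN tran_code = 'CG' THEN tran_amt END) AS tran_code_CG_Amt,
     AVG(CASE WHEN tran_code = 'PM' THEN tran_amt END) AS tran_code_PM_Amt,
     AVG(CASE WHEN tran_code IN ('CA','CG') THEN tran_amt END) AS tran_code_CA_CG_Amt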


Conditions other than simple inclusion can be used in defining dimensions. In fact, any SQL
construct listed previously with the exception of an aggregation or ordered analytical function
can be used.

Join Paths, Anchor Table and Anchor Keys


For each Variable Creation analysis, appropriate join paths must be set up if columns from
multiple tables are used in creating the variables. The first step in putting together the Join
Paths is to determine what your “Anchor Table” and “Anchor Keys” are.

The anchor table is a table that contains all of the key values to be included in the final data
set. Physically, this can be a table or a view residing in Teradata. The data set anchor key
columns must be included in the anchor table and must uniquely identify rows in the anchor
table, otherwise unpredictable results may occur when joining this table with others.

Join paths must be specified from the Anchor Table to every table used to create variables,
dimensions and/or specified in a WHERE, QUALIFY or HAVING clause. This information
is used to build up a FROM clause for each table or view to be left outer joined with the
anchor table in order to include the appropriate anchor key values in the data set.

The following is an example of a simple join path between two tables. Note that the
containing databases can differ as can the joining table names and column names.

db1.tbl1.cust_id = db2.tbl2.cid

In some cases more than two tables must be joined together to reach a commonly used table.
By way of an example, a transaction table may not contain the customer identifier that forms
the primary index of the anchor table, but an account number instead, which is tied to
customer identifier in a third table which contains both values.

db1.tbl1.cust_id = db2.tbl2.cust_id AND
db2.tbl2.acct_id = db3.tbl3.acct_id

Of course, more complex examples can occur in practice and can be accommodated by a join
path with sufficient conditions combined together.
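
For example, the three-table join path above would contribute to a generated FROM clause
conceptually like the following, with each table left outer joined back toward the anchor
table (the aliases are illustrative; the product generates its own):

     FROM db1.tbl1 A
     LEFT OUTER JOIN db2.tbl2 B ON A.cust_id = B.cust_id
     LEFT OUTER JOIN db3.tbl3 C ON B.acct_id = C.acct_id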

The Variable Creation function includes a Join Path wizard to make it easier to build up
complex join paths. Note also that join paths can be automatically extracted from other
analyses in the same project. This suggests that join paths can be created once in a Variable
Creation analysis, and then copied into a project to be used as a template.

SQL Generation
In order to derive the variables defined in a Variable Creation function, SQL is generated in
one of a number of forms depending on the result option selected. (Note that for each of
these forms, there is an option to "Create SQL Only" without executing the SQL.)

• "Select"
• "Explain Select"
• "Drop Table" and "Create Table As"

• "Drop View" and "Create View"

When the SELECT option is chosen for output, if another analysis refers to this Variable
Creation analysis for its input, the SQL takes the form of a "Drop Table" and "Create Volatile
Table As".

Note that it is necessary to generate a DROP command prior to a CREATE in case the
definition of the table or view has changed since a previous execution. For each variable, a
select list item is generated for the variable expression. If requested as expert options,
WHERE, QUALIFY and/or HAVING clauses may be generated. In the FROM clause, data is
selected from the anchor table, and left outer joined to any other tables referred to in the
variable, dimension or expert clause definitions. Aliases are generated for each table or view
accessed and all column names are automatically qualified using these aliases.
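
Putting these pieces together, the SQL generated for the Create Table result option takes a
shape conceptually like the following sketch (all names here are illustrative, and the
product's actual output will differ in detail):

     DROP TABLE mydb.my_ads;
     CREATE TABLE mydb.my_ads AS (
       SELECT A.cust_id
             ,AVG(B.tran_amt) AS avg_amt
       FROM mydb.customer A
       LEFT OUTER JOIN mydb.tran_table B ON A.cust_id = B.cust_id
       GROUP BY A.cust_id
     ) WITH DATA;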

Initiate a Variable Creation Function


Use the following procedure to initiate a new Variable Creation analysis in Teradata
Warehouse Miner:

1. Click on the Add New Analysis icon in the toolbar:

2. In the resulting Add New Analysis dialog box, click on ADS under Categories and then
under Analyses double-click on Variable Creation:

3. This will bring up the Variable Creation dialog in which you can define INPUT /
OUTPUT options.

Variable Creation - INPUT - Variables


On the Variable Creation dialog, click on INPUT and then click on the Variables tab (the
large tab in the center of the panel).


Note that this screen may be resized by clicking on one of the edges or corners and moving
the mouse while holding the button down.

Selection Options

Input Source
Select Table to input from a table or view, or select Analysis to select directly from
the output of a qualifying analysis in the same project. (Selecting Analysis will
cause the referenced analysis to be executed before this analysis whenever this
analysis is run. It will also cause the referenced analysis to create a volatile table if
the Output option of the referenced analysis is Select.)
Databases
All databases which are available for the Variable Creation analysis.
Tables
All tables within the Source Database which are available for the Variable Creation
analysis.
Columns
All columns within the selected table which are available for the Variable Creation
analysis.
Values
If a single column is highlighted and the Values button is clicked, a window appears
above the Columns selector displaying distinct values that appear in the selected
column in the selected table or view. The query to retrieve these values is affected by
two options on the Limits tab of the Tools menu item called Preferences, namely: Use
sampling to retrieve distinct value data and Number of rows to sample. To remove
the temporary window that displays the values, select the Hide button at the top of
the display. (Note that if the Input Source is Analysis and the column is in a volatile
table created by the referenced analysis, the retrieval of Values may fail. Just follow
the directions in the informational message displayed in case of failure to retrieve
data values.)

Right-Click Menu Options

The same right-click menu options are offered for the Columns selector on the left side of the
input screen as are offered for other input screens (refer to the Analysis Input Screen topic
in Using Teradata Warehouse Miner). Also, the following right-click options are available
within the Variables panel.

Expand All Nodes


Expands the nodes completely for each variable.
Collapse All Nodes
Collapses the nodes to show only variables.
Switch Input To ‘<table name>’
This option applies only when a SQL Column is highlighted in the Variables panel
(or a Variable containing only a single SQL Column). When this option is selected,
the selectors on the left side of the input screen are adjusted to match the table or
analysis that contains the selected SQL Column. (The column is also selected.)
Switch ‘<table name>’ To Current Input
This option applies only when a SQL Column is highlighted in the Variables panel
(or a Variable containing only a single SQL Column). When this option is selected,
the selectors on the left side of the input screen are used to change the input table or
analysis of the selected SQL Column. A pop-up menu is displayed to allow changing
the input for this column only or for all occurrences. For a single column, a column
with the same name must occur in the new (currently selected) input table or analysis
or an error is given. When all columns are changed, the new table or analysis must
contain all the columns or an error is given and no changes are made.
Apply Dimensions to Variables
This option jumps to the upper dimensions tab so that dimensions can be applied to
variables.

Creating Variables From Columns

The variables to be created are specified one at a time as any type of SQL expression. One
way to create a new variable is to click on the New button to produce the following within the
Variables panel:

Another way to create one or more new variables is to drag and drop one or more columns
from the Columns panel to the empty space at the bottom of the Variables panel (multiple
columns may be dragged and dropped at the same time). Each new variable is given the same
name as the corresponding column dropped onto the empty area.

One alternative to dragging and dropping a column is to use the right arrow selection button
to create a new variable from it. Another alternative is to double-click on the column. If the
right arrow button is clicked repeatedly, or the column is double-clicked repeatedly, a range
of columns may be used to create new variables, since the selected column increments each
time the arrow is clicked or the column is double-clicked. (It should be noted that when a
column or column value is selected, the right arrow selection button will only be highlighted
if a SQL Element is not selected. This can be ensured if the right-click option to Collapse All
Nodes is utilized in the SQL Element view.)

Whether dragging and dropping, clicking on the right arrow button or double-clicking on the
column, a new variable based on a column looks something like the following (after
expanding the node).

Creating Variables From SQL Elements

Still another way to create a new variable is to drag and drop a single SQL element from the
SQL Elements panel to the empty space at the bottom of the Variables panel, or to drag and
drop one or more column values displayed by selecting the Values button. In the case of
column values, a variable containing a single SQL Numeric Literal, String Literal or Date
Literal is created as appropriate for each column value. (This technique saves having to edit
the properties of a numeric, string or date literal to set the desired value.)

As with creating variables from selected columns, use of the right arrow selection button or
double-clicking the desired SQL element or column value provides an alternative to dragging
and dropping an element or value. Note however that repeated selection of a SQL element
does not advance the selected element so the result is multiple variables containing the same
SQL element. (Note also that when a SQL element is selected, the right arrow selection
button will only be highlighted if neither a column nor a column value is selected in its
respective view.)

When a SQL element is placed on top of another element on the Variables panel, whether by
dragging and dropping it, selecting it with the right arrow or by double-clicking it, the new
element is typically inserted into the expression tree at that point. The element replaced is
then typically moved to an empty operand of the new SQL element.

Whether dragging and dropping, clicking on the right arrow button or double-clicking, a new
variable based on a SQL element looks something like the following example involving the
Average element:


Copying or Moving a Variable

It is possible to create a copy of a variable by holding down the Control key on the keyboard
while dragging the variable to another location in the Variables panel. The copy can be
placed ahead of another variable by dropping it on that variable, or at the end of the list of
variables by dropping it on the empty space at the bottom of the Variables panel. It is also
possible to copy a variable in the same manner from another analysis by viewing the other
analysis at the same time and dragging the variable from one analysis to the other.

Please be aware that if the Control key is not held down while performing the copy operation
just described within the same analysis, the variable is moved from one place to the other, i.e.
deleted from its old location and copied to the new one. There are two exceptions to this.
First, this is not the case when copying a variable from one analysis to another, in which case
a copy operation is always performed, with or without holding down the Control key. The
second exception is when moving one child node on top of another child node of the same
parent in the expression tree that defines a variable. In this case, the two nodes or sub-
expressions are switched. (For example, if income and age are added together and age is
moved on top of income, the result is to add age and income, reversing the operands.)

Replicating a Variable

It is possible to create multiple varied copies of a variable by dropping or selecting multiple
columns or values onto a component of a variable that is not a folder, that is, a component
that is designed to hold only a single element. For example, after selecting the New button, if
10 columns were dragged and dropped onto the empty node underneath the new variable, the
entire variable would be replicated 10 times, each copy containing a different column and
named with the original variable name appended with a number between 1 and 10.

Deleting All Variables

All variables can be deleted from the analysis by selecting the double-back-arrow button in
the center of the Variable Creation window. When this function is requested, one or more
warnings will be given. The first warning indicates how many variables are about to be
deleted. The second possible warning is given if the number of variables being deleted
exceeds 100, the maximum number of operations that can be undone or redone using the
Undo or Redo buttons. (If this warning is given and the Undo button is then selected, only
the first 100 variables will be restored. These are actually the last 100 deleted, since they are
deleted in reverse order.)

Buttons

New Button
Clicking on the New button creates a new Variable on the panel.


Add Button
Clicking on the Add button brings up a dialog to allow adding copies of variables from other
loaded analyses.

On this dialog select:

Available Analyses
This drop down list contains all of the Variable Creation analyses currently loaded in
the Project window, including those in other projects.

Available Variables
These are the variables in the currently selected analysis.

Retain dimensions attached to variables when copying


Checking this box will include any applied dimensions on variables copied into the
analysis. Unchecking this box will result in the dimensions being dropped from
copied variables.


(Note that if the box is checked and the selected analysis is in another project, and
one of the variables or dimensions applied to a variable being copied contains a
reference to another analysis, an error message will be given and none of the
variables will be copied.)

Map database objects in copied variables to new values


Checking this box will allow the user to change the databases, tables or columns
referenced in the variables being copied (and their dimensions, if any). This is done
by presenting the Object Mapping Wizard similar to the Import Wizard described in
the File Menu section of Using Teradata Warehouse Miner.

OK/Cancel/Apply
Each time the Apply button is clicked, a copy of the currently selected variables is
added and a status message given. The Apply button is then disabled until another
variable is selected. The dialog can be exited at any time by clicking the OK or
Cancel button. If OK is clicked, the currently selected variables will be added unless
the Apply button is disabled.

Wizard Button
When the Variables tab is selected and either a Variable is selected or nothing is
selected, the Wizard button can be used to generate new variables, each containing a
Searched Case statement. Alternately, when an appropriate folder is selected, When
Conditions for Searched Case statements, or conditional expressions for And All or Or
All statements, can be generated. To do so, highlight the Case Conditions folder under a
Case - Searched node or the Expressions folder under an And All or Or All node and
select the Wizard button.

The maximum number of variables or values that can be generated by a single application
of the wizard is limited to 1000.

The following dialog is given when a Variable or nothing at all is selected. (Note that in
the other cases a subset of these fields is displayed with appropriate instructions at the top
of the dialog.)


Variable Prefix
When a comparison operator such as Equal is selected in the Operator field, the
names of the resulting variables consist of the prefix followed by an underscore and
the selected value. Otherwise the variable name is the prefix followed by a number.

Description
When a comparison operator such as Equal is selected in the Operator field, the
description of the resulting variables consists of the description specified here
followed by the operator and selected value. Otherwise the description is simply the
text entered here.

Left Side Column/Expression


Replace the "(empty)" node with a SQL Column or more complex expression
involving a SQL Column.

Then Expression
Replace the "(empty)" node with a SQL element or more complex expression that
will form the Then clause of the generated Searched Case expression. (The default
value of ‘1’ is useful for an indicator variable.)

Else Expression


Replace the "(empty)" node with a SQL element or more complex expression that
will form the Else clause of the generated Searched Case expression. (The default
value of ‘0’ is useful for an indicator variable.)

Operator
Select a comparison operator such as Equals or select Between, Not Between, In, Not
In, Is Null or Is Not Null as the operator to use. If Between or Not Between is
selected, a variable or condition is generated for each pair of requested values. If In
or Not In is selected, the Wizard will generate a single variable or condition based on
all requested values when 'OK' or 'Apply' is clicked. If Is Null or Is Not Null is
selected, the Wizard will generate a single variable or condition based on no values.
Otherwise, if a comparison operator such as Equal is selected, the Wizard will
generate a variable or condition for each requested value.
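
For example, with a Variable Prefix of Ind, a Left Side Column of tran_code, the Equal
operator, the default Then and Else expressions of 1 and 0, and requested values of 'CA'
and 'CG' (all illustrative), the wizard would generate two indicator variables conceptually
equivalent to:

     CASE WHEN tran_code = 'CA' THEN 1 ELSE 0 END   (named Ind_CA)
     CASE WHEN tran_code = 'CG' THEN 1 ELSE 0 END   (named Ind_CG)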

Right Side Values

Values
This tab accepts values displayed by selecting the Values button for input
columns on the left side of the input screen. The displayed values can be drag-
dropped onto this panel, selected with the right-arrow button or selected by
double-clicking them. They can be numeric, string or date type values.

Note that when values are displayed on the left side of the input screen, the
ellipsis button (the one displaying ‘…’) may be used to Select All Values.

Range
This tab can be used to generate a range of integer or decimal numeric values
based on a From, To and By field. If desired, the values can be generated in
descending order by making the From value greater than the To value, so that the
By value should always be positive. If the By field is not specified, an
incremental value of 1 is assumed. (Note that a value displayed with the Values
button may be drag-dropped into this field. Note also that the escape key will
revert to the last value entered in this field.)

When the Between or Not Between operator has been specified, the Range fields
behave somewhat differently and may be used only to specify a single pair of
values using the From and To fields, with the From field validated to be less than
or equal to the To field. The By field may not be specified when the Between or
Not Between operator has been specified.

List
A list of numeric, string or date type values can be entered here, separated by
commas (actually, by the standard list separator for the current locale settings).
(Note that a value displayed with the Values button may be drag-dropped into
this field. Note also that the escape key will revert to the last value entered in
this field.)

Clear All
This button will clear all of the fields of this dialog. (This is convenient because all
entries are generally retained when returning to this dialog.)


OK
This button will generate the requested variables or conditions and return to the
Variables panel.

Cancel
This button returns to the Variables panel without generating any elements.

Apply
This button will generate the requested variables or conditions and remain on this
panel. A status message is displayed just above this button reporting on the number
of generated conditions.

Delete Button
The Delete button can be used to delete any node within the tree. If applicable, the tree
will roll up children, but in some cases a delete may remove all children.

SQL Button
The SQL button can be used to dynamically display the SQL for any node within the
Variables tree. If the resulting display is not closed, the expression changes as you click
on the different levels of the tree comprising a variable. An option is provided in the
display to Qualify column names, that is to precede each column name in the display
with its database and table name.

Properties Button
A number of properties are available when defining a variable to be created, as outlined
below. Click the Properties button when the variable is highlighted, or double click on
the variable to bring up the Properties dialog:


Name:

A name must be specified for each variable. If the SQL expression defining the
variable is simply a SQL Column, the name defaults to the name of the column
automatically when the column is dragged to the variable.

(Tip: Variables can be named by single left-clicking on the name, which produces a
box around the name, as in Windows Explorer.)

Output Type:

A specific Teradata data type may optionally be specified for each variable. If
specified, the SQL CAST function is used to force the data type to the requested
specification. Otherwise the type will be generated automatically by the variable’s
expression (Generate Automatically option). Valid options include:

• BYTEINT
• CHAR
• DATE
• DECIMAL
• FLOAT
• INTEGER
• SMALLINT
• TIME
• TIMESTAMP
• VARCHAR
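
For example, specifying an Output Type of FLOAT for a variable whose expression is
tot_amount / tran_count (illustrative column names) would conceptually generate:

     CAST(tot_amount / tran_count AS FLOAT)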

Column Attributes:

One or more column attributes can be entered here in a free-form manner to be used
when an output table is created. They are placed as-entered following the column
name in the CREATE TABLE AS statement. This can be particularly useful when
requesting data compression for an output column, which might look like the
following: COMPRESS NULL.

Description:
An optional description may be specified for each variable. (Note that a default
description is generated automatically by the Wizard if its Description field contains
a value.)

Undo Button
The Undo button can be used to undo changes made to the Variables panel. Note that if a
number of variables or dimension values are added at one time, each addition requires a
separate undo request to reverse. Up to 100 undo requests can be processed.

Redo Button
The Redo button can be used to reinstate a change previously undone with the Undo
button.


Question-Mark Help Button


The Question-Mark Help button can be used to request help information about a specific
SQL element by first clicking on the question-mark and then on the SQL element in the
SQL Elements panel, Variables panel or Dimensions panel.

Variable Creation - INPUT - Variables - SQL Elements

The following SQL Elements are supported, by category:

Aggregations
A number of aggregation functions are supported, including several of a statistical nature.
Note that aggregation functions are not allowed in a Dimension value expression, but may be
used in a Variable expression that is being dimensioned. They are not allowed in a Where
clause or Qualify clause either. Double click on Aggregations to view the supported
functions:

Average
The standard average function is supported, taking a single expression argument and
generating AVG(expression). The function returns a value of type float, except that
the average of a date expression is returned as a value of type date. When
dragging an Average into a variable, the following tree element is created:

Columns, and/or other non-aggregate expressions can be moved over the (empty) branch
of the tree.


The option to compute the average over distinct values only is provided, resulting in the
generation of AVG(DISTINCT expression). This option is enabled through the
Properties panel. Double-click on Average, or highlight it and hit the Properties button:

Correlation

An enhanced version of the standard correlation function is supported, generating
CORR(expression1, expression2) and returning a value of type float. When dragging a
Correlation into a variable, the following tree element is created:

Columns, and/or other non-aggregate expressions can be moved over the two (empty)
branches of the tree.

The enhancement is the ability to compute the correlation when either or both the first
and second expression arguments evaluate to type date, generating one of the following:

CORR(date expression1 - DATE '1900-01-01', expression2)


CORR(expression1, date expression2 - DATE '1900-01-01')
CORR(date expression1 - DATE '1900-01-01',
date expression2 - DATE '1900-01-01')

There are no special properties for the Correlation function.
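
For instance, correlating a date column with a numeric column might generate SQL like the following, with the date argument converted to a day count (the column and table names are hypothetical):

```sql
SELECT CORR(open_date - DATE '1900-01-01', avg_balance) AS corr_value
FROM account_table;
```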

Covariance


An enhanced version of the standard covariance function is supported, generating
COVAR_SAMP(expression1, expression2) for the sample covariance or
COVAR_POP(expression1, expression2) for the population covariance, while returning a
value of type float. When dragging a Covariance into a variable, the following tree
element is created:

Columns, and/or other non-aggregate expressions can be moved over the two (empty)
branches of the tree.

The enhancement consists of the ability to compute the covariance when either or both
the first and second expression arguments evaluate to type date, generating one of the
following (in which COVAR_POP may be substituted for COVAR_SAMP):

COVAR_SAMP(date expression1 - DATE '1900-01-01', expression2)


COVAR_SAMP(expression1, date expression2 - DATE '1900-01-01')
COVAR_SAMP(date expression1 - DATE '1900-01-01',
date expression2 - DATE '1900-01-01')

The option to compute the covariances on the population or sample is offered through the
Properties panel. Double-click on Covariance, or highlight it and hit the Properties
button:

Count


The standard count function is supported, generating either COUNT(*) or
COUNT(expression) and returning a value of type integer in Teradata run mode or
decimal(15,0) in ANSI run mode. When dragging a Count into a variable, one of the
following tree elements is created:

The (empty) branch is added if no expression yet exists within the variable; otherwise
the existing expression is maintained. In either case, columns, and/or other non-aggregate
expressions can be moved over the (empty) branch in the tree. An asterisk (*) may also be
moved from the Other category to request the COUNT(*) function.

The option to compute the count over distinct values only is provided, resulting in the
generation of COUNT(DISTINCT expression). This option is enabled through the
Properties panel. Double-click on Count, or highlight it and hit the Properties button:

Kurtosis

An enhanced version of the standard kurtosis function is supported, generating
KURTOSIS(expression) and returning a value of type float. When dragging a Kurtosis into a
variable, the following tree element is created:


Columns, and/or other non-aggregate expressions can be moved over the (empty) branch
of the tree.

The enhancement consists of the ability to compute the kurtosis of a date expression,
generating KURTOSIS(date expression - DATE '1900-01-01').

The standard option to compute the kurtosis over distinct values only is also provided,
resulting in the generation of KURTOSIS(DISTINCT expression). This option is enabled
through the Properties panel. Double-click on Kurtosis, or highlight it and hit the
Properties button:

Maximum

The standard maximum function is supported, generating MAX(expression) and
returning a value of type matching the type of the expression. When dragging a
Maximum into a variable, the following tree element is created:

Columns, and/or other non-aggregate expressions can be moved over the (empty) branch
of the tree.


The option to compute the maximum over distinct values only is provided, resulting in
the generation of MAX(DISTINCT expression). This option is enabled through the
Properties panel. Double-click on Maximum, or highlight it and hit the Properties
button:

Minimum

The standard minimum function is supported, generating MIN(expression) and returning
a value of type matching the type of the expression. When dragging a Minimum into a
variable, the following tree element is created:

Columns, and/or other non-aggregate expressions can be moved over the (empty) branch
of the tree.

The option to compute the minimum over distinct values only is provided, resulting in the
generation of MIN(DISTINCT expression). This option is enabled through the
Properties panel. Double-click on Minimum, or highlight it and hit the Properties
button:


Regression Intercept

The standard regression intercept function is supported, generating
REGR_INTERCEPT(dependent expression, independent expression) and returning a
value of type float. When dragging a Regression Intercept into a variable, the following
tree element is created:

Columns, and/or other non-aggregate expressions can be moved over the two (empty)
branches of the tree.

There are no special properties for the Regression Intercept function.

Regression R-Squared

The standard regression coefficient of determination or R-Squared function is supported,
generating REGR_R2(dependent expression, independent expression) and returning a
value of type float. When dragging a Regression R-Squared into a variable, the following
tree element is created:


Columns, and/or other non-aggregate expressions can be moved over the two (empty)
branches of the tree.

There are no special properties for the Regression R-Squared function.

Regression Slope

The standard regression slope function is supported, generating
REGR_SLOPE(dependent expression, independent expression) and returning a value of
type float. When dragging a Regression Slope into a variable, the following tree element
is created:

Columns, and/or other non-aggregate expressions can be moved over the two (empty)
branches of the tree.

There are no special properties for the Regression Slope function.

Skewness

An enhanced version of the standard skew function is supported, generating
SKEW(expression) and returning a value of type float. When dragging a Skewness into a
variable, the following tree element is created:

Columns, and/or other non-aggregate expressions can be moved over the (empty) branch
of the tree.

The enhancement consists of the ability to compute the skew of a date expression,
generating SKEW(date expression - DATE '1900-01-01').

The standard option to compute the skew over distinct values only is also provided,
resulting in the generation of SKEW(DISTINCT expression). This option is enabled
through the Properties panel. Double-click on Skewness, or highlight it and hit the
Properties button:

Standard Deviation

An enhanced version of the standard function for standard deviation is supported,
generating either STDDEV_SAMP(expression) for the sample standard deviation or
STDDEV_POP(expression) for the population standard deviation, while returning a
value of type float. When dragging a Standard Deviation into a variable, the following
tree element is created:

Columns, and/or other non-aggregate expressions can be moved over the (empty) branch
of the tree.

The enhancement consists of the ability to compute the standard deviation of a date
expression, generating for the sample version STDDEV_SAMP(date expression - DATE
'1900-01-01').

The standard option to compute the standard deviation over distinct values only is also
provided, resulting in the generation for the sample version of
STDDEV_SAMP(DISTINCT expression). Both this option as well as the options for
population and sample versions of standard deviation are enabled through the Properties
panel. Double-click on Standard Deviation, or highlight it and hit the Properties button:


Sum

The standard sum function is supported, generating SUM(expression). The type of the
resulting value depends on the type of the expression being summed. If the expression is
any of the integer types, the resulting value is of type integer. If the expression is a float
or character type, the resulting value is of type float. A decimal expression results in a
value of decimal type with 18 total digits and the same number of fractional digits
contained in the decimal expression. When dragging a Sum into a variable, the following
tree element is created:

Columns, and/or other non-aggregate expressions can be moved over the (empty) branch
of the tree.

The option to compute the sum over distinct values only is provided, resulting in the
generation of SUM(DISTINCT expression). This option is enabled through the
Properties panel. Double-click on Sum, or highlight it and hit the Properties button:


Variance

An enhanced version of the standard variance function is supported, generating either
VAR_SAMP(expression) for the sample variance or VAR_POP(expression) for the
population variance, while returning a value of type float. When dragging a Variance into
a variable, the following tree element is created:

Columns, and/or other non-aggregate expressions can be moved over the (empty) branch
of the tree.

The enhancement consists of the ability to compute the variance of a date expression,
generating for the sample version VAR_SAMP(date expression - DATE '1900-01-01').

The standard option to compute the variance over distinct values only is also provided,
resulting in the generation for the sample version of VAR_SAMP(DISTINCT
expression). Both this option and the options for population and sample versions of
variance are enabled through the Properties panel. Double-click on Variance, or
highlight it and hit the Properties button:


Arithmetic

Numeric functions can operate in general on any expression that can automatically be
converted to a numeric value. Character type operands are automatically converted to a
number of type float, if possible, before performing the numeric function. Additionally, the
standard and Teradata specific numeric operators are supported. Double-click on Arithmetic to view
the supported functions and operators:

Absolute Value

The standard absolute value function is supported, generating ABS(expression) and
returning a positive value of the same magnitude with type matching that of expression, or
float if expression is a character type. When dragging an Absolute Value into a variable,
the following tree element is created:

Columns, and/or other expressions can be moved over the (empty) branch of the tree.

There are no special properties for the Absolute Value function.

Add

The standard Add (+) operator is supported, generating expression + expression. Within
Teradata, this operator automatically converts numeric operands to the expected result
type before they are applied. Character type data is converted to FLOAT if possible
before being applied. Operands of type DATE are valid when adding an integer number
of days to a date expression. The resulting data types and other specific usage
information are documented in some detail in the Teradata documentation. When
dragging an Add into a variable, the following tree element is created:

Columns, and/or other expressions can be moved over the (empty) branches of the tree.

There are no special properties for the Add function.

Divide

The standard Divide (/) operator is supported, generating expression / expression. Within
Teradata, this operator automatically converts numeric operands to the expected result
type before they are applied. Character type data is converted to FLOAT if possible
before being applied. The resulting data types and other specific usage information are
documented in some detail in the Teradata documentation. When dragging a Divide into
a variable, the following tree element is created:

Columns, and/or other expressions can be moved over the (empty) branches of the tree.


A Teradata Warehouse Miner enhancement to the divide '/' operator is offered to
optionally request that divide-by-zero protection be provided. If this option is requested,
a NULLIF function is added in the denominator so that the overall expression evaluates
to NULL if the expression in the denominator evaluates to zero. This option is enabled
through the Properties panel. Double-click on Divide, or highlight it and hit the
Properties button:
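
With divide-by-zero protection enabled, the generated SQL wraps the denominator in NULLIF, for example (the column and table names here are hypothetical):

```sql
SELECT tot_revenue / NULLIF(tot_units, 0) AS avg_unit_price
FROM sales_table;
```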

Exponentiate

The Teradata specific exponentiate operator (expression raised to a value) is supported,
generating (expression ** value) and returning a value of type float. When dragging an
Exponentiate into a variable, the following tree element is created:

Columns, and/or other expressions can be moved over the (empty) branches of the tree.
Note that the second argument must resolve to a numeric literal.

There are no special properties for the Exponentiate function.
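
For example, squaring a column might generate the following (hypothetical names):

```sql
SELECT (monthly_income ** 2) AS income_squared
FROM customer_table;
```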

Logarithm


The standard base-10 logarithm function is supported, generating LOG(expression) and
returning a value of type float. When dragging a Logarithm into a variable, the following
tree element is created:

Columns, and/or other expressions can be moved over the (empty) branch of the tree.

There are no special properties for the Logarithm function.

Modulo

The Teradata specific implementation of the Modulo (MOD) operator is supported,
generating expression MOD expression. Within Teradata, this operator automatically
converts numeric operands to the expected result type before they are applied. Character
type data is converted to FLOAT if possible before being applied. The resulting data
types and other specific usage information are documented in some detail in the Teradata
documentation. When dragging a Modulo into a variable, the following tree element is
created:

Columns, and/or other expressions can be moved over the (empty) branches of the tree.

There are no special properties for the Modulo operator.

Multiply

The standard Multiply (*) operator is supported, generating expression * expression.
Within Teradata, this operator automatically converts numeric operands to the expected
result type before they are applied. Character type data is converted to FLOAT if possible
before being applied. The resulting data types and other specific usage information are
documented in some detail in the Teradata documentation. When dragging a Multiply
into a variable, the following tree element is created:


Columns, and/or other expressions can be moved over the (empty) branches of the tree.

There are no special properties for the Multiply function.

Natural Exponentiate

The standard natural exponentiate function (e to the power) is supported, generating
EXP(expression) and returning a value of type float. When dragging a Natural
Exponentiate into a variable, the following tree element is created:

Columns, and/or other expressions can be moved over the (empty) branch of the tree.

There are no special properties for the Natural Exponentiate function.

(Note that it may be advisable to use a Case statement in conjunction with this function if
extreme values in the data may occur, resulting in an overflow or SQL argument error.)

Natural Logarithm

The standard natural logarithm function is supported, generating LN(expression) and
returning a value of type float. When dragging a Natural Logarithm into a variable, the
following tree element is created:

Columns, and/or other expressions can be moved over the (empty) branch of the tree.

There are no special properties for the Natural Logarithm function.

(Note that it may be advisable to use a Case statement in conjunction with this function if
zero or negative values may occur in the data, resulting in a SQL argument error.)

Random

The random function is a non-standard Teradata SQL feature, generating RANDOM(x, y)
and returning a pseudo-random integer between x and y. When dragging a Random into a
variable, the following tree element is created:


The integers x (Lower Bound) and y (Upper Bound) are set through the Properties panel.
Double-click on Random, or highlight it and hit the Properties button:
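
With a Lower Bound of 1 and an Upper Bound of 100, the generated SQL might look like the following, which could serve to assign each row to a random bucket (hypothetical names):

```sql
SELECT cust_id, RANDOM(1, 100) AS sample_bucket
FROM customer_table;
```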

Square Root

The standard square root function is supported, generating SQRT(expression) and
returning a value of type float. When dragging a Square Root into a variable, the
following tree element is created:

Columns, and/or other expressions can be moved over the (empty) branch of the tree.
Note that expressions that resolve to a negative number will result in SQL errors.

There are no special properties for the Square Root function.

Subtract

The standard Subtract (-) operator is supported, generating expression - expression.
Within Teradata, this operator automatically converts numeric operands to the expected
result type before they are applied. Character type data is converted to FLOAT if possible
before being applied. Operands of type DATE are valid when subtracting an integer
number of days from a date expression. The resulting data types and other specific usage
information are documented in some detail in the Teradata documentation. When
dragging a Subtract into a variable, the following tree element is created:

Columns, and/or other expressions can be moved over the (empty) branches of the tree.

There are no special properties for the Subtract function.

Unary Minus

The standard Unary Minus (-) operator is supported, generating -expression. Within
Teradata, this operator automatically converts numeric operands to the expected result
type before they are applied. Character type data is converted to FLOAT if possible
before being applied. The resulting data types and other specific usage information are
documented in some detail in the Teradata documentation. When dragging a Unary
Minus into a variable, the following tree element is created:

Columns, and/or other expressions can be moved over the (empty) branch of the tree.

There are no special properties for the Unary Minus function.

Unary Plus

The standard Unary Plus (+) operator is supported, generating +expression. Within
Teradata, this operator automatically converts numeric operands to the expected result
type before they are applied. Character type data is converted to FLOAT if possible
before being applied. The resulting data types and other specific usage information are
documented in some detail in the Teradata documentation. When dragging a Unary Plus
into a variable, the following tree element is created:

Columns, and/or other expressions can be moved over the (empty) branch of the tree.


There are no special properties for the Unary Plus function.

Calendar

A Teradata Warehouse Miner specific function is provided for transforming a date
expression, or the date portion of a timestamp expression, into one of many fields based on
the Teradata system calendar. The type of the expression returned is always integer. Although
the built-in Teradata system calendar is not used to perform the function, the function mimics
the derivation of each of the fields in the Teradata system calendar. It uses some of the same
calculations used in the underlying system calendar views, but it also relies on Teradata date
arithmetic and the standard SQL EXTRACT function. Further, the Teradata Warehouse
Miner calendar function may be applied equally to a date or timestamp expression.
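
To illustrate the style of derivation involved, a field such as Day of Week can be expressed with Teradata date arithmetic. One plausible formulation, not necessarily the exact SQL generated, is (hypothetical names):

```sql
-- Assumes 1/1/1900, a Monday, yields day-of-week 1
SELECT ((open_date - DATE '1900-01-01') MOD 7) + 1 AS day_of_week
FROM account_table;
```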

Double-click on Calendar to see all of the supported functions:

Day of Calendar

The Day of Calendar function is supported, returning an integer 1-N, the number of Julian
days since 1/1/1900. When dragging a Day of Calendar into a variable, the following tree
element is created:

Columns, and/or other expressions that resolve to a date can be moved over the (empty)
branch of the tree.


There are no special properties for the Day of Calendar function.

Day of Month

The Day of Month function is supported, returning an integer 1-31, the number of the day
within a given month. When dragging a Day of Month into a variable, the following tree
element is created:

Columns, and/or other expressions that resolve to a date can be moved over the (empty)
branch of the tree.

There are no special properties for the Day of Month function.

Day of Week

The Day of Week function is supported, returning an integer 1-7, the number of the day
within a given week, assuming 1/1/1900 is a Monday. When dragging a Day of Week into a
variable, the following tree element is created:

Columns, and/or other expressions that resolve to a date can be moved over the (empty)
branch of the tree.

There are no special properties for the Day of Week function.

Day of Year

The Day of Year function is supported, returning an integer 1-366, the number of the day
within a given year. When dragging a Day of Year into a variable, the following tree
element is created:

Columns, and/or other expressions that resolve to a date can be moved over the (empty)
branch of the tree.

There are no special properties for the Day of Year function.


Month of Calendar

The Month of Calendar function is supported, returning an integer 1-N, the number of
months since 1/1/1900. When dragging a Month of Calendar into a variable, the
following tree element is created:

Columns, and/or other expressions that resolve to a date can be moved over the (empty)
branch of the tree.

There are no special properties for the Month of Calendar function.

Month of Quarter

The Month of Quarter function is supported, returning an integer 1-3, the number of the
month in a given quarter. When dragging a Month of Quarter into a variable, the
following tree element is created:

Columns, and/or other expressions that resolve to a date can be moved over the (empty)
branch of the tree.

There are no special properties for the Month of Quarter function.

Month of Year

The Month of Year function is supported, returning an integer 1-12, the number of the
month in a given year. When dragging a Month of Year into a variable, the following tree
element is created:

Columns, and/or other expressions that resolve to a date can be moved over the (empty)
branch of the tree.

There are no special properties for the Month of Year function.

Quarter of Calendar


The Quarter of Calendar function is supported, returning an integer 1-N, the number of
quarters since the first quarter of 1900. When dragging a Quarter of Calendar into a variable, the
following tree element is created:

Columns, and/or other expressions that resolve to a date can be moved over the (empty)
branch of the tree.

There are no special properties for the Quarter of Calendar function.

Quarter of Year

The Quarter of Year function is supported, returning an integer 1-4, the quarter of the
year where Jan-Mar=1, Apr-Jun=2, Jul-Sep=3, Oct-Dec=4. When dragging a Quarter of
Year into a variable, the following tree element is created:

Columns, and/or other expressions that resolve to a date can be moved over the (empty)
branch of the tree.

There are no special properties for the Quarter of Year function.

Week of Calendar

The Week of Calendar function is supported, returning an integer 0-N, where a partial
week at the beginning is numbered 0. When dragging a Week of Calendar into a variable, the following tree
element is created:

Columns, and/or other expressions that resolve to a date can be moved over the (empty)
branch of the tree.

There are no special properties for the Week of Calendar function.

Week of Month


The Week of Month function is supported, returning an integer 0-5, where a partial week
at the beginning of the month is numbered 0. When dragging a Week of Month into a variable, the
element is created:

Columns, and/or other expressions that resolve to a date can be moved over the (empty)
branch of the tree.

There are no special properties for the Week of Month function.

Week of Year

The Week of Year function is supported, returning an integer 0-53, where a partial week
at the beginning of the year is numbered 0. When dragging a Week of Year into a variable, the following tree element
is created:

Columns, and/or other expressions that resolve to a date can be moved over the (empty)
branch of the tree.

There are no special properties for the Week of Year function.

Weekday of Month

The Weekday of Month function is supported, returning an integer 1-5, the nth occurrence
of the day of the week within the month. When dragging a Weekday of Month into a
following tree element is created:

Columns, and/or other expressions that resolve to a date can be moved over the (empty)
branch of the tree.

There are no special properties for the Weekday of Month function.

Year of Calendar


The Year of Calendar function is supported, returning the year as an integer, assuming the
year starts on January 1st. When dragging a Year of Calendar into a variable, the following tree element
is created:

Columns, and/or other expressions that resolve to a date can be moved over the (empty)
branch of the tree.

There are no special properties for the Year of Calendar function.


Case

Both the standard valued and searched CASE expressions are supported. In both cases the
ELSE expression is optional, and if not specified, the expression will return NULL if no
WHEN conditions evaluate to TRUE. At least one WHEN/THEN condition is required with
both forms of CASE expression, and a test expression is also required with the valued form.

(Note that a CASE statement embedded within a CASE statement as a THEN or ELSE
expression is automatically enclosed in parentheses if it is not already so enclosed. This
makes it easier to achieve correct syntax when a nested CASE statement is needed.)

Double-click on Case to see all of the CASE-related functions:

Case - Searched

The standard searched CASE expression is supported. When dragging a Searched Case
into a variable, the following tree element is created:


The searched CASE statement is built up by supplying one or more conditions within the
Conditions folder. Each time a Condition is added, the following tree element is created:

Which evaluates to:

CASE WHEN condition/expression THEN expression END

Columns, and/or other expressions can be moved over the (empty) branches of the tree.

Note that the ELSE expression is optional, and if not specified, the expression will return
NULL if no WHEN conditions evaluate to TRUE. At least one WHEN/THEN condition
is required with the CASE expression.

There are no special properties for the Searched Case function.
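
As an example, a searched CASE built from two conditions and an ELSE might generate the following (hypothetical names):

```sql
SELECT CASE WHEN age < 18 THEN 'minor'
            WHEN age < 65 THEN 'adult'
            ELSE 'senior'
       END AS age_band
FROM customer_table;
```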

Case - Valued

The standard valued CASE expression is supported. When dragging a Valued Case into a
variable, the following tree element is created:

The valued CASE statement is built up by supplying one or more conditions within the
Conditions folder. Each time a Condition is added, the following tree element is created:

Which evaluates to:


CASE expression WHEN condition/expression THEN expression END

Columns, and/or other expressions can be moved over the (empty) branches of the tree.

Note that the ELSE expression is optional, and if not specified, the expression will return
NULL if no WHEN conditions evaluate to TRUE. At least one WHEN/THEN condition
is required with the CASE expression, and a test expression is also required with the
valued form.

There are no special properties for the Valued Case function.
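
As an example, a valued CASE testing a single expression against literal values might generate the following (hypothetical names):

```sql
SELECT CASE gender_cd WHEN 'M' THEN 1
                      WHEN 'F' THEN 2
                      ELSE 0
       END AS gender_num
FROM customer_table;
```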

Case Condition

For both Searched and Valued CASE statements, any number of conditions can be built
up. In order to do so, a Condition must first be dragged and dropped into the Conditions
folder of a Searched Case or Valued Case expression. Each condition results in an
expression of the form WHEN expression THEN expression. As an example, when
dragging a Condition into the Conditions folder of a Searched Case expression, the
following tree element is created:

Columns, and/or other expressions can be moved over the (empty) branches of the tree.

There are no special properties for the Condition function.

Coalesce

The standard COALESCE case expression is supported, generating SQL of the form
COALESCE(expression, …expression). It must be supplied with at least two arguments. The
entire COALESCE expression will automatically be enclosed in parentheses if it is not
part of an expression and not already enclosed in parentheses. When dragging a Coalesce
into a variable, the following tree element is created:

Multiple expressions can be built up within the Expressions folder.


Note that COALESCE can be used in place of the non-standard Teradata specific
command ZEROIFNULL. For example, COALESCE(column1, 0) is equivalent to
ZEROIFNULL(column1).

There are no special properties for the Coalesce function.

Null If

The standard NULLIF case expression is supported, generating SQL of the form
NULLIF(expression, expression). It must be supplied with exactly two arguments. The entire
NULLIF expression will automatically be enclosed in parentheses if it is not part of an
expression and not already enclosed in parentheses. When dragging a Null If into a
variable, the following tree element is created:

Columns, and/or other expressions can be moved over the (empty) branches of the tree.

Note that NULLIF can be used in place of the non-standard Teradata specific command
NULLIFZERO. For example, NULLIF(column1, 0) is equivalent to
NULLIFZERO(column1).

There are no special properties for the Null If function.
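A common use of NULLIF is to avoid a divide-by-zero error by converting a zero divisor to NULL, so that the division returns NULL rather than failing. For example (the column names shown are hypothetical):

"total_spend" / NULLIF("transaction_count", 0)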

Null If Zero

The non-standard Teradata specific NULLIFZERO case expression is supported, generating
SQL of the form NULLIFZERO(expression). It must be supplied with exactly one
argument. When dragging a Null If Zero into a variable, the following tree element is
created:

A column, and/or other expression can be moved over the (empty) branch of the tree.

Note that the Null If element, which generates the standard NULLIF command, can be
used in place of the Null If Zero element, which generates the non-standard Teradata
specific command NULLIFZERO. (In Teradata SQL, NULLIF(column1, 0) is equivalent
to NULLIFZERO(column1).)


There are no special properties for the Null If Zero function.

Zero If Null

The non-standard Teradata specific ZEROIFNULL case expression is supported, generating
SQL of the form ZEROIFNULL(expression). It must be supplied with exactly one
argument. When dragging a Zero If Null into a variable, the following tree element is
created:

A column, and/or other expression can be moved over the (empty) branch of the tree.

Note that the Coalesce element, which generates the standard COALESCE command, can
be used in place of the Zero If Null element, which generates the non-standard Teradata
specific command ZEROIFNULL. (In Teradata SQL, COALESCE(column1, 0) is
equivalent to ZEROIFNULL(column1).)

There are no special properties for the Zero If Null function.

Comparison
The standard comparison operators are supported, including equals (=), not equals (<>), less
than (<), less than or equals (<=), greater than (>) and greater than or equals (>=). Comparison
operators evaluate to a true or false condition which can be used in various contexts such as
case conditions. Double-click on Comparison to see all of the operators:

Equals

The Equals operator is supported, generating expression = expression. When dragging an
Equals operator into a variable, the following tree element is created:


Columns, and/or other expressions and literals can be moved over the (empty) branches
of the tree.

There are no special properties for the Equals operator.

Greater Than

The Greater Than operator is supported, generating expression > expression. When
dragging a Greater Than operator into a variable, the following tree element is created:

Columns, and/or other expressions and literals can be moved over the (empty) branches
of the tree.

There are no special properties for the Greater Than operator.

Greater Than or Equals

The Greater Than or Equals operator is supported, generating expression >= expression.
When dragging a Greater Than or Equals operator into a variable, the following tree
element is created:

Columns, and/or other expressions and literals can be moved over the (empty) branches
of the tree.

There are no special properties for the Greater Than or Equals operator.

Less Than

The Less Than operator is supported, generating expression < expression. When dragging
a Less Than operator into a variable, the following tree element is created:


Columns, and/or other expressions and literals can be moved over the (empty) branches
of the tree.

There are no special properties for the Less Than operator.

Less Than or Equals

The Less Than or Equals operator is supported, generating expression <= expression.
When dragging a Less Than or Equals operator into a variable, the following tree element
is created:

Columns, and/or other expressions and literals can be moved over the (empty) branches
of the tree.

There are no special properties for the Less Than or Equals operator.

Does Not Equal

The Does Not Equal operator is supported, generating expression <> expression. When
dragging a Does Not Equal operator into a variable, the following tree element is created:

Columns, and/or other expressions and literals can be moved over the (empty) branches
of the tree.

There are no special properties for the Does Not Equal operator.

Date and Time

Many different date and time functions and operators are offered to extract elements from a
date or time column as well as perform differences/elapsed calculations on multiple date and
time elements. Double-click on Date and Time to see all of the functions and operators:


Add Months

The non-standard Teradata specific Add Months function is supported, generating
ADD_MONTHS(date or timestamp expression, integer expression for months). The type
of the value returned is the same as the type of the date or timestamp expression that
months are added to or subtracted from. When dragging an Add Months function into a
variable, the following tree element is created:

Columns, and/or other expressions and/or literals that resolve to a date can be moved
over the first (empty) branch of the tree, while expressions and/or literals that resolve to
type integer can be moved over the second (empty) branch of the tree.

There are no special properties for the Add Months function.
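For example (the column name shown is hypothetical), the following expression yields the date six months prior to each row's purchase date; a positive second argument would instead move the date forward:

ADD_MONTHS("purchase_date", -6)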

Current Date

The Current Date literal represents the current system date. It generates the SQL keyword
CURRENT_DATE and is of type Date. When dragging a Current Date function into a
variable, the following tree element is created:


There are no child (empty) branches in the tree, as no arguments are required for the
Current Date function.

There are no special properties for the Current Date function.

Current Time

The Current Time literal represents the current system time and current session Time
Zone displacement. It generates the keyword CURRENT_TIME and is of type Time
With Time Zone. The feature allowing the specification of the number of digits of
precision for fractional seconds is not supported; no fractional digits are provided. When
dragging a Current Time function into a variable, the following tree element is created:

There are no child (empty) branches in the tree, as no arguments are required for the
Current Time function.

There are no special properties for the Current Time function.

Current Timestamp

The Current Timestamp literal represents the current system timestamp and current
session Time Zone displacement. It generates the keyword CURRENT_TIMESTAMP
and is of type Timestamp With Time Zone. The feature allowing the specification of the
number of digits of precision for fractional seconds is not supported; six digits are always
provided. When dragging a Current Timestamp function into a variable, the following
tree element is created:

There are no child (empty) branches in the tree, as no arguments are required for the
Current Timestamp function.

There are no special properties for the Current Timestamp function.

Date Difference

A Teradata Warehouse Miner specific function is provided for calculating the difference
between two Date and/or Timestamp expressions in various units. The integer measures
are calculated by expressing both dates in the requested units and then taking the integer
difference between the two (for example, the difference between April 1 and March 31 is
1 month). The fractional measures are in days converted to fractions of longer time
periods. Note that either Date or Timestamp expression may be a literal value, the built-in
function Current Date or Current Timestamp, or the analytic data set's target date value.
When dragging a Date Difference function into a variable, the following tree element is
created:

Columns, and/or other expressions/literals that resolve to a date or timestamp can be
moved over the (empty) branches of the tree.

Options to compute the difference in Days, Weeks, Months, Quarters or Years are set
through the Properties panel. If Weeks, Months, Quarters or Years are requested, the
units may be calculated in one of two different ways, as described on the Properties
panel. Double-click on Date Difference, or highlight it and hit the Properties button:

Date Field
Days
Calculate the date difference in integer days.
Weeks
Either calculate the difference in days and convert this to fractional weeks, or
express both dates in weeks and take the integer difference.
Months
Either calculate the difference in days and convert this to fractional months,
or express both dates in months and take the integer difference.


Quarters
Either calculate the difference in days and convert this to fractional quarters,
or express both dates in quarters and take the integer difference.
Years
Either calculate the difference in days and convert this to fractional years, or
express both dates in years and take the integer difference.
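To continue the April 1 versus March 31 example above in units of months: expressing both dates in months and taking the integer difference yields 1, while calculating the difference in days and converting to fractional months yields roughly 1/30, a value much closer to zero. (The exact conversion factor the generated SQL uses for fractional units is not shown here.)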

Elapsed Time

A Teradata Warehouse Miner specific function is provided for calculating, in various
units, the elapsed time from midnight represented by the time portion of a Timestamp or
Time expression. Time zones are ignored. By default the elapsed time is calculated in
units of seconds, but may alternatively be requested in minutes, hours or days (fraction of
a day). When dragging an Elapsed Time function into a variable, the following tree
element is created:

Columns, and/or other expressions/literals that resolve to a time or timestamp can be
moved over the (empty) branch of the tree.

Options to compute the elapsed time in seconds, minutes, hours or days (fraction of a
day) are set through the Properties panel. Double-click on Elapsed Time, or highlight it
and hit the Properties button:


Extract Day

The standard date/time field extract function is supported for Day, generating
EXTRACT(DAY FROM date/time expression). If this function is applied to a column or
expression of type other than Date or Timestamp, a SQL runtime error will occur. When
dragging an Extract Day function into a variable, the following tree element is created:

Columns, and/or other expressions/literals that resolve to a date or timestamp can be
moved over the (empty) branch of the tree.

The type of the value returned is integer. There are no special properties for the Extract
Day function.
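For example, the following expression evaluates to the integer 31:

EXTRACT(DAY FROM DATE '2003-12-31')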

Extract Hour

The standard date/time field extract function is supported for Hour, generating
EXTRACT(HOUR FROM date/time expression). If this function is applied to a column
or expression of type other than Time or Timestamp, a SQL runtime error will occur.
When dragging an Extract Hour function into a variable, the following tree element is
created:

Columns, and/or other expressions/literals that resolve to a time or timestamp can be
moved over the (empty) branch of the tree.

The type of the value returned is integer. There are no special properties for the Extract
Hour function.

Extract Minute

The standard date/time field extract function is supported for Minute, generating
EXTRACT(MINUTE FROM date/time expression). If this function is applied to a
column or expression of type other than Time or Timestamp, a SQL runtime error will
occur. When dragging an Extract Minute function into a variable, the following tree
element is created:


Columns, and/or other expressions/literals that resolve to a time or timestamp can be
moved over the (empty) branch of the tree.

The type of the value returned is integer. There are no special properties for the Extract
Minute function.

Extract Month

The standard date/time field extract function is supported for Month, generating
EXTRACT(MONTH FROM date/time expression). If this function is applied to a
column or expression of type other than Date or Timestamp, a SQL runtime error will
occur. When dragging an Extract Month function into a variable, the following tree
element is created:

Columns, and/or other expressions/literals that resolve to a date or timestamp can be
moved over the (empty) branch of the tree.

The type of the value returned is integer. There are no special properties for the Extract
Month function.

Extract Second

The standard date/time field extract function is supported for Second, generating
EXTRACT(SECOND FROM date/time expression). If this function is applied to a column or
expression of type other than Time or Timestamp, a SQL runtime error will occur. When
dragging an Extract Second function into a variable, the following tree element is created:

Columns, and/or other expressions/literals that resolve to a time or timestamp can be
moved over the (empty) branch of the tree.


The type of the value returned is integer if fractional seconds precision is 0, and
DECIMAL(8, n) if precision is n. There are no special properties for the Extract Second
function.

Extract Year

The standard date/time field extract function is supported for Year, generating
EXTRACT(YEAR FROM date/time expression). If this function is applied to a column
or expression of type other than Date or Timestamp, a SQL runtime error will occur.
When dragging an Extract Year function into a variable, the following tree element is
created:

Columns, and/or other expressions/literals that resolve to a date or timestamp can be
moved over the (empty) branch of the tree.

The type of the value returned is integer. There are no special properties for the Extract
Year function.

Time Difference

A Teradata Warehouse Miner specific function is provided for calculating the time
differences between two Time or Timestamp expressions in seconds, minutes, hours or
days (fraction of a day). The date portion of any Timestamp expression is, however,
ignored, so that the measure is strictly the difference between two time values, assumed
to be from the same day. All the measures are based on the difference measured in
seconds, with conversions to other larger units. Any differences in time zones are
ignored. When dragging a Time Difference function into a variable, the following tree
element is created:

Columns, and/or other expressions/literals that resolve to a time or timestamp can be
moved over the (empty) branches of the tree.

The choice of seconds, minutes, hours or days is made through the Properties panel.
Double-click on Time Difference, or highlight it and hit the Properties button:


Select Seconds, Minutes, Hours or Days from the Time Field pull-down.

Date/Time Difference

A Teradata Warehouse Miner specific function is provided for calculating the date/time
differences between two Timestamp columns in seconds, minutes, hours or days. Note
that this includes the day differences as well as the time differences. All the measures are
based on the difference measured in seconds, with conversions to other larger units. Any
differences in time zones are ignored. When dragging a Date/Time Difference function
into a variable, the following tree element is created:

Columns, and/or other expressions/literals that resolve to a time or timestamp can be
moved over the (empty) branches of the tree.

The choice of seconds, minutes, hours or days is made through the Properties panel.
Double-click on Date/Time Difference, or highlight it and hit the Properties button:


Select Seconds, Minutes, Hours or Days from the Time Field pull-down.

Literals

A number of SQL literal values may be used in SQL expressions that define created
variables. Double-click on Literals to see all of the literal operators:

Date

SQL Date literal values consist of the keyword DATE followed by a date enclosed in
single quotes with the format YYYY-MM-DD, such as DATE ‘2003-12-31’. When
dragging a literal Date into a variable, the following tree element is created:


A default date of January 1, 0001 is provided, but it can be changed via Properties.
Double-click on Date (1/1/0001) or highlight it and hit the Properties button:

Either the standard Windows Calendar control can be used to set the desired date by
clicking through the months in the calendar using the < and > buttons, or the desired date
can simply be typed in, specifying the month, day and year one at a time.

Null

The SQL Null literal represents an unknown value and is treated as having the type
Integer. It generates the SQL keyword NULL. When dragging a literal Null into a
variable, the following tree element is created:

There are no special properties for the literal Null.

Number

SQL numeric literal values of type BYTEINT, SMALLINT, INTEGER, FLOAT and
DECIMAL are supported. Care should be taken not to exceed the capacity of the type
(for example, specifying more than 18 decimal digits). When dragging a literal Number
into a variable, the following tree element is created:


A default value of integer 0 is provided, but it can be changed via Properties.
Double-click on Number (0) or highlight it and hit the Properties button:

Typing in an integer format number such as 1 results in a 1 being generated in the SQL,
while typing in a decimal format number such as 1.0 results in a 1.0000E0 being
generated in the SQL.

String

SQL String literal values consist of zero or more characters enclosed in single quotes
and are treated as being of type character varying, with length equal to the number of
characters enclosed in quotes. The feature allowing specification of the
character set is not supported. When dragging a literal String into a variable, the
following tree element is created:

No default string is provided; use Properties to set it. Double-click on String or
highlight it and hit the Properties button:


Type in any valid Teradata string literal.

(Note that the string literal will automatically be enclosed in quotes when SQL is
generated for the literal. If a single quote mark is included in the string literal, it will
automatically be "escaped" by doubling it. If however more than one quote mark is
entered, the value is placed in the SQL "as-is", without adding quote marks. This makes
it possible to enter a hexadecimal literal if desired, such as '00'XC.)
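For example, if the value O'Brien is entered (containing a single quote mark), the generated SQL literal is 'O''Brien', with the enclosing quotes added and the embedded quote doubled.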

Time

SQL Time literal values consist of the keyword TIME followed by a time enclosed in
single quotes with the format HH:MM:SS. Time zones and fractional seconds are not
supported. When dragging a literal Time into a variable, the following tree element is
created:

A default time of midnight is provided, but it can be changed via Properties. Double-click
on Time (00:00:00) or highlight it and hit the Properties button:


You can highlight the hours, minutes and seconds and type in the desired time.

Timestamp

SQL Timestamp literal values consist of the keyword TIMESTAMP followed by a
timestamp enclosed in single quotes with the format YYYY-MM-DD HH:MM:SS. Time
zones are not supported on SQL Timestamp Literals. When dragging a literal Timestamp
into a variable, the following tree element is created:

A default timestamp of the current date and time is provided, but it can be changed via
Properties. Double-click on Timestamp (CurrentDate CurrentTime) or highlight it and hit
the Properties button:


Either the standard Windows Calendar control can be used to set the desired date by
clicking through the months in the calendar using the < and > buttons, or the desired date
can simply be typed in. You can highlight the hours, minutes and seconds and type in the
desired time.

Target Date
A Target Date, as defined in INPUT-target date (described below), can be used in
variable creation. When no target date has been specified yet, the default value is the
current date. When dragging a literal Target Date into a variable, the following tree
element is created:

There are no special properties for the Target Date operator.

Logical

Logical predicates are used to form conditional expressions that evaluate to true or false in a
manner similar to comparison operators. Click on Logical to view a list of supported
operators:


All
The standard All predicate is supported with an expression list but not with a subquery.
It may be used with a comparison operator and with the In / Not In and Like / Not
Like predicates. When dragging an All operator into a variable, the following tree
element is created:

Any number of columns, and/or other expressions can be moved into the Expressions
folder within the tree.

There are no special properties for the All operator.

And

The logical operator AND is supported for use in conditional expressions, connecting
either comparison operators or logical predicates. When dragging an And operator into a
variable, the following tree element is created:


Columns, and/or other expressions can be moved into the (empty) branches of the tree.

There are no special properties for the And operator.

And All

The And All operator is a custom operator created for the convenience of connecting a
series of conditional expressions together with SQL And operators. When dragging an
And All operator into a variable, the following tree element is created:

Conditional expressions should be moved into the Expressions folder beneath the And
All node so that they will be connected with And operators. For example, if the
expressions "C1 = C2", "C3 = C4" and "C5 = C6" were moved into the Expressions
folder as three Equal nodes, the resulting SQL would be something like the expression
below. (Of course, the column names such as "C1" would be qualified with table aliases,
and the expression by itself is not valid as a variable, though it would be as a dimension.)

(("C1" = "C2") AND ("C3" = "C4")) AND ("C5" = "C6")

There are no special properties for the And All operator.

Any

The standard Any predicate is supported with an expression list but not with a subquery.
It may be used with a comparison operator and with the In / Not In and Like / Not Like
predicates. The following are some examples of the SQL generated for these cases.

expression = ANY (1, 2) is equivalent to:
expression = 1 OR expression = 2

expression IN ANY (1, 2) is equivalent to:
expression IN (1, 2) and to the above

expression LIKE ANY ('%string%', '%string%') is equivalent to:
expression LIKE ('%string%') OR expression LIKE ('%string%')


When dragging an Any operator into a variable, the following tree element is created:

Any number of columns, and/or other expressions can be moved into the Expressions
folder within the tree.

There are no special properties for the Any operator.

Between

The standard BETWEEN comparison predicate is supported. It generates SQL of the
form expression BETWEEN expression AND expression. The Between predicate
evaluates to true if the first expression is greater than or equal to the second expression at
the same time that it is less than or equal to the third expression. When dragging a
Between operator into a variable, the following tree element is created:

Columns, and/or other expressions can be moved into the (empty) branches of the tree,
the first argument being the expression to the left of the BETWEEN, the second to the
right, and the third to the right of the AND.

There are no special properties for the Between operator.
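For example (the column name shown is hypothetical), the following two expressions are logically equivalent:

"age" BETWEEN 18 AND 65
("age" >= 18) AND ("age" <= 65)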

In

The standard In predicate is supported with a single expression or a list of literal
constants, but not with subqueries. That is, it may be used to test whether or not an
expression equals another expression or is one of a list of values, but not if it is returned
from a query. The In predicate generates SQL of the form expression IN expression or
expression IN (literal, … literal). The use of the ANY predicate with the IN predicate is
optional. That is, IN (…), IN ANY (…) and = ANY (…) are all equivalent. When
dragging an In operator into a variable, the following tree element is created:


One or more literals or a single column or SQL element can be moved into the
Expressions folder within the tree. A column, expression or literal can be moved into the
(empty) branch of the tree.

There are no special properties for the In operator.
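For example (the column name and values shown are hypothetical), an In with three literals in its Expressions folder might generate SQL such as:

"state_code" IN ('CA', 'NY', 'TX')

This evaluates to true for any row whose state code matches one of the listed values.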

Is Null

The standard Is Null predicate is supported to test whether or not an expression has a
SQL NULL value, i.e. is undefined in a particular row. The generated SQL takes the
form expression IS NULL. When dragging an Is Null operator into a variable, the
following tree element is created:

Columns, and/or other expressions can be moved into the (empty) branches of the tree.

There are no special properties for the Is Null operator.

Is Not Null

The standard Is Not Null predicate is supported to test whether or not an expression has a
SQL NULL value, i.e. is undefined in a particular row. The generated SQL takes the
form expression IS NOT NULL. When dragging an Is Not Null operator into a variable,
the following tree element is created:

Columns, and/or other expressions can be moved into the (empty) branches of the tree.

There are no special properties for the Is Not Null operator.

Like

The standard Like predicate is supported with pattern expressions but not with
subqueries. It generates SQL of the form expression LIKE ANY / ALL pattern. The
percent (%) or underscore (_) characters can be used to allow searching for a pattern. The
percent character represents zero or more characters of any value, whereas the underscore
represents exactly one. An “escape” character may not be specified. Some examples
include:

expression LIKE ('%string%')
expression LIKE ('_string_')
expression LIKE ANY ('%string%', '%string%')

When dragging a Like operator into a variable, the following tree element is created:

Columns, and/or other expressions can be moved into the (empty) branches of the tree,
where the first argument is the expression to the left of the LIKE and the second to the
right.

There are no special properties for the Like operator.

Not

The logical operator NOT is supported for use in conditional expressions, negating
either a comparison operator or a logical predicate. When dragging a Not operator into a
variable, the following tree element is created:

Columns, and/or other expressions can be moved into the (empty) branch of the tree.

There are no special properties for the Not operator.

Not Between

The standard NOT BETWEEN comparison predicate is supported. It generates SQL of
the form expression NOT BETWEEN expression AND expression. The Not Between
predicate is the logical opposite of the BETWEEN predicate. It tests that the first
expression is less than the second expression, or greater than the third expression. When
dragging a Not Between operator into a variable, the following tree element is created:


Columns, and/or other expressions can be moved into the (empty) branches of the tree,
the first argument being the expression to the left of the BETWEEN, the second to the
right, and the third to the right of the AND.

There are no special properties for the Not Between operator.

Not In

The standard Not In predicate is supported with a single expression or a list of literal
constants, but not with subqueries. That is, it may be used to test whether or not an
expression equals another expression or is one of a list of values, but not if it is returned
from a query. The Not In predicate generates SQL of the form expression NOT IN
expression or expression NOT IN (literal, … literal). The use of the ALL predicate with
the Not In predicate is optional, provided the single expression form is not used. That is,
NOT IN (…), NOT IN ALL (…) and <> ALL (…) are equivalent. When dragging a Not
In operator into a variable, the following tree element is created:

One or more literals or a single column or SQL element can be moved into the
Expressions folder within the tree. A column, expression or literal can be moved into the
(empty) branch of the tree.

There are no special properties for the Not In operator.

Not Like

The standard NOT LIKE predicate is supported with pattern expressions but not with
subqueries. It generates SQL of the form expression NOT LIKE ANY / ALL pattern. The
percent (%) or underscore (_) characters can be used to allow searching for a pattern. The
percent character represents zero or more characters of any value, whereas the underscore
represents exactly one. An “escape” character may not be specified. Some examples
include:

expression NOT LIKE ('%string%')
expression NOT LIKE ('_string_')
expression NOT LIKE ANY ('%string%', '%string%')


When dragging a Not Like operator into a variable, the following tree element is created:

Columns, and/or other expressions can be moved into the (empty) branches of the tree,
where the first argument is the expression to the left of the NOT LIKE and the second to
the right.

There are no special properties for the Not Like operator.

Or

The logical operator OR is supported for use in conditional expressions, connecting either
comparison operators or logical predicates. When dragging an Or operator into a variable,
the following tree element is created:

Columns, and/or other expressions can be moved into the (empty) branches of the tree,
where the first argument is to the left of the OR, the second to the right.

There are no special properties for the Or operator.

Or All

The Or All operator is a custom operator created for the convenience of connecting a
series of conditional expressions together with SQL Or operators. When dragging an Or
All operator into a variable, the following tree element is created:

Conditional expressions should be moved into the Expressions folder beneath the Or All
node so that they will be connected with Or operators. For example, if the expressions
"C1 = C2", "C3 = C4" and "C5 = C6" were moved into the Expressions folder as three
Equal nodes, the resulting SQL would be something like the expression below. (Of
course, the column names such as "C1" would be qualified with table aliases, and the
expression by itself is not valid as a variable, though it would be as a dimension.)

(("C1" = "C2") OR ("C3" = "C4")) OR ("C5" = "C6")

There are no special properties for the Or All operator.

Ordered Analytical Functions

The following Ordered Analytical Functions are available in Variable Creation. Double-click
on Ordered Analytical to see:

Ordered Analytical Functions (previously known as OLAP or On Line Analytical Processing
functions) are distinguished from other SQL functions in that they order the data being
operated on before computing the function value, at times making use of "adjacent"
observations. Most of the functions are standard SQL functions with a common form, but a
few are non-standard Teradata specific functions included because they have no equivalent in
some or all Teradata releases. (These functions may not be mixed in the same analysis with
aggregation functions such as average, and the PARTITION BY clause is not supported in these
functions because they use the GROUP BY clause to perform partitioning.) Some functions also
contain Teradata Warehouse Miner specific enhancements, as noted in the individual function
descriptions below.

All of the standard ordered analytical functions consist of a value expression enclosed in
parentheses and an OVER construct composed of an optional PARTITION BY clause, an
ORDER BY clause (required with all but group style aggregation) and possibly a ROWS
clause (depending on the function), all within parentheses. The PARTITION BY clause is
something like the GROUP BY clause in a simple aggregation, partitioning the rows into
groups over which the function is separately applied. The PARTITION BY clause effectively
causes the function to "start over" for each partitioned group of rows. An example of an
ordered analytical function containing these components is given below.

AVG(sales) OVER (PARTITION BY territory
ORDER BY month
ROWS 2 PRECEDING)

The traditional aggregate functions AVG, COUNT, MIN, MAX and SUM have ordered
versions that take on different styles depending on the ROWS clause that is used. The
variations available for these functions are Cumulative, Group, Moving and Remaining, as
outlined below. The RANK function and related functions PERCENT_RANK and
QUANTILE do not offer the ROWS options. Note that not all variations are available with
Teradata V2R4.1, as noted in the individual function descriptions that follow.

Rows options corresponding to a cumulative style aggregation include:

• ROWS UNBOUNDED PRECEDING
• ROWS BETWEEN UNBOUNDED PRECEDING AND value PRECEDING
• ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
• ROWS BETWEEN UNBOUNDED PRECEDING AND value FOLLOWING

Rows options corresponding to a group style aggregation include:

• ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING

Rows options corresponding to a moving style aggregation include:

• ROWS value PRECEDING
• ROWS CURRENT ROW
• ROWS BETWEEN value PRECEDING AND value PRECEDING
• ROWS BETWEEN value PRECEDING AND CURRENT ROW
• ROWS BETWEEN value PRECEDING AND value FOLLOWING
• ROWS BETWEEN CURRENT ROW AND CURRENT ROW
• ROWS BETWEEN CURRENT ROW AND value FOLLOWING
• ROWS BETWEEN value FOLLOWING AND value FOLLOWING

Rows options corresponding to a remaining style aggregation include:

• ROWS BETWEEN value PRECEDING AND UNBOUNDED FOLLOWING
• ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
• ROWS BETWEEN value FOLLOWING AND UNBOUNDED FOLLOWING
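
As an example of these options, a moving sum over the current row and the two
preceding rows, computed separately for each territory, might be specified as
follows (the column names here are illustrative):

SUM(sales) OVER (PARTITION BY territory
ORDER BY month
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)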

Note that Ordered Expression is not an Ordered Analytical Function but simply a means of
specifying a sort direction (ascending or descending) on a sort expression. Note also that
Ordered Analytical Functions are not allowed in a Dimension value, Dimensioned variable,
Where clause or Having clause.

Moving Difference

Given one or more columns/expressions, along with a width and sort expression list, this
Ordered Analytical Function derives a new column for each expression giving the
moving difference of the expression when the rows are sorted by the sort expression list.
The moving difference is calculated as the difference between the current value and the
Nth previous value of the expression, where N equals the width. The moving difference is
NULL if there is no Nth preceding row in the table or group.

In Teradata V2R4.1, this function is implemented using an enhanced version of MDIFF,
a non-standard Teradata specific function. MDIFF may not be mixed in the same analysis
with aggregation functions such as average, and partitioning is not supported. The SQL
generated takes the form MDIFF(expression, width, sort expression list). The
enhancement is the ability to compute the moving difference of a date expression,
generating MDIFF(expression - DATE '1900-01-01', width, sort expression list).
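
For example, with monthly rows sorted by a month column, a width of 12 yields a
year-over-year difference. The generated SQL might look like the following (the
column names are illustrative):

MDIFF(sales, 12, sales_month)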

In Teradata V2R5.0 and later releases an equivalent version of Moving Difference is
generated using the standard ordered analytical function SUM. In this case the Partition
By option is available, as with other standard ordered analytical functions. Note that the
non-standard expression MDIFF(expression, width, sort expression list) is the same as:

expression - SUM(expression)
OVER (ORDER BY sort expression list ROWS BETWEEN width
PRECEDING AND width PRECEDING)

When dragging a Moving Difference function into a variable, the following tree element
is created:

Sort expressions can be built up in the Sort Expressions folder, and if the system is
V2R5.0 or later, Partition Columns can be built up in that folder (with V2R4.1 systems,
Partition Columns are ignored). Columns, and/or other expressions can be moved into the
(empty) branch of the tree. The Width is specified by the Properties panel. Double click
on Moving Difference, or highlight it and click on the Properties button:

A default width of 1 is given and can be updated here.

Moving Linear Regression

The non-standard Teradata specific moving linear regression function is supported,
generating MLINREG(expression, width, sort expression). The function will work
without any special enhancement when applied to a date expression. MLINREG may not
be mixed in the same analysis with aggregation functions such as average, and
partitioning is not supported.

Given a single expression, width, and sort expression, this Ordered Analytical Function
derives a new column giving the moving linear regression extrapolation of the expression
over "width" rows when sorted by the sort expression, using the sort expression as the
independent variable. The current and "width-1" rows after sorting are used to calculate
the simple least squares linear regression. For rows that have fewer than "width-1" rows
preceding them in the table or group, the function is computed using all preceding rows. The
first two rows in the table or group, however, will have the NULL value.

As an example, moving linear regression predicting y based on x over w rows looks like:

MLINREG(y, w, x)

When dragging a Moving Linear Regression function into a variable, the following tree
element is created:

A single sort expression should be placed in the Sort Expressions folder, and a column
or expression should be moved into the (empty) branch of the tree. The Width is
specified by the Properties panel. Double click on Moving Linear Regression, or
highlight it and click on the Properties button:

A default width of 3 is given and can be updated here.

Ordered Expression

An Ordered Expression can be used in a Sort Expressions folder with any of the Ordered
Analytical Functions to specify a sort direction, either ascending or descending. The
appropriate SQL keyword, either ASC or DESC, is automatically added to the SQL
generated for the expression placed under the Ordered Expression node in the tree. If an
Ordered Expression is not used, the default sort direction is given, depending on the
Ordered Analytical Function in use. An example of an Ordered Expression in a Sort
Expressions folder is given below.

In order to set the sort direction the user must either highlight the Ordered Expression
node and click on the Properties button, or double-click on the Ordered Expression node
to receive the following Properties panel. Clicking on the OK button will cause the
selected sort order to be used.

Percent Rank

Given a sort expression list, this Ordered Analytical Function derives a new column
which assumes a value between 0 and 1 indicating the rank of the rows as a percentage of
rows when sorted by the sort expression list. The formula used for PERCENT_RANK is
(R – 1) / (N – 1) where R is the rank of the row and N is the number of rows overall or in
the partition.

As with the RANK function, when the column or expression has the same value for
multiple rows (say M rows), they are all assigned the same percent rank, while the
following M-1 percent rank values are not assigned. When an optional Partition By
clause is specified, the percent ranks are computed separately over the rows in each
partition. (Note from the formula used for PERCENT_RANK that if there is only one
row to be ranked in the table or partition, division by zero will result and give a numeric
overflow error.) Rows options are not available with the Percent Rank function.
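
For example, the percent rank of customers by total spending within each region
might be expressed as follows (the column names are illustrative):

PERCENT_RANK() OVER (PARTITION BY region ORDER BY total_spend)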

A Teradata Warehouse Miner enhancement to the Percent Rank function is offered to
optionally request that NULL values in any element of the sort expression list cause the
row to be excluded in the ranking process. When dragging a Percent Rank function into a
variable, the following tree element is created:

Sort expressions can be built up in the Sort Expressions folder, and Partition Columns
can be built up in that folder. The enhancement to the Percent Rank function to optionally
request that NULL values in any element of the sort expression list cause the row to be
excluded in the ranking process is enabled through the Properties Panel. Double click on
Percent Rank, or highlight it and click on the Properties button:

The default is to Include NULL values in the analysis, but that can be disabled here.

Quantile

Given a sort expression list and the number of quantile partitions, this Ordered Analytical
Function derives a new column giving the quantile partition that each row belongs to
based on the sort expression list and the requested number of quantile partitions. When an
optional Partition By clause is specified, the quantile partitions are computed separately
over the rows in each partition. Rows options are not available with the Quantile
function. Although there is a non-standard Teradata specific command QUANTILE, the
function is implemented in Variable Creation using the standard RANK and COUNT
functions.
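
As an illustration of this technique, deciles over an income column might be
computed with SQL of roughly the following form, combining RANK and COUNT as
described (this is a sketch using an illustrative column name, not necessarily
the exact SQL generated):

(RANK() OVER (ORDER BY income) - 1) * 10 / COUNT(*) OVER ()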

A Teradata Warehouse Miner enhancement to the Quantile function is offered to
optionally request that NULL values in any element of the sort expression list cause the
row to be excluded in the ranking process. When dragging a Quantile function into a
variable, the following tree element is created:

Sort expressions can be built up in the Sort Expressions folder, and Partition Columns
can be built up in that folder. The enhancement to the Quantile function to optionally
request that NULL values in any element of the sort expression list cause the row to be
excluded in the ranking process, as well as the number of partitions, are both
enabled through the Properties Panel. Double click on Quantile, or highlight it and
click on the Properties button:

The default number of Partitions is 0, but can be changed here. Additionally, the default
is to include NULL values in the analysis, but that can be disabled here.

Rank

Given a sort expression list, this Ordered Analytical Function derives a new column
indicating the rank of the rows when sorted by the specified sort expression list. When
the column or expression has the same value for multiple rows (say M rows), they are all
assigned the same rank, while the following M-1 rank values are not assigned. For
example, column values 3,3,3,2,1 could be assigned rank values of 1,1,1,4,5. When an
optional Partition By clause is specified, the ranks are determined separately over the
rows in each partition (the ranking process is reset for each new partition). Rows options
are not available with the Rank function.
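
For example, ranking each region's customers by total spending in descending
order might be expressed as follows (the column names are illustrative):

RANK() OVER (PARTITION BY region ORDER BY total_spend DESC)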

A Teradata Warehouse Miner enhancement to the Rank function is offered to optionally
request that NULL values in any element of the sort expression list cause the row to be
excluded in the ranking process. When dragging a Rank function into a variable, the
following tree element is created:

Sort expressions can be built up in the Sort Expressions folder, and Partition Columns can
be built up in that folder. The enhancement to the Rank function to optionally request that
NULL values in any element of the sort expression list cause the row to be excluded in
the ranking process is enabled through the Properties Panel. Double click on Rank,
or highlight it and click on the Properties button:

The default is to Include NULL values in the analysis, but that can be disabled here.

Windowed Average

Cumulative, Group, Moving or Remaining Average are supported within the Windowed
Average function. Given a value expression, a width and a sort expression list, this
function derives a new column giving the cumulative, group, moving or remaining
average of the value expression over "width" rows when sorted by the sort expression
list. For rows that have fewer than "width-1" rows preceding them in the table or group, the
function is computed using all preceding rows. When an optional Partition By clause is
specified, the averages are computed separately over the rows in each partition. Any of
the Rows options may be used to determine the type of average to compute. Note that in
Teradata V2R4.1 only the moving average is available with the "ROWS value
PRECEDING" option. When dragging a Windowed Average function into a variable, the
following tree element is created:

Sort expressions can be built up in the Sort Expressions folder, and Partition Columns
can be built up in that folder. The options to perform a Cumulative, Group, Moving or
Remaining Average, and their associated options, are enabled through the Properties Panel.
Double click on Windowed Average, or highlight it and click on the Properties button:

These options are defined below for each of the four types of Windowed Averages:

1. Aggregation Style: Cumulative


Second Row Style: None, or
Current Row, or
Value Preceding, or
Value Following.
(If Value Preceding/Following)
Second Value: 0-n

2. Aggregation Style: Group

3. Aggregation Style: Moving


First Row Style: Current Row, or
Value Preceding, or
Value Following.
(If Value Preceding/Following)


First Value: 0-n
Second Row Style: None, or
Current Row, or
Value Preceding, or
Value Following.
(If Value Preceding/Following)
Second Value: 0-n

4. Aggregation Style: Remaining


First Row Style: Current Row, or
Value Preceding, or
Value Following.
(If Value Preceding/Following)
First Value: 0-n
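
As an example of the Cumulative style, a running average of sales by month within
each territory might generate SQL like the following (the column names are
illustrative):

AVG(sales) OVER (PARTITION BY territory
ORDER BY month
ROWS UNBOUNDED PRECEDING)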

Windowed Count

Cumulative, Group, Moving and Remaining Count are supported within the Windowed
Count function. This function derives a new column giving the cumulative, group,
moving or remaining count of the number of rows or rows with non-null values of a
value expression, when rows are sorted by a sort expression list. When an optional
Partition By clause is specified, the counts are accumulated only over the rows in each
partition (the start of a partition resets the accumulated count to 0). With Teradata V2R5
and later releases, any of the Rows options may be used to determine the type of count to
compute. In V2R4.1 only the Group option with no Sort Expression and "ROWS
BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING" clause
may be used with the COUNT function. When dragging a Windowed Count function into
a variable, the following tree element is created:

Sort expressions can be built up in the Sort Expressions folder, if the system is V2R5.0 or
later, and Partition Columns can be built up in that folder. By default a windowed
COUNT(*) is done, but another expression can be built up in its place. The options to
perform a Cumulative, Group, Moving or Remaining Count, and their associated options,
are enabled through the Properties Panel. Double click on Windowed Count, or highlight it
and click on the Properties button:

These options are defined below for each of the four types of Windowed Count:

1. Aggregation Style: Cumulative


Second Row Style: None, or
Current Row, or
Value Preceding, or
Value Following.
(If Value Preceding/Following)
Second Value: 0-n

2. Aggregation Style: Group

3. Aggregation Style: Moving


First Row Style: Current Row, or
Value Preceding, or
Value Following.
(If Value Preceding/Following)
First Value: 0-n
Second Row Style: None, or
Current Row, or
Value Preceding, or
Value Following.
(If Value Preceding/Following)
Second Value: 0-n

4. Aggregation Style: Remaining


First Row Style: Current Row, or
Value Preceding, or
Value Following.
(If Value Preceding/Following)
First Value: 0-n
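
As an example of the Cumulative style, a running count of rows with non-null
purchase amounts, ordered by date, might generate SQL like the following (the
column names are illustrative):

COUNT(purchase_amt) OVER (ORDER BY purchase_date
ROWS UNBOUNDED PRECEDING)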

Windowed Maximum

Cumulative, Group, Moving and Remaining Maximum are supported within the
Windowed Maximum function. This function, available only with Teradata V2R5 and
later releases, derives a new column containing the maximum value of a column or
expression. When an optional Partition By clause is specified, the maximum
values are determined over the rows in each partition. Any of the Rows
options may be used with this function. When dragging a Windowed Maximum function
into a variable, the following tree element is created:

Sort expressions can be built up in the Sort Expressions folder, and Partition Columns
can be built up in that folder. The options to perform a Cumulative, Group, Moving or
Remaining Maximum, and their associated options, are enabled through the Properties
Panel. Double click on Windowed Maximum, or highlight it and click on the Properties
button:

These options are defined below for each of the four types of Windowed Maximum:

1. Aggregation Style: Cumulative


Second Row Style: None, or
Current Row, or
Value Preceding, or
Value Following.
(If Value Preceding/Following)
Second Value: 0-n

2. Aggregation Style: Group

3. Aggregation Style: Moving


First Row Style: Current Row, or
Value Preceding, or
Value Following.
(If Value Preceding/Following)
First Value: 0-n
Second Row Style: None, or
Current Row, or
Value Preceding, or
Value Following.
(If Value Preceding/Following)
Second Value: 0-n

4. Aggregation Style: Remaining


First Row Style: Current Row, or
Value Preceding, or
Value Following.
(If Value Preceding/Following)
First Value: 0-n

Windowed Minimum

Cumulative, Group, Moving and Remaining Minimum are supported within the
Windowed Minimum function. This function, available only with Teradata V2R5 and
later releases, derives a new column containing the minimum value of a column or
expression. When an optional Partition By clause is specified, the minimum
values are determined over the rows in each partition. Any of the Rows
options may be used with this function. When dragging a Windowed Minimum function
into a variable, the following tree element is created:

Sort expressions can be built up in the Sort Expressions folder, and Partition Columns
can be built up in that folder. The options to perform a Cumulative, Group, Moving or
Remaining Minimum, and their associated options, are enabled through the Properties
Panel. Double click on Windowed Minimum, or highlight it and click on the Properties
button:

These options are defined below for each of the four types of Windowed Minimum:

1. Aggregation Style: Cumulative


Second Row Style: None, or
Current Row, or
Value Preceding, or
Value Following.
(If Value Preceding/Following)
Second Value: 0-n

2. Aggregation Style: Group

3. Aggregation Style: Moving


First Row Style: Current Row, or
Value Preceding, or
Value Following.
(If Value Preceding/Following)
First Value: 0-n
Second Row Style: None, or
Current Row, or
Value Preceding, or
Value Following.
(If Value Preceding/Following)
Second Value: 0-n

4. Aggregation Style: Remaining


First Row Style: Current Row, or
Value Preceding, or
Value Following.
(If Value Preceding/Following)
First Value: 0-n

Windowed Sum

Cumulative, Group, Moving and Remaining Sum are supported within the Windowed
Sum function. This function derives a new column for a value expression giving the
cumulative, group, moving or remaining sum of the value expression when sorted by a
sort expression list. When an optional Partition By clause is specified, the sums are
accumulated only over the rows in each partition (the start of a partition resets the
accumulated sum to 0). Any of the Rows options may be used with Teradata V2R5 to
determine the type of sum to compute. With Teradata V2R4.1, only Cumulative--Rows
Unbounded Preceding, Group--Between Rows Unbounded Preceding and Unbounded
Following, and Moving--Rows Value Preceding are supported. When dragging a
Windowed Sum function into a variable, the following tree element is created:

Sort expressions can be built up in the Sort Expressions folder, and Partition Columns
can be built up in that folder. The options to perform a Cumulative, Group, Moving or
Remaining Sum, and their associated options, are enabled through the Properties Panel.
Double click on Windowed Sum, or highlight it and click on the Properties button:

These options are defined below for each of the four types of Windowed Sum:

1. Aggregation Style: Cumulative


Second Row Style: None, or
Current Row, or
Value Preceding, or
Value Following.
(If Value Preceding/Following)
Second Value: 0-n

2. Aggregation Style: Group

3. Aggregation Style: Moving


First Row Style: Current Row, or
Value Preceding, or
Value Following.
(If Value Preceding/Following)
First Value: 0-n
Second Row Style: None, or
Current Row, or
Value Preceding, or
Value Following.
(If Value Preceding/Following)
Second Value: 0-n

4. Aggregation Style: Remaining


First Row Style: Current Row, or
Value Preceding, or
Value Following.
(If Value Preceding/Following)
First Value: 0-n
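
As an example of the Remaining style, the sum of sales from the current month
through the end of each territory's rows might generate SQL like the following
(the column names are illustrative):

SUM(sales) OVER (PARTITION BY territory
ORDER BY month
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)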

String Functions

The following standard string functions are available in Variable Creation. Double Click on
String to see:

Character Length

The standard character length function is supported for determining the length of variable
character data. (When used with fixed length character data, the defined column length is
always returned.) When dragging a Character Length operator into a variable, the
following tree element is created:

A column and/or expression for which to get the character length can be moved into the
(empty) branch of the tree.

When used in conjunction with the Trim function, the Character Length function can
also be used to determine the length of fixed character length data by first trimming pad
characters, as in the following.
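
For example, the combined SQL might look like the following, where the column
name is illustrative:

CHARACTER_LENGTH(TRIM(account_type))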

Concatenate

The standard concatenate operator is supported for joining two character expressions
together, generating the SQL expression1 || expression2. Numeric or date expressions are
converted to characters before concatenating. The resulting type, one of the character data
types, depends on the type of the expressions, as described in the Teradata
documentation. When dragging a Concatenate operator into a variable, the following tree
element is created:

Columns, and/or other expressions can be moved into the (empty) branches of the tree,
where the first argument is to the left of the concatenation operator, the second to the
right.

There are no special properties for the Concatenate operator.
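
For example, joining city and state columns with a literal separator might be
expressed as follows (the column names are illustrative):

city || ', ' || state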

Lower

The standard lower case function is supported for converting all characters in an
expression to lower case. It is valid only if the expression evaluates to a character data
type with the LATIN character set. The SQL generated is LOWER(expression) and the
type returned is that of the expression. When dragging a Lower operator into a variable,
the following tree element is created:

Columns, and/or other expressions can be moved into the (empty) branch of the tree.

There are no special properties for the Lower operator.

Position

The standard string position function is supported for determining the position of a
substring within a string. The SQL generated is POSITION(expression1 IN expression2)
where expression1 is the substring and expression2 is the string. The two string
expressions must both evaluate to a character, numeric or date type. Numeric or date
expressions are converted to characters before evaluating. The type returned is integer.
The position returned is the logical position, not the byte position. The first position in a
string is treated as 1 and 0 is returned when the substring is not in the string. When
dragging a Position operator into a variable, the following tree element is created:

Columns, and/or other expressions can be moved into the (empty) branches of the tree
where the first argument is expression1 as indicated above, and the second expression2.

There are no special properties for the Position operator.
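
For example, finding the position of the '@' character in an e-mail address
column might be expressed as follows (the column name is illustrative):

POSITION('@' IN email_address)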

Substring

The standard substring function is supported for extracting a portion of a string based on
a position value and optional length. The SQL generated is SUBSTRING(expression
FROM position FOR length). The expression to take a substring from may be of a
character, numeric or date type, with a numeric or date expression being automatically
converted to a character expression before taking the substring. The first position in the
string is 1, and if length is not specified it means "until the end of the string". The type
returned is VARCHAR. When dragging a Substring operator into a variable, the
following tree element is created:

Columns, and/or other expressions can be moved into the (empty) branch of the tree. The
starting position and length of the substring are specified in the Properties panel. Double
click on Substring, or highlight it and click on the Properties button:

Trim

The semi-standard trim function is supported for removing leading and/or trailing
characters or bytes matching pad characters or a specified character from a character
string. (The ability to specify a character set for the expression is however not supported.)
The SQL generated may take one of these forms:

TRIM(expression)
TRIM(LEADING/TRAILING/BOTH FROM expression)
TRIM(LEADING/TRAILING/BOTH char FROM expression)

The expression to trim may be of a character, numeric, date or byte type, with a numeric
or date expression being automatically converted to a character expression before
trimming. The type returned is VARCHAR (or VARBYTE for byte data). When
dragging a Trim operator into a variable, the following tree element is created:

Columns, and/or other expressions can be moved into the (empty) branch of the tree. The
value to trim and the type of trimming are specified in the Properties panel. Double click
on Trim, or highlight it and click on the Properties button:

Valid Trim Styles are (Default), Leading, Trailing or Both. If (Default) is specified, both
leading and trailing pad characters (or null bytes for byte type data) are trimmed. Any
type of character can be specified to be trimmed in Value to Trim.

(Note that the value to trim will automatically be enclosed in quotes when SQL is
generated for the value. If a single quote mark is included in the value, it will
automatically be "escaped" by doubling it. If however more than one quote mark is
entered, the value is placed in the SQL "as-is", without adding quote marks. This makes
it possible to enter a hexadecimal literal if desired, such as '00'XC.)
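
For example, removing leading zeroes from an account number column might be
expressed as follows (the column name is illustrative):

TRIM(LEADING '0' FROM account_number)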

Upper

The standard upper case function is supported for converting all characters in an
expression to upper case. It is valid only if the expression evaluates to a character data
type. The SQL generated is UPPER(expression) and the type returned is that of the
expression. When dragging an Upper operator into a variable, the following tree element
is created:

Columns, and/or other expressions can be moved into the (empty) branch of the tree.

There are no special properties for the Upper operator.

Trigonometric

The following trigonometric functions are available in Variable Creation. Double click on
Trigonometric to display:

Arccosine

The standard inverse trigonometric arccosine function is supported, generating
ACOS(expression) and returning a value of type float. When dragging an Arccosine
operator into a variable, the following tree element is created:

Columns, and/or other expressions can be moved into the (empty) branch of the tree.
There are no special properties for the Arccosine operator.

Arcsine

The standard inverse trigonometric arcsine function is supported, generating
ASIN(expression) and returning a value of type float. When dragging an Arcsine operator
into a variable, the following tree element is created:

Columns, and/or other expressions can be moved into the (empty) branch of the tree.
There are no special properties for the Arcsine operator.

Arctangent

The standard inverse trigonometric arctangent function is supported, generating
ATAN(expression) and returning a value of type float. When dragging an Arctangent
operator into a variable, the following tree element is created:

Columns, and/or other expressions can be moved into the (empty) branch of the tree.
There are no special properties for the Arctangent operator.

Arctangent XY

The standard inverse trigonometric arctangent function for x and y coordinates is
supported, generating ATAN2(x, y) and returning a value of type float. When dragging
an Arctangent XY operator into a variable, the following tree element is created:

Columns, and/or other expressions can be moved into the (empty) branches of the tree,
where the first argument is the x coordinate and the second the y coordinate.
There are no special properties for the Arctangent XY operator.

Cosine

The standard trigonometric cosine function is supported, generating COS(expression) and
returning a value of type float. When dragging a Cosine operator into a variable, the
following tree element is created:

Columns and/or other expressions can be moved into the (empty) branch of the tree.
There are no special properties for the Cosine operator.

Hyperbolic Arccosine

The standard inverse hyperbolic cosine function is supported, generating
ACOSH(expression) and returning a value of type float. When dragging a Hyperbolic
Arccosine operator into a variable, the following tree element is created:

Columns and/or other expressions can be moved into the (empty) branch of the tree.
There are no special properties for the Hyperbolic Arccosine operator.

Hyperbolic Arcsine

The standard inverse hyperbolic sine function is supported, generating
ASINH(expression) and returning a value of type float. When dragging a Hyperbolic
Arcsine operator into a variable, the following tree element is created:

Columns and/or other expressions can be moved into the (empty) branch of the tree.
There are no special properties for the Hyperbolic Arcsine operator.

Hyperbolic Arctangent

The standard inverse hyperbolic tangent function is supported, generating
ATANH(expression) and returning a value of type float. When dragging a Hyperbolic
Arctangent operator into a variable, the following tree element is created:

Columns and/or other expressions can be moved into the (empty) branch of the tree.
There are no special properties for the Hyperbolic Arctangent operator.

Hyperbolic Cosine


The standard hyperbolic cosine function is supported, generating COSH(expression) and
returning a value of type float. When dragging a Hyperbolic Cosine operator into a
variable, the following tree element is created:

Columns and/or other expressions can be moved into the (empty) branch of the tree.
There are no special properties for the Hyperbolic Cosine operator.

Hyperbolic Sine

The standard hyperbolic sine function is supported, generating SINH(expression) and
returning a value of type float. When dragging a Hyperbolic Sine operator into a variable,
the following tree element is created:

Columns and/or other expressions can be moved into the (empty) branch of the tree.
There are no special properties for the Hyperbolic Sine operator.

Hyperbolic Tangent

The standard hyperbolic tangent function is supported, generating TANH(expression)
and returning a value of type float. When dragging a Hyperbolic Tangent operator into a
variable, the following tree element is created:

Columns and/or other expressions can be moved into the (empty) branch of the tree.
There are no special properties for the Hyperbolic Tangent operator.

Sine

The standard trigonometric sine function is supported, generating SIN(expression) and
returning a value of type float. When dragging a Sine operator into a variable, the
following tree element is created:


Columns and/or other expressions can be moved into the (empty) branch of the tree.
There are no special properties for the Sine operator.

Tangent

The standard trigonometric tangent function is supported, generating TAN(expression)
and returning a value of type float. When dragging a Tangent operator into a variable, the
following tree element is created:

Columns and/or other expressions can be moved into the (empty) branch of the tree.
There are no special properties for the Tangent operator.

Other

Several Teradata functions and operators do not fall into any of the categories described
above. The Other category holds the following functions and operators. Double click on
Other to view them:


Asterisk

The SQL Asterisk character (*) may be specified by the user as the argument to a Count
aggregate or Windowed Count ordered analytical function. It represents the fact that all
rows should be counted, not just those with non-null values in a particular column. When
dragging an Asterisk operator into a variable, the following tree element is created:

The SQL Asterisk character (*) is valid within a COUNT aggregate and windowed
aggregate function. There are no special properties for the Asterisk operator.
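
For example, a Count aggregate with an Asterisk argument counts every row, while a Count of a column counts only its non-null values (the column and table names below are illustrative):

```sql
-- COUNT(*) counts all rows; COUNT(column) skips nulls
SELECT COUNT(*) AS all_rows,
       COUNT(account_nbr) AS non_null_accounts
FROM mydb.accounts;
```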

Bytes

The non-standard Bytes function is supported for determining the length of variable byte
data. (When used with fixed length byte data, the defined column length is always
returned.) When dragging a Bytes operator into a variable, the following tree element is
created:


A byte column and/or expression for which to get the length can be moved into the
(empty) branch of the tree.

When used in conjunction with the Trim function, the Bytes function can also be used to
determine the length of fixed length byte data by first trimming null-byte characters, as in
the following.
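
A sketch of that combination, assuming a fixed-length byte column named fixed_bytes (an illustrative name); trailing null bytes are trimmed so that Bytes returns the logical rather than the defined length:

```sql
-- Illustrative: trim trailing X'00' bytes before measuring the length
SELECT BYTES(TRIM(TRAILING '00'XB FROM fixed_bytes)) AS byte_length
FROM mydb.mytable;
```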

Cast Function

The standard Cast function is supported, generating SQL of the form CAST (expression
AS data type). The following data types are supported:

BYTEINT
SMALLINT
INTEGER
DECIMAL(m, n)
FLOAT
CHAR(n)
VARCHAR(n)
DATE
TIME(n)
TIMESTAMP(n)

Note that character set and case specific options may not be specified with CHAR and
VARCHAR types. When dragging a Cast operator into a variable, the following tree element
is created:

Columns, and/or other expressions can be moved into the (empty) branch of the tree. The
data types to cast to are specified in the Properties panel. Double click on Cast, or
highlight it and click on the Properties button:


Valid data types, as listed above, are available in the pull-down.
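
For example, a Cast operator might generate SQL such as the following (column and table names are illustrative):

```sql
-- Illustrative CAST expressions using supported target types
SELECT CAST(account_balance AS DECIMAL(18,2)) AS bal_dec,
       CAST(cust_id AS VARCHAR(20)) AS cust_id_str
FROM mydb.accounts;
```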

F(x)

An arithmetic formula of one argument ‘x’ may be entered using the F(x) SQL element.
This element will result in the appropriate SQL for the formula being generated after
replacing the argument ‘x’ in the formula with the SQL for the empty branch of the tree.

A Column or other expression can be moved into the (empty) branch of the tree
representing the argument ‘x’. The formula to generate SQL for is specified in the
Properties panel. Double click on F(x) or highlight it and click on the Properties button:


In the example above the formula “x (x – 1) / 2” is entered. Note that a multiply operator
‘*’ is implied between the first ‘x’ and the left parenthesis ‘(‘.

The following rules apply to arithmetic formulas entered in the formula SQL elements.

• Numbers begin with a digit (‘0’ to ‘9’) and may be in integer, decimal or
scientific formats according to client locale settings.

• Whitespace characters are ignored.

• Whenever a number, argument or right parenthesis is followed by an argument or
left parenthesis, an implied multiply operator is automatically inserted in the
generated SQL.

• The same operator precedence observed in Teradata SQL is observed in the
formula. The operators in decreasing order of precedence are given below.

o Unary plus ‘+’ and minus ‘-’
o Exponentiate ‘**’
o Multiply ‘*’, divide ‘/’ and modulo ‘%’
o Add ‘+’ and subtract ‘-’

• If a function other than the allowed arithmetic functions is required, such as an
aggregate function, it must be entered as an argument and referred to in the
formula with its argument name, such as ‘x’.

• Formulas may be nested by specifying a formula as the argument of another
formula.
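
To illustrate, if the formula x (x – 1) / 2 shown earlier were applied with a column such as txn_count (an illustrative name) in the ‘x’ branch, the generated SQL would take roughly this form, with the implied multiply inserted explicitly:

```sql
-- Illustrative: the implied '*' between x and '(' is written out
SELECT (txn_count * (txn_count - 1) / 2) AS pair_count
FROM mydb.transactions;
```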

F(x,y)


An arithmetic formula of two arguments, ‘x’ and ‘y’, may be entered using the F(x,y)
SQL element. This element will result in the appropriate SQL for the formula being
generated after replacing the arguments ‘x’ and ‘y’ in the formula with the SQL for the
empty branches of the tree.

Columns or other expressions can be moved into the (empty) branches of the tree
representing the arguments ‘x’ and ‘y’. The formula to generate SQL for is specified in
the Properties panel. Double click on F(x,y) or highlight it and click on the Properties
button.

F(x,y,z)

An arithmetic formula of three arguments, ‘x’, ‘y’ and ‘z’, may be entered using the
F(x,y,z) SQL element. This element will result in the appropriate SQL for the formula
being generated after replacing the arguments ‘x’, ‘y’ and ‘z’ in the formula with the
SQL for the empty branches of the tree.

Columns or other expressions can be moved into the (empty) branches of the tree
representing the arguments ‘x’, ‘y’ and ‘z’. The formula to generate SQL for is specified
in the Properties panel. Double click on F(x,y,z) or highlight it and click on the Properties
button.

Free-Form SQL

SQL text may be directly entered for an entire expression or into an element of an
expression as a free-format text string. This allows the use of constructs that may not
otherwise be supported in an expression (for example, a subquery in a where clause). Of
course, in using this feature, care should be taken to create a valid expression, since
validation is not performed on the SQL within the free-format text string. When dragging
a Free-Form SQL operator into a variable, the following tree element is created:


Double click on Free-Form SQL, or highlight it and click on the Properties button:

Enter a valid SQL expression within the SQL text area.
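
For example, a scalar subquery, a construct the expression tree does not otherwise offer, could be entered as free-form text (all names here are illustrative). Remember that the tool performs no validation on this text:

```sql
-- Entered verbatim as a free-form expression element
(SELECT MAX(t.tran_date)
 FROM mydb.transactions t
 WHERE t.cust_id = c.cust_id)
```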

Variable Reference

A variable defined in a Variable Creation analysis may reference another variable defined
in the same analysis, provided the referenced variable does not contain dimensions. It is
also not possible to reference a variable that results from having a dimension applied to a
variable. Referencing a variable can be particularly useful when the referenced variable is
used merely as an intermediate calculation. The SQL generated consists simply of the
name assigned to the referenced variable.

(When referencing a variable with the same name as an input column, a runtime error
will occur if a column with this name occurs in more than one table being accessed
("Column '<name>' is ambiguous"). If aggregation is being performed in another
variable the error "Selected non-aggregate values must be part of the associated group"
may occur. In these cases it is recommended to rename the referenced variable.)

When dragging a Variable Reference operator into a variable, the following tree element
is created:


The variable to reference is specified in the Properties panel. Double click on Variable
Reference, or highlight it and click on the Properties button:

Select the variable to reference in the Variable pull-down.
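
As a sketch (names are illustrative): if an intermediate variable named bal_ratio is defined as balance divided by credit limit, a second variable that references it generates SQL that simply uses the assigned name:

```sql
-- The referencing variable's SQL uses only the name 'bal_ratio'
CASE WHEN bal_ratio > 0.9 THEN 1 ELSE 0 END
```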

Parentheses

In some cases, you may wish to explicitly request that an expression be enclosed within
beginning ‘(’ and ending ‘)’ parentheses. The Variable Creation analysis attempts to
provide the correct nesting of parentheses whenever it can, so this is offered for
specialized cases. Using the explicit Parentheses function results in an expression being
parenthesized, as in: (expression). When dragging a Parentheses operator into a variable,
the following tree element is created:

Columns and/or other expressions can be moved over the (empty) branch of the tree.

There are no special properties for the Parentheses function.

Variable Creation - INPUT - Variables - Dimensions

On the Variable Creation dialog, click on INPUT and then click on variables on the upper
tabs. Click on Dimensions on the large tab in the center of the panel.


Right-Click Menu Options

The same right-click menu options are offered for the Columns selector on the left side of the
input screen as are offered for other input screens (refer to the Analysis Input Screen topic
in Using Teradata Warehouse Miner). Also, the following right-click options are available
within the Dimensions panel.

Expand All Nodes
Expands the nodes completely for each variable.
Collapse All Nodes
Collapses the nodes to show only variables.
Switch Input To ‘<table name>’
This option applies only when a SQL Column is highlighted in the Dimensions panel.
When this option is selected, the selectors on the left side of the input screen are
adjusted to match the table or analysis that contain the selected SQL Column. (The
column is also selected.)
Switch ‘<table name>’ To Current Input
This option applies only when a SQL Column is highlighted in the Dimensions panel.
When this option is selected, the selectors on the left side of the input screen are used
to change the input table or analysis of the selected SQL Column. A pop-up menu is
displayed to allow changing the input for this column only or for all occurrences. For
a single column, a column with the same name must occur in the new (currently
selected) input table or analysis or an error is given. When all columns are changed,
the new table or analysis must contain all the columns or an error is given and no
changes are made.
Apply Dimensions to Variables
This option jumps to the upper dimensions tab so that dimensions can be applied to
variables.

Selection Options

On this screen, select:

Input Source
Select Table to input from a table or view, or select Analysis to select directly from
the output of a qualifying analysis in the same project. (Selecting Analysis will
cause the referenced analysis to be executed before this analysis whenever this
analysis is run. It will also cause the referenced analysis to create a volatile table if
the Output option of the referenced analysis is Select.)
Databases
All databases which are available for the Variable Creation analysis.
Tables
All tables within the Source Database which are available for the Variable Creation
analysis.
Columns
All columns within the selected table which are available for the Variable Creation
analysis.

Creating Dimensions From Columns

The Dimension Values to be created are specified one at a time as almost any type of SQL
expression. One way to create a new dimension value is to click on the New button,
producing the following within the Dimensions tab:

Another way to create one or more new dimension values is to drag and drop one or more
columns from the Columns panel to the empty space at the bottom of the Dimensions panel
(multiple columns may be dragged and dropped at the same time).

One alternative to dragging and dropping a column is to use the right arrow selection button
to move over one column at a time as a new dimension value. Another alternative is to
double-click on the column. If the right arrow button is clicked repeatedly, or the column is
double-clicked repeatedly, a range of columns may be used to create new dimension values,
since the selected column increments each time the arrow is clicked. (It should be noted that
when a column or column value is selected, the right arrow selection button will only be
highlighted if a SQL Element is not selected. This can be ensured if the right-click option to
Collapse All Nodes is utilized in the SQL Element view.)


Whether dragging and dropping, clicking on the right arrow button or double-clicking, a new
dimension value based on a column looks something like the following (after expanding the
node).

Creating Dimensions From SQL Elements

Still another way to create a new dimension value is to drag and drop a single SQL element
from the SQL Elements panel to the empty space at the bottom of the Dimensions panel, or to
drag and drop one or more column values displayed by selecting the Values button. In the
case of column values, a dimension value containing a single SQL Numeric Literal, String
Literal or Date Literal is created as appropriate for each column value. (This technique saves
having to edit the properties of a numeric, string or date literal to set the desired value.)

As with creating dimension values from selected columns, use of the right arrow selection
button or double-clicking the desired SQL element or column value provides an alternative to
dragging and dropping an element or value. Note however that repeated selection of a SQL
element does not advance the selected element so the result is multiple dimension values
containing the same SQL element. (Note also that when a SQL element is selected, the right
arrow selection button will only be highlighted if neither a column nor a column value is
selected in its respective view.)

When a SQL element is placed on top of another element on the Dimensions panel, whether
by dragging and dropping it, selecting it with the right arrow or by double-clicking it, the new
element is typically inserted into the expression tree at that point. The element replaced is
then typically moved to an empty operand of the new SQL element.

Whether dragging and dropping, clicking on the right arrow button or double-clicking, a new
dimension value based on a SQL element looks something like the following example
involving the Equals element:

Copying or Moving a Dimension

It is possible to create a copy of a dimension value by holding down the Control key on the
keyboard while dragging the dimension value to another location in the Dimensions panel.
The copy can be placed ahead of another dimension value by dropping it on that dimension
value, or at the end of the list of dimension values by dropping it on the empty space at the
bottom of the Dimensions panel. It is also possible to copy a dimension value in the same
manner from another analysis by viewing the other analysis at the same time and dragging the
dimension value from one analysis to the other.

If the Control key is not held down while performing the copy operation just described within
the same analysis, the dimension value is moved from one place to the other, i.e. deleted from
its old location and copied to the new one. There are two exceptions to this. First, this is not
the case when copying a dimension from one analysis to another, in which case a copy
operation is always performed, with or without holding down the Control key. The second
exception is when moving one child node on top of another child node of the same parent in
the expression tree that defines a dimension. In this case, the two nodes or sub-expressions
are switched. (For example, if income and age are added together and age is moved on top of
income, the result is to add age and income, reversing the operands.)

Replicating a Dimension

It is possible to create multiple varied copies of a dimension value by dropping or selecting
multiple columns or values onto a component of a dimension value that is not a folder, that
is, a component designed to hold only a single element. For example, after selecting
the New button, if 10 columns were dragged and dropped onto the empty node underneath
the new dimension value, the entire dimension value would be replicated 10 times, each copy
containing a different column and named with the original dimension value name appended
with a number between 1 and 10.

Dimension Tool-Tip

Information about a dimension value may be viewed by holding the mouse pointer over it.

Deleting All Dimensions

All dimension values can be deleted from the analysis by selecting the double-back-arrow
button in the center of the Variable Creation window. When this function is requested, one or
more warnings will be given. The first warning indicates how many dimension values are
about to be deleted. The second possible warning is given if the number of dimension values
being deleted exceeds 100, the maximum number of operations that can be undone or redone
using the Undo or Redo buttons. (If this warning is given and the Undo button is then
selected, only the first 100 dimension values will be restored. These are actually the last 100
deleted, since they are deleted in reverse order.) A third possible warning is given if any of
the dimension values about to be deleted has been applied to a variable on the dimensions
screen. If the choice to continue is made, all associations between variables and dimensions
being deleted will be removed. (Note that this part of the operation cannot be "undone"; it is
unaffected by the Undo button.)

Buttons

Wizard Button
When the Dimensions panel is selected, the Wizard button can be used to generate
dimension values, When Conditions for Searched Case statements or conditional
expressions for And All or Or All statements. To generate dimension values, highlight
any dimension value or ensure that no value is highlighted when the Wizard button is
selected. Otherwise, highlight the desired Case Conditions folder under a Case -
Searched node or the Expressions folder under an And All or Or All node and select the
Wizard button.

The maximum number of dimensions or values that can be generated by a single
application of the wizard is limited to 1000.

The following dialog is given when generating dimension values. (Instructions at the top
change and a subset of fields is shown when not generating dimension values.)

Dimension Prefix
When a comparison operator such as Equal is selected in the Operator field, the
names of the resulting dimension values consist of the prefix followed by underscore
and the selected value. Otherwise the dimension value name is the prefix followed
by a number.

Description
When a comparison operator such as Equal is selected in the Operator field, the
description of the resulting dimension values consist of the description specified here
followed by the operator and selected value. Otherwise the description is the
description entered here.

Left Side Column/Expression
Replace the "(empty)" node with a SQL Column or more complex expression
involving a SQL Column.


Operator
Select a comparison operator such as Equals or select Between, Not Between, In, Not
In, Is Null or Is Not Null as the operator to use. If Between or Not Between is
selected, a dimension value or condition is generated for each pair of requested
values. If In or Not In is selected, the Wizard will generate a single dimension value
or condition based on all requested values when 'OK' or 'Apply' is clicked. If Is Null
or Is Not Null is selected, the Wizard will generate a single dimension value or
condition based on no values. Otherwise, if a comparison operator such as Equal is
selected, the Wizard will generate a dimension value or condition for each requested
value.

Else Value
Select either Else Null or Else Zero to indicate the value to use when the condition is
not met.

Right Side Values

Values
This tab accepts values displayed by selecting the Values button for input
columns on the left side of the input screen. Values can be drag-dropped onto
this panel or selected with the right-arrow button. They can be numeric, string or
date type values.

Note that when values are displayed on the left side of the input screen, the
ellipsis button (the one displaying ‘…’) may be used to Select All Values.

Range
This tab can be used to generate a range of integer or decimal numeric values
based on a From, To and By field. If desired, the values can be generated in
descending order by making the From value greater than the To value, so that the
By value should always be positive. If the By field is not specified, an
incremental value of 1 is assumed. (Note that a value displayed with the Values
button may be drag-dropped into this field. Note also that the escape key will
revert to the last value entered in this field.)

When the Between or Not Between operator has been specified, the Range fields
behave somewhat differently and may be used only to specify a single pair of
values using the From and To field, with the From field validated to be less than
or equal to the To field. The By field may not be specified when the Between or
Not Between operator has been specified.

List
A list of numeric, string or date type values can be entered here, separated by
commas (actually, by the standard list separator for the current locale settings).
(Note that a value displayed with the Values button may be drag-dropped into
this field. Note also that the escape key will revert to the last value entered in
this field.)

Clear All
This button will clear all of the fields of this dialog.


OK
This button will generate the requested dimension values or conditions and return to
the Dimensions panel.

Cancel
This button returns to the Dimensions panel without generating any elements.

Apply
This button will generate the requested dimension values or conditions and remain on
this panel. A status message is displayed just above this button reporting on the
number of generated conditions.

Combine Button
When the Dimensions panel is selected and dimensions are defined, the Combine button
can be used to generate combined dimension values based on existing dimension values.
When dimensions are combined their conditions are joined with either an SQL ‘AND’ or
‘OR’ operator.
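
For example, combining two dimension values whose conditions are gender = 'F' and age < 30 (illustrative conditions) with the AND option yields a combined condition of roughly this form:

```sql
-- Sketch of a combined dimension's condition
(gender = 'F') AND (age < 30)
```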

Dimension Values:
These are the dimension values from the Dimensions panel plus any dimensions
already combined using the ‘Apply’ button (thus becoming candidates for
re-combining). (Information about a dimension value may be viewed by holding the
mouse pointer over it.)


Dimensions to Combine:
Using the upper and lower sets of right and left arrow buttons, dimensions may be
selected or de-selected for combining. The AND/OR radio buttons may be selected
to determine the method of combining the conditions represented by the dimension
values. The double left arrow buttons to the right of these panels move combined
dimensions back into the panels in preparation for re-combining.

Combined Dimensions:
The single right and left arrow buttons next to this panel cause the dimensions to be
combined and added to the combined dimensions list, or removed from the list,
respectively. (If the name of any combined dimension is too long, a warning
message is given in the lower left corner of the dialog.) The double left arrow
buttons to the left of this panel move combined dimensions back into the
“Dimensions to Combine” panels in preparation for re-combining. (Thus it is
possible to build up combined dimensions without making dimension values out of
the intermediate results.)

Clear All
This button will clear all of the fields of this dialog except the Dimension Values in
the leftmost panel.

OK
This button will generate the dimensions defined in the Combined Dimensions panel
and return to the Dimensions panel.

Cancel
This button returns to the Dimensions panel without generating any elements.

Apply
This button will generate the dimensions defined in the Combined Dimensions panel
and remain on this panel. A status message is displayed in the lower left corner of
the dialog reporting on the number of generated combined dimensions.

Delete Button
The Delete button can be used to delete any node within the Dimensions tree. If
applicable, the tree will roll up children, but in some cases, a delete may remove all
children.

SQL Button
The SQL button can be used to dynamically display the SQL for any node within the
Dimensions tree. If the resulting display is not closed, the expression changes as you
click on the different levels of the tree comprising a dimension value. An option is
provided in the display to Qualify column names, that is, to precede each column name in
the display with its database and table name.

Properties Button
A number of properties are available when defining a dimension value to be created, as
outlined below. Click the Properties button when the dimension value is highlighted, or
double click on the dimension value to bring up the Properties dialog:


Name:

A name must be specified for each dimension value.

(Tip: The name can be edited by single left-clicking on it, which produces a
box around the name, as in Windows Explorer.)

Else Condition:

Dimension values are applied to a variable via a CASE construct. By default, the
ELSE condition within the CASE construct is NULL. Here, you can specify a 0 be
used instead.

Description:

An optional description may be specified for each dimension value. (Note that a
default description is generated automatically by the Wizard if its Description field
contains a value, and also by the Combine Dimensions dialog based on individual
descriptions or dimension names.)

Undo Button
The Undo button can be used to undo changes made to the Dimensions panel. Note that
if a number of dimension values are added at one time, each addition requires a separate
undo request to reverse. Up to 100 undo requests can be processed.

(Note also that if a change to a dimension value is undone, and that dimension value is
currently applied to a variable on the dimensions panel, the applied dimension will not
change as a result of the Undo operation.)

Redo Button


The Redo button can be used to reinstate a change previously undone with the Undo
button.

Question-Mark Help Button
The Question-Mark Help button can be used to request help information about a specific
SQL element by first clicking on the question-mark and then on the SQL element in the
SQL Elements panel or Dimensions panel.

SQL Elements

The same SQL Elements are supported when creating dimension values as when creating
variables, with the following exceptions:

Aggregations
Aggregations cannot be used for creating dimension values.

Ordered Analytical Functions
Ordered Analytical Functions cannot be used for creating dimension values.

Variable Creation - INPUT - dimensions


On this panel, dimension values from this or other Variable Creation analyses in the Project
window are applied to variables in this analysis. The panel is shown below:


Right-Click Menu Options

The following right-click menu options are offered for the Available Variables panel.

Expand All Nodes
Expands the nodes completely for each variable.
Collapse All Nodes
Collapses the nodes to show only variables.
Switch Input To
This option applies only when a dimension value is highlighted in the Available
Variables panel. When this option is selected, the Available Analyses selector on
the left side of the screen is adjusted to match the analysis that the selected dimension
value comes from.
Add Dimension(s) to this Analysis
If the dimensions selected are not currently members of this analysis they can be
added to this analysis with this option. (This option could be useful if a dimension
from another analysis is applied to a variable in this analysis and it becomes
necessary to change it.)
Remove Dimension(s) from this Variable
The selection of this option removes all dimension values from the selected variable.
(Dimension values are not removed from the analysis in which they are defined but
rather their association with this variable is removed.)

The options on the INPUT - Dimensions panel are described below.

Available Analyses
Any Variable Creation analysis in any currently loaded project may be selected to
make its dimension values available for selection. Initially, the current analysis is
selected.

Available Dimensions
The available dimensions are the dimension values defined on the Dimensions tab of
the Input – Variables panel in the selected Variable Creation analysis. (Information
about a dimension value may be viewed by holding the mouse pointer over it.)

(Note that if a dimension value comes from an analysis in another project, and if it
contains a reference to another analysis, it may not be applied to a variable in this
analysis, even if it is displayed here. If it is applied to an available variable an error
message will be given.)

Available Variables
The available variables are the variables defined on the Variables tab of the Input –
Variables panel. As dimension values are moved over using the right-arrow and left-
arrow buttons, or dragged and dropped from Available Dimensions, they are shown
below the variable. The resulting output column name will be the dimension name
followed by an underscore and the variable name. The description of the resulting
variable will be the description of the variable followed by connecting characters and
the description of the dimension. (If either the original variable or the dimension
value does not have a description, its name is used when forming the description of
the resulting variable.) (Information about an applied dimension value may be
viewed by holding the mouse pointer over it.)

Variables that are referenced by another variable (using a Variable Reference SQL
element) may not have dimension values applied to them.

Variable Creation - INPUT - anchor table


On the Variable Creation dialog, click on INPUT and then click on anchor table:

On this screen, select:

Anchor Table:
Pull-down with a list of all tables used to create variables, dimensions and/or
specified in a WHERE, QUALIFY or HAVING clause. Select the table that contains
all of the key values to be included in the final data set. Physically, this can be a table
or a view residing in Teradata.

Available Anchor Columns:


All columns within the table selected as the Anchor Table.

Selected Anchor Columns:

The columns within the Anchor Table that uniquely identify rows in the anchor table
(otherwise unpredictable results may occur when joining this table with others). By
default, the primary index columns of the selected Anchor Table are initially selected.
For a view, these must be selected manually. (Note that if the Anchor Table is the
standard CALENDAR view in the SYS_CALENDAR database, the calendar_date
column is used by default.)

Join Paths:
A list of all Join Paths, connecting the anchor table to each other table referenced in
the analysis (i.e. in a Variable, Dimension or expert clause) is given here. By right-
clicking on a Join Path the join style can be set to Left Outer Join, Inner Join, Right
Outer Join, Full Outer Join or Cross Join.

Selecting a Cross Join results in a join path without join steps. Validation is
performed, including a count of the rows in the table to be joined.

Join Steps:
A list of the Join Steps comprising the Join Path currently selected above is given
here. Each Join Step consists of two columns connected by an operator, which
defaults to the equals operator. By right-clicking on a Join Step its operator can be
set to equals (=), not equals (<>), greater than (>), greater than or equals (>=), less
than (<) or less than or equals (<=). The join steps are connected by logical AND
operators in the generated SQL.

Note that a Join Path of style Cross Join does not contain Join Steps.

Load
To load join paths from other Variable Creation analyses in a loaded project, click on
the Load button. This causes each Variable Creation analysis in a loaded project to
be searched for missing join paths. (Missing join paths are those that have no Join
Steps, with the exception of those of style Cross Join which cannot have Join Steps.)
The first join path encountered, if any, for each missing join path is used. When the
load operation is complete, an informational message is displayed at the bottom of
the form summarizing the results of the search.

(Note that if a join path is missing when an analysis is executed, the Load operation
is performed automatically to try to correct the error.)

Wizard…
To set the join paths using the following dialog screens, click on the Wizard…
button:

From:
Initially, this is the Anchor Table, along with a list of all columns within the
Anchor Table. If more than one table is required in the Join Path, these are
specified through subsequent clicks of the Add button. Highlight the column to
join to that specified in To below.

To:
Initially, this is the target or right-side table in the Join Path, along with a list of
all columns within that table. If the Anchor Table is not simply joined directly to
this table, it can be changed via pull-down. If more than one table is required in
the Join Path, these are specified through subsequent clicks of the Add button.
Highlight the column to join to that specified in From above.

Steps:
Clicking on the Add button populates the Steps area. Similarly, highlighting a
Step and clicking on the Remove button removes that particular step. Steps
should be entered such that the first step begins with the anchor table (on the left
side) and the last step ends with the target table for the join path (on the right
side). Additionally, the target or right side tables should be grouped together in
the list of steps and not alternate in value (that is, table1, table1, table2, not
table1, table2, table1).

The operator of a Join Step may be changed by right-clicking on the Join Step.

Add:
Clicking on the Add button adds a Join Step built from the currently selected
columns. The operator is equals (=) by default, but may be changed by right
clicking on the Join Step.

Remove:
Clicking on the Remove button removes the currently selected Join Step.

Up/Down Arrows:
Clicking on the up or down arrow to the right of the Steps display moves the
currently selected Join Step up or down in the list.

Left/Right Arrows:
Clicking on the left arrow to the right of the From and To table selectors will
move the currently selected To table to the From selector and set the To selector
to the target or right side table for the Join Path. Clicking on the right arrow will
move the currently selected From table to the To selector and set the From
selector to the source or left side table for the Join Path.

Finish:
Clicking on the Finish button accepts all changes and returns to the anchor
panel.

Back:
Clicking on the Back button returns to the previous Join Path.

Next:
Clicking on the Next button proceeds to the next Join Path.

Cancel:
Clicking on the Cancel button discards all changes and returns to the anchor
panel.

Variable Creation - INPUT – analysis parameters


On the Variable Creation dialog, click on INPUT and then click on analysis parameters:

On this screen, select:

Target Date:
If a Target Date was used when creating a variable, dimension, or used in a
WHERE, QUALIFY or HAVING clause, it can be set here. The default value is
the current date, and can be changed by either typing in another date, entering
month, day and year separately or by selecting a date with the standard Windows
calendar control.

Group By Style:
Group by anchor columns
Use of this option causes the anchor columns to be used as the Group By
columns when one or more variables contain an aggregate function. When this is
the case, all variables that don’t already contain an aggregate function are
automatically changed to an aggregate by adding the MIN (minimum) function.

Group by all non-aggregate columns


Use of this option provides more control over the grouping characteristics of the
request with the following effects.

• A group-by clause is generated whether or not aggregation is present.


• Every non-aggregate column is included in the group-by clause.

• An order-by clause is generated to match the group-by clause if the
output style is select or explain.
• Aggregation is not forced on non-anchor columns, but aggregation is
forced on new-style windowed OLAP functions.
• If new windowed OLAP functions are present, positional numbers are
used in the group by clause for correct syntax.
• Old-style Teradata OLAP functions may not be used with this option.

Variable Creation - INPUT - Expert Options


On the Variable Creation dialog, click on INPUT and then click on expert options:

This screen provides nearly the same options as those provided by the Variable Creation –
INPUT – Variables screen, as described in the section of the same name. The principal
difference is that instead of Variables or Dimensions there are three fixed expert clauses that
may not be added to or deleted. Therefore, the New and Add buttons are not present. The
Wizard button can be used only to add conditions, not Searched Case statements or
Dimensions.

Variable Creation - INPUT - Expert Options- SQL Elements


Nearly the same functionality is provided for creating expert options as for creating variables.
In particular, the same SQL Elements are supported with the following exceptions:

Aggregations
Aggregations can only be used in the Having Clause.

Ordered Analytical Functions
Ordered Analytical Functions can only be used in the Qualify Clause.

Variable Creation - INPUT - Expert Options - Expert Clauses


Expert options are available with the Variable Creation function as described below. They are
created and manipulated in the form of SQL expressions in a manner similar to the way
variables are defined. A free-format string may be used as all or part of a SQL expression by
using the Free-Format SQL Text element, thus allowing constructs not otherwise supported,
such as a subquery. (Of course, in using this feature, care should be taken to create a valid
expression, since validation is not performed on the SQL within the free-format text string.)

Where Clause
An SQL WHERE clause is allowed to limit the rows processed from the input table.
Aggregation and ordered analytical (OLAP) functions are not allowed in a WHERE
clause expression. Note that if a subquery is desired it can only be specified using a Free-
Format SQL Text element.

It may be useful to note that if a WHERE clause condition is specified on the "inner"
table of a join (i.e. a table that contributes only matched rows to the results), the join is
logically equivalent to an Inner Join, regardless of whether an Outer type is specified. (In
a Left Outer Join, the left table is the "outer" table and the right table is the "inner" table.)

Having Clause
An SQL HAVING clause may be specified with the Variable Creation function if
aggregation is requested in the variable expressions. Ordered analytical (OLAP)
functions are not allowed in a HAVING clause expression.

Qualify Clause
An SQL QUALIFY clause may be specified with the Variable Creation function if
ordered analytical functions are requested in any of the variable expressions. Aggregation
functions are not allowed in a QUALIFY clause expression.

Variable Creation - OUTPUT - storage


On the Variable Creation dialog, click on OUTPUT and then click on storage:

On this screen, select:

Use the Teradata EXPLAIN feature to display the execution plan for this analysis
Option to generate a SQL EXPLAIN SELECT statement, which returns a Teradata
Execution Plan.

Store the tabular output of this analysis in the database


Option to generate a Teradata TABLE or VIEW populated with the results of the
analysis. Once enabled, the next three fields must be specified:
Database Name
Text box to specify the name of the Teradata database where the resultant Table
or View will be created. By default, this is the “Result Database.”
Output Name
Text box to specify the name of the Teradata Table or View.
Output Type
Pull-down to specify Table or View.
Create Output table using the FALLBACK keyword
If a table is selected, it will be built with FALLBACK if this option is selected.
Create Output table using the MULTISET keyword
If a table is selected, it will be built as a MULTISET table if this option is selected.

Generate the SQL for this analysis, but do not execute it

If this option is selected the analysis will only generate SQL, returning it and
terminating immediately.

Variable Creation - OUTPUT - Primary Index


On the Variable Creation dialog click on OUTPUT and then click on primary index:

On this screen, select the columns which comprise the primary index of the output table:

Available Columns
A list of the columns in the resultant table, from which the primary index columns
may be selected if an Output Table is used.
Primary Index Columns
Select columns by highlighting and then either dragging and dropping into the
Primary Index Columns window, or click on the arrow button to move highlighted
columns into the Primary Index Columns window.
Create the index using the UNIQUE keyword
When selected, a Unique Primary Index will be created on the table. Otherwise a
Primary Index will be created by default.

Run the Variable Creation Analysis


After setting INPUT and OUTPUT parameters as described above, you are ready to run the
analysis. To run the analysis you can either:

• Click the Run icon on the toolbar, or
• Select Run <project name> on the Project menu, or
• Press the F5 key on your keyboard

Results - Variable Creation


The results of running a Variable Creation analysis include the generated SQL itself, the
results of executing the generated SQL, and, if selected, a Teradata table (or view). All these
results are outlined below.

Variable Creation - RESULTS - Data


On the Variable Creation dialog, click on RESULTS and then click on data (note that the
RESULTS tab will be grayed-out/disabled until after the analysis is completed):

The results of the completed query are returned in this Data Viewer page. This page has the
properties of the Data page discussed in the Chapter on Using the Teradata Warehouse Miner
Graphical User Interface. With the exception of the Explain Select Result Option, these
results will match the tables described below in the Output Column Definition section,
depending upon the parameters chosen for the analysis.

Variable Creation - RESULTS - SQL


On the Variable Creation dialog, click on RESULTS and then click on SQL (note that the
RESULTS tab will be grayed-out/disabled until after the analysis is completed):

The generated SQL is returned as text which can be copied, pasted, or printed.

Tutorial – Variable Creation

Variable Creation - Example #1

Parameterize a Variable Creation Analysis as follows:

1. Select TWM_CUSTOMER as the Available Table.

2. Create seven variables by double-clicking on the following columns. (Note that the
variable name will default to the column name.)

• TWM_CUSTOMER.cust_id
• TWM_CUSTOMER.income
• TWM_CUSTOMER.age
• TWM_CUSTOMER.years_with_bank
• TWM_CUSTOMER.nbr_children
• TWM_CUSTOMER.gender
• TWM_CUSTOMER.marital_status

3. Select TWM_CREDIT_TRAN as the Available Table.

4. Create a variable by clicking on the New button and build up an expression as
follows.

5. Drag an Add (Arithmetic) SQL Element over the Variable, and then drag the
following two columns over the empty arguments:

• TWM_CREDIT_TRAN.interest_amt
• TWM_CREDIT_TRAN.principal_amt

6. Because there may be negative values, drag and drop an Absolute Value (Arithmetic)
SQL Element over both interest_amt and principal_amt:

7. Take the average of this expression, by dragging and dropping an Average
(Aggregation) on top of the Add:

8. Because this analysis may generate many NULL values by joining
TWM_CUSTOMER to TWM_CREDIT_TRAN, drag a Coalesce (Case) on top of
the Average:

9. Drag and drop a Number (Literal) 0 into the expressions folder and rename it from
Variable1 to avg_cc_tran_amt to complete the variable:

10. Go to INPUT-anchor table and select TWM_CUSTOMER as the anchor table.

11. Specify the Join Path from TWM_CUSTOMER to TWM_CREDIT_TRAN by
clicking on the Wizard button and specifying that they be joined on the column
"cust_id".

12. Go to OUTPUT-storage, and select Store the tabular output of this analysis in the
database. Specify that a Table should be created named twm_tutorials_vc1.

For this example, the Variable Creation Analysis generated the following results. Note that the
SQL is not shown for brevity:

Data

Only the first 10 rows after sorting are shown.

cust_id income age years_with_bank nbr_children gender marital_status avg_cc_tran_amt
1362480 50890 33 3 2 M 2 264.17
1362481 20855 36 6 2 F 2 0
1362484 10053 42 2 0 F 1 182.57
1362485 22690 25 4 0 F 1 175.40
1362486 10701 76 6 0 F 3 0
1362487 6605 71 1 0 M 2 149.16
1362488 7083 77 7 0 F 2 0
1362489 55888 35 5 2 F 3 397.07
1362492 40252 40 0 5 F 3 214.05
1362496 0 13 2 0 M 1 0

Variable Creation - Example #2

Parameterize a Variable Creation Analysis as follows:

1. Select TWM_CUSTOMER as the Available Table


2. Create a variable by clicking on the New button and drag and drop the following
columns. Note the variable name will default to the column name
• TWM_CUSTOMER.cust_id
3. Create a variable by clicking on the New button and drag and drop the SQL Element
Maximum (Aggregation) on to the empty argument in the variable.
4. Drag and drop a Number (Literal) on to the empty argument in the Maximum, and
rename the variable acct:

5. Select TWM_ACCOUNTS as the Available Table


6. Create a variable by clicking on the New button and drag and drop the following
columns. Note the variable name will default to the column name
• TWM_ACCOUNTS.ending_balance
7. Drag and drop an Average (Aggregation) SQL Element over ending_balance, and
rename the variable bal:

8. Select TWM_TRANSACTIONS as the Available Table


9. Create a variable by clicking on the New button and drag and drop the following
columns. Note the variable name will default to the column name
• TWM_TRANSACTIONS.tran_id
10. Drag and drop an Count (Aggregation) SQL Element over tran_id, and rename the
variable nbr_trans:

11. Select TWM_ACCOUNTS as the Available Table

12. Go to INPUT-variables-Dimensions and click on the New button three times to create
three dimension values. Drag TWM_ACCOUNTS.acct_type to each of the three
dimension values.

13. Drag and drop an Equals (Comparison) SQL Element on top of each instance of
acct_type in the three dimensions.

14. Drag and drop a String (Literal) SQL Element into the second argument of the
Equals. Specify a string value of CC, CK, SV for each of the three dimensions by
double-clicking on String, and entering the values. Rename each dimension value
CC, CK, and SV accordingly:

Change the Properties of the dimensions CC, CK and SV, modifying the Else condition
from ELSE NULL to ELSE ZERO.

15. Select TWM_TRANSACTIONS as the Available Table

16. Go to INPUT-variables-Dimensions and click on the New button four times to create
four dimension values. Drag TWM_TRANSACTIONS.tran_date to each of the four
dimension values.

17. Drag and drop a Quarter of Year (Calendar) SQL Element on top of each instance of
tran_date in the four dimension values

18. Drag and drop an Equals (Comparison) SQL Element on top of each Quarter of Year
instance in the four dimension values.

19. Drag and drop a Number (Literal) SQL Element into the second argument of the
Equals. Specify a number of 1-4 for each of the four dimension values by double-
clicking on Number, and entering the values. Rename each dimension value Q1-Q4
accordingly:

20. Go to INPUT-dimensions and apply the dimension values to the variables as follows:
• acct – CK, CC, SV
• bal – CK, CC, SV
• nbr_trans – Q1, Q2, Q3, Q4:

21. Go to INPUT-anchor table and select TWM_CUSTOMER as the anchor table.

22. Specify the Join Paths from TWM_CUSTOMER to each of the following by
selecting a table in the Join Path from Anchor Table To: and clicking on the
Wizard button. Specify the following Join Paths:

a.) From Anchor Table (TWM_CUSTOMER) to TWM_ACCOUNTS

TWM_CUSTOMER.cust_id --> TWM_ACCOUNTS.cust_id

b.) From Anchor Table (TWM_CUSTOMER) to TWM_TRANSACTIONS

TWM_CUSTOMER.cust_id --> TWM_ACCOUNTS.cust_id

TWM_ACCOUNTS.acct_nbr --> TWM_TRANSACTIONS.acct_nbr

23. Go to OUTPUT-storage, and select Store the tabular output of this analysis in the
database. Specify that a Table should be created named twm_tutorials_vc2.

For this example, the Variable Creation Analysis generated the following results. Once again, the
SQL is not shown:

Data

Note – only the first 10 rows shown.

cust_id CK_acct SV_acct CC_acct CK_bal SV_bal CC_bal Q1_nbr Q2_nbr Q3_nbr Q4_nbr
1362480 1.00 1.00 1.00 54.77 196.73 4.08 113 17 17 10
1362481 0.00 0.00 0.00 0.00 0.00 0.00 0 0 0 0
1362484 1.00 1.00 1.00 50.46 374.50 108.74 113 27 20 27
1362485 1.00 0.00 1.00 26.34 0.00 463.16 13 18 50 90
1362486 1.00 1.00 0.00 1656.14 58.12 0.00 12 10 13 15
1362487 1.00 1.00 1.00 707.41 2.38 481.00 17 25 25 36
1362488 1.00 0.00 0.00 122.42 0.00 0.00 26 39 27 7
1362489 1.00 1.00 1.00 79.60 52.69 4.49 56 51 44 5
1362492 1.00 0.00 1.00 443.84 0.00 476.92 3 42 64 21
1362496 0.00 1.00 0.00 0.00 251.06 0.00 3 3 3 3

Variable Transformation

Introduction
One aspect of creating an analytic data set to be used as input to a data mining algorithm is the
transformation of variables into a format useful to the algorithm. In general, transformations that
are reasonably performed as part of SQL expressions have been included in the Variable Creation
function, whereas transformations that require a more elaborate SQL structure are provided in the
Variable Transformation function. Specifically, transformations in the Variable Transformation
function may require calculating global aggregates or more complex measures in derived tables,
or may include a separate null replacement transformation as a preprocessing step using a
preliminary volatile table. Variable Transformation is however limited to operating on a single
input table.

The Variable Transformation function makes it possible to specify at one time any mixture of
transformations for any number of columns in a single input table. The user may also specify that
columns from the input table be retained unchanged, or retained with a different name and/or
type. The result is a new table or view based on the same or transformed columns from the input
table.

The Variable Transformation functions include:

• Bin Code
• Derive
• Design Code
• Recode
• Rescale
• Retain
• Null Replacement
• Sigmoid
• Z Score

In order to use the Variable Transformation analysis, the user selects a single input table and
then, on a column by column basis, selects what transformation or action they want to perform, if
any. The user may choose any of the offered transformations and/or a simple copy or Retain
operation. That is, they may choose to include any input table column, as is or with a different
name or type, in the output table, whether or not they choose to transform it. By default, the result
column name is the same as the input column name, unless multiple result columns may result (as
with the design coding transformation). If a specific type is specified, it results in casting the
retained column or transformed column.

Anchor columns are included automatically in the result table, so they should not be included as
retained columns. Note that it is the user’s responsibility to ensure that result column names do
not conflict with each other.

The user may also specify that a null transformation be performed in a preprocessing step prior to
the requested transformation. In this case the null transformation is produced in a volatile table
that is then automatically referenced by the generated SQL, both by the transformation SQL and
by any derived aggregates the transformation may require.

It is possible that the user may specify more transformations than can be performed in a single
analysis. This can happen either because the maximum number of columns allowed by Teradata
is exceeded (256 in V2R4.1 and 2048 in V2R5), or because the generated SQL is simply too large
or complex. If this sort of failure occurs, the user must split up the transformations into multiple
analyses and either add a join step or rely on the Build Data Set analysis to join the output tables
together.

Bin Code
Bin Coding is useful when it is desired to replace a continuous numeric column with a categorical
one. Bin coding produces ordinal values, i.e. numeric categorical values where order is
meaningful. It uses the same techniques used in Histogram analysis, allowing the user to choose
between equal-width bins, equal-width bins with a user specified minimum and maximum range,
bins with a user specified width, evenly distributed bins, or bins with user-specified boundaries as
follows.

If the minimum and maximum are specified, all values less than the minimum are put into “bin
0,” while all values greater than the maximum are put into “bin N+1.” The same is true when the
boundary option is specified.
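The out-of-range behavior can be sketched in Python as follows. This is only an illustration of the documented bin-numbering scheme (bins 1..N inside the range, 0 below it, N+1 above it), not the SQL the product generates; the function name and the handling of a value exactly at the maximum are assumptions.

```python
def bin_code(x, minimum, maximum, bins):
    """Equal-width bin coding over a user-specified [minimum, maximum] range."""
    if x < minimum:
        return 0           # below the range -> "bin 0"
    if x > maximum:
        return bins + 1    # above the range -> "bin N+1"
    width = (maximum - minimum) / bins
    # A value exactly at the maximum is kept in the last regular bin, N.
    return min(int((x - minimum) / width) + 1, bins)

# Example: ages 0-100 split into 5 bins of width 20.
print(bin_code(-3, 0, 100, 5))   # 0  (below minimum)
print(bin_code(33, 0, 100, 5))   # 2
print(bin_code(250, 0, 100, 5))  # 6  (above maximum, N+1)
```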

Derive
Derive allows you to enter simple expressions based upon columns within a table. For example, if
you know that all values are positive or zero, the Derive Analysis can be used to add one to the
column and take the natural logarithm of it. The Derive expression may be specified in a
structured way as in the Variable Creation function, and may include any functions or operators
supported by the Variable Creation function except a reference to another variable. It may also
include free-formatted SQL text in all or part of an expression, making it possible to use
constructs not supported by the expression builder. Of course, care should be taken in using this
feature to create a valid expression, since validation is not performed on the SQL within the free-
format text string.

Special handling is given to aggregation functions if they appear in the user-defined expression.
Any requested aggregation function is computed over the entire input table (limited of course by
the where clause if specified as an expert option) in one of the global aggregate derived tables
shared by the other transformation functions. The aggregation is then treated as a constant in the
user-defined expression. And although the user-defined expression may include ordered
analytical functions, it may not include an aggregate within an ordered analytical function.
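The aggregate-as-constant behavior can be illustrated in Python (a conceptual sketch, not the derived-table SQL the product emits; `derive_share` is a hypothetical name): the aggregate is evaluated once over the whole input and then used as a constant in the per-row expression.

```python
def derive_share(values):
    # The aggregate (here a SUM over the entire input) is computed once...
    total = sum(values)
    # ...and then treated as a constant in the row-level derive expression.
    return [x / total for x in values]

print(derive_share([1, 1, 2]))  # [0.25, 0.25, 0.5]
```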

Design Code
Design coding is useful when a categorical data element must be re-expressed as one or more
meaningful numeric data elements. Many classes of analytical algorithms from the statistical and
artificial intelligence communities require variables, inputs, or outputs to be numeric and
numerically meaningful. It does this, roughly speaking, by creating a binary numeric field for
each categorical data value. Design coding is offered in two forms, one known as dummy-coding
and the other as contrast-coding. A “Values” function is provided to select the possible values
from the input table.

In “dummy-coding”, a new column is produced for each listed value, with a value of 0 or 1
depending on whether that value is assumed by the original column. Alternately, given a list of
values to “contrast-code” along with a “reference value”, a new column is produced for each
listed value, with a value of 0 or 1 depending on whether that value is assumed by the original
column, or a value of –1 if that original value is equal to the reference value.

When using “Dummy Coding,” if a column assumes n values, new columns may be created for
all n values, (or for only n-1 values, because the nth column will be perfectly correlated with the

© 1999-2007 NCR Corporation, All Rights Reserved 180


Chapter Two
Analytic Data Sets

first n-1 columns). When using “Contrast Coding”, only n-1 or fewer new columns may be
created from a categorical column with n values.
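In Python terms, the two coding schemes described above might look like this. The sketch is illustrative only; the function names and example category values are assumptions, not the product's interface.

```python
def dummy_code(value, listed_values):
    # One 0/1 column per listed value.
    return [1 if value == v else 0 for v in listed_values]

def contrast_code(value, listed_values, reference_value):
    # 0/1 columns as above, but every column becomes -1 when the
    # original value equals the reference value.
    if value == reference_value:
        return [-1] * len(listed_values)
    return [1 if value == v else 0 for v in listed_values]

# Hypothetical acct_type values, contrast-coded against reference "SV".
print(dummy_code("CK", ["CC", "CK"]))           # [0, 1]
print(contrast_code("CK", ["CC", "CK"], "SV"))  # [0, 1]
print(contrast_code("SV", ["CC", "CK"], "SV"))  # [-1, -1]
```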

Recode
Recoding a categorical data column is most often done to “re-express” existing values of a
column (variable) into some new “coding scheme”. Additionally, it is also done to correct data
quality problems and to focus an analysis on a particular value. It allows for mapping individual
values, NULL values or any number of remaining values (ELSE option) to a new value, a NULL
value or the same value. A “Values” function is provided to select the possible values from the
input table.
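A rough Python equivalent of the mapping behavior (individual values, NULLs via a None key, and the ELSE option) is sketched below; the argument names and the ELSE conventions are assumptions made for illustration.

```python
def recode(value, mapping, else_action="same"):
    """Map individual values to new values.

    NULLs can be recoded by including None as a key in `mapping`.
    else_action: "same" keeps unmatched values, "null" maps them to
    None, and any other object is used as the replacement itself.
    """
    if value in mapping:
        return mapping[value]
    if else_action == "same":
        return value
    if else_action == "null":
        return None
    return else_action

# Hypothetical cleanup of a gender column.
print(recode("m", {"m": "M", "f": "F"}))                      # M
print(recode("X", {"m": "M", "f": "F"}, else_action="null"))  # None
```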

Rescale
Rescaling limits the upper and/or lower boundaries of the data in a continuous numeric column
using a linear rescaling function based on maximum and/or minimum data values. It may be
useful with algorithms that require or work better with data within a certain range. Rescale is only
valid on numeric columns, and not columns of type date.

The user may supply new minimum and maximum values (lower, upper) to form new variable
boundaries. If only the lower boundary is supplied, the variable is aligned to this value; or if only
an upper boundary value is specified, the variable is aligned to that value. If a requested column
has a constant value (max and min are the same), then the transformation will fail with an SQL
error.

The rescale transformation formulas can be thought of as:

ƒ(x, l, r) = l + ((x − min(x)) · (r − l)) / (max(x) − min(x))   (i.e. both lower and upper specified)

ƒ(x, l) = x − min(x) + l   (i.e. only lower specified)

ƒ(x, r) = x − max(x) + r   (i.e. only upper specified)
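The three formulas translate directly into Python; this sketch applies them to a whole column at once (the function name and list-based interface are assumptions):

```python
def rescale(values, lower=None, upper=None):
    lo, hi = min(values), max(values)
    if lower is not None and upper is not None:
        # Both bounds: map [min(x), max(x)] onto [lower, upper].
        # Fails on a constant column (hi == lo), as the text notes.
        return [lower + (x - lo) * (upper - lower) / (hi - lo) for x in values]
    if lower is not None:
        return [x - lo + lower for x in values]   # align minimum to lower
    if upper is not None:
        return [x - hi + upper for x in values]   # align maximum to upper
    raise ValueError("at least one of lower/upper must be supplied")

print(rescale([10, 20, 30], lower=0, upper=1))  # [0.0, 0.5, 1.0]
print(rescale([10, 20, 30], lower=0))           # [0, 10, 20]
print(rescale([10, 20, 30], upper=0))           # [-20, -10, 0]
```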

Retain
The retain option allows you to copy a column as is, along with any transformed columns into the
final analytic data set. When using this option, they may choose to include any input table
column, as is or with a different name or type, in the output table, without actually “transforming”
it. By default, the result column name is the same as the input column name. If a specific type is
specified, it results in casting the retained column.

Null Replacement
NULL value replacement is offered as a transformation function. A literal value, the mean,
median, mode or an imputed value joined from another table can be used as the replacement
value. The median value can be requested with or without averaging of two middle values when
there is an even number of values. The replacement value can also be the analytic data set’s target
date value. Literal value replacement is supported for numeric, character and date data types.
Mean value replacement is supported for columns of numeric type or date type, with special
coding required for date type. Median without averaging, mode and imputed value replacement
are valid for any supported type, with distinct SQL generated for computing the median value of
numeric, date and other type columns. Median with averaging is however supported only for
numeric and date type columns.

© 1999-2007 NCR Corporation, All Rights Reserved

Chapter Two
Analytic Data Sets

Sigmoid
A Sigmoid transformation provides rescaling of continuous numeric data in a more sophisticated
way than the Rescaling transformation function. In a Sigmoid transformation a numeric column is
transformed using a type of sigmoid or s-shaped function. One of these, called a logit function,
produces a continuously increasing value between 0 and 1. Another called the modified logit
function, is twice the logit minus 1 and produces a value between –1 and 1. A third, called the
hyperbolic tangent function, also produces a value between –1 and 1. (Note that the logit function
is the same as the function previously called the sigmoid function, and the hyperbolic tangent
function is the same as the math function of the same name.) These non-linear transformations
are generally more useful in data mining than a linear Rescaling transformation.

The logit value is calculated as:

ƒ(x) = 1 / (1 + e^(−x))

The modified logit value is calculated as:

ƒ(x) = 2 · [1 / (1 + e^(−x))] − 1

which is equivalent to:

ƒ(x) = (1 − e^(−x)) / (1 + e^(−x))

The hyperbolic tangent value is calculated as:

ƒ(x) = (e^(2x) − 1) / (e^(2x) + 1)

Note that for absolute values of x greater than or equal to 36, the value of the sigmoid function is
effectively 1 for positive arguments or 0 for negative arguments, within about 15 digits of
significance.
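The three functions above can be sketched in Python (illustrative only; the product generates equivalent SQL):

```python
import math

def logit(x):
    """Logit (sigmoid): a continuously increasing value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

def modified_logit(x):
    """Twice the logit minus 1: a value between -1 and 1."""
    return 2.0 * logit(x) - 1.0

def hyperbolic_tangent(x):
    """tanh, written per the formula above: a value between -1 and 1."""
    return (math.exp(2 * x) - 1) / (math.exp(2 * x) + 1)
```

Note that logit(36) already differs from 1 by only about 2e-16, consistent with the remark above about arguments of absolute value 36 or more.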

Z Score
Like a Sigmoid transformation, a Z-Score transformation provides rescaling of continuous
numeric data in a more sophisticated way than a Rescaling transformation. In a Z-Score
transformation, a numeric column is transformed into its Z-score based on the mean value and
standard deviation of the data in the column. It transforms each column value into the number of
standard deviations from the mean value of the column. This standardizing transformation is
generally more useful in data mining than a simple Rescaling transformation.

For a value, the number of standard deviations away from the mean is calculated as:

ƒ(x) = (x − (1/n)·Σ x) / √( (1/n)·Σ x² − ((1/n)·Σ x)² )

where the sums are taken over all n values of x in the column.
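A Python sketch of this calculation, using the population form of the standard deviation as in the formula above (illustrative only; the actual transformation runs in generated SQL):

```python
import math

def z_scores(values):
    """Transform each value into its number of standard deviations from the mean."""
    n = len(values)
    mean = sum(values) / n
    # Population variance: (1/n)*sum(x^2) - ((1/n)*sum(x))^2
    variance = sum(v * v for v in values) / n - mean * mean
    sd = math.sqrt(variance)
    return [(v - mean) / sd for v in values]
```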

Initiate a Variable Transformation Function


Use the following procedure to initiate a new Variable Transformation analysis in Teradata
Warehouse Miner:

1. Click on the Add New Analysis icon in the toolbar:

2. In the resulting Add New Analysis dialog box, click on ADS under Categories and then under
Analyses double-click on Variable Transformation:

3. This will bring up the Variable Transformation dialog in which you can define INPUT /
OUTPUT options and initiate any of the Variable Transformation functions, i.e:

• Retain
• Bin Code
• Derive
• Design Code
• Null Replacement
• Recode
• Rescale
• Sigmoid
• Z Score

Variable Transformation - INPUT - Transformations


On the Variable Transformations dialog, click on INPUT and then click on transformations:

On this screen, select:

Available Databases
All databases which are available for the Variable Transformation analysis.

Available Tables
All tables within the Source Database which are available for the Variable
Transformation analysis.

Available Columns
All columns within the selected table which are available for the Variable
Transformation analysis.

‘Transformations…’ Window
Move column(s) into this window for the Variable Transformation analysis to execute
against. First, highlight the function you wish to use in this window, for example Bin
Code:

Next, with selected function highlighted, choose column(s) by highlighting it in the
Available Columns window and then clicking on the arrow button to move highlighted
column(s) into the ‘Transformations…’ window. Columns may also be dragged and
dropped into the appropriate folder.

Right-click options for this window are described below.

Right-Click Menu Options

The same right-click menu options are offered for the Columns selector on the left side of the
input screen as are offered for other input screens (refer to the Analysis Input Screen topic
in Using Teradata Warehouse Miner). Also, the following right-click options are available
within the Transformations window.

Expand All Nodes
Expands the nodes completely for each transformation folder.
Collapse All Nodes
Collapses the nodes to show only transformation folders.
Switch Input To ‘<table name>’
When this option is selected, the selectors on the left side of the input screen are
adjusted to match the table or analysis of the column associated with the selected
transformation, or otherwise with the column associated with the first transformation
specified in the current ordering.
Switch ‘<table name>’ To Current Input
When this option is selected, the selectors on the left side of the input screen are used
to change the input table or analysis of all columns in all transformations. The new
table or analysis must contain all columns in all of the transformations or an error is
given and no changes are made.
Remove All Transformations of this Type
This option applies only to Transformation folder nodes and will remove all
transformations contained within a selected folder (that is, of a given type).

Double-Back-Arrow Button
Clicking on the button with two arrows pointing to the left will remove all
transformations from the Transformations window. A prompt is given before removing
the transformations, which are removed only if OK is clicked in response.

Add Button
Clicking on the Add button leads to a dialog from which transformations may be selected
from loaded analyses to add as copies to the current analysis.

On this dialog select:

Available Analyses
This drop down list contains all of the Variable Transformation analyses currently
loaded in the Project window, including those in other projects.

Available Transformation Types
This drop down list contains all of the types of transformation so that the following
selector may be filtered and transformations more easily selected. Note that the
default value of All leads to the listing of all available transformations.

Available Transformations
These are the transformations in the currently selected analysis, filtered by type if a
specific type is selected in the selector immediately above this one. (Note that a
Derive transformation that references more than one column cannot be added, even if
it appears as an available transformation.) Select one or more transformations to add.

Column To Transform

This drop-down selector contains all of the possible columns in the table being
transformed. By default the column with name matching that being transformed in
the selected transformation to add will be selected. If a column with matching name
does not exist the user must select an appropriate column to transform.

If more than one transformation is selected the Column To Transform selector is
disabled. In this case the selected transformations are applied to columns with the
same name as the columns in the selected transformations. If any one of the selected
transformations does not transform a column with a name matching one of the
columns in the table to transform, an error message is given and no transformations
are added.

OK/Cancel/Apply
Each time the Apply button is clicked, a copy of the currently selected
transformations is added and a status message is given. The Apply button is then
disabled until another transformation or column to transform is selected. The dialog
can be exited at any time by clicking the OK or Cancel button. If OK is clicked, the
currently selected transformations will be added unless the Apply button is disabled.

Reorder Button
Clicking on the Reorder button leads to a dialog from which transformations in the
current analysis may be reordered for output purposes.

On this dialog select:

Up/Down Arrow Buttons
One or more transformations may be highlighted and moved up or down in the list
using the Up and Down arrow buttons. The Up arrow button is disabled when the
top most selected transformation is at the top of the list, and the Down arrow button
is disabled when the bottom most selected transformation is at the bottom of the list.

Ellipses (…) Button
The following options are presented when the Ellipses button is clicked.

Move to Top
This option moves all the selected transformations to the top of the list.
Move to Bottom
This option moves all the selected transformations to the bottom of the list.
Restore Initial Order
This option reorders the transformations to match the order when the dialog was
displayed.
Order by Input Columns

This option reorders the transformations by input column as displayed in the
Available Columns selector. This may either be in alphabetical or table order
depending on Preference or right-click options previously selected.

Properties Button
The Properties button leads to a dialog from which properties or default properties may
be set, as described in the following sections.

Setting Properties - Variable Transformation


Each requested transformation contains properties that can be set by editing the properties of the
column node for that transformation.

In the ‘Transformations…’ window, click on the column that was added when the transformation
was requested, for example column cust_id under Bin Code:

With column highlighted, click on the Properties button to bring up the Properties dialog:

(Tip: You can also double-click on the column name to bring up the Properties dialog.)

Setting Default Properties - Variable Transformation


Each type of transformation contains default properties that can be changed by editing the
properties of the folder node for that type of transformation. When a column is added to a
transformation folder node, the default properties currently in effect for that type of
transformation are used to set the initial property values for the newly added transformation.

The default properties for each type of transformation are saved along with the analysis so that
they will be available if changes are made to the analysis at a later time.

In the ‘Transformations…’ window, click on the folder associated with the type of transformation
you want to set default properties for:

With the folder highlighted, click on the Properties button to bring up the Properties dialog for
the selected transformation type (Bin Code in the example below):

Apply to existing <transformation type> transformations


If there are already transformations in a folder node when the default properties for that folder
node are changed, the user may request that the changes be applied to all existing
transformations in that folder. Otherwise, the property values of existing transformations are
not changed when default properties are set. (Note: This option is not available when setting
default properties for Derive transformations.)

(Tip: You can also double-click on the column name to bring up the Default Properties dialog.)

Properties Dialog – Common Features


While some options on the Properties dialog (shown above) depend on the function in use, the
Properties dialog will usually contain these common features:

Output
For most transformations, the Properties dialog will have an Output tab. (For Retain
transformations, the Properties dialog has no tabs but directly displays Output options.) Clicking
on Output leads to the following display and options:

Output Name / Suffix:
If this field appears, you can rename (assign an alias to) the column, if needed. (Only in
the case of Design Code, where multiple columns are created, does this appear as Output
Suffix because it follows the prefix representing the value being encoded, separated by an
underscore character.)

Output Type:
When this field appears it lets you select output type. The default is Generate
Automatically, but you can also select the following types. (Depending on the type
selected, one or more length fields may also be presented.)

BYTEINT
CHAR
DATE
DECIMAL
FLOAT
INTEGER
SMALLINT
TIME
TIMESTAMP
VARCHAR

Column Attributes:
One or more column attributes can be entered here in a free-form manner to be used
when an output table is created. They are placed as-entered following the column
name in the CREATE TABLE AS statement. This can be particularly useful when
requesting data compression for an output column, which might look like the following:
COMPRESS NULL.

Description:
An optional description may be specified for each transformation.

Null Replacement
For most transformations, the Properties dialog will have a Null Replacement tab. Click on
Null Replacement to display options for replacing null values within the column.

On this screen you can elect to replace null values by clicking the checkbox, and then specifying
what null values are to be replaced with. The choices are:

Imputed Value
You will then need to select a column in the Imputed Column field. Click the down-
arrow beside the Imputed Column field to display available columns. (You may need
to expand tree items to drill down to individual columns.)
Literal Value
The value as specified in the Literal Value field. Literal value replacement is
supported for numeric, character and date data types.
Mean
Average value - Mean value replacement is supported for columns of numeric type or
date type, with special coding provided for date type.

Median
The median value can be requested with averaging of two middle values when there
is an even number of values. Supported only for numeric and date type columns.
Median (No Averaging)
The median value can be requested without averaging of two middle values when
there is an even number of values.
Mode
The most frequently occurring value.
Target Date
Literal Target Date as specified on the INPUT-target date panel.
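The replacement-value choices above can be sketched in Python; the function and its method names below are illustrative conventions only, not product syntax, and the product computes these values in generated SQL:

```python
import statistics

def replacement_value(values, method, literal=None):
    """Compute a NULL replacement value from a column's non-null values (sketch)."""
    if method == "literal":
        return literal                        # numeric, character or date literal
    if method == "mean":
        return statistics.fmean(values)       # numeric (or specially coded date) columns
    if method == "median":
        return statistics.median(values)      # averages the two middle values when n is even
    if method == "median_no_avg":
        return statistics.median_low(values)  # no averaging: valid for any ordered type
    if method == "mode":
        return statistics.mode(values)        # the most frequently occurring value
    raise ValueError(f"unknown method: {method}")
```

For example, the median of [1, 2, 3, 4] is 2.5 with averaging and 2 without.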

Properties Dialog – Function-Specific Features


Some contents of the Properties dialog will depend on the particular function in use (Bin Code,
Design Code, etc). Function-specific options on the Properties dialog are as follows:

Properties - Bin Code


If doing a Bin Code transformation there will be a Bin Code tab on the Properties dialog. Click
on Bin Code to access the following options:

Bin Code Style:
Select a Bin Code Style. Options are:
Bins
Specify a number of equal sized data bins (Default = 10).
Bin with Boundaries
Specify the minimum and maximum value and the number of equal sized data bins
(Default = 10) to create in this range.
Boundaries
Specify a list of boundary values to define the bins.
Quantiles
Specify a number of bins with a nearly equal number of values (Default = 10).
Width
Specify the desired width of each bin.
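For the equal-width styles, bin assignment can be sketched as follows; the 1-based bin numbering and the clamping of out-of-range values to the end bins are assumptions of this sketch, not documented product behavior:

```python
def bin_number(x, minimum, maximum, bins=10):
    """Assign x to one of `bins` equal-width bins over [minimum, maximum] (sketch)."""
    if x is None:
        return None
    width = (maximum - minimum) / bins
    b = int((x - minimum) / width) + 1   # bins numbered 1..bins
    return max(1, min(b, bins))          # clamp values outside the range
```

For example, with 10 bins over [0, 100], the value 25 falls in bin 3.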

After setting values/options on the Properties dialog, click on OK to close the Properties dialog.
Then continue specifying INPUT and OUTPUT parameters described further in this chapter.

Properties - Derive
If doing a Derive transformation there will be a Derive tab on the Properties dialog. Click on
Derive to access a variation of the Variable Creation input screen. It has been altered to initially
contain a single variable consisting of the column that the Derive transformation is based upon. It
has also been altered so that the input table cannot be changed and so that a new variable cannot
be entered or the existing one deleted.

Special handling is given to aggregation functions if they appear in the user-defined expression.
Any requested aggregation function is computed over the entire input table (limited of course by
the where clause if specified as an expert option) in one of the global aggregate derived tables
shared by the other transformation functions. The aggregation is then treated as a constant in the
user-defined expression. And although the user-defined expression may include ordered
analytical functions, it may not include an aggregate within an ordered analytical function.

Special handling is also given when specifying the default properties for a Derive transformation
in the Default Properties dialog. A single variable called <default column> is initially provided.
Wherever it appears in the expression created by the user, it will be replaced by the selected
column that was used to define a specific Derive transformation. (If more instances of <default
column> are needed, the initially provided instance can be copied by dragging it with the control
key held down). This makes the default Derive transformation behave like a template for a
custom transformation.

After setting values/options on the Properties dialog, click on OK to close the Properties dialog.
Then continue specifying INPUT and OUTPUT parameters described further in this chapter.

Properties - Design Code


If doing a Design Code transformation there will be a Design Code tab on the Properties dialog.
Click on Design Code to access the following options:
Encoding Style:
Select an Encoding Style. Options are:
Contrast Code
Choose to “contrast-code” all values, resulting in –1/0/1 generated as values.
Reference Value
The value for which a –1 will be generated when the column is equal to it. (This
option only available when Contrast Code is selected.)
Dummy Code
Choose to “dummy-code” all values, resulting in 0/1 generated as values.

Values to Encode
Value
A list of values within the column that “dummy-codes” or “contrast-codes” will be
generated for. If the Contrast Coding option is selected, the Reference Value must
not be listed. Double-click in the area shown to enter the values.
Column
The desired name of the result of the Design Coding Analysis. A default name
is provided if the values are loaded with the Values… button. The data type
generated is BYTEINT.
Values
Brings up the design code wizard which determines the distinct values of the column
being design coded, and assigns default column names of <value>_<column name>
(for example, 123_Department). These columns can be renamed by highlighting them
and typing over the current name.

Special handling is necessary for the default properties of a Design Code transformation. Since
the column to be transformed is not yet known, column prefixes are associated with specific
values rather than column names. Then, when the default properties are applied to a specific
column, the column name is appended to the default prefixes. For example, if the value 0 is
associated with the prefix "0_", when the default properties are applied to the column "amount", 0
is associated with the column "0_amount".
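The dummy- and contrast-coding behavior described above can be sketched in Python; only the 0/1 and –1/0/1 value semantics and the <value>_<column name> naming are taken from the text, and the function itself is an illustration:

```python
def design_code(value, encode_values, column_name, style="dummy", reference=None):
    """Generate one 0/1 (dummy) or -1/0/1 (contrast) output per encoded value (sketch)."""
    row = {}
    for v in encode_values:
        name = f"{v}_{column_name}"      # documented default naming: <value>_<column name>
        if value == v:
            row[name] = 1
        elif style == "contrast" and value == reference:
            row[name] = -1               # the reference value contrast-codes to -1
        else:
            row[name] = 0
    return row
```

For example, dummy-coding "F" over the values F and M of a gender column yields F_gender = 1 and M_gender = 0.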

After setting values/options on the Properties dialog, click on OK to close the Properties dialog.
Then continue specifying INPUT and OUTPUT parameters described further in this chapter.

Properties - Recode
If doing a Recode transformation there will be a Recode tab on the Properties dialog. Click on
Recode to access the following options:

Values to Recode:
Create a list of categorical values to transform from one value to another. Use the Add
(and Remove) buttons as necessary to build a list or use the Values button.
From
List existing values within column to recode. These are the “Old” values to be
replaced by new values below. For example: 0, ELSE, NULL
To
New values to replace corresponding old values, one for one. For example: N, Y, N.
In this example, you change a column containing 0 and other values into a Y/N column
by recoding 0 to N, all other values to Y, and NULL (unknown) values to N.

Recode all other values as:
NULL
SAME
(Literal Value)
Enter a literal value in the field provided.
Values
Brings up the recode wizard which determines the distinct values of the column being
recoded, and allows you to type a new value in for each of n distinct values.
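The recode mapping described above can be sketched in Python, using the 0/ELSE/NULL example from the text; the dictionary keys "ELSE" and "NULL" are an illustrative convention of this sketch, not product syntax:

```python
def recode(value, mapping, others="SAME"):
    """Recode old values to new ones (sketch). The keys "NULL" and "ELSE" in
    `mapping` stand for NULL input values and all unmapped values; `others`
    applies when no "ELSE" entry exists and may be "SAME", a literal, or None."""
    key = "NULL" if value is None else value
    if key in mapping:
        return mapping[key]
    if "ELSE" in mapping:
        return mapping["ELSE"]
    return value if others == "SAME" else others

# The documented example: 0 -> N, all other values -> Y, NULL -> N
yn = {0: "N", "ELSE": "Y", "NULL": "N"}
```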

After setting values/options on the Properties dialog, click on OK to close the Properties dialog.
Then continue specifying INPUT and OUTPUT parameters described further in this chapter.

Properties - Rescale
If doing a Rescale transformation there will be a Rescale tab on the Properties dialog. Click on
Rescale to access the following options:

Upper and Lower Bound:
Enter numeric values indicating the upper and lower bounds to rescale the column to.
Lower Bound:
Enter a numeric value indicating the lower bound to rescale the column to.
Upper Bound:
Enter a numeric value indicating the upper bound to rescale the column to.

After setting values/options on the Properties dialog, click on OK to close the Properties dialog.
Then continue specifying INPUT and OUTPUT parameters described further in this chapter.

Properties - Sigmoid
If doing a Sigmoid transformation there will be a Sigmoid tab on the Properties dialog. Click on
Sigmoid to access the following options:

Statistical Computation:
The choices are:
Logit

Modified Logit
Hyperbolic Tangent

After setting values/options on the Properties dialog, click on OK to close the Properties dialog.
Then continue specifying INPUT and OUTPUT parameters described further in this chapter.

Variable Transformation - INPUT - Primary Key


On the Variable Transformation dialog, click on INPUT and then click on primary key:

The purpose of this screen is to specify the columns that comprise the primary key of the input
table or view being transformed. (This is required only when null value replacement is requested
in one of the requested transformations.)

If input comes from a table the primary index columns of the table will be selected by default. To
change these columns, or to enter them initially if input is from a view, use the selectors as
described below.

Available Tables:
Pull-down with the name of the input table or view.
Available Columns:
All columns within the table or view selected in Available Tables. Highlight those
columns which comprise the primary key of the table or view and either drag and drop
them to Selected Primary Key Columns, or use the right arrow button > to move them
over.
Selected Primary Key Columns:
All columns within the table or view that constitute the primary key (that is, that uniquely
identify each row). If undesired columns were moved into this area, highlight those
columns and either drag and drop them back to Available Columns, or use the left arrow
button < to move them back.

Variable Transformation - INPUT – Analysis Parameters


On the Variable Transformation dialog, click on INPUT and then click on analysis parameters:

If a Target Date is used for NULL value replacement, it can be set here. The default value is the
current date, which can be changed by typing in another date, specifying month, day and year
separately, or selecting a date with the standard Windows calendar control as shown above.

Variable Transformation - INPUT - Expert Options


On the Variable Transformations dialog, click on INPUT and then click on expert options:

The resulting screen has the WHERE option available:

Optional WHERE clause text

Option to specify a SQL WHERE clause to restrict the rows selected for analysis.

Variable Transformation - OUTPUT - Storage


On the Variable Transformation dialog click on OUTPUT and then click on storage:

On this screen, select:

Use the Teradata EXPLAIN feature to display the execution plan for this analysis
Option to generate a SQL EXPLAIN SELECT statement, which returns a Teradata
Execution Plan.

Store the tabular output of this analysis in the database
Option to generate a Teradata TABLE or VIEW populated with the results of the
analysis. Once enabled, the next three fields must be specified:
Database Name
Text box to specify the name of the Teradata database where the resultant Table or
View will be created in. By default, this is the “Result Database.”
Output Name
Text box to specify the name of the Teradata Table or View.
Output Type
Pull-down to specify Table or View.
Create Output table using the FALLBACK keyword
If a table is selected, it will be built with FALLBACK if this option is selected.
Create Output table using the MULTISET keyword
If a table is selected, it will be built as a MULTISET table if this option is selected.

Generate the SQL for this analysis, but do not execute it
If this option is selected, the analysis will only generate SQL, returning it and terminating
immediately.

Variable Transformation - OUTPUT - Primary Index


On the Variable Transformation dialog click on OUTPUT and then click on primary index:

On this screen, select the columns which comprise the primary index of the output table:

Available Columns
A list of columns which comprise the index of the resultant table if an Output Table is
used.
Primary Index Columns
Select columns by highlighting and then either dragging and dropping into the Primary
Index Columns window, or click on the arrow button to move highlighted columns into
the Primary Index Columns window.

Run the Variable Transformation Analysis


After setting INPUT and OUTPUT parameters as described above, you are ready to run the
analysis. To run the analysis you can either:
• Click the Run icon on the toolbar, or
• Select Run <project name> on the Project menu, or
• Press the F5 key on your keyboard

Results - Variable Transformation


The results of running a Variable Transformation analysis include the generated SQL itself, the
results of executing the generated SQL, and, if selected, a Teradata table (or view). All these
results are outlined below.

Variable Transformation - RESULTS - Data


On the Variable Transformation dialog, click on RESULTS and then click on data (note that the
RESULTS tab will be grayed-out/disabled until after the analysis is completed):

The results of the completed query are returned in this Data Viewer page. This page has the
properties of the Data page discussed in the Chapter on Using the Teradata Warehouse Miner
Graphical User Interface. With the exception of the Explain Select Result Option, these results
will match the tables described below in the Output Column Definition section, depending upon
the parameters chosen for the analysis.

Variable Transformation - RESULTS - SQL


On the Variable Transformation dialog, click on RESULTS and then click on SQL (note that the
RESULTS tab will be grayed-out/disabled until after the analysis is completed):

The generated SQL is returned as text which can be copied, pasted, or printed.

Tutorial – Variable Transformation Analysis

Variable Transformation - Example #1

Parameterize a Variable Transformation Analysis as follows:

1. Select twm_tutorials_vc1 (as created in Variable Creation Tutorial #1) as the
Available Table.
2. Drag and drop the following columns from twm_tutorials_vc1 to the following
transformation functions in Created Transformations:

• cust_id - Retain folder
• income - Bin Code folder

• gender - Design Code folder
• marital_status - Recode folder
• age - Rescale folder
• avg_cc_tran_amt - Z Score folder (rename zavg_cc_tran_amt)

3. Let all of the transformation functions' properties default, except as follows. Double-click
on the variable name to bring up the Properties screen:

4. gender - Design Code

Click on the Design Code tab on the Properties screen and then click on the Values
button to bring up the Design Code values Wizard:

Select both F and M by highlighting them and hitting the Add> button. Hit Finish to
exit the Wizard.

5. The default values of F_gender and M_gender are given for the values of F and M
respectively. Highlight those values and type in Females and Males accordingly:

6. marital_status - Recode

Click on the Recode tab on the Properties screen and then click on the Values button
to bring up the Recode values Wizard:

Select 1-4 by highlighting them and hitting the Add> button. Hit Finish to exit the
Wizard.

7. Specify recode values as follows: 1-S, 2-M, 3-S and 4-S.

8. age - Rescale

Specify a lower bound of 0 and an upper bound of 1 as follows:

9. Go to OUTPUT-storage, and select Store the tabular output of this analysis in the
database. Specify that a Table should be created named twm_tutorials_vt1.

For this example, the Variable Transformation Analysis generated the following results. Note that
the SQL is not shown for brevity:

Data

Only the first 10 rows after sorting are shown.

cust_id    income  Females  Males  marital  Age    zavg_cc_tran
1362480    9       0        1      M        0.15    0.88
1362481    3       1        0      M        0.21   -1.09
1362484    1       1        0      S        0.33    0.27
1362485    4       1        0      S        0.00    0.22
1362486    1       1        0      S        0.98   -1.09
1362487    1       0        1      M        0.88    0.02
1362488    1       1        0      M        1.00   -1.09
1362489    10      1        0      S        0.19    1.88
1362492    3       1        0      S        0.36    0.15
1362496    1       0        1      S        0.00   -0.39

Build ADS (Analytic Data Set)

The purpose of analytic data set functions is to build a data set table or view. Each Variable
Creation and Variable Transformation analysis creates a table or view to be joined together into a
final data set table. This duty is performed by the Build ADS analysis.

The Build ADS analysis has similar functionality to the Join analysis in the Reorganization group
of analyses. However, it is distinguished by these differences.

• A join table or view is not required, so that it may operate on a single table or view.
• Tables are joined together via Join Paths as in a Variable Creation analysis, but without
Anchor Columns (refer to the section Variable Creation – Input – anchor table).
• By using Join Paths, Build ADS allows the use of Cross Join as a Join Style.
• By using Join Paths, the Join Style can be set differently for different tables.
• By using Join Paths, comparison operators may be set individually in Join Steps.

It should be pointed out that although the Variable Creation analysis can be used in place of
Build ADS, Build ADS is simpler and easier to use in the functions it performs.

Initiate a Build ADS


Use the following procedure to initiate a new Build ADS analysis in Teradata Warehouse Miner:

1. Click on the Add New Analysis icon in the toolbar:

2. In the resulting Add New Analysis dialog box, click on to highlight ADS under Categories,
and then under Analyses double-click on Build ADS:

3. This will bring up the Build ADS dialog in which you will enter INPUT and OUTPUT options
to parameterize the analysis as described in the next sections.

Build ADS - INPUT - Data Selection


On the Build ADS dialog click on INPUT and then click on data selection:

On this screen select:

Available Databases
All the databases which are available for the Build ADS Analysis.
Available Tables
All the tables within the Source Database that are available for the Build ADS Analysis.
Available Columns
All the columns within the selected table that are available for the Build ADS Analysis.
Selected Columns
Select columns by highlighting and then either dragging and dropping into the Selected
Columns window, or click on the arrow button to move highlighted columns into the
Selected Columns window.


Build ADS - INPUT – Anchor Table


On the Build ADS dialog click on INPUT and then click on anchor table:

This screen performs the same function it does for the Variable Creation analysis with the
exception that the selector for Anchor Columns is not used. Refer to the section Variable
Creation – Input – Anchor Table for details.

Build ADS - INPUT - Expert Options


On the Build ADS dialog click on INPUT and then click on expert options:

This screen provides the option to generate one or more SQL WHERE clauses to restrict the rows
selected for analysis (for example: cust_id > 0).

It may be useful to note that if a WHERE clause condition is specified on the "inner" table of a
join (i.e. a table that contributes only matched rows to the results), the join is logically equivalent
to an Inner Join, regardless of whether an Outer type is specified. (In a Left Outer Join, the left
table is the "outer" table and the right table is the "inner" table.)
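This equivalence can be demonstrated with a small pandas sketch (the data is invented for illustration): a condition on a column of the "inner" table can never be true for the NULL-padded unmatched rows, so it filters out exactly the rows the outer join preserved.

```python
import pandas as pd

left = pd.DataFrame({"cust_id": [1, 2, 3]})
right = pd.DataFrame({"cust_id": [1, 2], "bal": [10.0, -5.0]})

# LEFT OUTER JOIN: cust_id 3 survives, with bal = NaN (SQL NULL).
outer = left.merge(right, on="cust_id", how="left")

# A WHERE condition on the inner table's column (bal > 0) is never true
# for the NULL rows, so the unmatched rows are discarded anyway...
filtered = outer[outer["bal"] > 0]

# ...which is exactly what an inner join plus the same condition returns.
inner = left.merge(right, on="cust_id", how="inner")
assert filtered.equals(inner[inner["bal"] > 0])
```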

Build ADS - OUTPUT - Storage


Before running the analysis, define Output options. On the Build ADS dialog click on OUTPUT
and then click on storage:

On this screen select:

Use the Teradata EXPLAIN feature to display the execution path for this analysis
Option to generate a SQL EXPLAIN SELECT statement, which returns a Teradata
Execution Plan.


Store the tabular output of this analysis in the database


Option to generate a Teradata TABLE or VIEW populated with the results of the
analysis. Once enabled, the following three fields must be specified:
Database Name
Text box to specify the name of the Teradata database where the resultant Table or
View will be created. By default, this is the “Result Database.”
Output Name
Text box to specify the name of the Teradata Table or View.
Output Type
Pull-down to specify Table or View.
Create Output table using the FALLBACK keyword
If a table is selected, it will be built with FALLBACK when this option is selected.
Create Output table using the MULTISET keyword
If a table is selected, it will be built as a MULTISET table when this option is selected.

Generate the SQL for this analysis, but do not execute it


If this option is selected, the analysis will only generate the SQL, return it, and terminate
immediately.

Build ADS - OUTPUT - Primary Index


Before running the analysis, define Output options. On the Build ADS dialog click on OUTPUT
and then click on primary index:

On this screen, select the columns which comprise the primary index of the output table. Select:

Available Columns
A list of the columns available to form the primary index of the resultant table if an
Output Table is used.
Primary Index Columns
Select columns by highlighting and then either dragging and dropping into the Primary
Index Columns window, or click on the arrow button to move highlighted columns into
the Primary Index Columns window.

Run the Build ADS Analysis


After setting parameters on the INPUT and OUTPUT screens as described above, you are ready
to run the analysis. To run the analysis you can either:
• Click the Run icon on the toolbar, or
• Select Run <project name> on the Project menu, or
• Press the F5 key on your keyboard


Results - Build ADS


The results of running the Build ADS analysis include the generated SQL itself, the results of
executing the generated SQL, and, if the Create Table (or View) option is chosen, a Teradata
table (or view). All of these results are outlined below.

Build ADS - RESULTS - Data


On the Build ADS dialog, click on RESULTS and then click on data (note that the RESULTS
tab will be grayed-out/disabled until after the analysis is completed):

The results of the completed query are returned in a Data page within Results. This page has the
properties of the Data page discussed in the Chapter on Using the Teradata Warehouse Miner
Graphical User Interface. With the exception of the Explain Select Result Option, these results
will match the tables described below in the Output Column Definition section, depending upon
the parameters chosen for the analysis.

Build ADS - RESULTS - SQL


On the Build ADS dialog, click on RESULTS and then click on SQL (note that the RESULTS
tab will be grayed-out/disabled until after the analysis is completed):

The generated SQL is returned here as text which can be copied, pasted, or printed.

Tutorial – Build ADS Analysis

Build ADS - Example #1

Parameterize a Build ADS Analysis as follows:

Selected Columns TWM_CUSTOMER.cust_id


TWM_CUSTOMER.nbr_children
TWM_CUSTOMER.postal_code
(From Variable Transformation Tutorial #1)
twm_tutorials_vt1.age
twm_tutorials_vt1.income


twm_tutorials_vt1.marital_status
twm_tutorials_vt1.Females
twm_tutorials_vt1.Males
twm_tutorials_vt1.zavg_cc_tran_amt
(From Variable Creation Tutorial #2)
twm_tutorials_vc2.CC_acct
twm_tutorials_vc2.CC_bal
twm_tutorials_vc2.CK_acct
twm_tutorials_vc2.CK_bal
twm_tutorials_vc2.SV_acct
twm_tutorials_vc2.SV_bal
twm_tutorials_vc2.Q1_nbr_trans
twm_tutorials_vc2.Q2_nbr_trans
twm_tutorials_vc2.Q3_nbr_trans
twm_tutorials_vc2.Q4_nbr_trans
Anchor Table TWM_CUSTOMER
Inner Join to twm_tutorials_vt1 on cust_id
Inner Join to twm_tutorials_vc2 on cust_id

Go to OUTPUT-storage, and select Store the tabular output of this analysis in the
database. Specify that a Table should be created named twm_tutorials_bads1.

For this example, the Build ADS analysis generates a table of 747 rows with the 19 columns
above joined together, containing the same cust_id values as the TWM_CUSTOMER table.
Note that the SQL is not shown for brevity.


Refresh

The Refresh Analysis is provided as a means to re-execute a chain of referenced analyses with a
different set of user-specified parameters without modifying the original analyses. It falls under
the ADS umbrella because it is designed to allow the user to refresh an analytic data set; however,
in addition to ADS Analyses it may also be used to refresh Score Analyses.

Creating an analytic data set can require a lot of thought and result in many steps of creating
variables and reorganizing data. There can be multiple tables joined by complicated join paths,
sophisticated arithmetic formulas, as well as the dimensioning of variables. With the use of
Analysis References, which provide a means to feed the output of a previous analysis into a
subsequent analysis, the result can be a complex chain of analyses that makes up the creation of a
final analytic data set. As the source data changes over time, it might be necessary to modify the
parameters used in generating the analytic data set. Apart from Refresh, there are two ways to do
this. The first is to reproduce the entire set of analyses used to generate the analytic data set with
the new, modified parameters. This is not ideal because, for a complicated set of analyses, it could
take a significant amount of time to reproduce when you only wanted to change a few things. The
second is to actually change the original analyses with new parameters. The problem with this is
that the original ADS template is then permanently changed.

With the Refresh Analysis, the original analyses can be re-executed with the modified parameters
without affecting the original parameters used. If any of the parameters are not selected to be
changed, then the original values are used. When Refresh is run, the analysis to be refreshed is
executed (along with any analyses that it references) using the new parameters specified within
Refresh. In addition, it is worth noting one of the most powerful features of the Refresh
analysis: the referenced analyses will generate only the columns needed for the analysis
being refreshed.

Initiate a Refresh Analysis


Use the following procedure to initiate a new Refresh Analysis in Teradata Warehouse Miner:

1. Click on the Add New Analysis icon in the toolbar:

2. In the resulting Add New Analysis dialog box, with ADS highlighted on the left, double-click
on the Refresh icon:


3. This will bring up the Refresh dialog in which you will enter INPUT options to parameterize
the analysis as described in the next section.

Refresh - INPUT - Data Selection


On the Refresh dialog click on INPUT and the analysis parameters tab will automatically be
selected.

On this screen select:

Available Analyses
Select a single analysis from the list of all of the analyses in the current project which are
available for the Refresh Analysis.
Modify Output
Check the box if you wish to change the output database and/or output table of the analysis to be
refreshed.
Database Name
The name of the output database of the analysis to be refreshed.
Table/View Name
The name of the output table or view of the analysis to be refreshed.


Modify Anchor Table


Check the box if you wish to modify the Anchor Table. This applies only if there are one or more
Variable Creation Analyses either as the selected analysis or, if it exists, in the chain of analyses
referenced by the selected analysis. When a new Anchor Table is selected, the new Anchor Table
is joined to the old Anchor Table(s) in all the Variable Creation Analyses in the reference chain
by means of a LEFT OUTER JOIN. (Any Anchor Tables in analyses of other types are not
affected.)
Modify Target Date
Check the box if you wish to modify the Target Date. This applies only if there are one or more
Variable Creation or Variable Transformation Analyses that use a target date either as the
selected analysis or, if it exists, in the chain of analyses referenced by the selected analysis.
Generate SQL Only
Check the box if you wish the selected analysis and, if it exists, the chain of analyses
referenced by it to generate SQL rather than execute it.

Run the Refresh Analysis


After setting parameters on the INPUT screen as described above you are ready to run the
analysis. To run the analysis you can either:
• Click the Run icon on the toolbar, or
• Select Run <project name> on the Project menu, or
• Press the F5 key on your keyboard

Results - Refresh
On the Refresh dialog click on RESULTS (note that the RESULTS tab will be grayed-
out/disabled until after the analysis is completed):

Tutorial – Refresh

Refresh – Example

(Note: The following example will contain a Variable Creation, which will then be input into
the Refresh Analysis)

Parameterize a Variable Creation Analysis as follows:

1. Select TWM_CUSTOMER as the Available Table


2. Create one variable by clicking on the New button and drag and drop the following
column. Note the variable name will default to the column name.
• TWM_CUSTOMER.cust_id
3. Select TWM_CREDIT_TRAN as the Available Table.
4. Create a variable by clicking on the New button and build up an expression as follows:


5. Rename the variable to “avg_tran_amt” and drag an AVERAGE (Arithmetic) SQL


Element over the Variable, and then drag the following column over the empty
arguments:
• TWM_CREDIT_TRAN.tran_amt

6. Go to INPUT- variables-Dimensions tab and create a dimension by clicking on the New


button, and then drag an AND (Logical) on the Dimension Value. Rename the Dimension
to “LastMonth”

7. Drag a LESS THAN OR EQUALS (Comparison) onto the first empty argument, and a
GREATER THAN (Comparison) onto the second empty argument.

8. Drag a DATE DIFFERENCE (Date and Time) onto the first empty argument of each
comparison operator.

9. Drag a TARGET DATE (Literals) onto the first empty argument of each Date Difference
and drag TWM_CREDIT_TRAN.tran_date onto the second empty argument of each
Date Difference.


10. Drag a NUMBER (30) (Literal) onto the empty argument of the LESS THAN OR
EQUALS, and drag a NUMBER (0) (Literal) onto the empty argument of the GREATER
THAN.

11. Go to the INPUT-dimensions tab and apply the dimension to the variable in the following
way.


12. Specify the Join Paths from TWM_CUSTOMER to each of the following by selecting a
table in the Join Path from Anchor Table To: and clicking on the Wizard button.
Specify the following Join Paths:
TWM_CUSTOMER.cust_id - TWM_CREDIT_TRAN.cust_id

13. Go to INPUT-target date, and change the Target Date to 7/31/1995.


14. Go to OUTPUT-storage, and select Store the tabular output of this analysis in the
database. Specify that a Table should be created named twm_tutorials_refresh.

Run the analysis.
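The “LastMonth” dimension assembled in steps 5 through 10 evaluates to the predicate 0 < (Target Date − tran_date) ≤ 30. As a quick illustration only (this is not product-generated code), the same logic in Python:

```python
from datetime import date

def in_last_month(target: date, tran: date) -> bool:
    """True when tran falls within the 30 days ending at target."""
    diff = (target - tran).days
    return diff <= 30 and diff > 0  # the two comparisons ANDed in step 6

target = date(1995, 7, 31)
assert in_last_month(target, date(1995, 7, 15))      # 16 days back: inside
assert not in_last_month(target, date(1995, 7, 31))  # same day: diff 0, excluded
assert not in_last_month(target, date(1995, 6, 1))   # 60 days back: outside
```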

Parameterize a Refresh Analysis as follows:

Available Analyses Variable Creation1


Modify Output Checked
Table Name twm_tutorials_refresh2
Modify Anchor Table Checked, twm_savings_tran
Modify Target Date 8/31/1995

Run the Analysis. View the generated SQL (which has been generated by the Variable
Creation Analysis, but modified by the Refresh Analysis) to see how the target date, output
table, and anchor table have been changed.



Chapter Three
Matrix Functions

3. Matrix Functions

Overview – Matrix Functions


Teradata Warehouse Miner functions are provided to build matrices which can drastically reduce
the amount of data required for analytic algorithms. Numeric columns in potentially huge
relational tables are reduced to a comparatively compact matrix (n-by-n if there are n columns),
which can be delivered to two of the Teradata Warehouse Miner Analytic Algorithms (Linear
Regression and Factor Analysis), or an external application for further analysis. One example of
an external application would be SAS, which provides principal component analysis or linear
regression analysis based on a correlation or covariance matrix as input.

The matrix functions must operate on numeric data. Columns of type DATE will not produce
meaningful results. NULL values are handled via listwise or pairwise deletion in the Matrix
analysis.

These functions are valid for any of the supported data reduction matrix types, namely
correlation, covariance, sums of squares and cross products, and corrected sums of squares and
cross products. Internally the Matrix analysis stores the matrix as an extended sums of squares
and cross products matrix, with an additional column containing a constant value, 1. The actual
conversion to another type, if requested, is computed in the Export Matrix analysis.
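To see why the extended SSCP form is sufficient, note that the constant-1 column makes the row count and the column sums part of the matrix itself, and each other matrix type follows from those. The following NumPy sketch is illustrative only (invented data; the product performs the equivalent conversion internally, not in NumPy):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))              # invented data, 3 numeric columns

Z = np.column_stack([np.ones(len(X)), X])  # prepend the constant-1 column
S = Z.T @ Z                                # extended SSCP matrix

n = S[0, 0]                                # count: sum of 1*1 products
sums = S[0, 1:]                            # column sums: 1*x products
sscp = S[1:, 1:]                           # plain SSCP block
csscp = sscp - np.outer(sums, sums) / n    # corrected SSCP
cov = csscp / (n - 1)                      # covariance
d = np.sqrt(np.diag(csscp))
cor = csscp / np.outer(d, d)               # correlation

assert np.allclose(cov, np.cov(X, rowvar=False))
assert np.allclose(cor, np.corrcoef(X, rowvar=False))
```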

The Matrix functions are:

Matrix
Build an extended Sums of Squares and Cross-Products (SSCP) data reduction matrix.
Optionally, restart the Matrix process upon a failure or when a previously-executed
Matrix was stopped.

Export Matrix
Convert or export the resultant matrix and build either a SAS data step, a Teradata table,
or just view the results. Valid matrices include:
• Pearson-product moment correlations (COR)
• Covariances (COV)
• Sums of squares and cross-products (SSCP)
• Corrected Sums of squares and cross-products (CSSCP)


Matrix Analysis
The Matrix analysis will process the input data so that one of the following data reduction
matrices can be exported via the Export Matrix analysis:

• Pearson-product moment correlations (COR)


• Covariances (COV)
• Sums of squares and cross-products (SSCP)
• Corrected Sums of squares and cross-products (CSSCP)

The formulas used to calculate these matrices are given below.

Correlation
The Pearson Product-Moment Correlation value of the pairwise combinations of each
column within the selected table. This is calculated as follows, for each pairwise
combination of variables X and Y:

ƒ(x,y) = ((n ⋅ ∑xy) − ∑x ⋅ ∑y) / √((n ⋅ ∑x² − (∑x)²) ⋅ (n ⋅ ∑y² − (∑y)²))

where n is the total number of occurrences of this variable.

Covariance
The Covariance value of the pairwise combinations of each column within the selected
table. This is calculated as follows, for each pairwise combination of variables X and Y:

ƒ(x,y) = (∑x⋅y / (n − 1)) − ((∑x ⋅ ∑y) / (n ⋅ (n − 1)))

where n is the total number of occurrences of this variable.

Sums of Squares and Cross-Products


The Sums of squares and Cross-Products value of the pair-wise combinations of each
column within the selected table. This is calculated as follows, for each pair-wise
combination of variables X and Y:

ƒ(x,y) = ∑ x⋅y

where n is the total number of occurrences of this variable.

Corrected Sums of Squares and Cross-Products

The Corrected Sums of squares and Cross-Products value of the pair-wise combinations
of each column within the selected table. This is calculated as follows, for each pair-wise
combination of variables X and Y:

ƒ(x,y) = ∑ x⋅y − (∑x ⋅ ∑y) / n

where n is the total number of occurrences of this variable.
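Each formula above is written purely in terms of running sums, which is what allows the matrix to be accumulated in SQL passes over the data. As an illustration (invented data; not product code), the correlation formula can be checked numerically against NumPy's built-in:

```python
import numpy as np

rng = np.random.default_rng(1)
x, y = rng.normal(size=100), rng.normal(size=100)
n = len(x)

# Pearson r from the running sums, exactly as in the formula above.
r = (n * (x * y).sum() - x.sum() * y.sum()) / np.sqrt(
    (n * (x**2).sum() - x.sum()**2) * (n * (y**2).sum() - y.sum()**2))

assert np.isclose(r, np.corrcoef(x, y)[0, 1])
```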

The matrix functions must operate on numeric data. Columns of type DATE will not produce
meaningful results.

An option is provided for list-wise versus pair-wise deletion or omission of values which are
NULL. With list-wise deletion, the default option, if the value of any column to be included in
matrix calculations is NULL, the entire row is omitted during matrix calculations. Alternatively,
if pair-wise deletion is chosen, only pairs of values involving a NULL are ignored, not entire
rows. The danger in this case is that when later analysis is performed on the matrix, it is possible
that mathematical irregularities will be found due to the calculations being made over different
numbers of observations.
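The difference between the two options can be illustrated with pandas (invented values, with NaN standing in for NULL): pairwise deletion keeps more observations per pair, but each pair is then computed over a different row set, which is the source of the irregularities mentioned above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0],
                   "b": [2.0, np.nan, 5.0, 8.0],
                   "c": [1.0, 0.0, np.nan, 1.0]})

pairwise = df.corr()           # pandas uses pairwise deletion by default
listwise = df.dropna().corr()  # drop every row containing a NULL first

# Pairwise keeps 3 observations for the (a, b) pair; listwise keeps only
# the 2 fully populated rows, so the two correlations differ.
print(pairwise.loc["a", "b"], listwise.loc["a", "b"])
```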

The Matrix analysis has restart capabilities as well. If a system failure occurs, or the Matrix
analysis is stopped by the end-user, it can be restarted, beginning its calculations at the point of
stoppage.

Note that the name of the Matrix analysis will be used to fetch the matrix values from the
database for those functions that are dependent upon a matrix – namely, Export Matrix, Linear
Regression and Factor Analysis.

Initiate a Matrix Function


Use the following procedure to initiate a new Matrix analysis in Teradata Warehouse Miner:

1. Click on the Add New Analysis icon in the toolbar:

2. In the resulting Add New Analysis dialog box, click on Matrix Functions under Categories
and then under Analyses double-click on Matrix:


3. This will bring up the Matrix dialog in which you will enter INPUT and OUTPUT options to
parameterize the analysis as described in the next sections.


Matrix - INPUT - Data Selection


On the Matrix dialog click on INPUT and then click on data selection:

On this screen select:

Available Databases
All the databases which are available for the Matrix analysis.
Available Tables
All the tables within the Source Database that are available for the Matrix analysis.
Available Columns
All the columns within the selected table that are available for the Matrix analysis.
Selected Columns
Select columns by highlighting and then either dragging and dropping into the Selected
Columns window, or click on the arrow button to move highlighted columns into the
Selected Columns window.

Matrix - INPUT - Analysis Parameters


On the Matrix dialog click on INPUT and then click on analysis parameters:

On this screen select:

Null Handling
Provides option for list-wise versus pair-wise deletion, used for omission of values which
are NULL.
Pairwise Deletion
Only pairs of values involving a NULL are ignored, not entire rows.
Listwise Deletion
If the value of any column to be included in the matrix is NULL, the entire row is
omitted during matrix calculations.

Matrix Width
The width of the matrix results. Width is the number of columns processed with each
SQL statement.
Number of Connections
The number of threads or simultaneous connections to the data source. Multiple sessions
may speed the SQL execution process.

Continue Execution (instead of starting over)


If a previously executed Matrix analysis was stopped, or failed after some portion of the
matrix was built, this option will be enabled to allow you to begin the Matrix process at
the point the analysis terminated.

Run the Matrix Analysis


After setting parameters on the INPUT screens as described above, you are ready to run the
analysis. To run the analysis you can either:
• Click the Run icon on the toolbar, or
• Select Run <project name> on the Project menu, or
• Press the F5 key on your keyboard

Results - Matrix
The results from running the Matrix analysis are persisted within the Metadata model, and are not
returned to the front-end. Results can be viewed using the Export Matrix analysis (next section in
this chapter).

Tutorial - Matrix

Matrix Example #1

Parameterize a Matrix analysis as follows. Note that this matrix will be used in the Linear
Regression and Factor Analysis Tutorials in subsequent chapters:

Selected Table and Columns TWM_CUSTOMER_ANALYSIS.income


TWM_CUSTOMER_ANALYSIS.age
TWM_CUSTOMER_ANALYSIS.years_with_bank
TWM_CUSTOMER_ANALYSIS.nbr_children
TWM_CUSTOMER_ANALYSIS.marital_status
TWM_CUSTOMER_ANALYSIS.female
TWM_CUSTOMER_ANALYSIS.single
TWM_CUSTOMER_ANALYSIS.married
TWM_CUSTOMER_ANALYSIS.separated
TWM_CUSTOMER_ANALYSIS.ccacct
TWM_CUSTOMER_ANALYSIS.ckacct
TWM_CUSTOMER_ANALYSIS.svacct
TWM_CUSTOMER_ANALYSIS.avg_cc_bal
TWM_CUSTOMER_ANALYSIS.avg_ck_bal
TWM_CUSTOMER_ANALYSIS.avg_sv_bal
TWM_CUSTOMER_ANALYSIS.avg_cc_tran_amt
TWM_CUSTOMER_ANALYSIS.avg_cc_tran_cnt
TWM_CUSTOMER_ANALYSIS.avg_ck_tran_amt


TWM_CUSTOMER_ANALYSIS.avg_ck_tran_cnt
TWM_CUSTOMER_ANALYSIS.avg_sv_tran_amt
TWM_CUSTOMER_ANALYSIS.avg_sv_tran_cnt
TWM_CUSTOMER_ANALYSIS.cc_rev
Analysis Name: Customer_Analysis_Matrix

There are no viewable results generated as a result of executing the Matrix analysis. The results
will be viewed via the Export Matrix analysis tutorial (later in this chapter). Save the Matrix
analysis with the above mentioned name “Customer_Analysis_Matrix” for use in the Linear
Regression and Factor analysis tutorials.

Matrix Example #2

Parameterize a Matrix analysis as follows:

Selected Tables and Columns TWM_CUSTOMER_ANALYSIS.income


TWM_CUSTOMER_ANALYSIS.age
TWM_CUSTOMER_ANALYSIS.years_with_bank
TWM_CUSTOMER_ANALYSIS.nbr_children
TWM_CUSTOMER_ANALYSIS.marital_status
TWM_CUSTOMER_ANALYSIS.female
Analysis Name: Customer_Analysis_Matrix_Short

There are no viewable results generated as a result of executing the Matrix analysis. The results
will be viewed via the Export Matrix analysis tutorial. Save the Matrix analysis with the above
mentioned name “Customer_Analysis_Matrix_Short” for use during the Export Matrix tutorial.


Export Matrix
The Export Matrix analysis will export the matrix data values built by the Matrix analysis in one
of the following forms. (Note that the form is not specified when the matrix is built, yet the
matrix can be requested in any form when it is exported.)

• Pearson-product moment correlations (COR)


• Covariances (COV)
• Sums of squares and cross-products (SSCP)
• Corrected Sums of squares and cross-products (CSSCP)

The exported matrices can take on one of the following formats:

• SAS DataStep
• Teradata Table
• Viewable Results

If a SAS data step script is created to build a “special” (matrix) SAS data set, the script will
produce, when executed with a SAS application, a data set with the same name as the SAS file
name. This function automatically appends “.sas” to the end of the requested output (script)
name, and SAS will create a .log file when the script is executed.

If a table containing the matrix is created, the table will contain one column for each column used
to build the matrix, with the same name as the original column, or the alias, if any, which was
given to the Matrix analysis. In addition, an XIDX column is added to the front of the result table,
along with an XCOL column containing the name of the original column or alias.

To view the correlation, covariance, SSCP or CSSCP matrix, specify no Output Options on the
analysis parameters tab. After the analysis has executed, click on the Results tab to view the
matrix.


Initiate an Export Matrix Function


Use the following procedure to initiate a new Export Matrix analysis in Teradata Warehouse
Miner:

1. Click on the Add New Analysis icon in the toolbar:

2. In the resulting Add New Analysis dialog box, click on Matrix Functions under Categories
and then under Analyses double-click on Export Matrix:

3. This will bring up the Export Matrix dialog in which you will enter INPUT and OUTPUT
options to parameterize the analysis as described in the next sections.


Export Matrix - INPUT - Data Selection


On the Matrix dialog click on INPUT and then click on data selection:

On this screen select:

Available Matrices
All the matrices within the Metadata Database that have been previously built with the
Matrix analysis and have been saved to metadata are available to export with the Export
Matrix analysis. These are identified by the analysis name of the Matrix analysis.
Selected Matrix
The Matrix analysis name of the matrix to export.

Export Matrix - INPUT - Analysis Parameters


On the Matrix dialog click on INPUT and then click on analysis parameters:

On this screen select:

Matrix Type
Provides options for the specific type of matrix to export.
Correlation
Export the matrix values as Pearson-product moment correlations.
Covariance
Export the matrix values as Covariances.
SSCP
Export the matrix values as an extended Sums of squares and cross-products, with the
column of constant 1’s labeled INTERCEPT.
CSSCP
Export the matrix values as Corrected Sums of squares and cross-products.

Output Options
Create a SAS DataStep based on this Matrix
Build the matrix results within a SAS DataStep script.
Use truncated (8 character) Column Names
Check to force column/alias name to 8 characters or less.


File Name
You can use the Browse button to bring up a standard browse dialog, to choose
a location to save the exported Flat File, Report or SAS Data Step.
Create a Database Table based on this matrix
Build the matrix results as a Teradata table. You will need to specify a Table Name.

Run the Export Matrix


After setting parameters on the INPUT screens as described above, you are ready to run the
analysis. To run the analysis you can either:
• Click the Run icon on the toolbar, or
• Select Run <project name> on the Project menu, or
• Press the F5 key on your keyboard

Results – Export Matrix


An XML Results object is built when an Export Matrix analysis has completed execution. It
contains a single tag – <Format> Matrix, where <Format> is either Correlation, Covariance,
SSCP, or CSSCP. On the Export Matrix dialog, click on RESULTS (note that the RESULTS
tab will be grayed-out/disabled until after the analysis has been run to completion):

Output Columns - Export Matrix


If the “Create a Database Table” output option is chosen, then the following table is built by the
Export Matrix analysis. Those columns in bold below will comprise the Unique Primary Index
(UPI).

Name Type Definition


XIDX Integer Unique integer value indicating an internal “index” of the column(s) selected in
Selected Tables and Columns. Used as a row identifier in order to manipulate
the table with matrix algebra.
XCOL VARCHAR(30) The column names selected in Selected Tables and Columns.
<matrix_values> FLOAT A column, with the same name as that selected in Selected Tables and
Columns will be generated. Data type for all is FLOAT.

Tutorial - Export Matrix

Export Matrix Example #1

Parameterize an Export Matrix analysis as follows:


Selected Matrix Customer_Analysis_Matrix_Short


SSCP Enabled
Create a SAS Data Step Enabled
Use truncated (8 character) column names Enabled
File Name Twm_SSCP_Values.sas

Run the analysis and edit the resultant SSCP_Values.sas SAS data step script:

DATA SSCP_Values (type=SSCP);
infile cards flowover;
ARRAY columns [7] INTERCEP income age years_01 nbr_ch01 marita01 female;
input _type_ $ _name_ $ columns[*];
cards;
N CNT 747.000000 747.000000 747.000000 747.000000 747.000000 747.000000
MEAN AVG 22728.281124 42.479250 3.907631 0.714859 1.882195 0.559572
STD STDDEV 22207.221405 19.114879 2.675634 1.103410 0.892051 0.496771
SSCP INTERCEP 747.000000 16978026.000000 31732.000000 2919.000000 534.000000 1406.000000 418.000000
SSCP income 16978026.000000 753779217048.000000 798771897.000000 68143689.000000 17316503.000000 35612419.000000 8290553.000000
SSCP age 31732.000000 798771897.000000 1620524.000000 130921.000000 21784.000000 64058.000000 17696.000000
SSCP years_01 2919.000000 68143689.000000 130921.000000 16747.000000 2010.000000 5475.000000 1629.000000
SSCP nbr_ch01 534.000000 17316503.000000 21784.000000 2010.000000 1290.000000 1355.000000 295.000000
SSCP marita01 1406.000000 35612419.000000 64058.000000 5475.000000 1355.000000 3240.000000 787.000000
SSCP female 418.000000 8290553.000000 17696.000000 1629.000000 295.000000 787.000000 418.000000
;

Export Matrix Example #2

Parameterize an Export Matrix analysis as follows:

Selected Matrix Customer_Analysis_Matrix_Short


CSSCP Enabled
Create Table Enabled
Table Name Twm_CSSCP_Matrix

Run the analysis and view the results with either QueryMan or the SQL Node by executing the
following queries:
SHOW TABLE <result_db>.CSSCP_Matrix;
SELECT * from <result_db>.CSSCP_Matrix order by 1;


Note the following results:


CREATE SET TABLE <result_db>.CSSCP_Matrix ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL
(
XIDX INTEGER,
XCOL VARCHAR(30) CHARACTER SET LATIN NOT CASESPECIFIC,
income FLOAT,
age FLOAT,
years_with_bank FLOAT,
nbr_children FLOAT,
marital_status FLOAT,
female FLOAT)
UNIQUE PRIMARY INDEX ( XIDX );

XIDX XCOL income age years_wi


1 income 367897869180.964 77558080.3574296 1799836.3975904
2 age 77558080.3574296 272572.4283802 6924.0682731
3 years_wi 1799836.3975904 6924.0682731 5340.626506
4 nbr_chil 5179600.8795181 -899.9196787 -76.6746988
5 marital_ 3656455.7389558 4332.1740295 -19.1285141
6 female -1209868.5100402 -60.3266399 -4.3895582

XIDX XCOL nbr_chil marital_ female
1 income 5179600.8795181 3656455.7389558 -1209868.5100402
2 age -899.9196787 4332.1740295 -60.3266399
3 years_wi -76.6746988 -19.1285141 -4.3895582
4 nbr_chil 908.2650602 349.9076305 -3.811245
5 marital_ 349.9076305 593.6331995 .2423025
6 female -3.811245 .2423025 184.0990629
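
The CSSCP (corrected SSCP) values above can be derived from the SSCP matrix of Example #1 by removing the mean cross-product: csscp_ij = sscp_ij - s_i * s_j / n, where s_i is the column sum (the SSCP INTERCEP row). A sketch verifying two entries against the table above:

```python
# csscp_ij = sscp_ij - s_i * s_j / n, with s_i taken from the SSCP INTERCEP
# row and the raw cross-products from the SSCP matrix of Example #1.
n = 747.0
s_income, s_age = 16978026.0, 31732.0
sscp_income_income, sscp_income_age = 753779217048.0, 798771897.0

csscp_income_income = sscp_income_income - s_income * s_income / n
csscp_income_age = sscp_income_age - s_income * s_age / n

assert abs(csscp_income_income - 367897869180.964) < 0.1  # row 1, income
assert abs(csscp_income_age - 77558080.3574296) < 0.1     # row 1, age
```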

Export Matrix Example #3

Parameterize an Export Matrix analysis as follows:

Selected Matrix Customer_Analysis_Matrix_Short


Correlation Enabled
Output Options None

Run the analysis to see the following results:

Results

Click on the Results tab to see the following Matrix Report:

- income age years_with_bank nbr_children marital_status female


income 1.0000 * * * * *
age 0.2449 1.0000 * * * *
years_with_bank 0.0406 0.1815 1.0000 * * *
nbr_children 0.2833 -0.0572 -0.0348 1.0000 * *
marital_status 0.2474 0.3406 -0.0107 0.4765 1.0000 *
female -0.1470 -0.0085 -0.0044 -0.0093 0.0007 1.0000
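
Each correlation in this report is a normalized CSSCP entry: r_ij = csscp_ij / sqrt(csscp_ii * csscp_jj). The sketch below reproduces the income/age correlation from the CSSCP values in Example #2:

```python
import math

# Pearson correlation from CSSCP entries: r_ij = csscp_ij / sqrt(csscp_ii * csscp_jj)
csscp = {("income", "income"): 367897869180.964,  # from Example #2
         ("age", "age"): 272572.4283802,
         ("income", "age"): 77558080.3574296}

r = csscp[("income", "age")] / math.sqrt(
    csscp[("income", "income")] * csscp[("age", "age")])

assert abs(r - 0.2449) < 5e-4  # matches the income/age entry above
```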



Chapter Four
Scoring

4. Scoring

PMML Scoring

Predictive Model Markup Language (PMML) is an XML standard being developed by the Data
Mining Group, a vendor-led consortium established in 1998 to develop data-mining standards.
NCR co-developed the initial PMML specification along with Angoss, Magnify, SPSS and The
National Center for Data Mining at the University of Illinois at Chicago.

PMML enables the definition and subsequent sharing of predictive models between applications.
It represents and describes data mining and statistical models, as well as some of the operations
required for cleaning and transforming data prior to modeling. PMML aims to provide enough
infrastructure for an application to be able to produce a model (the PMML producer) and another
application to consume it (the PMML consumer) simply by reading the PMML data file. This
means that a model developed in a desktop data-mining tool can be deployed or scored against an
entire data warehouse.

PMML-compliant XML documents consist of the following major constructs:

Feature	Function

Data Dictionary	Defines the data to the model and specifies each data attribute’s type and value range.

Mining Schema	Defines attribute information specific to a certain model. It specifies an attribute's usage type, whether it be active or independent (an input of the model), predicted or dependent (an output of the model), or supplementary (descriptive information that is ignored by the model).

Transformation Dictionary	Contains simple algorithm-specific data transformations such as normalization (map values to numbers), discretization (map continuous values to discrete values), value mapping (map discrete values to discrete values) and aggregation (simple averages and counts).

Models	Identifies model parameters for regression models, cluster models, decision tree models, neural networks, Bayesian models, association rules and sequence models.

Each PMML construct supports a mechanism for extending the content of a model. Liberal use of
such “extensions” requires that vendors who produce PMML-based models collaborate closely
with vendors who wish to consume that PMML. Please refer to the Teradata Warehouse Miner
Release Definition document for details about the products and product versions supported for
PMML consumption in Teradata ADS Generator and Teradata Warehouse Miner.

Although PMML is a great step forward, it has limitations beyond the extension mechanism, most notably incomplete encapsulation of the process of cleaning, transforming and aggregating data. Teradata recognized this limitation early on: if the PMML document could not represent the analytic variables that


were input to the analytic tools, it would be nearly impossible to consume PMML for scoring
predictive models. This is because the deployment (scoring phase) of a predictive model requires
the existence of the same variables upon which the model was built. For this reason, the PMML
Scoring analysis is included in both the Teradata ADS Generator as well as Teradata Warehouse
Miner.
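
This variable-name dependency can be seen in a toy sketch of what a consumer does with a regression model: scoring is just evaluating the model's formula against identically named columns, so a missing column makes the model unusable. The coefficients below are hypothetical, purely illustrative:

```python
# Hypothetical linear-regression model as a consumer might extract it from PMML.
model = {"intercept": -5.0, "coefficients": {"income": 0.001, "age": 0.2}}

def score(row):
    # Raises KeyError if the table being scored lacks a column the model needs,
    # which is exactly why scoring requires the original model variables.
    return model["intercept"] + sum(
        coef * row[name] for name, coef in model["coefficients"].items())

row = {"cust_id": 1362527, "income": 20000.0, "age": 40.0}
assert abs(score(row) - 23.0) < 1e-9  # -5 + 0.001*20000 + 0.2*40
```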

Initiate PMML Scoring


Use the following procedure to initiate a PMML Scoring analysis:

1. Click on the Add New Analysis icon in the toolbar:

2. In the resulting Add New Analysis dialog box, click on Scoring under Categories and then
under Analyses double-click on PMML Scoring:

3. This will bring up the PMML Scoring dialog in which you will enter INPUT and OUTPUT
options to parameterize the analysis as described in the next sections.


PMML Scoring - INPUT - Data Selection


On the PMML Scoring dialog click on INPUT and then click on data selection:

On this screen select:

Select Filename
The fully qualified name of the XML file containing the PMML model to be scored. A
filename can either be entered here or loaded using the Browse button.

Note that when a saved analysis with a valid model is first loaded into the project space, its model is embedded in the analysis and the displayed filename reflects the file the model was originally built from, even if it resided on another client machine. Hovering the mouse over the filename will display the original filename, computer name and modified date.
Modify
Select this button to remove the embedded model from the analysis and return to the standard browse filename selection input method. Once selected, however, the model is taken from a file rather than the previously embedded model. (NOTE: If the analysis isn't saved, the next load of the analysis will still contain the previously embedded model.)
Browse
Bring up the standard Windows file location dialog in order to navigate to the file containing the PMML model.
view >>
Once the XML file containing the PMML model is selected (or there is an
embedded model), the view >> hyperlink is enabled. The model can be viewed
by clicking this link.
Available Databases
All available source databases that have been added through Connection Properties.
Available Tables
The tables available for scoring are listed in this window, though all may not strictly
qualify: the input table or tables to be scored must contain the same column names used
in the original analysis.
Available Columns
The columns available for scoring are listed in this window.
Selected Columns:
Index Columns
Note that the Selected Columns window is actually a split window for specifying
Index and/or Retain columns if desired. If a table is specified as input, the primary
index of the table is defaulted here, but can be changed. If a view is specified as
input, an index must be provided.
Retain Columns


Other columns within the table being scored can be appended to the scored table, by
specifying them here. Columns specified in Index Columns may not be specified
here.

PMML Scoring - OUTPUT


On the PMML Scoring dialog click on OUTPUT:

On this screen select:

Output Table:
Database name
The name of the database.
Table name
The name of the scored output table to be created.
Create Output table using the FALLBACK keyword
If this option is selected, the output table will be created with the FALLBACK keyword.
Generate the SQL for this analysis, but do not execute it
If this option is checked, the SQL to score this PMML model will be generated but not executed.
Maximum SQL statement size allowed (default 64000):
The generated SQL statements will not exceed this maximum size in characters.
Generate as a stored procedure with name:
If this option is checked, the SQL produced will be generated in the form of a
stored procedure with the specified name.

Model Output Options:


This control allows you to add probabilities to the score table generated, in addition to the
dependent variable prediction itself.
Name
Name of the column containing the probability. Click the check-box to have it included
in the score table generated.
Display Name
For display purposes only, the “DisplayName” of the column containing the probability
from the PMML file.
Target Field
For display purposes only, the “TargetField” or name of the dependent variable from the
PMML file.
Feature
For display purposes only, a description of what the column will contain – currently only
Probabilities.
Value
For display purposes only, the actual value of the dependent variable from the PMML
file.


Run the PMML Scoring Analysis


After setting parameters on the INPUT screens as described above, you are ready to run the
analysis. To run the analysis you can either:
• Click the Run icon on the toolbar, or
• Select Run <project name> on the Project menu, or
• Press the F5 key on your keyboard

Results - PMML Scoring


The results of running the Teradata Warehouse Miner PMML Scoring Analysis are outlined below.

PMML Scoring - RESULTS - Reports


On the PMML Scoring dialog click RESULTS and then click on reports (note that the
RESULTS tab will be grayed-out/disabled until after the analysis is completed):

PMML Score Report


Resulting Scored Table Name
This is the name given the table with the scored values of the model.
Number of Rows in Scored Table
This is the number of rows in the scored table.

PMML Scoring - RESULTS - Data


On the PMML Scoring dialog click RESULTS and then click on data (note that the RESULTS
tab will be grayed-out/disabled until after the analysis is completed):

A sample of rows from the scored table is displayed here – the size determined by the setting
specified by Maximum result rows to display in Tools-Preferences-Limits. By default, the
index of the table being scored as well as the dependent column prediction are in the scored table
– additional columns as specified in the OUTPUT panel may be displayed as well.

PMML Scoring - RESULTS - SQL


On the PMML Scoring dialog click RESULTS and then click on SQL (note that the RESULTS
tab will be grayed-out/disabled until after the analysis is completed):


The scoring SQL is always shown here.

Output Columns - PMML Scoring


The following table is built in the Result Database by the PMML Scoring analysis. Note that the
options selected affect the structure of the table. Those columns in bold below will comprise the
Primary Index. Also note that there may be repeated groups of columns, and that some columns
will be generated only if specific options are selected.

Name Type Definition


Key	User Defined	One or more key columns, which default to the index defined in the table to be scored (i.e. in Selected Table). The data type defaults to the same as the scored table, but can be changed via Primary Index Columns.
<app_var>	User Defined	One or more columns as selected under Retain Columns.
<dep_var> (Default)	User Defined	The predicted value of the dependent variable. The name used defaults to the Dependent Variable specified when the model was built. The data type used is the same as the Dependent Variable.
P_<dep_var><value>	FLOAT	If any additional probability output is requested on the OUTPUT panel, it will be displayed using the name provided in the PMML model.

PMML Scoring Tutorials


PMML files generated by SAS Enterprise Miner have been copied into the Teradata Warehouse
Miner installation folder (C:\Program Files\NCR\Teradata Warehouse Miner <release> by
default), within the “Scripts/PMML UDF Install and Scripts” folder. Some of these files provide
the input to PMML Scoring in the following tutorials:

1. RegressionContinuousPMML.xml
A Linear Regression model which predicts a continuous outcome.
2. DecisionTreeDiscretePMML.xml
A Decision Tree model which predicts a discrete outcome.
3. RegressionDiscretePMML.xml
A Logistic Regression model which predicts a discrete outcome.
4. NeuralMLPDiscretePMML.xml
A MLP Neural Network model which predicts a discrete outcome.
5. ClusterPMML.xml
A Cluster model which predicts which of 20 clusters a customer should be assigned to.

Tutorial #1 - PMML Scoring

Parameterize a PMML Scoring Analysis to score a Linear Regression model which predicts a
continuous outcome as follows:

Select File Name RegressionContinuousPMML.xml


Selected Tables twm_customer_analysis


Index Columns cust_id
Result Table Name twm_pmml_score_reg_1

Run the analysis, and click on Results when it completes. For this example, the PMML Scoring
Analysis generated the following results. A single click on each page name populates Results
with the item. SQL is always produced and displayed, but is not shown here for brevity.

Report
Resulting Scored Table Name score_reg_1
Number of Rows in Scored File 747

Data
cust_id cc_rev
1362527 -3.09123331353583
1363078 17.1112717566361
1362588 8.28237095448635
1363486 27.7070270772696
1362752 53.7221660256401
1362893 -3.32443782325574
1363017 14.7337070009494
1363444 15.8410540579199
1362548 35.790895539682
1362487 11.3670140503415
… …
… …
… …

Tutorial #2 - PMML Scoring

Parameterize a PMML Scoring Analysis to score a Decision Tree model which predicts a discrete
outcome as follows:

Select File Name DecisionTreeDiscretePMML.xml


Selected Tables twm_customer_analysis
Index Columns cust_id
Result Table Name twm_pmml_score_tree_1
Model Output Options:
P_ccacct1 Enabled
P_ccacct0 Enabled

Run the analysis, and click on Results when it completes. For this example, the PMML Scoring
Analysis generated the following results. A single click on each page name populates Results
with the item. SQL is always produced and displayed, but is not shown here for brevity.

Report
Resulting Scored Table Name score_tree_1
Number of Rows in Scored File 747

Data


cust_id ccacct P_ccacct1 P_ccacct0


1362527 0 0 1
1363078 1 0.948387096774194 0.0516129032258065
1362588 1 0.893333333333333 0.106666666666667
1363486 0 0 1
1362752 1 0.948387096774194 0.0516129032258065
1362893 1 0.948387096774194 0.0516129032258065
1363017 1 0.948387096774194 0.0516129032258065
1363444 0 0 1
1362548 1 0.948387096774194 0.0516129032258065
1362487 1 0.948387096774194 0.0516129032258065
… … … …
… … … …
… … … …
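
The repeated probability values above are characteristic of tree scoring: every row falling into the same leaf receives that leaf's training-class proportions. For instance, counts of 147 and 8 at a 155-row leaf (hypothetical counts, chosen only because they reproduce the values shown) give exactly these probabilities:

```python
# Hypothetical leaf with 155 training rows: 147 of class 1 and 8 of class 0.
leaf_counts = {"1": 147, "0": 8}
total = sum(leaf_counts.values())
p1, p0 = leaf_counts["1"] / total, leaf_counts["0"] / total

assert abs(p1 - 0.948387096774194) < 1e-12   # P_ccacct1 above
assert abs(p0 - 0.0516129032258065) < 1e-12  # P_ccacct0 above
assert abs(p1 + p0 - 1.0) < 1e-12            # the two columns are complements
```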

Tutorial #3 - PMML Scoring

Parameterize a PMML Scoring Analysis to score a Logistic Regression model which predicts a
discrete outcome as follows:

Select File Name RegressionDiscretePMML.xml


Selected Tables twm_customer_analysis
Index Columns cust_id
Result Table Name twm_pmml_score_reg_2
Model Output Options:
P_ccacct1 Enabled
P_ccacct0 Enabled

Run the analysis, and click on Results when it completes. For this example, the PMML Scoring
Analysis generated the following results. A single click on each page name populates Results
with the item. SQL is always produced and displayed, but is not shown here for brevity.

Report
Resulting Scored Table Name score_reg_2
Number of Rows in Scored File 747

Data
cust_id ccacct P_ccacct1 P_ccacct0
1362527 0 0.125740481096571 0.874259518903429
1363078 1 0.861086203667224 0.138913796332776
1362588 1 0.723429148501114 0.276570851498886
1363486 0 0.125034199014627 0.874965800985373
1362752 0 0.419312298702164 0.580687701297836
1362893 1 0.970060355675886 0.0299396443241139
1363017 1 0.999980678896465 1.93211035354291E-05
1363444 0 0.173538764837706 0.826461235162294
1362548 0 0.265964538752992 0.734035461247008
1362487 1 0.872345777062174 0.127654222937826
… … … …
… … … …
… … … …
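
For a logistic model the two probability columns are likewise complements, as every row above shows: the consumer evaluates the linear predictor and passes it through the logistic function. A sketch with hypothetical coefficients:

```python
import math

# Hypothetical logistic-regression model; coefficients are illustrative only.
model = {"intercept": -2.0, "coefficients": {"income": 0.00005, "age": 0.01}}

def probabilities(row):
    z = model["intercept"] + sum(
        coef * row[name] for name, coef in model["coefficients"].items())
    p1 = 1.0 / (1.0 + math.exp(-z))   # P(outcome = 1)
    return p1, 1.0 - p1               # P_ccacct1, P_ccacct0

p1, p0 = probabilities({"income": 20000.0, "age": 40.0})
assert 0.0 < p1 < 1.0
assert abs(p1 + p0 - 1.0) < 1e-12
```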


Tutorial #4 - PMML Scoring

Parameterize a PMML Scoring Analysis to score a MLP Neural Network model which predicts a
discrete outcome as follows:

Select File Name NeuralMLPDiscretePMML.xml


Selected Tables twm_customer_analysis
Index Columns cust_id
Result Table Name twm_pmml_score_nn_2

Run the analysis, and click on Results when it completes. For this example, the PMML Scoring
Analysis generated the following results. A single click on each page name populates Results
with the item. SQL is always produced and displayed, but is not shown here for brevity.

Report
Resulting Scored Table Name score_nn_2
Number of Rows in Scored File 747

Data
cust_id ccacct
1362527 0
1363078 1
1362588 1
1363486 0
1362752 1
1362893 1
1363017 1
1363444 0
1362548 1
1362487 1
… …
… …
… …

Tutorial #5 - PMML Scoring

Parameterize a PMML Scoring Analysis to score a Cluster model which predicts which of 20
clusters this customer should be assigned to as follows:

Select File Name ClusterPMML.xml


Selected Tables twm_customer_analysis
Index Columns cust_id
Result Table Name twm_pmml_score_cluster_1

Run the analysis, and click on Results when it completes. For this example, the PMML Scoring
Analysis generated the following results. A single click on each page name populates Results
with the item.

Report
Resulting Scored Table Name score_cluster_1


Number of Rows in Scored File 747

Data
cust_id Cluster
1362527 10
1363078 10
1362588 7
1363486 1
1362752 19
1362893 7
1363017 7
1363444 7
1362548 1
1362487 7
… …
… …
… …
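
The cluster assignment itself follows the usual nearest-centroid rule: compute the distance from the row to each cluster center and pick the closest. A minimal sketch, with two hypothetical two-dimensional centroids rather than the model's actual twenty:

```python
import math

# Cluster scoring assigns each row to the nearest centroid.  The centroids
# below are hypothetical; a real model such as ClusterPMML.xml carries 20
# centroids defined over the model's input variables.
centroids = {1: [20000.0, 35.0],   # (income, age)
             2: [60000.0, 55.0]}

def assign(row):
    return min(centroids, key=lambda c: math.dist(row, centroids[c]))

assert assign([22000.0, 40.0]) == 1
assert assign([58000.0, 50.0]) == 2
```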



Chapter Five
Publishing

5. Publishing

Publishing Overview
The Publish Analysis is provided as a means to save an analytic model by storing the SQL
generated by an associated Score Analysis and/or ADS analysis into Publish Tables (metadata
tables used by the Model Manager application). When a Score Analysis is selected as input into a
Publish Analysis, the SQL that was generated by that Score Analysis is stored in such a way that
Model Manager can replace key components of that SQL and re-execute it, making it possible to
effectively re-use a published model (the SQL template) on different sets of data.

Analysis References
The Publish Analysis makes use of the Analysis References feature in the following way.
Because one of its input parameters is another analysis, it in effect references that analysis.
When that analysis is selected as input, the Publish Analysis then manages the execution of any
analyses that are references of the input analysis. For example, it is a distinct possibility that the
input into the final Score Analysis will be a series of Reorganization or ADS Analyses linked
together via Analysis References. A possible scenario would be a Variable Creation Analysis that
is referenced by (input into) a Join, and then a Sample. The resulting analytic data set (ADS)
might then be used as the input to a Score Analysis. In this scenario, because each analysis is
dependent upon the previous one, the SQL from each analysis will be published (stored in the
Publish Tables) in the proper order of execution so that it will work when re-executed via Model
Manager. This ensures that all of the SQL necessary to generate the ADS and resulting analytic
model will be captured.

Minimum SQL Storage


An additional feature of the Publish Analysis is that only the variables necessary for the analytic
model and the Score Analysis are used. Because the focus of publishing is to store a model (Score
SQL) for future use, it is wasteful to store SQL that generates variables that are not used. For
instance, if a Variable Creation was executed that created 100 variables, but the model was
created and then scored using only 5 of those variables, Publish will only store the SQL that is
needed to generate those 5 variables in the Variable Creation analysis. In general, for a given
ADS analysis, the only variables that are generated are the ones that are necessary for the
subsequent analysis reference.
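
The pruning described above amounts to computing the transitive closure of the variables the score needs. A sketch of that idea (the analysis and variable names here are hypothetical):

```python
# deps maps each created variable to the inputs its SQL needs.
deps = {"avg_bal": {"cust_id"},
        "tenure_yrs": {"cust_id"},
        "unused_var": {"cust_id"}}   # created, but never used by the score

def closure(targets, deps):
    """Walk dependencies backwards from the variables the score uses."""
    needed, stack = set(), list(targets)
    while stack:
        v = stack.pop()
        if v not in needed:
            needed.add(v)
            stack.extend(deps.get(v, ()))
    return needed

needed = closure({"avg_bal", "tenure_yrs"}, deps)
assert "unused_var" not in needed               # its SQL is not published
assert needed == {"avg_bal", "tenure_yrs", "cust_id"}
```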

No SQL Execution of Published Analyses


Publish is designed as the last step in the creation and scoring of an ADS and analytic model.
It is therefore assumed that the analyses have already been executed and deemed suitable for publishing.
For this reason, Publish doesn't execute the SQL that is stored, as doing so would be redundant.

Analyses Available To Publish


Build ADS, Variable Creation, Variable Transformation, Join, Sample, Denorm, Partition, Tree
Scoring, Cluster Scoring, Logistic Regression Scoring, Linear Regression Scoring, Factor
Analysis Scoring, Neural Networks and PMML Scoring are all available to be published.

Limitations with Respect to Analysis References


Any number of ADS Analyses can be in the chain of referenced analyses to be published, but
there can only be one Score Analysis, and it must be the last one.


For an analysis to be available for publishing, it must store its tabular output in the database as a
table or view.

The anchor table of the last Variable Creation analysis within the chain of referenced analyses to
be published will be stored as the published anchor table. If that anchor table is the output table of
another Variable Creation analysis within the chain of referenced analyses to be published, the
publish will fail with the following error message:

The current anchor table 'AnchorDatabase.AnchorTable' of the last Variable Creation analysis is the output table of a referenced Variable Creation analysis. Please use a different anchor table.

The anchor table of the last Variable Creation analysis must be changed to the output table of a
different analysis (not a Variable Creation), or to a permanent table or view in Teradata for the
publish to be successful.

Initiate a Publish Analysis


Use the following procedure to initiate a new Publish Analysis in Teradata Warehouse Miner:
Please note: to execute a Publish Analysis successfully the Publish Tables must be installed in the
Publish Database.

1. Click on the Add New Analysis icon in the toolbar:

2. In the resulting Add New Analysis dialog box, with Publish highlighted on the left, double-click on the Publish icon:


3. This will bring up the Publish dialog in which you will enter INPUT options to parameterize
the analysis as described in the next section.


Publish - INPUT - Data Selection


On the Publish dialog click on INPUT and then click on data selection:

On this screen select:

Available Analyses to Publish


Select a single analysis from the list of all of the analyses in the current project which are
available for the Publish Analysis.
Name of Model To Publish
Enter the name of the model that is being published.
Published By
Enter the name of the person who is publishing the model.
Version
Enter the version of the model being published.
Expiration Date
Enter the date that the model will expire.
Description
Enter a description of the model being published.

Preview the Publish Analysis


After setting parameters on the INPUT screen as described above, you can preview the SQL and
parameter values that will be stored in the Publish Tables.

By clicking on the button in the bottom center of the input screen, a pop-up
window will appear containing the following information that will be stored in the Publish
Tables:

Publish Date
The date that Publishing occurs, automatically set to the current date.
Expiration Date
The date that the model expires, set on the input screen by the user.
ADS Output Database
The database that was used to store the results of the ADS Analysis (if applicable)
ADS Output Table
The table that contains the results of the ADS Analysis (if applicable)
Score Output Database
The database that was used to store the results of the Score Analysis (if applicable)
Score Output Table


The table that contains the results of the Score Analysis (if applicable)

Model Variables
A list of the variables that were used in the model along with their descriptions.

Score Columns
A list of the columns that are generated in the output of the score (if applicable), along with their
descriptions.

ADS SQL to be Published


The SQL that was generated by the ADS Analysis (if applicable).

Score SQL to be Published


The SQL that was generated by the Score Analysis (if applicable).

Run the Publish Analysis


After setting parameters on the INPUT screen as described above or previewing the values to be
published, you are ready to run the analysis. To run the analysis you can either:

• Click the Run icon on the toolbar, or


• Select Run <project name> on the Project menu, or
• Press the F5 key on your keyboard

• Click the button from within the Preview pop-up window.

By running the analysis, the information needed by Model Manager to re-use the model will be
stored in the Publish Tables within the Publish Database.

Results - Publish
On the Publish dialog click on RESULTS (note that the RESULTS tab will be grayed-out/disabled until after the analysis is completed):

Select either the report or SQL tab to view the report or the SQL generated by the execution of
the Publish analysis.

Tutorial – Publish

Publish – Example


The following example contains a Variable Creation analysis that is referenced by a PMML Score
and is then published.

Parameterize a Variable Creation Analysis named Variable Creation1 as follows:

1. Select TWM_CUSTOMER_ANALYSIS as the Available Table.

2. Select all the columns in the input table into the Variables panel.

3. Go to OUTPUT-storage, and select Store the tabular output of this analysis in the
database. Specify that a Table should be created named twm_publish_vc1.

Run the analysis.

Parameterize a PMML Scoring Analysis named PMML Scoring1 to score a Decision Tree model
which predicts a discrete outcome as follows:

Input:
Select Input Source Analysis
Available Analyses Variable Creation1
Available Tables twm_publish_vc1
Select File Name DecisionTreeDiscretePMML.xml
(located in Scripts\PMML UDF Install under the
directory where the application is installed)
Index Columns cust_id

Output – Storage:
Result Table Name twm_publish_score1
Model Output Options:
P_ccacct1 Enabled
P_ccacct0 Enabled

Run the analysis.

Parameterize a Publish Analysis named Publish1 as follows:

Available Analyses to Publish PMML Scoring1


Name of Model to Publish PMML Scoring Demo
Published By Tutorial User
Version 1
Expiration Date 1/1/2010
Description This is a demo of the Publish Analysis.


Click on the button in the bottom center of the input screen. This will open a pop-up
window. By clicking on the button within the pop-up window, you will see the
following screens:


The final Publish Results screen shows the Score SQL to be Published (not shown here).

Click on the button to execute the Publish Analysis and store the information in the
Publish Tables.

Click on the Results to view what was published. Select the report tab to view the report portion
as shown below, and the SQL tab to view the SQL (not shown here).




References


1) Agrawal, R., Mannila, H., Srikant, R., Toivonen, H. and Verkamo, I., Fast Discovery of
Association Rules. In Advances in Knowledge Discovery and Data Mining, 1996, eds. U.M.
Fayyad, G. Piatetsky-Shapiro, P. Smyth and R. Uthurusamy. Menlo Park, AAAI Press/The
MIT Press.

2) Agresti, A. (1990) Categorical Data Analysis. Wiley, New York.

3) Arabie, P., Hubert, L., and DeSoete, G., Clustering and Classification, World Scientific, 1996

4) Belsley, D.A., Kuh, E., and Welsch, R.E. (1980) Regression Diagnostics: Identifying
Influential Data and Sources of Collinearity. Wiley, New York.

5) Bradley, P., Fayyad, U. and Reina, C., Scaling EM Clustering to Large Databases, Microsoft
Research Technical Report MSR-TR-98-35, 1998

6) Breiman, L., Friedman, J.H., Olshen, R.A., and Stone, C.J. Classification and Regression
Trees. Wadsworth, Belmont, 1984.

7) Cox, D.R. and Hinkley, D.V. (1974) Theoretical Statistics. Chapman & Hall/CRC, New
York.

8) Finn, J.D. (1974) A General Model for Multivariate Analysis. Holt, Rinehart and Winston,
New York.

9) Harman, H.H. (1976) Modern Factor Analysis. University of Chicago Press, Chicago.

10) Hosmer, D.W. and Lemeshow, S. (1989) Applied Logistic Regression. Wiley, New York.

11) Johnson, R.A. and Wichern, D.W. (1998) Applied Multivariate Statistical Analysis, 4th
Edition. Prentice Hall, New Jersey.

12) Kachigan, S.K. (1991) Multivariate Statistical Analysis. Radius Press, New York.

13) Kass, G. V. (1980) An Exploratory Technique for Investigating Large Quantities of
Categorical Data, Applied Statistics 29, No. 2, pp. 119-127

14) Kaufman, L. and Rousseeuw, P., Finding Groups in Data, J Wiley & Sons, 1990

15) Kennedy, W.J. and Gentle, J.E. (1980) Statistical Computing. Marcel Dekker, New York.

16) Kleinbaum, D.G. and Kupper, L.L. (1978) Applied Regression Analysis and Other
Multivariable Methods. Duxbury Press, North Scituate, Massachusetts.

17) Maddala, G.S. (1983) Limited-Dependent and Qualitative Variables In Econometrics.


Cambridge University Press, Cambridge, United Kingdom.

18) Maindonald, J.H. (1984) Statistical Computation. Wiley, New York.


19) McCullagh, P.M. and Nelder, J.A. (1989) Generalized Linear Models, 2nd Edition. Chapman
& Hall/CRC, New York.

20) McLachlan, G.J. and Krishnan, T., The EM Algorithm and Extensions, J Wiley & Sons,
1997

21) Menard, S (1995) Applied Logistic Regression Analysis, Sage, Thousand Oaks

22) Mulaik, S.A. (1972) The Foundations of Factor Analysis. McGraw-Hill, New York.

23) Neter, J., Kutner, M.H., Nachtsheim, C.J., and Wasserman, W. (1996) Applied Linear
Statistical Models, 4th Edition. WCB/McGraw-Hill, New York.

24) Nocedal, J. and Wright, S.J. (1999) Numerical Optimization. Springer-Verlag, New York.

25) Orchestrate/OSH Component User’s Guide Vol II, Analytics Library, Chapter 2:
Introduction to Data Mining. Torrent Systems, Inc., 1997.

26) Ordonez, C. and Cereghini, P. (2000) SQLEM: Fast Clustering in SQL using the EM
Algorithm. SIGMOD Conference 2000: 559-570

27) Ordonez, C. (2004): Programming the K-means clustering algorithm in SQL. KDD 2004:
823-828

28) Ordonez, C. (2004): Horizontal aggregations for building tabular data sets. DMKD 2004: 35-
42

29) Peduzzi, P.N., Hardy, R.J., and Holford, T.R. (1980) A Stepwise Variable Selection
Procedure for Nonlinear Regression Models. Biometrics 36, 511-516.

30) Pregibon, D. (1981) Logistic Regression Diagnostics. Annals of Statistics, Vol. 9, No. 4, 705-
724.

31) Quinlan, J.R. C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo, 1993.

32) Roweis, S. and Ghahramani, Z., A Unifying Review of Linear Gaussian Models, Journal of
Neural Computation, 1999

33) SPSS 7.5 Statistical Algorithms Manual, SPSS Inc., Chicago.

34) SYSTAT 9: Statistics I. (1999) SPSS Inc., Chicago.

35) Tatsuoka, M.M. (1971) Multivariate Analysis: Techniques For Educational and
Psychological Research. Wiley, New York.

36) Tatsuoka, M.M. (1974) Selected Topics in Advanced Statistics, Classification Procedures,
Institute for Personality and Ability Testing, 1974

37) Teradata Warehouse Miner User’s Guide Release 03.00.02, B035-2093-022A, January 2002

© 1999-2007 NCR Corporation, All Rights Reserved 252


38) Wilkinson, L., Blank, G., and Gruber, C. (1996) Desktop Data Analysis With SYSTAT.
Prentice Hall, New Jersey.

39) Pagano, M. and Gauvreau, K. (2000) Principles of Biostatistics, 2nd Edition. Duxbury, Pacific Grove.

40) Conover, W.J. (1999) Practical Nonparametric Statistics, 3rd Edition. Wiley, New York.

41) D'Agostino, R.B. and Stephens, M.A., eds. (1986) Goodness-of-Fit Techniques. Dekker, New
York.

42) D'Agostino, R., Belanger, A., and D'Agostino, R. Jr. (1990) A Suggestion for Using Powerful
and Informative Tests of Normality. American Statistician, Vol. 44, No. 4.

43) Royston, J.P. (1982) An Extension of Shapiro and Wilk's W Test for Normality to Large
Samples. Applied Statistics, 31, No. 2, 115-124.

44) Royston, J.P. (1982) Algorithm AS 177: Expected Normal Order Statistics (Exact and
Approximate). Applied Statistics, 31, 161-165.

45) Royston, J.P. (1982) Algorithm AS 181: The W Test for Normality. Applied Statistics, 31,
176-180.

46) Royston, J.P. (1995) A Remark on Algorithm AS 181: The W Test for Normality. Applied
Statistics, 44, 547-551.

47) Harter, H.L. and Owen, D.B., eds. Selected Tables in Mathematical Statistics, Vol. 1.
American Mathematical Society, Providence, Rhode Island.

48) Shapiro, S.S. and Francia, R.S. (1972) An Approximate Analysis of Variance Test for
Normality. Journal of the American Statistical Association, 67, 215-216.

49) D'Agostino, R.B. (1971) An Omnibus Test of Normality for Moderate and Large Size
Samples. Biometrika, 58, 341-348.

50) NIST/SEMATECH e-Handbook of Statistical Methods,
http://www.itl.nist.gov/div898/handbook/index.htm, 2005.

51) PST, Portland State University, http://www.upa.pdx.edu, 2005.

52) Wendorf, Craig A. (1997, revised 2004-03-12) Manuals for Univariate and Multivariate
Statistics. UWSP, http://www.uwsp.edu/psych/cw/statmanual, 2005.

53) UZ, University of Zurich, http://www.id.unizh.ch, 2005

54) NUMS, Northwestern University Medical School,
http://www.basic.northwestern.edu/statguidefiles/sghome.html, 2005 (inactive).

55) Takahashi, T. (2005) Getting Started: International Character Sets and the Teradata
Database, NCR Corporation, 541-0004068-C02
