
Teradata Warehouse Miner

User Guide - Volume 2

ADS Generation
Release 05.01.00
B035-2301-077A

Teradata Development Division,


Teradata Application Engineering
The product described in this book is a licensed product of Teradata, a division of NCR Corporation

NCR, Teradata and BYNET are registered trademarks of NCR Corporation.


TeraMiner is a trademark of NCR Corporation.
Intel, Pentium, and XEON are registered trademarks of Intel Corporation.
Linux is a registered trademark of Linus Torvalds.
Microsoft, Windows, Windows Server, Windows NT, Windows Vista, Visual Studio and Excel are either registered
trademarks or trademarks of Microsoft Corporation in the United States and/or other countries.
SAS and SAS/C are registered trademarks of SAS Institute Inc.
SPSS is a registered trademark of SPSS Inc.
STATISTICA and StatSoft are trademarks or registered trademarks of StatSoft, Inc.
Sun Microsystems, Sun Java, Solaris, SPARC, and Sun are trademarks or registered trademarks of Sun Microsystems,
Inc. in the U.S. or other countries.
Unicode is a registered trademark of Unicode, Inc.
UNIX is a registered trademark of The Open Group in the US and other countries.
Other product and company names mentioned herein may be the trademarks of their respective owners.

THE INFORMATION CONTAINED IN THIS DOCUMENT IS PROVIDED ON AN “AS-IS” BASIS, WITHOUT WARRANTY OF ANY KIND, EITHER
EXPRESS OR IMPLIED, INCLUDING THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR
NON-INFRINGEMENT. SOME JURISDICTIONS DO NOT ALLOW THE EXCLUSION OF IMPLIED WARRANTIES, SO THE ABOVE EXCLUSION
MAY NOT APPLY TO YOU. IN NO EVENT WILL NCR CORPORATION (NCR) BE LIABLE FOR ANY INDIRECT, DIRECT, SPECIAL,
INCIDENTAL OR CONSEQUENTIAL DAMAGES, INCLUDING LOST PROFITS OR LOST SAVINGS, EVEN IF EXPRESSLY ADVISED OF THE
POSSIBILITY OF SUCH DAMAGES.

The information contained in this document may contain references or cross references to features, functions, products, or
services that are not announced or available in your country. Such references do not imply that NCR intends to announce
such features, functions, products, or services in your country. Please consult your local NCR representative for those
features, functions, products, or services available in your country.

Information contained in this document may contain technical inaccuracies or typographical errors. Information may be
changed or updated without notice. NCR may also make improvements or changes in the products or services described
in this information at any time without notice.

To maintain the quality of our products and services, we would like your comments on the accuracy, clarity, organization,
and value of this document. Please e-mail: teradata-books@lists.ncr.com

Any comments or materials (collectively referred to as “Feedback”) sent to NCR will be deemed non-confidential. NCR will
have no obligation of any kind with respect to Feedback and will be free to use, reproduce, disclose, exhibit, display,
transform, create derivative works of and distribute the Feedback and derivative works thereof without limitation on a
royalty-free basis. Further, NCR will be free to use any ideas, concepts, know-how or techniques contained in such
Feedback for any purpose whatsoever, including developing, manufacturing, or marketing products or services
incorporating Feedback.

Copyright © 1999 - 2007


By NCR Corporation
Dayton, Ohio U.S.A.
All Rights Reserved
Preface

About This Manual

This publication describes how to use the features and functions of NCR’s Teradata Warehouse
Miner, Release 5. All information required to use the analytic functions in the Teradata
Warehouse Miner product is provided in this manual. Teradata Warehouse Miner is a set of
Microsoft® .NET™ interfaces and a multi-tier user interface that together help you understand the
quality of data residing in a Teradata® database, create analytic data sets, and build and score
analytic models directly in the Teradata database.

Who Should Read This Manual


This manual is written for users of the Teradata Warehouse Miner product. It guides the
end user through each analytic function available from the Teradata Warehouse Miner user
interface. You should be familiar with Teradata SQL, the operation and administration of the
Teradata RDBMS, and statistical techniques. Users should also be familiar with the
Microsoft® Windows® 2000 or Microsoft® Windows® XP operating environment and standard
Microsoft® Windows® operating techniques.

How This Manual is Organized



This manual is organized as follows:

Chapter 1 “Data Reorganization” describes the Denorm, Join, Merge, Partition, and
Sample analyses.

Chapter 2 “Analytic Data Sets” describes the Variable Creation, Variable


Transformation and Build Data Set analyses available with Teradata
Warehouse Miner.

Chapter 3 “Matrix Functions” describes how to use the Teradata Warehouse Miner
Matrix Functions to build and export a Correlation, Covariance, or Sums of
Squares and Cross Products matrix.

Chapter 4 “Scoring” describes how to use the Teradata Warehouse Miner Predictive
Model Markup Language (PMML) Scoring analysis.

Chapter 5 “Publishing” describes how to use the Teradata Warehouse Miner Publish
analysis to make the SQL representing models and analytic data sets
available to the Model Manager application.

Conventions Used in this Manual


The following typographical conventions are used in this guide:

Convention Description
Italic text Titles (especially screen names/titles); new terms introduced for emphasis
© 1999-2007 NCR Corporation, All Rights Reserved ii



Monospace Code samples and output
ALL CAPS Acronyms
Bold Screen items, especially items you click on or highlight when following a procedure

Related Documents and Other Sources of Information


Teradata documentation related to the use of Teradata Warehouse Miner, including
documentation for the Teradata ODBC Driver for Windows, is available from www.info.ncr.com.

Support Information

How to Get Support


For information regarding support program availability, please contact the local Account Team.
Telephone assistance may be obtained through NCR’s Teradata Solutions Global Support Center
(TSGSC) at either of the following numbers:

Americas RCCA

For Service Levels: Enhanced/Business Critical


1-800-531-2222 (PIN number required)

MSC – Atlanta

For Service Levels: Standard / None / Time & Materials / Unknown


1-800-262-7782

Table of Contents

ABOUT THIS MANUAL ........................................................................................................................... II


WHO SHOULD READ THIS MANUAL ........................................................................................................... II
HOW THIS MANUAL IS ORGANIZED ............................................................................................................ II
CONVENTIONS USED IN THIS MANUAL ....................................................................................................... II
RELATED DOCUMENTS AND OTHER SOURCES OF INFORMATION ............................................................... III
SUPPORT INFORMATION ............................................................................................................................. III
How to Get Support .............................................................................................................................. iii
TABLE OF CONTENTS ........................................................................................................................... IV

1. DATA REORGANIZATION .................................................................................................................. 8


DENORM ................................................................................................................................................... 10
Initiate a Denorm Analysis .................................................................................................................. 11
Denorm - INPUT - Data Selection ...................................................................................................... 12
Denorm - INPUT - Analysis Parameters............................................................................................. 12
Denorm - INPUT - Expert Options...................................................................................................... 14
Denorm - OUTPUT ............................................................................................................................. 14
Run the Denorm Analysis .................................................................................................................... 15
Results - Denorm ................................................................................................................................. 15
Denorm - RESULTS - Data ................................................................................................................. 16
Denorm - RESULTS - SQL .................................................................................................................. 16
Tutorial - Denorm................................................................................................................................ 16
JOIN .......................................................................................................................................................... 18
Initiate a Join Analysis ........................................................................................................................ 18
Join - INPUT - Data Selection ............................................................................................................ 19
Join - INPUT – Join Columns ............................................................................................................. 19
Join - INPUT - Analysis Parameters ................................................................................................... 20
Join - INPUT - Expert Options............................................................................................................ 20
Join - OUTPUT - Storage.................................................................................................................... 21
Join - OUTPUT - Primary Index ......................................................................................................... 22
Run the Join Analysis .......................................................................................................................... 22
Results - Join ....................................................................................................................................... 22
Join - RESULTS - Data ....................................................................................................................... 22
Join - RESULTS - SQL ........................................................................................................................ 23
Output Columns – Join Analysis ......................................................................................................... 23
Tutorial – Join Analysis....................................................................................................................... 23
MERGE ...................................................................................................................................................... 25
Initiate a Merge Analysis..................................................................................................................... 25
Merge - INPUT - Data Selection ......................................................................................................... 26
Merge - INPUT - Analysis Parameters ............................................................................................... 27
Merge - INPUT - Expert Options ........................................................................................................ 27
Merge - OUTPUT - Storage ................................................................................................................ 27
Merge - OUTPUT - Primary Index ..................................................................................................... 28
Run the Merge Analysis ....................................................................................................................... 29
Results - Merge.................................................................................................................................... 29
Merge - RESULTS - Data.................................................................................................................... 29
Merge - RESULTS - SQL..................................................................................................................... 29
Output Columns – Merge Analysis ...................................................................................................... 30
Tutorial – Merge Analysis ................................................................................................................... 30
PARTITION ................................................................................................................................................ 31

Initiate a Partition Analysis................................................................................................................. 31


Partition - INPUT - Data Selection ..................................................................................................... 32
Partition - INPUT - Analysis Parameters ........................................................................................... 33
Partition - INPUT - Expert Options .................................................................................................... 33
Partition - OUTPUT - Storage ............................................................................................................ 34
Partition - OUTPUT - Primary Index ................................................................................................. 34
Run the Partition Analysis ................................................................................................................... 35
Results - Partition Analysis ................................................................................................................. 35
Partition - RESULTS - Data................................................................................................................ 35
Partition - RESULTS - SQL................................................................................................................. 35
Output Columns – Partition Analysis .................................................................................................. 36
Tutorial - Partition Analysis................................................................................................................ 36
SAMPLE..................................................................................................................................................... 38
Initiate a Sample Analysis ................................................................................................................... 39
Sample - INPUT - Data Selection........................................................................................................ 40
Sample - INPUT - Analysis Parameters .............................................................................................. 40
Sample - INPUT - Expert Options ....................................................................................................... 41
Sample - OUTPUT - Storage............................................................................................................... 42
Sample - OUTPUT - Primary Index .................................................................................................... 43
Run the Sample Analysis...................................................................................................................... 43
Results – Sample Analysis ................................................................................................................... 43
Sample - RESULTS - Data................................................................................................................... 43
Sample - RESULTS - SQL ................................................................................................................... 45
Output Columns – Sample Analysis..................................................................................................... 45
Tutorial - Sample Analysis .................................................................................................................. 45
2. ANALYTIC DATA SETS...................................................................................................................... 50
VARIABLE CREATION................................................................................................................................ 50
Initiate a Variable Creation Function ................................................................................................. 55
Variable Creation - INPUT - Variables .............................................................................................. 56
Variable Creation - INPUT - Variables - SQL Elements .................................................................... 67
Variable Creation - INPUT - Variables - Dimensions ...................................................................... 151
Variable Creation - INPUT - dimensions.......................................................................................... 161
Variable Creation - INPUT - anchor table........................................................................................ 163
Variable Creation - INPUT – analysis parameters ........................................................................... 166
Variable Creation - INPUT - Expert Options.................................................................................... 168
Variable Creation - INPUT - Expert Options- SQL Elements........................................................... 168
Variable Creation - INPUT - Expert Options - Expert Clauses ........................................................ 169
Variable Creation - OUTPUT - storage............................................................................................ 169
Variable Creation - OUTPUT - Primary Index................................................................................. 171
Run the Variable Creation Analysis .................................................................................................. 171
Results - Variable Creation ............................................................................................................... 172
Variable Creation - RESULTS - Data ............................................................................................... 172
Variable Creation - RESULTS - SQL ................................................................................................ 172
Tutorial – Variable Creation ............................................................................................................. 172
VARIABLE TRANSFORMATION ................................................................................................................ 179
Introduction ....................................................................................................................................... 179
Initiate a Variable Transformation Function .................................................................................... 183
Variable Transformation - INPUT - Transformations ...................................................................... 184
Setting Properties - Variable Transformation ................................................................................... 190
Setting Default Properties - Variable Transformation ...................................................................... 191
Properties Dialog – Common Features............................................................................................. 192
Properties Dialog – Function-Specific Features............................................................................... 194
Variable Transformation - INPUT - Primary Key............................................................................. 197
Variable Transformation - INPUT – Analysis Parameters ............................................................... 198

Variable Transformation - INPUT - Expert Options......................................................................... 198


Variable Transformation - OUTPUT - Storage................................................................................. 199
Variable Transformation - OUTPUT - Primary Index...................................................................... 199
Run the Variable Transformation Analysis ....................................................................................... 200
Results - Variable Transformation .................................................................................................... 201
Variable Transformation - RESULTS - Data .................................................................................... 201
Variable Transformation - RESULTS - SQL ..................................................................................... 201
Tutorial – Variable Transformation Analysis.................................................................................... 201
BUILD ADS (ANALYTIC DATA SET) ....................................................................................................... 206
Initiate a Build ADS........................................................................................................................... 206
Build ADS - INPUT - Data Selection ................................................................................................ 207
Build ADS - INPUT – Anchor Table ................................................................................................. 208
Build ADS - INPUT - Expert Options................................................................................................ 208
Build ADS - OUTPUT - Storage........................................................................................................ 208
Build ADS - OUTPUT - Primary Index............................................................................................. 209
Run the Build ADS Analysis .............................................................................................................. 209
Results - Build ADS ........................................................................................................................... 210
Tutorial – Build ADS Analysis........................................................................................................... 210
REFRESH ................................................................................................................................................. 212
Initiate a Refresh Analysis ................................................................................................................. 212
Refresh - INPUT - Data Selection ..................................................................................................... 213
Run the Refresh Analysis ................................................................................................................... 214
Results - Refresh ................................................................................................................................ 214
Tutorial – Refresh.............................................................................................................................. 214
3. MATRIX FUNCTIONS....................................................................................................................... 218
OVERVIEW – MATRIX FUNCTIONS .......................................................................................................... 218
MATRIX ANALYSIS ................................................................................................................................. 219
Initiate a Matrix Function ................................................................................................................. 220
Matrix - INPUT - Data Selection....................................................................................................... 222
Matrix - INPUT - Analysis Parameters ............................................................................................. 222
Run the Matrix Analysis .................................................................................................................... 223
Results - Matrix ................................................................................................................................. 223
Tutorial - Matrix................................................................................................................................ 223
EXPORT MATRIX ..................................................................................................................................... 225
Initiate an Export Matrix Function.................................................................................................... 226
Export Matrix - INPUT - Data Selection........................................................................................... 227
Export Matrix - INPUT - Analysis Parameters ................................................................................. 227
Run the Export Matrix ....................................................................................................................... 228
Results – Export Matrix ..................................................................................................................... 228
Output Columns - Export Matrix....................................................................................................... 228
Tutorial - Export Matrix .................................................................................................................... 228
4. SCORING ............................................................................................................................................. 231
PMML SCORING..................................................................................................................................... 231
Initiate PMML Scoring ...................................................................................................................... 232
PMML Scoring - INPUT - Data Selection......................................................................................... 233
PMML Scoring - OUTPUT................................................................................................................ 234
Run the PMML Scoring Analysis....................................................................................................... 235
Results - PMML Scoring.................................................................................................................... 235
PMML Scoring Tutorials................................................................................................................... 236
Tutorial #1 - PMML Scoring ............................................................................................................. 236
Tutorial #2 - PMML Scoring ............................................................................................................. 237
Tutorial #3 - PMML Scoring ............................................................................................................. 238
Tutorial #4 - PMML Scoring ............................................................................................................. 239

Tutorial #5 - PMML Scoring ............................................................................................................. 239


5. PUBLISHING....................................................................................................................................... 241
PUBLISHING OVERVIEW .......................................................................................................................... 241
Initiate a Publish Analysis ................................................................................................................. 242
Publish - INPUT - Data Selection ..................................................................................................... 244
Preview the Publish Analysis............................................................................................................. 244
Run the Publish Analysis ................................................................................................................... 245
Results - Publish ................................................................................................................................ 245
Tutorial – Publish.............................................................................................................................. 245
REFERENCES ......................................................................................................................................... 251



Chapter One
Data Reorganization

1. Data Reorganization

The Data Reorganization functions provide the ability to join, merge, and denormalize
preprocessed results into a wide analytic data set, as well as to select a subset of the rows in a
table. The result is a new, restructured table built from one or more existing tables and/or
from a subset of the rows in a table.

The Sampling and Partitioning functions build a new table containing randomly selected rows
from an existing table or view. Sampling is useful when the volume of available data makes an
analytic process unwieldy, which is especially true for compute-intensive analytic modeling
tasks. Partitioning is similar to sampling, but it produces mutually exclusive, collectively
exhaustive subsets of the data that can be requested by separate processes.
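As a point of reference, both styles of row selection can be expressed with the Teradata SAMPLE clause. The sketch below is illustrative only — the table and column names (customer_history, cust_id) are hypothetical, and the Partition analysis itself selects rows with a hash key rather than the SAMPLE clause:

```sql
-- Fraction-based sample: roughly 10% of the rows.
SELECT *
FROM customer_history SAMPLE 0.10;

-- Size-based sample: 1000 rows (or fewer, if the table is smaller).
SELECT *
FROM customer_history SAMPLE 1000;

-- Partition-style request: two mutually exclusive samples covering the
-- whole table, distinguishable by the built-in SAMPLEID column.
SELECT cust_id, SAMPLEID
FROM customer_history SAMPLE 0.60, 0.40;
```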

In the case of the Data Reorganization functions, NULL values are passed back as NULL. A
special case is the Denorm analysis, which optionally converts NULL values to zero.

Note that identity columns, i.e., columns defined with the attribute "GENERATED … AS
IDENTITY", cannot be analyzed by the Data Reorganization functions.

The Teradata Warehouse Miner data reorganization functions include:

Denorm: Create a new table, denormalized by removing key column(s).
Join: Join tables or views by columns into a combined result table.
Merge: Merge tables or views by rows into a combined result table.
Partition: Select partition(s) from a table using a hash key.
Sample: Select sample(s) from a table by size or fraction.
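In SQL terms, Join combines tables column-wise on a shared key, while Merge stacks tables row-wise. A minimal sketch, with hypothetical table and column names:

```sql
-- Join: combine two tables column-wise on a shared key.
SELECT d.cust_id, d.income, s.avg_balance
FROM customer_demographics d
INNER JOIN account_summary s
  ON d.cust_id = s.cust_id;

-- Merge: combine two tables row-wise (compatible column layouts).
SELECT cust_id, tran_amount FROM transactions_2006
UNION ALL
SELECT cust_id, tran_amount FROM transactions_2007;
```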

To add a Data Reorganization analysis to a Teradata Warehouse Miner data mining
project, create a new analysis as described in Chapter 3. Select Reorganization from the
menu:


Double-click the desired analysis, or highlight it and click the OK button. Optionally, select an
existing analysis for incorporation into the current data mining project. Each of these
analyses is described in detail in the subsequent sections.


Denorm
The Denorm Analysis is provided to denormalize or “flatten out” (sometimes referred to as
“pivoting”) a table so it can be used as an analytic data set. This is done by removing part of a
multi-part index and replicating remaining columns based upon the unique values of the
removed index column.

Many analytical techniques from the statistical and artificial intelligence communities require
a denormalized table, or data set, as input. The Denorm function is provided to help analytical
modelers and database administrators save considerable time and effort when a denormalized
table needs to be constructed from data which exists in relational form in the data warehouse.
The aggregations typically used in the construction of a denormalized table (AVG, SUM,
MIN, MAX, and COUNT) are provided in the Denorm function as user-selectable options.

Analytical modelers typically refer to the rows of a denormalized table as “observations”, and
typically refer to the columns as “variables”.

Given a table name, the names of index columns to remove, the names of index columns to
retain, the names of remaining columns to denormalize, the values of the removed index
columns to denormalize, and finally the names of any already denormalized columns to
retain, the Denorm Analysis creates a new denormalized table. All columns other than the
retained key and denormalized columns are dropped from the new table, unless they are
specified as columns to retain; in that case they should already be denormalized, that is,
have a constant value over the values of the removed key columns.

New column names are formed by concatenating the prefix associated by the user with each
of the Values to Denormalize (which occur in the Index Remove Columns) and the alias or
name of the Denormalize Column.

An option is provided to specify an aggregation method for cases where a new column has
multiple values to choose from. A user-specified aggregation method (MIN, MAX, AVG,
SUM or COUNT) should only be used when there are non-unique index values or when a
part of the index is being ignored, that is, when part of the index is neither being retained
nor removed (denormalized by).

Finally, an option is also provided to specify zero instead of NULL for the value of those
denormalized columns for which the index is not defined.

Literal values entered for columns of type DATE must be entered in the format defined or
defaulted for the column in question. For example, if the date format of a key value being
removed is ’YYYYMMDD’, then a parameter for this key value might be entered as
“19990703”.
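The SQL that Denorm generates is not reproduced here, but the underlying pivot pattern (one aggregated CASE expression per selected value of the removed index column) can be sketched as follows. The sketch runs against SQLite purely for illustration, with made-up rows shaped like the twm_accounts tutorial table; the SQL Teradata Warehouse Miner actually emits will differ in detail.

```python
import sqlite3

# Illustration only: a miniature stand-in for the twm_accounts tutorial table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE twm_accounts "
            "(cust_id INT, acct_type TEXT, acct_nbr TEXT, ending_balance REAL)")
con.executemany("INSERT INTO twm_accounts VALUES (?, ?, ?, ?)", [
    (1363215, "SV", "0000000013632153", 2689.95),
    (1362793, "CC", "4561143213627934", 407.08),
    (1362700, "CK", "0000000013627002", 4494.03),
])

# One CASE expression per Value to Denormalize (CC, CK, SV), aggregated with
# the chosen method (MIN) and COALESCEd to zero when "Treat Undefined Index
# Values As" is set to Zero.  acct_nbr is a Retain Column, so it is simply
# carried along under the same aggregate.
rows = con.execute("""
    SELECT cust_id,
           MIN(acct_nbr) AS acct_nbr,
           COALESCE(MIN(CASE WHEN acct_type = 'CC' THEN ending_balance END), 0)
               AS CC_ending_balance,
           COALESCE(MIN(CASE WHEN acct_type = 'CK' THEN ending_balance END), 0)
               AS CK_ending_balance,
           COALESCE(MIN(CASE WHEN acct_type = 'SV' THEN ending_balance END), 0)
               AS SV_ending_balance
    FROM twm_accounts
    GROUP BY cust_id
    ORDER BY cust_id
""").fetchall()
```

Each input row contributes its balance to exactly one of the three new columns, which is why one account per type per customer collapses into a single wide row.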

The Denorm Analysis is parameterized by specifying the table and column(s) to analyze,
options unique to the Denorm Analysis, as well as specifying the desired results and SQL or
Expert Options.


Initiate a Denorm Analysis


Use the following procedure to initiate a new Denorm analysis in Teradata Warehouse Miner:

1. Click on the Add New Analysis icon in the toolbar:

2. In the resulting Add New Analysis dialog box, with Reorganization highlighted on the
left, double-click on the Denorm icon:

3. This will bring up the Denorm dialog in which you will enter INPUT and OUTPUT
options to parameterize the analysis as described in the next sections.


Denorm - INPUT - Data Selection


On the Denorm dialog click on INPUT and then click on data selection:

On this screen select:

Available Databases
All the databases which are available for the Denorm Analysis.
Available Tables
All the tables within the Source Database that are available for the Denorm Analysis.
Available Columns
All the columns within the selected table that are available for the Denorm Analysis.
Selected Columns
Select columns by highlighting and then either dragging and dropping into the
Selected Columns window, or click on the arrow button to move highlighted
columns into the Selected Columns window. Make sure the correct category is
highlighted:
Index Retain Columns
List of index columns to retain (not remove) in resultant denormalized table
(click to expand/highlight).
Index Remove Columns
List of index columns to remove (denormalize by values of these columns).
(Click to expand/highlight.)
Denormalize Columns
List of columns/aliases to denormalize (i.e. replicate for given values of removed
index columns). (Click to expand/highlight.)
Retain Columns
List of columns to retain which are already denormalized (i.e. have a constant
value over the selected values of removed key columns). (Click to
expand/highlight.)

Denorm - INPUT - Analysis Parameters


On the Denorm dialog click on INPUT and then click on analysis parameters:


On this screen select:

Values to Denormalize
A list of values and prefixes which are valid values in the column specified in Index
Remove Columns. Use the Add and Remove buttons to set values for:
Prefix
An optional string (must be a valid Teradata word) used to prefix the new
columns generated for the specified Value.
Value in <remove column>
A list of distinct values which the column specified in Index Remove Columns
takes on.

Add button
Both Prefix and Value in <remove column> can be specified manually by clicking
on the Add button and typing the required values.

Remove button
Remove the currently highlighted Prefix and Value in <remove column>.

Values…
Selecting the Values button brings up the following Denorm Values Wizard:

Once the Values button is selected, a status message indicating which column values
are being fetched appears. Once this is complete, the distinct values of the column
being removed are listed in the left-most column. These values can be dragged and
dropped into the right-most column, or selected via the Add button. Similarly, they

can be removed via the Remove button. Once the values to be denormalized are
moved to Selected values, click the Finish button to return to the Teradata Warehouse
Miner user interface or to continue to select the values of the next Denormalize
Column, if there is more than one.

When the Values load process is finished, a default value is generated for each
Column Prefix by concatenating the values of the Index Remove Columns, each
value followed by an underscore character. If the combination of the prefix and the
longest Denormalize Column name or alias will be greater than 30 characters in
length, the prefix is left blank, to be filled in by the user. Note that if the name of a
Denormalize Column or the values of the Index Remove Columns are long, it may
be necessary to specify a comparatively short alias for the Denormalize Column so
that automatic prefixes can be generated. Otherwise, it may be necessary to specify a
short prefix manually.

Aggregation Method
This parameter allows you to specify an aggregation method in the case where new
columns have multiple values to choose from. Valid user specified aggregation
methods, include MIN, MAX, AVG, SUM and COUNT. These should only be used
when there are non-unique indices or when a part of the index is being ignored, that
is, when part of the key is neither being retained nor removed (i.e. denormalized by).

Treat Undefined Index Values As:


This parameter allows you to specify zero instead of NULL for the value of those
denormalized columns for which the value of the removed column is not the target
value. The default is zero. If DATE or TIMESTAMP columns are specified in
Denormalize Columns, and ZERO is selected, an error will occur.

Compress undefined index values in output table


This parameter allows you to request data compression of either NULL or zero
values (depending on the Treat Undefined Index Values As option above) for those
denormalized columns for which the value of the removed column is not the target
value.
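As an aside, the default Column Prefix rule described under the Values wizard above can be restated as a small function. This is a hypothetical sketch of the documented behavior, not the product's actual code; default_prefix and its arguments are illustrative names.

```python
# Hypothetical restatement of the default Column Prefix rule: each removed-index
# value followed by an underscore, blanked out when the prefix plus the longest
# Denormalize Column name or alias would exceed 30 characters.
def default_prefix(removed_values, denorm_columns, limit=30):
    prefix = "".join(str(v) + "_" for v in removed_values)
    longest = max(len(name) for name in denorm_columns)
    return prefix if len(prefix) + longest <= limit else ""

print(default_prefix(["SV"], ["ending_balance"]))  # "SV_", well under the limit
```

A long column name such as a 32-character alias would push the combination past 30 characters, so the prefix comes back blank for the user to fill in, just as the text above describes.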

Denorm - INPUT - Expert Options


On the Denorm dialog click on INPUT and then click on expert options:

This screen provides the option to generate a SQL WHERE clause(s) to restrict rows selected
for analysis (for example: cust_id > 0).

Denorm - OUTPUT


Before running the analysis, specify Output options. On the Denorm dialog click on
OUTPUT:

This screen provides the following options:

Use the Teradata EXPLAIN feature to display the execution path for this analysis
Option to generate a SQL EXPLAIN SELECT statement, which returns a Teradata
Execution Plan.

Store the tabular output of this analysis in the database


Option to generate a Teradata TABLE or VIEW populated with the results of the
analysis. Once enabled, the following three fields must be specified:
Database Name
Text box to specify the name of the Teradata database where the resultant Table
or View will be created. By default, this is the “Result Database.”
Output Name
Text box to specify the name of the Teradata Table or View.
Output Type
Pull-down to specify Table or View.
Create Output table using the FALLBACK keyword
If a table is selected, it will be built with FALLBACK if this option is selected.
Create Output table using the MULTISET keyword
If a table is selected, it will be built as a MULTISET table if this option is selected.

Generate the SQL for this analysis, but do not execute it


If this option is selected, the analysis only generates and returns the SQL without
executing it.

Run the Denorm Analysis


After setting parameters on the INPUT and OUTPUT screens as described above, you are
ready to run the analysis. To run the analysis you can either:
• Click the Run icon on the toolbar, or
• Select Run <project name> on the Project menu, or
• Press the F5 key on your keyboard

Results - Denorm
The results of running the Denorm Analysis include the generated SQL itself, the results of
executing the generated SQL, and, if the Create Table (or View) option is chosen, a Teradata
table (or view). All of these results are outlined below.


Denorm - RESULTS - Data


On the Denorm dialog, click on RESULTS and then click on data (note that the RESULTS
tab will be grayed-out/disabled until after the analysis is completed):

Results of the completed query are returned in a Data Viewer page within the Results
Browser. This page has the properties of the Data View page discussed in the Chapter on
Using the Teradata Warehouse Miner Graphical User Interface. With the exception of the
Explain Select Result Option, these results will match the tables described below in the
Output Column Definition section, depending upon the parameters chosen for the analysis.

Denorm - RESULTS - SQL


On the Denorm dialog, click on RESULTS and then click on SQL (note that the RESULTS
tab will be grayed-out/disabled until after the analysis is completed):

The generated SQL is returned here as text which can be copied, pasted, or printed.

Tutorial - Denorm
Parameterize a Denorm Analysis as follows:

Available Tables twm_accounts


Index Retain Columns cust_id
Index Remove Columns acct_type
Denormalize Columns ending_balance
Retain Columns acct_nbr

Values to Denormalize
Value SV
CK
CC
Prefix SV_ (Value - SV)
CK_ (Value - CK)
CC_ (Value - CC)
Aggregation Method MIN
Treat undefined key values as Zero


For this example, the Denorm Analysis generated the following results. Note that the SQL is
not shown for brevity:

Data

Note – only the first 10 rows shown.

cust_id acct_nbr CC_ending_balance CK_ending_balance SV_ending_balance


1363215 0000000013632153 0.00 0.00 2689.95
1362654 0000000013626543 0.00 0.00 622.46
1362793 4561143213627934 407.08 0.00 0.00
1362666 0000000013626663 0.00 0.00 300.42
1362700 0000000013627002 0.00 4494.03 0.00
1363400 0000000013634003 0.00 0.00 137.85
1363374 4561143213633744 0.00 0.00 0.00
1362941 0000000013629413 0.00 0.00 877.14
1362586 0000000013625862 0.00 260.70 0.00
1362883 0000000013628833 0.00 0.00 149.74
… … … … …
… … … … …
… … … … …


Join
The Join analysis is useful in joining together tables and/or views into an intermediate or
final analytic data set. The Join Analysis provides a graphical user interface to several of the
most common, though certainly not all, join mechanisms in Teradata. Consequently, it should
not be thought of or used as a complete replacement for SQL approaches to executing any
generic Teradata join.

By default, an INNER join is performed on the given tables based on the given join columns.
This means that rows will be returned only for primary index column values that appear in all
selected tables. By option, a LEFT outer join can be requested, which returns rows for all
primary index column values found in the first table specified, and fills in any missing values
from the other tables with NULL values. Alternatively, a RIGHT outer join can be requested
to return all rows found in the last requested table, filling in any missing values from the first
table with NULL values (or from the incremental right outer joins preceding it if more than
two tables were selected). Finally, an option to perform a FULL outer join can be requested
which retains all primary index values from all selected tables with missing values set to
NULL.
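The join styles above follow standard SQL semantics. The sketch below, run against SQLite with two hypothetical account tables, shows the inner and left outer cases; right and full outer joins extend the same NULL-filling idea to the last table or to all tables.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE checking (cust_id INT, chk_bal REAL);
    CREATE TABLE savings  (cust_id INT, sav_bal REAL);
    INSERT INTO checking VALUES (1, 100.0), (2, 250.0);
    INSERT INTO savings  VALUES (2, 900.0), (3, 40.0);
""")

# INNER join (the default): rows only for cust_ids present in both tables.
inner = con.execute("""
    SELECT c.cust_id, c.chk_bal, s.sav_bal
    FROM checking c INNER JOIN savings s ON c.cust_id = s.cust_id
""").fetchall()

# LEFT outer join: every row of the first (anchor) table, with NULL filled
# in where the second table has no matching cust_id.
left = con.execute("""
    SELECT c.cust_id, c.chk_bal, s.sav_bal
    FROM checking c LEFT JOIN savings s ON c.cust_id = s.cust_id
""").fetchall()
```

Customer 1 has no savings row, so the inner join drops it while the left outer join keeps it with a NULL savings balance.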

The Join Analysis is parameterized by specifying the table and column(s) to analyze, options
unique to the Join Analysis, as well as specifying the desired results and SQL or Expert
Options.

Initiate a Join Analysis


Use the following procedure to initiate a new Join analysis in Teradata Warehouse Miner:

1. Click on the Add New Analysis icon in the toolbar:

2. In the resulting Add New Analysis dialog box, with Reorganization highlighted on the
left, double-click on the Join icon:


3. This will bring up the Join dialog in which you will enter INPUT and OUTPUT options to
parameterize the analysis as described in the next sections.

Join - INPUT - Data Selection


On the Join dialog click on INPUT and then click on data selection:

On this screen select:

Available Databases
All the databases which are available for the Join Analysis.
Available Tables
All the tables within the Source Database that are available for the Join Analysis.
Available Columns
All the columns within the selected table that are available for the Join Analysis.
Selected Columns
Select columns by highlighting and then either dragging and dropping into the
Selected Columns window, or click on the arrow button to move highlighted
columns into the Selected Columns window.

Join - INPUT – Join Columns


On the Join dialog click on INPUT and then click on join columns:


This screen is used to specify the columns on which to join together the tables or views
selected in this analysis. For tables, the primary index columns are displayed as default
values which may be changed. Join columns are matched for each table or view, one-for-one
in the order specified. Each table or view must therefore have the same number of join
columns specified. The screen contains these fields:

Available Tables
All tables specified under data selection Selected Columns.
Available Columns
All columns specified under data selection Selected Columns.
Selected Join Columns
Select columns by highlighting and then either dragging and dropping into the
Selected Join Columns window, or click on the arrow button to move highlighted
columns into the Selected Join Columns window.

Join - INPUT - Analysis Parameters


On the Join dialog click on INPUT and then click on analysis parameters:

On this screen select:

Anchor Table
For all join types, this is the first table, to which all of the other tables are
joined.
Join Style
Select the type of join to perform, either Inner, or Left, Right or Full outer join.

Join - INPUT - Expert Options


On the Join dialog click on INPUT and then click on expert options:


This screen provides the option to generate a SQL WHERE clause(s) to restrict rows selected
for analysis (for example: cust_id > 0).

It may be useful to note that if a WHERE clause condition is specified on the "inner" table of
a join (i.e. a table that contributes only matched rows to the results), the join is logically
equivalent to an Inner Join, regardless of whether an Outer type is specified. (In a Left Outer
Join, the left table is the "outer" table and the right table is the "inner" table.)
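The point about WHERE conditions on the inner table can be demonstrated in a small SQLite sketch (hypothetical tables; the same logic applies to the SQL the Join Analysis generates):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE cust (cust_id INT);
    CREATE TABLE acct (cust_id INT, bal REAL);
    INSERT INTO cust VALUES (1), (2), (3);
    INSERT INTO acct VALUES (1, 50.0), (2, 75.0);
""")

# The left outer join alone keeps all three customers; customer 3 gets NULL.
left = con.execute(
    "SELECT c.cust_id, a.bal "
    "FROM cust c LEFT JOIN acct a ON c.cust_id = a.cust_id"
).fetchall()

# A WHERE condition on the inner table's column is never true for the
# NULL-extended rows, so they are discarded and the result matches an
# inner join.
filtered = con.execute(
    "SELECT c.cust_id, a.bal "
    "FROM cust c LEFT JOIN acct a ON c.cust_id = a.cust_id "
    "WHERE a.bal > 0"
).fetchall()
```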

Join - OUTPUT - Storage


Before running the analysis, define Output options. On the Join dialog click on OUTPUT
and then click on storage:

On this screen select:

Use the Teradata EXPLAIN feature to display the execution path for this analysis
Option to generate a SQL EXPLAIN SELECT statement, which returns a Teradata
Execution Plan.

Store the tabular output of this analysis in the database


Option to generate a Teradata TABLE or VIEW populated with the results of the
analysis. Once enabled, the following three fields must be specified:
Database Name
Text box to specify the name of the Teradata database where the resultant Table
or View will be created. By default, this is the “Result Database.”
Output Name
Text box to specify the name of the Teradata Table or View.
Output Type
Pull-down to specify Table or View.
Create Output table using the FALLBACK keyword
If a table is selected, it will be built with FALLBACK if this option is selected.
Create Output table using the MULTISET keyword
If a table is selected, it will be built as a MULTISET table if this option is selected.

Generate the SQL for this analysis, but do not execute it


If this option is selected, the analysis only generates and returns the SQL without
executing it.

Join - OUTPUT - Primary Index


Before running the analysis, define Output options. On the Join dialog click on OUTPUT
and then click on primary index:

On this screen, select the columns which comprise the primary index of the output table.
Select:

Available Columns
A list of the columns that will be in the output table or result set.
Primary Index Columns
Select columns by highlighting and then either dragging and dropping into the
Primary Index Columns window, or click on the arrow button to move highlighted
columns into the Primary Index Columns window. These columns comprise the
primary index of the resultant table if an Output Type of Table is used.

Run the Join Analysis


After setting parameters on the INPUT and OUTPUT screens as described above, you are
ready to run the analysis. To run the analysis you can either:
• Click the Run icon on the toolbar, or
• Select Run <project name> on the Project menu, or
• Press the F5 key on your keyboard

Results - Join
The results of running the Teradata Warehouse Miner Join Analysis include the generated
SQL itself, the results of executing the generated SQL, and, if the Create Table (or View)
option is chosen, a Teradata table (or view). All of these results are outlined below.

Join - RESULTS - Data


On the Join dialog, click on RESULTS and then click on data (note that the RESULTS tab
will be grayed-out/disabled until after the analysis is completed):


The results of the completed query are returned in a Data Viewer page within the Results
Browser. This page has the properties of the Data View page discussed in the Chapter on
Using the Teradata Warehouse Miner Graphical User Interface. With the exception of the
Explain Select Result Option, these results will match the tables described below in the
Output Column Definition section, depending upon the parameters chosen for the analysis.

Join - RESULTS - SQL


On the Join dialog, click on RESULTS and then click on SQL (note that the RESULTS tab
will be grayed-out/disabled until after the analysis is completed):

The generated SQL is returned here as text which can be copied, pasted, or printed.

Output Columns – Join Analysis


When the “Store the Tabular Output” option is selected, the following table will be built.

Name      Type                 Definition

Columns   Same as input type   The Selected Columns from the joined tables. If a table is
                               created, those selected columns that are also selected on the
                               “Output” – “primary index” tab will comprise the primary
                               index of the created table.

Tutorial – Join Analysis

Join - Example #1

Parameterize a Join Analysis as follows:

Selected Columns TWM_CUSTOMER.cust_id


TWM_CHECKING_ACCT.ending_balance
(Rename to Chk_Bal)
TWM_CREDIT_ACCT.ending_balance
(Rename to Crd_Bal)
TWM_SAVINGS_ACCT.ending_balance
(Rename to Sav_Bal)

For this example, the Join Analysis generated the following results. Note that the SQL is not
shown for brevity:

Data

Note – only the first 10 rows shown.

cust_id Chk_Bal Crd_Bal Sav_Bal


1362759 4.62 835.19 79.52
1363088 769.47 739.86 1426.1
1362587 292 859.44 1596.17
1362952 473.45 782.77 232.76
1362771 141.86 1200 2361.62
1363316 947.22 1000 629.06
1363342 14.45 1600 57.96
1362700 4494.03 107.4 315.88
1363448 1961.87 1400 1763.31
1362936 109.31 244.47 792.11
… … … …
… … … …
… … … …


Merge
The Merge analysis merges together tables or views by performing an SQL UNION,
INTERSECT or MINUS operation. The merge operation brings together rows from two or
more tables, matching up the selected columns in the order they are selected. (This can be
contrasted with the Join function that brings together columns from multiple tables.) The
rows contained in the answer set are determined by the choice of Merge Style, which
selects whether the Union, Intersect or Minus operator is applied to each table after the
first table selected. An additional option is provided to determine whether duplicate rows, if any,
should be included in the answer set. You may also specify one or more optional SQL Where
Clauses to apply to selected tables (each Where Clause is applied to just one table).

When the Union merge style is selected, the union of the rows containing selected columns
from the first table and each subsequent table is performed using the SQL UNION operator.
The final answer table contains all the qualifying rows from each table. With the Union
merge style, an option is provided to add an identifying column to the answer set and to name
the column if desired. This column assumes an integer value from 1 to n, indicating which
input table each row in the answer set comes from.

When the Intersect merge style is selected, the intersection of the rows containing selected
columns from the first table and each subsequent table is performed using the SQL
INTERSECT operator. The final answer table contains all the qualifying rows that exist in
each of the tables being merged. (That is, if a row is not contained in each of the requested
tables, it is not included in the answer set.)

When the Minus merge style is selected, the rows containing selected columns from the first
table are included in the answer table provided they do not appear in any of the other selected
tables. This is achieved using the SQL MINUS operator for each table after the first. (The
MINUS operator is a Teradata specific SQL operator equivalent to the standard EXCEPT
operator.)
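The three merge styles map directly onto SQL set operators, as this SQLite sketch with two hypothetical one-column tables shows (SQLite, like standard SQL, spells MINUS as EXCEPT):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE t1 (cust_id INT);
    CREATE TABLE t2 (cust_id INT);
    INSERT INTO t1 VALUES (1), (2), (3);
    INSERT INTO t2 VALUES (2), (3), (4);
""")

# Union merge style with an identifying column: 1 marks rows from the first
# table, 2 marks rows from the second (UNION ALL retains duplicates).
union = con.execute("""
    SELECT 1 AS tableid, cust_id FROM t1
    UNION ALL
    SELECT 2, cust_id FROM t2
""").fetchall()

# Intersect merge style: only rows present in every selected table.
intersect = con.execute(
    "SELECT cust_id FROM t1 INTERSECT SELECT cust_id FROM t2"
).fetchall()

# Minus merge style: rows of the first table absent from the others.
minus = con.execute(
    "SELECT cust_id FROM t1 EXCEPT SELECT cust_id FROM t2"
).fetchall()
```

Whether UNION or UNION ALL is emitted corresponds to the Retain Duplicate Rows option in the analysis parameters.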

Initiate a Merge Analysis


Use the following procedure to initiate a new Merge analysis in Teradata Warehouse Miner:

1. Click on the Add New Analysis icon in the toolbar:

2. In the resulting Add New Analysis dialog box, with Reorganization highlighted on the
left, double-click on the Merge icon:


3. This will bring up the Merge dialog in which you will enter INPUT and OUTPUT options
to parameterize the analysis as described in the next sections.

Merge - INPUT - Data Selection


On the Merge dialog click on INPUT and then click on data selection:

On this screen select:

Available Databases
All the databases that are available for the Merge Analysis.
Available Tables
All the tables within the Source Database that are available for the Merge Analysis.
Available Columns
All the columns within the selected table that are available for the Merge Analysis.
Selected Columns
Select columns by highlighting and then either dragging and dropping into the
Selected Columns window, or click on the arrow button to move highlighted
columns into the Selected Columns window. Columns from the first selected table
may be renamed if desired by single-clicking on them.


Merge - INPUT - Analysis Parameters


On the Merge dialog click on INPUT and then click on analysis parameters:

On this screen, select:

Merge Style
Select the type of merge to perform, either Union, Intersect or Minus.
Retain Duplicate Rows
Select whether or not to include duplicate rows in the answer set.
Add Identifying Column (Union only)
Select whether or not to add an identifying column to the answer set. (This option is
available only when the Merge Style is Union.)
Column Name (Union only)
Specify the name of the identifying column to add to the answer set. (This option is
available only when the Merge Style is Union and Add Identifying Column is
selected.)

Merge - INPUT - Expert Options


On the Merge dialog click on INPUT and then click on expert options:

One or more optional Where Clauses may be entered on this screen. Each Where Clause
entered is applied only to the table currently selected on the screen. On this screen select:

Select table to associate WHERE clause with:


Select the table to associate an optional SQL Where Clause with.
Optional WHERE clause text:
Enter the optional SQL Where Clause text to be associated with the selected table,
restricting the selected rows. (Do not include the word "WHERE" at the beginning of
the text. It will be added automatically.)

Merge - OUTPUT - Storage


Before running the analysis, define Output options. On the Merge dialog click on OUTPUT
and then click on storage:


On this screen select:

Use the Teradata EXPLAIN feature to display the execution path for this analysis
Option to generate a SQL EXPLAIN SELECT statement, which returns a Teradata
Execution Plan.

Store the tabular output of this analysis in the database


Option to generate a Teradata TABLE or VIEW populated with the results of the
analysis. Once enabled, the following three fields must be specified:
Database Name
Text box to specify the name of the Teradata database where the resultant Table
or View will be created. By default, this is the “Result Database.”
Output Name
Text box to specify the name of the Teradata Table or View.
Output Type
Pull-down to specify Table or View.
Create Output table using the FALLBACK keyword
If a table is selected, it will be built with FALLBACK if this option is selected.
Create Output table using the MULTISET keyword
If a table is selected, it will be built as a MULTISET table if this option is
selected. (This option should be selected if duplicate rows are expected.)

Generate the SQL for this analysis, but do not execute it


If this option is selected, the analysis only generates and returns the SQL without
executing it.

Merge - OUTPUT - Primary Index


Before running the analysis, define Output options. On the Merge dialog click on OUTPUT
and then click on primary index:

On this screen, select the columns that comprise the primary index of the output table. Select:

Available Columns
A list of columns that will be in the output table or result set. Select columns by
highlighting and then either dragging and dropping into the Primary Index
Columns window, or click on the arrow button to move highlighted columns into
the Primary Index Columns window.
Primary Index Columns
A list of columns that comprise the index of the resultant table if an Output Type of
Table is used.
Create the index using the UNIQUE keyword
Select whether or not the primary index should be a unique primary index, i.e.
no two rows may have the same combination of primary index column values.

Run the Merge Analysis


After setting parameters on the INPUT and OUTPUT screens as described above, you are
ready to run the analysis. To run the analysis you can either:
• Click the Run icon on the toolbar, or
• Select Run <project name> on the Project menu, or
• Press the F5 key on your keyboard

Results - Merge
The results of running the Teradata Warehouse Miner Merge Analysis include the generated
SQL itself, the results of executing the generated SQL, and, if the Create Table (or View)
option is chosen, a Teradata table (or view). All of these results are outlined below.

Merge - RESULTS - Data


On the Merge dialog, click on RESULTS and then click on data (note that the RESULTS
tab will be grayed-out/disabled until after the analysis is completed):

The results of the completed query are returned in a Data Viewer page within the Results
Browser. This page has the properties of the Data View page discussed in the Chapter on
Using the Teradata Warehouse Miner Graphical User Interface. With the exception of the
Explain Select Result Option, these results will match the tables described below in the
Output Column Definition section, depending upon the parameters chosen for the analysis.

Merge - RESULTS - SQL


On the Merge dialog, click on RESULTS and then click on SQL (note that the RESULTS
tab will be grayed-out/disabled until after the analysis is completed):


The generated SQL is returned here as text which can be copied, pasted, or printed.

Output Columns – Merge Analysis


When the “Store the Tabular Output” option is selected, the following table will be built.
Those columns selected on the "primary index" tab of the OUTPUT panel will comprise the
Unique Primary Index (UPI).

Name Type Definition


The names and types of the result columns match those of the Selected Columns
from the first selected table (the user may rename selected columns as desired).

Tutorial – Merge Analysis

Merge - Example #1

Parameterize a Merge Analysis as follows:

Selected Columns twm_customer_dqa.cust_id


twm_customer_dqa.income (rename inc)
twm_customer_dqa.age
twm_customer.cust_id
twm_customer.income
twm_customer.age

Merge Style Minus

Retain Duplicate Rows No

Where Clause (both tables) cust_id < 1362490

For this example, the Merge Analysis generates the following results data.

cust_id income age


1360000 29592 29
1360001 27612 39
1360002 57612 49
1362480 33


Partition
The Partition analysis is one of two functions provided by Teradata Warehouse Miner to
sample data from a table or view. The Partition Analysis is distinguished from the Sample
Analysis in that it is repeatable and is based on the internal hash index encodings provided by
Teradata, rather than the statistically random selections provided by the Sample function.

Given a table, a list of columns to select and a list of columns to hash on, the Partition
Analysis generates a user-specified partition or range of partitions from the table using a hash
key. For example, the 3rd partition out of 10 might be requested, or partitions 1 through 3 out
of 10.

To select a specific partition, set start and end partition to the same selected value. If a range
of partitions is requested, the partition number is also returned as xpartid.

The Partition Analysis is parameterized by specifying the table and column(s) to analyze,
options unique to the Partition Analysis, as well as specifying the desired results and SQL or
Expert Options.

Initiate a Partition Analysis


Use the following procedure to initiate a new Partition analysis:

1. Click on the Add New Analysis icon in the toolbar:

2. In the resulting Add New Analysis dialog box, with Reorganization highlighted on the
left, double-click on the Partition icon:


3. This will bring up the Partition dialog in which you will enter INPUT and OUTPUT
options to parameterize the analysis as described in the next sections.

Partition - INPUT - Data Selection


On the Partition dialog click on INPUT and then click on data selection:

On this screen select:

Available Databases
All the databases which are available for the Partition Analysis.
Available Tables
All the tables within the Source Database that are available for the Partition
Analysis.
Available Columns
All the columns within the selected table that are available for the Partition Analysis.
Selected Columns
Select columns by highlighting and then either dragging and dropping into the
Selected Columns window, or click on the arrow button to move highlighted
columns into the Selected Columns window. Make sure the correct category is
highlighted:
Partition Columns


Column(s) to be in the partitioned result set.


Hash Columns
Column(s) on which to hash-partition the table. Hash partitioning is performed by
Teradata just as it is when distributing rows in the database, making use of the
Teradata SQL extensions HASHBUCKET and HASHROW.
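
Conceptually, the generated SQL selects a partition by bucketing the hash of the hash column(s), along the lines of the following sketch (the tool's actual SQL may differ; the MOD-based bucketing shown here is an illustration):

```sql
-- Sketch: select logical partition 1 of 10, hashing on cust_id.
-- HASHROW computes the row hash; HASHBUCKET maps it to a hash bucket,
-- which MOD folds into 10 logical partitions numbered 1..10.
SELECT cust_id, income, age
FROM twm_customer
WHERE (HASHBUCKET(HASHROW(cust_id)) MOD 10) + 1 = 1;
```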

Partition - INPUT - Analysis Parameters


On the Partition dialog click on INPUT and then click on analysis parameters:

On this screen select:

Number of Partitions
Number of partitions (1 to 65536) into which the table is logically split, and from
which the First through Last partitions are selected.
First Partition
First logical partition to select (must be in the range from 1 to Number of
Partitions).
Last Partition
Last logical partition to select (must be in the range from First Partition to Number
of Partitions).

Partition - INPUT - Expert Options


On the Partition dialog click on INPUT and then click on expert options:

This screen provides the option to generate a SQL WHERE clause(s) to restrict rows selected
for analysis (for example: cust_id > 0).


Partition - OUTPUT - Storage


Before running the analysis, define Output options. On the Partition dialog click on
OUTPUT and then click on storage:

On this screen select:

Use the Teradata EXPLAIN feature to display the execution path for this analysis
Option to generate a SQL EXPLAIN SELECT statement, which returns a Teradata
Execution Plan.

Store the tabular output of this analysis in the database


Option to generate a Teradata TABLE or VIEW populated with the results of the
analysis. Once enabled, the following three fields must be specified:
Database Name
Text box to specify the name of the Teradata database where the resultant Table
or View will be created. By default, this is the “Result Database.”
Output Name
Text box to specify the name of the Teradata Table or View.
Output Type
Pull-down to specify Table or View.
Create Output table using the FALLBACK keyword
If a table is selected, it will be built with FALLBACK if this option is selected.
Create Output table using the MULTISET keyword
If a table is selected, it will be built as a MULTISET table if this option is
selected.

Generate the SQL for this analysis, but do not execute it


If this option is selected the analysis will only generate SQL, returning it and
terminating immediately.

Partition - OUTPUT - Primary Index


On the Partition dialog click on OUTPUT and then click on primary index:


On this screen, select the columns which comprise the primary index of the output table.
Select:

Available Columns
A list of columns which comprise the index of the resultant table if an Output Table
is used.
Primary Index Columns
Select columns by highlighting and then either dragging and dropping into the
Primary Index Columns window, or click on the arrow button to move highlighted
columns into the Primary Index Columns window.

Run the Partition Analysis


After setting parameters on the INPUT and OUTPUT screens as described above, you are
ready to run the analysis. To run the analysis you can either:
• Click the Run icon on the toolbar, or
• Select Run <project name> on the Project menu, or
• Press the F5 key on your keyboard

Results - Partition Analysis


The results of running the Teradata Warehouse Miner Partition Analysis include the
generated SQL itself, the results of executing the generated SQL, and, if the Create Table (or
View) option is chosen, a Teradata table (or view). All of these results are outlined below.

Partition - RESULTS - Data


On the Partition dialog, click on RESULTS and then click on data (note that the RESULTS
tab will be grayed-out/disabled until after the analysis is completed):

The results of the completed query are returned in a Data Viewer page within the Results
Browser. This page has the properties of the Data View page discussed in the Chapter on
Using the Teradata Warehouse Miner Graphical User Interface. With the exception of the
Explain Select Result Option, these results will match the tables described below in the
Output Column Definition section, depending upon the parameters chosen for the analysis.

Partition - RESULTS - SQL


On the Partition dialog, click on RESULTS and then click on SQL (note that the RESULTS
tab will be grayed-out/disabled until after the analysis is completed):


The generated SQL is returned as text which can be copied, pasted, or printed.

Output Columns – Partition Analysis


When the “Store the Tabular Output” option is selected, the following table will be built.

Name      Type                  Definition

Columns   Same as input type    The selected Partition Columns from the “Input” – “data selection” tab. If a table is created, those selected columns that are also selected on the “Output” – “primary index” tab will comprise the primary index of the created table.
xpartid   SMALLINT              If multiple partitions are requested by making the First Partition parameter less than the Last Partition parameter, this column will be created with values matching the requested partition numbers, i.e., setting start = 3 and end = 5 will return xpartid = 3, 4 and 5.
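
When a range of partitions is requested, the partition number can be carried along as xpartid, roughly as in this sketch (illustrative SQL, not the tool's exact output):

```sql
-- Sketch: select logical partitions 3 through 5 of 10 and return the
-- partition number of each row as xpartid.
SELECT cust_id, income, age,
       CAST((HASHBUCKET(HASHROW(cust_id)) MOD 10) + 1 AS SMALLINT) AS xpartid
FROM twm_customer
WHERE (HASHBUCKET(HASHROW(cust_id)) MOD 10) + 1 BETWEEN 3 AND 5;
```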

Tutorial - Partition Analysis

Partition - Example #1

Parameterize a Partition Analysis as follows:

Available Tables TWM_CUSTOMER


Selected Columns and Aliases TWM_CUSTOMER.cust_id
TWM_CUSTOMER.income
TWM_CUSTOMER.age
TWM_CUSTOMER.years_with_bank
TWM_CUSTOMER.nbr_children
TWM_CUSTOMER.gender
TWM_CUSTOMER.marital_status
Selected Hash Columns TWM_CUSTOMER.cust_id
Number of Partitions 10
First Partition 1
Last Partition 1

For this example, the Partition Analysis generated the following results. Note that the SQL is
not shown for brevity:

Data

Note – only the first 10 rows shown.


cust_id income age years_with_bank nbr_children Gender marital_status


1362485 22690 25 4 0 F 1
1362550 0 15 0 0 M 1
1362564 14357 77 7 0 F 2
1362566 127848 54 4 1 M 2
1362570 20562 50 0 2 F 2
1362580 29363 36 6 3 F 4
1362586 24476 46 6 0 F 1
1362657 27946 39 8 0 F 1
1362661 19649 66 5 0 M 2
1362663 29030 44 4 0 M 1
… … … … … … …
… … … … … … …
… … … … … … …

Partition - Example #2

Parameterize a Partition Analysis as follows:

Available Tables TWM_CUSTOMER


Selected Columns and Aliases TWM_CUSTOMER.cust_id
TWM_CUSTOMER.income
TWM_CUSTOMER.age
TWM_CUSTOMER.years_with_bank
TWM_CUSTOMER.nbr_children
TWM_CUSTOMER.gender
TWM_CUSTOMER.marital_status
Selected Hash Columns TWM_CUSTOMER.cust_id
Number of Partitions 10
First Partition 1
Last Partition 3

For this example, the Partition Analysis generated the following results. Again, the SQL is
not shown:

Data

Note – only the first 10 rows shown.

cust_id income age years_with_bank nbr_children gender marital_status xpartid


1362527 17622 44 1 0 M 2 2
1363486 39942 41 1 5 F 4 3
1363442 144157 58 5 0 M 2 2
1363282 25829 29 8 0 F 1 3
1363238 5788 35 5 2 F 2 2
1363078 9622 46 6 1 M 2 3
1362830 10933 18 3 0 F 1 2
1362670 8877 26 5 0 F 1 3
1362626 15993 30 0 3 F 2 2
1363404 0 17 2 0 M 1 1
… … … … … … … …
… … … … … … … …
… … … … … … … …


Sample
The Sample analysis function randomly selects rows from a table or view, producing one or
more samples based on a specified number of rows or a fraction of the total number of rows.
The sampled rows may be stored in a single table, in a separate table for each sample, or in a
single table with a view created for each sample. When connected to a Teradata V2R5 or later
data source, options are provided for sampling with or without replacement of rows,
randomized allocation or proportional allocation by AMP, and stratified or simple random
sampling. When connected to an earlier Teradata release the default options are automatically
used. These options are described more fully below.

Sampling is performed without replacement by default. This means that each row sampled in
a request is unique and once sampled is not replaced in the sampling pool for that request.
Therefore, it is not possible to sample more rows than exist in the sampled table, and if
multiple samples are requested they are mutually exclusive. When sampling with replacement
is requested, each sampled row is immediately returned to the sampling pool and may
therefore be selected multiple times. If multiple samples are requested with replacement, the
samples are not necessarily mutually exclusive.

The default row allocation method is proportional, allocating the requested rows across the
Teradata AMPs as a function of the number of rows on each AMP. This is technically not a
simple random sample because it does not include all possible sample sets. It is however
much faster than randomized allocation, especially for large sample sizes, and should have
sufficient randomness for most applications. When randomized allocation is requested, row
selections are allocated across the AMPs by simulating simple random sampling, a process
that can be comparatively slow.

By default the Sample Analysis function performs simple random sampling. This means that
each possible set of the requested size has an equal probability of being selected (subject to
the limitations of proportional allocation noted above). An option is however provided for
stratified random sampling, wherein the available rows are divided into groups or strata based
on stated conditions prior to samples of a requested size or sizes being taken.
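
On Teradata V2R5 and later these options correspond to modifiers of the SQL SAMPLE clause, as in the following sketch (illustrative; the SQL actually generated may differ):

```sql
-- Sketch: draw 100 rows, allowing rows to be selected more than once
-- (WITH REPLACEMENT) and simulating simple random sampling across the
-- AMPs (RANDOMIZED ALLOCATION).
SELECT cust_id, income, age
FROM twm_customer
SAMPLE WITH REPLACEMENT RANDOMIZED ALLOCATION 100;
```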

The Sample Analysis is parameterized by specifying the table and column(s) to analyze,
options unique to Sample Analysis, as well as specifying the desired results and SQL or
Expert Options.


Initiate a Sample Analysis


Use the following procedure to initiate a new Sample analysis:

1. Click on the Add New Analysis icon in the toolbar:

2. In the resulting Add New Analysis dialog box, with Reorganization highlighted on the
left, double-click on the Sample icon:

3. This will bring up the Sample dialog in which you will enter INPUT and OUTPUT options
to parameterize the analysis as described in the next sections.


Sample - INPUT - Data Selection


On the Sample dialog click on INPUT and then click on data selection:

On this screen select:

Available Databases
All the databases which are available for the Sample Analysis.
Available Tables
All the tables within the Source Database that are available for the Sample Analysis.
Available Columns
All the columns within the selected table that are available for the Sample Analysis.
Selected Columns
Select columns by highlighting and then either dragging and dropping into the
Selected Columns window, or click on the arrow button to move highlighted
columns into the Selected Columns window.

Sample - INPUT - Analysis Parameters


On the Sample dialog click on INPUT and then click on analysis parameters:

On this screen select:

Sample Style

Basic - When this option is checked, simple random sampling without stratifying
conditions is performed.

Stratified - When this option is checked, the available rows are divided into groups
or strata based on stated conditions prior to samples of a requested size or sizes being
taken.

Sample Options

Sample with Replacement - When this option is checked, each sampled row is
immediately returned to the sampling pool and may therefore be selected multiple


times. If multiple samples are requested with replacement, the samples are not
necessarily mutually exclusive.

When this option is not checked, each row sampled in a request is unique, and once
sampled, is not replaced in the sampling pool for that request. Therefore, it is not
possible to sample more rows than exist in the sampled table, and if multiple samples
are requested they are mutually exclusive.

Sample with Randomized Allocation - When this option is checked, the requested
rows are allocated across the AMPs by simulating simple random sampling, a
process that can be comparatively slow.

When this option is not checked, requested rows are allocated across the Teradata
AMPs as a function of the number of rows on each AMP. This is technically not a
simple random sample because it does not include all possible sample sets. It is
however much faster than randomized allocation, especially for large sample sizes,
and should have sufficient randomness for most applications.

Sizes/Fractions separated by ‘,’ (only when Sample Style is Basic)

When the Sample Style is Basic, this option is used to enter a list of one or more
sample sizes or fractions, separated by the list separator for the current locale. If
sample sizes are entered (e.g. 10, 20, 30), they indicate the number of rows to be
returned in each sample. If fractions are entered (e.g. .01, .02, .03), they indicate the
approximate size of each sample as a fraction of the available rows in the table, and
as such must not add up to more than 1.
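
A Basic-style request with multiple fractions maps onto a multi-sample SAMPLE clause, roughly as follows (a sketch; Teradata's built-in SAMPLEID expression identifies the sample each row belongs to):

```sql
-- Sketch: three mutually exclusive samples of approximately 1%, 2% and
-- 3% of the rows; SAMPLEID (1, 2 or 3) tags each row with its sample.
SELECT cust_id, income, age, SAMPLEID AS xsampleid
FROM twm_customer
SAMPLE .01, .02, .03;
```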

Stratified Conditions (only when Sample Style is Stratified)

When the Sample Style is Stratified, this option is used to enter one or more
conditions along with corresponding sample sizes or fractions. (For an example of
stratified sampling, refer to Sample Example #5 in Tutorial – Sample Analysis.)

Condition

Each stratum in the sampling must be defined by a conditional expression, such as


gender = ‘M’ or channel IN (‘A’, ‘B’, ‘C’).

Sizes/Fractions

This field is used to enter sizes or fractions for one or more samples, separated by the
list separator for the current locale. If sample sizes are entered (e.g. 10, 20, 30), they
indicate the number of rows to be returned in each sample for the stratum. If fractions
are entered (e.g. .01, .02, .03), they indicate the approximate size of each sample as a
fraction of the available rows in the stratum, and as such must not add up to more
than 1.
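
Stratified conditions correspond to the WHEN ... THEN form of the SAMPLE clause, along these lines (an illustrative sketch, not the tool's exact output):

```sql
-- Sketch: three samples of sizes 1, 2 and 3 from the male stratum and
-- three samples of sizes 4, 5 and 6 from the female stratum.
SELECT cust_id, income, age, gender, SAMPLEID AS xsampleid
FROM twm_customer
SAMPLE WHEN gender = 'M' THEN 1, 2, 3
       WHEN gender = 'F' THEN 4, 5, 6
END;
```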

Sample - INPUT - Expert Options


On the Sample dialog click on INPUT and then click on expert options:


This screen provides the option to generate a SQL WHERE clause(s) to restrict rows selected
for analysis (for example: cust_id > 0). (Note that the use of this option may negatively
impact the performance of a Basic style Sample with default options.)

Sample - OUTPUT - Storage


Before running the analysis, specify Output options. On the Sample dialog click on
OUTPUT and then click on storage:

On this screen select:

Use the Teradata EXPLAIN feature to display the execution path for this analysis
Option to generate a SQL EXPLAIN SELECT statement, which returns a Teradata
Execution Plan.

Store the tabular output of this analysis in the database


Option to generate a Teradata TABLE or VIEW populated with the results of the
analysis. Once enabled, the following three fields must be specified:
Output Type
Pull-down to specify Table, Multiple Tables or Multiple Views.
Database Name
Text box to specify the name of the Teradata database where the resultant Table,
Tables and/or Views will be created. By default, this is the “Result Database.”
Table Name
Text box to specify the name of the Teradata table to create if the Output Type is
Table or the underlying base table to create if the Output Type is Multiple Views.
Table Names (n)
Text box to specify the names of the tables to create if the Output Type is
Multiple Tables. (The number of tables n is given in parentheses.)
View Names (n)
Text box to specify the names of the views to create if the Output Type is
Multiple Views. (The number of views n is given in parentheses.)
Create Output table using the FALLBACK keyword
If a table is selected, it will be built with FALLBACK if this option is selected.
Create Output table using the MULTISET keyword
If a table is selected, it will be built as a MULTISET table if this option is
selected.

Generate the SQL for this analysis, but do not execute it


If this option is selected the analysis will only generate SQL, returning it and
terminating immediately.

Sample - OUTPUT - Primary Index


On the Sample dialog click on OUTPUT and then click on primary index:

On this screen, select the columns which comprise the primary index of the output table.
Select:

Available Columns
A list of columns which comprise the index of the resultant table if an Output Table
or Tables are created.
Primary Index Columns
Select columns by highlighting and then either dragging and dropping into the
Primary Index Columns window, or click on the arrow button to move highlighted
columns into the Primary Index Columns window.

Run the Sample Analysis


After setting parameters on the INPUT and OUTPUT screens as described above, you are
ready to run the analysis. To run the analysis you can either:
• Click the Run icon on the toolbar, or
• Select Run <project name> on the Project menu, or
• Press the F5 key on your keyboard

Results – Sample Analysis


The results of running the Teradata Warehouse Miner Sample Analysis include the generated
SQL itself, the results of executing the generated SQL, and, if one of the Create options is
chosen, one or more Teradata tables (or views). All of these results are outlined below.

Sample - RESULTS - Data


On the Sample dialog, click on RESULTS and then click on data (note that the RESULTS
tab will be grayed-out/disabled until after the analysis is completed):


The results of the completed query are returned in a Data Viewer page within the Results
Browser. This page has the properties of the Data View page discussed in the Chapter on
Using the Teradata Warehouse Miner Graphical User Interface. With the exception of the
Explain Select Result Option, these results will match the tables described below in the
Output Column Definition section, depending upon the parameters chosen for the analysis.


Sample - RESULTS - SQL


On the Sample dialog, click on RESULTS and then click on SQL (note that the RESULTS
tab will be grayed-out/disabled until after the analysis is completed):

The generated SQL is returned as text which can be copied, pasted, or printed.

Output Columns – Sample Analysis


If the option to Store the tabular output of this analysis in the database is selected, one of the
following tables is built by the Sample Analysis, depending on the Output Type selected.

Table or Multiple Views

If one of these options is selected, a single table is built. If multiple values have been
specified in the Size or Fraction list, a column named xsampleid will be created
indicating which sample the row belongs to – a number from 1 to n for each distinct
value entered in the Size or Fraction list (depending on stratified sampling options).

When the Multiple Views option is selected, multiple views are created operating against
this table, selecting rows based on xsampleid, but not including xsampleid.

Name       Type                  Definition

Columns    Same as input type    The Selected Columns from the “Input” – “data selection” screen. If a table is created, those selected columns that are also selected on the “Output” – “primary index” screen will comprise the primary index of the created table.
xsampleid  SMALLINT              If multiple samples are requested in the Size or Fraction list, this column will be included in the created table with values starting at 1 and incrementing for each sample specified, i.e., setting size=10,10,10 will return xsampleid=1, 2, 3. (When a view is created for each sample, this column is not included in the view.)
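
With the Multiple Views output type, each generated view simply filters the combined base table on xsampleid and omits that column, roughly as in this sketch (using the object names from Sample Example #4 below; illustrative only):

```sql
-- Sketch: view exposing the first sample stored in a combined sample
-- table; xsampleid selects the sample but is not itself exposed.
CREATE VIEW Twm_Cust_Sample1_view AS
SELECT cust_id, income, age
FROM Twm_Cust_Sample
WHERE xsampleid = 1;
```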

Multiple Tables

If this option is selected, one table will be built for every value in the Size or Fraction list.

Name       Type                  Definition

Columns    Same as input type    The Selected Columns from the “Input” – “data selection” screen. Those selected columns that are also selected on the “Output” – “primary index” tab will comprise the primary index of the created tables.

Tutorial - Sample Analysis


Sample - Example #1

Parameterize a Sample Analysis as follows:

Available Tables TWM_CUSTOMER


Selected Columns TWM_CUSTOMER.cust_id
TWM_CUSTOMER.income
TWM_CUSTOMER.age
TWM_CUSTOMER.years_with_bank
TWM_CUSTOMER.nbr_children
TWM_CUSTOMER.gender
TWM_CUSTOMER.marital_status
Size or Fraction 10

For this example, the Sample Analysis generated the following results. Note that the SQL is
not shown for brevity, and that the specific rows returned will vary randomly.

Data

cust_id income age years_with_bank nbr_children gender marital_status


1362691 26150 46 5 1 M 2
1362548 44554 59 9 2 F 4
1362811 8011 82 2 0 F 2
1363402 24368 63 3 0 F 1
1363011 90248 55 5 0 M 4
1362826 0 15 0 0 F 1
1363018 46884 36 4 2 M 2
1362793 29120 36 6 2 M 3
1363410 28518 31 1 2 M 3
1362676 7230 62 2 0 M 2

Sample - Example #2

Parameterize a Sample Analysis as follows:

Available Tables TWM_CUSTOMER


Selected Columns TWM_CUSTOMER.cust_id
TWM_CUSTOMER.income
TWM_CUSTOMER.age
TWM_CUSTOMER.years_with_bank
TWM_CUSTOMER.nbr_children
TWM_CUSTOMER.gender
TWM_CUSTOMER.marital_status
Size or Fraction .1
.2
.3

For this example, the Sample Analysis generated the following results. Again, the SQL is not
shown, and the specific rows returned will vary randomly.

Data


Note – only the first 10 rows shown.

cust_id income age years_with_bank nbr_children gender marital_status xsampleid


1362691 26150 46 5 1 M 2 1
1362548 44554 59 9 2 F 4 1
1363160 18548 38 8 0 F 1 3
1363017 0 16 1 0 M 1 3
1362487 6605 71 1 0 M 2 2
1363486 39942 41 1 5 F 4 2
1363200 21015 18 3 0 M 1 3
1363282 25829 29 8 0 F 1 3
1362527 17622 44 1 0 M 2 3
1362609 1929 79 8 0 F 2 3
… … … … … … … …
… … … … … … … …
… … … … … … … …

Sample - Example #3

Parameterize a Sample Analysis as follows:

Available Tables TWM_CUSTOMER


Selected Columns TWM_CUSTOMER.cust_id
TWM_CUSTOMER.income
TWM_CUSTOMER.age
TWM_CUSTOMER.years_with_bank
TWM_CUSTOMER.nbr_children
TWM_CUSTOMER.gender
TWM_CUSTOMER.marital_status
Size or Fraction .1
.2
.3
Output Type Multiple Tables
Table Names (3) Twm_Cust_Sample1
Twm_Cust_Sample2
Twm_Cust_Sample3

For this example, the Sample Analysis generated the following results. Again, the SQL is not
shown, and the specific rows returned will vary randomly. The data page will have a Load
button which must be clicked to view the three results.

Sample - Example #4

Parameterize a Sample Analysis as follows:

Available Tables TWM_CUSTOMER


Selected Columns and Aliases TWM_CUSTOMER.cust_id
TWM_CUSTOMER.income
TWM_CUSTOMER.age
TWM_CUSTOMER.years_with_bank
TWM_CUSTOMER.nbr_children
TWM_CUSTOMER.gender

TWM_CUSTOMER.marital_status
Size or Fraction .1
.2
.3
Output Type Multiple Views
Table Name Twm_Cust_Sample
View Names (3) Twm_Cust_Sample1_view
Twm_Cust_Sample2_view
Twm_Cust_Sample3_view

For this example, the Sample Analysis generated the following results. Again, the SQL is not
shown, and the specific rows returned will vary randomly. The data page will have a Load
button which must be clicked to view the three results.

Sample - Example #5

Parameterize a Sample Analysis as follows:

Available Tables TWM_CUSTOMER


Selected Columns and Aliases TWM_CUSTOMER.cust_id
TWM_CUSTOMER.income
TWM_CUSTOMER.age
TWM_CUSTOMER.years_with_bank
TWM_CUSTOMER.nbr_children
TWM_CUSTOMER.gender
TWM_CUSTOMER.marital_status
Sample Style Stratified
Stratified Condition gender = ‘M’
Sizes/Fractions 1,2,3
Stratified Condition gender = ‘F’
Sizes/Fractions 4,5,6

For this example, the Sample Analysis generated the following results. Note that not all SQL
is shown for brevity, and that the specific rows returned will vary randomly.

Data

cust_id income age years_with_bank nbr_children gender marital_status xsampleid


1363462 9495 25 4 2 F 3 1
1363081 41876 37 7 0 F 1 1
1362611 24115 48 8 1 F 2 1
1362993 20702 30 0 1 F 3 1
1363066 3240 64 1 0 M 1 1
1363306 15576 46 6 0 F 1 2
1363197 19088 52 2 2 F 2 2
1363400 49258 49 9 0 F 2 2
1362730 12988 37 7 3 F 4 2
1362535 26548 46 5 4 F 4 2
1362999 29403 36 6 2 M 4 2
1363083 22680 64 4 0 M 1 2
1362697 5848 83 3 0 F 1 3
1362492 40252 40 0 5 F 3 3

1363039 0 15 7 0 F 1 3
1362548 44554 59 9 2 F 4 3
1362836 5920 66 6 0 F 3 3
1363266 20889 23 2 0 F 3 3
1363051 0 14 6 0 M 1 3
1362563 14711 73 3 0 M 2 3
1362962 2858 83 3 0 M 4 3


2. Analytic Data Sets


The most time-intensive part of the data mining process is arguably the creation of a data set
from which to build an analytic model. The data in a relational data warehouse is typically
not in a form suitable for input directly into a data mining algorithm. New variables may need
to be created using formulas, aggregations and/or expansions on specific values of a
dimensioning variable. The joining of tables and/or de-normalizing or flattening of relational
tables may also be needed. In addition, statistical transformations are often required,
depending on the type of algorithm to be used as well as the statistical properties of the data
itself. These capabilities are referred to simply as Analytic Data Sets.

Several types of analysis may be involved in building an analytic data set. A Variable
Creation analysis provides expression building and dimensioning to define new variable
columns and place them in a table or view. A Variable Transformation function applies
requested data mining transformation functions to the columns in a table and creates a
transformed table. A Build Data Set analysis joins together the tables or views created by one
or more Variable Creation and/or Variable Transformation functions, allowing column
selection and the application of expert where clause constraints. (It is largely the same as the
Join function in the Reorganization category of functions, but can operate on a single table.)

Note that Identity columns, i.e. columns defined with the attribute "GENERATE … AS
IDENTITY", cannot be analyzed by Analytic Data Set functions.

Variable Creation
The Variable Creation function makes possible the creation of variables as columns in a table
or view. The user creates each new variable as an expression by selecting various SQL
keywords and operators as well as table and column names. SQL keywords and operators
allowed include arithmetic and logical operators, date/time operators, the typical aggregation
functions, as well as the newer ordered analytical (windowed OLAP) functions. The only
typing normally required is the typing of names, descriptions and values (although some
automation is provided for names and values).

In addition to defining variables as expressions or formulas, the user may specify constraints
on the data, either for all the variables defined in a Variable Creation function, or on an
individual basis. Table level constraints defined for all variables result in WHERE, HAVING
or QUALIFY clauses in the generated SQL. Constraints defined for individual variables
result in the use of CASE clauses in order to allow for different constraints on different
variables in the same SQL statement. A feature to allow the creation of numerous similar
variables using constraints based on specific values of one or more ‘dimensioning’ columns is
also provided.
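
For example, dimensioning on specific values of a column results in CASE logic in the generated SQL, along these lines (an illustrative sketch, not the tool's exact output; the transaction table and column names here are hypothetical):

```sql
-- Sketch: one aggregate variable per value of a dimensioning column
-- ('channel' in a hypothetical transaction table), built with CASE so
-- that all the similar variables fit in a single SQL statement.
SELECT cust_id,
       SUM(CASE WHEN channel = 'A' THEN tran_amt ELSE 0 END) AS amt_channel_a,
       SUM(CASE WHEN channel = 'B' THEN tran_amt ELSE 0 END) AS amt_channel_b
FROM twm_transactions
GROUP BY cust_id;
```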

Any number of variables can be defined in a single Variable Creation function, provided they
conform to rules that allow them to be combined in the same table, and they do not exceed
the maximum number of columns allowed in a table by Teradata. Several variable properties
are used in determining which variables can be built in the same table. Some rules of
combining variables in the same Variable Creation function are given below.

• Variables derived in a single table must have the same aggregation type and level.
• A number of tables may be referenced by the variables defined in a single Variable
Creation function.


• Variables referenced by another variable must not be dimensioned.


• All the variables in a Variable Creation function share the same table level
constraints.
• The user may request at any time that the intermediate table created by a Variable
Creation function be validated using the Teradata EXPLAIN feature.

The standard result options are available with the Variable Creation function, namely Select,
Explain Select, Create Table and Create View. The choice depends primarily on whether this
analysis produces a final result or an intermediate result, and if so, whether the user wants to
create a permanent table or view for this intermediate result. If a permanent result is not
desired, the Select option can be used to view and verify results. (Even if this analysis
produces an intermediate result directly referred to by another analysis, the Select option can
still be used since a volatile table will automatically be created in this case to allow the
referring analysis to access the results.)

SQL Elements
The Variable Creation function allows the creation of new columns or variables as
SQL expressions or formulas based on the features, functions and operators outlined
below, depending on the release of Teradata in use at the time the variables are
defined:

1. Column(s) from one or more tables in one or more databases


2. Aggregation functions: MIN, MAX, SUM, AVG, COUNT, CORR,
COVAR_POP/SAMP, STDDEV_POP/SAMP, VAR_POP/SAMP, SKEW,
KURTOSIS, REGR_INTERCEPT/SLOPE/R2
3. Ordered analytical functions: AVG, COUNT, MAX, MIN, PERCENT_RANK,
RANK and SUM, equivalents of OLAP functions MDIFF and QUANTILE in
terms of the new ordered analytical functions, and the old OLAP function
MLINREG
4. Arithmetic operators: +, -, *, /, MOD, **
5. Arithmetic functions: ABS, EXP, LN, LOG, SQRT, RANDOM
6. Trigonometric functions: COS, SIN, TAN, ACOS, ASIN, ATAN, ATAN2
7. Hyperbolic functions: COSH, SINH, TANH, ACOSH, ASINH, ATANH
8. CASE expressions, both valued and searched types
9. Comparison operators: =, >, <, <>, <=, >=
10. Logical predicates: (NOT) BETWEEN…AND…, (NOT) IN (expression list), IS
(NOT) NULL, AND, OR, NOT, (NOT) LIKE ‘pattern expression’, ANY, ALL
11. Custom logical predicates: AND ALL, OR ALL (making it easier to connect a
number of conditional expressions with an AND or OR operator)
12. NULL operators: NULLIF, COALESCE, NULLIFZERO, ZEROIFNULL
13. Built-in functions: CURRENT_DATE, CURRENT_TIME,
CURRENT_TIMESTAMP
14. Date/Time functions: ADD_MONTHS and EXTRACT
15. Custom Date/Time differences and elapsed time functions
16. Calendar fields based on a specified date column with all Teradata Calendar
options.
17. String functions: LOWER, UPPER, POSITION, SUBSTRING, TRIM,
concatenate ( || )
18. Type conversion: CAST expression AS data type
19. Parentheses: open ‘(‘ and close ‘)’


20. Free SQL Text Entry


21. References to other variables

The same list applies to the creation of Dimensions, with the exclusion of all aggregation
functions and ordered analytical functions. Additionally, the Variable Creation analysis
also allows creation of WHERE, HAVING and QUALIFY clause constraints based
on the same list with the exclusion of aggregation functions (except with HAVING),
and ordered analytical functions (except with QUALIFY).
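
To make this concrete, a single variable may combine several of these elements, such as a
searched CASE expression inside an aggregation function. A variable defined that way would
generate a select list item conceptually like the following (the column name and values here
are purely illustrative, not part of the product):

     SUM(CASE WHEN tran_amt > 100 THEN 1 ELSE 0 END)

This would count, within each aggregation group, the transactions whose amount exceeds 100.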

Variable Properties
Each time a new variable is defined, the program keeps track of several attributes of the
variable that control how it is generated. Some of these attributes can be explicitly set by the
user and some are determined by the SQL verbs or clauses selected by the user.

The properties explicitly set by the user include:

• Column name—either provided by the user or defaulted to a system generated value


• Column type—either chosen by the user, inherited from another column, or set to a
default value (only the numerous Teradata character, numeric and date/time types are
allowed, not byte or graphic types)
• Description—a description of any size may be associated with a variable
• Division by zero protection—by option, divisors are automatically converted to
NULL when they are zero to avoid SQL failure
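
Conceptually, when division by zero protection is enabled, a division is generated with its
divisor wrapped so that a zero divisor yields a NULL result rather than a SQL failure,
roughly as follows (illustrative column names):

     tot_amount / NULLIFZERO(tran_count)

Because arithmetic involving NULL yields NULL, the protected variable is simply NULL
wherever the divisor is zero.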

Duplication By Dimension
Sometimes it is desirable to generate a number of similar variables at one time using data
constraints involving specific values or combinations of values from one or more columns in
the input table. These other columns can be thought of as dimensions upon which the new
variable is expanded or duplicated. For example, instead of creating a single variable
containing a customer’s average transaction amount, it may be desirable to create separate
variables for average transaction amount during each of the last 6 months, yielding 6
variables.

Duplication by dimension is performed at the time a variable is created with the Variable
Creation analysis. The user may dimension a variable on all or a subset of the dimension
values they define. Ordinarily, both the dimensioned and dimensioning variable reside in the
same input table. For example, both the transaction amount (variable being dimensioned) and
the transaction date (dimensioning variable) reside in the transaction table that is used as
input.

It is possible however to dimension a variable via a column in another table, such as a
hierarchy table. This requires that the table containing the dimensioning variable also
contains a column that can be matched to a column in the table that contains the column to be
dimensioned. For example, you can dimension the average transaction amount by
department where the table containing the transaction amount also contains a product code,
and the hierarchy table used for dimensioning contains both a product code and department
code. (In this case, the product code must be used in the "join path" between the transaction
and hierarchy table.)


Although variables duplicated by dimension are always implemented as aggregates by
necessity, the variables may or may not be summarized values. The example previously given
of average transaction amount is a summarized value where the individual dimension values
apply to multiple rows or observations. However, if the dimension values apply to specific
rows for each anchor key (see Join Paths, Anchor Tables and Anchor Keys below), then
duplicating by dimension amounts to picking out specific values rather than summarizing
over dimension values. An example of this might be dimensioning by month the values in a
table that summarizes transaction amounts by customer and month. In this case, dimensioning
by month simply selects the individual monthly sums or averages, creating a separate variable
for each. To do this the default aggregate function MIN is used.

Depending on the nature of the variable being dimensioned, the user may want to treat values
not applying to a particular dimension value as either NULL or 0. The use of NULL in this
case results in the possibility of the dimensioned variable being NULL if no data applies. The
use of 0 in this case simply gives a total of 0 if no data applies. An option is therefore
provided to the user to indicate that either NULL or 0 should be used when no data applies.
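
Conceptually, the choice corresponds to generating the dimensioned aggregate with or
without an ELSE 0 clause in the underlying CASE expression, roughly as follows (column
names are illustrative):

     SUM(CASE WHEN tran_code = 'CA' THEN tran_amt END)          -- NULL when no data applies
     SUM(CASE WHEN tran_code = 'CA' THEN tran_amt ELSE 0 END)   -- 0 when no data applies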

Applying Dimension Values


Consider the following example of defining dimension values based on a column called
tran_code in the input table twm_credit_tran from which a variable is being defined based on
another input column called tran_amt. The valid values of tran_code may be extracted
directly from the twm_credit_tran table using the Values button on the Variable Creation
input screen. (They could also be taken from the output of a previous run of the Data
Explorer or Frequency analyses.) At this point the user might select the tran_code values
‘CA’, ‘CG’, and ‘PM’ as dimension values, and the combination of ‘CA’ and ‘CG’ as a
fourth dimension value. A name is given to each of these dimension values to be used in
conjunction with variable names in naming any variables dimensioned by this dimension
value. A descriptive string may also be associated with each of the dimension values.

The dimension information is shown below for conceptual purposes in the form of two tables.
Note that the Dimension Values table targets the dimension values of tran_code in a
particular table. Notice that the conditions comprising the elements of the dimension may
overlap. That is, they do not need to be mutually exclusive in value.

Dimension Values:

Dimension Value             Name               Full Description

tran_code = ‘CA’            tran_code_CA       Cash advance
tran_code = ‘CG’            tran_code_CG       Charge
tran_code = ‘PM’            tran_code_PM       Payment
tran_code IN (‘CA’,‘CG’)    tran_code_CA_CG    Advance or charge

Suppose the above dimension values are applied to a new variable, AVG(tran_amt), with
abbreviation Amt. The select list items for the AVG(tran_amt) dimensioned by these
dimension values would produce 4 variables:

Variable Name              Full Description

tran_code_CA_Amt           Average Tran Amount for Cash advances
tran_code_CG_Amt           Average Tran Amount for Charges
tran_code_PM_Amt           Average Tran Amount for Payments
tran_code_CA_CG_Amt        Average Tran Amount for Advances or Charges
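
Conceptually, the select list items generated for these four dimensioned variables resemble
the following sketch (the exact SQL emitted by the product may differ in detail):

     AVG(CASE WHEN tran_code = 'CA' THEN tran_amt END) AS tran_code_CA_Amt,
     AVG(CASE WHEN tran_code = 'CG' THEN tran_amt END) AS tran_code_CG_Amt,
     AVG(CASE WHEN tran_code = 'PM' THEN tran_amt END) AS tran_code_PM_Amt,
     AVG(CASE WHEN tran_code IN ('CA','CG') THEN tran_amt END) AS tran_code_CA_CG_Amt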


Conditions other than simple inclusion can be used in defining dimensions. In fact, any SQL
construct listed previously with the exception of an aggregation or ordered analytical function
can be used.

Join Paths, Anchor Table and Anchor Keys


For each Variable Creation analysis, appropriate join paths must be set up if columns from
multiple tables are used in creating the variables. The first step in putting together the Join
Paths is to determine what your “Anchor Table” and “Anchor Keys” are.

The anchor table is a table that contains all of the key values to be included in the final data
set. Physically, this can be a table or a view residing in Teradata. The data set anchor key
columns must be included in the anchor table and must uniquely identify rows in the anchor
table, otherwise unpredictable results may occur when joining this table with others.

Join paths must be specified from the Anchor Table to every table used to create variables,
dimensions and/or specified in a WHERE, QUALIFY or HAVING clause. This information
is used to build up a FROM clause for each table or view to be left outer joined with the
anchor table in order to include the appropriate anchor key values in the data set.

The following is an example of a simple join path between two tables. Note that the
containing databases can differ as can the joining table names and column names.

db1.tbl1.cust_id = db2.tbl2.cid

In some cases more than two tables must be joined together to reach a commonly used table.
By way of an example, a transaction table may not contain the customer identifier that forms
the primary index of the anchor table, but an account number instead, which is tied to
customer identifier in a third table which contains both values.

db1.tbl1.cust_id = db2.tbl2.cust_id AND
db2.tbl2.acct_id = db3.tbl3.acct_id

Of course, more complex examples can occur in practice and can be accommodated by a join
path with sufficient conditions combined together.
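
For example, the three-table join path above would contribute to a generated FROM clause
conceptually like the following, with each table left outer joined back toward the anchor
table (the aliases are illustrative; the product generates its own):

     FROM db1.tbl1 A
     LEFT OUTER JOIN db2.tbl2 B ON A.cust_id = B.cust_id
     LEFT OUTER JOIN db3.tbl3 C ON B.acct_id = C.acct_id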

The Variable Creation function includes a Join Path wizard to make it easier to build up
complex join paths. Note also that join paths can be automatically extracted from other
analyses in the same project. This suggests that join paths can be created once in a Variable
Creation analysis, and then copied into a project to be used as a template.

SQL Generation
In order to derive the variables defined in a Variable Creation function, SQL is generated in
one of a number of forms depending on the result option selected. (Note that for each of
these forms, there is an option to "Create SQL Only" without executing the SQL.)

• "Select"
• "Explain Select"
• "Drop Table" and "Create Table As"

• "Drop View" and "Create View"

When the SELECT option is chosen for output, if another analysis refers to this Variable
Creation analysis for its input, the SQL takes the form of a "Drop Table" and "Create Volatile
Table As".

Note that it is necessary to generate a DROP command prior to a CREATE in case the
definition of the table or view has changed since a previous execution. For each variable, a
select list item is generated for the variable expression. If requested as expert options,
WHERE, QUALIFY and/or HAVING clauses may be generated. In the FROM clause, data is
selected from the anchor table, and left outer joined to any other tables referred to in the
variable, dimension or expert clause definitions. Aliases are generated for each table or view
accessed and all column names are automatically qualified using these aliases.
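
Putting these pieces together, the SQL generated for the Create Table result option takes a
shape conceptually like the following sketch (all names here are illustrative, and the
product's actual output will differ in detail):

     DROP TABLE mydb.my_ads;
     CREATE TABLE mydb.my_ads AS (
       SELECT A.cust_id
             ,AVG(B.tran_amt) AS avg_amt
       FROM mydb.customer A
       LEFT OUTER JOIN mydb.tran_table B ON A.cust_id = B.cust_id
       GROUP BY A.cust_id
     ) WITH DATA;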

Initiate a Variable Creation Function


Use the following procedure to initiate a new Variable Creation analysis in Teradata
Warehouse Miner:

1. Click on the Add New Analysis icon in the toolbar:

2. In the resulting Add New Analysis dialog box, click on ADS under Categories and then
under Analyses double-click on Variable Creation:

3. This will bring up the Variable Creation dialog in which you can define INPUT /
OUTPUT options.

Variable Creation - INPUT - Variables


On the Variable Creation dialog, click on INPUT and then click on the Variables tab (the
large tab in the center of the panel).


Note that this screen may be resized by clicking on one of the edges or corners and moving
the mouse while holding the button down.

Selection Options

Input Source
Select Table to input from a table or view, or select Analysis to select directly from
the output of a qualifying analysis in the same project. (Selecting Analysis will
cause the referenced analysis to be executed before this analysis whenever this
analysis is run. It will also cause the referenced analysis to create a volatile table if
the Output option of the referenced analysis is Select.)
Databases
All databases which are available for the Variable Creation analysis.
Tables
All tables within the Source Database which are available for the Variable Creation
analysis.
Columns
All columns within the selected table which are available for the Variable Creation
analysis.
Values
If a single column is highlighted and the Values button is clicked, a window appears
above the Columns selector displaying distinct values that appear in the selected
column in the selected table or view. The query to retrieve these values is affected by
two options on the Limits tab of the Tools menu item called Preferences, namely: Use
sampling to retrieve distinct value data and Number of rows to sample. To remove
the temporary window that displays the values, select the Hide button at the top of
the display. (Note that if the Input Source is Analysis and the column is in a volatile
table created by the referenced analysis, the retrieval of Values may fail. Just follow
the directions in the informational message displayed in case of failure to retrieve
data values.)

Right-Click Menu Options

The same right-click menu options are offered for the Columns selector on the left side of the
input screen as are offered for other input screens (refer to the Analysis Input Screen topic
in Using Teradata Warehouse Miner). Also, the following right-click options are available
within the Variables panel.

Expand All Nodes


Expands the nodes completely for each variable.
Collapse All Nodes
Collapses the nodes to show only variables.
Switch Input To ‘<table name>’
This option applies only when a SQL Column is highlighted in the Variables panel
(or a Variable containing only a single SQL Column). When this option is selected,
the selectors on the left side of the input screen are adjusted to match the table or
analysis that contains the selected SQL Column. (The column is also selected.)
Switch ‘<table name>’ To Current Input
This option applies only when a SQL Column is highlighted in the Variables panel
(or a Variable containing only a single SQL Column). When this option is selected,
the selectors on the left side of the input screen are used to change the input table or
analysis of the selected SQL Column. A pop-up menu is displayed to allow changing
the input for this column only or for all occurrences. For a single column, a column
with the same name must occur in the new (currently selected) input table or analysis
or an error is given. When all columns are changed, the new table or analysis must
contain all the columns or an error is given and no changes are made.
Apply Dimensions to Variables
This option jumps to the upper dimensions tab so that dimensions can be applied to
variables.

Creating Variables From Columns

The variables to be created are specified one at a time as any type of SQL expression. One
way to create a new variable is to click on the New button to produce the following within the
Variables panel:

Another way to create one or more new variables is to drag and drop one or more columns
from the Columns panel to the empty space at the bottom of the Variables panel (multiple
columns may be dragged and dropped at the same time). Each new variable is given the same
name as the corresponding column dropped onto the empty area.

One alternative to dragging and dropping a column is to use the right arrow selection button
to create a new variable from it. Another alternative is to double-click on the column. If the
right arrow button is clicked repeatedly, or the column is double-clicked repeatedly, a range
of columns may be used to create new variables, since the selected column increments each
time the arrow is clicked or the column is double-clicked. (It should be noted that when a
column or column value is selected, the right arrow selection button will only be highlighted
if a SQL Element is not selected. This can be ensured if the right-click option to Collapse All
Nodes is utilized in the SQL Element view.)

Whether dragging and dropping, clicking on the right arrow button or double-clicking on the
column, a new variable based on a column looks something like the following (after
expanding the node).

Creating Variables From SQL Elements

Still another way to create a new variable is to drag and drop a single SQL element from the
SQL Elements panel to the empty space at the bottom of the Variables panel, or to drag and
drop one or more column values displayed by selecting the Values button. In the case of
column values, a variable containing a single SQL Numeric Literal, String Literal or Date
Literal is created as appropriate for each column value. (This technique saves having to edit
the properties of a numeric, string or date literal to set the desired value.)

As with creating variables from selected columns, use of the right arrow selection button or
double-clicking the desired SQL element or column value provides an alternative to dragging
and dropping an element or value. Note however that repeated selection of a SQL element
does not advance the selected element so the result is multiple variables containing the same
SQL element. (Note also that when a SQL element is selected, the right arrow selection
button will only be highlighted if neither a column nor a column value is selected in its
respective view.)

When a SQL element is placed on top of another element on the Variables panel, whether by
dragging and dropping it, selecting it with the right arrow or by double-clicking it, the new
element is typically inserted into the expression tree at that point. The element replaced is
then typically moved to an empty operand of the new SQL element.

Whether dragging and dropping, clicking on the right arrow button or double-clicking, a new
variable based on a SQL element looks something like the following example involving the
Average element:


Copying or Moving a Variable

It is possible to create a copy of a variable by holding down the Control key on the keyboard
while dragging the variable to another location in the Variables panel. The copy can be
placed ahead of another variable by dropping it on that variable, or at the end of the list of
variables by dropping it on the empty space at the bottom of the Variables panel. It is also
possible to copy a variable in the same manner from another analysis by viewing the other
analysis at the same time and dragging the variable from one analysis to the other.

Please be aware that if the Control key is not held down while performing the copy operation
just described within the same analysis, the variable is moved from one place to the other, i.e.
deleted from its old location and copied to the new one. There are two exceptions to this.
First, this is not the case when copying a variable from one analysis to another, in which case
a copy operation is always performed, with or without holding down the Control key. The
second exception is when moving one child node on top of another child node of the same
parent in the expression tree that defines a variable. In this case, the two nodes or sub-
expressions are switched. (For example, if income and age are added together and age is
moved on top of income, the result is to add age and income, reversing the operands.)

Replicating a Variable

It is possible to create multiple varied copies of a variable by dropping or selecting multiple
columns or values onto a component of a variable that is not a folder, that is, a component
that is designed to hold only a single element. For example, after selecting the New button, if
10 columns were dragged and dropped onto the empty node underneath the new variable, the
entire variable would be replicated 10 times, each copy containing a different column and
named with the original variable name appended with a number between 1 and 10.

Deleting All Variables

All variables can be deleted from the analysis by selecting the double-back-arrow button in
the center of the Variable Creation window. When this function is requested, one or more
warnings will be given. The first warning indicates how many variables are about to be
deleted. The second possible warning is given if the number of variables being deleted
exceeds 100, the maximum number of operations that can be undone or redone using the
Undo or Redo buttons. (If this warning is given and the Undo button is then selected, only
the first 100 variables will be restored. These are actually the last 100 deleted, since they are
deleted in reverse order.)

Buttons

New Button
Clicking on the New button creates a new Variable on the panel.


Add Button
Clicking on the Add button brings up a dialog to allow adding copies of variables from other
loaded analyses.

On this dialog select:

Available Analyses
This drop down list contains all of the Variable Creation analyses currently loaded in
the Project window, including those in other projects.

Available Variables
These are the variables in the currently selected analysis.

Retain dimensions attached to variables when copying


Checking this box will include any applied dimensions on variables copied into the
analysis. Unchecking this box will result in the dimensions being dropped from
copied variables.


(Note that if the box is checked and the selected analysis is in another project, and
one of the variables or dimensions applied to a variable being copied contains a
reference to another analysis, an error message will be given and none of the
variables will be copied.)

Map database objects in copied variables to new values


Checking this box will allow the user to change the databases, tables or columns
referenced in the variables being copied (and their dimensions, if any). This is done
by presenting the Object Mapping Wizard similar to the Import Wizard described in
the File Menu section of Using Teradata Warehouse Miner.

OK/Cancel/Apply
Each time the Apply button is clicked, a copy of the currently selected variables is
added and a status message given. The Apply button is then disabled until another
variable is selected. The dialog can be exited at any time by clicking the OK or
Cancel button. If OK is clicked, the currently selected variables will be added unless
the Apply button is disabled.

Wizard Button
When the Variables tab is selected and either a Variable is selected or nothing is
selected, the Wizard button can be used to generate new variables, each containing a
Searched Case statement. Alternately, when an appropriate folder is selected, When
Conditions for Searched Case statements, or conditional expressions for And All or Or
All statements, can be generated. To do so, highlight the Case Conditions folder under a
Case - Searched node or the Expressions folder under an And All or Or All node and
select the Wizard button.

The maximum number of variables or values that can be generated by a single application
of the wizard is limited to 1000.

The following dialog is given when a Variable or nothing at all is selected. (Note that in
the other cases a subset of these fields is displayed with appropriate instructions at the top
of the dialog.)


Variable Prefix
When a comparison operator such as Equal is selected in the Operator field, the
names of the resulting variables consist of the prefix followed by an underscore and
the selected value. Otherwise the variable name is the prefix followed by a number.

Description
When a comparison operator such as Equal is selected in the Operator field, the
description of the resulting variables consists of the description specified here
followed by the operator and selected value. Otherwise the description is simply the
text entered here.

Left Side Column/Expression


Replace the "(empty)" node with a SQL Column or more complex expression
involving a SQL Column.

Then Expression
Replace the "(empty)" node with a SQL element or more complex expression that
will form the Then clause of the generated Searched Case expression. (The default
value of ‘1’ is useful for an indicator variable.)

Else Expression


Replace the "(empty)" node with a SQL element or more complex expression that
will form the Else clause of the generated Searched Case expression. (The default
value of ‘0’ is useful for an indicator variable.)

Operator
Select a comparison operator such as Equals or select Between, Not Between, In, Not
In, Is Null or Is Not Null as the operator to use. If Between or Not Between is
selected, a variable or condition is generated for each pair of requested values. If In
or Not In is selected, the Wizard will generate a single variable or condition based on
all requested values when 'OK' or 'Apply' is clicked. If Is Null or Is Not Null is
selected, the Wizard will generate a single variable or condition based on no values.
Otherwise, if a comparison operator such as Equal is selected, the Wizard will
generate a variable or condition for each requested value.
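
For example, with a Variable Prefix of Ind, a Left Side Column of tran_code, the Equal
operator, the default Then and Else expressions of 1 and 0, and requested values of 'CA'
and 'CG' (all illustrative), the wizard would generate two indicator variables conceptually
equivalent to:

     CASE WHEN tran_code = 'CA' THEN 1 ELSE 0 END   (named Ind_CA)
     CASE WHEN tran_code = 'CG' THEN 1 ELSE 0 END   (named Ind_CG)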

Right Side Values

Values
This tab accepts values displayed by selecting the Values button for input
columns on the left side of the input screen. The displayed values can be drag-
dropped onto this panel, selected with the right-arrow button or selected by
double-clicking them. They can be numeric, string or date type values.

Note that when values are displayed on the left side of the input screen, the
ellipsis button (the one displaying ‘…’) may be used to Select All Values.

Range
This tab can be used to generate a range of integer or decimal numeric values
based on a From, To and By field. If desired, the values can be generated in
descending order by making the From value greater than the To value, so that the
By value should always be positive. If the By field is not specified, an
incremental value of 1 is assumed. (Note that a value displayed with the Values
button may be drag-dropped into this field. Note also that the escape key will
revert to the last value entered in this field.)

When the Between or Not Between operator has been specified, the Range fields
behave somewhat differently and may be used only to specify a single pair of
values using the From and To fields, with the From field validated to be less than
or equal to the To field. The By field may not be specified when the Between or
Not Between operator has been specified.

List
A list of numeric, string or date type values can be entered here, separated by
commas (actually, by the standard list separator for the current locale settings).
(Note that a value displayed with the Values button may be drag-dropped into
this field. Note also that the escape key will revert to the last value entered in
this field.)

Clear All
This button will clear all of the fields of this dialog. (This is convenient because all
entries are generally retained when returning to this dialog.)


OK
This button will generate the requested variables or conditions and return to the
Variables panel.

Cancel
This button returns to the Variables panel without generating any elements.

Apply
This button will generate the requested variables or conditions and remain on this
panel. A status message is displayed just above this button reporting on the number
of generated conditions.

Delete Button
The Delete button can be used to delete any node within the tree. If applicable, the tree
will roll up children, but in some cases a delete may remove all children.

SQL Button
The SQL button can be used to dynamically display the SQL for any node within the
Variables tree. If the resulting display is not closed, the expression changes as you click
on the different levels of the tree comprising a variable. An option is provided in the
display to Qualify column names, that is to precede each column name in the display
with its database and table name.

Properties Button
A number of properties are available when defining a variable to be created, as outlined
below. Click the Properties button when the variable is highlighted, or double click on
the variable to bring up the Properties dialog:


Name:

A name must be specified for each variable. If the SQL expression defining the
variable is simply a SQL Column, the name defaults to the name of the column
automatically when the column is dragged to the variable.

(Tip: Variables can be named by single left-clicking on the name, which produces a
box around the name, as in Windows Explorer.)

Output Type:

A specific Teradata data type may optionally be specified for each variable. If
specified, the SQL CAST function is used to force the data type to the requested
specification. Otherwise the type will be generated automatically by the variable’s
expression (Generate Automatically option). Valid options include:

• BYTEINT
• CHAR
• DATE
• DECIMAL
• FLOAT
• INTEGER
• SMALLINT
• TIME
• TIMESTAMP
• VARCHAR
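
For example, specifying an Output Type of FLOAT for a variable whose expression is
tot_amount / tran_count (illustrative column names) would conceptually generate:

     CAST(tot_amount / tran_count AS FLOAT)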

Column Attributes:

One or more column attributes can be entered here in a free-form manner to be used
when an output table is created. They are placed as-entered following the column
name in the CREATE TABLE AS statement. This can be particularly useful when
requesting data compression for an output column, which might look like the
following: COMPRESS NULL.

Description:
An optional description may be specified for each variable. (Note that a default
description is generated automatically by the Wizard if its Description field contains
a value.)

Undo Button
The Undo button can be used to undo changes made to the Variables panel. Note that if a
number of variables or dimension values are added at one time, each addition requires a
separate undo request to reverse. Up to 100 undo requests can be processed.

Redo Button
The Redo button can be used to reinstate a change previously undone with the Undo
button.


Question-Mark Help Button


The Question-Mark Help button can be used to request help information about a specific
SQL element by first clicking on the question-mark and then on the SQL element in the
SQL Elements panel, Variables panel or Dimensions panel.

Variable Creation - INPUT - Variables - SQL Elements

The following SQL Elements are supported, by category:

Aggregations
A number of aggregation functions are supported, including several of a statistical nature.
Note that aggregation functions are not allowed in a Dimension value expression, but may be
used in a Variable expression that is being dimensioned. They are not allowed in a Where
clause or Qualify clause either. Double click on Aggregations to view the supported
functions:

Average
The standard average function is supported, taking a single expression argument and
generating AVG(expression). The function returns a value of type float, except that
the average of a date expression is returned as a value of type date. When
dragging an Average into a variable, the following tree element is created:

Columns, and/or other non-aggregate expressions can be moved over the (empty) branch
of the tree.


The option to compute the average over distinct values only is provided, resulting in the
generation of AVG(DISTINCT expression). This option is enabled through the
Properties panel. Double-click on Average, or highlight it and hit the Properties button:

Correlation

An enhanced version of the standard correlation function is supported, generating
CORR(expression1, expression2) and returning a value of type float. When dragging a
Correlation into a variable, the following tree element is created:

Columns, and/or other non-aggregate expressions can be moved over the two (empty)
branches of the tree.

The enhancement is the ability to compute the correlation when either or both the first
and second expression arguments evaluate to type date, generating one of the following:

CORR(date expression1 - DATE '1900-01-01', expression2)


CORR(expression1, date expression2 - DATE '1900-01-01')
CORR(date expression1 - DATE '1900-01-01',
date expression2 - DATE '1900-01-01')

There are no special properties for the Correlation function.
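
For instance, correlating a date column with a numeric column might generate SQL like the following, with the date argument converted to a day count (the column and table names are hypothetical):

```sql
SELECT CORR(open_date - DATE '1900-01-01', avg_balance) AS corr_value
FROM account_table;
```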

Covariance


An enhanced version of the standard covariance function is supported, generating
COVAR_SAMP(expression1, expression2) for the sample covariance or
COVAR_POP(expression1, expression2) for the population covariance, while returning a
value of type float. When dragging a Covariance into a variable, the following tree
element is created:

Columns, and/or other non-aggregate expressions can be moved over the two (empty)
branches of the tree.

The enhancement consists of the ability to compute the covariance when either or both
the first and second expression arguments evaluate to type date, generating one of the
following (in which COVAR_POP may be substituted for COVAR_SAMP):

COVAR_SAMP(date expression1 - DATE '1900-01-01', expression2)


COVAR_SAMP(expression1, date expression2 - DATE '1900-01-01')
COVAR_SAMP(date expression1 - DATE '1900-01-01',
date expression2 - DATE '1900-01-01')

The option to compute the covariances on the population or sample is offered through the
Properties panel. Double-click on Covariance, or highlight it and hit the Properties
button:

Count


The standard count function is supported, generating either COUNT(*) or
COUNT(expression) and returning a value of type integer in Teradata run mode or
decimal(15,0) in ANSI run mode. When dragging a Count into a variable, one of the
following tree elements is created:

The (empty) branch is added if no expression yet exists within the variable; otherwise
the existing expression is maintained. In either case, columns, and/or other non-aggregate
expressions can be moved over the (empty) branch in the tree. An asterisk (*) may also be
moved from the Other category to request the COUNT(*) function.

The option to compute the count over distinct values only is provided, resulting in the
generation of COUNT(DISTINCT expression). This option is enabled through the
Properties panel. Double-click on Count, or highlight it and hit the Properties button:

Kurtosis

An enhanced version of the standard kurtosis function is supported, generating
KURTOSIS(expression) and returning a value of type float. When dragging a Kurtosis into a
variable, the following tree element is created:


Columns, and/or other non-aggregate expressions can be moved over the (empty) branch
of the tree.

The enhancement consists of the ability to compute the kurtosis of a date expression,
generating KURTOSIS(date expression - DATE '1900-01-01').

The standard option to compute the kurtosis over distinct values only is also provided,
resulting in the generation of KURTOSIS(DISTINCT expression). This option is enabled
through the Properties panel. Double-click on Kurtosis, or highlight it and hit the
Properties button:

Maximum

The standard maximum function is supported, generating MAX(expression) and
returning a value of type matching the type of the expression. When dragging a
Maximum into a variable, the following tree element is created:

Columns, and/or other non-aggregate expressions can be moved over the (empty) branch
of the tree.


The option to compute the maximum over distinct values only is provided, resulting in
the generation of MAX(DISTINCT expression). This option is enabled through the
Properties panel. Double-click on Maximum, or highlight it and hit the Properties
button:

Minimum

The standard minimum function is supported, generating MIN(expression) and returning
a value of type matching the type of the expression. When dragging a Minimum into a
variable, the following tree element is created:

Columns, and/or other non-aggregate expressions can be moved over the (empty) branch
of the tree.

The option to compute the minimum over distinct values only is provided, resulting in the
generation of MIN(DISTINCT expression). This option is enabled through the
Properties panel. Double-click on Minimum, or highlight it and hit the Properties
button:


Regression Intercept

The standard regression intercept function is supported, generating
REGR_INTERCEPT(dependent expression, independent expression) and returning a
value of type float. When dragging a Regression Intercept into a variable, the following
tree element is created:

Columns, and/or other non-aggregate expressions can be moved over the two (empty)
branches of the tree.

There are no special properties for the Regression Intercept function.

Regression R-Squared

The standard regression coefficient of determination or R-Squared function is supported,
generating REGR_R2(dependent expression, independent expression) and returning a
value of type float. When dragging a Regression R-Squared into a variable, the following
tree element is created:


Columns, and/or other non-aggregate expressions can be moved over the two (empty)
branches of the tree.

There are no special properties for the Regression R-Squared function.

Regression Slope

The standard regression slope function is supported, generating
REGR_SLOPE(dependent expression, independent expression) and returning a value of
type float. When dragging a Regression Slope into a variable, the following tree element
is created:

Columns, and/or other non-aggregate expressions can be moved over the two (empty)
branches of the tree.

There are no special properties for the Regression Slope function.

Skewness

An enhanced version of the standard skew function is supported, generating
SKEW(expression) and returning a value of type float. When dragging a Skewness into a
variable, the following tree element is created:

Columns, and/or other non-aggregate expressions can be moved over the (empty) branch
of the tree.

The enhancement consists of the ability to compute the skew of a date expression,
generating SKEW(date expression - DATE '1900-01-01').

The standard option to compute the skew over distinct values only is also provided,
resulting in the generation of SKEW(DISTINCT expression). This option is enabled
through the Properties panel. Double-click on Skewness, or highlight it and hit the
Properties button:

Standard Deviation

An enhanced version of the standard function for standard deviation is supported,
generating either STDDEV_SAMP(expression) for the sample standard deviation or
STDDEV_POP(expression) for the population standard deviation, while returning a
value of type float. When dragging a Standard Deviation into a variable, the following
tree element is created:

Columns, and/or other non-aggregate expressions can be moved over the (empty) branch
of the tree.

The enhancement consists of the ability to compute the standard deviation of a date
expression, generating for the sample version STDDEV_SAMP(date expression - DATE
'1900-01-01').

The standard option to compute the standard deviation over distinct values only is also
provided, resulting in the generation for the sample version of
STDDEV_SAMP(DISTINCT expression). Both this option as well as the options for
population and sample versions of standard deviation are enabled through the Properties
panel. Double-click on Standard Deviation, or highlight it and hit the Properties button:


Sum

The standard sum function is supported, generating SUM(expression). The type of the
resulting value depends on the type of the expression being summed. If the expression is
any of the integer types, the resulting value is of type integer. If the expression is a float
or character type, the resulting value is of type float. A decimal expression results in a
value of decimal type with 18 total digits and the same number of fractional digits
contained in the decimal expression. When dragging a Sum into a variable, the following
tree element is created:

Columns, and/or other non-aggregate expressions can be moved over the (empty) branch
of the tree.

The option to compute the sum over distinct values only is provided, resulting in the
generation of SUM(DISTINCT expression). This option is enabled through the
Properties panel. Double-click on Sum, or highlight it and hit the Properties button:


Variance

An enhanced version of the standard variance function is supported, generating either
VAR_SAMP(expression) for the sample variance or VAR_POP(expression) for the
population variance, while returning a value of type float. When dragging a Variance into
a variable, the following tree element is created:

Columns, and/or other non-aggregate expressions can be moved over the (empty) branch
of the tree.

The enhancement consists of the ability to compute the variance of a date expression,
generating for the sample version VAR_SAMP(date expression - DATE '1900-01-01').

The standard option to compute the variance over distinct values only is also provided,
resulting in the generation for the sample version of VAR_SAMP(DISTINCT
expression). Both this option and the options for population and sample versions of
variance are enabled through the Properties panel. Double-click on Variance, or
highlight it and hit the Properties button:


Arithmetic

Numeric functions can operate in general on any expression that can automatically be
converted to a numeric value. Character type operands are automatically converted to a
number of type float, if possible, before performing the numeric function. Additionally, the
standard and Teradata specific numeric operators are supported. Double-click on Arithmetic to view
the supported functions and operators:

Absolute Value

The standard absolute value function is supported, generating ABS(expression) and
returning a positive value of the same magnitude with type matching that of expression, or
float if expression is a character type. When dragging an Absolute Value into a variable,
the following tree element is created:

Columns, and/or other expressions can be moved over the (empty) branch of the tree.

There are no special properties for the Absolute Value function.

Add

The standard Add (+) operator is supported, generating expression + expression. Within
Teradata, this operator automatically converts numeric operands to the expected result
type before they are applied. Character type data is converted to FLOAT if possible
before being applied. Operands of type DATE are valid when adding an integer number
of days to a date expression. The resulting data types and other specific usage
information are documented in some detail in the Teradata documentation. When
dragging an Add into a variable, the following tree element is created:

Columns, and/or other expressions can be moved over the (empty) branches of the tree.

There are no special properties for the Add function.

Divide

The standard Divide (/) operator is supported, generating expression / expression. Within
Teradata, this operator automatically converts numeric operands to the expected result
type before they are applied. Character type data is converted to FLOAT if possible
before being applied. The resulting data types and other specific usage information are
documented in some detail in the Teradata documentation. When dragging a Divide into
a variable, the following tree element is created:

Columns, and/or other expressions can be moved over the (empty) branches of the tree.


A Teradata Warehouse Miner enhancement to the divide '/' operator is offered to
optionally request that divide-by-zero protection be provided. If this option is requested,
a NULLIF function is added in the denominator so that the overall expression evaluates
to NULL if the expression in the denominator evaluates to zero. This option is enabled
through the Properties panel. Double-click on Divide, or highlight it and hit the
Properties button:
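
With divide-by-zero protection enabled, the generated SQL wraps the denominator in NULLIF, for example (the column and table names here are hypothetical):

```sql
SELECT tot_revenue / NULLIF(tot_units, 0) AS avg_unit_price
FROM sales_table;
```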

Exponentiate

The Teradata specific exponentiate operator (expression raised to a value) is supported,
generating (expression ** value) and returning a value of type float. When dragging an
Exponentiate into a variable, the following tree element is created:

Columns, and/or other expressions can be moved over the (empty) branches of the tree.
Note that the second argument must resolve to a numeric literal.

There are no special properties for the Exponentiate function.
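
For example, squaring a column might generate the following (hypothetical names):

```sql
SELECT (monthly_income ** 2) AS income_squared
FROM customer_table;
```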

Logarithm


The standard base-10 logarithm function is supported, generating LOG(expression) and
returning a value of type float. When dragging a Logarithm into a variable, the following
tree element is created:

Columns, and/or other expressions can be moved over the (empty) branch of the tree.

There are no special properties for the Logarithm function.

Modulo

The Teradata specific implementation of the Modulo (MOD) operator is supported,
generating expression MOD expression. Within Teradata, this operator automatically
converts numeric operands to the expected result type before they are applied. Character
type data is converted to FLOAT if possible before being applied. The resulting data
types and other specific usage information are documented in some detail in the Teradata
documentation. When dragging a Modulo into a variable, the following tree element is
created:

Columns, and/or other expressions can be moved over the (empty) branches of the tree.

There are no special properties for the Modulo operator.

Multiply

The standard Multiply (*) operator is supported, generating expression * expression.
Within Teradata, this operator automatically converts numeric operands to the expected
result type before they are applied. Character type data is converted to FLOAT if possible
before being applied. The resulting data types and other specific usage information are
documented in some detail in the Teradata documentation. When dragging a Multiply
into a variable, the following tree element is created:


Columns, and/or other expressions can be moved over the (empty) branches of the tree.

There are no special properties for the Multiply function.

Natural Exponentiate

The standard natural exponentiate function (e to the power) is supported, generating
EXP(expression) and returning a value of type float. When dragging a Natural
Exponentiate into a variable, the following tree element is created:

Columns, and/or other expressions can be moved over the (empty) branch of the tree.

There are no special properties for the Natural Exponentiate function.

(Note that it may be advisable to use a Case statement in conjunction with this function if
extreme values in the data may occur, resulting in an overflow or SQL argument error.)

Natural Logarithm

The standard natural logarithm function is supported, generating LN(expression) and
returning a value of type float. When dragging a Natural Logarithm into a variable, the
following tree element is created:

Columns, and/or other expressions can be moved over the (empty) branch of the tree.

There are no special properties for the Natural Logarithm function.

(Note that it may be advisable to use a Case statement in conjunction with this function if
zero or negative values may occur in the data, resulting in a SQL argument error.)

Random

The random function is a non-standard Teradata SQL feature, generating RANDOM(x, y)
and returning a pseudo-random integer between x and y. When dragging a Random into a
variable, the following tree element is created:


The integers x (Lower Bound) and y (Upper Bound) are set through the Properties panel.
Double-click on Random, or highlight it and hit the Properties button:
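
With a Lower Bound of 1 and an Upper Bound of 100, the generated SQL might look like the following, which could serve to assign each row to a random bucket (hypothetical names):

```sql
SELECT cust_id, RANDOM(1, 100) AS sample_bucket
FROM customer_table;
```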

Square Root

The standard square root function is supported, generating SQRT(expression) and
returning a value of type float. When dragging a Square Root into a variable, the
following tree element is created:

Columns, and/or other expressions can be moved over the (empty) branch of the tree.
Note that expressions that resolve to a negative number will result in SQL errors.

There are no special properties for the Square Root function.

Subtract

The standard Subtract (-) operator is supported, generating expression - expression.
Within Teradata, this operator automatically converts numeric operands to the expected
result type before they are applied. Character type data is converted to FLOAT if possible
before being applied. Operands of type DATE are valid when subtracting an integer
number of days from a date expression. The resulting data types and other specific usage
information are documented in some detail in the Teradata documentation. When
dragging a Subtract into a variable, the following tree element is created:

Columns, and/or other expressions can be moved over the (empty) branches of the tree.

There are no special properties for the Subtract function.

Unary Minus

The standard Unary Minus (-) operator is supported, generating -expression. Within
Teradata, this operator automatically converts numeric operands to the expected result
type before they are applied. Character type data is converted to FLOAT if possible
before being applied. The resulting data types and other specific usage information are
documented in some detail in the Teradata documentation. When dragging a Unary
Minus into a variable, the following tree element is created:

Columns, and/or other expressions can be moved over the (empty) branch of the tree.

There are no special properties for the Unary Minus function.

Unary Plus

The standard Unary Plus (+) operator is supported, generating +expression. Within
Teradata, this operator automatically converts numeric operands to the expected result
type before they are applied. Character type data is converted to FLOAT if possible
before being applied. The resulting data types and other specific usage information are
documented in some detail in the Teradata documentation. When dragging a Unary Plus
into a variable, the following tree element is created:

Columns, and/or other expressions can be moved over the (empty) branch of the tree.


There are no special properties for the Unary Plus function.

Calendar

A Teradata Warehouse Miner specific function is provided for transforming a date
expression, or the date portion of a timestamp expression, into one of many fields based on
the Teradata system calendar. The type of the expression returned is always integer. Although
the built-in Teradata system calendar is not used to perform the function, the function mimics
the derivation of each of the fields in the Teradata system calendar. It uses some of the same
calculations used in the underlying system calendar views, but it also relies on Teradata date
arithmetic and the standard SQL EXTRACT function. Further, the Teradata Warehouse
Miner calendar function may be applied equally to a date or timestamp expression.
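
To illustrate the style of derivation involved, a field such as Day of Week can be expressed with Teradata date arithmetic. One plausible formulation, not necessarily the exact SQL generated, is (hypothetical names):

```sql
-- Assumes 1/1/1900, a Monday, yields day-of-week 1
SELECT ((open_date - DATE '1900-01-01') MOD 7) + 1 AS day_of_week
FROM account_table;
```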

Double-click on Calendar to see all of the supported functions:

Day of Calendar

The Day of Calendar function is supported, returning an integer 1-N, the number of Julian
days since 1/1/1900. When dragging a Day of Calendar into a variable, the following tree
element is created:

Columns, and/or other expressions that resolve to a date can be moved over the (empty)
branch of the tree.


There are no special properties for the Day of Calendar function.

Day of Month

The Day of Month function is supported, returning an integer 1-31, the number of the day
within a given month. When dragging a Day of Month into a variable, the following tree
element is created:

Columns, and/or other expressions that resolve to a date can be moved over the (empty)
branch of the tree.

There are no special properties for the Day of Month function.

Day of Week

The Day of Week function is supported, returning an integer 1-7, the number of the day
within a given week, assuming 1/1/1900 is a Monday. When dragging a Day of Week into a
variable, the following tree element is created:

Columns, and/or other expressions that resolve to a date can be moved over the (empty)
branch of the tree.

There are no special properties for the Day of Week function.

Day of Year

The Day of Year function is supported, returning an integer 1-366, the number of the day
within a given year. When dragging a Day of Year into a variable, the following tree
element is created:

Columns, and/or other expressions that resolve to a date can be moved over the (empty)
branch of the tree.

There are no special properties for the Day of Year function.


Month of Calendar

The Month of Calendar function is supported, returning an integer 1-N, the number of
months since 1/1/1900. When dragging a Month of Calendar into a variable, the
following tree element is created:

Columns, and/or other expressions that resolve to a date can be moved over the (empty)
branch of the tree.

There are no special properties for the Month of Calendar function.

Month of Quarter

The Month of Quarter function is supported, returning an integer 1-3, the number of the
month in a given quarter. When dragging a Month of Quarter into a variable, the
following tree element is created:

Columns, and/or other expressions that resolve to a date can be moved over the (empty)
branch of the tree.

There are no special properties for the Month of Quarter function.

Month of Year

The Month of Year function is supported, returning an integer 1-12, the number of the
month in a given year. When dragging a Month of Year into a variable, the following tree
element is created:

Columns, and/or other expressions that resolve to a date can be moved over the (empty)
branch of the tree.

There are no special properties for the Month of Year function.

Quarter of Calendar


The Quarter of Calendar function is supported, returning an integer 1-N, the number of
quarters since the first quarter of 1900. When dragging a Quarter of Calendar into a variable, the
following tree element is created:

Columns, and/or other expressions that resolve to a date can be moved over the (empty)
branch of the tree.

There are no special properties for the Quarter of Calendar function.

Quarter of Year

The Quarter of Year function is supported, returning an integer 1-4, the quarter of the
year where Jan-Mar=1, Apr-Jun=2, Jul-Sep=3, Oct-Dec=4. When dragging a Quarter of
Year into a variable, the following tree element is created:

Columns, and/or other expressions that resolve to a date can be moved over the (empty)
branch of the tree.

There are no special properties for the Quarter of Year function.

Week of Calendar

The Week of Calendar function is supported, returning an integer 0-N, where a partial
week at the beginning is numbered 0. When dragging a Week of Calendar into a variable, the following tree
element is created:

Columns, and/or other expressions that resolve to a date can be moved over the (empty)
branch of the tree.

There are no special properties for the Week of Calendar function.

Week of Month


The Week of Month function is supported, returning an integer 0-5, where a partial week
at the beginning of the month is numbered 0. When dragging a Week of Month into a variable, the
element is created:

Columns, and/or other expressions that resolve to a date can be moved over the (empty)
branch of the tree.

There are no special properties for the Week of Month function.

Week of Year

The Week of Year function is supported, returning an integer 0-53, where a partial week
at the beginning of the year is numbered 0. When dragging a Week of Year into a variable, the following tree element
is created:

Columns, and/or other expressions that resolve to a date can be moved over the (empty)
branch of the tree.

There are no special properties for the Week of Year function.

Weekday of Month

The Weekday of Month function is supported, returning an integer 1-5, the nth occurrence
of the day of the week within the month. When dragging a Weekday of Month into a
following tree element is created:

Columns, and/or other expressions that resolve to a date can be moved over the (empty)
branch of the tree.

There are no special properties for the Weekday of Month function.

Year of Calendar


The Year of Calendar function is supported, returning the year as an integer, assuming the
year starts on January 1st. When dragging a Year of Calendar into a variable, the following tree element
is created:

Columns, and/or other expressions that resolve to a date can be moved over the (empty)
branch of the tree.

There are no special properties for the Year of Calendar function.


Case

Both the standard valued and searched CASE expressions are supported. In both cases the
ELSE expression is optional, and if not specified, the expression will return NULL if no
WHEN conditions evaluate to TRUE. At least one WHEN/THEN condition is required with
both forms of CASE expression, and a test expression is also required with the valued form.

(Note that a CASE statement embedded within a CASE statement as a THEN or ELSE
expression is automatically enclosed in parentheses if it is not already so enclosed. This
makes it easier to achieve correct syntax when a nested CASE statement is needed.)

Double-click on Case to see all of the CASE-related functions:

Case - Searched

The standard searched CASE expression is supported. When dragging a Searched Case
into a variable, the following tree element is created:


The searched CASE statement is built up by supplying one or more conditions within the
Conditions folder. Each time a Condition is added, the following tree element is created:

Which evaluates to:

CASE WHEN condition/expression THEN expression END

Columns, and/or other expressions can be moved over the (empty) branches of the tree.

Note that the ELSE expression is optional, and if not specified, the expression will return
NULL if no WHEN conditions evaluate to TRUE. At least one WHEN/THEN condition
is required with the CASE expression.

There are no special properties for the Searched Case function.
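
As an example, a searched CASE built from two conditions and an ELSE might generate the following (hypothetical names):

```sql
SELECT CASE WHEN age < 18 THEN 'minor'
            WHEN age < 65 THEN 'adult'
            ELSE 'senior'
       END AS age_band
FROM customer_table;
```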

Case - Valued

The standard valued CASE expression is supported. When dragging a Valued Case into a
variable, the following tree element is created:

The valued CASE statement is built up by supplying one or more conditions within the
Conditions folder. Each time a Condition is added, the following tree element is created:

Which evaluates to:


CASE expression WHEN condition/expression THEN expression END

Columns, and/or other expressions can be moved over the (empty) branches of the tree.

Note that the ELSE expression is optional, and if not specified, the expression will return
NULL if no WHEN conditions evaluate to TRUE. At least one WHEN/THEN condition
is required with the CASE expression, and a test expression is also required with the
valued form.

There are no special properties for the Valued Case function.
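
As an example, a valued CASE testing a single expression against literal values might generate the following (hypothetical names):

```sql
SELECT CASE gender_cd WHEN 'M' THEN 1
                      WHEN 'F' THEN 2
                      ELSE 0
       END AS gender_num
FROM customer_table;
```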

Case Condition

For both Searched and Valued CASE statements, any number of conditions can be built
up. In order to do so, a Condition must first be dragged and dropped into the Conditions
folder of a Searched Case or Valued Case expression. Each condition results in an
expression of the form WHEN expression THEN expression. As an example, when
dragging a Condition into the Conditions folder of a Searched Case expression, the
following tree element is created:

Columns, and/or other expressions can be moved over the (empty) branches of the tree.

There are no special properties for the Condition function.

Coalesce

The standard COALESCE case expression is supported, generating SQL of the form
COALESCE(expression, …expression). It must be supplied with at least two arguments. The
entire COALESCE expression will automatically be enclosed in parentheses if it is not
part of an expression and not already enclosed in parentheses. When dragging a Coalesce
into a variable, the following tree element is created:

Multiple expressions can be built up within the Expressions folder.


Note that COALESCE can be used in place of the non-standard Teradata specific
command ZEROIFNULL. For example, COALESCE(column1, 0) is equivalent to
ZEROIFNULL(column1).

There are no special properties for the Coalesce function.

Null If

The standard NULLIF case expression is supported, generating SQL of the form
NULLIF(expression, expression). It must be supplied with exactly two arguments. The entire
NULLIF expression will automatically be enclosed in parentheses if it is not part of an
expression and not already enclosed in parentheses. When dragging a Null If into a
variable, the following tree element is created:

Columns, and/or other expressions can be moved over the (empty) branches of the tree.

Note that NULLIF can be used in place of the non-standard Teradata specific command
NULLIFZERO. For example, NULLIF(column1, 0) is equivalent to
NULLIFZERO(column1).

There are no special properties for the Null If function.
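A common use of NULLIF is to avoid a divide-by-zero error by converting a zero divisor to NULL, so that the division returns NULL rather than failing. For example (the column names shown are hypothetical):

"total_spend" / NULLIF("transaction_count", 0)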

Null If Zero

The non-standard Teradata specific NULLIFZERO case expression is supported, generating
SQL of the form NULLIFZERO(expression). It must be supplied with exactly one
argument. When dragging a Null If Zero into a variable, the following tree element is
created:

A column, and/or other expression can be moved over the (empty) branch of the tree.

Note that the Null If element, which generates the standard NULLIF command, can be
used in place of the Null If Zero element, which generates the non-standard Teradata
specific command NULLIFZERO. (In Teradata SQL, NULLIF(column1, 0) is equivalent
to NULLIFZERO(column1).)


There are no special properties for the Null If Zero function.

Zero If Null

The non-standard Teradata specific ZEROIFNULL case expression is supported, generating
SQL of the form ZEROIFNULL(expression). It must be supplied with exactly one
argument. When dragging a Zero If Null into a variable, the following tree element is
created:

A column, and/or other expression can be moved over the (empty) branch of the tree.

Note that the Coalesce element, which generates the standard COALESCE command, can
be used in place of the Zero If Null element, which generates the non-standard Teradata
specific command ZEROIFNULL. (In Teradata SQL, COALESCE(column1, 0) is
equivalent to ZEROIFNULL(column1).)

There are no special properties for the Zero If Null function.

Comparison
The standard comparison operators are supported, including equals (=), not equals (<>), less
than (<), less than or equals (<=), greater than (>) and greater than or equals (>=). Comparison
operators evaluate to a true or false condition which can be used in various contexts such as
case conditions. Double-click on Comparison to see all of the operators:

Equals

The Equals operator is supported, generating expression = expression. When dragging an
Equals operator into a variable, the following tree element is created:


Columns, and/or other expressions and literals can be moved over the (empty) branches
of the tree.

There are no special properties for the Equals operator.

Greater Than

The Greater Than operator is supported, generating expression > expression. When
dragging a Greater Than operator into a variable, the following tree element is created:

Columns, and/or other expressions and literals can be moved over the (empty) branches
of the tree.

There are no special properties for the Greater Than operator.

Greater Than or Equals

The Greater Than or Equals operator is supported, generating expression >= expression.
When dragging a Greater Than or Equals operator into a variable, the following tree
element is created:

Columns, and/or other expressions and literals can be moved over the (empty) branches
of the tree.

There are no special properties for the Greater Than or Equals operator.

Less Than

The Less Than operator is supported, generating expression < expression. When dragging
a Less Than operator into a variable, the following tree element is created:


Columns, and/or other expressions and literals can be moved over the (empty) branches
of the tree.

There are no special properties for the Less Than operator.

Less Than or Equals

The Less Than or Equals operator is supported, generating expression <= expression.
When dragging a Less Than or Equals operator into a variable, the following tree element
is created:

Columns, and/or other expressions and literals can be moved over the (empty) branches
of the tree.

There are no special properties for the Less Than or Equals operator.

Does Not Equal

The Does Not Equal operator is supported, generating expression <> expression. When
dragging a Does Not Equal operator into a variable, the following tree element is created:

Columns, and/or other expressions and literals can be moved over the (empty) branches
of the tree.

There are no special properties for the Does Not Equal operator.

Date and Time

Many different date and time functions and operators are offered to extract elements from a
date or time column as well as perform differences/elapsed calculations on multiple date and
time elements. Double-click on Date and Time to see all of the functions and operators:


Add Months

The non-standard Teradata specific Add Months function is supported, generating
ADD_MONTHS(date or timestamp expression, integer expression for months). The type
of the value returned is the same as the type of the date or timestamp expression that
months are added to or subtracted from. When dragging an Add Months function into a
variable, the following tree element is created:

Columns, and/or other expressions and/or literals that resolve to a date can be moved
over the first (empty) branch of the tree, while expressions and/or literals that resolve to
type integer can be moved over the second (empty) branch of the tree.

There are no special properties for the Add Months function.
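For example (the column name shown is hypothetical), the following expression yields the date six months prior to each row's purchase date; a positive second argument would instead move the date forward:

ADD_MONTHS("purchase_date", -6)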

Current Date

The Current Date literal represents the current system date. It generates the SQL keyword
CURRENT_DATE and is of type Date. When dragging a Current Date function into a
variable, the following tree element is created:


There are no child (empty) branches in the tree, as no arguments are required for the
Current Date function.

There are no special properties for the Current Date function.

Current Time

The Current Time literal represents the current system time and current session Time
Zone displacement. It generates the keyword CURRENT_TIME and is of type Time
With Time Zone. The feature allowing the specification of the number of digits of
precision for fractional seconds is not supported; no fractional digits are provided. When
dragging a Current Time function into a variable, the following tree element is created:

There are no child (empty) branches in the tree, as no arguments are required for the
Current Time function.

There are no special properties for the Current Time function.

Current Timestamp

The Current Timestamp literal represents the current system timestamp and current
session Time Zone displacement. It generates the keyword CURRENT_TIMESTAMP
and is of type Timestamp With Time Zone. The feature allowing the specification of the
number of digits of precision for fractional seconds is not supported; six digits are always
provided. When dragging a Current Timestamp function into a variable, the following
tree element is created:

There are no child (empty) branches in the tree, as no arguments are required for the
Current Timestamp function.

There are no special properties for the Current Timestamp function.

Date Difference

A Teradata Warehouse Miner specific function is provided for calculating the difference
between two Date and/or Timestamp expressions in various units. The integer measures
are calculated by expressing both dates in the requested units and then taking the integer
difference between the two (for example, the difference between April 1 and March 31 is
1 month). The fractional measures are in days converted to fractions of longer time
periods. Note that either Date or Timestamp expression may be a literal value, the built-in
function Current Date or Current Timestamp, or the analytic data set's target date value.
When dragging a Date Difference function into a variable, the following tree element is
created:

Columns, and/or other expressions/literals that resolve to a date or timestamp can be
moved over the (empty) branches of the tree.

Options to compute the difference in Days, Weeks, Months, Quarters or Years are set
through the Properties panel. If Weeks, Months, Quarters or Years are requested, the
units may be calculated in one of two different ways, as described on the Properties
panel. Double-click on Date Difference, or highlight it and hit the Properties button:

Date Field
Days
Calculate the date difference in integer days.
Weeks
Either calculate the difference in days and convert this to fractional weeks, or
express both dates in weeks and take the integer difference.
Months
Either calculate the difference in days and convert this to fractional months,
or express both dates in months and take the integer difference.


Quarters
Either calculate the difference in days and convert this to fractional quarters,
or express both dates in quarters and take the integer difference.
Years
Either calculate the difference in days and convert this to fractional years, or
express both dates in years and take the integer difference.
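To continue the April 1 versus March 31 example above in units of months: expressing both dates in months and taking the integer difference yields 1, while calculating the difference in days and converting to fractional months yields roughly 1/30, a value much closer to zero. (The exact conversion factor the generated SQL uses for fractional units is not shown here.)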

Elapsed Time

A Teradata Warehouse Miner specific function is provided for calculating, in various
units, the elapsed time from midnight represented by the time portion of a Timestamp or
Time expression. Time zones are ignored. By default the elapsed time is calculated in
units of seconds, but may alternatively be requested in minutes, hours or days (fraction of
a day). When dragging an Elapsed Time function into a variable, the following tree
element is created:

Columns, and/or other expressions/literals that resolve to a time or timestamp can be
moved over the (empty) branch of the tree.

Options to compute the elapsed time in seconds, minutes, hours or days (fraction of a
day) are set through the Properties panel. Double-click on Elapsed Time, or highlight it
and hit the Properties button:


Extract Day

The standard date/time field extract function is supported for Day, generating
EXTRACT(DAY FROM date/time expression). If this function is applied to a column or
expression of type other than Date or Timestamp, a SQL runtime error will occur. When
dragging an Extract Day function into a variable, the following tree element is created:

Columns, and/or other expressions/literals that resolve to a date or timestamp can be
moved over the (empty) branch of the tree.

The type of the value returned is integer. There are no special properties for the Extract
Day function.
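For example, the following expression evaluates to the integer 31:

EXTRACT(DAY FROM DATE '2003-12-31')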

Extract Hour

The standard date/time field extract function is supported for Hour, generating
EXTRACT(HOUR FROM date/time expression). If this function is applied to a column
or expression of type other than Time or Timestamp, a SQL runtime error will occur.
When dragging an Extract Hour function into a variable, the following tree element is
created:

Columns, and/or other expressions/literals that resolve to a time or timestamp can be
moved over the (empty) branch of the tree.

The type of the value returned is integer. There are no special properties for the Extract
Hour function.

Extract Minute

The standard date/time field extract function is supported for Minute, generating
EXTRACT(MINUTE FROM date/time expression). If this function is applied to a
column or expression of type other than Time or Timestamp, a SQL runtime error will
occur. When dragging an Extract Minute function into a variable, the following tree
element is created:


Columns, and/or other expressions/literals that resolve to a time or timestamp can be
moved over the (empty) branch of the tree.

The type of the value returned is integer. There are no special properties for the Extract
Minute function.

Extract Month

The standard date/time field extract function is supported for Month, generating
EXTRACT(MONTH FROM date/time expression). If this function is applied to a
column or expression of type other than Date or Timestamp, a SQL runtime error will
occur. When dragging an Extract Month function into a variable, the following tree
element is created:

Columns, and/or other expressions/literals that resolve to a date or timestamp can be
moved over the (empty) branch of the tree.

The type of the value returned is integer. There are no special properties for the Extract
Month function.

Extract Second

The standard date/time field extract function is supported for Second, generating
EXTRACT(SECOND FROM date/time expression). If this function is applied to a column or
expression of type other than Time or Timestamp, a SQL runtime error will occur. When
dragging an Extract Second function into a variable, the following tree element is created:

Columns, and/or other expressions/literals that resolve to a time or timestamp can be
moved over the (empty) branch of the tree.


The type of the value returned is integer if fractional seconds precision is 0, and
DECIMAL(8, n) if precision is n. There are no special properties for the Extract Second
function.

Extract Year

The standard date/time field extract function is supported for Year, generating
EXTRACT(YEAR FROM date/time expression). If this function is applied to a column
or expression of type other than Date or Timestamp, a SQL runtime error will occur.
When dragging an Extract Year function into a variable, the following tree element is
created:

Columns, and/or other expressions/literals that resolve to a date or timestamp can be
moved over the (empty) branch of the tree.

The type of the value returned is integer. There are no special properties for the Extract
Year function.

Time Difference

A Teradata Warehouse Miner specific function is provided for calculating the time
differences between two Time or Timestamp expressions in seconds, minutes, hours or
days (fraction of a day). The date portion of any Timestamp expression is, however,
ignored, so that the measure is strictly the difference between two time values, assumed
to be from the same day. All the measures are based on the difference measured in
seconds, with conversions to other larger units. Any differences in time zones are
ignored. When dragging a Time Difference function into a variable, the following tree
element is created:

Columns, and/or other expressions/literals that resolve to a time or timestamp can be
moved over the (empty) branches of the tree.

The choice of seconds, minutes, hours or days is made through the Properties panel.
Double-click on Time Difference, or highlight it and hit the Properties button:


Select Seconds, Minutes, Hours or Days from the Time Field pull-down.

Date/Time Difference

A Teradata Warehouse Miner specific function is provided for calculating the date/time
differences between two Timestamp columns in seconds, minutes, hours or days. Note
that this includes the day differences as well as the time differences. All the measures are
based on the difference measured in seconds, with conversions to other larger units. Any
differences in time zones are ignored. When dragging a Date/Time Difference function
into a variable, the following tree element is created:

Columns, and/or other expressions/literals that resolve to a time or timestamp can be
moved over the (empty) branches of the tree.

The choice of seconds, minutes, hours or days is made through the Properties panel.
Double-click on Date/Time Difference, or highlight it and hit the Properties button:


Select Seconds, Minutes, Hours or Days from the Time Field pull-down.

Literals

A number of SQL literal values may be used in SQL expressions that define created
variables. Double-click on Literals to see all of the literal operators:

Date

SQL Date literal values consist of the keyword DATE followed by a date enclosed in
single quotes with the format YYYY-MM-DD, such as DATE ‘2003-12-31’. When
dragging a literal Date into a variable, the following tree element is created:


A default date of January 1, 0001 is provided, but it can be changed via Properties.
Double-click on Date (1/1/0001) or highlight it and hit the Properties button:

Either the standard Windows Calendar control can be used to set the desired date by
clicking through the months in the calendar using the < and > buttons, or the desired date
can simply be typed in, specifying the month, day and year one at a time.

Null

The SQL Null literal represents an unknown value and is treated as having the type
Integer. It generates the SQL keyword NULL. When dragging a literal Null into a
variable, the following tree element is created:

There are no special properties for the literal Null.

Number

SQL numeric literal values of type BYTEINT, SMALLINT, INTEGER, FLOAT and
DECIMAL are supported. Care should be taken not to exceed the capacity of the type
(for example, specifying more than 18 decimal digits). When dragging a literal Number
into a variable, the following tree element is created:


A default value of integer 0 is provided, but it can be changed via Properties.
Double-click on Number (0) or highlight it and hit the Properties button:

Typing in an integer format number such as 1 results in a 1 being generated in the SQL,
while typing in a decimal format number such as 1.0 results in a 1.0000E0 being
generated in the SQL.

String

SQL String literal values consist of zero or more characters enclosed in single quotes
and are treated as being of type character varying, with length equal to the number of
characters enclosed in quotes. The feature allowing specification of the
character set is not supported. When dragging a literal String into a variable, the
following tree element is created:

No default string is provided; use Properties to set it. Double-click on String or
highlight it and hit the Properties button:


Type in any valid Teradata string literal.

(Note that the string literal will automatically be enclosed in quotes when SQL is
generated for the literal. If a single quote mark is included in the string literal, it will
automatically be "escaped" by doubling it. If however more than one quote mark is
entered, the value is placed in the SQL "as-is", without adding quote marks. This makes
it possible to enter a hexadecimal literal if desired, such as '00'XC.)
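For example, if the value O'Brien is entered (containing a single quote mark), the generated SQL literal is 'O''Brien', with the enclosing quotes added and the embedded quote doubled.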

Time

SQL Time literal values consist of the keyword TIME followed by a time enclosed in
single quotes with the format HH:MM:SS. Time zones and fractional seconds are not
supported. When dragging a literal Time into a variable, the following tree element is
created:

A default time of midnight is provided, but it can be changed via Properties. Double-click
on Time (00:00:00) or highlight it and hit the Properties button:


You can highlight the hours, minutes and seconds and type in the desired time.

Timestamp

SQL Timestamp literal values consist of the keyword TIMESTAMP followed by a
timestamp enclosed in single quotes with the format YYYY-MM-DD HH:MM:SS. Time
zones are not supported on SQL Timestamp Literals. When dragging a literal Timestamp
into a variable, the following tree element is created:

A default timestamp of the current date and time is provided, but it can be changed via
Properties. Double-click on Timestamp (CurrentDate CurrentTime) or highlight it and hit
the Properties button:


Either the standard Windows Calendar control can be used to set the desired date by
clicking through the months in the calendar using the < and > buttons, or the desired date
can simply be typed in. You can highlight the hours, minutes and seconds and type in the
desired time.

Target Date
A Target Date, as defined in INPUT-target date (described below), can be used in
variable creation. When no target date has been specified yet, the default value is the
current date. When dragging a literal Target Date into a variable, the following tree
element is created:

There are no special properties for the Target Date operator.

Logical

Logical predicates are used to form conditional expressions that evaluate to true or false in a
manner similar to comparison operators. Click on Logical to view a list of supported
operators:


All
The standard All predicate is supported with an expression list but not with a subquery.
It may be used with a comparison operator and with the In / Not In and Like / Not
Like predicates. When dragging an All operator into a variable, the following tree
element is created:

Any number of columns, and/or other expressions can be moved into the Expressions
folder within the tree.

There are no special properties for the All operator.

And

The logical operator AND is supported for use in conditional expressions, connecting
either comparison operators or logical predicates. When dragging an And operator into a
variable, the following tree element is created:


Columns, and/or other expressions can be moved into the (empty) branches of the tree.

There are no special properties for the And operator.

And All

The And All operator is a custom operator created for the convenience of connecting a
series of conditional expressions together with SQL And operators. When dragging an
And All operator into a variable, the following tree element is created:

Conditional expressions should be moved into the Expressions folder beneath the And
All node so that they will be connected with And operators. For example, if the
expressions "C1 = C2", "C3 = C4" and "C5 = C6" were moved into the Expressions
folder as three Equal nodes, the resulting SQL would be something like the expression
below. (Of course, the column names such as "C1" would be qualified with table aliases,
and the expression by itself is not valid as a variable, though it would be as a dimension.)

(("C1" = "C2") AND ("C3" = "C4")) AND ("C5" = "C6")

There are no special properties for the And All operator.

Any

The standard Any predicate is supported with an expression list but not with a subquery.
It may be used with a comparison operator and with the In / Not In and Like / Not Like
predicates. The following are some examples of the SQL generated for these cases.

expression = ANY (1, 2) is equivalent to:
expression = 1 OR expression = 2

expression IN ANY (1, 2) is equivalent to:
expression IN (1, 2) and to the above

expression LIKE ANY ('%string%', '%string%') is equivalent to:
expression LIKE ('%string%') OR expression LIKE ('%string%')


When dragging an Any operator into a variable, the following tree element is created:

Any number of columns, and/or other expressions can be moved into the Expressions
folder within the tree.

There are no special properties for the Any operator.

Between

The standard BETWEEN comparison predicate is supported. It generates SQL of the
form expression BETWEEN expression AND expression. The Between predicate
evaluates to true if the first expression is greater than or equal to the second expression at
the same time that it is less than or equal to the third expression. When dragging a
Between operator into a variable, the following tree element is created:

Columns, and/or other expressions can be moved into the (empty) branches of the tree,
the first argument being the expression to the left of the BETWEEN, the second to the
right, and the third to the right of the AND.

There are no special properties for the Between operator.
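For example (the column name shown is hypothetical), the following two expressions are logically equivalent:

"age" BETWEEN 18 AND 65
("age" >= 18) AND ("age" <= 65)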

In

The standard In predicate is supported with a single expression or a list of literal
constants, but not with subqueries. That is, it may be used to test whether or not an
expression equals another expression or is one of a list of values, but not if it is returned
from a query. The In predicate generates SQL of the form expression IN expression or
expression IN (literal, … literal). The use of the ANY predicate with the IN predicate is
optional. That is, IN (…), IN ANY (…) and = ANY (…) are all equivalent. When
dragging an In operator into a variable, the following tree element is created:


One or more literals or a single column or SQL element can be moved into the
Expressions folder within the tree. A column, expression or literal can be moved into the
(empty) branch of the tree.

There are no special properties for the In operator.
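For example (the column name and values shown are hypothetical), an In with three literals in its Expressions folder might generate SQL such as:

"state_code" IN ('CA', 'NY', 'TX')

This evaluates to true for any row whose state code matches one of the listed values.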

Is Null

The standard Is Null predicate is supported to test whether or not an expression has a
SQL NULL value, i.e. is undefined in a particular row. The generated SQL takes the
form expression IS NULL. When dragging an Is Null operator into a variable, the
following tree element is created:

Columns, and/or other expressions can be moved into the (empty) branches of the tree.

There are no special properties for the Is Null operator.

Is Not Null

The standard Is Not Null predicate is supported to test whether or not an expression has a
SQL NULL value, i.e. is undefined in a particular row. The generated SQL takes the
form expression IS NOT NULL. When dragging an Is Not Null operator into a variable,
the following tree element is created:

Columns, and/or other expressions can be moved into the (empty) branches of the tree.

There are no special properties for the Is Not Null operator.

Like

The standard Like predicate is supported with pattern expressions but not with
subqueries. It generates SQL of the form expression LIKE ANY / ALL pattern. The
percent (%) or underscore (_) characters can be used to allow searching for a pattern. The
percent character represents zero or more characters of any value, whereas the underscore
represents exactly one. An “escape” character may not be specified. Some examples
include:

expression LIKE ('%string%')
expression LIKE ('_string_')
expression LIKE ANY ('%string%', '%string%')

When dragging a Like operator into a variable, the following tree element is created:

Columns, and/or other expressions can be moved into the (empty) branches of the tree,
where the first argument is the expression to the left of the LIKE and the second to the
right.

There are no special properties for the Like operator.

Not

The logical operator NOT is supported for use in conditional expressions, negating
either a comparison operator or a logical predicate. When dragging a Not operator into a
variable, the following tree element is created:

Columns, and/or other expressions can be moved into the (empty) branch of the tree.

There are no special properties for the Not operator.

Not Between

The standard NOT BETWEEN comparison predicate is supported. It generates SQL of
the form expression NOT BETWEEN expression AND expression. The Not Between
predicate is the logical opposite of the BETWEEN predicate. It tests that the first
expression is less than the second expression, or greater than the third expression. When
dragging a Not Between operator into a variable, the following tree element is created:


Columns, and/or other expressions can be moved into the (empty) branches of the tree,
the first argument being the expression to the left of the BETWEEN, the second to the
right, and the third to the right of the AND.

There are no special properties for the Not Between operator.

Not In

The standard Not In predicate is supported with a single expression or a list of literal
constants, but not with subqueries. That is, it may be used to test whether or not an
expression equals another expression or is one of a list of values, but not if it is returned
from a query. The Not In predicate generates SQL of the form expression NOT IN
expression or expression NOT IN (literal, … literal). The use of the ALL predicate with
the Not In predicate is optional, provided the single expression form is not used. That is,
NOT IN (…), NOT IN ALL (…) and <> ALL (…) are equivalent. When dragging a Not
In operator into a variable, the following tree element is created:

One or more literals or a single column or SQL element can be moved into the
Expressions folder within the tree. A column, expression or literal can be moved into the
(empty) branch of the tree.

There are no special properties for the Not In operator.

Not Like

The standard NOT LIKE predicate is supported with pattern expressions but not with
subqueries. It generates SQL of the form expression NOT LIKE ANY / ALL pattern. The
percent (%) or underscore (_) characters can be used to allow searching for a pattern. The
percent character represents zero or more characters of any value, whereas the underscore
represents exactly one. An “escape” character may not be specified. Some examples
include:

expression NOT LIKE ('%string%')
expression NOT LIKE ('_string_')
expression NOT LIKE ANY ('%string%', '%string%')


When dragging a Not Like operator into a variable, the following tree element is created:

Columns, and/or other expressions can be moved into the (empty) branches of the tree,
where the first argument is the expression to the left of the NOT LIKE and the second to
the right.

There are no special properties for the Not Like operator.

Or

The logical operator OR is supported for use in conditional expressions, connecting either
comparison operators or logical predicates. When dragging an Or operator into a variable,
the following tree element is created:

Columns, and/or other expressions can be moved into the (empty) branches of the tree,
where the first argument is to the left of the OR, the second to the right.

There are no special properties for the Or operator.

Or All

The Or All operator is a custom operator created for the convenience of connecting a
series of conditional expressions together with SQL Or operators. When dragging an Or
All operator into a variable, the following tree element is created:

Conditional expressions should be moved into the Expressions folder beneath the Or All
node so that they will be connected with Or operators. For example, if the expressions
"C1 = C2", "C3 = C4" and "C5 = C6" were moved into the Expressions folder as three
Equal nodes, the resulting SQL would be something like the expression below. (Of
course, the column names such as "C1" would be qualified with table aliases, and the
expression by itself is not valid as a variable, though it would be as a dimension.)

(("C1" = "C2") OR ("C3" = "C4")) OR ("C5" = "C6")

There are no special properties for the Or All operator.

Ordered Analytical Functions

The following Ordered Analytical Functions are available in Variable Creation. Double-click
on Ordered Analytical to see:

Ordered Analytical Functions (previously known as OLAP or On Line Analytical Processing
functions) are distinguished from other SQL functions in that they order the data being
operated on before computing the function value, at times making use of "adjacent"
observations. Most of the functions are standard SQL functions with a common form, but a
few are non-standard Teradata specific functions included because they have no equivalent in
some or all Teradata releases. (These functions may not be mixed in the same analysis with
aggregation functions such as average, and the PARTITION BY clause is not supported in these
functions because they use the GROUP BY clause to perform partitioning.) Some functions also
contain Teradata Warehouse Miner specific enhancements, as noted in the individual function
descriptions below.

All of the standard ordered analytical functions consist of a value expression enclosed in
parentheses and an OVER construct composed of an optional PARTITION BY clause, an
ORDER BY clause (required with all but group style aggregation) and possibly a ROWS
clause (depending on the function), all within parentheses. The PARTITION BY clause is
something like the GROUP BY clause in a simple aggregation, partitioning the rows into
groups over which the function is separately applied. The PARTITION BY clause effectively
causes the function to "start over" for each partitioned group of rows. An example of an
ordered analytical function containing these components is given below.

AVG(sales) OVER (PARTITION BY territory
ORDER BY month
ROWS 2 PRECEDING)

The traditional aggregate functions AVG, COUNT, MIN, MAX and SUM have ordered
versions that take on different styles depending on the ROWS clause that is used. The
variations available for these functions are Cumulative, Group, Moving and Remaining, as
outlined below. The RANK function and related functions PERCENT_RANK and
QUANTILE do not offer the ROWS options. Note that not all variations are available with
Teradata V2R4.1, as noted in the individual function descriptions that follow.

Rows options corresponding to a cumulative style aggregation include:

• ROWS UNBOUNDED PRECEDING
• ROWS BETWEEN UNBOUNDED PRECEDING AND value PRECEDING
• ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
• ROWS BETWEEN UNBOUNDED PRECEDING AND value FOLLOWING

Rows options corresponding to a group style aggregation include:

• ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING

Rows options corresponding to a moving style aggregation include:

• ROWS value PRECEDING
• ROWS CURRENT ROW
• ROWS BETWEEN value PRECEDING AND value PRECEDING
• ROWS BETWEEN value PRECEDING AND CURRENT ROW
• ROWS BETWEEN value PRECEDING AND value FOLLOWING
• ROWS BETWEEN CURRENT ROW AND CURRENT ROW
• ROWS BETWEEN CURRENT ROW AND value FOLLOWING
• ROWS BETWEEN value FOLLOWING AND value FOLLOWING

Rows options corresponding to a remaining style aggregation include:

• ROWS BETWEEN value PRECEDING AND UNBOUNDED FOLLOWING
• ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
• ROWS BETWEEN value FOLLOWING AND UNBOUNDED FOLLOWING
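
As an example of these options, a moving sum over the current row and the two
preceding rows, computed separately for each territory, might be specified as
follows (the column names here are illustrative):

SUM(sales) OVER (PARTITION BY territory
ORDER BY month
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)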

Note that Ordered Expression is not an Ordered Analytical Function but simply a means of
specifying a sort direction (ascending or descending) on a sort expression. Note also that
Ordered Analytical Functions are not allowed in a Dimension value, Dimensioned variable,
Where clause or Having clause.

Moving Difference

Given one or more columns/expressions, along with a width and sort expression list, this
Ordered Analytical Function derives a new column for each expression giving the
moving difference of the expression when the rows are sorted by the sort expression list.
The moving difference is calculated as the difference between the current value and the
Nth previous value of the expression, where N equals the width. The moving difference is
NULL if there is no Nth preceding row in the table or group.

In Teradata V2R4.1, this function is implemented using an enhanced version of MDIFF,
a non-standard Teradata specific function. MDIFF may not be mixed in the same analysis
with aggregation functions such as average, and partitioning is not supported. The SQL
generated takes the form MDIFF(expression, width, sort expression list). The
enhancement is the ability to compute the moving difference of a date expression,
generating MDIFF(expression - DATE '1900-01-01', width, sort expression list).
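
For example, with monthly rows sorted by a month column, a width of 12 yields a
year-over-year difference. The generated SQL might look like the following (the
column names are illustrative):

MDIFF(sales, 12, sales_month)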

In Teradata V2R5.0 and later releases an equivalent version of Moving Difference is
generated using the standard ordered analytical function SUM. In this case the Partition
By option is available, as with other standard ordered analytical functions. Note that the
non-standard expression MDIFF(expression, width, sort expression list) is the same as:

expression - SUM(expression)
OVER (ORDER BY sort expression list ROWS BETWEEN width
PRECEDING AND width PRECEDING)

When dragging a Moving Difference function into a variable, the following tree element
is created:

Sort expressions can be built up in the Sort Expressions folder, and if the system is
V2R5.0 or later, Partition Columns can be built up in that folder (with V2R4.1 systems,
Partition Columns are ignored). Columns, and/or other expressions can be moved into the
(empty) branch of the tree. The Width is specified by the Properties panel. Double click
on Moving Difference, or highlight it and click on the Properties button:

A default width of 1 is given and can be updated here.

Moving Linear Regression

The non-standard Teradata specific moving linear regression function is supported,
generating MLINREG(expression, width, sort expression). The function will work
without any special enhancement when applied to a date expression. MLINREG may not
be mixed in the same analysis with aggregation functions such as average, and
partitioning is not supported.

Given a single expression, width, and sort expression, this Ordered Analytical Function
derives a new column giving the moving linear regression extrapolation of the expression
over "width" rows when sorted by the sort expression, using the sort expression as the
independent variable. The current and "width-1" rows after sorting are used to calculate
the simple least squares linear regression. For rows that have fewer than "width-1" rows
preceding them in the table or group, the function is computed using all preceding rows. The
first two rows in the table or group, however, will have the NULL value.

As an example, moving linear regression predicting y based on x over w rows looks like:

MLINREG(y, w, x)

When dragging a Moving Linear Regression function into a variable, the following tree
element is created:

A single sort expression should be placed in the Sort Expressions folder, and a column
or expression should be moved into the (empty) branch of the tree. The Width is
specified by the Properties panel. Double click on Moving Linear Regression, or
highlight it and click on the Properties button:

A default width of 3 is given and can be updated here.

Ordered Expression

An Ordered Expression can be used in a Sort Expressions folder with any of the Ordered
Analytical Functions to specify a sort direction, either ascending or descending. The
appropriate SQL keyword, either ASC or DESC, is automatically added to the SQL
generated for the expression placed under the Ordered Expression node in the tree. If an
Ordered Expression is not used, the default sort direction is given, depending on the
Ordered Analytical Function in use. An example of an Ordered Expression in a Sort
Expressions folder is given below.

In order to set the sort direction the user must either highlight the Ordered Expression
node and click on the Properties button, or double-click on the Ordered Expression node
to receive the following Properties panel. Clicking on the OK button will cause the
selected sort order to be used.

Percent Rank

Given a sort expression list, this Ordered Analytical Function derives a new column
which assumes a value between 0 and 1 indicating the rank of the rows as a percentage of
rows when sorted by the sort expression list. The formula used for PERCENT_RANK is
(R – 1) / (N – 1) where R is the rank of the row and N is the number of rows overall or in
the partition.

As with the RANK function, when the column or expression has the same value for
multiple rows (say M rows), they are all assigned the same percent rank, while the
following M-1 percent rank values are not assigned. When an optional Partition By
clause is specified, the percent ranks are computed separately over the rows in each
partition. (Note from the formula used for PERCENT_RANK that if there is only one
row to be ranked in the table or partition, division by zero will result and give a numeric
overflow error.) Rows options are not available with the Percent Rank function.
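
For example, the percent rank of customers by total spending within each region
might be expressed as follows (the column names are illustrative):

PERCENT_RANK() OVER (PARTITION BY region ORDER BY total_spend)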

A Teradata Warehouse Miner enhancement to the Percent Rank function is offered to
optionally request that NULL values in any element of the sort expression list cause the
row to be excluded in the ranking process. When dragging a Percent Rank function into a
variable, the following tree element is created:

Sort expressions can be built up in the Sort Expressions folder, and Partition Columns
can be built up in that folder. The enhancement to the Percent Rank function to optionally
request that NULL values in any element of the sort expression list cause the row to be
excluded in the ranking process is enabled through the Properties Panel. Double click on
Percent Rank, or highlight it and click on the Properties button:

The default is to Include NULL values in the analysis, but that can be disabled here.

Quantile

Given a sort expression list and the number of quantile partitions, this Ordered Analytical
Function derives a new column giving the quantile partition that each row belongs to
based on the sort expression list and the requested number of quantile partitions. When an
optional Partition By clause is specified, the quantile partitions are computed separately
over the rows in each partition. Rows options are not available with the Quantile
function. Although there is a non-standard Teradata specific command QUANTILE, the
function is implemented in Variable Creation using the standard RANK and COUNT
functions.
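
As an illustration of this technique, deciles over an income column might be
computed with SQL of roughly the following form, combining RANK and COUNT as
described (this is a sketch using an illustrative column name, not necessarily
the exact SQL generated):

(RANK() OVER (ORDER BY income) - 1) * 10 / COUNT(*) OVER ()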

A Teradata Warehouse Miner enhancement to the Quantile function is offered to
optionally request that NULL values in any element of the sort expression list cause the
row to be excluded in the ranking process. When dragging a Quantile function into a
variable, the following tree element is created:

Sort expressions can be built up in the Sort Expressions folder, and Partition Columns
can be built up in that folder. The enhancement to the Quantile function to optionally
request that NULL values in any element of the sort expression list cause the row to be
excluded in the ranking process, as well as the number of partitions, are both
enabled through the Properties Panel. Double click on Quantile, or highlight it and
click on the Properties button:

The default number of Partitions is 0, but can be changed here. Additionally, the default
is to include NULL values in the analysis, but that can be disabled here.

Rank

Given a sort expression list, this Ordered Analytical Function derives a new column
indicating the rank of the rows when sorted by the specified sort expression list. When
the column or expression has the same value for multiple rows (say M rows), they are all
assigned the same rank, while the following M-1 rank values are not assigned. For
example, column values 3,3,3,2,1 could be assigned rank values of 1,1,1,4,5. When an
optional Partition By clause is specified, the ranks are determined separately over the
rows in each partition (the ranking process is reset for each new partition). Rows options
are not available with the Rank function.
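
For example, ranking each region's customers by total spending in descending
order might be expressed as follows (the column names are illustrative):

RANK() OVER (PARTITION BY region ORDER BY total_spend DESC)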

A Teradata Warehouse Miner enhancement to the Rank function is offered to optionally
request that NULL values in any element of the sort expression list cause the row to be
excluded in the ranking process. When dragging a Rank function into a variable, the
following tree element is created:

Sort expressions can be built up in the Sort Expressions folder, and Partition Columns can
be built up in that folder. The enhancement to the Rank function to optionally request that
NULL values in any element of the sort expression list cause the row to be excluded in
the ranking process is enabled through the Properties Panel. Double click on Rank,
or highlight it and click on the Properties button:

The default is to Include NULL values in the analysis, but that can be disabled here.

Windowed Average

Cumulative, Group, Moving or Remaining Average are supported within the Windowed
Average function. Given a value expression, a width and a sort expression list, this
function derives a new column giving the cumulative, group, moving or remaining
average of the value expression over "width" rows when sorted by the sort expression
list. For rows that have fewer than "width-1" rows preceding them in the table or group, the
function is computed using all preceding rows. When an optional Partition By clause is
specified, the averages are computed separately over the rows in each partition. Any of
the Rows options may be used to determine the type of average to compute. Note that in
Teradata V2R4.1 only the moving average is available with the "ROWS value
PRECEDING" option. When dragging a Windowed Average function into a variable, the
following tree element is created:

Sort expressions can be built up in the Sort Expressions folder, and Partition Columns
can be built up in that folder. The options to perform a Cumulative, Group, Moving or
Remaining Average, and their associated options, are enabled through the Properties Panel.
Double click on Windowed Average, or highlight it and click on the Properties button:

These options are defined below for each of the four types of Windowed Averages:

1. Aggregation Style: Cumulative


Second Row Style: None, or
Current Row, or
Value Preceding, or
Value Following.
(If Value Preceding/Following)
Second Value: 0-n

2. Aggregation Style: Group

3. Aggregation Style: Moving


First Row Style: Current Row, or
Value Preceding, or
Value Following.
(If Value Preceding/Following)


First Value: 0-n
Second Row Style: None, or
Current Row, or
Value Preceding, or
Value Following.
(If Value Preceding/Following)
Second Value: 0-n

4. Aggregation Style: Remaining


First Row Style: Current Row, or
Value Preceding, or
Value Following.
(If Value Preceding/Following)
First Value: 0-n
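
As an example of the Cumulative style, a running average of sales by month within
each territory might generate SQL like the following (the column names are
illustrative):

AVG(sales) OVER (PARTITION BY territory
ORDER BY month
ROWS UNBOUNDED PRECEDING)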

Windowed Count

Cumulative, Group, Moving and Remaining Count are supported within the Windowed
Count function. This function derives a new column giving the cumulative, group,
moving or remaining count of the number of rows or rows with non-null values of a
value expression, when rows are sorted by a sort expression list. When an optional
Partition By clause is specified, the counts are accumulated only over the rows in each
partition (the start of a partition resets the accumulated count to 0). With Teradata V2R5
and later releases, any of the Rows options may be used to determine the type of count to
compute. In V2R4.1 only the Group option with no Sort Expression and "ROWS
BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING" clause
may be used with the COUNT function. When dragging a Windowed Count function into
a variable, the following tree element is created:

Sort expressions can be built up in the Sort Expressions folder, if the system is V2R5.0 or
later, and Partition Columns can be built up in that folder. By default a windowed
COUNT(*) is done, but another expression can be built up in its place. The options to
perform a Cumulative, Group, Moving or Remaining Count, and their associated options,
are enabled through the Properties Panel. Double click on Windowed Count, or highlight it
and click on the Properties button:

These options are defined below for each of the four types of Windowed Count:

1. Aggregation Style: Cumulative


Second Row Style: None, or
Current Row, or
Value Preceding, or
Value Following.
(If Value Preceding/Following)
Second Value: 0-n

2. Aggregation Style: Group

3. Aggregation Style: Moving


First Row Style: Current Row, or
Value Preceding, or
Value Following.
(If Value Preceding/Following)
First Value: 0-n
Second Row Style: None, or
Current Row, or
Value Preceding, or
Value Following.
(If Value Preceding/Following)
Second Value: 0-n

4. Aggregation Style: Remaining


First Row Style: Current Row, or
Value Preceding, or
Value Following.
(If Value Preceding/Following)
First Value: 0-n
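
As an example of the Cumulative style, a running count of rows with non-null
purchase amounts, ordered by date, might generate SQL like the following (the
column names are illustrative):

COUNT(purchase_amt) OVER (ORDER BY purchase_date
ROWS UNBOUNDED PRECEDING)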

Windowed Maximum

Cumulative, Group, Moving and Remaining Maximum are supported within the
Windowed Maximum function. This function, available only with Teradata V2R5 and
later releases, derives a new column containing the maximum value of a column or
expression. When an optional Partition By clause is specified, the maximum
values are determined over the rows in each partition. Any of the Rows
options may be used with this function. When dragging a Windowed Maximum function
into a variable, the following tree element is created:

Sort expressions can be built up in the Sort Expressions folder, and Partition Columns
can be built up in that folder. The options to perform a Cumulative, Group, Moving or
Remaining Maximum, and their associated options, are enabled through the Properties
Panel. Double click on Windowed Maximum, or highlight it and click on the Properties
button:

These options are defined below for each of the four types of Windowed Maximum:

1. Aggregation Style: Cumulative


Second Row Style: None, or
Current Row, or
Value Preceding, or
Value Following.
(If Value Preceding/Following)
Second Value: 0-n

2. Aggregation Style: Group

3. Aggregation Style: Moving


First Row Style: Current Row, or
Value Preceding, or
Value Following.
(If Value Preceding/Following)
First Value: 0-n
Second Row Style: None, or
Current Row, or
Value Preceding, or
Value Following.
(If Value Preceding/Following)
Second Value: 0-n

4. Aggregation Style: Remaining


First Row Style: Current Row, or
Value Preceding, or
Value Following.
(If Value Preceding/Following)
First Value: 0-n

Windowed Minimum

Cumulative, Group, Moving and Remaining Minimum are supported within the
Windowed Minimum function. This function, available only with Teradata V2R5 and
later releases, derives a new column containing the minimum value of a column or
expression. When an optional Partition By clause is specified, the minimum
values are determined over the rows in each partition. Any of the Rows
options may be used with this function. When dragging a Windowed Minimum function
into a variable, the following tree element is created:

Sort expressions can be built up in the Sort Expressions folder, and Partition Columns
can be built up in that folder. The options to perform a Cumulative, Group, Moving or
Remaining Minimum, and their associated options, are enabled through the Properties
Panel. Double click on Windowed Minimum, or highlight it and click on the Properties
button:

These options are defined below for each of the four types of Windowed Minimum:

1. Aggregation Style: Cumulative


Second Row Style: None, or
Current Row, or
Value Preceding, or
Value Following.
(If Value Preceding/Following)
Second Value: 0-n

2. Aggregation Style: Group

3. Aggregation Style: Moving


First Row Style: Current Row, or
Value Preceding, or
Value Following.
(If Value Preceding/Following)
First Value: 0-n
Second Row Style: None, or
Current Row, or
Value Preceding, or
Value Following.
(If Value Preceding/Following)
Second Value: 0-n

4. Aggregation Style: Remaining


First Row Style: Current Row, or
Value Preceding, or
Value Following.
(If Value Preceding/Following)
First Value: 0-n

Windowed Sum

Cumulative, Group, Moving and Remaining Sum are supported within the Windowed
Sum function. This function derives a new column for a value expression giving the
cumulative, group, moving or remaining sum of the value expression when sorted by a
sort expression list. When an optional Partition By clause is specified, the sums are
accumulated only over the rows in each partition (the start of a partition resets the
accumulated sum to 0). Any of the Rows options may be used with Teradata V2R5 to
determine the type of sum to compute. With Teradata V2R4.1, only Cumulative--Rows
Unbounded Preceding, Group--Between Rows Unbounded Preceding and Unbounded
Following, and Moving--Rows Value Preceding are supported. When dragging a
Windowed Sum function into a variable, the following tree element is created:

Sort expressions can be built up in the Sort Expressions folder, and Partition Columns
can be built up in that folder. The options to perform a Cumulative, Group, Moving or
Remaining Sum, and their associated options, are enabled through the Properties Panel.
Double click on Windowed Sum, or highlight it and click on the Properties button:

These options are defined below for each of the four types of Windowed Sum:

1. Aggregation Style: Cumulative


Second Row Style: None, or
Current Row, or
Value Preceding, or
Value Following.
(If Value Preceding/Following)
Second Value: 0-n

2. Aggregation Style: Group

3. Aggregation Style: Moving


First Row Style: Current Row, or
Value Preceding, or
Value Following.
(If Value Preceding/Following)
First Value: 0-n
Second Row Style: None, or
Current Row, or
Value Preceding, or
Value Following.
(If Value Preceding/Following)
Second Value: 0-n

4. Aggregation Style: Remaining


First Row Style: Current Row, or
Value Preceding, or
Value Following.
(If Value Preceding/Following)
First Value: 0-n
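
As an example of the Remaining style, the sum of sales from the current month
through the end of each territory's rows might generate SQL like the following
(the column names are illustrative):

SUM(sales) OVER (PARTITION BY territory
ORDER BY month
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)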

String Functions

The following standard string functions are available in Variable Creation. Double Click on
String to see:

Character Length

The standard character length function is supported for determining the length of variable
character data. (When used with fixed length character data, the defined column length is
always returned.) When dragging a Character Length operator into a variable, the
following tree element is created:

A column and/or expression for which to get the character length can be moved into the
(empty) branch of the tree.

When used in conjunction with the Trim function, the Character Length function can
also be used to determine the length of fixed character length data by first trimming pad
characters, as in the following.
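
For example, the combined SQL might look like the following, where the column
name is illustrative:

CHARACTER_LENGTH(TRIM(account_type))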

Concatenate

The standard concatenate operator is supported for joining two character expressions
together, generating the SQL expression1 || expression2. Numeric or date expressions are
converted to characters before concatenating. The resulting type, one of the character data
types, depends on the type of the expressions, as described in the Teradata
documentation. When dragging a Concatenate operator into a variable, the following tree
element is created:

Columns, and/or other expressions can be moved into the (empty) branches of the tree,
where the first argument is to the left of the concatenation operator, the second to the
right.

There are no special properties for the Concatenate operator.
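
For example, joining city and state columns with a literal separator might be
expressed as follows (the column names are illustrative):

city || ', ' || state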

Lower

The standard lower case function is supported for converting all characters in an
expression to lower case. It is valid only if the expression evaluates to a character data
type with the LATIN character set. The SQL generated is LOWER(expression) and the
type returned is that of the expression. When dragging a Lower operator into a variable,
the following tree element is created:

Columns, and/or other expressions can be moved into the (empty) branch of the tree.

There are no special properties for the Lower operator.

Position

The standard string position function is supported for determining the position of a
substring within a string. The SQL generated is POSITION(expression1 IN expression2)
where expression1 is the substring and expression2 is the string. The two string
expressions must both evaluate to a character, numeric or date type. Numeric or date
expressions are converted to characters before evaluating. The type returned is integer.
The position returned is the logical position, not the byte position. The first position in a
string is treated as 1 and 0 is returned when the substring is not in the string. When
dragging a Position operator into a variable, the following tree element is created:

Columns, and/or other expressions can be moved into the (empty) branches of the tree
where the first argument is expression1 as indicated above, and the second expression2.

There are no special properties for the Position operator.
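
For example, finding the position of the '@' character in an e-mail address
column might be expressed as follows (the column name is illustrative):

POSITION('@' IN email_address)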

Substring

The standard substring function is supported for extracting a portion of a string based on
a position value and optional length. The SQL generated is SUBSTRING(expression
FROM position FOR length). The expression to take a substring from may be of a
character, numeric or date type, with a numeric or date expression being automatically
converted to a character expression before taking the substring. The first position in the
string is 1, and if length is not specified it means "until the end of the string". The type
returned is VARCHAR. When dragging a Substring operator into a variable, the
following tree element is created:

Columns, and/or other expressions can be moved into the (empty) branch of the tree. The
starting position and length of the substring are specified in the Properties panel. Double
click on Substring, or highlight it and click on the Properties button:

Trim

The semi-standard trim function is supported for removing leading and/or trailing
characters or bytes matching pad characters or a specified character from a character
string. (The ability to specify a character set for the expression is however not supported.)
The SQL generated may take one of these forms:

TRIM(expression)
TRIM(LEADING/TRAILING/BOTH FROM expression)
TRIM(LEADING/TRAILING/BOTH char FROM expression)

The expression to trim may be of a character, numeric, date or byte type, with a numeric
or date expression being automatically converted to a character expression before
trimming. The type returned is VARCHAR (or VARBYTE for byte data). When
dragging a Trim operator into a variable, the following tree element is created:

Columns, and/or other expressions can be moved into the (empty) branch of the tree. The
value to trim and the type of trimming are specified in the Properties panel. Double click
on Trim, or highlight it and click on the Properties button:

Valid Trim Styles are (Default), Leading, Trailing or Both. If (Default) is specified, both
leading and trailing pad characters (or null bytes for byte type data) are trimmed. Any
type of character can be specified to be trimmed in Value to Trim.

(Note that the value to trim will automatically be enclosed in quotes when SQL is
generated for the value. If a single quote mark is included in the value, it will
automatically be "escaped" by doubling it. If however more than one quote mark is
entered, the value is placed in the SQL "as-is", without adding quote marks. This makes
it possible to enter a hexadecimal literal if desired, such as '00'XC.)
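
For example, removing leading zeroes from an account number column might be
expressed as follows (the column name is illustrative):

TRIM(LEADING '0' FROM account_number)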

Upper

The standard upper case function is supported for converting all characters in an
expression to upper case. It is valid only if the expression evaluates to a character data
type. The SQL generated is UPPER(expression) and the type returned is that of the
expression. When dragging an Upper operator into a variable, the following tree element
is created:

Columns, and/or other expressions can be moved into the (empty) branch of the tree.

There are no special properties for the Upper operator.

Trigonometric

The following trigonometric functions are available in Variable Creation. Double click on
Trigonometric to display:

Arccosine

The standard inverse trigonometric arccosine function is supported, generating
ACOS(expression) and returning a value of type float. When dragging an Arccosine
operator into a variable, the following tree element is created:

Columns, and/or other expressions can be moved into the (empty) branch of the tree.
There are no special properties for the Arccosine operator.

Arcsine

The standard inverse trigonometric arcsine function is supported, generating
ASIN(expression) and returning a value of type float. When dragging an Arcsine operator
into a variable, the following tree element is created:

Columns, and/or other expressions can be moved into the (empty) branch of the tree.
There are no special properties for the Arcsine operator.

Arctangent

The standard inverse trigonometric arctangent function is supported, generating
ATAN(expression) and returning a value of type float. When dragging an Arctangent
operator into a variable, the following tree element is created:

Columns, and/or other expressions can be moved into the (empty) branch of the tree.
There are no special properties for the Arctangent operator.

Arctangent XY

The standard inverse trigonometric arctangent function for x and y coordinates is
supported, generating ATAN2(x, y) and returning a value of type float. When dragging
an Arctangent XY operator into a variable, the following tree element is created:

Columns, and/or other expressions can be moved into the (empty) branches of the tree,
where the first argument is the x coordinate and the second the y coordinate.
There are no special properties for the Arctangent XY operator.

Cosine

The standard trigonometric cosine function is supported, generating COS(expression) and
returning a value of type float. When dragging a Cosine operator into a variable, the
following tree element is created:

Columns and/or other expressions can be moved into the (empty) branch of the tree.
There are no special properties for the Cosine operator.

Hyperbolic Arccosine

The standard inverse hyperbolic cosine function is supported, generating
ACOSH(expression) and returning a value of type float. When dragging a Hyperbolic
Arccosine operator into a variable, the following tree element is created:

Columns and/or other expressions can be moved into the (empty) branch of the tree.
There are no special properties for the Hyperbolic Arccosine operator.

Hyperbolic Arcsine

The standard inverse hyperbolic sine function is supported, generating
ASINH(expression) and returning a value of type float. When dragging a Hyperbolic
Arcsine operator into a variable, the following tree element is created:

Columns and/or other expressions can be moved into the (empty) branch of the tree.
There are no special properties for the Hyperbolic Arcsine operator.

Hyperbolic Arctangent

The standard inverse hyperbolic tangent function is supported, generating
ATANH(expression) and returning a value of type float. When dragging a Hyperbolic
Arctangent operator into a variable, the following tree element is created:

Columns and/or other expressions can be moved into the (empty) branch of the tree.
There are no special properties for the Hyperbolic Arctangent operator.

Hyperbolic Cosine


The standard hyperbolic cosine function is supported, generating COSH(expression) and
returning a value of type float. When dragging a Hyperbolic Cosine operator into a
variable, the following tree element is created:

Columns and/or other expressions can be moved into the (empty) branch of the tree.
There are no special properties for the Hyperbolic Cosine operator.

Hyperbolic Sine

The standard hyperbolic sine function is supported, generating SINH(expression) and
returning a value of type float. When dragging a Hyperbolic Sine operator into a variable,
the following tree element is created:

Columns and/or other expressions can be moved into the (empty) branch of the tree.
There are no special properties for the Hyperbolic Sine operator.

Hyperbolic Tangent

The standard hyperbolic tangent function is supported, generating TANH(expression)
and returning a value of type float. When dragging a Hyperbolic Tangent operator into a
variable, the following tree element is created:

Columns and/or other expressions can be moved into the (empty) branch of the tree.
There are no special properties for the Hyperbolic Tangent operator.

Sine

The standard trigonometric sine function is supported, generating SIN(expression) and
returning a value of type float. When dragging a Sine operator into a variable, the
following tree element is created:


Columns and/or other expressions can be moved into the (empty) branch of the tree.
There are no special properties for the Sine operator.

Tangent

The standard trigonometric tangent function is supported, generating TAN(expression)
and returning a value of type float. When dragging a Tangent operator into a variable, the
following tree element is created:

Columns and/or other expressions can be moved into the (empty) branch of the tree.
There are no special properties for the Tangent operator.

Other

Several Teradata functions and operators do not fall into any of the categories described
above. The Other category holds the following functions and operators. Double click on
Other to view them:


Asterisk

The SQL Asterisk character (*) may be specified by the user as the argument to a Count
aggregate or Windowed Count ordered analytical function. It represents the fact that all
rows should be counted, not just those with non-null values in a particular column. When
dragging an Asterisk operator into a variable, the following tree element is created:

The SQL Asterisk character (*) is valid within a COUNT aggregate and windowed
aggregate function. There are no special properties for the Asterisk operator.
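
For example, a Count aggregate with an Asterisk argument counts every row, while a Count of a column counts only its non-null values (the column and table names below are illustrative):

```sql
-- COUNT(*) counts all rows; COUNT(column) skips nulls
SELECT COUNT(*) AS all_rows,
       COUNT(account_nbr) AS non_null_accounts
FROM mydb.accounts;
```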

Bytes

The non-standard Bytes function is supported for determining the length of variable byte
data. (When used with fixed length byte data, the defined column length is always
returned.) When dragging a Bytes operator into a variable, the following tree element is
created:


A byte column and/or expression for which to get the length can be moved into the
(empty) branch of the tree.

When used in conjunction with the Trim function, the Bytes function can also be used to
determine the length of fixed length byte data by first trimming null-byte characters, as in
the following.
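
A sketch of that combination, assuming a fixed-length byte column named fixed_bytes (an illustrative name); trailing null bytes are trimmed so that Bytes returns the logical rather than the defined length:

```sql
-- Illustrative: trim trailing X'00' bytes before measuring the length
SELECT BYTES(TRIM(TRAILING '00'XB FROM fixed_bytes)) AS byte_length
FROM mydb.mytable;
```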

Cast Function

The standard Cast function is supported, generating SQL of the form CAST (expression
AS data type). The following data types are supported:

BYTEINT
SMALLINT
INTEGER
DECIMAL(m, n)
FLOAT
CHAR(n)
VARCHAR(n)
DATE
TIME(n)
TIMESTAMP(n)

Note that character set and case specific options may not be specified with CHAR and
VARCHAR types. When dragging a Cast operator into a variable, the following tree element
is created:

Columns, and/or other expressions can be moved into the (empty) branch of the tree. The
data types to cast to are specified in the Properties panel. Double click on Cast, or
highlight it and click on the Properties button:


Valid data types, as listed above, are available in the pull-down.
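
For example, a Cast operator might generate SQL such as the following (column and table names are illustrative):

```sql
-- Illustrative CAST expressions using supported target types
SELECT CAST(account_balance AS DECIMAL(18,2)) AS bal_dec,
       CAST(cust_id AS VARCHAR(20)) AS cust_id_str
FROM mydb.accounts;
```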

F(x)

An arithmetic formula of one argument ‘x’ may be entered using the F(x) SQL element.
This element will result in the appropriate SQL for the formula being generated after
replacing the argument ‘x’ in the formula with the SQL for the empty branch of the tree.

A Column or other expression can be moved into the (empty) branch of the tree
representing the argument ‘x’. The formula to generate SQL for is specified in the
Properties panel. Double click on F(x) or highlight it and click on the Properties button:


In the example above the formula “x (x – 1) / 2” is entered. Note that a multiply operator
‘*’ is implied between the first ‘x’ and the left parenthesis ‘(‘.

The following rules apply to arithmetic formulas entered in the formula SQL elements.

• Numbers begin with a digit (‘0’ to ‘9’) and may be in integer, decimal or
scientific formats according to client locale settings.

• Whitespace characters are ignored.

• Whenever a number, argument or right parenthesis is followed by an argument or
left parenthesis, an implied multiply operator is automatically inserted in the
generated SQL.

• The same operator precedence observed in Teradata SQL is observed in the
formula. The operators in decreasing order of precedence are given below.

o Unary plus ‘+’ and minus ‘-’
o Exponentiate ‘**’
o Multiply ‘*’, divide ‘/’ and modulo ‘%’
o Add ‘+’ and subtract ‘-’

• If a function other than the allowed arithmetic functions is required, such as an
aggregate function, it must be entered as an argument and referred to in the
formula with its argument name, such as ‘x’.

• Formulas may be nested by specifying a formula as the argument of another
formula.
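
To illustrate, if the formula x (x – 1) / 2 shown earlier were applied with a column such as txn_count (an illustrative name) in the ‘x’ branch, the generated SQL would take roughly this form, with the implied multiply inserted explicitly:

```sql
-- Illustrative: the implied '*' between x and '(' is written out
SELECT (txn_count * (txn_count - 1) / 2) AS pair_count
FROM mydb.transactions;
```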

F(x,y)


An arithmetic formula of two arguments, ‘x’ and ‘y’, may be entered using the F(x,y)
SQL element. This element will result in the appropriate SQL for the formula being
generated after replacing the arguments ‘x’ and ‘y’ in the formula with the SQL for the
empty branches of the tree.

Columns or other expressions can be moved into the (empty) branches of the tree
representing the arguments ‘x’ and ‘y’. The formula to generate SQL for is specified in
the Properties panel. Double click on F(x,y) or highlight it and click on the Properties
button.

F(x,y,z)

An arithmetic formula of three arguments, ‘x’, ‘y’ and ‘z’, may be entered using the
F(x,y,z) SQL element. This element will result in the appropriate SQL for the formula
being generated after replacing the arguments ‘x’, ‘y’ and ‘z’ in the formula with the
SQL for the empty branches of the tree.

Columns or other expressions can be moved into the (empty) branches of the tree
representing the arguments ‘x’, ‘y’ and ‘z’. The formula to generate SQL for is specified
in the Properties panel. Double click on F(x,y,z) or highlight it and click on the Properties
button.

Free-Form SQL

SQL text may be directly entered for an entire expression or into an element of an
expression as a free-format text string. This allows the use of constructs that may not
otherwise be supported in an expression (for example, a subquery in a where clause). Of
course, in using this feature, care should be taken to create a valid expression, since
validation is not performed on the SQL within the free-format text string. When dragging
a Free-Form SQL operator into a variable, the following tree element is created:


Double click on Free-Form SQL, or highlight it and click on the Properties button:

Enter a valid SQL expression within the SQL text area.
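
For example, a scalar subquery, a construct the expression tree does not otherwise offer, could be entered as free-form text (all names here are illustrative). Remember that the tool performs no validation on this text:

```sql
-- Entered verbatim as a free-form expression element
(SELECT MAX(t.tran_date)
 FROM mydb.transactions t
 WHERE t.cust_id = c.cust_id)
```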

Variable Reference

A variable defined in a Variable Creation analysis may reference another variable defined
in the same analysis, provided the referenced variable does not contain dimensions. It is
also not possible to reference a variable that results from having a dimension applied to a
variable. Referencing a variable can be particularly useful when the referenced variable is
used merely as an intermediate calculation. The SQL generated consists simply of the
name assigned to the referenced variable.

(When referencing a variable with the same name as an input column, a runtime error
will occur if a column with this name occurs in more than one table being accessed
("Column '<name>' is ambiguous"). If aggregation is being performed in another
variable the error "Selected non-aggregate values must be part of the associated group"
may occur. In these cases it is recommended to rename the referenced variable.)

When dragging a Variable Reference operator into a variable, the following tree element
is created:


The variable to reference is specified in the Properties panel. Double click on Variable
Reference, or highlight it and click on the Properties button:

Select the variable to reference in the Variable pull-down.
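
As a sketch (names are illustrative): if an intermediate variable named bal_ratio is defined as balance divided by credit limit, a second variable that references it generates SQL that simply uses the assigned name:

```sql
-- The referencing variable's SQL uses only the name 'bal_ratio'
CASE WHEN bal_ratio > 0.9 THEN 1 ELSE 0 END
```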

Parentheses

In some cases, you may wish to explicitly request that an expression be enclosed within
beginning ‘(’ and ending ‘)’ parentheses. The Variable Creation analysis attempts to
provide the correct nesting of parentheses whenever it can, so this is offered for
specialized cases. Using the explicit Parentheses function results in an expression being
parenthesized, as in: (expression). When dragging a Parentheses operator into a variable,
the following tree element is created:

Columns and/or other expressions can be moved over the (empty) branch of the tree.

There are no special properties for the Parentheses function.

Variable Creation - INPUT - Variables - Dimensions

On the Variable Creation dialog, click on INPUT and then click on variables on the upper
tabs. Click on Dimensions on the large tab in the center of the panel.


Right-Click Menu Options

The same right-click menu options are offered for the Columns selector on the left side of the
input screen as are offered for other input screens (refer to the Analysis Input Screen topic
in Using Teradata Warehouse Miner). Also, the following right-click options are available
within the Dimensions panel.

Expand All Nodes
Expands the nodes completely for each variable.
Collapse All Nodes
Collapses the nodes to show only variables.
Switch Input To ‘<table name>’
This option applies only when a SQL Column is highlighted in the Dimensions panel.
When this option is selected, the selectors on the left side of the input screen are
adjusted to match the table or analysis that contain the selected SQL Column. (The
column is also selected.)
Switch ‘<table name>’ To Current Input
This option applies only when a SQL Column is highlighted in the Dimensions panel.
When this option is selected, the selectors on the left side of the input screen are used
to change the input table or analysis of the selected SQL Column. A pop-up menu is
displayed to allow changing the input for this column only or for all occurrences. For
a single column, a column with the same name must occur in the new (currently
selected) input table or analysis or an error is given. When all columns are changed,
the new table or analysis must contain all the columns or an error is given and no
changes are made.
Apply Dimensions to Variables
This option jumps to the upper dimensions tab so that dimensions can be applied to
variables.

Selection Options

On this screen, select:

Input Source
Select Table to input from a table or view, or select Analysis to select directly from
the output of a qualifying analysis in the same project. (Selecting Analysis will
cause the referenced analysis to be executed before this analysis whenever this
analysis is run. It will also cause the referenced analysis to create a volatile table if
the Output option of the referenced analysis is Select.)
Databases
All databases which are available for the Variable Creation analysis.
Tables
All tables within the Source Database which are available for the Variable Creation
analysis.
Columns
All columns within the selected table which are available for the Variable Creation
analysis.

Creating Dimensions From Columns

The Dimension Values to be created are specified one at a time as almost any type of SQL
expression. One way to create a new dimension value is to click on the New button,
producing the following within the Dimensions tab:

Another way to create one or more new dimension values is to drag and drop one or more
columns from the Columns panel to the empty space at the bottom of the Dimensions panel
(multiple columns may be dragged and dropped at the same time).

One alternative to dragging and dropping a column is to use the right arrow selection button
to move over one column at a time as a new dimension value. Another alternative is to
double-click on the column. If the right arrow button is clicked repeatedly, or the column is
double-clicked repeatedly, a range of columns may be used to create new dimension values,
since the selected column increments each time the arrow is clicked. (It should be noted that
when a column or column value is selected, the right arrow selection button will only be
highlighted if a SQL Element is not selected. This can be ensured if the right-click option to
Collapse All Nodes is utilized in the SQL Element view.)


Whether dragging and dropping, clicking on the right arrow button or double-clicking, a new
dimension value based on a column looks something like the following (after expanding the
node).

Creating Dimensions From SQL Elements

Still another way to create a new dimension value is to drag and drop a single SQL element
from the SQL Elements panel to the empty space at the bottom of the Dimensions panel, or to
drag and drop one or more column values displayed by selecting the Values button. In the
case of column values, a dimension value containing a single SQL Numeric Literal, String
Literal or Date Literal is created as appropriate for each column value. (This technique saves
having to edit the properties of a numeric, string or date literal to set the desired value.)

As with creating dimension values from selected columns, use of the right arrow selection
button or double-clicking the desired SQL element or column value provides an alternative to
dragging and dropping an element or value. Note however that repeated selection of a SQL
element does not advance the selected element so the result is multiple dimension values
containing the same SQL element. (Note also that when a SQL element is selected, the right
arrow selection button will only be highlighted if neither a column nor a column value is
selected in its respective view.)

When a SQL element is placed on top of another element on the Dimensions panel, whether
by dragging and dropping it, selecting it with the right arrow or by double-clicking it, the new
element is typically inserted into the expression tree at that point. The element replaced is
then typically moved to an empty operand of the new SQL element.

Whether dragging and dropping, clicking on the right arrow button or double-clicking, a new
dimension value based on a SQL element looks something like the following example
involving the Equals element:

Copying or Moving a Dimension

It is possible to create a copy of a dimension value by holding down the Control key on the
keyboard while dragging the dimension value to another location in the Dimensions panel.
The copy can be placed ahead of another dimension value by dropping it on that dimension
value, or at the end of the list of dimension values by dropping it on the empty space at the
bottom of the Dimensions panel. It is also possible to copy a dimension value in the same
manner from another analysis by viewing the other analysis at the same time and dragging the
dimension value from one analysis to the other.

If the Control key is not held down while performing the copy operation just described within
the same analysis, the dimension value is moved from one place to the other, i.e. deleted from
its old location and copied to the new one. There are two exceptions to this. First, this is not
the case when copying a dimension from one analysis to another, in which case a copy
operation is always performed, with or without holding down the Control key. The second
exception is when moving one child node on top of another child node of the same parent in
the expression tree that defines a dimension. In this case, the two nodes or sub-expressions
are switched. (For example, if income and age are added together and age is moved on top of
income, the result is to add age and income, reversing the operands.)

Replicating a Dimension

It is possible to create multiple varied copies of a dimension value by dropping or selecting
multiple columns or values onto a component of a dimension value that is not a folder, that
is, a component designed to hold only a single element. For example, after selecting
the New button, if 10 columns were dragged and dropped onto the empty node underneath
the new dimension value, the entire dimension value would be replicated 10 times, each copy
containing a different column and named with the original dimension value name appended
with a number between 1 and 10.

Dimension Tool-Tip

Information about a dimension value may be viewed by holding the mouse pointer over it.

Deleting All Dimensions

All dimension values can be deleted from the analysis by selecting the double-back-arrow
button in the center of the Variable Creation window. When this function is requested, one or
more warnings will be given. The first warning indicates how many dimension values are
about to be deleted. The second possible warning is given if the number of dimension values
being deleted exceeds 100, the maximum number of operations that can be undone or redone
using the Undo or Redo buttons. (If this warning is given and the Undo button is then
selected, only the first 100 dimension values will be restored. These are actually the last 100
deleted, since they are deleted in reverse order.) A third possible warning is given if any of
the dimension values about to be deleted has been applied to a variable on the dimensions
screen. If the choice to continue is made, all associations between variables and dimensions
being deleted will be removed. (Note that this part of the operation cannot be "undone"; it is
unaffected by the Undo button.)

Buttons

Wizard Button
When the Dimensions panel is selected, the Wizard button can be used to generate
dimension values, When Conditions for Searched Case statements or conditional
expressions for And All or Or All statements. To generate dimension values, highlight
any dimension value or ensure that no value is highlighted when the Wizard button is
selected. Otherwise, highlight the desired Case Conditions folder under a Case -
Searched node or the Expressions folder under an And All or Or All node and select the
Wizard button.

The maximum number of dimensions or values that can be generated by a single
application of the wizard is limited to 1000.

The following dialog is given when generating dimension values. (Instructions at the top
change and a subset of fields is shown when not generating dimension values.)

Dimension Prefix
When a comparison operator such as Equal is selected in the Operator field, the
names of the resulting dimension values consist of the prefix followed by underscore
and the selected value. Otherwise the dimension value name is the prefix followed
by a number.

Description
When a comparison operator such as Equal is selected in the Operator field, the
description of the resulting dimension values consist of the description specified here
followed by the operator and selected value. Otherwise the description is the
description entered here.

Left Side Column/Expression
Replace the "(empty)" node with a SQL Column or more complex expression
involving a SQL Column.


Operator
Select a comparison operator such as Equals or select Between, Not Between, In, Not
In, Is Null or Is Not Null as the operator to use. If Between or Not Between is
selected, a dimension value or condition is generated for each pair of requested
values. If In or Not In is selected, the Wizard will generate a single dimension value
or condition based on all requested values when 'OK' or 'Apply' is clicked. If Is Null
or Is Not Null is selected, the Wizard will generate a single dimension value or
condition based on no values. Otherwise, if a comparison operator such as Equal is
selected, the Wizard will generate a dimension value or condition for each requested
value.

Else Value
Select either Else Null or Else Zero to indicate the value to use when the condition is
not met.

Right Side Values

Values
This tab accepts values displayed by selecting the Values button for input
columns on the left side of the input screen. Values can be drag-dropped onto
this panel or selected with the right-arrow button. They can be numeric, string or
date type values.

Note that when values are displayed on the left side of the input screen, the
ellipsis button (the one displaying ‘…’) may be used to Select All Values.

Range
This tab can be used to generate a range of integer or decimal numeric values
based on a From, To and By field. If desired, the values can be generated in
descending order by making the From value greater than the To value, so that the
By value should always be positive. If the By field is not specified, an
incremental value of 1 is assumed. (Note that a value displayed with the Values
button may be drag-dropped into this field. Note also that the escape key will
revert to the last value entered in this field.)

When the Between or Not Between operator has been specified, the Range fields
behave somewhat differently and may be used only to specify a single pair of
values using the From and To field, with the From field validated to be less than
or equal to the To field. The By field may not be specified when the Between or
Not Between operator has been specified.

List
A list of numeric, string or date type values can be entered here, separated by
commas (actually, by the standard list separator for the current locale settings).
(Note that a value displayed with the Values button may be drag-dropped into
this field. Note also that the escape key will revert to the last value entered in
this field.)

Clear All
This button will clear all of the fields of this dialog.


OK
This button will generate the requested dimension values or conditions and return to
the Dimensions panel.

Cancel
This button returns to the Dimensions panel without generating any elements.

Apply
This button will generate the requested dimension values or conditions and remain on
this panel. A status message is displayed just above this button reporting on the
number of generated conditions.

Combine Button
When the Dimensions panel is selected and dimensions are defined, the Combine button
can be used to generate combined dimension values based on existing dimension values.
When dimensions are combined their conditions are joined with either an SQL ‘AND’ or
‘OR’ operator.
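
For example, combining two dimension values whose conditions are gender = 'F' and age < 30 (illustrative conditions) with the AND option yields a combined condition of roughly this form:

```sql
-- Sketch of a combined dimension's condition
(gender = 'F') AND (age < 30)
```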

Dimension Values:
These are the dimension values from the Dimensions panel plus any dimensions
already combined using the ‘Apply’ button (thus becoming candidates for
re-combining). (Information about a dimension value may be viewed by holding the
mouse pointer over it.)


Dimensions to Combine:
Using the upper and lower sets of right and left arrow buttons, dimensions may be
selected or de-selected for combining. The AND/OR radio buttons may be selected
to determine the method of combining the conditions represented by the dimension
values. The double left arrow buttons to the right of these panels move combined
dimensions back into the panels in preparation for re-combining.

Combined Dimensions:
The single right and left arrow buttons next to this panel cause the dimensions to be
combined and added to the combined dimensions list, or removed from the list,
respectively. (If the name of any combined dimension is too long, a warning
message is given in the lower left corner of the dialog.) The double left arrow
buttons to the left of this panel move combined dimensions back into the
“Dimensions to Combine” panels in preparation for re-combining. (Thus it is
possible to build up combined dimensions without making dimension values out of
the intermediate results.)

Clear All
This button will clear all of the fields of this dialog except the Dimension Values in
the leftmost panel.

OK
This button will generate the dimensions defined in the Combined Dimensions panel
and return to the Dimensions panel.

Cancel
This button returns to the Dimensions panel without generating any elements.

Apply
This button will generate the dimensions defined in the Combined Dimensions panel
and remain on this panel. A status message is displayed in the lower left corner of
the dialog reporting on the number of generated combined dimensions.

Delete Button
The Delete button can be used to delete any node within the Dimensions tree. If
applicable, the tree will roll up children, but in some cases, a delete may remove all
children.

SQL Button
The SQL button can be used to dynamically display the SQL for any node within the
Dimensions tree. If the resulting display is not closed, the expression changes as you
click on the different levels of the tree comprising a dimension value. An option is
provided in the display to Qualify column names, that is, to precede each column name in
the display with its database and table name.

Properties Button
A number of properties are available when defining a dimension value to be created, as
outlined below. Click the Properties button when the dimension value is highlighted, or
double click on the dimension value to bring up the Properties dialog:


Name:

A name must be specified for each dimension value.

(Tip: The name can be edited by single left-clicking on it, which produces a
box around the name, as in Windows Explorer.)

Else Condition:

Dimension values are applied to a variable via a CASE construct. By default, the
ELSE condition within the CASE construct is NULL. Here, you can specify a 0 be
used instead.

Description:

An optional description may be specified for each dimension value. (Note that a
default description is generated automatically by the Wizard if its Description field
contains a value, and also by the Combine Dimensions dialog based on individual
descriptions or dimension names.)

Undo Button
The Undo button can be used to undo changes made to the Dimensions panel. Note that
if a number of dimension values are added at one time, each addition requires a separate
undo request to reverse. Up to 100 undo requests can be processed.

(Note also that if a change to a dimension value is undone, and that dimension value is
currently applied to a variable on the dimensions panel, the applied dimension will not
change as a result of the Undo operation.)

Redo Button


The Redo button can be used to reinstate a change previously undone with the Undo
button.

Question-Mark Help Button
The Question-Mark Help button can be used to request help information about a specific
SQL element by first clicking on the question-mark and then on the SQL element in the
SQL Elements panel or Dimensions panel.

SQL Elements

The same SQL Elements are supported when creating dimension values as when creating
variables, with the following exceptions:

Aggregations
Aggregations cannot be used for creating dimension values.

Ordered Analytical Functions
Ordered Analytical Functions cannot be used for creating dimension values.

Variable Creation - INPUT - dimensions


On this panel, dimension values from this or other Variable Creation analyses in the Project
window are applied to variables in this analysis. The panel is shown below:


Right-Click Menu Options

The following right-click menu options are offered for the Available Variables panel.

Expand All Nodes
Expands the nodes completely for each variable.
Collapse All Nodes
Collapses the nodes to show only variables.
Switch Input To
This option applies only when a dimension value is highlighted in the Available
Variables panel. When this option is selected, the Available Analyses selector on
the left side of the screen is adjusted to match the analysis that the selected dimension
value comes from.
Add Dimension(s) to this Analysis
If the dimensions selected are not currently members of this analysis they can be
added to this analysis with this option. (This option could be useful if a dimension
from another analysis is applied to a variable in this analysis and it becomes
necessary to change it.)
Remove Dimension(s) from this Variable
The selection of this option removes all dimension values from the selected variable.
(Dimension values are not removed from the analysis in which they are defined but
rather their association with this variable is removed.)

The options on the INPUT - Dimensions panel are described below.

Available Analyses
Any Variable Creation analysis in any currently loaded project may be selected to
make its dimension values available for selection. Initially, the current analysis is
selected.

Available Dimensions
The available dimensions are the dimension values defined on the Dimensions tab of
the Input – Variables panel in the selected Variable Creation analysis. (Information
about a dimension value may be viewed by holding the mouse pointer over it.)

(Note that if a dimension value comes from an analysis in another project, and if it
contains a reference to another analysis, it may not be applied to a variable in this
analysis, even if it is displayed here. If it is applied to an available variable an error
message will be given.)

Available Variables
The available variables are the variables defined on the Variables tab of the Input –
Variables panel. As dimension values are moved over using the right-arrow and left-
arrow buttons, or dragged and dropped from Available Dimensions, they are shown
below the variable. The resulting output column name will be the dimension name
followed by an underscore and the variable name. The description of the resulting
variable will be the description of the variable followed by connecting characters and
the description of the dimension. (If either the original variable or the dimension
value does not have a description, its name is used when forming the description of
the resulting variable.) (Information about an applied dimension value may be
viewed by holding the mouse pointer over it.)

Variables that are referenced by another variable (using a Variable Reference SQL
element) may not have dimension values applied to them.

Variable Creation - INPUT - anchor table


On the Variable Creation dialog, click on INPUT and then click on anchor table:

On this screen, select:

Anchor Table:
Pull-down with a list of all tables used to create variables, dimensions and/or
specified in a WHERE, QUALIFY or HAVING clause. Select the table that contains
all of the key values to be included in the final data set. Physically, this can be a table
or a view residing in Teradata.

Available Anchor Columns:


All columns within the table selected as the Anchor Table.

Selected Anchor Columns:

The columns within the Anchor Table that uniquely identify rows in the anchor table
(otherwise unpredictable results may occur when joining this table with others). By
default, the primary index columns of the selected Anchor Table are initially selected.
For a view, these must be selected manually. (Note that if the Anchor Table is the
standard CALENDAR view in the SYS_CALENDAR database, the calendar_date
column is used by default.)

Join Paths:
A list of all Join Paths, connecting the anchor table to each other table referenced in
the analysis (i.e. in a Variable, Dimension or expert clause) is given here. By right-
clicking on a Join Path the join style can be set to Left Outer Join, Inner Join, Right
Outer Join, Full Outer Join or Cross Join.

Selecting a Cross Join results in a join path without join steps. Validation is
performed, including a count of the rows in the table to be joined.

Join Steps:
A list of the Join Steps comprising the Join Path currently selected above is given
here. Each Join Step consists of two columns connected by an operator, which
defaults to the equals operator. By right-clicking on a Join Step its operator can be
set to equals (=), not equals (<>), greater than (>), greater than or equals (>=), less
than (<) or less than or equals (<=). The join steps are connected by logical AND
operators in the generated SQL.

Note that a Join Path of style Cross Join does not contain Join Steps.

Load
To load join paths from other Variable Creation analyses in a loaded project, click on
the Load button. This causes each Variable Creation analysis in a loaded project to
be searched for missing join paths. (Missing join paths are those that have no Join
Steps, with the exception of those of style Cross Join which cannot have Join Steps.)
The first join path encountered, if any, for each missing join path is used. When the
load operation is complete, an informational message is displayed at the bottom of
the form summarizing the results of the search.

(Note that if a join path is missing when an analysis is executed, the Load operation
is performed automatically to try to correct the error.)

Wizard…
To set the join paths using the following dialog screens, click on the Wizard…
button:

From:
Initially, this is the Anchor Table, along with a list of all columns within the
Anchor Table. If more than one table is required in the Join Path, these are
specified through subsequent clicks of the Add button. Highlight the column to
join to that specified in To below.

To:
Initially, this is the target or right-side table in the Join Path, along with a list of
all columns within that table. If the Anchor Table is not simply joined directly to
this table, it can be changed via pull-down. If more than one table is required in
the Join Path, these are specified through subsequent clicks of the Add button.
Highlight the column to join to that specified in From above.

Steps:
Clicking on the Add button populates the Steps area. Similarly, highlighting a
Step and clicking on the Remove button removes that particular step. Steps
should be entered such that the first step begins with the anchor table (on the left
side) and the last step ends with the target table for the join path (on the right
side). Additionally, the target or right side tables should be grouped together in
the list of steps and not alternate in value (that is, table1, table1, table2, not
table1, table2, table1).

The operator of a Join Step may be changed by right-clicking on the Join Step.

Add:
Clicking on the Add button adds a Join Step built from the currently selected
columns. The operator is equals (=) by default, but may be changed by right
clicking on the Join Step.

Remove:
Clicking on the Remove button removes the currently selected Join Step.

Up/Down Arrows:
Clicking on the up or down arrow to the right of the Steps display moves the
currently selected Join Step up or down in the list.

Left/Right Arrows:
Clicking on the left arrow to the right of the From and To table selectors will
move the currently selected To table to the From selector and set the To selector
to the target or right side table for the Join Path. Clicking on the right arrow will
move the currently selected From table to the To selector and set the From
selector to the source or left side table for the Join Path.

Finish:
Clicking on the Finish button accepts all changes and returns to the anchor
panel.

Back:
Clicking on the Back button returns to the previous Join Path.

Next:
Clicking on the Next button proceeds to the next Join Path.

Cancel:
Clicking on the Cancel button discards all changes and returns to the anchor
panel.

Variable Creation - INPUT – analysis parameters


On the Variable Creation dialog, click on INPUT and then click on analysis parameters:

On this screen, select:

Target Date:
If a Target Date was used when creating a variable, dimension, or used in a
WHERE, QUALIFY or HAVING clause, it can be set here. The default value is
the current date, and can be changed by either typing in another date, entering
month, day and year separately or by selecting a date with the standard Windows
calendar control.

Group By Style:
Group by anchor columns
Use of this option causes the anchor columns to be used as the Group By
columns when one or more variables contain an aggregate function. When this is
the case, all variables that don’t already contain an aggregate function are
automatically changed to an aggregate by adding the MIN (minimum) function.

Group by all non-aggregate columns


Use of this option provides more control over the grouping characteristics of the
request with the following effects.

• A group-by clause is generated whether or not aggregation is present.


• Every non-aggregate column is included in the group-by clause.

• An order-by clause is generated to match the group-by clause if the
output style is select or explain.
• Aggregation is not forced on non-anchor columns, but aggregation is
forced on new-style windowed OLAP functions.
• If new windowed OLAP functions are present, positional numbers are
used in the group by clause for correct syntax.
• Old-style Teradata OLAP functions may not be used with this option.

Variable Creation - INPUT - Expert Options


On the Variable Creation dialog, click on INPUT and then click on expert options:

This screen provides nearly the same options as those provided by the Variable Creation –
INPUT – Variables screen, as described in the section of the same name. The principal
difference is that instead of Variables or Dimensions there are three fixed expert clauses that
may not be added to or deleted. Therefore, the New and Add buttons are not present. The
Wizard button can be used only to add conditions, not Searched Case statements or
Dimensions.

Variable Creation - INPUT - Expert Options- SQL Elements


Nearly the same functionality is provided for creating expert options as for creating variables.
In particular, the same SQL Elements are supported with the following exceptions:

Aggregations
Aggregations can only be used in the Having Clause.

Ordered Analytical Functions
Ordered Analytical Functions can only be used in the Qualify Clause.

Variable Creation - INPUT - Expert Options - Expert Clauses


Expert options are available with the Variable Creation function as described below. They are
created and manipulated in the form of SQL expressions in a manner similar to the way
variables are defined. A free-format string may be used as all or part of a SQL expression by
using the Free-Format SQL Text element, thus allowing constructs not otherwise supported,
such as a subquery. (Of course, in using this feature, care should be taken to create a valid
expression, since validation is not performed on the SQL within the free-format text string.)

Where Clause
An SQL WHERE clause is allowed to limit the rows processed from the input table.
Aggregation and ordered analytical (OLAP) functions are not allowed in a WHERE
clause expression. Note that if a subquery is desired it can only be specified using a Free-
Format SQL Text element.

It may be useful to note that if a WHERE clause condition is specified on the "inner"
table of a join (i.e. a table that contributes only matched rows to the results), the join is
logically equivalent to an Inner Join, regardless of whether an Outer type is specified. (In
a Left Outer Join, the left table is the "outer" table and the right table is the "inner" table.)

Having Clause
An SQL HAVING clause may be specified with the Variable Creation function if
aggregation is requested in the variable expressions. Ordered analytical (OLAP)
functions are not allowed in a HAVING clause expression.

Qualify Clause
An SQL QUALIFY clause may be specified with the Variable Creation function if
ordered analytical functions are requested in any of the variable expressions. Aggregation
functions are not allowed in a QUALIFY clause expression.

Variable Creation - OUTPUT - storage


On the Variable Creation dialog, click on OUTPUT and then click on storage:

On this screen, select:

Use the Teradata EXPLAIN feature to display the execution plan for this analysis
Option to generate a SQL EXPLAIN SELECT statement, which returns a Teradata
Execution Plan.

Store the tabular output of this analysis in the database


Option to generate a Teradata TABLE or VIEW populated with the results of the
analysis. Once enabled, the next three fields must be specified:
Database Name
Text box to specify the name of the Teradata database where the resultant Table
or View will be created. By default, this is the “Result Database.”
Output Name
Text box to specify the name of the Teradata Table or View.
Output Type
Pull-down to specify Table or View.
Create Output table using the FALLBACK keyword
If a table is selected, it will be built with FALLBACK if this option is selected.
Create Output table using the MULTISET keyword
If a table is selected, it will be built as a MULTISET table if this option is selected.

Generate the SQL for this analysis, but do not execute it

If this option is selected the analysis will only generate SQL, returning it and
terminating immediately.

Variable Creation - OUTPUT - Primary Index


On the Variable Creation dialog click on OUTPUT and then click on primary index:

On this screen, select the columns which comprise the primary index of the output table:

Available Columns
A list of the columns in the resultant table, from which the primary index columns
may be selected if an Output Table is used.
Primary Index Columns
Select columns by highlighting and then either dragging and dropping into the
Primary Index Columns window, or click on the arrow button to move highlighted
columns into the Primary Index Columns window.
Create the index using the UNIQUE keyword
When selected, a Unique Primary Index will be created on the table. Otherwise a
Primary Index will be created by default.

Run the Variable Creation Analysis


After setting INPUT and OUTPUT parameters as described above, you are ready to run the
analysis. To run the analysis you can either:

• Click the Run icon on the toolbar, or
• Select Run <project name> on the Project menu, or
• Press the F5 key on your keyboard

Results - Variable Creation


The results of running a Variable Creation analysis include the generated SQL itself, the
results of executing the generated SQL, and, if selected, a Teradata table (or view). All these
results are outlined below.

Variable Creation - RESULTS - Data


On the Variable Creation dialog, click on RESULTS and then click on data (note that the
RESULTS tab will be grayed-out/disabled until after the analysis is completed):

The results of the completed query are returned in this Data Viewer page. This page has the
properties of the Data page discussed in the Chapter on Using the Teradata Warehouse Miner
Graphical User Interface. With the exception of the Explain Select Result Option, these
results will match the tables described below in the Output Column Definition section,
depending upon the parameters chosen for the analysis.

Variable Creation - RESULTS - SQL


On the Variable Creation dialog, click on RESULTS and then click on SQL (note that the
RESULTS tab will be grayed-out/disabled until after the analysis is completed):

The generated SQL is returned as text which can be copied, pasted, or printed.

Tutorial – Variable Creation

Variable Creation - Example #1

Parameterize a Variable Creation Analysis as follows:

1. Select TWM_CUSTOMER as the Available Table.

2. Create seven variables by double-clicking on the following columns. (Note that the
variable name will default to the column name.)

• TWM_CUSTOMER.cust_id
• TWM_CUSTOMER.income
• TWM_CUSTOMER.age
• TWM_CUSTOMER.years_with_bank
• TWM_CUSTOMER.nbr_children
• TWM_CUSTOMER.gender
• TWM_CUSTOMER.marital_status

3. Select TWM_CREDIT_TRAN as the Available Table.

4. Create a variable by clicking on the New button and build up an expression as
follows.

5. Drag an Add (Arithmetic) SQL Element over the Variable, and then drag the
following two columns over the empty arguments:

• TWM_CREDIT_TRAN.interest_amt
• TWM_CREDIT_TRAN.principal_amt

6. Because there may be negative values, drag and drop an Absolute Value (Arithmetic)
SQL Element over both interest_amt and principal_amt:

7. Take the average of this expression, by dragging and dropping an Average
(Aggregation) on top of the Add:

8. Because this analysis may generate many NULL values by joining
TWM_CUSTOMER to TWM_CREDIT_TRAN, drag a Coalesce (Case) on top of
the Average:

9. Drag and drop a Number (Literal) 0 into the expressions folder and rename it from
Variable1 to avg_cc_tran_amt to complete the variable:

10. Go to INPUT-anchor table and select TWM_CUSTOMER as the anchor table.

11. Specify the Join Path from TWM_CUSTOMER to TWM_CREDIT_TRAN by
clicking on the Wizard button and specifying that they be joined on the column
"cust_id".

12. Go to OUTPUT-storage, and select Store the tabular output of this analysis in the
database. Specify that a Table should be created named twm_tutorials_vc1.

For this example, the Variable Creation Analysis generated the following results. Note that the
SQL is not shown for brevity:

Data

Only the first 10 rows after sorting are shown.

cust_id income age years_with_bank nbr_children gender marital_status avg_cc_tran_amt
1362480 50890 33 3 2 M 2 264.17
1362481 20855 36 6 2 F 2 0
1362484 10053 42 2 0 F 1 182.57
1362485 22690 25 4 0 F 1 175.40
1362486 10701 76 6 0 F 3 0
1362487 6605 71 1 0 M 2 149.16
1362488 7083 77 7 0 F 2 0
1362489 55888 35 5 2 F 3 397.07
1362492 40252 40 0 5 F 3 214.05
1362496 0 13 2 0 M 1 0

Variable Creation - Example #2

Parameterize a Variable Creation Analysis as follows:

1. Select TWM_CUSTOMER as the Available Table


2. Create a variable by clicking on the New button and drag and drop the following
columns. Note the variable name will default to the column name
• TWM_CUSTOMER.cust_id
3. Create a variable by clicking on the New button and drag and drop the SQL Element
Maximum (Aggregation) on to the empty argument in the variable.
4. Drag and drop a Number (Literal) on to the empty argument in the Maximum, and
rename the variable acct:

5. Select TWM_ACCOUNTS as the Available Table


6. Create a variable by clicking on the New button and drag and drop the following
columns. Note the variable name will default to the column name
• TWM_ACCOUNTS.ending_balance
7. Drag and drop an Average (Aggregation) SQL Element over ending_balance, and
rename the variable bal:

8. Select TWM_TRANSACTIONS as the Available Table


9. Create a variable by clicking on the New button and drag and drop the following
columns. Note the variable name will default to the column name
• TWM_TRANSACTIONS.tran_id
10. Drag and drop an Count (Aggregation) SQL Element over tran_id, and rename the
variable nbr_trans:

11. Select TWM_ACCOUNTS as the Available Table

12. Go to INPUT-variables-Dimensions and click on the New button three times to create
three dimension values. Drag TWM_ACCOUNTS.acct_type to each of the three
dimension values.

13. Drag and drop an Equals (Comparison) SQL Element on top of each instance of
acct_type in the three dimensions.

14. Drag and drop a String (Literal) SQL Element into the second argument of the
Equals. Specify a string value of CC, CK, SV for each of the three dimensions by
double-clicking on String, and entering the values. Rename each dimension value
CC, CK, and SV accordingly:

Change the Properties of the dimensions CC, CK and SV, modifying the Else condition
from ELSE NULL to ELSE ZERO.

15. Select TWM_TRANSACTIONS as the Available Table

16. Go to INPUT-variables-Dimensions and click on the New button four times to create
four dimension values. Drag TWM_TRANSACTIONS.tran_date to each of the four
dimension values.

17. Drag and drop a Quarter of Year (Calendar) SQL Element on top of each instance of
tran_date in the four dimension values

18. Drag and drop an Equals (Comparison) SQL Element on top of each Quarter of Year
instance in the four dimension values.

19. Drag and drop a Number (Literal) SQL Element into the second argument of the
Equals. Specify a number of 1-4 for each of the four dimension values by double-
clicking on Number, and entering the values. Rename each dimension value Q1-Q4
accordingly:

20. Go to INPUT-dimensions and apply the dimension values to the variables as follows:
• acct – CK, CC, SV
• bal – CK, CC, SV
• nbr_trans – Q1, Q2, Q3, Q4:

21. Go to INPUT-anchor table and select TWM_CUSTOMER as the anchor table.

22. Specify the Join Paths from TWM_CUSTOMER to each of the following by
selecting a table in the Join Path from Anchor Table To: and clicking on the
Wizard button. Specify the following Join Paths:

a.) From Anchor Table (TWM_CUSTOMER) to TWM_ACCOUNTS

TWM_CUSTOMER.cust_id --> TWM_ACCOUNTS.cust_id

b.) From Anchor Table (TWM_CUSTOMER) to TWM_TRANSACTIONS

TWM_CUSTOMER.cust_id --> TWM_ACCOUNTS.cust_id

TWM_ACCOUNTS.acct_nbr --> TWM_TRANSACTIONS.acct_nbr

23. Go to OUTPUT-storage, and select Store the tabular output of this analysis in the
database. Specify that a Table should be created named twm_tutorials_vc2.

For this example, the Variable Creation Analysis generated the following results. Once again, the
SQL is not shown:

Data

Note – only the first 10 rows shown.

cust_id CK_acct SV_acct CC_acct CK_bal SV_bal CC_bal Q1_nbr Q2_nbr Q3_nbr Q4_nbr
1362480 1.00 1.00 1.00 54.77 196.73 4.08 113 17 17 10
1362481 0.00 0.00 0.00 0.00 0.00 0.00 0 0 0 0
1362484 1.00 1.00 1.00 50.46 374.50 108.74 113 27 20 27
1362485 1.00 0.00 1.00 26.34 0.00 463.16 13 18 50 90
1362486 1.00 1.00 0.00 1656.14 58.12 0.00 12 10 13 15
1362487 1.00 1.00 1.00 707.41 2.38 481.00 17 25 25 36
1362488 1.00 0.00 0.00 122.42 0.00 0.00 26 39 27 7
1362489 1.00 1.00 1.00 79.60 52.69 4.49 56 51 44 5
1362492 1.00 0.00 1.00 443.84 0.00 476.92 3 42 64 21
1362496 0.00 1.00 0.00 0.00 251.06 0.00 3 3 3 3

Variable Transformation

Introduction
One aspect of creating an analytic data set to be used as input to a data mining algorithm is the
transformation of variables into a format useful to the algorithm. In general, transformations that
are reasonably performed as part of SQL expressions have been included in the Variable Creation
function, whereas transformations that require a more elaborate SQL structure are provided in the
Variable Transformation function. Specifically, transformations in the Variable Transformation
function may require calculating global aggregates or more complex measures in derived tables,
or may include a separate null replacement transformation as a preprocessing step using a
preliminary volatile table. Variable Transformation is however limited to operating on a single
input table.

The Variable Transformation function makes it possible to specify at one time any mixture of
transformations for any number of columns in a single input table. The user may also specify that
columns from the input table be retained unchanged, or retained with a different name and/or
type. The result is a new table or view based on the same or transformed columns from the input
table.

The Variable Transformation functions include:

• Bin Code
• Derive
• Design Code
• Recode
• Rescale
• Retain
• Null Replacement
• Sigmoid
• Z Score

In order to use the Variable Transformation analysis, the user selects a single input table and
then, on a column by column basis, selects what transformation or action they want to perform, if
any. The user may choose any of the offered transformations and/or a simple copy or Retain
operation. That is, they may choose to include any input table column, as is or with a different
name or type, in the output table, whether or not they choose to transform it. By default, the result
column name is the same as the input column name, unless multiple result columns may result (as
with the design coding transformation). If a specific type is specified, it results in casting the
retained column or transformed column.

Anchor columns are included automatically in the result table, so they should not be included as
retained columns. Note that it is the user’s responsibility to ensure that result column names do
not conflict with each other.

The user may also specify that a null transformation be performed in a preprocessing step prior to
the requested transformation. In this case the null transformation is produced in a volatile table
that is then automatically referenced by the generated SQL, both by the transformation SQL and
by any derived aggregates the transformation may require.

It is possible that the user may specify more transformations than can be performed in a single
analysis. This can happen either because the maximum number of columns allowed by Teradata
is exceeded (256 in V2R4.1 and 2048 in V2R5), or because the generated SQL is simply too large
or complex. If this sort of failure occurs, the user must split up the transformations into multiple
analyses and either add a join step or rely on the Build Data Set analysis to join the output tables
together.

Bin Code
Bin Coding is useful when it is desired to replace a continuous numeric column with a categorical
one. Bin coding produces ordinal values, i.e. numeric categorical values where order is
meaningful. It uses the same techniques used in Histogram analysis, allowing the user to choose
between equal-width bins, equal-width bins with a user specified minimum and maximum range,
bins with a user specified width, evenly distributed bins, or bins with user-specified boundaries as
follows.

If the minimum and maximum are specified, all values less than the minimum are put into “bin
0,” while all values greater than the maximum are put into “bin N+1.” The same is true when the
boundary option is specified.
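The out-of-range behavior can be sketched in Python as follows. This is only an illustration of the documented bin-numbering scheme (bins 1..N inside the range, 0 below it, N+1 above it), not the SQL the product generates; the function name and the handling of a value exactly at the maximum are assumptions.

```python
def bin_code(x, minimum, maximum, bins):
    """Equal-width bin coding over a user-specified [minimum, maximum] range."""
    if x < minimum:
        return 0           # below the range -> "bin 0"
    if x > maximum:
        return bins + 1    # above the range -> "bin N+1"
    width = (maximum - minimum) / bins
    # A value exactly at the maximum is kept in the last regular bin, N.
    return min(int((x - minimum) / width) + 1, bins)

# Example: ages 0-100 split into 5 bins of width 20.
print(bin_code(-3, 0, 100, 5))   # 0  (below minimum)
print(bin_code(33, 0, 100, 5))   # 2
print(bin_code(250, 0, 100, 5))  # 6  (above maximum, N+1)
```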

Derive
Derive allows you to enter simple expressions based upon columns within a table. For example, if
you know that all values are positive or zero, the Derive Analysis can be used to add one to the
column and take the natural logarithm of it. The Derive expression may be specified in a
structured way as in the Variable Creation function, and may include any functions or operators
supported by the Variable Creation function except a reference to another variable. It may also
include free-formatted SQL text in all or part of an expression, making it possible to use
constructs not supported by the expression builder. Of course, care should be taken in using this
feature to create a valid expression, since validation is not performed on the SQL within the free-
format text string.

Special handling is given to aggregation functions if they appear in the user-defined expression.
Any requested aggregation function is computed over the entire input table (limited of course by
the where clause if specified as an expert option) in one of the global aggregate derived tables
shared by the other transformation functions. The aggregation is then treated as a constant in the
user-defined expression. And although the user-defined expression may include ordered
analytical functions, it may not include an aggregate within an ordered analytical function.
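The aggregate-as-constant behavior can be illustrated in Python (a conceptual sketch, not the derived-table SQL the product emits; `derive_share` is a hypothetical name): the aggregate is evaluated once over the whole input and then used as a constant in the per-row expression.

```python
def derive_share(values):
    # The aggregate (here a SUM over the entire input) is computed once...
    total = sum(values)
    # ...and then treated as a constant in the row-level derive expression.
    return [x / total for x in values]

print(derive_share([1, 1, 2]))  # [0.25, 0.25, 0.5]
```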

Design Code
Design coding is useful when a categorical data element must be re-expressed as one or more
meaningful numeric data elements. Many classes of analytical algorithms from the statistical and
artificial intelligence communities require variables, inputs, or outputs to be numeric and
numerically meaningful. It does this, roughly speaking, by creating a binary numeric field for
each categorical data value. Design coding is offered in two forms, one known as dummy-coding
and the other as contrast-coding. A “Values” function is provided to select the possible values
from the input table.

In “dummy-coding”, a new column is produced for each listed value, with a value of 0 or 1
depending on whether that value is assumed by the original column. Alternately, given a list of
values to “contrast-code” along with a “reference value”, a new column is produced for each
listed value, with a value of 0 or 1 depending on whether that value is assumed by the original
column, or a value of –1 if that original value is equal to the reference value.

When using “Dummy Coding,” if a column assumes n values, new columns may be created for
all n values, (or for only n-1 values, because the nth column will be perfectly correlated with the

© 1999-2007 NCR Corporation, All Rights Reserved 180


Chapter Two
Analytic Data Sets

first n-1 columns). When using “Contrast Coding”, only n-1 or fewer new columns may be
created from a categorical column with n values.
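In Python terms, the two coding schemes described above might look like this. The sketch is illustrative only; the function names and example category values are assumptions, not the product's interface.

```python
def dummy_code(value, listed_values):
    # One 0/1 column per listed value.
    return [1 if value == v else 0 for v in listed_values]

def contrast_code(value, listed_values, reference_value):
    # 0/1 columns as above, but every column becomes -1 when the
    # original value equals the reference value.
    if value == reference_value:
        return [-1] * len(listed_values)
    return [1 if value == v else 0 for v in listed_values]

# Hypothetical acct_type values, contrast-coded against reference "SV".
print(dummy_code("CK", ["CC", "CK"]))           # [0, 1]
print(contrast_code("CK", ["CC", "CK"], "SV"))  # [0, 1]
print(contrast_code("SV", ["CC", "CK"], "SV"))  # [-1, -1]
```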

Recode
Recoding a categorical data column is most often done to “re-express” existing values of a
column (variable) into some new “coding scheme”. Additionally, it is also done to correct data
quality problems and to focus an analysis on a particular value. It allows for mapping individual
values, NULL values or any number of remaining values (ELSE option) to a new value, a NULL
value or the same value. A “Values” function is provided to select the possible values from the
input table.
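A rough Python equivalent of the mapping behavior (individual values, NULLs via a None key, and the ELSE option) is sketched below; the argument names and the ELSE conventions are assumptions made for illustration.

```python
def recode(value, mapping, else_action="same"):
    """Map individual values to new values.

    NULLs can be recoded by including None as a key in `mapping`.
    else_action: "same" keeps unmatched values, "null" maps them to
    None, and any other object is used as the replacement itself.
    """
    if value in mapping:
        return mapping[value]
    if else_action == "same":
        return value
    if else_action == "null":
        return None
    return else_action

# Hypothetical cleanup of a gender column.
print(recode("m", {"m": "M", "f": "F"}))                      # M
print(recode("X", {"m": "M", "f": "F"}, else_action="null"))  # None
```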

Rescale
Rescaling limits the upper and/or lower boundaries of the data in a continuous numeric column
using a linear rescaling function based on maximum and/or minimum data values. It may be
useful with algorithms that require or work better with data within a certain range. Rescale is only
valid on numeric columns, and not columns of type date.

The user may supply new minimum and maximum values (lower, upper) to form new variable
boundaries. If only the lower boundary is supplied, the variable is aligned to this value; or if only
an upper boundary value is specified, the variable is aligned to that value. If a requested column
has a constant value (max and min are the same), then the transformation will fail with an SQL
error.

The rescale transformation formulas can be thought of as:

ƒ(x, l, r) = l + ((x − min(x)) · (r − l)) / (max(x) − min(x))   (i.e. both lower and upper specified)

ƒ(x, l) = x − min(x) + l   (i.e. only lower specified)

ƒ(x, r) = x − max(x) + r   (i.e. only upper specified)
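The three formulas translate directly into Python; this sketch applies them to a whole column at once (the function name and list-based interface are assumptions):

```python
def rescale(values, lower=None, upper=None):
    lo, hi = min(values), max(values)
    if lower is not None and upper is not None:
        # Both bounds: map [min(x), max(x)] onto [lower, upper].
        # Fails on a constant column (hi == lo), as the text notes.
        return [lower + (x - lo) * (upper - lower) / (hi - lo) for x in values]
    if lower is not None:
        return [x - lo + lower for x in values]   # align minimum to lower
    if upper is not None:
        return [x - hi + upper for x in values]   # align maximum to upper
    raise ValueError("at least one of lower/upper must be supplied")

print(rescale([10, 20, 30], lower=0, upper=1))  # [0.0, 0.5, 1.0]
print(rescale([10, 20, 30], lower=0))           # [0, 10, 20]
print(rescale([10, 20, 30], upper=0))           # [-20, -10, 0]
```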

Retain
The retain option allows you to copy a column as is, along with any transformed columns into the
final analytic data set. When using this option, they may choose to include any input table
column, as is or with a different name or type, in the output table, without actually “transforming”
it. By default, the result column name is the same as the input column name. If a specific type is
specified, it results in casting the retained column.

Null Replacement
NULL value replacement is offered as a transformation function. A literal value, the mean,
median, mode or an imputed value joined from another table can be used as the replacement
value. The median value can be requested with or without averaging of two middle values when
there is an even number of values. The replacement value can also be the analytic data set’s target
date value. Literal value replacement is supported for numeric, character and date data types.
Mean value replacement is supported for columns of numeric type or date type, with special
coding required for date type. Median without averaging, mode and imputed value replacement
are valid for any supported type, with distinct SQL generated for computing the median value of
numeric, date and other type columns. Median with averaging is however supported only for
numeric and date type columns.

© 1999-2007 NCR Corporation, All Rights Reserved

Chapter Two
Analytic Data Sets

Sigmoid
A Sigmoid transformation provides rescaling of continuous numeric data in a more sophisticated
way than the Rescaling transformation function. In a Sigmoid transformation a numeric column is
transformed using a type of sigmoid or s-shaped function. One of these, called a logit function,
produces a continuously increasing value between 0 and 1. Another called the modified logit
function, is twice the logit minus 1 and produces a value between –1 and 1. A third, called the
hyperbolic tangent function, also produces a value between –1 and 1. (Note that the logit function
is the same as the function previously called the sigmoid function, and the hyperbolic tangent
function is the same as the math function of the same name.) These non-linear transformations
are generally more useful in data mining than a linear Rescaling transformation.

The logit value is calculated as:

ƒ(x) = 1 / (1 + e^(−x))

The modified logit value is calculated as:

ƒ(x) = 2 · [1 / (1 + e^(−x))] − 1

which is equivalent to:

ƒ(x) = (1 − e^(−x)) / (1 + e^(−x))

The hyperbolic tangent value is calculated as:

ƒ(x) = (e^(2x) − 1) / (e^(2x) + 1)

Note that for absolute values of x greater than or equal to 36, the value of the sigmoid function is
effectively 1 for positive arguments or 0 for negative arguments, within about 15 digits of
significance.
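The three functions above can be sketched in Python (illustrative only; the product generates equivalent SQL):

```python
import math

def logit(x):
    """Logit (sigmoid): a continuously increasing value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

def modified_logit(x):
    """Twice the logit minus 1: a value between -1 and 1."""
    return 2.0 * logit(x) - 1.0

def hyperbolic_tangent(x):
    """tanh, written per the formula above: a value between -1 and 1."""
    return (math.exp(2 * x) - 1) / (math.exp(2 * x) + 1)
```

Note that logit(36) already differs from 1 by only about 2e-16, consistent with the remark above about arguments of absolute value 36 or more.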

Z Score
Like a Sigmoid transformation, a Z-Score transformation provides rescaling of continuous
numeric data in a more sophisticated way than a Rescaling transformation. In a Z-Score
transformation, a numeric column is transformed into its Z-score based on the mean value and
standard deviation of the data in the column. It transforms each column value into the number of
standard deviations from the mean value of the column. This standardizing transformation is
generally more useful in data mining than a simple Rescaling transformation.

For a value, the number of standard deviations away from the mean is calculated as:

ƒ(x) = (x − (1/n)·Σ x) / √( (1/n)·Σ x² − ((1/n)·Σ x)² )

where the sums are taken over all n values of x in the column.
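A Python sketch of this calculation, using the population form of the standard deviation as in the formula above (illustrative only; the actual transformation runs in generated SQL):

```python
import math

def z_scores(values):
    """Transform each value into its number of standard deviations from the mean."""
    n = len(values)
    mean = sum(values) / n
    # Population variance: (1/n)*sum(x^2) - ((1/n)*sum(x))^2
    variance = sum(v * v for v in values) / n - mean * mean
    sd = math.sqrt(variance)
    return [(v - mean) / sd for v in values]
```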

Initiate a Variable Transformation Function


Use the following procedure to initiate a new Variable Transformation analysis in Teradata
Warehouse Miner:

1. Click on the Add New Analysis icon in the toolbar:

2. In the resulting Add New Analysis dialog box, click on ADS under Categories and then under
Analyses double-click on Variable Transformation:

3. This will bring up the Variable Transformation dialog in which you can define INPUT /
OUTPUT options and initiate any of the Variable Transformation functions, i.e:

• Retain
• Bin Code
• Derive
• Design Code
• Null Replacement
• Recode
• Rescale
• Sigmoid
• Z Score

Variable Transformation - INPUT - Transformations


On the Variable Transformations dialog, click on INPUT and then click on transformations:

On this screen, select:

Available Databases
All databases which are available for the Variable Transformation analysis.

Available Tables
All tables within the Source Database which are available for the Variable
Transformation analysis.

Available Columns
All columns within the selected table which are available for the Variable
Transformation analysis.

‘Transformations…’ Window
Move column(s) into this window for the Variable Transformation analysis to execute
against. First, highlight the function you wish to use in this window, for example Bin
Code:

Next, with selected function highlighted, choose column(s) by highlighting it in the
Available Columns window and then clicking on the arrow button to move highlighted
column(s) into the ‘Transformations…’ window. Columns may also be dragged and
dropped into the appropriate folder.

Right-click options for this window are described below.

Right-Click Menu Options

The same right-click menu options are offered for the Columns selector on the left side of the
input screen as are offered for other input screens (refer to the Analysis Input Screen topic
in Using Teradata Warehouse Miner). Also, the following right-click options are available
within the Transformations window.

Expand All Nodes
Expands the nodes completely for each transformation folder.
Collapse All Nodes
Collapses the nodes to show only transformation folders.
Switch Input To ‘<table name>’
When this option is selected, the selectors on the left side of the input screen are
adjusted to match the table or analysis of the column associated with the selected
transformation, or otherwise with the column associated with the first transformation
specified in the current ordering.
Switch ‘<table name>’ To Current Input
When this option is selected, the selectors on the left side of the input screen are used
to change the input table or analysis of all columns in all transformations. The new
table or analysis must contain all columns in all of the transformations or an error is
given and no changes are made.
Remove All Transformations of this Type
This option applies only to Transformation folder nodes and will remove all
transformations contained within a selected folder (that is, of a given type).

Double-Back-Arrow Button
Clicking on the button with two arrows pointing to the left will remove all
transformations from the Transformations window. A prompt is given before removing
the transformations, which are removed only if OK is clicked in response.

Add Button
Clicking on the Add button leads to a dialog from which transformations may be selected
from loaded analyses to add as copies to the current analysis.

On this dialog select:

Available Analyses
This drop down list contains all of the Variable Transformation analyses currently
loaded in the Project window, including those in other projects.

Available Transformation Types
This drop down list contains all of the types of transformation so that the following
selector may be filtered and transformations more easily selected. Note that the
default value of All leads to the listing of all available transformations.

Available Transformations
These are the transformations in the currently selected analysis, filtered by type if a
specific type is selected in the selector immediately above this one. (Note that a
Derive transformation that references more than one column cannot be added, even if
it appears as an available transformation.) Select one or more transformations to add.

Column To Transform

This drop-down selector contains all of the possible columns in the table being
transformed. By default the column with name matching that being transformed in
the selected transformation to add will be selected. If a column with matching name
does not exist the user must select an appropriate column to transform.

If more than one transformation is selected the Column To Transform selector is
disabled. In this case the selected transformations are applied to columns with the
same name as the columns in the selected transformations. If any one of the selected
transformations does not transform a column with a name matching one of the
columns in the table to transform, an error message is given and no transformations
are added.

OK/Cancel/Apply
Each time the Apply button is clicked, a copy of the currently selected
transformations is added and a status message is given. The Apply button is then
disabled until another transformation or column to transform is selected. The dialog
can be exited at any time by clicking the OK or Cancel button. If OK is clicked, the
currently selected transformations will be added unless the Apply button is disabled.

Reorder Button
Clicking on the Reorder button leads to a dialog from which transformations in the
current analysis may be reordered for output purposes.

On this dialog select:

Up/Down Arrow Buttons
One or more transformations may be highlighted and moved up or down in the list
using the Up and Down arrow buttons. The Up arrow button is disabled when the
top most selected transformation is at the top of the list, and the Down arrow button
is disabled when the bottom most selected transformation is at the bottom of the list.

Ellipses (…) Button
The following options are presented when the Ellipses button is clicked.

Move to Top
This option moves all the selected transformations to the top of the list.
Move to Bottom
This option moves all the selected transformations to the bottom of the list.
Restore Initial Order
This option reorders the transformations to match the order when the dialog was
displayed.
Order by Input Columns

This option reorders the transformations by input column as displayed in the
Available Columns selector. This may either be in alphabetical or table order
depending on Preference or right-click options previously selected.

Properties Button
The Properties button leads to a dialog from which properties or default properties may
be set, as described in the following sections.

Setting Properties - Variable Transformation


Each requested transformation contains properties that can be set by editing the properties of the
column node for that transformation.

In the ‘Transformations…’ window, click on the column that was added when the transformation
was requested, for example column cust_id under Bin Code:

With column highlighted, click on the Properties button to bring up the Properties dialog:

(Tip: You can also double-click on the column name to bring up the Properties dialog.)

Setting Default Properties - Variable Transformation


Each type of transformation contains default properties that can be changed by editing the
properties of the folder node for that type of transformation. When a column is added to a
transformation folder node, the default properties currently in effect for that type of
transformation are used to set the initial property values for the newly added transformation.

The default properties for each type of transformation are saved along with the analysis so that
they will be available if changes are made to the analysis at a later time.

In the ‘Transformations…’ window, click on the folder associated with the type of transformation
you want to set default properties for:

With the folder highlighted, click on the Properties button to bring up the Properties dialog for
the selected transformation type (Bin Code in the example below):

Apply to existing <transformation type> transformations


If there are already transformations in a folder node when the default properties for that folder
node are changed, the user may request that the changes be applied to all existing
transformations in that folder. Otherwise, the property values of existing transformations are
not changed when default properties are set. (Note: This option is not available when setting
default properties for Derive transformations.)

(Tip: You can also double-click on the column name to bring up the Default Properties dialog.)

Properties Dialog – Common Features


While some options on the Properties dialog (shown above) depend on the function in use, the
Properties dialog will usually contain these common features:

Output
For most transformations, the Properties dialog will have an Output tab. (For Retain
transformations, the Properties dialog has no tabs but directly displays Output options.) Clicking
on Output leads to the following display and options:

Output Name / Suffix:
If this field appears, you can rename (assign an alias to) the column, if needed. (Only in
the case of Design Code, where multiple columns are created, does this appear as Output
Suffix because it follows the prefix representing the value being encoded, separated by an
underscore character.)

Output Type:
When this field appears it lets you select output type. The default is Generate
Automatically, but you can also select the following types. (Depending on the type
selected, one or more length fields may also be presented.)

BYTEINT
CHAR
DATE
DECIMAL
FLOAT
INTEGER
SMALLINT
TIME
TIMESTAMP
VARCHAR

Column Attributes:
One or more column attributes can be entered here in a free-form manner to be used
when an output table is created. They are placed as-entered following the column
name in the CREATE TABLE AS statement. This can be particularly useful when
requesting data compression for an output column, which might look like the following:
COMPRESS NULL.

Description:
An optional description may be specified for each transformation.

Null Replacement
For most transformations, the Properties dialog will have a Null Replacement tab. Click on
Null Replacement to display options for replacing null values within the column.

On this screen you can elect to replace null values by clicking the checkbox, and then specifying
what null values are to be replaced with. The choices are:

Imputed Value
You will then need to select a column in the Imputed Column field. Click the down-
arrow beside the Imputed Column field to display available columns. (You may need
to expand tree items to drill down to individual columns.)
Literal Value
The value as specified in the Literal Value field. Literal value replacement is
supported for numeric, character and date data types.
Mean
Average value - Mean value replacement is supported for columns of numeric type or
date type, with special coding provided for date type.

Median
The median value can be requested with averaging of two middle values when there
is an even number of values. Supported only for numeric and date type columns.
Median (No Averaging)
The median value can be requested without averaging of two middle values when
there is an even number of values.
Mode
The most frequently occurring value.
Target Date
Literal Target Date as specified on the INPUT-target date panel.
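The replacement-value choices above can be sketched in Python; the function and its method names below are illustrative conventions only, not product syntax, and the product computes these values in generated SQL:

```python
import statistics

def replacement_value(values, method, literal=None):
    """Compute a NULL replacement value from a column's non-null values (sketch)."""
    if method == "literal":
        return literal                        # numeric, character or date literal
    if method == "mean":
        return statistics.fmean(values)       # numeric (or specially coded date) columns
    if method == "median":
        return statistics.median(values)      # averages the two middle values when n is even
    if method == "median_no_avg":
        return statistics.median_low(values)  # no averaging: valid for any ordered type
    if method == "mode":
        return statistics.mode(values)        # the most frequently occurring value
    raise ValueError(f"unknown method: {method}")
```

For example, the median of [1, 2, 3, 4] is 2.5 with averaging and 2 without.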

Properties Dialog – Function-Specific Features


Some contents of the Properties dialog will depend on the particular function in use (Bin Code,
Design Code, etc). Function-specific options on the Properties dialog are as follows:

Properties - Bin Code


If doing a Bin Code transformation there will be a Bin Code tab on the Properties dialog. Click
on Bin Code to access the following options:

Bin Code Style:
Select a Bin Code Style. Options are:
Bins
Specify a number of equal sized data bins (Default = 10).
Bin with Boundaries
Specify the minimum and maximum value and the number of equal sized data bins
(Default = 10) to create in this range.
Boundaries
Specify a list of boundary values to define the bins.
Quantiles
Specify a number of bins with a nearly equal number of values (Default = 10).
Width
Specify the desired width of each bin.
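For the equal-width styles, bin assignment can be sketched as follows; the 1-based bin numbering and the clamping of out-of-range values to the end bins are assumptions of this sketch, not documented product behavior:

```python
def bin_number(x, minimum, maximum, bins=10):
    """Assign x to one of `bins` equal-width bins over [minimum, maximum] (sketch)."""
    if x is None:
        return None
    width = (maximum - minimum) / bins
    b = int((x - minimum) / width) + 1   # bins numbered 1..bins
    return max(1, min(b, bins))          # clamp values outside the range
```

For example, with 10 bins over [0, 100], the value 25 falls in bin 3.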

After setting values/options on the Properties dialog, click on OK to close the Properties dialog.
Then continue specifying INPUT and OUTPUT parameters described further in this chapter.

Properties - Derive
If doing a Derive transformation there will be a Derive tab on the Properties dialog. Click on
Derive to access a variation of the Variable Creation input screen. It has been altered to initially
contain a single variable consisting of the column that the Derive transformation is based upon. It
has also been altered so that the input table cannot be changed and so that a new variable cannot
be entered or the existing one deleted.

Special handling is given to aggregation functions if they appear in the user-defined expression.
Any requested aggregation function is computed over the entire input table (limited of course by
the where clause if specified as an expert option) in one of the global aggregate derived tables
shared by the other transformation functions. The aggregation is then treated as a constant in the
user-defined expression. And although the user-defined expression may include ordered
analytical functions, it may not include an aggregate within an ordered analytical function.

Special handling is also given when specifying the default properties for a Derive transformation
in the Default Properties dialog. A single variable called <default column> is initially provided.
Wherever it appears in the expression created by the user, it will be replaced by the selected
column that was used to define a specific Derive transformation. (If more instances of <default
column> are needed, the initially provided instance can be copied by dragging it with the control
key held down). This makes the default Derive transformation behave like a template for a
custom transformation.

After setting values/options on the Properties dialog, click on OK to close the Properties dialog.
Then continue specifying INPUT and OUTPUT parameters described further in this chapter.

Properties - Design Code


If doing a Design Code transformation there will be a Design Code tab on the Properties dialog.
Click on Design Code to access the following options:
Encoding Style:
Select an Encoding Style. Options are:
Contrast Code
Choose to “contrast-code” all values, resulting in –1/0/1 generated as values.
Reference Value
The value for which a –1 will be generated when the column is equal to it. (This
option only available when Contrast Code is selected.)
Dummy Code
Choose to “dummy-code” all values, resulting in 0/1 generated as values.

Values to Encode
Value
A list of values within the column that “dummy-codes” or “contrast-codes” will be
generated for. If the Contrast Coding option is selected, the Reference Value must
not be listed. Double-click in the area shown to enter the values.
Column
The desired name of the result of the Design Coding Analysis. A default name
is provided if the values are loaded with the Values… button. The data type
generated is BYTEINT.
Values
Brings up the design code wizard which determines the distinct values of the column
being design coded, and assigns default column names of <value>_<column name>
(for example, 123_Department). These columns can be renamed by highlighting them
and typing over the current name.

Special handling is necessary for the default properties of a Design Code transformation. Since
the column to be transformed is not yet known, column prefixes are associated with specific
values rather than column names. Then, when the default properties are applied to a specific
column, the column name is appended to the default prefixes. For example, if the value 0 is
associated with the prefix "0_", when the default properties are applied to the column "amount", 0
is associated with the column "0_amount".
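The dummy- and contrast-coding behavior described above can be sketched in Python; only the 0/1 and –1/0/1 value semantics and the <value>_<column name> naming are taken from the text, and the function itself is an illustration:

```python
def design_code(value, encode_values, column_name, style="dummy", reference=None):
    """Generate one 0/1 (dummy) or -1/0/1 (contrast) output per encoded value (sketch)."""
    row = {}
    for v in encode_values:
        name = f"{v}_{column_name}"      # documented default naming: <value>_<column name>
        if value == v:
            row[name] = 1
        elif style == "contrast" and value == reference:
            row[name] = -1               # the reference value contrast-codes to -1
        else:
            row[name] = 0
    return row
```

For example, dummy-coding "F" over the values F and M of a gender column yields F_gender = 1 and M_gender = 0.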

After setting values/options on the Properties dialog, click on OK to close the Properties dialog.
Then continue specifying INPUT and OUTPUT parameters described further in this chapter.

Properties - Recode
If doing a Recode transformation there will be a Recode tab on the Properties dialog. Click on
Recode to access the following options:

Values to Recode:
Create a list of categorical values to transform from one value to another. Use the Add
(and Remove) buttons as necessary to build a list or use the Values button.
From
List existing values within column to recode. These are the “Old” values to be
replaced by new values below. For example: 0, ELSE, NULL
To
New values to replace corresponding old values, one for one. For example: N, Y, N.
In this example, you change a column containing 0 and other values into a Y/N column
by recoding 0 to N, all other values to Y, and NULL (unknown) values to N.

Recode all other values as:
NULL
SAME
(Literal Value)
Enter a literal value in the field provided.
Values
Brings up the recode wizard which determines the distinct values of the column being
recoded, and allows you to type a new value in for each of n distinct values.
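The recode mapping described above can be sketched in Python, using the 0/ELSE/NULL example from the text; the dictionary keys "ELSE" and "NULL" are an illustrative convention of this sketch, not product syntax:

```python
def recode(value, mapping, others="SAME"):
    """Recode old values to new ones (sketch). The keys "NULL" and "ELSE" in
    `mapping` stand for NULL input values and all unmapped values; `others`
    applies when no "ELSE" entry exists and may be "SAME", a literal, or None."""
    key = "NULL" if value is None else value
    if key in mapping:
        return mapping[key]
    if "ELSE" in mapping:
        return mapping["ELSE"]
    return value if others == "SAME" else others

# The documented example: 0 -> N, all other values -> Y, NULL -> N
yn = {0: "N", "ELSE": "Y", "NULL": "N"}
```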

After setting values/options on the Properties dialog, click on OK to close the Properties dialog.
Then continue specifying INPUT and OUTPUT parameters described further in this chapter.

Properties - Rescale
If doing a Rescale transformation there will be a Rescale tab on the Properties dialog. Click on
Rescale to access the following options:

Upper and Lower Bound:
Enter numeric values indicating the upper and lower bounds to rescale the column to.
Lower Bound:
Enter a numeric value indicating the lower bound to rescale the column to.
Upper Bound:
Enter a numeric value indicating the upper bound to rescale the column to.

After setting values/options on the Properties dialog, click on OK to close the Properties dialog.
Then continue specifying INPUT and OUTPUT parameters described further in this chapter.

Properties - Sigmoid
If doing a Sigmoid transformation there will be a Sigmoid tab on the Properties dialog. Click on
Sigmoid to access the following options:

Statistical Computation:
The choices are:
Logit

Modified Logit
Hyperbolic Tangent

After setting values/options on the Properties dialog, click on OK to close the Properties dialog.
Then continue specifying INPUT and OUTPUT parameters described further in this chapter.

Variable Transformation - INPUT - Primary Key


On the Variable Transformation dialog, click on INPUT and then click on primary key:

The purpose of this screen is to specify the columns that comprise the primary key of the input
table or view being transformed. (This is required only when null value replacement is requested
in one of the requested transformations.)

If input comes from a table the primary index columns of the table will be selected by default. To
change these columns, or to enter them initially if input is from a view, use the selectors as
described below.

Available Tables:
Pull-down with the name of the input table or view.
Available Columns:
All columns within the table or view selected in Available Tables. Highlight those
columns which comprise the primary key of the table or view and either drag and drop
them to Selected Primary Key Columns, or use the right arrow button > to move them
over.
Selected Primary Key Columns:
All columns within the table or view that constitute the primary key (that is, that uniquely
identify each row). If undesired columns were moved into this area, highlight those
columns and either drag and drop them back to Available Columns, or use the left arrow
button < to move them back.

Variable Transformation - INPUT – Analysis Parameters


On the Variable Transformation dialog, click on INPUT and then click on analysis parameters:

If a Target Date is used for NULL value replacement, it can be set here. The default value is the
current date, which can be changed by typing in another date, specifying month, day and year
separately, or selecting a date with the standard Windows calendar control as shown above.

Variable Transformation - INPUT - Expert Options


On the Variable Transformations dialog, click on INPUT and then click on expert options:

The resulting screen has the WHERE option available:

Optional WHERE clause text

Option to specify a SQL WHERE clause to restrict the rows selected for analysis.

Variable Transformation - OUTPUT - Storage


On the Variable Transformation dialog click on OUTPUT and then click on storage:

On this screen, select:

Use the Teradata EXPLAIN feature to display the execution plan for this analysis
Option to generate a SQL EXPLAIN SELECT statement, which returns a Teradata
Execution Plan.

Store the tabular output of this analysis in the database
Option to generate a Teradata TABLE or VIEW populated with the results of the
analysis. Once enabled, the next three fields must be specified:
Database Name
Text box to specify the name of the Teradata database where the resultant Table or
View will be created in. By default, this is the “Result Database.”
Output Name
Text box to specify the name of the Teradata Table or View.
Output Type
Pull-down to specify Table or View.
Create Output table using the FALLBACK keyword
If a table is selected, it will be built with FALLBACK if this option is selected.
Create Output table using the MULTISET keyword
If a table is selected, it will be built as a MULTISET table if this option is selected.

Generate the SQL for this analysis, but do not execute it
If this option is selected, the analysis will only generate SQL, returning it and terminating
immediately.

Variable Transformation - OUTPUT - Primary Index


On the Variable Transformation dialog click on OUTPUT and then click on primary index:

On this screen, select the columns which comprise the primary index of the output table:

Available Columns
A list of columns which comprise the index of the resultant table if an Output Table is
used.
Primary Index Columns
Select columns by highlighting and then either dragging and dropping into the Primary
Index Columns window, or click on the arrow button to move highlighted columns into
the Primary Index Columns window.

Run the Variable Transformation Analysis


After setting INPUT and OUTPUT parameters as described above, you are ready to run the
analysis. To run the analysis you can either:
• Click the Run icon on the toolbar, or
• Select Run <project name> on the Project menu, or
• Press the F5 key on your keyboard

Results - Variable Transformation


The results of running a Variable Transformation analysis include the generated SQL itself, the
results of executing the generated SQL, and, if selected, a Teradata table (or view). All these
results are outlined below.

Variable Transformation - RESULTS - Data


On the Variable Transformation dialog, click on RESULTS and then click on data (note that the
RESULTS tab will be grayed-out/disabled until after the analysis is completed):

The results of the completed query are returned in this Data Viewer page. This page has the
properties of the Data page discussed in the Chapter on Using the Teradata Warehouse Miner
Graphical User Interface. With the exception of the Explain Select Result Option, these results
will match the tables described below in the Output Column Definition section, depending upon
the parameters chosen for the analysis.

Variable Transformation - RESULTS - SQL


On the Variable Transformation dialog, click on RESULTS and then click on SQL (note that the
RESULTS tab will be grayed-out/disabled until after the analysis is completed):

The generated SQL is returned as text which can be copied, pasted, or printed.

Tutorial – Variable Transformation Analysis

Variable Transformation - Example #1

Parameterize a Variable Transformation Analysis as follows:

1. Select twm_tutorials_vc1 (as created in Variable Creation Tutorial #1) as the
Available Table.
2. Drag and drop the following columns from twm_tutorials_vc1 to the following
transformation functions in Created Transformations:

• cust_id - Retain folder
• income - Bin Code folder

• gender - Design Code folder
• marital_status - Recode folder
• age - Rescale folder
• avg_cc_tran_amt - Z Score folder (rename zavg_cc_tran_amt)

3. Let all of the transformation functions' properties default, except as follows. Double-click
on the variable name to bring up the Properties screen:

4. gender - Design Code

Click on the Design Code tab on the Properties screen and then click on the Values
button to bring up the Design Code values Wizard:

Select both F and M by highlighting them and hitting the Add> button. Hit Finish to
exit the Wizard.

5. The default values of F_gender and M_gender are given for the values of F and M
respectively. Highlight those values and type in Females and Males accordingly:

6. marital_status - Recode

Click on the Recode tab on the Properties screen and then click on the Values button
to bring up the Recode values Wizard:

Select 1-4 by highlighting them and hitting the Add> button. Hit Finish to exit the
Wizard.

7. Specify recode values as follows: 1-S, 2-M, 3-S and 4-S.

8. age - Rescale

Specify a lower bound of 0 and an upper bound of 1 as follows:

9. Go to OUTPUT-storage, and select Store the tabular output of this analysis in the
database. Specify that a Table should be created named twm_tutorials_vt1.

For this example, the Variable Transformation Analysis generated the following results. Note that
the SQL is not shown for brevity:

Data

Only the first 10 rows after sorting are shown.

cust_id    income  Females  Males  marital  Age    zavg_cc_tran
1362480    9       0        1      M        0.15    0.88
1362481    3       1        0      M        0.21   -1.09
1362484    1       1        0      S        0.33    0.27
1362485    4       1        0      S        0.00    0.22
1362486    1       1        0      S        0.98   -1.09
1362487    1       0        1      M        0.88    0.02
1362488    1       1        0      M        1.00   -1.09
1362489    10      1        0      S        0.19    1.88
1362492    3       1        0      S        0.36    0.15
1362496    1       0        1      S        0.00   -0.39

Build ADS (Analytic Data Set)

The purpose of analytic data set functions is to build a data set table or view. Each Variable
Creation and Variable Transformation analysis creates a table or view to be joined together into a
final data set table. This duty is performed by the Build ADS analysis.

The Build ADS analysis has similar functionality to the Join analysis in the Reorganization group
of analyses. However, it is distinguished by these differences.

• A join table or view is not required, so that it may operate on a single table or view.
• Tables are joined together via Join Paths as in a Variable Creation analysis, but without
Anchor Columns (refer to the section Variable Creation – Input – anchor table).
• By using Join Paths, Build ADS allows the use of Cross Join as a Join Style.
• By using Join Paths, the Join Style can be set differently for different tables.
• By using Join Paths, comparison operators may be set individually in Join Steps.

It should be pointed out that although the Variable Creation analysis can be used in place of
Build ADS, Build ADS is simpler and easier to use in the functions it performs.

Initiate a Build ADS


Use the following procedure to initiate a new Build ADS analysis in Teradata Warehouse Miner:

1. Click on the Add New Analysis icon in the toolbar:

2. In the resulting Add New Analysis dialog box, click on to highlight ADS under Categories,
and then under Analyses double-click on Build ADS:

3. This will bring up the Build ADS dialog in which you will enter INPUT and OUTPUT options
to parameterize the analysis as described in the next sections.

Build ADS - INPUT - Data Selection


On the Build ADS dialog click on INPUT and then click on data selection:

On this screen select:

Available Databases
All the databases which are available for the Build ADS Analysis.
Available Tables
All the tables within the Source Database that are available for the Build ADS Analysis.
Available Columns
All the columns within the selected table that are available for the Build ADS Analysis.
Selected Columns
Select columns by highlighting and then either dragging and dropping into the Selected
Columns window, or click on the arrow button to move highlighted columns into the
Selected Columns window.


Build ADS - INPUT – Anchor Table


On the Build ADS dialog click on INPUT and then click on anchor table:

This screen performs the same function it does for the Variable Creation analysis with the
exception that the selector for Anchor Columns is not used. Refer to the section Variable
Creation – Input – Anchor Table for details.

Build ADS - INPUT - Expert Options


On the Build ADS dialog click on INPUT and then click on expert options:

This screen provides the option to generate one or more SQL WHERE clauses to restrict the rows
selected for analysis (for example: cust_id > 0).

It may be useful to note that if a WHERE clause condition is specified on the "inner" table of a
join (i.e. a table that contributes only matched rows to the results), the join is logically equivalent
to an Inner Join, regardless of whether an Outer type is specified. (In a Left Outer Join, the left
table is the "outer" table and the right table is the "inner" table.)
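This equivalence can be demonstrated with a small pandas sketch (the data is invented for illustration): a condition on a column of the "inner" table can never be true for the NULL-padded unmatched rows, so it filters out exactly the rows the outer join preserved.

```python
import pandas as pd

left = pd.DataFrame({"cust_id": [1, 2, 3]})
right = pd.DataFrame({"cust_id": [1, 2], "bal": [10.0, -5.0]})

# LEFT OUTER JOIN: cust_id 3 survives, with bal = NaN (SQL NULL).
outer = left.merge(right, on="cust_id", how="left")

# A WHERE condition on the inner table's column (bal > 0) is never true
# for the NULL rows, so the unmatched rows are discarded anyway...
filtered = outer[outer["bal"] > 0]

# ...which is exactly what an inner join plus the same condition returns.
inner = left.merge(right, on="cust_id", how="inner")
assert filtered.equals(inner[inner["bal"] > 0])
```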

Build ADS - OUTPUT - Storage


Before running the analysis, define Output options. On the Build ADS dialog click on OUTPUT
and then click on storage:

On this screen select:

Use the Teradata EXPLAIN feature to display the execution path for this analysis
Option to generate a SQL EXPLAIN SELECT statement, which returns a Teradata
Execution Plan.


Store the tabular output of this analysis in the database


Option to generate a Teradata TABLE or VIEW populated with the results of the
analysis. Once enabled, the following three fields must be specified:
Database Name
Text box to specify the name of the Teradata database where the resultant Table or
View will be created. By default, this is the “Result Database.”
Output Name
Text box to specify the name of the Teradata Table or View.
Output Type
Pull-down to specify Table or View.
Create Output table using the FALLBACK keyword
If a table is selected, it will be built with FALLBACK when this option is selected.
Create Output table using the MULTISET keyword
If a table is selected, it will be built as a MULTISET table when this option is selected.

Generate the SQL for this analysis, but do not execute it


If this option is selected, the analysis will only generate the SQL, return it, and terminate
immediately.

Build ADS - OUTPUT - Primary Index


Before running the analysis, define Output options. On the Build ADS dialog click on OUTPUT
and then click on primary index:

On this screen, select the columns which comprise the primary index of the output table. Select:

Available Columns
A list of the columns available to form the primary index of the resultant table if an
Output Table is used.
Primary Index Columns
Select columns by highlighting and then either dragging and dropping into the Primary
Index Columns window, or click on the arrow button to move highlighted columns into
the Primary Index Columns window.

Run the Build ADS Analysis


After setting parameters on the INPUT and OUTPUT screens as described above, you are ready
to run the analysis. To run the analysis you can either:
• Click the Run icon on the toolbar, or
• Select Run <project name> on the Project menu, or
• Press the F5 key on your keyboard


Results - Build ADS


The results of running the Build ADS analysis include the generated SQL itself, the results of
executing the generated SQL, and, if the Create Table (or View) option is chosen, a Teradata
table (or view). All of these results are outlined below.

Build ADS - RESULTS - Data


On the Build ADS dialog, click on RESULTS and then click on data (note that the RESULTS
tab will be grayed-out/disabled until after the analysis is completed):

The results of the completed query are returned in a Data page within Results. This page has the
properties of the Data page discussed in the Chapter on Using the Teradata Warehouse Miner
Graphical User Interface. With the exception of the Explain Select Result Option, these results
will match the tables described below in the Output Column Definition section, depending upon
the parameters chosen for the analysis.

Build ADS - RESULTS - SQL


On the Build ADS dialog, click on RESULTS and then click on SQL (note that the RESULTS
tab will be grayed-out/disabled until after the analysis is completed):

The generated SQL is returned here as text which can be copied, pasted, or printed.

Tutorial – Build ADS Analysis

Build ADS - Example #1

Parameterize a Build ADS Analysis as follows:

Selected Columns TWM_CUSTOMER.cust_id


TWM_CUSTOMER.nbr_children
TWM_CUSTOMER.postal_code
(From Variable Transformation Tutorial #1)
twm_tutorials_vt1.age
twm_tutorials_vt1.income


twm_tutorials_vt1.marital_status
twm_tutorials_vt1.Females
twm_tutorials_vt1.Males
twm_tutorials_vt1.zavg_cc_tran_amt
(From Variable Creation Tutorial #2)
twm_tutorials_vc2.CC_acct
twm_tutorials_vc2.CC_bal
twm_tutorials_vc2.CK_acct
twm_tutorials_vc2.CK_bal
twm_tutorials_vc2.SV_acct
twm_tutorials_vc2.SV_bal
twm_tutorials_vc2.Q1_nbr_trans
twm_tutorials_vc2.Q2_nbr_trans
twm_tutorials_vc2.Q3_nbr_trans
twm_tutorials_vc2.Q4_nbr_trans
Anchor Table TWM_CUSTOMER
Inner Join to twm_tutorials_vt1 on cust_id
Inner Join to twm_tutorials_vc2 on cust_id

Go to OUTPUT-storage, and select Store the tabular output of this analysis in the
database. Specify that a Table should be created named twm_tutorials_bads1.

For this example, the Build ADS analysis generates a table of 747 rows with the 19 columns
above joined together, containing the same cust_id values as the TWM_CUSTOMER table.
Note that the SQL is not shown for brevity.


Refresh

The Refresh Analysis is provided as a means to re-execute a chain of referenced analyses with a
different set of user-specified parameters without modifying the original analyses. It falls under
the ADS umbrella because it is designed to allow the user to refresh an analytic data set; however,
in addition to ADS Analyses it may also be used to refresh Score Analyses.

Creating an analytic data set can require a lot of thought and result in many steps of creating
variables and reorganizing data. There can be multiple tables joined by complicated join paths,
sophisticated arithmetic formulas, as well as the dimensioning of variables. With the use of
Analysis References, which provide a means to feed the output of a previous analysis into a
subsequent analysis, the result can be a complex chain of analyses that makes up the creation of a
final analytic data set. As the source data changes over time, it might be necessary to modify the
parameters used in generating the analytic data set. Apart from Refresh, there are two ways to do
this. The first is to reproduce the entire set of analyses used to generate the analytic data set with
the new, modified parameters. This is not ideal because, for a complicated set of analyses, it could
take a significant amount of time to reproduce when you only wanted to change a few things. The
second is to actually change the original analyses with new parameters. The problem with this is
that the original ADS template is then permanently changed.

With the Refresh Analysis, the original analyses can be re-executed with the modified parameters
without affecting the original parameters used. If any of the parameters are not selected to be
changed, then the original values are used. When Refresh is run, the analysis to be refreshed is
executed (along with any analyses that it references) using the new parameters specified within
Refresh. In addition, it is worth noting one of the most powerful features of the Refresh
analysis: the referenced analyses will generate only the columns needed for the analysis
being refreshed.

Initiate a Refresh Analysis


Use the following procedure to initiate a new Refresh Analysis in Teradata Warehouse Miner:

1. Click on the Add New Analysis icon in the toolbar:

2. In the resulting Add New Analysis dialog box, with ADS highlighted on the left, double-click
on the Refresh icon:


3. This will bring up the Refresh dialog in which you will enter INPUT options to parameterize
the analysis as described in the next section.

Refresh - INPUT - Data Selection


On the Refresh dialog click on INPUT and the analysis parameters tab will automatically be
selected.

On this screen select:

Available Analyses
Select a single analysis from the list of all of the analyses in the current project which are
available for the Refresh Analysis.
Modify Output
Check the box if you wish to change the output database and/or output table of the analysis to be
refreshed.
Database Name
The name of the output database of the analysis to be refreshed.
Table/View Name
The name of the output table or view of the analysis to be refreshed.


Modify Anchor Table


Check the box if you wish to modify the Anchor Table. This applies only if there are one or more
Variable Creation Analyses either as the selected analysis or, if it exists, in the chain of analyses
referenced by the selected analysis. When a new Anchor Table is selected, the new Anchor Table
is joined to the old Anchor Table(s) in all the Variable Creation Analyses in the reference chain
by means of a LEFT OUTER JOIN. (Any Anchor Tables in analyses of other types are not
affected.)
Modify Target Date
Check the box if you wish to modify the Target Date. This applies only if there are one or more
Variable Creation or Variable Transformation Analyses that use a target date either as the
selected analysis or, if it exists, in the chain of analyses referenced by the selected analysis.
Generate SQL Only
Check the box if you wish the selected analysis and, if it exists, the chain of analyses
referenced by it to generate SQL rather than execute it.

Run the Refresh Analysis


After setting parameters on the INPUT screen as described above you are ready to run the
analysis. To run the analysis you can either:
• Click the Run icon on the toolbar, or
• Select Run <project name> on the Project menu, or
• Press the F5 key on your keyboard

Results - Refresh
On the Refresh dialog click on RESULTS (note that the RESULTS tab will be grayed-
out/disabled until after the analysis is completed):

Tutorial – Refresh

Refresh – Example

(Note: The following example will contain a Variable Creation, which will then be input into
the Refresh Analysis)

Parameterize a Variable Creation Analysis as follows:

1. Select TWM_CUSTOMER as the Available Table


2. Create one variable by clicking on the New button and drag and drop the following
column. Note the variable name will default to the column name.
• TWM_CUSTOMER.cust_id
3. Select TWM_CREDIT_TRAN as the Available Table.
4. Create a variable by clicking on the New button and build up an expression as follows:


5. Rename the variable to “avg_tran_amt” and drag an AVERAGE (Arithmetic) SQL


Element over the Variable, and then drag the following column over the empty
arguments:
• TWM_CREDIT_TRAN.tran_amt

6. Go to INPUT- variables-Dimensions tab and create a dimension by clicking on the New


button, and then drag an AND (Logical) on the Dimension Value. Rename the Dimension
to “LastMonth”

7. Drag a LESS THAN OR EQUALS (Comparison) onto the first empty argument, and a
GREATER THAN (Comparison) onto the second empty argument.

8. Drag a DATE DIFFERENCE (Date and Time) onto the first empty argument of each
comparison operator.

9. Drag a TARGET DATE (Literals) onto the first empty argument of each Date Difference
and drag TWM_CREDIT_TRAN.tran_date onto the second empty argument of each
Date Difference.


10. Drag a NUMBER (30) (Literal) onto the empty argument of the LESS THAN OR
EQUALS, and drag a NUMBER (0) (Literal) onto the empty argument of the GREATER
THAN.

11. Go to the INPUT-dimensions tab and apply the dimension to the variable in the following
way.


12. Specify the Join Paths from TWM_CUSTOMER to each of the following by selecting a
table in the Join Path from Anchor Table To: and clicking on the Wizard button.
Specify the following Join Paths:
TWM_CUSTOMER.cust_id - TWM_CREDIT_TRAN.cust_id

13. Go to INPUT-target date, and change the Target Date to 7/31/1995.


14. Go to OUTPUT-storage, and select Store the tabular output of this analysis in the
database. Specify that a Table should be created named twm_tutorials_refresh.

Run the analysis.
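The “LastMonth” dimension assembled in steps 5 through 10 evaluates to the predicate 0 < (Target Date − tran_date) ≤ 30. As a quick illustration only (this is not product-generated code), the same logic in Python:

```python
from datetime import date

def in_last_month(target: date, tran: date) -> bool:
    """True when tran falls within the 30 days ending at target."""
    diff = (target - tran).days
    return diff <= 30 and diff > 0  # the two comparisons ANDed in step 6

target = date(1995, 7, 31)
assert in_last_month(target, date(1995, 7, 15))      # 16 days back: inside
assert not in_last_month(target, date(1995, 7, 31))  # same day: diff 0, excluded
assert not in_last_month(target, date(1995, 6, 1))   # 60 days back: outside
```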

Parameterize a Refresh Analysis as follows:

Available Analyses Variable Creation1


Modify Output Checked
Table Name twm_tutorials_refresh2
Modify Anchor Table Checked, twm_savings_tran
Modify Target Date 8/31/1995

Run the Analysis. View the generated SQL (which has been generated by the Variable
Creation Analysis, but modified by the Refresh Analysis) to see how the target date, output
table, and anchor table have been changed.



Chapter Three
Matrix Functions

3. Matrix Functions

Overview – Matrix Functions


Teradata Warehouse Miner functions are provided to build matrices which can drastically reduce
the amount of data required for analytic algorithms. Numeric columns in potentially huge
relational tables are reduced to a comparatively compact matrix (n-by-n if there are n columns),
which can be delivered to two of the Teradata Warehouse Miner Analytic Algorithms (Linear
Regression and Factor Analysis), or an external application for further analysis. One example of
an external application would be SAS, which provides principal component analysis or linear
regression analysis based on a correlation or covariance matrix as input.

The matrix functions must operate on numeric data. Columns of type DATE will not produce
meaningful results. NULL values are handled via listwise or pairwise deletion in the Matrix
analysis.

These functions are valid for any of the supported data reduction matrix types, namely
correlation, covariance, sums of squares and cross products, and corrected sums of squares and
cross products. Internally the Matrix analysis stores the matrix as an extended sums of squares
and cross products matrix, with an additional column containing a constant value, 1. The actual
conversion to another type, if requested, is computed in the Export Matrix analysis.
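To see why the extended SSCP form is sufficient, note that the constant-1 column makes the row count and the column sums part of the matrix itself, and each other matrix type follows from those. The following NumPy sketch is illustrative only (invented data; the product performs the equivalent conversion internally, not in NumPy):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))              # invented data, 3 numeric columns

Z = np.column_stack([np.ones(len(X)), X])  # prepend the constant-1 column
S = Z.T @ Z                                # extended SSCP matrix

n = S[0, 0]                                # count: sum of 1*1 products
sums = S[0, 1:]                            # column sums: 1*x products
sscp = S[1:, 1:]                           # plain SSCP block
csscp = sscp - np.outer(sums, sums) / n    # corrected SSCP
cov = csscp / (n - 1)                      # covariance
d = np.sqrt(np.diag(csscp))
cor = csscp / np.outer(d, d)               # correlation

assert np.allclose(cov, np.cov(X, rowvar=False))
assert np.allclose(cor, np.corrcoef(X, rowvar=False))
```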

The Matrix functions are:

Matrix
Build an extended Sums of Squares and Cross-Products (SSCP) data reduction matrix.
Optionally, restart the Matrix process upon a failure or when a previously-executed
Matrix was stopped.

Export Matrix
Convert or export the resultant matrix and build either a SAS data step, a Teradata table,
or just view the results. Valid matrices include:
• Pearson-product moment correlations (COR)
• Covariances (COV)
• Sums of squares and cross-products (SSCP)
• Corrected Sums of squares and cross-products (CSSCP)


Matrix Analysis
The Matrix analysis will process the input data so that one of the following data reduction
matrices can be exported via the Export Matrix analysis:

• Pearson-product moment correlations (COR)


• Covariances (COV)
• Sums of squares and cross-products (SSCP)
• Corrected Sums of squares and cross-products (CSSCP)

The formulas used to calculate these matrices are given below.

Correlation
The Pearson Product-Moment Correlation value of the pairwise combinations of each
column within the selected table. This is calculated as follows, for each pairwise
combination of variables X and Y:

ƒ(x,y) = ((n ⋅ ∑xy) − ∑x ⋅ ∑y) / √((n ⋅ ∑x² − (∑x)²) ⋅ (n ⋅ ∑y² − (∑y)²))

where n is the total number of occurrences of this variable.

Covariance
The Covariance value of the pairwise combinations of each column within the selected
table. This is calculated as follows, for each pairwise combination of variables X and Y:

ƒ(x,y) = (∑x⋅y / (n − 1)) − ((∑x ⋅ ∑y) / (n ⋅ (n − 1)))

where n is the total number of occurrences of this variable.

Sums of Squares and Cross-Products


The Sums of squares and Cross-Products value of the pair-wise combinations of each
column within the selected table. This is calculated as follows, for each pair-wise
combination of variables X and Y:

ƒ(x,y) = ∑ x⋅y

where n is the total number of occurrences of this variable.

Corrected Sums of Squares and Cross-Products

The Corrected Sums of squares and Cross-Products value of the pair-wise combinations
of each column within the selected table. This is calculated as follows, for each pair-wise
combination of variables X and Y:

ƒ(x,y) = ∑ x⋅y − (∑x ⋅ ∑y) / n

where n is the total number of occurrences of this variable.
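Each formula above is written purely in terms of running sums, which is what allows the matrix to be accumulated in SQL passes over the data. As an illustration (invented data; not product code), the correlation formula can be checked numerically against NumPy's built-in:

```python
import numpy as np

rng = np.random.default_rng(1)
x, y = rng.normal(size=100), rng.normal(size=100)
n = len(x)

# Pearson r from the running sums, exactly as in the formula above.
r = (n * (x * y).sum() - x.sum() * y.sum()) / np.sqrt(
    (n * (x**2).sum() - x.sum()**2) * (n * (y**2).sum() - y.sum()**2))

assert np.isclose(r, np.corrcoef(x, y)[0, 1])
```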

The matrix functions must operate on numeric data. Columns of type DATE will not produce
meaningful results.

An option is provided for list-wise versus pair-wise deletion or omission of values which are
NULL. With list-wise deletion, the default option, if the value of any column to be included in
matrix calculations is NULL, the entire row is omitted during matrix calculations. Alternatively,
if pair-wise deletion is chosen, only pairs of values involving a NULL are ignored, not entire
rows. The danger in this case is that when later analysis is performed on the matrix, it is possible
that mathematical irregularities will be found due to the calculations being made over different
numbers of observations.
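The difference between the two options can be illustrated with pandas (invented values, with NaN standing in for NULL): pairwise deletion keeps more observations per pair, but each pair is then computed over a different row set, which is the source of the irregularities mentioned above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0],
                   "b": [2.0, np.nan, 5.0, 8.0],
                   "c": [1.0, 0.0, np.nan, 1.0]})

pairwise = df.corr()           # pandas uses pairwise deletion by default
listwise = df.dropna().corr()  # drop every row containing a NULL first

# Pairwise keeps 3 observations for the (a, b) pair; listwise keeps only
# the 2 fully populated rows, so the two correlations differ.
print(pairwise.loc["a", "b"], listwise.loc["a", "b"])
```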

The Matrix analysis has restart capabilities as well. If a system failure occurs, or the Matrix
analysis is stopped by the end-user, it can be restarted, beginning its calculations at the point of
stoppage.

Note that the name of the Matrix analysis will be used to fetch the matrix values from the
database for those functions that are dependent upon a matrix – namely, Export Matrix, Linear
Regression and Factor Analysis.

Initiate a Matrix Function


Use the following procedure to initiate a new Matrix analysis in Teradata Warehouse Miner:

1. Click on the Add New Analysis icon in the toolbar:

2. In the resulting Add New Analysis dialog box, click on Matrix Functions under Categories
and then under Analyses double-click on Matrix:


3. This will bring up the Matrix dialog in which you will enter INPUT and OUTPUT options to
parameterize the analysis as described in the next sections.


Matrix - INPUT - Data Selection


On the Matrix dialog click on INPUT and then click on data selection:

On this screen select:

Available Databases
All the databases which are available for the Matrix analysis.
Available Tables
All the tables within the Source Database that are available for the Matrix analysis.
Available Columns
All the columns within the selected table that are available for the Matrix analysis.
Selected Columns
Select columns by highlighting and then either dragging and dropping into the Selected
Columns window, or click on the arrow button to move highlighted columns into the
Selected Columns window.

Matrix - INPUT - Analysis Parameters


On the Matrix dialog click on INPUT and then click on analysis parameters:

On this screen select:

Null Handling
Provides option for list-wise versus pair-wise deletion, used for omission of values which
are NULL.
Pairwise Deletion
Only pairs of values involving a NULL are ignored, not entire rows.
Listwise Deletion
If the value of any column to be included in the matrix is NULL, the entire row is
omitted during matrix calculations.

Matrix Width
The width of the matrix results. Width is the number of columns processed with each
SQL statement.
Number of Connections
The number of threads or simultaneous connections to the data source. Multiple sessions
may speed the SQL execution process.

Continue Execution (instead of starting over)


If a previously executed Matrix analysis was stopped, or failed after some portion of the
matrix was built, this option will be enabled to allow you to begin the Matrix process at
the point the analysis terminated.

Run the Matrix Analysis


After setting parameters on the INPUT screens as described above, you are ready to run the
analysis. To run the analysis you can either:
• Click the Run icon on the toolbar, or
• Select Run <project name> on the Project menu, or
• Press the F5 key on your keyboard

Results - Matrix
The results from running the Matrix analysis are persisted within the Metadata model, and are not
returned to the front-end. Results can be viewed using the Export Matrix analysis (next section in
this chapter).

Tutorial - Matrix

Matrix Example #1

Parameterize a Matrix analysis as follows. Note that this matrix will be used in the Linear
Regression and Factor Analysis Tutorials in subsequent chapters:

Selected Table and Columns TWM_CUSTOMER_ANALYSIS.income


TWM_CUSTOMER_ANALYSIS.age
TWM_CUSTOMER_ANALYSIS.years_with_bank
TWM_CUSTOMER_ANALYSIS.nbr_children
TWM_CUSTOMER_ANALYSIS.marital_status
TWM_CUSTOMER_ANALYSIS.female
TWM_CUSTOMER_ANALYSIS.single
TWM_CUSTOMER_ANALYSIS.married
TWM_CUSTOMER_ANALYSIS.separated
TWM_CUSTOMER_ANALYSIS.ccacct
TWM_CUSTOMER_ANALYSIS.ckacct
TWM_CUSTOMER_ANALYSIS.svacct
TWM_CUSTOMER_ANALYSIS.avg_cc_bal
TWM_CUSTOMER_ANALYSIS.avg_ck_bal
TWM_CUSTOMER_ANALYSIS.avg_sv_bal
TWM_CUSTOMER_ANALYSIS.avg_cc_tran_amt
TWM_CUSTOMER_ANALYSIS.avg_cc_tran_cnt
TWM_CUSTOMER_ANALYSIS.avg_ck_tran_amt


TWM_CUSTOMER_ANALYSIS.avg_ck_tran_cnt
TWM_CUSTOMER_ANALYSIS.avg_sv_tran_amt
TWM_CUSTOMER_ANALYSIS.avg_sv_tran_cnt
TWM_CUSTOMER_ANALYSIS.cc_rev
Analysis Name: Customer_Analysis_Matrix

There are no viewable results generated as a result of executing the Matrix analysis. The results
will be viewed via the Export Matrix analysis tutorial (later in this chapter). Save the Matrix
analysis with the above mentioned name “Customer_Analysis_Matrix” for use in the Linear
Regression and Factor analysis tutorials.

Matrix Example #2

Parameterize a Matrix analysis as follows:

Selected Tables and Columns TWM_CUSTOMER_ANALYSIS.income


TWM_CUSTOMER_ANALYSIS.age
TWM_CUSTOMER_ANALYSIS.years_with_bank
TWM_CUSTOMER_ANALYSIS.nbr_children
TWM_CUSTOMER_ANALYSIS.marital_status
TWM_CUSTOMER_ANALYSIS.female
Analysis Name: Customer_Analysis_Matrix_Short

There are no viewable results generated as a result of executing the Matrix analysis. The results
will be viewed via the Export Matrix analysis tutorial. Save the Matrix analysis with the above
mentioned name “Customer_Analysis_Matrix_Short” for use during the Export Matrix tutorial.


Export Matrix
The Export Matrix analysis will export the matrix data values built by the Matrix analysis in one
of the following forms. (Note that the form is not specified when the matrix is built, yet the
matrix can be requested in any form when it is exported.)

• Pearson-product moment correlations (COR)


• Covariances (COV)
• Sums of squares and cross-products (SSCP)
• Corrected Sums of squares and cross-products (CSSCP)

The exported matrices can take on one of the following formats:

• SAS DataStep
• Teradata Table
• Viewable Results

If a SAS data step script is created to build a “special” (matrix) SAS data set, the script will
produce, when executed with a SAS application, a data set with the same name as the SAS file
name. This function automatically appends “.sas” to the end of the requested output (script)
name, and SAS will create a .log file when the script is executed.

If a table containing the matrix is created, the table will contain one column for each column used
to build the matrix, with the same name as the original column, or the alias, if any, which was
given to the Matrix analysis. In addition, an XIDX column is added to the front of the result table,
along with an XCOL column containing the name of the original column or alias.

To view the correlation, covariance, SSCP or CSSCP matrix, specify no Output Options on the
analysis parameters tab. After the analysis has executed, click on the Results tab to view the
matrix.


Initiate an Export Matrix Function


Use the following procedure to initiate a new Export Matrix analysis in Teradata Warehouse
Miner:

1. Click on the Add New Analysis icon in the toolbar:

2. In the resulting Add New Analysis dialog box, click on Matrix Functions under Categories
and then under Analyses double-click on Export Matrix:

3. This will bring up the Export Matrix dialog in which you will enter INPUT and OUTPUT
options to parameterize the analysis as described in the next sections.


Export Matrix - INPUT - Data Selection


On the Matrix dialog click on INPUT and then click on data selection:

On this screen select:

Available Matrices
All the matrices within the Metadata Database that have been previously built with the
Matrix analysis and have been saved to metadata are available to export with the Export
Matrix analysis. These are identified by the analysis name of the Matrix analysis.
Selected Matrix
The Matrix analysis name of the matrix to export.

Export Matrix - INPUT - Analysis Parameters


On the Matrix dialog click on INPUT and then click on analysis parameters:

On this screen select:

Matrix Type
Provides options for the specific type of matrix to export.
Correlation
Export the matrix values as Pearson-product moment correlations.
Covariance
Export the matrix values as Covariances.
SSCP
Export the matrix values as an extended Sums of squares and cross-products, with the
column of constant 1’s labeled INTERCEPT.
CSSCP
Export the matrix values as Corrected Sums of squares and cross-products.

Output Options
Create a SAS DataStep based on this Matrix
Build the matrix results within a SAS DataStep script.
Use truncated (8 character) Column Names
Check to force column/alias name to 8 characters or less.


File Name
You can use the Browse button to bring up a standard browse dialog, to choose
a location to save the exported Flat File, Report or SAS Data Step.
Create a Database Table based on this matrix
Build the matrix results as a Teradata table. You will need to specify a Table Name.

Run the Export Matrix


After setting parameters on the INPUT screens as described above, you are ready to run the
analysis. To run the analysis you can either:
• Click the Run icon on the toolbar, or
• Select Run <project name> on the Project menu, or
• Press the F5 key on your keyboard

Results – Export Matrix


An XML Results object is built when an Export Matrix analysis has completed execution. It
contains a single tag – <Format> Matrix, where <Format> is either Correlation, Covariance,
SSCP, or CSSCP. On the Export Matrix dialog, click on RESULTS (note that the RESULTS
tab will be grayed-out/disabled until after the analysis has been run to completion):

Output Columns - Export Matrix


If the “Create a Database Table” output option is chosen, then the following table is built by the
Export Matrix analysis. Those columns in bold below will comprise the Unique Primary Index
(UPI).

Name Type Definition


XIDX Integer Unique integer value indicating an internal “index” of the column(s) selected in
Selected Tables and Columns. Used as a row identifier in order to manipulate
the table with matrix algebra.
XCOL VARCHAR(30) The column names selected in Selected Tables and Columns.
<matrix_values> FLOAT A column, with the same name as that selected in Selected Tables and
Columns will be generated. Data type for all is FLOAT.

Tutorial - Export Matrix

Export Matrix Example #1

Parameterize an Export Matrix analysis as follows:


Selected Matrix Customer_Analysis_Matrix_Short


SSCP Enabled
Create a SAS Data Step Enabled
Use truncated (8 character) column names Enabled
File Name Twm_SSCP_Values.sas

Run the analysis and edit the resultant SSCP_Values.sas SAS data step script:

DATA SSCP_Values (type=SSCP);
infile cards flowover;
ARRAY columns [7] INTERCEP income age years_01 nbr_ch01 marita01 female;
input _type_ $ _name_ $ columns[*];
cards;
N CNT 747.000000 747.000000 747.000000 747.000000 747.000000 747.000000
MEAN AVG 22728.281124 42.479250 3.907631 0.714859 1.882195 0.559572
STD STDDEV 22207.221405 19.114879 2.675634 1.103410 0.892051 0.496771
SSCP INTERCEP 747.000000 16978026.000000 31732.000000 2919.000000 534.000000 1406.000000 418.000000
SSCP income 16978026.000000 753779217048.000000 798771897.000000 68143689.000000 17316503.000000 35612419.000000 8290553.000000
SSCP age 31732.000000 798771897.000000 1620524.000000 130921.000000 21784.000000 64058.000000 17696.000000
SSCP years_01 2919.000000 68143689.000000 130921.000000 16747.000000 2010.000000 5475.000000 1629.000000
SSCP nbr_ch01 534.000000 17316503.000000 21784.000000 2010.000000 1290.000000 1355.000000 295.000000
SSCP marita01 1406.000000 35612419.000000 64058.000000 5475.000000 1355.000000 3240.000000 787.000000
SSCP female 418.000000 8290553.000000 17696.000000 1629.000000 295.000000 787.000000 418.000000
;

Export Matrix Example #2

Parameterize an Export Matrix analysis as follows:

Selected Matrix Customer_Analysis_Matrix_Short


CSSCP Enabled
Create Table Enabled
Table Name Twm_CSSCP_Matrix

Run the analysis and view the results with either QueryMan or the SQL Node by executing the
following queries:
SHOW TABLE <result_db>.CSSCP_Matrix;
SELECT * from <result_db>.CSSCP_Matrix order by 1;


Note the following results:


CREATE SET TABLE <result_db>.CSSCP_Matrix ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL
(
XIDX INTEGER,
XCOL VARCHAR(30) CHARACTER SET LATIN NOT CASESPECIFIC,
income FLOAT,
age FLOAT,
years_with_bank FLOAT,
nbr_children FLOAT,
marital_status FLOAT,
female FLOAT)
UNIQUE PRIMARY INDEX ( XIDX );

XIDX XCOL income age years_wi


1 income 367897869180.964 77558080.3574296 1799836.3975904
2 age 77558080.3574296 272572.4283802 6924.0682731
3 years_wi 1799836.3975904 6924.0682731 5340.626506
4 nbr_chil 5179600.8795181 -899.9196787 -76.6746988
5 marital_ 3656455.7389558 4332.1740295 -19.1285141
6 female -1209868.5100402 -60.3266399 -4.3895582

XIDX XCOL nbr_chil marital_ female
1 income 5179600.8795181 3656455.7389558 -1209868.5100402
2 age -899.9196787 4332.1740295 -60.3266399
3 years_wi -76.6746988 -19.1285141 -4.3895582
4 nbr_chil 908.2650602 349.9076305 -3.811245
5 marital_ 349.9076305 593.6331995 .2423025
6 female -3.811245 .2423025 184.0990629
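
The CSSCP (corrected SSCP) values above can be derived from the SSCP matrix of Example #1 by removing the mean cross-product: csscp_ij = sscp_ij - s_i * s_j / n, where s_i is the column sum (the SSCP INTERCEP row). A sketch verifying two entries against the table above:

```python
# csscp_ij = sscp_ij - s_i * s_j / n, with s_i taken from the SSCP INTERCEP
# row and the raw cross-products from the SSCP matrix of Example #1.
n = 747.0
s_income, s_age = 16978026.0, 31732.0
sscp_income_income, sscp_income_age = 753779217048.0, 798771897.0

csscp_income_income = sscp_income_income - s_income * s_income / n
csscp_income_age = sscp_income_age - s_income * s_age / n

assert abs(csscp_income_income - 367897869180.964) < 0.1  # row 1, income
assert abs(csscp_income_age - 77558080.3574296) < 0.1     # row 1, age
```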

Export Matrix Example #3

Parameterize an Export Matrix analysis as follows:

Selected Matrix Customer_Analysis_Matrix_Short


Correlation Enabled
Output Options None

Run the analysis to see the following results:

Results

Click on the Results tab to see the following Matrix Report:

- income age years_with_bank nbr_children marital_status female


income 1.0000 * * * * *
age 0.2449 1.0000 * * * *
years_with_bank 0.0406 0.1815 1.0000 * * *
nbr_children 0.2833 -0.0572 -0.0348 1.0000 * *
marital_status 0.2474 0.3406 -0.0107 0.4765 1.0000 *
female -0.1470 -0.0085 -0.0044 -0.0093 0.0007 1.0000
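
Each correlation in this report is a normalized CSSCP entry: r_ij = csscp_ij / sqrt(csscp_ii * csscp_jj). The sketch below reproduces the income/age correlation from the CSSCP values in Example #2:

```python
import math

# Pearson correlation from CSSCP entries: r_ij = csscp_ij / sqrt(csscp_ii * csscp_jj)
csscp = {("income", "income"): 367897869180.964,  # from Example #2
         ("age", "age"): 272572.4283802,
         ("income", "age"): 77558080.3574296}

r = csscp[("income", "age")] / math.sqrt(
    csscp[("income", "income")] * csscp[("age", "age")])

assert abs(r - 0.2449) < 5e-4  # matches the income/age entry above
```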



Chapter Four
Scoring

4. Scoring

PMML Scoring

Predictive Model Markup Language (PMML) is an XML standard being developed by the Data
Mining Group, a vendor-led consortium established in 1998 to develop data-mining standards.
NCR co-developed the initial PMML specification along with Angoss, Magnify, SPSS and The
National Center for Data Mining at the University of Illinois at Chicago.

PMML enables the definition and subsequent sharing of predictive models between applications.
It represents and describes data mining and statistical models, as well as some of the operations
required for cleaning and transforming data prior to modeling. PMML aims to provide enough
infrastructure for an application to be able to produce a model (the PMML producer) and another
application to consume it (the PMML consumer) simply by reading the PMML data file. This
means that a model developed in a desktop data-mining tool can be deployed or scored against an
entire data warehouse.

PMML-compliant XML documents consist of the following major constructs:

Feature	Function

Data Dictionary	Defines the data to the model and specifies each data attribute’s type and value range.

Mining Schema	Defines attribute information specific to a certain model. It specifies an attribute's usage type, whether it be active or independent (an input of the model), predicted or dependent (an output of the model), or supplementary (descriptive information that is ignored by the model).

Transformation Dictionary	Contains simple algorithm-specific data transformations such as normalization (map values to numbers), discretization (map continuous values to discrete values), value mapping (map discrete values to discrete values) and aggregation (simple averages and counts).

Models	Identifies model parameters for regression models, cluster models, decision tree models, neural networks, Bayesian models, association rules and sequence models.

Each PMML construct supports a mechanism for extending the content of a model. Liberal use of
such “extensions” requires that vendors who produce PMML-based models collaborate closely
with vendors who wish to consume that PMML. Please refer to the Teradata Warehouse Miner
Release Definition document for details about the products and product versions supported for
PMML consumption in Teradata ADS Generator and Teradata Warehouse Miner.

Although PMML is a great step forward, it has limitations beyond the extension mechanism, most notably incomplete encapsulation of the process of cleaning, transforming and aggregating data. Teradata recognized this limitation early on: if the PMML document could not represent the analytic variables that


were input to the analytic tools, it would be nearly impossible to consume PMML for scoring
predictive models. This is because the deployment (scoring phase) of a predictive model requires
the existence of the same variables upon which the model was built. For this reason, the PMML
Scoring analysis is included in both the Teradata ADS Generator as well as Teradata Warehouse
Miner.
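
This variable-name dependency can be seen in a toy sketch of what a consumer does with a regression model: scoring is just evaluating the model's formula against identically named columns, so a missing column makes the model unusable. The coefficients below are hypothetical, purely illustrative:

```python
# Hypothetical linear-regression model as a consumer might extract it from PMML.
model = {"intercept": -5.0, "coefficients": {"income": 0.001, "age": 0.2}}

def score(row):
    # Raises KeyError if the table being scored lacks a column the model needs,
    # which is exactly why scoring requires the original model variables.
    return model["intercept"] + sum(
        coef * row[name] for name, coef in model["coefficients"].items())

row = {"cust_id": 1362527, "income": 20000.0, "age": 40.0}
assert abs(score(row) - 23.0) < 1e-9  # -5 + 0.001*20000 + 0.2*40
```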

Initiate PMML Scoring


Use the following procedure to initiate a PMML Scoring analysis:

1. Click on the Add New Analysis icon in the toolbar:

2. In the resulting Add New Analysis dialog box, click on Scoring under Categories and then
under Analyses double-click on PMML Scoring:

3. This will bring up the PMML Scoring dialog in which you will enter INPUT and OUTPUT
options to parameterize the analysis as described in the next sections.


PMML Scoring - INPUT - Data Selection


On the PMML Scoring dialog click on INPUT and then click on data selection:

On this screen select:

Select Filename
The fully qualified name of the XML file containing the PMML model to be scored. A
filename can either be entered here or loaded using the Browse button.

Note that when a saved analysis with a valid model is first loaded into the project space, its model is embedded in the analysis and the displayed filename reflects the file the model was originally built from, even if it resided on another client machine. Hovering the mouse over the filename will display the original filename, computer name and modified date.
Modify
Select this button to remove the embedded model from the analysis and return to the standard browse filename selection input method. Once selected, however, the model is taken from a file rather than the previously embedded model. (NOTE: If the analysis isn't saved, the next load of the analysis will still contain the previously embedded model.)
Browse
Bring up the standard Windows file location dialog in order to navigate to the file containing the PMML model.
view >>
Once the XML file containing the PMML model is selected (or there is an
embedded model), the view >> hyperlink is enabled. The model can be viewed
by clicking this link.
Available Databases
All available source databases that have been added through Connection Properties.
Available Tables
The tables available for scoring are listed in this window, though all may not strictly
qualify: the input table or tables to be scored must contain the same column names used
in the original analysis.
Available Columns
The columns available for scoring are listed in this window.
Selected Columns:
Index Columns
Note that the Selected Columns window is actually a split window for specifying
Index and/or Retain columns if desired. If a table is specified as input, the primary
index of the table is defaulted here, but can be changed. If a view is specified as
input, an index must be provided.
Retain Columns


Other columns within the table being scored can be appended to the scored table, by
specifying them here. Columns specified in Index Columns may not be specified
here.

PMML Scoring - OUTPUT


On the PMML Scoring dialog click on OUTPUT:

On this screen select:

Output Table:
Database name
The name of the database.
Table name
The name of the scored output table to be created.
Create Output table using the FALLBACK keyword
If this option is selected, the output table will be created with the FALLBACK keyword.
Generate the SQL for this analysis, but do not execute it
If this option is checked, the SQL to score this PMML model will be generated but not executed.
Maximum SQL statement size allowed (default 64000):
The generated SQL statements will not exceed this maximum size in characters.
Generate as a stored procedure with name:
If this option is checked, the SQL produced will be generated in the form of a
stored procedure with the specified name.

Model Output Options:


This control allows you to add probabilities to the score table generated, in addition to the
dependent variable prediction itself.
Name
Name of the column containing the probability. Click the check-box to have it included
in the score table generated.
Display Name
For display purposes only, the “DisplayName” of the column containing the probability
from the PMML file.
Target Field
For display purposes only, the “TargetField” or name of the dependent variable from the
PMML file.
Feature
For display purposes only, a description of what the column will contain – currently only
Probabilities.
Value
For display purposes only, the actual value of the dependent variable from the PMML
file.


Run the PMML Scoring Analysis


After setting parameters on the INPUT screens as described above, you are ready to run the
analysis. To run the analysis you can either:
• Click the Run icon on the toolbar, or
• Select Run <project name> on the Project menu, or
• Press the F5 key on your keyboard

Results - PMML Scoring


The results of running the Teradata Warehouse Miner PMML Scoring Analysis are outlined below.

PMML Scoring - RESULTS - Reports


On the PMML Scoring dialog click RESULTS and then click on reports (note that the
RESULTS tab will be grayed-out/disabled until after the analysis is completed):

PMML Score Report


Resulting Scored Table Name
This is the name given the table with the scored values of the model.
Number of Rows in Scored Table
This is the number of rows in the scored table.

PMML Scoring - RESULTS - Data


On the PMML Scoring dialog click RESULTS and then click on data (note that the RESULTS
tab will be grayed-out/disabled until after the analysis is completed):

A sample of rows from the scored table is displayed here – the size determined by the setting
specified by Maximum result rows to display in Tools-Preferences-Limits. By default, the
index of the table being scored as well as the dependent column prediction are in the scored table
– additional columns as specified in the OUTPUT panel may be displayed as well.

PMML Scoring - RESULTS - SQL


On the PMML Scoring dialog click RESULTS and then click on SQL (note that the RESULTS
tab will be grayed-out/disabled until after the analysis is completed):


The scoring SQL is always shown here.

Output Columns - PMML Scoring


The following table is built in the Result Database by the PMML Scoring analysis. Note that the
options selected affect the structure of the table. Those columns in bold below will comprise the
Primary Index. Also note that there may be repeated groups of columns, and that some columns
will be generated only if specific options are selected.

Name Type Definition


Key	User Defined	One or more key columns, which default to the index defined in the table to be scored (i.e. in Selected Table). The data type defaults to the same as the scored table, but can be changed via Primary Index Columns.
<app_var>	User Defined	One or more columns as selected under Retain Columns.
<dep_var> (Default)	User Defined	The predicted value of the dependent variable. The name used defaults to the Dependent Variable specified when the model was built. The data type used is the same as the Dependent Variable.
P_<dep_var><value>	FLOAT	If any additional probability output is requested on the OUTPUT panel, it will be displayed using the name provided in the PMML model.

PMML Scoring Tutorials


PMML files generated by SAS Enterprise Miner have been copied into the Teradata Warehouse
Miner installation folder (C:\Program Files\NCR\Teradata Warehouse Miner <release> by
default), within the “Scripts/PMML UDF Install and Scripts” folder. Some of these files provide
the input to PMML Scoring in the following tutorials:

1. RegressionContinuousPMML.xml
A Linear Regression model which predicts a continuous outcome.
2. DecisionTreeDiscretePMML.xml
A Decision Tree model which predicts a discrete outcome.
3. RegressionDiscretePMML.xml
A Logistic Regression model which predicts a discrete outcome.
4. NeuralMLPDiscretePMML.xml
A MLP Neural Network model which predicts a discrete outcome.
5. ClusterPMML.xml
A Cluster model which predicts which of 20 clusters a customer should be assigned to.

Tutorial #1 - PMML Scoring

Parameterize a PMML Scoring Analysis to score a Linear Regression model which predicts a
continuous outcome as follows:

Select File Name RegressionContinuousPMML.xml


Selected Tables twm_customer_analysis


Index Columns cust_id
Result Table Name twm_pmml_score_reg_1

Run the analysis, and click on Results when it completes. For this example, the PMML Scoring
Analysis generated the following results. A single click on each page name populates Results
with the item. SQL is always produced and displayed, but is not shown here for brevity.

Report
Resulting Scored Table Name score_reg_1
Number of Rows in Scored File 747

Data
cust_id cc_rev
1362527 -3.09123331353583
1363078 17.1112717566361
1362588 8.28237095448635
1363486 27.7070270772696
1362752 53.7221660256401
1362893 -3.32443782325574
1363017 14.7337070009494
1363444 15.8410540579199
1362548 35.790895539682
1362487 11.3670140503415
… …
… …
… …

Tutorial #2 - PMML Scoring

Parameterize a PMML Scoring Analysis to score a Decision Tree model which predicts a discrete
outcome as follows:

Select File Name DecisionTreeDiscretePMML.xml


Selected Tables twm_customer_analysis
Index Columns cust_id
Result Table Name twm_pmml_score_tree_1
Model Output Options:
P_ccacct1 Enabled
P_ccacct0 Enabled

Run the analysis, and click on Results when it completes. For this example, the PMML Scoring
Analysis generated the following results. A single click on each page name populates Results
with the item. SQL is always produced and displayed, but is not shown here for brevity.

Report
Resulting Scored Table Name score_tree_1
Number of Rows in Scored File 747

Data


cust_id ccacct P_ccacct1 P_ccacct0


1362527 0 0 1
1363078 1 0.948387096774194 0.0516129032258065
1362588 1 0.893333333333333 0.106666666666667
1363486 0 0 1
1362752 1 0.948387096774194 0.0516129032258065
1362893 1 0.948387096774194 0.0516129032258065
1363017 1 0.948387096774194 0.0516129032258065
1363444 0 0 1
1362548 1 0.948387096774194 0.0516129032258065
1362487 1 0.948387096774194 0.0516129032258065
… … … …
… … … …
… … … …
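
The repeated probability values above are characteristic of tree scoring: every row falling into the same leaf receives that leaf's training-class proportions. For instance, counts of 147 and 8 at a 155-row leaf (hypothetical counts, chosen only because they reproduce the values shown) give exactly these probabilities:

```python
# Hypothetical leaf with 155 training rows: 147 of class 1 and 8 of class 0.
leaf_counts = {"1": 147, "0": 8}
total = sum(leaf_counts.values())
p1, p0 = leaf_counts["1"] / total, leaf_counts["0"] / total

assert abs(p1 - 0.948387096774194) < 1e-12   # P_ccacct1 above
assert abs(p0 - 0.0516129032258065) < 1e-12  # P_ccacct0 above
assert abs(p1 + p0 - 1.0) < 1e-12            # the two columns are complements
```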

Tutorial #3 - PMML Scoring

Parameterize a PMML Scoring Analysis to score a Logistic Regression model which predicts a
discrete outcome as follows:

Select File Name RegressionDiscretePMML.xml


Selected Tables twm_customer_analysis
Index Columns cust_id
Result Table Name twm_pmml_score_reg_2
Model Output Options:
P_ccacct1 Enabled
P_ccacct0 Enabled

Run the analysis, and click on Results when it completes. For this example, the PMML Scoring
Analysis generated the following results. A single click on each page name populates Results
with the item. SQL is always produced and displayed, but is not shown here for brevity.

Report
Resulting Scored Table Name score_reg_2
Number of Rows in Scored File 747

Data
cust_id ccacct P_ccacct1 P_ccacct0
1362527 0 0.125740481096571 0.874259518903429
1363078 1 0.861086203667224 0.138913796332776
1362588 1 0.723429148501114 0.276570851498886
1363486 0 0.125034199014627 0.874965800985373
1362752 0 0.419312298702164 0.580687701297836
1362893 1 0.970060355675886 0.0299396443241139
1363017 1 0.999980678896465 1.93211035354291E-05
1363444 0 0.173538764837706 0.826461235162294
1362548 0 0.265964538752992 0.734035461247008
1362487 1 0.872345777062174 0.127654222937826
… … … …
… … … …
… … … …
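
For a logistic model the two probability columns are likewise complements, as every row above shows: the consumer evaluates the linear predictor and passes it through the logistic function. A sketch with hypothetical coefficients:

```python
import math

# Hypothetical logistic-regression model; coefficients are illustrative only.
model = {"intercept": -2.0, "coefficients": {"income": 0.00005, "age": 0.01}}

def probabilities(row):
    z = model["intercept"] + sum(
        coef * row[name] for name, coef in model["coefficients"].items())
    p1 = 1.0 / (1.0 + math.exp(-z))   # P(outcome = 1)
    return p1, 1.0 - p1               # P_ccacct1, P_ccacct0

p1, p0 = probabilities({"income": 20000.0, "age": 40.0})
assert 0.0 < p1 < 1.0
assert abs(p1 + p0 - 1.0) < 1e-12
```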


Tutorial #4 - PMML Scoring

Parameterize a PMML Scoring Analysis to score a MLP Neural Network model which predicts a
discrete outcome as follows:

Select File Name NeuralMLPDiscretePMML.xml


Selected Tables twm_customer_analysis
Index Columns cust_id
Result Table Name twm_pmml_score_nn_2

Run the analysis, and click on Results when it completes. For this example, the PMML Scoring
Analysis generated the following results. A single click on each page name populates Results
with the item. SQL is always produced and displayed, but is not shown here for brevity.

Report
Resulting Scored Table Name score_nn_2
Number of Rows in Scored File 747

Data
cust_id ccacct
1362527 0
1363078 1
1362588 1
1363486 0
1362752 1
1362893 1
1363017 1
1363444 0
1362548 1
1362487 1
… …
… …
… …

Tutorial #5 - PMML Scoring

Parameterize a PMML Scoring Analysis to score a Cluster model which predicts which of 20
clusters this customer should be assigned to as follows:

Select File Name ClusterPMML.xml


Selected Tables twm_customer_analysis
Index Columns cust_id
Result Table Name twm_pmml_score_cluster_1

Run the analysis, and click on Results when it completes. For this example, the PMML Scoring
Analysis generated the following results. A single click on each page name populates Results
with the item.

Report
Resulting Scored Table Name score_cluster_1


Number of Rows in Scored File 747

Data
cust_id Cluster
1362527 10
1363078 10
1362588 7
1363486 1
1362752 19
1362893 7
1363017 7
1363444 7
1362548 1
1362487 7
… …
… …
… …
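
The cluster assignment itself follows the usual nearest-centroid rule: compute the distance from the row to each cluster center and pick the closest. A minimal sketch, with two hypothetical two-dimensional centroids rather than the model's actual twenty:

```python
import math

# Cluster scoring assigns each row to the nearest centroid.  The centroids
# below are hypothetical; a real model such as ClusterPMML.xml carries 20
# centroids defined over the model's input variables.
centroids = {1: [20000.0, 35.0],   # (income, age)
             2: [60000.0, 55.0]}

def assign(row):
    return min(centroids, key=lambda c: math.dist(row, centroids[c]))

assert assign([22000.0, 40.0]) == 1
assert assign([58000.0, 50.0]) == 2
```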



Chapter Five
Publishing

5. Publishing

Publishing Overview
The Publish Analysis is provided as a means to save an analytic model by storing the SQL
generated by an associated Score Analysis and/or ADS analysis into Publish Tables (metadata
tables used by the Model Manager application). When a Score Analysis is selected as input into a
Publish Analysis, the SQL that was generated by that Score Analysis is stored in such a way that
Model Manager can replace key components of that SQL and re-execute it, making it possible to
effectively re-use a published model (the SQL template) on different sets of data.

Analysis References
The Publish Analysis makes use of the Analysis References feature in the following way.
Because one of its input parameters is another analysis, it in effect references that analysis.
When that analysis is selected as input, the Publish Analysis then manages the execution of any
analyses that are references of the input analysis. For example, it is a distinct possibility that the
input into the final Score Analysis will be a series of Reorganization or ADS Analyses linked
together via Analysis References. A possible scenario would be a Variable Creation Analysis that
is referenced by (input into) a Join, and then a Sample. The resulting analytic data set (ADS)
might then be used as the input to a Score Analysis. In this scenario, because each analysis is
dependent upon the previous one, the SQL from each analysis will be published (stored in the
Publish Tables) in the proper order of execution so that it will work when re-executed via Model
Manager. This ensures that all of the SQL necessary to generate the ADS and resulting analytic
model will be captured.

Minimum SQL Storage


An additional feature of the Publish Analysis is that only the variables necessary for the analytic
model and the Score Analysis are used. Because the focus of publishing is to store a model (Score
SQL) for future use, it is wasteful to store SQL that generates variables that are not used. For
instance, if a Variable Creation was executed that created 100 variables, but the model was
created and then scored using only 5 of those variables, Publish will only store the SQL that is
needed to generate those 5 variables in the Variable Creation analysis. In general, for a given
ADS analysis, the only variables that are generated are the ones that are necessary for the
subsequent analysis reference.
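
The pruning described above amounts to computing the transitive closure of the variables the score needs. A sketch of that idea (the analysis and variable names here are hypothetical):

```python
# deps maps each created variable to the inputs its SQL needs.
deps = {"avg_bal": {"cust_id"},
        "tenure_yrs": {"cust_id"},
        "unused_var": {"cust_id"}}   # created, but never used by the score

def closure(targets, deps):
    """Walk dependencies backwards from the variables the score uses."""
    needed, stack = set(), list(targets)
    while stack:
        v = stack.pop()
        if v not in needed:
            needed.add(v)
            stack.extend(deps.get(v, ()))
    return needed

needed = closure({"avg_bal", "tenure_yrs"}, deps)
assert "unused_var" not in needed               # its SQL is not published
assert needed == {"avg_bal", "tenure_yrs", "cust_id"}
```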

No SQL Execution of Published Analyses


Publish is designed as the last step in the creation and scoring of an ADS and analytic model.
It is therefore assumed that the analyses have already been executed and deemed suitable for publishing.
For this reason, Publish doesn't execute the SQL that is stored, as doing so would be redundant.

Analyses Available To Publish


Build ADS, Variable Creation, Variable Transformation, Join, Sample, Denorm, Partition, Tree
Scoring, Cluster Scoring, Logistic Regression Scoring, Linear Regression Scoring, Factor
Analysis Scoring, Neural Networks and PMML Scoring are all available to be published.

Limitations with Respect to Analysis References


Any number of ADS Analyses can be in the chain of referenced analyses to be published, but
there can only be one Score Analysis, and it must be the last one.


For an analysis to be available for publishing, it must store its tabular output in the database as a
table or view.

The anchor table of the last Variable Creation analysis within the chain of referenced analyses to
be published will be stored as the published anchor table. If that anchor table is the output table of
another Variable Creation analysis within the chain of referenced analyses to be published, the
publish will fail with the following error message:

The current anchor table 'AnchorDatabase.AnchorTable' of the last Variable Creation analysis is the output table of a referenced Variable Creation analysis. Please use a different anchor table.

The anchor table of the last Variable Creation analysis must be changed to the output table of a
different analysis (not a Variable Creation), or to a permanent table or view in Teradata for the
publish to be successful.

Initiate a Publish Analysis


Use the following procedure to initiate a new Publish Analysis in Teradata Warehouse Miner:
Please note: to execute a Publish Analysis successfully the Publish Tables must be installed in the
Publish Database.

1. Click on the Add New Analysis icon in the toolbar:

2. In the resulting Add New Analysis dialog box, with Publish highlighted on the left, double-click on the Publish icon:


3. This will bring up the Publish dialog in which you will enter INPUT options to parameterize
the analysis as described in the next section.


Publish - INPUT - Data Selection


On the Publish dialog click on INPUT and then click on data selection:

On this screen select:

Available Analyses to Publish


Select a single analysis from the list of all of the analyses in the current project which are
available for the Publish Analysis.
Name of Model To Publish
Enter the name of the model that is being published.
Published By
Enter the name of the person who is publishing the model.
Version
Enter the version of the model being published.
Expiration Date
Enter the date that the model will expire.
Description
Enter a description of the model being published.

Preview the Publish Analysis


After setting parameters on the INPUT screen as described above, you can preview the SQL and
parameter values that will be stored in the Publish Tables.

By clicking on the button in the bottom center of the input screen, a pop-up
window will appear containing the following information that will be stored in the Publish
Tables:

Publish Date
The date that Publishing occurs, automatically set to the current date.
Expiration Date
The date that the model expires, set on the input screen by the user.
ADS Output Database
The database that was used to store the results of the ADS Analysis (if applicable)
ADS Output Table
The table that contains the results of the ADS Analysis (if applicable)
Score Output Database
The database that was used to store the results of the Score Analysis (if applicable)
Score Output Table


The table that contains the results of the Score Analysis (if applicable)

Model Variables
A list of the variables that were used in the model along with their descriptions.

Score Columns
A list of the columns that are generated in the output of the score (if applicable), along with their
descriptions.

ADS SQL to be Published


The SQL that was generated by the ADS Analysis (if applicable).

Score SQL to be Published


The SQL that was generated by the Score Analysis (if applicable).

Run the Publish Analysis


After setting parameters on the INPUT screen as described above or previewing the values to be
published, you are ready to run the analysis. To run the analysis you can either:

• Click the Run icon on the toolbar, or


• Select Run <project name> on the Project menu, or
• Press the F5 key on your keyboard

• Click the button from within the Preview pop-up window.

By running the analysis, the information needed by Model Manager to re-use the model will be
stored in the Publish Tables within the Publish Database.

Results - Publish
On the Publish dialog click on RESULTS (note that the RESULTS tab will be grayed-out/disabled until after the analysis is completed):

Select either the report or SQL tab to view the report or the SQL generated by the execution of
the Publish analysis.

Tutorial – Publish

Publish – Example


The following example contains a Variable Creation analysis that is referenced by a PMML Score
and is then published.

Parameterize a Variable Creation Analysis named Variable Creation1 as follows:

1. Select TWM_CUSTOMER_ANALYSIS as the Available Table.

2. Select all the columns in the input table into the Variables panel.

3. Go to OUTPUT-storage, and select Store the tabular output of this analysis in the
database. Specify that a Table should be created named twm_publish_vc1.

Run the analysis.

Parameterize a PMML Scoring Analysis named PMML Scoring1 to score a Decision Tree model
which predicts a discrete outcome as follows:

Input:
Select Input Source Analysis
Available Analyses Variable Creation1
Available Tables twm_publish_vc1
Select File Name DecisionTreeDiscretePMML.xml
(located in Scripts\PMML UDF Install under the
directory where the application is installed)
Index Columns cust_id

Output – Storage:
Result Table Name twm_publish_score1
Model Output Options:
P_ccacct1 Enabled
P_ccacct0 Enabled

Run the analysis.

Parameterize a Publish Analysis named Publish1 as follows:

Available Analyses to Publish PMML Scoring1


Name of Model to Publish PMML Scoring Demo
Published By Tutorial User
Version 1
Expiration Date 1/1/2010
Description This is a demo of the Publish Analysis.


Click on the button in the bottom center of the input screen. This will open a pop-up
window. By clicking on the button within the pop-up window, you will see the
following screens:


The final Publish Results screen shows the Score SQL to be Published (not shown here).

Click on the button to execute the Publish Analysis and store the information in the
Publish Tables.

Click on the Results to view what was published. Select the report tab to view the report portion
as shown below, and the SQL tab to view the SQL (not shown here).




References


1) Agrawal, R., Mannila, H., Srikant, R., Toivonen, H. and Verkamo, I., Fast Discovery of
Association Rules. In Advances in Knowledge Discovery and Data Mining, 1996, eds. U.M.
Fayyad, G. Piatetsky-Shapiro, P. Smyth and R. Uthurusamy. Menlo Park, AAAI Press/The
MIT Press.

2) Agresti, A. (1990) Categorical Data Analysis. Wiley, New York.

3) Arabie, P., Hubert, L., and DeSoete, G., Clustering and Classification, World Scientific, 1996

4) Belsley, D.A., Kuh, E., and Welsch, R.E. (1980) Regression Diagnostics: Identifying
Influential Data and Sources of Collinearity. Wiley, New York.

5) Bradley, P., Fayyad, U. and Reina, C., Scaling EM Clustering to Large Databases, Microsoft
Research Technical Report MSR-TR-98-35, 1998

6) Breiman, L., Friedman, J.H., Olshen, R.A., and Stone, C.J. Classification and Regression
Trees. Wadsworth, Belmont, 1984.

7) Cox, D.R. and Hinkley, D.V. (1974) Theoretical Statistics. Chapman & Hall/CRC, New
York.

8) Finn, J.D. (1974) A General Model for Multivariate Analysis. Holt, Rinehart and Winston,
New York.

9) Harman, H.H. (1976) Modern Factor Analysis. University of Chicago Press, Chicago.

10) Hosmer, D.W. and Lemeshow, S. (1989) Applied Logistic Regression. Wiley, New York.

11) Johnson, R.A. and Wichern, D.W. (1998) Applied Multivariate Statistical Analysis, 4th
Edition. Prentice Hall, New Jersey.

12) Kachigan, S.K. (1991) Multivariate Statistical Analysis. Radius Press, New York.

13) Kass, G. V. (1980) An Exploratory Technique for Investigating Large Quantities of
Categorical Data, Applied Statistics 29, No. 2, pp. 119-127

14) Kaufman, L. and Rousseeuw, P., Finding Groups in Data, J Wiley & Sons, 1990

15) Kennedy, W.J. and Gentle, J.E. (1980) Statistical Computing. Marcel Dekker, New York.

16) Kleinbaum, D.G. and Kupper, L.L. (1978) Applied Regression Analysis and Other
Multivariable Methods. Duxbury Press, North Scituate, Massachusetts.

17) Maddala, G.S. (1983) Limited-Dependent and Qualitative Variables In Econometrics.


Cambridge University Press, Cambridge, United Kingdom.

18) Maindonald, J.H. (1984) Statistical Computation. Wiley, New York.


19) McCullagh, P.M. and Nelder, J.A. (1989) Generalized Linear Models, 2nd Edition. Chapman
& Hall/CRC, New York.

20) McLachlan, G.J. and Krishnan, T., The EM Algorithm and Extensions, J Wiley & Sons,
1997

21) Menard, S (1995) Applied Logistic Regression Analysis, Sage, Thousand Oaks

22) Mulaik, S.A. (1972) The Foundations of Factor Analysis. McGraw-Hill, New York.

23) Neter, J., Kutner, M.H., Nachtsheim, C.J., and Wasserman, W. (1996) Applied Linear
Statistical Models, 4th Edition. WCB/McGraw-Hill, New York.

24) Nocedal, J. and Wright, S.J. (1999) Numerical Optimization. Springer-Verlag, New York.

25) Orchestrate/OSH Component User’s Guide Vol II, Analytics Library, Chapter 2:
Introduction to Data Mining. Torrent Systems, Inc., 1997.

26) Ordonez, C. and Cereghini, P. (2000) SQLEM: Fast Clustering in SQL using the EM
Algorithm. SIGMOD Conference 2000: 559-570

27) Ordonez, C. (2004): Programming the K-means clustering algorithm in SQL. KDD 2004:
823-828

28) Ordonez, C. (2004): Horizontal aggregations for building tabular data sets. DMKD 2004: 35-
42

29) Peduzzi, P.N., Hardy, R.J., and Holford, T.R. (1980) A Stepwise Variable Selection
Procedure for Nonlinear Regression Models. Biometrics 36, 511-516.

30) Pregibon, D. (1981) Logistic Regression Diagnostics. Annals of Statistics, Vol. 9, No. 4, 705-
724.

31) Quinlan, J.R. C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo, 1993.

32) Roweis, S. and Ghahramani, Z., A Unifying Review of Linear Gaussian Models, Journal of
Neural Computation, 1999

33) SPSS 7.5 Statistical Algorithms Manual, SPSS Inc., Chicago.

34) SYSTAT 9: Statistics I. (1999) SPSS Inc., Chicago.

35) Tatsuoka, M.M. (1971) Multivariate Analysis: Techniques For Educational and
Psychological Research. Wiley, New York.

36) Tatsuoka, M.M. (1974) Selected Topics in Advanced Statistics, Classification Procedures,
Institute for Personality and Ability Testing, 1974

37) Teradata Warehouse Miner User’s Guide Release 03.00.02, B035-2093-022A, January 2002

© 1999-2007 NCR Corporation, All Rights Reserved 252


38) Wilkinson, L., Blank, G., and Gruber, C. (1996) Desktop Data Analysis With SYSTAT.
Prentice Hall, New Jersey.

39) Pagano, M. and Gauvreau, K. (2000) Principles of Biostatistics, 2nd Edition. Duxbury, Pacific Grove.

40) Conover, W.J. (1999) Practical Nonparametric Statistics, 3rd Edition. Wiley, New York.

41) D'Agostino, R.B. and Stephens, M.A., eds. (1986) Goodness-of-Fit Techniques. Dekker, New
York.

42) D'Agostino, R., Belanger, A., and D'Agostino, R. Jr. (1990) A Suggestion for Using Powerful
and Informative Tests of Normality. American Statistician, Vol. 44, No. 4.

43) Royston, J.P. (1982) An Extension of Shapiro and Wilk's W Test for Normality to Large
Samples. Applied Statistics, 31, No. 2, 115-124.

44) Royston, J.P. (1982) Algorithm AS 177: Expected Normal Order Statistics (Exact and
Approximate). Applied Statistics, 31, 161-165.

45) Royston, J.P. (1982) Algorithm AS 181: The W Test for Normality. Applied Statistics, 31,
176-180.

46) Royston, J.P. (1995) A Remark on Algorithm AS 181: The W Test for Normality. Applied
Statistics, 44, 547-551.

47) Harter, H.L. and Owen, D.B., eds. Selected Tables in Mathematical Statistics, Vol. 1.
American Mathematical Society, Providence, Rhode Island.

48) Shapiro, S.S. and Francia, R.S. (1972) An Approximate Analysis of Variance Test for
Normality. Journal of the American Statistical Association, 67, 215-216.

49) D'Agostino, R.B. (1971) An Omnibus Test of Normality for Moderate and Large Size
Samples. Biometrika, 58, 341-348.

50) NIST/SEMATECH e-Handbook of Statistical Methods,
http://www.itl.nist.gov/div898/handbook/index.htm, 2005.

51) PST, Portland State University, http://www.upa.pdx.edu, 2005.

52) Wendorf, Craig A. (1997, revised 2004-03-12) Manuals for Univariate and Multivariate
Statistics. UWSP, http://www.uwsp.edu/psych/cw/statmanual, 2005.

53) UZ, University of Zurich, http://www.id.unizh.ch, 2005

54) NUMS, Northwestern University Medical School,
http://www.basic.northwestern.edu/statguidefiles/sghome.html, 2005 (inactive).

55) Takahashi, T. (2005) Getting Started: International Character Sets and the Teradata
Database, NCR Corporation, 541-0004068-C02
