ADS Generation
Release 05.01.00
B035-2301-077A
THE INFORMATION CONTAINED IN THIS DOCUMENT IS PROVIDED ON AN “AS-IS” BASIS, WITHOUT WARRANTY OF ANY KIND, EITHER
EXPRESS OR IMPLIED, INCLUDING THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR
NON-INFRINGEMENT. SOME JURISDICTIONS DO NOT ALLOW THE EXCLUSION OF IMPLIED WARRANTIES, SO THE ABOVE EXCLUSION
MAY NOT APPLY TO YOU. IN NO EVENT WILL NCR CORPORATION (NCR) BE LIABLE FOR ANY INDIRECT, DIRECT, SPECIAL,
INCIDENTAL OR CONSEQUENTIAL DAMAGES, INCLUDING LOST PROFITS OR LOST SAVINGS, EVEN IF EXPRESSLY ADVISED OF THE
POSSIBILITY OF SUCH DAMAGES.
The information contained in this document may contain references or cross references to features, functions, products, or
services that are not announced or available in your country. Such references do not imply that NCR intends to announce
such features, functions, products, or services in your country. Please consult your local NCR representative for those
features, functions, products, or services available in your country.
Information contained in this document may contain technical inaccuracies or typographical errors. Information may be
changed or updated without notice. NCR may also make improvements or changes in the products or services described
in this information at any time without notice.
To maintain the quality of our products and services, we would like your comments on the accuracy, clarity, organization,
and value of this document. Please e-mail: teradata-books@lists.ncr.com
Any comments or materials (collectively referred to as “Feedback”) sent to NCR will be deemed non-confidential. NCR will
have no obligation of any kind with respect to Feedback and will be free to use, reproduce, disclose, exhibit, display,
transform, create derivative works of and distribute the Feedback and derivative works thereof without limitation on a
royalty-free basis. Further, NCR will be free to use any ideas, concepts, know-how or techniques contained in such
Feedback for any purpose whatsoever, including developing, manufacturing, or marketing products or services
incorporating Feedback.
This publication describes how to use the features and functions of NCR’s Teradata Warehouse
Miner, Release 5. All information required to use the analytic functions in the Teradata
Warehouse Miner product is provided in this manual. Teradata Warehouse Miner is a set of
Microsoft® .NET™ interfaces and a multi-tier User Interface that together help users understand the
quality of data residing in a Teradata® database, create analytic data sets, and build and score
analytic models directly in the Teradata database.
Chapter 1 “Data Reorganization” describes the Denorm, Join, Merge, Partition, and
Sample analyses.
Chapter 3 “Matrix Functions” describes how to use the Teradata Warehouse Miner
Matrix Functions to build and export a Correlation, Covariance, or Sums of
Squares and Cross Products matrix.
Chapter 4 “Scoring” describes how to use the Teradata Warehouse Miner Predictive
Model Markup Language (PMML) Scoring analysis.
Chapter 5 “Publishing” describes how to use the Teradata Warehouse Miner Publish
analysis to make the SQL representing models and analytic data sets
available to the Model Manager application.
Convention: Italic text
Description: Titles (especially screen names and titles); new terms for emphasis
Support Information
Americas RCCA
MSC – Atlanta
1. Data Reorganization
The Data Reorganization functions provide the ability to join, merge, and denormalize
preprocessed results into a wide analytic data set, as well as to select a subset of the rows in a
table. The result of these functions is a new, restructured table built from one or more
existing tables, from a subset of the rows in a table, or both.
The Sampling and Partitioning functions build a new table containing randomly selected rows
in an existing table or view. Sampling is useful when it becomes unwieldy to perform an
analytic process because of the high volume of data available. This is especially true for
compute-intensive analytic modeling tasks. Partitioning is similar to sampling but allows
mutually exclusive yet collectively exhaustive subsets of the data to be requested by separate processes.
In the case of the Data Reorganization functions, NULL values are passed back as NULL. A
special case is the Denorm Analysis which allows you to convert NULL values to zero.
Note that Identity columns, i.e. columns defined with the attribute "GENERATE … AS
IDENTITY", cannot be analyzed by Data Reorganization functions.
Denorm
Create a new, denormalized table by removing key column(s).
Join
Join tables or views by columns into a combined result table.
Merge
Merge tables or views by rows into a combined result table.
Partition
Select partition(s) from a table using a hash key.
Sample
Select sample(s) from a table by size or fraction.
In order to add a Data Reorganization Analysis to a Teradata Warehouse Miner Data Mining
Project, create a new analysis as described in Chapter 3. Select Reorganization from the
menu:
Double-click or highlight the desired analysis and click the OK button. Optionally select an
existing analysis for incorporation into the current data mining project. Each of these specific
analyses is described in detail in the subsequent sections.
Denorm
Denorm Analysis is provided to denormalize or “flatten out” (sometimes referred to as
“pivoting”) a table so it can be used as an analytic data set. This is done by removing part of a
multi-part index and replicating remaining columns based upon the unique values of the
removed index column.
Many analytical techniques from the statistical and artificial intelligence communities require
a denormalized table, or data set, as input. The Denorm function is provided to help analytical
modelers and database administrators save considerable time and effort when a denormalized
table needs to be constructed from data which exists in relational form in the data warehouse.
The aggregations typically used in the construction of a denormalized table (AVG, SUM,
MIN, MAX, and COUNT) are provided in the Denorm function as user-selectable options.
Analytical modelers typically refer to the rows of a denormalized table as “observations”, and
typically refer to the columns as “variables”.
Given a table name, the names of index columns to remove, the names of index columns to
retain, the names of remaining columns to denormalize, the values of the removed index
columns to denormalize, and finally the names of any already denormalized columns to
retain, the Denorm Analysis creates a new denormalized table. All columns other than the
retain key and denormalize columns are dropped in the new table, unless they are specified as
columns to retain. However, in this case they should already be denormalized; that is, they
should have the same value for each value of the removed key columns.
New column names are concatenated from the prefix associated by the user with the Values
to Denormalize (which occur in the Index Remove Columns), and the alias or name of the
Denormalize Column.
An option is provided which allows you to specify an aggregation method in the case where
new columns have multiple values to choose from. A user specified aggregation method,
specifically MIN, MAX, AVG, SUM or COUNT, should only be used when there are non-
unique index values or when a part of the index is being ignored, that is, when part of the
index is neither being retained nor removed (denormalized by).
Finally, an option is provided to substitute zero for NULL (the default) as the value of those
denormalized columns for which the index is not defined.
Literal values entered for columns of type DATE must be entered in the format defined or
defaulted for the column in question. For example, if the date format of a key value being
removed is ’YYYYMMDD’, then a parameter for this key value might be entered as
“19990703”.
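The SQL that Denorm generates is Teradata-specific, but the underlying pivot pattern is standard CASE-based aggregation. The following sketch (table and column names are hypothetical) uses Python's sqlite3 as a stand-in to show how values of a removed key column (acct_type) become prefixed columns, with MIN as the aggregation method and zero substituted for undefined key values:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE acct (cust_id INTEGER, acct_type TEXT, balance REAL);
    INSERT INTO acct VALUES (1,'SV',100.0),(1,'CK',50.0),(2,'SV',200.0);
""")
# Remove acct_type from the key and replicate balance once per
# selected value, using MIN as the aggregation method and
# COALESCE to treat undefined key values as zero.
rows = con.execute("""
    SELECT cust_id,
           COALESCE(MIN(CASE WHEN acct_type='SV' THEN balance END), 0) AS SV_balance,
           COALESCE(MIN(CASE WHEN acct_type='CK' THEN balance END), 0) AS CK_balance
    FROM acct
    GROUP BY cust_id
    ORDER BY cust_id
""").fetchall()
print(rows)  # [(1, 100.0, 50.0), (2, 200.0, 0)]
```

Customer 2 has no 'CK' row, so the CK_balance column would be NULL; the zero option fills it in instead.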
The Denorm Analysis is parameterized by specifying the table and column(s) to analyze,
options unique to the Denorm Analysis, as well as specifying the desired results and SQL or
Expert Options.
2. In the resulting Add New Analysis dialog box, with Reorganization highlighted on the
left, double-click on the Denorm icon:
3. This will bring up the Denorm dialog in which you will enter INPUT and OUTPUT
options to parameterize the analysis as described in the next sections.
Available Databases
All the databases which are available for the Denorm Analysis.
Available Tables
All the tables within the Source Database that are available for the Denorm Analysis.
Available Columns
All the columns within the selected table that are available for the Denorm Analysis.
Selected Columns
Select columns by highlighting and then either dragging and dropping into the
Selected Columns window, or click on the arrow button to move highlighted
columns into the Selected Columns window. Make sure the correct category is
highlighted:
Index Retain Columns
List of index columns to retain (not remove) in resultant denormalized table
(click to expand/highlight).
Index Remove Columns
List of index columns to remove (denormalize by values of these columns).
(Click to expand/highlight.)
Denormalize Columns
List of columns/aliases to denormalize (i.e. replicate for given values of removed
index columns). (Click to expand/highlight.)
Retain Columns
List of columns to retain which are already denormalized (i.e. have a constant
value over the selected values of removed key columns). (Click to
expand/highlight.)
Values to Denormalize
A list of values and prefixes which are valid values in the column specified in Index
Remove Columns. Use the Add and Remove buttons to set values for:
Prefix
An optional string (must be a valid Teradata name) used to prefix the name of the new
column built for the specified Value.
Value in <remove column>
A list of distinct values which the column specified in Index Remove Columns
takes on.
Add button
Both Prefix and Value in <remove column> can be specified manually by clicking
on the Add button and typing the required values.
Remove button
Remove the currently highlighted Prefix and Value in <remove column>.
Values…
Selecting the Values button brings up the following Denorm Values Wizard:
Once the Values button is selected, a status message indicating which column values
are being fetched appears. Once this is complete, the distinct values of the column
being removed are listed in the left-most column. These values can be dragged and
dropped into the right-most column, or selected via the Add button. Similarly, they
can be removed via the Remove button. Once the values to be denormalized are
moved to Selected values, click the Finish button to return to the Teradata Warehouse
Miner user interface or to continue to select the values of the next Denormalize
Column, if there is more than one.
When the Values load process is finished, a default value is generated for each
Column Prefix by concatenating the values of the Index Remove Columns, each
value followed by an underscore character. If the combination of the prefix and the
longest Denormalize Column name or alias will be greater than 30 characters in
length, the prefix is left blank, to be filled in by the user. Note that if the name of a
Denormalize Column or the values of the Index Remove Columns are long, it may
be necessary to specify a comparatively short alias for the Denormalize Column so
that automatic prefixes can be generated. Otherwise, it may be necessary to specify a
short prefix manually.
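The default-prefix rule described above can be sketched as follows. This is an illustrative reading of the stated behavior, not the product's actual code, and the example values are hypothetical:

```python
# Default Column Prefix: each removed-index value followed by an
# underscore; left blank (for the user to fill in) if prefix plus
# the longest Denormalize Column name or alias would exceed 30
# characters.
def default_prefix(removed_values, denorm_names, limit=30):
    prefix = "".join(str(v) + "_" for v in removed_values)
    longest = max(len(n) for n in denorm_names)
    return prefix if len(prefix) + longest <= limit else ""

print(default_prefix(["SV"], ["balance"]))  # SV_
# A long value plus a long column name exceeds 30, so the prefix
# is left blank; a short alias for the column would avoid this.
print(default_prefix(["SAVINGS_ACCOUNT_TYPE"], ["avg_monthly_balance"]))
```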
Aggregation Method
This parameter allows you to specify an aggregation method in the case where new
columns have multiple values to choose from. Valid user-specified aggregation
methods include MIN, MAX, AVG, SUM, and COUNT. These should only be used
when there are non-unique indices or when a part of the index is being ignored, that
is, when part of the key is neither being retained nor removed (i.e. denormalized by).
This screen provides the option to generate a SQL WHERE clause(s) to restrict rows selected
for analysis (for example: cust_id > 0).
Denorm - OUTPUT
Before running the analysis, specify Output options. On the Denorm dialog click on
OUTPUT:
Use the Teradata EXPLAIN feature to display the execution path for this analysis
Option to generate a SQL EXPLAIN SELECT statement, which returns a Teradata
Execution Plan.
Results - Denorm
The results of running the Denorm Analysis include the generated SQL itself, the results of
executing the generated SQL, and, if the Create Table (or View) option is chosen, a Teradata
table (or view). All of these results are outlined below.
Results of the completed query are returned in a Data Viewer page within the Results
Browser. This page has the properties of the Data View page discussed in the Chapter on
Using the Teradata Warehouse Miner Graphical User Interface. With the exception of the
Explain Select Result Option, these results will match the tables described below in the
Output Column Definition section, depending upon the parameters chosen for the analysis.
The generated SQL is returned here as text which can be copied, pasted, or printed.
Tutorial - Denorm
Parameterize a Denorm Analysis as follows:
Values to Denormalize
    Value: SV, CK, CC
    Prefix: SV_ (Value SV), CK_ (Value CK), CC_ (Value CC)
Aggregation Method: MIN
Treat undefined key values as: Zero
For this example, the Denorm Analysis generated the following results. Note that the SQL is
not shown for brevity:
Data
Join
The Join analysis is useful in joining together tables and/or views into an intermediate or
final analytic data set. The Join Analysis provides a graphical user interface to several of the
most common, though certainly not all, join mechanisms in Teradata. Consequently, it should
not be thought of or used as a complete replacement for SQL approaches to executing any
generic Teradata join.
By default, an INNER join is performed on the given tables based on the given join columns.
This means that rows will be returned only for primary index column values that appear in all
selected tables. By option, a LEFT outer join can be requested, which returns rows for all
primary index column values found in the first table specified, and fills in any missing values
from the other tables with NULL values. Alternatively, a RIGHT outer join can be requested
to return all rows found in the last requested table, filling in any missing values from the first
table with NULL values (or from the incremental right outer joins preceding it if more than
two tables were selected). Finally, an option to perform a FULL outer join can be requested
which retains all primary index values from all selected tables with missing values set to
NULL.
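The difference between the inner and outer styles can be illustrated in standard SQL. The sketch below uses Python's sqlite3 as a stand-in for Teradata, with hypothetical cust and spend tables; only INNER and LEFT are shown, since older SQLite versions lack RIGHT and FULL outer joins:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE cust  (cust_id INTEGER, name TEXT);
    CREATE TABLE spend (cust_id INTEGER, total REAL);
    INSERT INTO cust  VALUES (1,'Ann'),(2,'Bob'),(3,'Cia');
    INSERT INTO spend VALUES (1,250.0),(2,90.0);
""")
# INNER join: rows only for key values present in all selected tables.
inner = con.execute("""
    SELECT c.cust_id, c.name, s.total
    FROM cust c INNER JOIN spend s ON c.cust_id = s.cust_id
    ORDER BY c.cust_id
""").fetchall()
# LEFT outer join: all rows of the first (anchor) table, with
# missing values from the other table filled in as NULL.
left = con.execute("""
    SELECT c.cust_id, c.name, s.total
    FROM cust c LEFT JOIN spend s ON c.cust_id = s.cust_id
    ORDER BY c.cust_id
""").fetchall()
print(inner)  # [(1, 'Ann', 250.0), (2, 'Bob', 90.0)]
print(left)   # [(1, 'Ann', 250.0), (2, 'Bob', 90.0), (3, 'Cia', None)]
```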
The Join Analysis is parameterized by specifying the table and column(s) to analyze, options
unique to the Join Analysis, as well as specifying the desired results and SQL or Expert
Options.
2. In the resulting Add New Analysis dialog box, with Reorganization highlighted on the
left, double-click on the Join icon:
3. This will bring up the Join dialog in which you will enter INPUT and OUTPUT options to
parameterize the analysis as described in the next sections.
Available Databases
All the databases which are available for the Join Analysis.
Available Tables
All the tables within the Source Database that are available for the Join Analysis.
Available Columns
All the columns within the selected table that are available for the Join Analysis.
Selected Columns
Select columns by highlighting and then either dragging and dropping into the
Selected Columns window, or click on the arrow button to move highlighted
columns into the Selected Columns window.
This screen is used to specify the columns on which to join together the tables or views
selected in this analysis. For tables, the primary index columns are displayed as default
values which may be changed. Join columns are matched for each table or view, one-for-one
in the order specified. Each table or view must therefore have the same number of join
columns specified. The screen contains these fields:
Available Tables
All tables specified under data selection Selected Columns.
Available Columns
All columns specified under data selection Selected Columns.
Selected Join Columns
Select columns by highlighting and then either dragging and dropping into the
Selected Join Columns window, or click on the arrow button to move highlighted
columns into the Selected Join Columns window.
Anchor Table
For all join types, this is the first table, against which all other tables are joined.
Join Style
Select the type of join to perform: Inner, or Left, Right, or Full Outer.
This screen provides the option to generate a SQL WHERE clause(s) to restrict rows selected
for analysis (for example: cust_id > 0).
It may be useful to note that if a WHERE clause condition is specified on the "inner" table of
a join (i.e. a table that contributes only matched rows to the results), the join is logically
equivalent to an Inner Join, regardless of whether an Outer type is specified. (In a Left Outer
Join, the left table is the "outer" table and the right table is the "inner" table.)
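The note above can be demonstrated directly. In this hypothetical sketch (via Python's sqlite3), the WHERE condition on the "inner" table discards every NULL-extended row, so the LEFT outer join returns exactly the rows an inner join would:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE cust  (cust_id INTEGER);
    CREATE TABLE spend (cust_id INTEGER, total REAL);
    INSERT INTO cust  VALUES (1),(2),(3);
    INSERT INTO spend VALUES (1,250.0),(2,90.0);
""")
# cust_id 3 has no spend row, so the outer join extends it with a
# NULL total -- but "total > 0" is never true for NULL, so the row
# is filtered out and the result matches an inner join.
rows = con.execute("""
    SELECT c.cust_id, s.total
    FROM cust c LEFT JOIN spend s ON c.cust_id = s.cust_id
    WHERE s.total > 0
    ORDER BY c.cust_id
""").fetchall()
print(rows)  # [(1, 250.0), (2, 90.0)]
```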
Use the Teradata EXPLAIN feature to display the execution path for this analysis
Option to generate a SQL EXPLAIN SELECT statement, which returns a Teradata
Execution Plan.
If this option is selected the analysis will only generate SQL, returning it and
terminating immediately.
On this screen, select the columns which comprise the primary index of the output table.
Select:
Available Columns
A list of columns which comprise the index of the resultant table if an Output Table
is used.
Primary Index Columns
Select columns by highlighting and then either dragging and dropping into the
Primary Index Columns window, or click on the arrow button to move highlighted
columns into the Primary Index Columns window.
Results - Join
The results of running the Teradata Warehouse Miner Join Analysis include the generated
SQL itself, the results of executing the generated SQL, and, if the Create Table (or View)
option is chosen, a Teradata table (or view). All of these results are outlined below.
The results of the completed query are returned in a Data Viewer page within the Results
Browser. This page has the properties of the Data View page discussed in the Chapter on
Using the Teradata Warehouse Miner Graphical User Interface. With the exception of the
Explain Select Result Option, these results will match the tables described below in the
Output Column Definition section, depending upon the parameters chosen for the analysis.
The generated SQL is returned here as text which can be copied, pasted, or printed.
Join - Example #1
For this example, the Join Analysis generated the following results. Note that the SQL is not shown for brevity:
Data
Merge
The Merge analysis merges together tables or views by performing an SQL UNION,
INTERSECT or MINUS operation. The merge operation brings together rows from two or
more tables, matching up the selected columns in the order they are selected. (This can be
contrasted with the Join function that brings together columns from multiple tables.) The
rows contained in the answer set are determined by the choice of the Merge Style,
determining whether the Union, Intersect or Minus operator is applied to each table after the
first table selected. An additional option is provided to determine if duplicate rows, if any,
should be included in the answer set. You may also specify one or more optional SQL Where
Clauses to apply to selected tables (each Where Clause is applied to just one table).
When the Union merge style is selected, the union of the rows containing selected columns
from the first table and each subsequent table is performed using the SQL UNION operator.
The final answer table contains all the qualifying rows from each table. With the Union
merge style, an option is provided to add an identifying column to the answer set and to name
the column if desired. This column assumes an integer value from 1 to n to indicate the input
table each row in the answer set comes from.
When the Intersect merge style is selected, the intersection of the rows containing selected
columns from the first table and each subsequent table is performed using the SQL
INTERSECT operator. The final answer table contains all the qualifying rows that exist in
each of the tables being merged. (That is, if a row is not contained in each of the requested
tables, it is not included in the answer set.)
When the Minus merge style is selected, the rows containing selected columns from the first
table are included in the answer table provided they do not appear in any of the other selected
tables. This is achieved using the SQL MINUS operator for each table after the first. (The
MINUS operator is a Teradata specific SQL operator equivalent to the standard EXCEPT
operator.)
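The three merge styles map onto standard SQL set operators. The sketch below (hypothetical jan and feb tables, via Python's sqlite3) also shows an integer literal serving as the identifying column in a Union, and uses the standard EXCEPT operator where Teradata would use MINUS:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE jan (cust_id INTEGER);
    CREATE TABLE feb (cust_id INTEGER);
    INSERT INTO jan VALUES (1),(2),(3);
    INSERT INTO feb VALUES (2),(3),(4);
""")
q = lambda sql: con.execute(sql).fetchall()
# Union: all qualifying rows from each table. UNION ALL retains
# duplicate rows; the leading integer is the identifying column.
union = q("""SELECT 1 AS src, cust_id FROM jan
             UNION ALL
             SELECT 2, cust_id FROM feb
             ORDER BY src, cust_id""")
# Intersect: only rows that exist in every merged table.
both = q("""SELECT cust_id FROM jan
            INTERSECT SELECT cust_id FROM feb
            ORDER BY cust_id""")
# Minus: rows of the first table not present in the others
# (EXCEPT is the standard equivalent of Teradata's MINUS).
only_jan = q("""SELECT cust_id FROM jan
                EXCEPT SELECT cust_id FROM feb
                ORDER BY cust_id""")
print(union)     # [(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (2, 4)]
print(both)      # [(2,), (3,)]
print(only_jan)  # [(1,)]
```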
2. In the resulting Add New Analysis dialog box, with Reorganization highlighted on the
left, double-click on the Merge icon:
3. This will bring up the Merge dialog in which you will enter INPUT and OUTPUT options
to parameterize the analysis as described in the next sections.
Available Databases
All the databases that are available for the Merge Analysis.
Available Tables
All the tables within the Source Database that are available for the Merge Analysis.
Available Columns
All the columns within the selected table that are available for the Merge Analysis.
Selected Columns
Select columns by highlighting and then either dragging and dropping into the
Selected Columns window, or click on the arrow button to move highlighted
columns into the Selected Columns window. Columns from the first selected table
may be renamed if desired by single-clicking on them.
Merge Style
Select the type of merge to perform, either Union, Intersect or Minus.
Retain Duplicate Rows
Select whether or not to include duplicate rows in the answer set.
Add Identifying Column (Union only)
Select whether or not to add an identifying column to the answer set. (This option is
available only when the Merge Style is Union.)
Column Name (Union only)
Specify the name of the identifying column to add to the answer set. (This option is
available only when the Merge Style is Union and Add Identifying Column is
selected.)
One or more optional Where Clauses may be entered on this screen. Each Where Clause
entered is applied only to the table currently selected on the screen. On this screen select:
Use the Teradata EXPLAIN feature to display the execution path for this analysis
Option to generate a SQL EXPLAIN SELECT statement, which returns a Teradata
Execution Plan.
On this screen, select the columns that comprise the primary index of the output table. Select:
Available Columns
A list of columns that will be in the output table or result set. Select columns by
highlighting and then either dragging and dropping into the Primary Index
Columns window, or click on the arrow button to move highlighted columns into the
Primary Index Columns window.
Primary Index Columns
A list of columns that comprise the index of the resultant table if an Output Type of
Table is used.
Create the index using the UNIQUE keyword
Select whether or not the primary index should be a unique primary index, i.e., one in
which no two rows may have the same combination of primary index column values.
Results - Merge
The results of running the Teradata Warehouse Miner Merge Analysis include the generated
SQL itself, the results of executing the generated SQL, and, if the Create Table (or View)
option is chosen, a Teradata table (or view). All of these results are outlined below.
The results of the completed query are returned in a Data Viewer page within the Results
Browser. This page has the properties of the Data View page discussed in the Chapter on
Using the Teradata Warehouse Miner Graphical User Interface. With the exception of the
Explain Select Result Option, these results will match the tables described below in the
Output Column Definition section, depending upon the parameters chosen for the analysis.
The generated SQL is returned here as text which can be copied, pasted, or printed.
Merge - Example #1
For this example, the Merge Analysis generates the following results data.
Partition
The Partition analysis is one of two functions provided by Teradata Warehouse Miner to
sample data from a table or view. The Partition Analysis is distinguished from the Sample
Analysis in that it is repeatable and is based on the internal hash index encodings provided by
Teradata, rather than the statistically random selections provided by the Sample function.
Given a table, a list of columns to select and a list of columns to hash on, the Partition
Analysis generates a user specific partition or range of partitions from a table using a hash
key. For example, the 3rd partition out of 10 might be requested, or partitions 1 through 3 out
of 10.
To select a specific partition, set start and end partition to the same selected value. If a range
of partitions is requested, the partition number is also returned as xpartid.
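The repeatability that distinguishes Partition from Sample can be illustrated outside the database. The sketch below uses zlib.crc32 as a deterministic stand-in for Teradata's internal hash (the product actually uses Teradata's hash index encodings, which this sketch does not reproduce):

```python
import zlib

def partition_of(key, n_partitions):
    # Deterministic hash of the key; partition numbers run from
    # 1 to n_partitions.
    return zlib.crc32(str(key).encode()) % n_partitions + 1

keys = range(100)
n = 10
first, last = 1, 3   # request partitions 1 through 3 out of 10
selected = [k for k in keys if first <= partition_of(k, n) <= last]
# Because the assignment is hash-based, re-running the request
# yields exactly the same subset -- unlike random sampling.
assert selected == [k for k in keys if first <= partition_of(k, n) <= last]
print(len(selected), "of", len(keys), "rows selected")
```

Setting first and last to the same value selects a single partition; together the n partitions cover every row exactly once.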
The Partition Analysis is parameterized by specifying the table and column(s) to analyze,
options unique to the Partition Analysis, as well as specifying the desired results and SQL or
Expert Options.
2. In the resulting Add New Analysis dialog box, with Reorganization highlighted on the
left, double-click on the Partition icon:
3. This will bring up the Partition dialog in which you will enter INPUT and OUTPUT
options to parameterize the analysis as described in the next sections.
Available Databases
All the databases which are available for the Partition Analysis.
Available Tables
All the tables within the Source Database that are available for the Partition
Analysis.
Available Columns
All the columns within the selected table that are available for the Partition Analysis.
Selected Columns
Select columns by highlighting and then either dragging and dropping into the
Selected Columns window, or click on the arrow button to move highlighted
columns into the Selected Columns window. Make sure the correct category is
highlighted:
Partition Columns
Number of Partitions
Number of partitions (1 to 65536) into which to logically split the table, from which
First Partition to Last Partition is selected.
First Partition
First logical partition to select (must be in the range from 1 to Number of
Partitions).
Last Partition
Last logical partition to select (must be in the range from First Partition to Number
of Partitions).
This screen provides the option to generate a SQL WHERE clause(s) to restrict rows selected
for analysis (for example: cust_id > 0).
Use the Teradata EXPLAIN feature to display the execution path for this analysis
Option to generate a SQL EXPLAIN SELECT statement, which returns a Teradata
Execution Plan.
On this screen, select the columns which comprise the primary index of the output table.
Select:
Available Columns
A list of columns which comprise the index of the resultant table if an Output Table
is used.
Primary Index Columns
Select columns by highlighting and then either dragging and dropping into the
Primary Index Columns window, or click on the arrow button to move highlighted
columns into the Primary Index Columns window.
The results of the completed query are returned in a Data Viewer page within the Results
Browser. This page has the properties of the Data View page discussed in the Chapter on
Using the Teradata Warehouse Miner Graphical User Interface. With the exception of the
Explain Select Result Option, these results will match the tables described below in the
Output Column Definition section, depending upon the parameters chosen for the analysis.
The generated SQL is returned as text which can be copied, pasted, or printed.
Partition - Example #1
For this example, the Partition Analysis generated the following results. Note that the SQL is
not shown for brevity:
Data
Partition - Example #2
For this example, the Partition Analysis generated the following results. Again, the SQL is
not shown:
Data
Sample
The Sample analysis function randomly selects rows from a table or view, producing one or
more samples based on a specified number of rows or a fraction of the total number of rows.
The sampled rows may be stored in a single table, in a separate table for each sample, or in a
single table with a view created for each sample. When connected to a Teradata V2R5 or later
data source, options are provided for sampling with or without replacement of rows,
randomized allocation or proportional allocation by AMP, and stratified or simple random
sampling. When connected to an earlier Teradata release the default options are automatically
used. These options are described more fully below.
Sampling is performed without replacement by default. This means that each row sampled in
a request is unique and once sampled is not replaced in the sampling pool for that request.
Therefore, it is not possible to sample more rows than exist in the sampled table, and if
multiple samples are requested they are mutually exclusive. When sampling with replacement
is requested, each sampled row is immediately returned to the sampling pool and may
therefore be selected multiple times. If multiple samples are requested with replacement, the
samples are not necessarily mutually exclusive.
The default row allocation method is proportional, allocating the requested rows across the
Teradata AMPs as a function of the number of rows on each AMP. This is technically not a
simple random sample because it does not include all possible sample sets. It is however
much faster than randomized allocation, especially for large sample sizes, and should have
sufficient randomness for most applications. When randomized allocation is requested, row
selections are allocated across the AMPs by simulating simple random sampling, a process
that can be comparatively slow.
By default the Sample Analysis function performs simple random sampling. This means that
each possible set of the requested size has an equal probability of being selected (subject to
the limitations of proportional allocation noted above). An option is however provided for
stratified random sampling, wherein the available rows are divided into groups or strata based
on stated conditions prior to samples of a requested size or sizes being taken.
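The default behavior (simple random sampling without replacement) can be sketched as follows. SQLite has no SAMPLE clause, so ORDER BY RANDOM() with LIMIT stands in for the Teradata SAMPLE syntax the analysis actually generates; the table is hypothetical:

```python
import random
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER)")
con.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(100)])
# Without replacement: each sampled row is unique within the request.
sample = con.execute("SELECT id FROM t ORDER BY RANDOM() LIMIT 10").fetchall()
assert len(sample) == 10
assert len(set(sample)) == 10   # no repeats possible
# With replacement: each draw returns the row to the pool, so a
# row may be selected multiple times (sketched in Python).
pool = [r[0] for r in con.execute("SELECT id FROM t")]
with_repl = [random.choice(pool) for _ in range(10)]
print(sorted(r[0] for r in sample))
```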
The Sample Analysis is parameterized by specifying the table and column(s) to analyze,
options unique to Sample Analysis, as well as specifying the desired results and SQL or
Expert Options.
2. In the resulting Add New Analysis dialog box, with Reorganization highlighted on the
left, double-click on the Sample icon:
3. This will bring up the Sample dialog in which you will enter INPUT and OUTPUT options
to parameterize the analysis as described in the next sections.
Available Databases
All the databases which are available for the Sample Analysis.
Available Tables
All the tables within the Source Database that are available for the Sample Analysis.
Available Columns
All the columns within the selected table that are available for the Sample Analysis.
Selected Columns
Select columns by highlighting and then either dragging and dropping into the
Selected Columns window, or click on the arrow button to move highlighted
columns into the Selected Columns window.
Sample Style
Basic - When this option is checked, simple random sampling without stratifying
conditions is performed.
Stratified - When this option is checked, the available rows are divided into groups
or strata based on stated conditions prior to samples of a requested size or sizes being
taken.
Sample Options
Sample with Replacement - When this option is checked, each sampled row is
immediately returned to the sampling pool and may therefore be selected multiple
times. If multiple samples are requested with replacement, the samples are not
necessarily mutually exclusive.
When this option is not checked, each row sampled in a request is unique, and once
sampled, is not replaced in the sampling pool for that request. Therefore, it is not
possible to sample more rows than exist in the sampled table, and if multiple samples
are requested they are mutually exclusive.
Sample with Randomized Allocation - When this option is checked, the requested
rows are allocated across the AMPs by simulating simple random sampling, a
process that can be comparatively slow.
When this option is not checked, requested rows are allocated across the Teradata
AMPs as a function of the number of rows on each AMP. This is technically not a
simple random sample because it does not include all possible sample sets. It is
however much faster than randomized allocation, especially for large sample sizes,
and should have sufficient randomness for most applications.
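These two options correspond to modifiers of the Teradata SAMPLE clause. A minimal sketch follows (the table name is illustrative):

```sql
-- Illustrative only: 100 rows sampled with replacement, allocated
-- across the AMPs by simulated simple random sampling.
SELECT *
FROM twm_customer
SAMPLE WITH REPLACEMENT RANDOMIZED ALLOCATION 100;
```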
Size or Fraction
When the Sample Style is Basic, this option is used to enter a list of one or more
sample sizes or fractions, separated by the list separator for the current locale. If
sample sizes are entered (e.g. 10, 20, 30), they indicate the number of rows to be
returned in each sample. If fractions are entered (e.g. .01, .02, .03), they indicate the
approximate size of each sample as a fraction of the available rows in the table, and
as such must not add up to more than 1.
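A request for several fractions can be satisfied by a single SAMPLE clause, with Teradata's built-in SAMPLEID expression identifying the sample each row belongs to. A hedged sketch (table and column names are illustrative):

```sql
-- Illustrative only: three fractional samples in one statement.
-- SAMPLEID is 1, 2 or 3 depending on which sample a row falls in.
SELECT SAMPLEID, cust_id
FROM twm_customer
SAMPLE .01, .02, .03;
```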
When the Sample Style is Stratified, this option is used to enter one or more
conditions along with corresponding sample sizes or fractions. (For an example of
stratified sampling, refer to Sample Example #5 in Tutorial – Sample Analysis.)
Condition
Sizes/Fractions
This field is used to enter sizes or fractions for one or more samples, separated by the
list separator for the current locale. If sample sizes are entered (e.g. 10, 20, 30), they
indicate the number of rows to be returned in each sample for the stratum. If fractions
are entered (e.g. .01, .02, .03), they indicate the approximate size of each sample as a
fraction of the available rows in the stratum, and as such must not add up to more
than 1.
This screen provides the option to generate one or more SQL WHERE clauses to restrict the
rows selected for analysis (for example: cust_id > 0). (Note that the use of this option may
negatively impact the performance of a Basic style Sample with default options.)
Use the Teradata EXPLAIN feature to display the execution path for this analysis
Option to generate a SQL EXPLAIN SELECT statement, which returns a Teradata
Execution Plan.
On this screen, select the columns which comprise the primary index of the output table.
Select:
Available Columns
A list of the columns available to comprise the primary index of the resultant table
if an Output Table or Tables are created.
Primary Index Columns
Select columns by highlighting and then either dragging and dropping into the
Primary Index Columns window, or click on the arrow button to move highlighted
columns into the Primary Index Columns window.
The results of the completed query are returned in a Data Viewer page within the Results
Browser. This page has the properties of the Data View page discussed in the Chapter on
Using the Teradata Warehouse Miner Graphical User Interface. With the exception of the
Explain Select Result Option, these results will match the tables described below in the
Output Column Definition section, depending upon the parameters chosen for the analysis.
The generated SQL is returned as text which can be copied, pasted, or printed.
If one of these options is selected, a single table is built. If multiple values have been
specified in the Size or Fraction list, a column named xsampleid will be created
indicating which sample the row belongs to – a number from 1 to n for each distinct
value entered in the Size or Fraction list (depending on stratified sampling options).
When the Multiple Views option is selected, multiple views are created operating against
this table, selecting rows based on xsampleid, but not including xsampleid.
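Using the names that appear in Sample Example #4 below, one such generated view might look like the following sketch (the column list is illustrative):

```sql
-- Illustrative only: selects the rows of sample 1 and hides xsampleid.
CREATE VIEW Twm_Cust_Sample1_view AS
SELECT cust_id, income, age
FROM Twm_Cust_Sample
WHERE xsampleid = 1;
```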
Multiple Tables
If this option is selected, one table will be built for every value in the Size or Fraction list.
Sample - Example #1
For this example, the Sample Analysis generated the following results. Note that the SQL is
not shown for brevity, and that the specific rows returned will vary randomly.
Data
Sample - Example #2
For this example, the Sample Analysis generated the following results. Again, the SQL is not
shown, and the specific rows returned will vary randomly.
Data
Sample - Example #3
For this example, the Sample Analysis generated the following results. Again, the SQL is not
shown, and the specific rows returned will vary randomly. The data page will have a Load
button which must be clicked to view the three results.
Sample - Example #4
TWM_CUSTOMER.marital_status
Size or Fraction: .1, .2, .3
Output Type: Multiple Views
Table Name: Twm_Cust_Sample
View Names (3): Twm_Cust_Sample1_view, Twm_Cust_Sample2_view, Twm_Cust_Sample3_view
For this example, the Sample Analysis generated the following results. Again, the SQL is not
shown, and the specific rows returned will vary randomly. The data page will have a Load
button which must be clicked to view the three results.
Sample - Example #5
For this example, the Sample Analysis generated the following results. Note that not all SQL
is shown for brevity, and that the specific rows returned will vary randomly.
Data
1363039 0 15 7 0 F 1 3
1362548 44554 59 9 2 F 4 3
1362836 5920 66 6 0 F 3 3
1363266 20889 23 2 0 F 3 3
1363051 0 14 6 0 M 1 3
1362563 14711 73 3 0 M 2 3
1362962 2858 83 3 0 M 4 3
Several types of analysis may be involved in building an analytic data set. A Variable
Creation analysis provides expression building and dimensioning to define new variable
columns and place them in a table or view. A Variable Transformation function applies
requested data mining transformation functions to the columns in a table and creates a
transformed table. A Build Data Set analysis joins together the tables or views created by one
or more Variable Creation and/or Variable Transformation functions, allowing column
selection and the application of expert where clause constraints. (It is largely the same as the
Join function in the Reorganization category of functions, but can operate on a single table.)
Note that Identity columns, i.e. columns defined with the attribute "GENERATE … AS
IDENTITY", cannot be analyzed by Analytic Data Set functions.
Variable Creation
The Variable Creation function makes possible the creation of variables as columns in a table
or view. The user creates each new variable as an expression by selecting various SQL
keywords and operators as well as table and column names. SQL keywords and operators
allowed include arithmetic and logical operators, date/time operators, the typical aggregation
functions, as well as the newer ordered analytical (windowed OLAP) functions. The only
typing normally required is the typing of names, descriptions and values (although some
automation is provided for names and values).
In addition to defining variables as expressions or formulas, the user may specify constraints
on the data, either for all the variables defined in a Variable Creation function, or on an
individual basis. Table level constraints defined for all variables result in WHERE, HAVING
or QUALIFY clauses in the generated SQL. Constraints defined for individual variables
result in the use of CASE clauses in order to allow for different constraints on different
variables in the same SQL statement. A feature to allow the creation of numerous similar
variables using constraints based on specific values of one or more ‘dimensioning’ columns is
also provided.
Any number of variables can be defined in a single Variable Creation function, provided they
conform to rules that allow them to be combined in the same table, and they do not exceed
the maximum number of columns allowed in a table by Teradata. Several variable properties
are used in determining which variables can be built in the same table. Some rules of
combining variables in the same Variable Creation function are given below.
• Variables derived in a single table must have the same aggregation type and level.
• A number of tables may be referenced by the variables defined in a single Variable
Creation function.
The standard result options are available with the Variable Creation function, namely Select,
Explain Select, Create Table and Create View. The choice depends primarily on whether this
analysis produces a final result or an intermediate result, and if so, whether the user wants to
create a permanent table or view for this intermediate result. If a permanent result is not
desired, the Select option can be used to view and verify results. (Even if this analysis
produces an intermediate result directly referred to by another analysis, the Select option can
still be used since a volatile table will automatically be created in this case to allow the
referring analysis to access the results.)
SQL Elements
The Variable Creation function allows the creation of new columns or variables as
SQL expressions or formulas based on the features, functions and operators outlined
below, dependent on the release of Teradata in use at the time the variables are
defined:
The same list applies to the creation of Dimensions, with the exclusion of all aggregation
functions and ordered analytical functions. The Variable Creation analysis also allows
creation of WHERE, HAVING and QUALIFY clause constraints based on the same list,
with the exclusion of aggregation functions (except with HAVING) and ordered analytical
functions (except with QUALIFY).
Variable Properties
Each time a new variable is defined, the program keeps track of several attributes of the
variable that control how it is generated. Some of these attributes can be explicitly set by the
user and some are determined by the SQL verbs or clauses selected by the user.
Duplication By Dimension
Sometimes it is desirable to generate a number of similar variables at one time using data
constraints involving specific values or combinations of values from one or more columns in
the input table. These other columns can be thought of as dimensions upon which the new
variable is expanded or duplicated. For example, instead of creating a single variable
containing a customer’s average transaction amount, it may be desirable to create separate
variables for average transaction amount during each of the last 6 months, yielding 6
variables.
Duplication by dimension is performed at the time a variable is created with the Variable
Creation analysis. The user may dimension a variable on all or a subset of the dimension
values they define. Ordinarily, both the dimensioned and dimensioning variable reside in the
same input table. For example, both the transaction amount (variable being dimensioned) and
the transaction date (dimensioning variable) reside in the transaction table that is used as
input.
Depending on the nature of the variable being dimensioned, the user may want to treat values
not applying to a particular dimension value as either NULL or 0. The use of NULL in this
case results in the possibility of the dimensioned variable being NULL if no data applies. The
use of 0 in this case simply gives a total of 0 if no data applies. An option is therefore
provided to the user to indicate that either NULL or 0 should be used when no data applies.
The dimension information is shown below for conceptual purposes in the form of two tables.
Note that the Dimension Values table targets the dimension values of tran_code in a
particular table. Notice that the conditions comprising the elements of the dimension may
overlap. That is, they do not need to be mutually exclusive in value.
Dimension Values:
Suppose the above dimension values are applied to a new variable, AVG(tran_amt), with
abbreviation Amt. The select list items for the AVG(tran_amt) dimensioned by these
dimension values would produce 4 variables:
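Conceptually, each dimensioned variable becomes a CASE expression inside the aggregate. The following sketch assumes illustrative transaction codes and table names (the actual dimension values are not reproduced in this excerpt):

```sql
-- Illustrative only: AVG(tran_amt) expanded over values of tran_code.
-- Note that the third condition overlaps the first two, which is allowed.
-- ELSE NULL implements the NULL option described above; ELSE 0 would
-- implement the 0 option.
SELECT c.cust_id,
       AVG(CASE WHEN t.tran_code = 'C'         THEN t.tran_amt ELSE NULL END) AS Amt_C,
       AVG(CASE WHEN t.tran_code = 'D'         THEN t.tran_amt ELSE NULL END) AS Amt_D,
       AVG(CASE WHEN t.tran_code IN ('C', 'D') THEN t.tran_amt ELSE NULL END) AS Amt_CD,
       AVG(CASE WHEN t.tran_code = 'W'         THEN t.tran_amt ELSE NULL END) AS Amt_W
FROM twm_customer c
LEFT OUTER JOIN twm_transactions t ON c.cust_id = t.cust_id
GROUP BY c.cust_id;
```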
Conditions other than simple inclusion can be used in defining dimensions. In fact, any SQL
construct listed previously with the exception of an aggregation or ordered analytic function
can be used.
The anchor table is a table that contains all of the key values to be included in the final data
set. Physically, this can be a table or a view residing in Teradata. The data set anchor key
columns must be included in the anchor table and must uniquely identify rows in the anchor
table, otherwise unpredictable results may occur when joining this table with others.
Join paths must be specified from the Anchor Table to every table used to create variables,
dimensions and/or specified in a WHERE, QUALIFY or HAVING clause. This information
is used to build up a FROM clause for each table or view to be left outer joined with the
anchor table in order to include the appropriate anchor key values in the data set.
The following is an example of a simple join path between two tables. Note that the
containing databases can differ as can the joining table names and column names.
db1.tbl1.cust_id = db2.tbl2.cid
In some cases more than two tables must be joined together to reach a commonly used table.
By way of an example, a transaction table may not contain the customer identifier that forms
the primary index of the anchor table, but an account number instead, which is tied to
customer identifier in a third table which contains both values.
Of course, more complex examples can occur in practice and can be accommodated by a join
path with sufficient conditions combined together.
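Extending the two-table example above, a join path that reaches a transaction table through an intermediate account table might be written as the conjunction of two conditions, in the same notation (all names are illustrative):

```sql
-- Illustrative only: anchor (customer) joined to transaction via account.
db1.customer.cust_id = db1.account.cust_id
AND db1.account.acct_nbr = db1.transaction.acct_nbr
```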
The Variable Creation function includes a Join Path wizard to make it easier to build up
complex join paths. Join paths can also be extracted automatically from other analyses in
the same project, so a join path can be created once in a Variable Creation analysis and that
analysis copied into a project for use as a template.
SQL Generation
In order to derive the variables defined in a Variable Creation function, SQL is generated in
one of a number of forms depending on the result option selected. (Note that for each of
these forms, there is an option to "Create SQL Only" without executing the SQL.)
• "Select"
• "Explain Select"
• "Drop Table" and "Create Table As"
When the SELECT option is chosen for output, if another analysis refers to this Variable
Creation analysis for its input, the SQL takes the form of a "Drop Table" and "Create Volatile
Table As".
Note that it is necessary to generate a DROP command prior to a CREATE in case the
definition of the table or view has changed since a previous execution. For each variable, a
select list item is generated for the variable expression. If requested as expert options,
WHERE, QUALIFY and/or HAVING clauses may be generated. In the FROM clause, data is
selected from the anchor table, and left outer joined to any other tables referred to in the
variable, dimension or expert clause definitions. Aliases are generated for each table or view
accessed and all column names are automatically qualified using these aliases.
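Putting these pieces together, a hedged sketch of the "Drop Table" and "Create Table As" form might look like the following (all object names are illustrative, not taken from this document):

```sql
-- Illustrative only: drop any prior result, then build the new table.
DROP TABLE mydb.my_ads;
CREATE TABLE mydb.my_ads AS (
  SELECT T1.cust_id,
         AVG(T2.tran_amt) AS avg_tran_amt   -- variable expression
  FROM mydb.twm_customer T1                 -- anchor table
  LEFT OUTER JOIN mydb.twm_transactions T2  -- table from the join path
    ON T1.cust_id = T2.cust_id
  GROUP BY T1.cust_id
) WITH DATA PRIMARY INDEX (cust_id);
```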
2. In the resulting Add New Analysis dialog box, click on ADS under Categories and then
under Analyses double-click on Variable Creation:
3. This will bring up the Variable Creation dialog in which you can define INPUT /
OUTPUT options.
Note that this screen may be resized by clicking on one of the edges or corners and moving
the mouse while holding the button down.
Selection Options
Input Source
Select Table to input from a table or view, or select Analysis to select directly from
the output of a qualifying analysis in the same project. (Selecting Analysis will
cause the referenced analysis to be executed before this analysis whenever this
analysis is run. It will also cause the referenced analysis to create a volatile table if
the Output option of the referenced analysis is Select.)
Databases
All databases which are available for the Variable Creation analysis.
Tables
All tables within the Source Database which are available for the Variable Creation
analysis.
Columns
All columns within the selected table which are available for the Variable Creation
analysis.
Values
If a single column is highlighted and the Values button is clicked, a window appears
above the Columns selector displaying distinct values that appear in the selected
column in the selected table or view. The query to retrieve these values is affected by
two options on the Limits tab of the Tools menu item called Preferences, namely: Use
sampling to retrieve distinct value data and Number of rows to sample. To remove
the temporary window that displays the values, select the Hide button at the top of
the display. (Note that if the Input Source is Analysis and the column is in a volatile
table created by the referenced analysis, the retrieval of Values may fail. Just follow
the directions in the informational message displayed in case of failure to retrieve
data values.)
The same right-click menu options are offered for the Columns selector on the left side of the
input screen as are offered for other input screens (refer to the Analysis Input Screen topic
in Using Teradata Warehouse Miner). Also, the following right-click options are available
within the Variables panel.
The variables to be created are specified one at a time as any type of SQL expression. One
way to create a new variable is to click on the New button to produce the following within the
Variables panel:
Another way to create one or more new variables is to drag and drop one or more columns
from the Columns panel to the empty space at the bottom of the Variables panel (multiple
columns may be dragged and dropped at the same time). Each new variable is given the same
name as the corresponding column dropped onto the empty area.
One alternative to dragging and dropping a column is to use the right arrow selection button
to create a new variable from it. Another alternative is to double-click on the column. If the
right arrow button is clicked repeatedly, or the column is double-clicked repeatedly, a range
of columns may be used to create new variables, since the selected column increments each
time the arrow is clicked or the column is double-clicked. (It should be noted that when a
column or column value is selected, the right arrow selection button will only be highlighted
if a SQL Element is not selected. This can be ensured if the right-click option to Collapse All
Nodes is utilized in the SQL Element view.)
Whether dragging and dropping, clicking on the right arrow button or double-clicking on the
column, a new variable based on a column looks something like the following (after
expanding the node).
Still another way to create a new variable is to drag and drop a single SQL element from the
SQL Elements panel to the empty space at the bottom of the Variables panel, or to drag and
drop one or more column values displayed by selecting the Values button. In the case of
column values, a variable containing a single SQL Numeric Literal, String Literal or Date
Literal is created as appropriate for each column value. (This technique saves having to edit
the properties of a numeric, string or date literal to set the desired value.)
As with creating variables from selected columns, use of the right arrow selection button or
double-clicking the desired SQL element or column value provides an alternative to dragging
and dropping an element or value. Note however that repeated selection of a SQL element
does not advance the selected element so the result is multiple variables containing the same
SQL element. (Note also that when a SQL element is selected, the right arrow selection
button will only be highlighted if neither a column nor a column value is selected in its
respective view.)
When a SQL element is placed on top of another element on the Variables panel, whether by
dragging and dropping it, selecting it with the right arrow or by double-clicking it, the new
element is typically inserted into the expression tree at that point. The element replaced is
then typically moved to an empty operand of the new SQL element.
Whether dragging and dropping, clicking on the right arrow button or double-clicking, a new
variable based on a SQL element looks something like the following example involving the
Average element:
It is possible to create a copy of a variable by holding down the Control key on the keyboard
while dragging the variable to another location in the Variables panel. The copy can be
placed ahead of another variable by dropping it on that variable, or at the end of the list of
variables by dropping it on the empty space at the bottom of the Variables panel. It is also
possible to copy a variable in the same manner from another analysis by viewing the other
analysis at the same time and dragging the variable from one analysis to the other.
Please be aware that if the Control key is not held down while performing the copy operation
just described within the same analysis, the variable is moved from one place to the other, i.e.
deleted from its old location and copied to the new one. There are two exceptions to this.
First, this is not the case when copying a variable from one analysis to another, in which case
a copy operation is always performed, with or without holding down the Control key. The
second exception is when moving one child node on top of another child node of the same
parent in the expression tree that defines a variable. In this case, the two nodes or
sub-expressions are switched. (For example, if income and age are added together and age is
moved on top of income, the result is to add age and income, reversing the operands.)
Deleting All Variables
All variables can be deleted from the analysis by selecting the double-back-arrow button in
the center of the Variable Creation window. When this function is requested, one or more
warnings will be given. The first warning indicates how many variables are about to be
deleted. The second possible warning is given if the number of variables being deleted
exceeds 100, the maximum number of operations that can be undone or redone using the
Undo or Redo buttons. (If this warning is given and the Undo button is then selected, only
the first 100 variables will be restored. These are actually the last 100 deleted, since they are
deleted in reverse order.)
Buttons
New Button
Clicking on the New button creates a new Variable on the panel.
Add Button
Clicking on the Add button brings up a dialog to allow adding copies of variables from other
loaded analyses.
Available Analyses
This drop down list contains all of the Variable Creation analyses currently loaded in
the Project window, including those in other projects.
Available Variables
These are the variables in the currently selected analysis.
(Note that if the box is checked and the selected analysis is in another project, and
one of the variables or dimensions applied to a variable being copied contains a
reference to another analysis, an error message will be given and none of the
variables will be copied.)
OK/Cancel/Apply
Each time the Apply button is clicked, copies of the currently selected variables are
added and a status message is given. The Apply button is then disabled until another
variable is selected. The dialog can be exited at any time
by clicking the OK or Cancel button. If OK is clicked, the currently selected
variables will be added unless the Apply button is disabled.
Wizard Button
When the Variables tab is selected and either a Variable is selected or nothing is
selected, the Wizard button can be used to generate new variables, each containing a
Searched Case statement. Alternatively, when an appropriate folder is selected, When
Conditions for Searched Case statements, or conditional expressions for And All or Or
All statements, can be generated. To do so, highlight the Case Conditions folder under a
Case - Searched node or the Expressions folder under an And All or Or All node and
select the Wizard button.
The maximum number of variables or values that can be generated by a single application
of the wizard is limited to 1000.
The following dialog is given when a Variable or nothing at all is selected. (Note that in
the other cases a subset of these fields is displayed with appropriate instructions at the top
of the dialog.)
Variable Prefix
When a comparison operator such as Equal is selected in the Operator field, the
names of the resulting variables consist of the prefix followed by an underscore and
the selected value. Otherwise the variable name is the prefix followed by a number.
Description
When a comparison operator such as Equal is selected in the Operator field, the
description of the resulting variables consists of the description specified here
followed by the operator and selected value. Otherwise the description is the
description entered here.
Then Expression
Replace the "(empty)" node with a SQL element or more complex expression that
will form the Then clause of the generated Searched Case expression. (The default
value of ‘1’ is useful for an indicator variable.)
Else Expression
Replace the "(empty)" node with a SQL element or more complex expression that
will form the Else clause of the generated Searched Case expression. (The default
value of ‘0’ is useful for an indicator variable.)
Operator
Select a comparison operator such as Equals or select Between, Not Between, In, Not
In, Is Null or Is Not Null as the operator to use. If Between or Not Between is
selected, a variable or condition is generated for each pair of requested values. If In
or Not In is selected, the Wizard will generate a single variable or condition based on
all requested values when 'OK' or 'Apply' is clicked. If Is Null or Is Not Null is
selected, the Wizard will generate a single variable or condition based on no values.
Otherwise, if a comparison operator such as Equal is selected, the Wizard will
generate a variable or condition for each requested value.
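With the default Then and Else expressions, a comparison operator such as Equal produces one indicator variable per requested value. A hedged sketch for a prefix of "ms", the column marital_status and the values 1 and 2 (all illustrative):

```sql
-- Illustrative only: indicator variables generated by the Wizard.
SELECT CASE WHEN marital_status = 1 THEN 1 ELSE 0 END AS ms_1,
       CASE WHEN marital_status = 2 THEN 1 ELSE 0 END AS ms_2
FROM twm_customer;
```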
Values
This tab accepts values displayed by selecting the Values button for input
columns on the left side of the input screen. The displayed values can be drag-
dropped onto this panel, selected with the right-arrow button or selected by
double-clicking them. They can be numeric, string or date type values.
Note that when values are displayed on the left side of the input screen, the
ellipsis button (the one displaying ‘…’) may be used to Select All Values.
Range
This tab can be used to generate a range of integer or decimal numeric values
based on a From, To and By field. If desired, the values can be generated in
descending order by making the From value greater than the To value; the
By value should always be positive. If the By field is not specified, an
incremental value of 1 is assumed. (Note that a value displayed with the Values
button may be drag-dropped into this field. Note also that the escape key will
revert to the last value entered in this field.)
When the Between or Not Between operator has been specified, the Range fields
behave somewhat differently and may be used only to specify a single pair of
values using the From and To field, with the From field validated to be less than
or equal to the To field. The By field may not be specified when the Between or
Not Between operator has been specified.
List
A list of numeric, string or date type values can be entered here, separated by
commas (actually, by the standard list separator for the current locale settings).
(Note that a value displayed with the Values button may be drag-dropped into
this field. Note also that the escape key will revert to the last value entered in
this field.)
Clear All
This button will clear all of the fields of this dialog. (This is convenient because all
entries are generally retained when returning to this dialog.)
OK
This button will generate the requested variables or conditions and return to the
Variables panel.
Cancel
This button returns to the Variables panel without generating any elements.
Apply
This button will generate the requested variables or conditions and remain on this
panel. A status message is displayed just above this button reporting on the number
of generated conditions.
Delete Button
The Delete button can be used to delete any node within the tree. If applicable, the tree
will roll up its children, but in some cases a delete may remove all children.
SQL Button
The SQL button can be used to dynamically display the SQL for any node within the
Variables tree. If the resulting display is not closed, the expression changes as you click
on the different levels of the tree comprising a variable. An option is provided in the
display to Qualify column names, that is, to precede each column name in the display
with its database and table name.
Properties Button
A number of properties are available when defining a variable to be created, as outlined
below. Click the Properties button when the variable is highlighted, or double click on
the variable to bring up the Properties dialogue:
Name:
A name must be specified for each variable. If the SQL expression defining the
variable is simply a SQL Column, the name defaults to the name of the column
automatically when the column is dragged to the variable.
(Tip: Variables can be renamed by single left-clicking on the name, which produces an
edit box around the name, as in Windows Explorer.)
Output Type:
A specific Teradata data type may optionally be specified for each variable. If
specified, the SQL CAST function is used to force the data type to the requested
specification. Otherwise the type will be generated automatically by the variable’s
expression (Generate Automatically option). Valid options include:
• BYTEINT
• CHAR
• DATE
• DECIMAL
• FLOAT
• INTEGER
• SMALLINT
• TIME
• TIMESTAMP
• VARCHAR
Column Attributes:
One or more column attributes can be entered here in a free-form manner to be used
when an output table is created. They are placed as-entered following the column
name in the CREATE TABLE AS statement. This can be particularly useful when
requesting data compression for an output column, which might look like the
following: COMPRESS NULL.
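As a hedged sketch, an attribute of COMPRESS NULL entered for a variable named avg_bal might appear in the generated statement as follows (all other names are illustrative):

```sql
-- Illustrative only: the attribute follows the column name as entered.
CREATE TABLE mydb.my_ads (
  cust_id,
  avg_bal COMPRESS NULL
) AS (
  SELECT cust_id, AVG(acct_bal) AS avg_bal
  FROM mydb.twm_account
  GROUP BY cust_id
) WITH DATA PRIMARY INDEX (cust_id);
```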
Description:
An optional description may be specified for each variable. (Note that a default
description is generated automatically by the Wizard if its Description field contains
a value.)
Undo Button
The Undo button can be used to undo changes made to the Variables panel. Note that if a
number of variables or dimension values are added at one time, each addition requires a
separate undo request to reverse. Up to 100 undo requests can be processed.
Redo Button
The Redo button can be used to reinstate a change previously undone with the Undo
button.
Aggregations
A number of aggregation functions are supported, including several of a statistical nature.
Note that aggregation functions are not allowed in a Dimension value expression, but may be
used in a Variable expression that is being dimensioned. They are not allowed in a Where
clause or Qualify clause either. Double click on Aggregations to view the supported
functions:
Average
The standard average function is supported, taking a single expression argument and
generating AVG(expression). The function returns a value of type float, except that the
average of a date expression is returned as a value of type date. When
dragging an Average into a variable, the following tree element is created:
Columns, and/or other non-aggregate expressions can be moved over the (empty) branch
of the tree.
The option to compute the average over distinct values only is provided, resulting in the
generation of AVG(DISTINCT expression). This option is enabled through the
Properties panel. Double-click on Average, or highlight it and hit the Properties button:
Correlation
Columns, and/or other non-aggregate expressions can be moved over the two (empty)
branches of the tree.
The enhancement is the ability to compute the correlation when either or both the first
and second expression arguments evaluate to type date, generating one of the following:
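The generated forms are not reproduced in this excerpt. By analogy with the date handling shown for Kurtosis below, they are presumably of the following form (an assumption, not confirmed by this document):

```sql
-- Presumed forms, by analogy with the Kurtosis date handling:
CORR(date_expression - DATE '1900-01-01', expression)
CORR(expression, date_expression - DATE '1900-01-01')
CORR(date_expr_1 - DATE '1900-01-01', date_expr_2 - DATE '1900-01-01')
```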
Covariance
Columns, and/or other non-aggregate expressions can be moved over the two (empty)
branches of the tree.
The enhancement consists of the ability to compute the covariance when either or both of
the first and second expression arguments evaluate to type date, generating one of the
following (in which COVAR_POP may be substituted for COVAR_SAMP):
The option to compute the covariance on the population or the sample is offered through the
Properties panel. Double-click on Covariance, or highlight it and hit the Properties
button:
Count
An (empty) branch is added if no expression yet exists within the variable; otherwise the
existing expression is retained. In either case, columns and/or other non-aggregate expressions
can be moved over the (empty) branch in the tree. An Asterisk (*) may also be moved
from the Other category to request the COUNT(*) function.
The option to compute the count over distinct values only is provided, resulting in the
generation of COUNT(DISTINCT expression). This option is enabled through the
Properties panel. Double-click on Count, or highlight it and hit the Properties button:
Kurtosis
Columns, and/or other non-aggregate expressions can be moved over the (empty) branch
of the tree.
The enhancement consists of the ability to compute the kurtosis of a date expression,
generating KURTOSIS(date expression - DATE '1900-01-01').
The standard option to compute the kurtosis over distinct values only is also provided,
resulting in the generation of KURTOSIS(DISTINCT expression). This option is enabled
through the Properties panel. Double-click on Kurtosis, or highlight it and hit the
Properties button:
Maximum
Columns, and/or other non-aggregate expressions can be moved over the (empty) branch
of the tree.
The option to compute the maximum over distinct values only is provided, resulting in
the generation of MAX(DISTINCT expression). This option is enabled through the
Properties panel. Double-click on Maximum, or highlight it and hit the Properties
button:
Minimum
Columns, and/or other non-aggregate expressions can be moved over the (empty) branch
of the tree.
The option to compute the minimum over distinct values only is provided, resulting in the
generation of MIN(DISTINCT expression). This option is enabled through the
Properties panel. Double-click on Minimum, or highlight it and hit the Properties
button:
Regression Intercept
Columns, and/or other non-aggregate expressions can be moved over the (empty) branch
of the tree.
Regression R-Squared
Columns, and/or other non-aggregate expressions can be moved over the (empty) branch
of the tree.
Regression Slope
Columns, and/or other non-aggregate expressions can be moved over the (empty) branch
of the tree.
Skewness
Columns, and/or other non-aggregate expressions can be moved over the (empty) branch
of the tree.
The enhancement consists of the ability to compute the skew of a date expression,
generating SKEW(date expression - DATE '1900-01-01').
The standard option to compute the skew over distinct values only is also provided,
resulting in the generation of SKEW(DISTINCT expression). This option is enabled
through the Properties panel. Double-click on Skewness, or highlight it and hit the
Properties button:
Standard Deviation
Columns, and/or other non-aggregate expressions can be moved over the (empty) branch
of the tree.
The enhancement consists of the ability to compute the standard deviation of a date
expression, generating for the sample version STDDEV_SAMP(date expression - DATE
'1900-01-01').
The standard option to compute the standard deviation over distinct values only is also
provided, resulting in the generation for the sample version of
STDDEV_SAMP(DISTINCT expression). Both this option as well as the options for
population and sample versions of standard deviation are enabled through the Properties
panel. Double-click on Standard Deviation, or highlight it and hit the Properties button:
Sum
The standard sum function is supported, generating SUM(expression). The type of the
resulting value depends on the type of the expression being summed. If the expression is
any of the integer types, the resulting value is of type integer. If the expression is a float
or character type, the resulting value is of type float. A decimal expression results in a
value of decimal type with 18 total digits and the same number of fractional digits
contained in the decimal expression. When dragging a Sum into a variable, the following
tree element is created:
Columns, and/or other non-aggregate expressions can be moved over the (empty) branch
of the tree.
The option to compute the sum over distinct values only is provided, resulting in the
generation of SUM(DISTINCT expression). This option is enabled through the
Properties panel. Double-click on Sum, or highlight it and hit the Properties button:
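The type rule for SUM described above can be restated as a small lookup. The following Python sketch is only an illustration of that rule (the type-name strings are assumptions for the example, not a Teradata Warehouse Miner API):

```python
def sum_result_type(expr_type: str) -> str:
    """Result type of SUM(expression), per the rule described above."""
    if expr_type in ("BYTEINT", "SMALLINT", "INTEGER"):
        return "INTEGER"          # any integer type sums to integer
    if expr_type in ("FLOAT", "CHAR", "VARCHAR"):
        return "FLOAT"            # float and character types sum to float
    if expr_type.startswith("DECIMAL"):
        # DECIMAL(p, s) keeps its fractional digits but widens to 18 total digits
        scale = expr_type.rstrip(")").split(",")[1].strip()
        return f"DECIMAL(18, {scale})"
    return expr_type
```

For example, a DECIMAL(10, 2) expression sums to DECIMAL(18, 2).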
Variance
Columns, and/or other non-aggregate expressions can be moved over the (empty) branch
of the tree.
The enhancement consists of the ability to compute the variance of a date expression,
generating for the sample version VAR_SAMP(date expression - DATE '1900-01-01').
The standard option to compute the variance over distinct values only is also provided,
resulting in the generation for the sample version of VAR_SAMP(DISTINCT
expression). Both this option and the options for population and sample versions of
variance are enabled through the Properties panel. Double-click on Variance, or
highlight it and hit the Properties button:
Arithmetic
In general, numeric functions can operate on any expression that can automatically be
converted to a numeric value. Character type operands are automatically converted to a
number of type float, if possible, before the numeric function is performed. Additionally, the
standard and Teradata-specific numeric operators are supported. Double click on Arithmetic
to view the supported functions and operators:
Absolute Value
The standard Absolute Value function is supported, generating ABS(expression); the type
of the value returned is float if expression is a character type. When dragging an Absolute
Value into a variable, the following tree element is created:
Columns, and/or other expressions can be moved over the (empty) branch of the tree.
Add
The standard Add (+) operator is supported, generating expression + expression. Within
Teradata, this operator automatically converts numeric operands to the expected result
type before it is applied. Character type data is converted to FLOAT if possible
before being applied. Operands of type DATE are valid when adding an integer number
of days to a date expression. The resulting data types and other specific usage
information are documented in some detail in the Teradata documentation. When
dragging an Add into a variable, the following tree element is created:
Columns, and/or other expressions can be moved over the (empty) branches of the tree.
Divide
The standard Divide (/) operator is supported, generating expression / expression. Within
Teradata, this operator automatically converts numeric operands to the expected result
type before it is applied. Character type data is converted to FLOAT if possible
before being applied. The resulting data types and other specific usage information are
documented in some detail in the Teradata documentation. When dragging a Divide into
a variable, the following tree element is created:
Columns, and/or other expressions can be moved over the (empty) branches of the tree.
Exponentiate
Columns, and/or other expressions can be moved over the (empty) branches of the tree.
Note that the second argument must resolve to a numeric literal.
Logarithm
Columns, and/or other expressions can be moved over the (empty) branch of the tree.
Modulo
Columns, and/or other expressions can be moved over the (empty) branches of the tree.
Multiply
Columns, and/or other expressions can be moved over the (empty) branches of the tree.
Natural Exponentiate
Columns, and/or other expressions can be moved over the (empty) branch of the tree.
(Note that it may be advisable to use a Case statement in conjunction with this function if
extreme values in the data may occur, resulting in an overflow or SQL argument error.)
Natural Logarithm
Columns, and/or other expressions can be moved over the (empty) branch of the tree.
(Note that it may be advisable to use a Case statement in conjunction with this function if
zero or negative values may occur in the data, resulting in a SQL argument error.)
Random
The integers x (Lower Bound) and y (Upper Bound) are set through the Properties panel.
Double-click on Random, or highlight it and hit the Properties button:
Square Root
Columns, and/or other expressions can be moved over the (empty) branch of the tree.
Note that expressions that resolve to a negative number will result in SQL errors.
Subtract
The standard Subtract (-) operator is supported, generating expression - expression.
Operands of type DATE are valid when subtracting an integer number of days from a date
expression. The resulting data types and other specific usage
information are documented in some detail in the Teradata documentation. When
dragging a Subtract into a variable, the following tree element is created:
Columns, and/or other expressions can be moved over the (empty) branch of the tree.
Unary Minus
The standard Unary Minus (-) operator is supported, generating -expression. Within
Teradata, this operator automatically converts numeric operands to the expected result
type before it is applied. Character type data is converted to FLOAT if possible
before being applied. The resulting data types and other specific usage information are
documented in some detail in the Teradata documentation. When dragging a Unary
Minus into a variable, the following tree element is created:
Columns, and/or other expressions can be moved over the (empty) branch of the tree.
Unary Plus
The standard Unary Plus (+) operator is supported, generating +expression. Within
Teradata, this operator automatically converts numeric operands to the expected result
type before it is applied. Character type data is converted to FLOAT if possible
before being applied. The resulting data types and other specific usage information are
documented in some detail in the Teradata documentation. When dragging a Unary Plus
into a variable, the following tree element is created:
Columns, and/or other expressions can be moved over the (empty) branch of the tree.
Calendar
Day of Calendar
The Day of Calendar function is supported, returning an integer 1-n, the number of Julian
days since 1/1/1900. When dragging a Day of Calendar into a variable, the following tree
element is created:
Columns, and/or other expressions that resolve to a date can be moved over the (empty)
branch of the tree.
Day of Month
The Day of Month function is supported, returning an integer 1-31, the number of the day
within a given month. When dragging a Day of Month into a variable, the following tree
element is created:
Columns, and/or other expressions that resolve to a date can be moved over the (empty)
branch of the tree.
Day of Week
The Day of Week function is supported, returning an integer 1-7, the number of the day
within a given week, assuming 1/1/1900 is a Monday. When dragging a Day of Week into a
variable, the following tree element is created:
Columns, and/or other expressions that resolve to a date can be moved over the (empty)
branch of the tree.
Day of Year
The Day of Year function is supported, returning an integer 1-366, the number of the day
within a given year. When dragging a Day of Year into a variable, the following tree
element is created:
Columns, and/or other expressions that resolve to a date can be moved over the (empty)
branch of the tree.
Month of Calendar
The Month of Calendar function is supported, returning an integer 1-N, the number of
months since 1/1/1900. When dragging a Month of Calendar into a variable, the
following tree element is created:
Columns, and/or other expressions that resolve to a date can be moved over the (empty)
branch of the tree.
Month of Quarter
The Month of Quarter function is supported, returning an integer 1-3, the number of the
month in a given quarter. When dragging a Month of Quarter into a variable, the
following tree element is created:
Columns, and/or other expressions that resolve to a date can be moved over the (empty)
branch of the tree.
Month of Year
The Month of Year function is supported, returning an integer 1-12, the number of the
month in a given year. When dragging a Month of Year into a variable, the following tree
element is created:
Columns, and/or other expressions that resolve to a date can be moved over the (empty)
branch of the tree.
Quarter of Calendar
The Quarter of Calendar function is supported, returning an integer 1-N, the number of
quarters since the first quarter of 1900. When dragging a Quarter of Calendar into a variable, the
following tree element is created:
Columns, and/or other expressions that resolve to a date can be moved over the (empty)
branch of the tree.
Quarter of Year
The Quarter of Year function is supported, returning an integer 1-4, the quarter of the
year where Jan-Mar=1, Apr-Jun=2, Jul-Sep=3, Oct-Dec=4. When dragging a Quarter of
Year into a variable, the following tree element is created:
Columns, and/or other expressions that resolve to a date can be moved over the (empty)
branch of the tree.
Week of Calendar
The Week of Calendar function is supported, returning an integer 0-N, where a partial
week at the beginning of the calendar is counted as week 0. When dragging a Week of
Calendar into a variable, the following tree
element is created:
Columns, and/or other expressions that resolve to a date can be moved over the (empty)
branch of the tree.
Week of Month
The Week of Month function is supported, returning an integer 0-5, where a partial
week at the beginning of the month is counted as week 0. When dragging a Week of
Month into a variable, the following tree
element is created:
Columns, and/or other expressions that resolve to a date can be moved over the (empty)
branch of the tree.
Week of Year
The Week of Year function is supported, returning an integer 0-53, where a partial week
at the beginning of the year is counted as week 0. When dragging a Week of Year into a
variable, the following tree element
is created:
Columns, and/or other expressions that resolve to a date can be moved over the (empty)
branch of the tree.
Weekday of Month
The Weekday of Month function is supported, returning an integer 1-5, the nth
occurrence of that day of the week within the month. When dragging a Weekday of Month into a variable, the
following tree element is created:
Columns, and/or other expressions that resolve to a date can be moved over the (empty)
branch of the tree.
Year of Calendar
The Year of Calendar function is supported, returning the year, assuming the year starts
on January 1st. When dragging a Year of Calendar into a variable, the following tree element
is created:
Columns, and/or other expressions that resolve to a date can be moved over the (empty)
branch of the tree.
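Most of the calendar attributes above are simple arithmetic on the date fields. The Python sketch below illustrates several of them, assuming 1/1/1900 counts as day 1 per the Day of Calendar description; the week-based functions are omitted because their exact week-boundary convention is not spelled out here:

```python
from datetime import date

ORIGIN = date(1900, 1, 1)  # calendar origin assumed by these functions

def day_of_calendar(d: date) -> int:
    # integer 1-n, with 1/1/1900 assumed to be day 1
    return (d - ORIGIN).days + 1

def day_of_year(d: date) -> int:
    return d.timetuple().tm_yday           # 1-366

def month_of_quarter(d: date) -> int:
    return (d.month - 1) % 3 + 1           # 1-3

def quarter_of_year(d: date) -> int:
    return (d.month - 1) // 3 + 1          # 1-4 (Jan-Mar=1, ..., Oct-Dec=4)

def weekday_of_month(d: date) -> int:
    # nth occurrence of this day of the week within the month, 1-5
    return (d.day - 1) // 7 + 1
```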
Case
Both the standard valued and searched CASE expressions are supported. In both cases the
ELSE expression is optional, and if it is not specified, the expression returns NULL when no
WHEN condition evaluates to TRUE. At least one WHEN/THEN condition is required with
both forms of CASE expression, and a test expression is also required with the valued form.
(Note that a CASE statement embedded within a CASE statement as a THEN or ELSE
expression is automatically enclosed in parentheses if it is not already so enclosed. This
makes it easier to achieve correct syntax when nested CASE statements are needed.)
Case - Searched
The standard searched CASE expression is supported. When dragging a Searched Case
into a variable, the following tree element is created:
The searched CASE statement is built up by supplying one or more conditions within the
Conditions folder. Each time a Condition is added, the following tree element is created:
Columns, and/or other expressions can be moved over the (empty) branches of the tree.
Note that the ELSE expression is optional, and if not specified, the expression will return
NULL if no WHEN conditions evaluate to TRUE. At least one WHEN/THEN condition
is required with the CASE expression.
Case - Valued
The standard valued CASE expression is supported. When dragging a Valued Case into a
variable, the following tree element is created:
The valued CASE statement is built up by supplying one or more conditions within the
Conditions folder. Each time a Condition is added, the following tree element is created:
Columns, and/or other expressions can be moved over the (empty) branches of the tree.
Note that the ELSE expression is optional, and if not specified, the expression will return
NULL if no WHEN conditions evaluate to TRUE. At least one WHEN/THEN condition
is required with the CASE expression, and a test expression is also required with the
valued form.
Case Condition
For both Searched and Valued CASE statements, any number of conditions can be built
up. In order to do so, a Condition must first be dragged and dropped into the Conditions
folder of a Searched Case or Valued Case expression. Each condition results in an
expression of the form WHEN expression THEN expression. As an example, when
dragging a Condition into the Conditions folder of a Searched Case expression, the
following tree element is created:
Columns, and/or other expressions can be moved over the (empty) branches of the tree.
Coalesce
The standard COALESCE case expression is supported, generating SQL of the form
COALESCE(expression, …expression). It must be supplied with at least two arguments. The
entire COALESCE expression is automatically enclosed in parentheses if it is not
part of an expression and not already enclosed in parentheses. When dragging a Coalesce
into a variable, the following tree element is created:
Note that COALESCE can be used in place of the non-standard Teradata specific
command ZEROIFNULL. For example, COALESCE(column1, 0) is equivalent to
ZEROIFNULL(column1).
Null If
The standard NULLIF case expression is supported, generating SQL of the form
NULLIF(expression, expression). It must be supplied with exactly two arguments. The entire
NULLIF expression is automatically enclosed in parentheses if it is not part of an
expression and not already enclosed in parentheses. When dragging a Null If into a
variable, the following tree element is created:
Columns, and/or other expressions can be moved over the (empty) branches of the tree.
Note that NULLIF can be used in place of the non-standard Teradata specific command
NULLIFZERO. For example, NULLIF(column1, 0) is equivalent to
NULLIFZERO(column1).
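The two equivalences noted above can be sketched in Python, with None standing in for NULL (a semantic illustration only, not code generated by the tool):

```python
def coalesce(*args):
    # COALESCE: the first non-NULL argument, or NULL if all are NULL
    return next((a for a in args if a is not None), None)

def nullif(a, b):
    # NULLIF(a, b): NULL when a equals b, otherwise a
    return None if a == b else a

# coalesce(column1, 0) behaves like ZEROIFNULL(column1)
# nullif(column1, 0) behaves like NULLIFZERO(column1)
```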
Null If Zero
A column, and/or other expression can be moved over the (empty) branch of the tree.
Note that the Null If element, which generates the standard NULLIF command, can be
used in place of the Null If Zero element, which generates the non-standard Teradata
specific command NULLIFZERO. (In Teradata SQL, NULLIF(column1, 0) is equivalent
to NULLIFZERO(column1).)
Zero If Null
A column, and/or other expression can be moved over the (empty) branch of the tree.
Note that the Coalesce element, which generates the standard COALESCE command, can
be used in place of the Zero If Null element, which generates the non-standard Teradata
specific command ZEROIFNULL. (In Teradata SQL, COALESCE(column1, 0) is
equivalent to ZEROIFNULL (column1).)
Comparison
The standard comparison operators are supported, including equals (=), not equals (<>), less
than (<), less than or equals (<=), greater than (>) and greater than or equals (>=). Comparison
operators evaluate to a true or false condition which can be used in various contexts such as
case conditions. Double-click on Comparison to see all of the operators:
Equals
The Equals operator is supported, generating expression = expression. When dragging an
Equals operator into a variable, the following tree element is created:
Columns, and/or other expressions and literals can be moved over the (empty) branches
of the tree.
Greater Than
The Greater Than operator is supported, generating expression > expression. When
dragging a Greater Than operator into a variable, the following tree element is created:
Columns, and/or other expressions and literals can be moved over the (empty) branches
of the tree.
Greater Than or Equals
The Greater Than or Equals operator is supported, generating expression >= expression.
When dragging a Greater Than or Equals operator into a variable, the following tree
element is created:
Columns, and/or other expressions and literals can be moved over the (empty) branches
of the tree.
There are no special properties for the Greater Than or Equals operator.
Less Than
The Less Than operator is supported, generating expression < expression. When dragging
a Less Than operator into a variable, the following tree element is created:
Columns, and/or other expressions and literals can be moved over the (empty) branches
of the tree.
Less Than or Equals
The Less Than or Equals operator is supported, generating expression <= expression.
When dragging a Less Than or Equals operator into a variable, the following tree element
is created:
Columns, and/or other expressions and literals can be moved over the (empty) branches
of the tree.
There are no special properties for the Less Than or Equals operator.
Does Not Equal
The Does Not Equal operator is supported, generating expression <> expression. When
dragging a Does Not Equal operator into a variable, the following tree element is created:
Columns, and/or other expressions and literals can be moved over the (empty) branches
of the tree.
There are no special properties for the Does Not Equal operator.
Date and Time
Many different date and time functions and operators are offered to extract elements from a
date or time column, as well as to perform difference and elapsed-time calculations on
multiple date and time elements. Double-click on Date and Time to see all of the functions
and operators:
Add Months
Columns, and/or other expressions and/or literals that resolve to a date can be moved
over the first (empty) branch of the tree, while expressions and/or literals that resolve to
type integer can be moved over the second (empty) branch of the tree.
Current Date
The Current Date literal represents the current system date. It generates the SQL keyword
CURRENT_DATE and is of type Date. When dragging a Current Date function into a
variable, the following tree element is created:
There are no child (empty) branches in the tree, as no arguments are required for the
Current Date function.
Current Time
The Current Time literal represents the current system time and current session Time
Zone displacement. It generates the keyword CURRENT_TIME and is of type Time
With Time Zone. The feature allowing the specification of the number of digits of
precision for fractional seconds is not supported; no fractional digits are provided. When
dragging a Current Time function into a variable, the following tree element is created:
There are no child (empty) branches in the tree, as no arguments are required for the
Current Time function.
Current Timestamp
The Current Timestamp literal represents the current system timestamp and current
session Time Zone displacement. It generates the keyword CURRENT_TIMESTAMP
and is of type Timestamp With Time Zone. The feature allowing the specification of the
number of digits of precision for fractional seconds is not supported; six digits are always
provided. When dragging a Current Timestamp function into a variable, the following
tree element is created:
There are no child (empty) branches in the tree, as no arguments are required for the
Current Timestamp function.
Date Difference
A Teradata Warehouse Miner specific function is provided for calculating the difference
between two Date and/or Timestamp expressions in various units. The integer measures
are calculated by expressing both dates in the requested units and then taking the integer
difference between the two (for example, the difference between April 1 and March 31 is
1 month). The fractional measures are in days converted to fractions of longer time
periods. Note that either Date or Timestamp expression may be a literal value, the built-in
function Current Date or Current Timestamp, or the analytic data set's target date value.
When dragging a Date Difference function into a variable, the following tree element is
created:
Options to compute the difference in Days, Weeks, Months, Quarters or Years are set
through the Properties panel. If Weeks, Months, Quarters or Years are requested, the
units may be calculated in one of two different ways, as described on the Properties
panel. Double-click on Date Difference, or highlight it and hit the Properties button:
Date Field
Days
Calculate the date difference in integer days.
Weeks
Either calculate the difference in days and convert this to fractional weeks, or
express both dates in weeks and take the integer difference.
Months
Either calculate the difference in days and convert this to fractional months,
or express both dates in months and take the integer difference.
Quarters
Either calculate the difference in days and convert this to fractional quarters,
or express both dates in quarters and take the integer difference.
Years
Either calculate the difference in days and convert this to fractional years, or
express both dates in years and take the integer difference.
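The two ways of calculating a unit difference can be sketched in Python, using months as the example. The fractional days-per-month constant below is an assumption for illustration; the conversion factor the tool actually uses is not documented here:

```python
from datetime import date

def months_integer_diff(d1: date, d2: date) -> int:
    # express both dates in months, then take the integer difference
    return (d1.year * 12 + d1.month) - (d2.year * 12 + d2.month)

def months_fractional_diff(d1: date, d2: date,
                           days_per_month: float = 365.25 / 12) -> float:
    # difference in days, converted to fractional months
    return (d1 - d2).days / days_per_month

# e.g. April 1 vs. March 31: one whole month by the integer measure,
# but only about 0.03 of a month by the fractional measure
```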
Elapsed Time
Options to compute the elapsed time in seconds, minutes, hours or days (fraction of a
day) are set through the Properties panel. Double-click on Elapsed Time, or highlight it
and hit the Properties button:
Extract Day
The standard date/time field extract function is supported for Day, generating
EXTRACT(DAY FROM date/time expression). If this function is applied to a column or
expression of type other than Date or Timestamp, a SQL runtime error will occur. When
dragging an Extract Day function into a variable, the following tree element is created:
The type of the value returned is integer. There are no special properties for the Extract
Day function.
Extract Hour
The standard date/time field extract function is supported for Hour, generating
EXTRACT(HOUR FROM date/time expression). If this function is applied to a column
or expression of type other than Time or Timestamp, a SQL runtime error will occur.
When dragging an Extract Hour function into a variable, the following tree element is
created:
The type of the value returned is integer. There are no special properties for the Extract
Hour function.
Extract Minute
The standard date/time field extract function is supported for Minute, generating
EXTRACT(MINUTE FROM date/time expression). If this function is applied to a
column or expression of type other than Time or Timestamp, a SQL runtime error will
occur. When dragging an Extract Minute function into a variable, the following tree
element is created:
The type of the value returned is integer. There are no special properties for the Extract
Minute function.
Extract Month
The standard date/time field extract function is supported for Month, generating
EXTRACT(MONTH FROM date/time expression). If this function is applied to a
column or expression of type other than Date or Timestamp, a SQL runtime error will
occur. When dragging an Extract Month function into a variable, the following tree
element is created:
The type of the value returned is integer. There are no special properties for the Extract
Month function.
Extract Second
The standard date/time field extract function is supported for Second, generating
EXTRACT(SECOND FROM date/time expression). If this function is applied to a column or
expression of type other than Time or Timestamp, a SQL runtime error will occur. When
dragging an Extract Second function into a variable, the following tree element is created:
The type of the value returned is integer if fractional seconds precision is 0, and
DECIMAL(8, n) if precision is n. There are no special properties for the Extract Second
function.
Extract Year
The standard date/time field extract function is supported for Year, generating
EXTRACT(YEAR FROM date/time expression). If this function is applied to a column
or expression of type other than Date or Timestamp, a SQL runtime error will occur.
When dragging an Extract Year function into a variable, the following tree element is
created:
The type of the value returned is integer. There are no special properties for the Extract
Year function.
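The EXTRACT functions map directly onto the fields of a date or timestamp value. In Python terms (an illustration of the semantics, not the generated SQL):

```python
from datetime import datetime

ts = datetime(2003, 12, 31, 9, 30, 45)

# Date fields: valid for Date or Timestamp expressions
year, month, day = ts.year, ts.month, ts.day            # 2003, 12, 31

# Time fields: valid for Time or Timestamp expressions
hour, minute, second = ts.hour, ts.minute, ts.second    # 9, 30, 45
```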
Time Difference
A Teradata Warehouse Miner specific function is provided for calculating the time
differences between two Time or Timestamp expressions in seconds, minutes, hours or
days (fraction of a day). The date portion of any Timestamp expression is ignored,
however, so the measure is strictly the difference between two time values, assumed to
be from the same day. All the measures are based on the difference measured in
seconds, with conversions to other larger units. Any differences in time zones are
ignored. When dragging a Time Difference function into a variable, the following tree
element is created:
The unit of difference (seconds, minutes, hours or days) is set through the Properties panel.
Double-click on Time Difference, or highlight it and hit the Properties button:
Select Seconds, Minutes, Hours or Days from the Time Field pull-down.
Date/Time Difference
A Teradata Warehouse Miner specific function is provided for calculating the date/time
differences between two Timestamp columns in seconds, minutes, hours or days. Note
that this includes the day differences as well as the time differences. All the measures are
based on the difference measured in seconds, with conversions to other larger units. Any
differences in time zones are ignored. When dragging a Date/Time Difference function
into a variable, the following tree element is created:
The unit of difference (seconds, minutes, hours or days) is set through the Properties panel.
Double-click on Date/Time Difference, or highlight it and hit the Properties button:
Select Seconds, Minutes, Hours or Days from the Time Field pull-down.
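The distinction between Time Difference and Date/Time Difference can be sketched in Python: the former compares times of day only, while the latter includes the day difference. Seconds are shown; minutes, hours and days are the same value divided by 60, 3600 or 86400 (a semantic sketch, not the tool's generated SQL):

```python
from datetime import datetime

def time_difference_seconds(t1: datetime, t2: datetime) -> int:
    # Time Difference: date portion ignored, compare times of day only
    secs = lambda t: t.hour * 3600 + t.minute * 60 + t.second
    return secs(t1) - secs(t2)

def datetime_difference_seconds(ts1: datetime, ts2: datetime) -> int:
    # Date/Time Difference: day differences included as well
    return int((ts1 - ts2).total_seconds())

a = datetime(2004, 1, 2, 1, 0, 0)
b = datetime(2004, 1, 1, 23, 0, 0)
# time_difference_seconds(a, b) treats both as same-day times (1:00 vs 23:00)
# datetime_difference_seconds(a, b) is the true elapsed time of 2 hours
```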
Literals
A number of SQL literal values may be used in SQL expressions that define created
variables. Double-click on Literals to see all of the literal operators:
Date
SQL Date literal values consist of the keyword DATE followed by a date enclosed in
single quotes with the format YYYY-MM-DD, such as DATE '2003-12-31'. When
dragging a literal Date into a variable, the following tree element is created:
A default date of January 1, 0001 is provided, but can be changed via Properties. Double
click on Date (1/1/0001) or highlight and hit the Properties button:
Either the standard Windows Calendar control can be used to set the desired date by
clicking through the months in the calendar using the < and > buttons, or the desired date
can be typed in directly, specifying the month, day and year one at a time.
Null
The SQL Null literal represents an unknown value and is treated as having the type
Integer. It generates the SQL keyword NULL. When dragging a literal Null into a
variable, the following tree element is created:
Number
SQL numeric literal values of type BYTEINT, SMALLINT, INTEGER, FLOAT and
DECIMAL are supported. Care should be taken not to exceed the capacity of the type
(for example, specifying more than 18 decimal digits). When dragging a literal Number
into a variable, the following tree element is created:
A default value of an integer 0 is provided, but can be changed via Properties. Double
click on Number (0) or highlight and hit the Properties button:
Typing in an integer-format number such as 1 results in a 1 being generated in the SQL,
while a decimal-format number such as 1.0 results in a 1.0000E0 being generated in the
SQL.
String
SQL String literal values consist of zero or more characters enclosed in single quotes
and are treated as being of type character varying, with length equal to the number of
characters enclosed in the quotes. The feature allowing specification of the
character set is not supported. When dragging a literal String into a variable, the
following tree element is created:
No default string is provided - use Properties to change it. Double click on String or
highlight and hit the Properties button:
(Note that the string literal will automatically be enclosed in quotes when SQL is
generated for the literal. If a single quote mark is included in the string literal, it will
automatically be "escaped" by doubling it. If, however, more than one quote mark is
entered, the value is placed in the SQL "as-is", without adding quote marks. This makes
it possible to enter a hexadecimal literal if desired, such as '00'XC.)
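For example, typing the value O'Brien generates the literal 'O''Brien' in the SQL, while
typing '00'XC (which already contains two quote marks) is passed through as-is as a
hexadecimal literal.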
Time
SQL Time literal values consist of the keyword TIME followed by a time enclosed in
single quotes with the format HH:MM:SS. Time zones and fractional seconds are not
supported. When dragging a literal Time into a variable, the following tree element is
created:
A default time midnight is provided, but can be changed via Properties. Double click on
Time (00:00:00) or highlight and hit the Properties button:
You can highlight the hours, minutes and seconds and type in the desired time.
Timestamp
A default timestamp of the current date and time is provided, but can be changed via
Properties. Double click on Timestamp (CurrentDate CurrentTime) or highlight and hit
the Properties button:
Either the standard windows Calendar control can be used to set the desired date by
clicking through the months in the calendar using the < and > buttons, or just typing in
the desired date. You can highlight the hours, minutes and seconds and type in the
desired time.
Target Date
A Target Date, as defined in INPUT-target date (described below), can be used in
variable creation. When no target date has been specified yet, the default value is the
current date. When dragging a literal Target Date into a variable, the following tree
element is created:
Logical
Logical predicates are used to form conditional expressions that evaluate to true or false in a
manner similar to comparison operators. Click on Logical to view a list of supported
operators:
All
The standard All predicate is supported with an expression list but not with a subquery.
It may be used with a comparison operator and with the In / Not In and Like / Not
Like predicates. When dragging an All operator into a variable, the following tree
element is created:
Any number of columns, and/or other expressions can be moved into the Expressions
folder within the tree.
And
The logical operator AND is supported for use in conditional expressions, connecting
either comparison operators or logical predicates. When dragging an And operator into a
variable, the following tree element is created:
Columns, and/or other expressions can be moved into the (empty) branches of the tree.
And All
The And All operator is a custom operator created for the convenience of connecting a
series of conditional expressions together with SQL And operators. When dragging an
And All operator into a variable, the following tree element is created:
Conditional expressions should be moved into the Expressions folder beneath the And
All node so that they will be connected with And operators. For example, if the
expressions "C1 = C2", "C3 = C4" and "C5 = C6" were moved into the Expressions
folder as three Equal nodes, the resulting SQL would be something like the expression
below. (Of course, the column names such as "C1" would be qualified with table aliases,
and the expression by itself is not valid as a variable, though it would be as a dimension.)
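Ignoring the table-alias qualification noted above, the resulting SQL for this example
would be something like:
C1 = C2 AND C3 = C4 AND C5 = C6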
Any
The standard Any predicate is supported with an expression list but not with a subquery.
It may be used with a comparison operator and with the In / Not In and Like / Not Like
predicates. The following are some examples of the SQL generated for these cases.
When dragging an Any operator into a variable, the following tree element is created:
Any number of columns, and/or other expressions can be moved into the Expressions
folder within the tree.
Between
Columns, and/or other expressions can be moved into the (empty) branches of the tree,
the first argument being the expression to the left of the BETWEEN, the second to the
right, and the third to the right of the AND.
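The generated SQL takes the form:
expression1 BETWEEN expression2 AND expression3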
In
One or more literals or a single column or SQL element can be moved into the
Expressions folder within the tree. A column, expression or literal can be moved into the
(empty) branch of the tree.
Is Null
The standard Is Null predicate is supported to test whether or not an expression has a
SQL NULL value, i.e. is undefined in a particular row. The generated SQL takes the
form expression IS NULL. When dragging an Is Null operator into a variable, the
following tree element is created:
Columns, and/or other expressions can be moved into the (empty) branches of the tree.
Is Not Null
The standard Is Not Null predicate is supported to test whether or not an expression has a
SQL NULL value, i.e. is undefined in a particular row. The generated SQL takes the
form expression IS NOT NULL. When dragging an Is Not Null operator into a variable,
the following tree element is created:
Columns, and/or other expressions can be moved into the (empty) branches of the tree.
Like
The standard Like predicate is supported with pattern expressions but not with
subqueries. It generates SQL of the form expression LIKE ANY / ALL pattern. The
percent (%) or underscore (_) characters can be used to allow searching for a pattern. The
percent character represents zero or more characters of any value, whereas underline
represents exactly one. An “escape” character may not be specified. Some examples
include:
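For example (name here is a hypothetical column):
name LIKE 'Sm%' matches any value beginning with Sm
name LIKE '_mith' matches any five-character value ending in mith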
When dragging a Like operator into a variable, the following tree element is created:
Columns, and/or other expressions can be moved into the (empty) branches of the tree,
where the first argument is the expression to the left of the LIKE and the second to the
right.
Not
The logical operator NOT is supported for use in conditional expressions, connecting
either comparison operators or logical predicates. When dragging a Not operator into a
variable, the following tree element is created:
Columns, and/or other expressions can be moved into the (empty) branch of the tree.
Not Between
Columns, and/or other expressions can be moved into the (empty) branches of the tree,
the first argument being the expression to the left of the BETWEEN, the second to the
right, and the third to the right of the AND.
Not In
The standard Not In predicate is supported with a single expression or a list of literal
constants, but not with subqueries. That is, it may be used to test whether or not an
expression equals another expression or is one of a list of values, but not if it is returned
from a query. The Not In predicate generates SQL of the form expression NOT IN
expression or expression NOT IN (literal, … literal). The use of the ALL predicate with
the Not In predicate is optional, provided the single expression form is not used. That is,
NOT IN (…), NOT IN ALL (…) and <> ALL (…) are equivalent. When dragging a Not
In operator into a variable, the following tree element is created:
One or more literals or a single column or SQL element can be moved into the
Expressions folder within the tree. A column, expression or literal can be moved into the
(empty) branch of the tree.
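For example, moving the literals 'N', 'S' and 'E' into the Expressions folder and a
hypothetical column region into the branch would generate something like:
region NOT IN ('N', 'S', 'E')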
Not Like
The standard NOT LIKE predicate is supported with pattern expressions but not with
subqueries. It generates SQL of the form expression NOT LIKE ANY / ALL pattern. The
percent (%) or underscore (_) characters can be used to allow searching for a pattern. The
percent character represents zero or more characters of any value, whereas underline
represents exactly one. An “escape” character may not be specified. Some examples
include:
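For example (name here is a hypothetical column):
name NOT LIKE 'Sm%' excludes any value beginning with Sm
name NOT LIKE '_mith' excludes any five-character value ending in mith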
When dragging a Not Like operator into a variable, the following tree element is created:
Columns, and/or other expressions can be moved into the (empty) branches of the tree,
where the first argument is the expression to the left of the NOT LIKE and the second to
the right.
Or
The logical operator OR is supported for use in conditional expressions, connecting either
comparison operators or logical predicates. When dragging an Or operator into a variable,
the following tree element is created:
Columns, and/or other expressions can be moved into the (empty) branches of the tree,
where the first argument is to the left of the OR, the second to the right.
Or All
The Or All operator is a custom operator created for the convenience of connecting a
series of conditional expressions together with SQL Or operators. When dragging an Or
All operator into a variable, the following tree element is created:
Conditional expressions should be moved into the Expressions folder beneath the Or All
node so that they will be connected with Or operators. For example, if the expressions
"C1 = C2", "C3 = C4" and "C5 = C6" were moved into the Expressions folder as three
Equal nodes, the resulting SQL would be something like the expression below. (Of
course, the column names such as "C1" would be qualified with table aliases, and the
expression by itself is not valid as a variable, though it would be as a dimension.)
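Ignoring the table-alias qualification noted above, the resulting SQL for this example
would be something like:
C1 = C2 OR C3 = C4 OR C5 = C6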
Ordered Analytical Functions
The following Ordered Analytical Functions are available in Variable Creation. Double Click
on Ordered Analytical to see:
All of the standard ordered analytical functions consist of a value expression enclosed in
parentheses and an OVER construct composed of an optional PARTITION BY clause, an
ORDER BY clause (required with all but group style aggregation) and possibly a ROWS
clause (depending on the function), all within parentheses. The PARTITION BY clause is
something like the GROUP BY clause in a simple aggregation, partitioning the rows into
groups over which the function is separately applied. The PARTITION BY clause effectively
causes the function to "start over" for each partitioned group of rows. An example of an
ordered analytical function containing these components is given below.
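For instance, a cumulative sum partitioned by region (using hypothetical column names)
might look like:
SUM(sales) OVER (PARTITION BY region ORDER BY sale_date ROWS UNBOUNDED PRECEDING)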
The traditional aggregate functions AVG, COUNT, MIN, MAX and SUM have ordered
versions that take on different styles depending on the ROWS clause that is used. The
variations available for these functions are Cumulative, Group, Moving and Remaining, as
outlined below. The RANK function and related functions PERCENT_RANK and
QUANTILE do not offer the ROWS options. Note that not all variations are available with
Teradata V2R4.1, as noted in the individual function descriptions that follow.
Note that Ordered Expression is not an Ordered Analytical Function but simply a means of
specifying a sort direction (ascending or descending) on a sort expression. Note also that
Ordered Analytical Functions are not allowed in a Dimension value, Dimensioned variable,
Where clause or Having clause.
Moving Difference
Given one or more columns/expressions, along with a width and sort expression list, this
Ordered Analytical Function derives a new column for each expression giving the
moving difference of the expression when the rows are sorted by the sort expression list.
The moving difference is calculated as the difference between the current value and the
Nth previous value of the expression, where N equals the width. The moving difference is
NULL if there is no Nth-preceding row in the table or group. The generated SQL takes the
form:
expression - SUM(expression)
OVER (ORDER BY sort expression list ROWS BETWEEN width
PRECEDING AND width PRECEDING)
When dragging a Moving Difference function into a variable, the following tree element
is created:
Sort expressions can be built up in the Sort Expressions folder, and if the system is
V2R5.0 or later, Partition Columns can be built up in that folder (with V2R4.1 systems,
Partition Columns are ignored). Columns, and/or other expressions can be moved into the
(empty) branch of the tree. The Width is specified by the Properties panel. Double click
on Moving Difference, or highlight it and click on the Properties button:
Moving Linear Regression
Given a single expression, width, and sort expression, this Ordered Analytical Function
derives a new column giving the moving linear regression extrapolation of the expression
over "width" rows when sorted by the sort expression, using the sort expression as the
independent variable. The current and "width-1" rows after sorting are used to calculate
the simple least squares linear regression. For rows that have fewer than "width-1" rows
preceding them in the table or group, the function is computed using all preceding rows. The
first two rows in the table or group however will have the NULL value.
As an example, moving linear regression predicting y based on x over w rows looks like:
MLINREG(y, w, x)
When dragging a Moving Linear Regression function into a variable, the following tree
element is created:
A single sort expression should be placed in the Sort Expressions folder, and a column
or expression should be moved into the (empty) branch of the tree. The Width is
specified by the Properties panel. Double click on Moving Linear Regression, or
highlight it and click on the Properties button:
Ordered Expression
An Ordered Expression can be used in a Sort Expressions folder with any of the Ordered
Analytical Functions to specify a sort direction, either ascending or descending. The
appropriate SQL keyword, either ASC or DESC, is automatically added to the SQL
generated for the expression placed under the Ordered Expression node in the tree. If an
Ordered Expression is not used, the default sort direction is given, depending on the
Ordered Analytical Function in use. An example of an Ordered Expression in a Sort
Expressions folder is given below.
In order to set the sort direction the user must either highlight the Ordered Expression
node and click on the Properties button, or double-click on the Ordered Expression node
to receive the following Properties panel. Clicking on the OK button will cause the
selected sort order to be used.
Percent Rank
Given a sort expression list, this Ordered Analytical Function derives a new column
which assumes a value between 0 and 1 indicating the rank of the rows as a percentage of
rows when sorted by the sort expression list. The formula used for PERCENT_RANK is
(R – 1) / (N – 1) where R is the rank of the row and N is the number of rows overall or in
the partition.
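For example, in a partition of N = 5 rows, the row ranked R = 3 receives a percent rank of
(3 - 1) / (5 - 1) = 0.5.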
As with the RANK function, when the column or expression has the same value for
multiple rows (say M rows), they are all assigned the same percent rank, while the
following M-1 percent rank values are not assigned. When an optional Partition By
clause is specified, the percent ranks are computed separately over the rows in each
partition. (Note from the formula used for PERCENT_RANK that if there is only one
row to be ranked in the table or partition, division by zero will result and give a numeric
overflow error.) Rows options are not available with the Percent Rank function.
Sort expressions can be built up in the Sort Expressions folder, and Partition Columns
can be built up in that folder. The enhancement to the Percent Rank function to optionally
request that NULL values in any element of the sort expression list cause the row to be
excluded in the ranking process is enabled through the Properties Panel. Double click on
Percent Rank, or highlight it and click on the Properties button:
The default is to Include NULL values in the analysis, but that can be disabled here.
Quantile
Given a sort expression list and the number of quantile partitions, this Ordered Analytical
Function derives a new column giving the quantile partition that each row belongs to
based on the sort expression list and the requested number of quantile partitions. When an
optional Partition By clause is specified, the quantile partitions are computed separately
over the rows in each partition. Rows options are not available with the Quantile
function. Although there is a non-standard Teradata specific command QUANTILE, the
function is implemented in Variable Creation using the standard RANK and COUNT
functions.
Sort expressions can be built up in the Sort Expressions folder, and Partition Columns
can be built up in that folder. The enhancement to the Quantile function to optionally
request that NULL values in any element of the sort expression list cause the row to be
excluded in the ranking process, as well as setting the number of partitions are both
enabled through the Properties Panel. Double click on Quantile, or highlight it and
click on the Properties button:
The default number of Partitions is 0, but can be changed here. Additionally, the default
is to include NULL values in the analysis, but that can be disabled here.
Rank
Given a sort expression list, this Ordered Analytical Function derives a new column
indicating the rank of the rows when sorted by the specified sort expression list. When
the column or expression has the same value for multiple rows (say M rows), they are all
assigned the same rank, while the following M-1 rank values are not assigned. For
example, column values 3,3,3,2,1 could be assigned rank values of 1,1,1,4,5. When an
optional Partition By clause is specified, the ranks are determined separately over the
rows in each partition (the ranking process is reset for each new partition). Rows options
are not available with the Rank function.
When dragging a Rank function into a variable, the
following tree element is created:
Sort expressions can be built up in the Sort Expressions folder, and Partition Columns
can be built up in that folder. The enhancement to the Rank function to optionally request
that NULL values in any element of the sort expression list cause the row to be excluded
in the ranking process is enabled through the Properties Panel. Double click on Rank, or
highlight it and click on the Properties button:
The default is to Include NULL values in the analysis, but that can be disabled here.
Windowed Average
Cumulative, Group, Moving or Remaining Average are supported within the Windowed
Average function. Given a value expression, a width and a sort expression list, this
function derives a new column giving the cumulative, group, moving or remaining
average of the value expression over "width" rows when sorted by the sort expression
list. For rows that have fewer than "width-1" rows preceding them in the table or group, the
function is computed using all preceding rows. When an optional Partition By clause is
specified, the averages are computed separately over the rows in each partition. Any of
the Rows options may be used to determine the type of average to compute. Note that in
Teradata V2R4.1 only the moving average is available with the "ROWS value
PRECEDING" option. When dragging a Windowed Average function into a variable, the
following tree element is created:
Sort expressions can be built up in the Sort Expressions folder, and Partition Columns
can be built up in that folder. The options to perform a Cumulative, Group, Moving or
Remaining Average, and their associated options, are enabled through the Properties Panel.
Double click on Windowed Average, or highlight it and click on the Properties button:
These options are defined below for each of the four types of Windowed Averages:
Windowed Count
Cumulative, Group, Moving and Remaining Count are supported within the Windowed
Count function. This function derives a new column giving the cumulative, group,
moving or remaining count of the number of rows or rows with non-null values of a
value expression, when rows are sorted by a sort expression list. When an optional
Partition By clause is specified, the counts are accumulated only over the rows in each
partition (the start of a partition resets the accumulated count to 0). With Teradata V2R5
and later releases, any of the Rows options may be used to determine the type of count to
compute. In V2R4.1 only the Group option with no Sort Expression and "ROWS
BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING" clause
may be used with the COUNT function. When dragging a Windowed Count function into
a variable, the following tree element is created:
Sort expressions can be built up in the Sort Expressions folder, if the system is V2R5.0 or
later, and Partition Columns can be built up in that folder. By default a windowed
COUNT(*) is done, but another expression can be built up in its place. The options to
perform a Cumulative, Group, Moving or Remaining Count, and their associated options,
are enabled through the Properties Panel. Double click on Windowed Count, or highlight it
and click on the Properties button:
These options are defined below for each of the four types of Windowed Count:
Windowed Maximum
Cumulative, Group, Moving and Remaining Maximum are supported within the
Windowed Maximum function. This function, available only with Teradata V2R5 and
later releases, derives a new column containing the maximum value of a column or
expression. When an optional Partition By clause is specified, the maximum values are
determined over the rows in each partition. Any of the Rows
options may be used with this function. When dragging a Windowed Maximum function
into a variable, the following tree element is created:
Sort expressions can be built up in the Sort Expressions folder, and Partition Columns
can be built up in that folder. The options to perform a Cumulative, Group, Moving or
Remaining Maximum, and their associated options, are enabled through the Properties
Panel. Double click on Windowed Maximum, or highlight it and click on the Properties
button:
These options are defined below for each of the four types of Windowed Maximum:
(Each type offers a choice of Current Row, Value Preceding, or Value Following; if Value
Preceding or Value Following is selected, a Second Value of 0-n may be entered.)
Windowed Minimum
Cumulative, Group, Moving and Remaining Minimum are supported within the
Windowed Minimum function. This function, available only with Teradata V2R5 and
later releases, derives a new column containing the minimum value of a column or
expression. When an optional Partition By clause is specified, the minimum values are
determined over the rows in each partition. Any of the Rows
options may be used with this function. When dragging a Windowed Minimum function
into a variable, the following tree element is created:
Sort expressions can be built up in the Sort Expressions folder, and Partition Columns
can be built up in that folder. The options to perform a Cumulative, Group, Moving or
Remaining Minimum, and their associated options, are enabled through the Properties
Panel. Double click on Windowed Minimum, or highlight it and click on the Properties
button:
These options are defined below for each of the four types of Windowed Minimum:
(Each type offers a choice of Value Preceding or Value Following; if either is selected, a
First Value of 0-n may be entered.)
Windowed Sum
Cumulative, Group, Moving and Remaining Sum are supported within the Windowed
Sum function. This function derives a new column for a value expression giving the
cumulative, group, moving or remaining sum of the value expression when sorted by a
sort expression list. When an optional Partition By clause is specified, the sums are
accumulated only over the rows in each partition (the start of a partition resets the
accumulated sum to 0). Any of the Rows options may be used with Teradata V2R5 to
determine the type of sum to compute. With Teradata V2R4.1, only Cumulative--Rows
Unbounded Preceding, Group--Between Rows Unbounded Preceding and Unbounded
Following, and Moving--Rows Value Preceding are supported. When dragging a
Windowed Sum function into a variable, the following tree element is created:
Sort expressions can be built up in the Sort Expressions folder, and Partition Columns
can be built up in that folder. The options to perform a Cumulative, Group, Moving or
Remaining Sum, and their associated options, are enabled through the Properties Panel.
Double click on Windowed Sum, or highlight it and click on the Properties button:
These options are defined below for each of the four types of Windowed Sum:
String Functions
The following standard string functions are available in Variable Creation. Double Click on
String to see:
Character Length
The standard character length function is supported for determining the length of variable
character data. (When used with fixed length character data, the defined column length is
always returned.) When dragging a Character Length operator into a variable, the
following tree element is created:
A column and/or expression for which to get the character length can be moved into the
(empty) branch of the tree.
When used in conjunction with the Trim function, the Character Length function can
also be used to determine the length of fixed character length data by first trimming pad
characters, as in the following.
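For example (fixed_col here is a hypothetical fixed length character column):
CHARACTER_LENGTH(TRIM(TRAILING FROM fixed_col))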
Concatenate
The standard concatenate operator is supported for joining two character expressions
together, generating the SQL expression1 || expression2. Numeric or date expressions are
converted to characters before concatenating. The resulting type, one of the character data
types, depends on the type of the expressions, as described in the Teradata
documentation. When dragging a Concatenate operator into a variable, the following tree
element is created:
Columns, and/or other expressions can be moved into the (empty) branches of the tree,
where the first argument is to the left of the concatenation operator, the second to the
right.
Lower
The standard lower case function is supported for converting all characters in an
expression to lower case. It is valid only if the expression evaluates to a character data
type with the LATIN character set. The SQL generated is LOWER(expression) and the
type returned is that of the expression. When dragging a Lower operator into a variable,
the following tree element is created:
Columns, and/or other expressions can be moved into the (empty) branches of the tree.
Position
The standard string position function is supported for determining the position of a
substring within a string. The SQL generated is POSITION(expression1 IN expression2)
where expression1 is the substring and expression2 is the string. The two string
expressions must both evaluate to a character, numeric or date type. Numeric or date
expressions are converted to characters before evaluating. The type returned is integer.
The position returned is the logical position, not the byte position. The first position in a
string is treated as 1 and 0 is returned when the substring is not in the string. When
dragging a Position operator into a variable, the following tree element is created:
Columns, and/or other expressions can be moved into the (empty) branches of the tree
where the first argument is expression1 as indicated above, and the second expression2.
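For example, POSITION('b' IN 'abc') returns 2, while POSITION('z' IN 'abc') returns 0.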
Substring
The standard substring function is supported for extracting a portion of a string based on
a position value and optional length. The SQL generated is SUBSTRING(expression
FROM position FOR length). The expression to take a substring from may be of a
character, numeric or date type, with a numeric or date expression being automatically
converted to a character expression before taking the substring. The first position in the
string is 1, and if length is not specified it means "until the end of the string". The type
returned is VARCHAR. When dragging a Substring operator into a variable, the
following tree element is created:
Columns, and/or other expressions can be moved into the (empty) branch of the tree. The
starting position and length of the substring are specified in the Properties panel. Double
click on Substring, or highlight it and click on the Properties button:
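For example, a starting position of 1 and a length of 4 applied to a hypothetical column
name would generate:
SUBSTRING(name FROM 1 FOR 4)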
Trim
The semi-standard trim function is supported for removing leading and/or trailing
characters or bytes matching pad characters or a specified character from a character
string. (The ability to specify a character set for the expression is however not supported.)
The SQL generated may take one of these forms:
TRIM(expression)
TRIM(LEADING/TRAILING/BOTH FROM expression)
The expression to trim may be of a character, numeric, date or byte type, with a numeric
or date expression being automatically converted to a character expression before
trimming. The type returned is VARCHAR (or VARBYTE for byte data). When
dragging a Trim operator into a variable, the following tree element is created:
Columns, and/or other expressions can be moved into the (empty) branch of the tree. The
value to trim and the type of trimming are specified in the Properties panel. Double click
on Trim, or highlight it and click on the Properties button:
Valid Trim Styles are (Default), Leading, Trailing or Both. If (Default) is specified, both
leading and trailing pad characters (or null bytes for byte type data) are trimmed. Any
type of character can be specified to be trimmed in Value to Trim.
(Note that the value to trim will automatically be enclosed in quotes when SQL is
generated for the value. If a single quote mark is included in the value, it will
automatically be "escaped" by doubling it. If, however, more than one quote mark is
entered, the value is placed in the SQL "as-is", without adding quote marks. This makes
it possible to enter a hexadecimal literal if desired, such as '00'XC.)
Upper
The standard upper case function is supported for converting all characters in an
expression to upper case. It is valid only if the expression evaluates to a character data
type. The SQL generated is UPPER(expression) and the type returned is that of the
expression. When dragging an Upper operator into a variable, the following tree element
is created:
Columns, and/or other expressions can be moved into the (empty) branch of the tree.
Trigonometric
The following trigonometric functions are available in Variable Creation. Double click on
Trigonometric to display:
Arccosine
Columns, and/or other expressions can be moved into the (empty) branch of the tree.
There are no special properties for the Arccosine operator.
Arcsine
Columns, and/or other expressions can be moved into the (empty) branch of the tree.
There are no special properties for the Arcsine operator.
Arctangent
Columns, and/or other expressions can be moved into the (empty) branch of the tree.
There are no special properties for the Arctangent operator.
Arctangent XY
Columns, and/or other expressions can be moved into the (empty) branch of the tree.
There are no special properties for the Arctangent XY operator.
Cosine
Columns, and/or other expressions can be moved into the (empty) branch of the tree.
There are no special properties for the Cosine operator.
Hyperbolic Arccosine
Columns, and/or other expressions can be moved into the (empty) branch of the tree.
There are no special properties for the Hyperbolic Arccosine operator.
Hyperbolic Arcsine
Columns, and/or other expressions can be moved into the (empty) branch of the tree.
There are no special properties for the Hyperbolic Arcsine operator.
Hyperbolic Arctangent
Columns, and/or other expressions can be moved into the (empty) branch of the tree.
There are no special properties for the Hyperbolic Arctangent operator.
Hyperbolic Cosine
Columns, and/or other expressions can be moved into the (empty) branch of the tree.
There are no special properties for the Hyperbolic Cosine operator.
Hyperbolic Sine
Columns, and/or other expressions can be moved into the (empty) branch of the tree.
There are no special properties for the Hyperbolic Sine operator.
Hyperbolic Tangent
Columns, and/or other expressions can be moved into the (empty) branch of the tree.
There are no special properties for the Hyperbolic Tangent operator.
Sine
Columns, and/or other expressions can be moved into the (empty) branch of the tree.
There are no special properties for the Sine operator.
Tangent
Columns, and/or other expressions can be moved into the (empty) branch of the tree.
There are no special properties for the Tangent operator.
Other
Several Teradata functions/operators do not fall into any of the preceding categories. The
Other category holds the following functions and operators. Double click on Other to view
those operators/functions:
Asterisk
The SQL Asterisk character (*) may be specified by the user as the argument to a Count
aggregate or Windowed Count ordered analytical function. It represents the fact that all
rows should be counted, not just those with non-null values in a particular column. When
dragging an Asterisk operator into a variable, the following tree element is created:
The SQL Asterisk character (*) is valid within a COUNT aggregate and windowed
aggregate function. There are no special properties for the Asterisk operator.
Bytes
The non-standard Bytes function is supported for determining the length of variable byte
data. (When used with fixed length byte data, the defined column length is always
returned.) When dragging a Bytes operator into a variable, the following tree element is
created:
A byte column and/or expression for which to get the length can be moved into the
(empty) branch of the tree.
When used in conjunction with the Trim function, the Bytes function can also be used to
determine the length of fixed length byte data by first trimming null-byte characters, as in
the following.
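A sketch of this technique follows, assuming a hypothetical fixed-length byte column named photo_sig in a hypothetical source table; in Teradata, Trim removes leading and trailing binary zero bytes from byte data:

```sql
-- Length of fixed-length byte data after trimming binary zeros
-- (column and table names are illustrative)
SELECT BYTES(TRIM(photo_sig)) AS trimmed_length
FROM source_table;
```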
Cast Function
The standard Cast function is supported, generating SQL of the form CAST (expression
AS data type). The following data types are supported:
BYTEINT
SMALLINT
INTEGER
DECIMAL(m, n)
FLOAT
CHAR(n)
VARCHAR(n)
DATE
TIME(n)
TIMESTAMP(n)
Note that character set and case-specific options may not be specified with CHAR and
VARCHAR types. When dragging a Cast operator into a variable, the following tree element
is created:
Columns, and/or other expressions can be moved into the (empty) branch of the tree. The
data types to cast to are specified in the Properties panel. Double click on Cast, or
highlight it and click on the Properties button:
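For example, a Cast element with a column in its branch and DECIMAL(10, 2) selected in the Properties panel would generate SQL of roughly this form (the column name is illustrative):

```sql
CAST(income AS DECIMAL(10,2))
```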
F(x)
An arithmetic formula of one argument ‘x’ may be entered using the F(x) SQL element.
This element will result in the appropriate SQL for the formula being generated after
replacing the argument ‘x’ in the formula with the SQL for the empty branch of the tree.
A Column or other expression can be moved into the (empty) branch of the tree
representing the argument ‘x’. The formula to generate SQL for is specified in the
Properties panel. Double click on F(x) or highlight it and click on the Properties button:
In the example above the formula “x (x – 1) / 2” is entered. Note that a multiply operator
‘*’ is implied between the first ‘x’ and the left parenthesis ‘(‘.
The following rules apply to arithmetic formulas entered in the formula SQL elements.
• Numbers begin with a digit (‘0’ to ‘9’) and may be in integer, decimal or
scientific formats according to client locale settings.
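Continuing the example above, if a hypothetical column named income were placed in the empty branch for the argument ‘x’, the formula “x (x – 1) / 2” would generate SQL resembling:

```sql
-- 'x' replaced by the SQL for the branch; the implied '*' is made explicit
(income * (income - 1) / 2)
```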
F(x,y)
An arithmetic formula of two arguments, ‘x’ and ‘y’, may be entered using the F(x,y)
SQL element. This element will result in the appropriate SQL for the formula being
generated after replacing the arguments ‘x’ and ‘y’ in the formula with the SQL for the
empty branches of the tree.
Columns or other expressions can be moved into the (empty) branches of the tree
representing the arguments ‘x’ and ‘y’. The formula to generate SQL for is specified in
the Properties panel. Double click on F(x,y) or highlight it and click on the Properties
button.
F(x,y,z)
An arithmetic formula of three arguments, ‘x’, ‘y’ and ‘z’, may be entered using the
F(x,y,z) SQL element. This element will result in the appropriate SQL for the formula
being generated after replacing the arguments ‘x’, ‘y’ and ‘z’ in the formula with the
SQL for the empty branches of the tree.
Columns or other expressions can be moved into the (empty) branches of the tree
representing the arguments ‘x’, ‘y’ and ‘z’. The formula to generate SQL for is specified
in the Properties panel. Double click on F(x,y,z) or highlight it and click on the Properties
button.
Free-Form SQL
SQL text may be directly entered for an entire expression or into an element of an
expression as a free-format text string. This allows the use of constructs that may not
otherwise be supported in an expression (for example, a subquery in a where clause). Of
course, in using this feature, care should be taken to create a valid expression, since
validation is not performed on the SQL within the free-format text string. When dragging
a Free-Form SQL operator into a variable, the following tree element is created:
Double click on Free-Form SQL, or highlight it and click on the Properties button:
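As an example of the subquery case mentioned above, text such as the following (table and column names are illustrative) could be entered as the Free-Form SQL string within a WHERE clause expression; no validation is performed on it:

```sql
-- A subquery entered as free-format text within a WHERE clause
cust_id IN (SELECT cust_id FROM twm_credit_acct WHERE acct_type = 'CC')
```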
Variable Reference
A variable defined in a Variable Creation analysis may reference another variable defined
in the same analysis, provided the referenced variable does not contain dimensions. It is
also not possible to reference a variable that results from having a dimension applied to a
variable. Referencing a variable can be particularly useful when the referenced variable is
used merely as an intermediate calculation. The SQL generated consists simply of the
name assigned to the referenced variable.
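For example, if a variable named total_amt were defined as the sum of two columns, a second variable referencing it would generate only the name (column and variable names below are illustrative; Teradata permits referencing a named expression within the same SELECT):

```sql
SELECT interest_amt + principal_amt AS total_amt,
       total_amt / 2 AS half_amt  -- the Variable Reference generates just "total_amt"
FROM twm_credit_tran;
```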
(When referencing a variable with the same name as an input column, a runtime error
will occur if a column with this name occurs in more than one table being accessed
("Column '<name>' is ambiguous"). If aggregation is being performed in another
variable the error "Selected non-aggregate values must be part of the associated group"
may occur. In these cases it is recommended to rename the referenced variable.)
When dragging a Variable Reference operator into a variable, the following tree element
is created:
The variable to reference is specified in the Properties panel. Double click on Variable
Reference, or highlight it and click on the Properties button:
Parentheses
In some cases, you may wish to explicitly request that an expression be enclosed within beginning ‘(‘ and ending ‘)’ parentheses. The Variable Creation analysis attempts to provide the correct nesting of parentheses on its own, so this is offered for specialized cases.
Using the explicit Parentheses function results in an expression being parenthesized, as
in: (expression). When dragging a Parentheses into a variable, the following tree element
is created:
Columns, and/or other expressions can be moved over the (empty) branch of the tree.
On the Variable Creation dialog, click on INPUT and then click on variables on the upper
tabs. Click on Dimensions on the large tab in the center of the panel.
The same right-click menu options are offered for the Columns selector on the left side of the
input screen as are offered for other input screens (refer to the Analysis Input Screen topic
in Using Teradata Warehouse Miner). Also, the following right-click options are available
within the Dimensions panel.
the new table or analysis must contain all the columns or an error is given and no
changes are made.
Apply Dimensions to Variables
This option jumps to the upper dimensions tab so that dimensions can be applied to
variables.
Selection Options
Input Source
Select Table to input from a table or view, or select Analysis to select directly from
the output of a qualifying analysis in the same project. (Selecting Analysis will
cause the referenced analysis to be executed before this analysis whenever this
analysis is run. It will also cause the referenced analysis to create a volatile table if
the Output option of the referenced analysis is Select.)
Databases
All databases which are available for the Variable Creation analysis.
Tables
All tables within the Source Database which are available for the Variable Creation
analysis.
Columns
All columns within the selected table which are available for the Variable Creation
analysis.
The Dimension Values to be created are specified one at a time as almost any type of SQL
expression. One way to create a new dimension value is to click on the New button,
producing the following within the Dimensions tab:
Another way to create one or more new dimension values is to drag and drop one or more
columns from the Columns panel to the empty space at the bottom of the Dimensions panel
(multiple columns may be dragged and dropped at the same time).
One alternative to dragging and dropping a column is to use the right arrow selection button
to move over one column at a time as a new dimension value. Another alternative is to
double-click on the column. If the right arrow button is clicked repeatedly, or the column is
double-clicked repeatedly, a range of columns may be used to create new dimension values,
since the selected column increments each time the arrow is clicked. (It should be noted that
when a column or column value is selected, the right arrow selection button will only be
highlighted if a SQL Element is not selected. This can be ensured if the right-click option to
Collapse All Nodes is utilized in the SQL Element view.)
Whether dragging and dropping, clicking on the right arrow button or double-clicking, a new
dimension value based on a column looks something like the following (after expanding the
node).
Still another way to create a new dimension value is to drag and drop a single SQL element
from the SQL Elements panel to the empty space at the bottom of the Dimensions panel, or to
drag and drop one or more column values displayed by selecting the Values button. In the
case of column values, a dimension value containing a single SQL Numeric Literal, String
Literal or Date Literal is created as appropriate for each column value. (This technique saves
having to edit the properties of a numeric, string or date literal to set the desired value.)
As with creating dimension values from selected columns, use of the right arrow selection
button or double-clicking the desired SQL element or column value provides an alternative to
dragging and dropping an element or value. Note, however, that repeated selection of a SQL
element does not advance the selected element, so the result is multiple dimension values
containing the same SQL element. (Note also that when a SQL element is selected, the right
arrow selection button will only be highlighted if neither a column nor a column value is
selected in its respective view.)
When a SQL element is placed on top of another element on the Dimensions panel, whether
by dragging and dropping it, selecting it with the right arrow or by double-clicking it, the new
element is typically inserted into the expression tree at that point. The element replaced is
then typically moved to an empty operand of the new SQL element.
Whether dragging and dropping, clicking on the right arrow button or double-clicking, a new
dimension value based on a SQL element looks something like the following example
involving the Equals element:
It is possible to create a copy of a dimension value by holding down the Control key on the
keyboard while dragging the dimension value to another location in the Dimensions panel.
The copy can be placed ahead of another dimension value by dropping it on that dimension
value, or at the end of the list of dimension values by dropping it on the empty space at the
bottom of the Dimensions panel. It is also possible to copy a dimension value in the same
manner from another analysis by viewing the other analysis at the same time and dragging the
dimension value from one analysis to the other.
If the Control key is not held down while performing the copy operation just described within
the same analysis, the dimension value is moved from one place to the other, i.e. deleted from
its old location and copied to the new one. There are two exceptions to this. First, this is not
the case when copying a dimension from one analysis to another, in which case a copy
operation is always performed, with or without holding down the Control key. The second
exception is when moving one child node on top of another child node of the same parent in
the expression tree that defines a dimension. In this case, the two nodes or sub-expressions
are switched. (For example, if income and age are added together and age is moved on top of
income, the result is to add age and income, reversing the operands.)
Replicating a Dimension
Dimension Tool-Tip
Information about a dimension value may be viewed by holding the mouse pointer over it.
All dimension values can be deleted from the analysis by selecting the double-back-arrow
button in the center of the Variable Creation window. When this function is requested, one or
more warnings will be given. The first warning indicates how many dimension values are
about to be deleted. The second possible warning is given if the number of dimension values
being deleted exceeds 100, the maximum number of operations that can be undone or redone
using the Undo or Redo buttons. (If this warning is given and the Undo button is then
selected, only the first 100 dimension values will be restored. These are actually the last 100
deleted, since they are deleted in reverse order.) A third possible warning is given if any of
the dimension values about to be deleted has been applied to a variable on the dimensions
screen. If the choice to continue is made, all associations between variables and dimensions
being deleted will be removed. (Note that this part of the operation cannot be "undone"; it is
unaffected by the Undo button.)
Buttons
Wizard Button
When the Dimensions panel is selected, the Wizard button can be used to generate
dimension values, When Conditions for Searched Case statements or conditional
expressions for And All or Or All statements. To generate dimension values, highlight
any dimension value or ensure that no value is highlighted when the Wizard button is
selected. Otherwise, highlight the desired Case Conditions folder under a Case -
Searched node or the Expressions folder under an And All or Or All node and select the
Wizard button.
The following dialog is given when generating dimension values. (Instructions at the top
change and a subset of fields is shown when not generating dimension values.)
Dimension Prefix
When a comparison operator such as Equal is selected in the Operator field, the
names of the resulting dimension values consist of the prefix followed by underscore
and the selected value. Otherwise the dimension value name is the prefix followed
by a number.
Description
When a comparison operator such as Equal is selected in the Operator field, the
description of the resulting dimension values consist of the description specified here
followed by the operator and selected value. Otherwise the description is the
description entered here.
Operator
Select a comparison operator such as Equals or select Between, Not Between, In, Not
In, Is Null or Is Not Null as the operator to use. If Between or Not Between is
selected, a dimension value or condition is generated for each pair of requested
values. If In or Not In is selected, the Wizard will generate a single dimension value
or condition based on all requested values when 'OK' or 'Apply' is clicked. If Is Null
or Is Not Null is selected, the Wizard will generate a single dimension value or
condition based on no values. Otherwise, if a comparison operator such as Equal is
selected, the Wizard will generate a dimension value or condition for each requested
value.
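The conditions the Wizard generates for each operator choice might look like the following sketches (column names and values are illustrative):

```sql
acct_type = 'CC'                 -- comparison operator: one condition per value
income BETWEEN 0 AND 25000       -- Between: one condition per pair of values
acct_type IN ('CC', 'CK', 'SV')  -- In: a single condition from all values
acct_type IS NULL                -- Is Null: a single condition based on no values
```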
Else Value
Select either Else Null or Else Zero to indicate the value to use when the condition is
not met.
Values
This tab accepts values displayed by selecting the Values button for input
columns on the left side of the input screen. Values can be drag-dropped onto
this panel or selected with the right-arrow button. They can be numeric, string or
date type values.
Note that when values are displayed on the left side of the input screen, the
ellipsis button (the one displaying ‘…’) may be used to Select All Values.
Range
This tab can be used to generate a range of integer or decimal numeric values
based on a From, To and By field. If desired, the values can be generated in
descending order by making the From value greater than the To value, so that the
By value should always be positive. If the By field is not specified, an
incremental value of 1 is assumed. (Note that a value displayed with the Values
button may be drag-dropped into this field. Note also that the escape key will
revert to the last value entered in this field.)
When the Between or Not Between operator has been specified, the Range fields
behave somewhat differently and may be used only to specify a single pair of
values using the From and To field, with the From field validated to be less than
or equal to the To field. The By field may not be specified when the Between or
Not Between operator has been specified.
List
A list of numeric, string or date type values can be entered here, separated by
commas (actually, by the standard list separator for the current locale settings).
(Note that a value displayed with the Values button may be drag-dropped into
this field. Note also that the escape key will revert to the last value entered in
this field.)
Clear All
This button will clear all of the fields of this dialog.
OK
This button will generate the requested dimension values or conditions and return to
the Dimensions panel.
Cancel
This button returns to the Dimensions panel without generating any elements.
Apply
This button will generate the requested dimension values or conditions and remain on
this panel. A status message is displayed just above this button reporting on the
number of generated conditions.
Combine Button
When the Dimensions panel is selected and dimensions are defined, the Combine button
can be used to generate combined dimension values based on existing dimension values.
When dimensions are combined their conditions are joined with either an SQL ‘AND’ or
‘OR’ operator.
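For example, combining two dimension values with AND might yield a condition of the following form (column names and values are illustrative):

```sql
(acct_type = 'CC') AND (marital_status = 1)
```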
Dimension Values:
These are the dimension values from the Dimensions panel plus any dimensions
already combined using the ‘Apply’ button (thus becoming candidates for re-
combining). (Information about a dimension value may be viewed by holding the
mouse pointer over it.)
Dimensions to Combine:
Using the upper and lower sets of right and left arrow buttons, dimensions may be
selected or de-selected for combining. The AND/OR radio buttons may be selected
to determine the method of combining the conditions represented by the dimension
values. The double left arrow buttons to the right of these panels move combined
dimensions back into the panels in preparation for re-combining.
Combined Dimensions:
The single right and left arrow buttons next to this panel cause the dimensions to be
combined and added to the combined dimensions list, or removed from the list,
respectively. (If the name of any combined dimension is too long, a warning
message is given in the lower left corner of the dialog.) The double left arrow
buttons to the left of this panel move combined dimensions back into the
“Dimensions to Combine” panels in preparation for re-combining. (Thus it is
possible to build up combined dimensions without making dimension values out of
the intermediate results.)
Clear All
This button will clear all of the fields of this dialog except the Dimension Values in
the leftmost panel.
OK
This button will generate the dimensions defined in the Combined Dimensions panel
and return to the Dimensions panel.
Cancel
This button returns to the Dimensions panel without generating any elements.
Apply
This button will generate the dimensions defined in the Combined Dimensions panel
and remain on this panel. A status message is displayed in the lower left corner of
the dialog reporting on the number of generated combined dimensions.
Delete Button
The Delete button can be used to delete any node within the Dimensions tree. If
applicable, the tree will roll up children, but in some cases, a delete may remove all
children.
SQL Button
The SQL button can be used to dynamically display the SQL for any node within the
Dimensions tree. If the resulting display is not closed, the expression changes as you
click on the different levels of the tree comprising a dimension value. An option is
provided in the display to Qualify column names, that is to precede each column name in
the display with its database and table name.
Properties Button
A number of properties are available when defining a dimension value to be created, as
outlined below. Click the Properties button when the dimension value is highlighted, or
double-click on the dimension value to bring up the Properties dialog:
Name:
(Tip: Dimension values can be renamed by single left-clicking on the name, which
produces a box around the name, as in Windows Explorer)
Else Condition:
Dimension values are applied to a variable via a CASE construct. By default, the
ELSE condition within the CASE construct is NULL. Here, you can specify a 0 be
used instead.
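A sketch of the resulting CASE construct, with illustrative column names, shows the effect of this setting:

```sql
CASE WHEN acct_type = 'CC' THEN tran_amt ELSE NULL END  -- default Else condition
CASE WHEN acct_type = 'CC' THEN tran_amt ELSE 0 END     -- with Else Zero specified
```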
Description:
An optional description may be specified for each dimension value. (Note that a
default description is generated automatically by the Wizard if its Description field
contains a value, and also by the Combine Dimensions dialog based on individual
descriptions or dimension names.)
Undo Button
The Undo button can be used to undo changes made to the Dimensions panel. Note that
if a number of dimension values are added at one time, each addition requires a separate
undo request to reverse. Up to 100 undo requests can be processed.
(Note also that if a change to a dimension value is undone, and that dimension value is
currently applied to a variable on the dimensions panel, the applied dimension will not
change as a result of the Undo operation.)
Redo Button
The Redo button can be used to reinstate a change previously undone with the Undo
button.
SQL Elements
The same SQL Elements are supported when creating dimension values as when creating
variables, with the following exceptions:
Aggregations
Aggregations cannot be used for creating dimension values.
The following right-click menu options are offered for the Available Variables panel.
Available Analyses
Any Variable Creation analysis in any currently loaded project may be selected to
make its dimension values available for selection. Initially, the current analysis is
selected.
Available Dimensions
The available dimensions are the dimension values defined on the Dimensions tab of
the Input – Variables panel in the selected Variable Creation analysis. (Information
about a dimension value may be viewed by holding the mouse pointer over it.)
(Note that if a dimension value comes from an analysis in another project, and if it
contains a reference to another analysis, it may not be applied to a variable in this
analysis, even if it is displayed here. If it is applied to an available variable an error
message will be given.)
Available Variables
The available variables are the variables defined on the Variables tab of the Input –
Variables panel. As dimension values are moved over using the right-arrow and left-
arrow buttons, or dragged and dropped from Available Dimensions, they are shown
below the variable. The resulting output column name will be the dimension name
followed by an underscore and the variable name. The description of the resulting
variable will be the description of the variable followed by connecting characters and
the description of the dimension. (If either the original variable or the dimension
value does not have a description, its name is used when forming the description of
the resulting variable.)
Variables that are referenced by another variable (using a Variable Reference SQL
element) may not have dimension values applied to them.
Anchor Table:
Pull-down with a list of all tables used to create variables, dimensions and/or
specified in a WHERE, QUALIFY or HAVING clause. Select the table that contains
all of the key values to be included in the final data set. Physically, this can be a table
or a view residing in Teradata.
The columns within the Anchor Table that uniquely identify rows in the anchor table
(otherwise unpredictable results may occur when joining this table with others). By
default, the primary index columns of the selected Anchor Table are initially selected.
For a view, these must be selected manually. (Note that if the Anchor Table is the
standard CALENDAR view in the SYS_CALENDAR database, the calendar_date
column is used by default.)
Join Paths:
A list of all Join Paths, connecting the anchor table to each other table referenced in
the analysis (i.e. in a Variable, Dimension or expert clause) is given here. By right-
clicking on a Join Path the join style can be set to Left Outer Join, Inner Join, Right
Outer Join, Full Outer Join or Cross Join.
If a Cross Join is selected, it results in a join path without join steps. Validation is
performed, including a count of the rows in the table to be joined.
Join Steps:
A list of the Join Steps comprising the Join Path currently selected above is given
here. Each Join Step consists of two columns connected by an operator, which
defaults to the equals operator. By right-clicking on a Join Step its operator can be
set to equals (=), not equals (<>), greater than (>), greater than or equals (>=), less
than (<) or less than or equals (<=). The join steps are connected by logical AND
operators in the generated SQL.
Note that a Join Path of style Cross Join does not contain Join Steps.
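A Join Path with two Join Steps using the default equals operator might generate SQL of roughly this form (table and column names are illustrative):

```sql
FROM twm_customer c
LEFT OUTER JOIN twm_accounts a
  ON c.cust_id = a.cust_id
 AND c.region  = a.region  -- join steps connected by logical AND
```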
Load
To load join paths from other Variable Creation analyses in a loaded project, click on
the Load button. This causes each Variable Creation analysis in a loaded project to
be searched for missing join paths. (Missing join paths are those that have no Join
Steps, with the exception of those of style Cross Join which cannot have Join Steps.)
The first join path encountered, if any, for each missing join path is used. When the
load operation is complete, an informational message is displayed at the bottom of
the form summarizing the results of the search.
(Note that if a join path is missing when an analysis is executed, the Load operation
is performed automatically to try to correct the error.)
Wizard…
To set the join paths using the following dialog screens, click on the Wizard…
button:
From:
Initially, this is the Anchor Table, along with a list of all columns within the
Anchor Table. If more than one table is required in the Join Path, these are
specified through subsequent clicks of the Add button. Highlight the column to
join to that specified in To below.
To:
Initially, this is the target or right-side table in the Join Path, along with a list of
all columns within that table. If the Anchor Table is not simply joined directly to
this table, it can be changed via pull-down. If more than one table is required in
the Join Path, these are specified through subsequent clicks of the Add button.
Highlight the column to join to that specified in From above.
Steps:
Clicking on the Add button populates the Steps area. Similarly, highlighting a
Step and clicking on the Remove button removes that particular step. Steps
should be entered such that the first step begins with the anchor table (on the left
side) and the last step ends with the target table for the join path (on the right
side). Additionally, the target or right side tables should be grouped together in
the list of steps and not alternate in value (that is, table1, table1, table2, not
table1, table2, table1).
The operator of a Join Step may be changed by right-clicking on the Join Step.
Add:
Clicking on the Add button adds a Join Step built from the currently selected
columns. The operator is equals (=) by default, but may be changed by right-clicking
on the Join Step.
Remove:
Clicking on the Remove button removes the currently selected Join Step.
Up/Down Arrows:
Clicking on the up or down arrow to the right of the Steps display moves the
currently selected Join Step up or down in the list.
Left/Right Arrows:
Clicking on the left arrow to the right of the From and To table selectors will
move the currently selected To table to the From selector and set the To selector
to the target or right side table for the Join Path. Clicking on the right arrow will
move the currently selected From table to the To selector and set the From
selector to the source or left side table for the Join Path.
Finish:
Clicking on the Finish button accepts all changes and returns to the anchor
panel.
Back:
Clicking on the Back button returns to the previous Join Path.
Next:
Clicking on the Next button proceeds to the next Join Path.
Cancel:
Clicking on the Cancel button discards all changes and returns to the anchor
panel.
Target Date:
If a Target Date was used when creating a variable, dimension, or used in a
WHERE, QUALIFY or HAVING clause, it can be set here. The default value is
the current date, and can be changed by either typing in another date, entering
month, day and year separately or by selecting a date with the standard Windows
calendar control.
Group By Style:
Group by anchor columns
Use of this option causes the anchor columns to be used as the Group By
columns when one or more variables contain an aggregate function. When this is
the case, all variables that don’t already contain an aggregate function are
automatically changed to an aggregate by adding the MIN (minimum) function.
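As a sketch, if cust_id were the anchor column and the gender variable contained no aggregate, the generated SQL might resemble the following (names are illustrative):

```sql
SELECT cust_id,
       MIN(gender)   AS gender,          -- MIN added automatically
       SUM(tran_amt) AS total_tran_amt   -- variable with an aggregate
FROM twm_credit_tran
GROUP BY cust_id;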
This screen provides nearly the same options as those provided by the Variable Creation –
INPUT – Variables screen, as described in the section of the same name. The principal
difference is that instead of Variables or Dimensions there are three fixed expert clauses that
may not be added to or deleted. Therefore, the New and Add buttons are not present. The
Wizard button can be used only to add conditions, not Searched Case statements or
Dimensions.
Aggregations
Aggregations can only be used in the Having Clause.
Where Clause
An SQL WHERE clause is allowed to limit the rows processed from the input table.
Aggregation and ordered analytical (OLAP) functions are not allowed in a WHERE
clause expression. Note that if a subquery is desired it can only be specified using a Free-
Format SQL Text element.
It may be useful to note that if a WHERE clause condition is specified on the "inner"
table of a join (i.e. a table that contributes only matched rows to the results), the join is
logically equivalent to an Inner Join, regardless of whether an Outer type is specified. (In
a Left Outer Join, the left table is the "outer" table and the right table is the "inner" table.)
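The following sketch (with illustrative names) shows why: the WHERE condition on the inner table discards the NULL-extended rows the outer join preserved, so the query behaves like an inner join:

```sql
SELECT c.cust_id, a.acct_type
FROM twm_customer c
LEFT OUTER JOIN twm_accounts a ON c.cust_id = a.cust_id
WHERE a.acct_type = 'CC';  -- removes rows where a.* is NULL
```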
Having Clause
An SQL HAVING clause may be specified with the Variable Creation function if
aggregation is requested in the variable expressions. Ordered analytical (OLAP)
functions are not allowed in a HAVING clause expression.
Qualify Clause
An SQL QUALIFY clause may be specified with the Variable Creation function if
ordered analytical functions are requested in any of the variable expressions. Aggregation
functions are not allowed in a QUALIFY clause expression.
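For example, a QUALIFY clause filtering on a windowed rank might take the following form (column names are illustrative):

```sql
QUALIFY RANK() OVER (PARTITION BY cust_id ORDER BY tran_amt DESC) <= 3
```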
Use the Teradata EXPLAIN feature to display the execution plan for this analysis
Option to generate a SQL EXPLAIN SELECT statement, which returns a Teradata
Execution Plan.
If this option is selected the analysis will only generate SQL, returning it and
terminating immediately.
On this screen, select the columns which comprise the primary index of the output table:
Available Columns
A list of columns which comprise the index of the resultant table if an Output Table
is used.
Primary Index Columns
Select columns by highlighting and then either dragging and dropping into the
Primary Index Columns window, or click on the arrow button to move highlighted
columns into the Primary Index Columns window.
Create the index using the UNIQUE keyword
When selected, a Unique Primary Index will be created on the table. Otherwise a
Primary Index will be created by default.
The results of the completed query are returned in this Data Viewer page. This page has the
properties of the Data page discussed in the Chapter on Using the Teradata Warehouse Miner
Graphical User Interface. With the exception of the Explain Select Result Option, these
results will match the tables described below in the Output Column Definition section,
depending upon the parameters chosen for the analysis.
The generated SQL is returned as text which can be copied, pasted, or printed.
2. Create seven variables by double-clicking on the following columns. (Note that the
variable name will default to the column name.)
• TWM_CUSTOMER.cust_id
• TWM_CUSTOMER.income
• TWM_CUSTOMER.age
• TWM_CUSTOMER.years_with_bank
• TWM_CUSTOMER.nbr_children
• TWM_CUSTOMER.gender
• TWM_CUSTOMER.marital_status
5. Drag an Add (Arithmetic) SQL Element over the Variable, and then drag the
following two columns over the empty arguments:
• TWM_CREDIT_TRAN.interest_amt
• TWM_CREDIT_TRAN.principal_amt
6. Because there may be negative values, drag and drop an Absolute Value (Arithmetic)
SQL Element over both interest_amt and principal_amt:
9. Drag and drop a Number (Literal) 0 into the expressions folder and rename it from
Variable1 to avg_cc_tran_amt to complete the variable:
12. Go to OUTPUT-storage, and select Store the tabular output of this analysis in the
database. Specify that a Table should be created named twm_tutorials_vc1.
For this example, the Variable Creation Analysis generated the following results. Note that the
SQL is not shown for brevity:
Data
12. Go to INPUT-variables-Dimensions and click on the New button three times to create
three dimension values. Drag TWM_ACCOUNTS.acct_type to each of the three
dimension values.
13. Drag and drop an Equals (Comparison) SQL Element on top of each instance of
acct_type in the three dimensions.
14. Drag and drop a String (Literal) SQL Element into the second argument of the
Equals. Specify a string value of CC, CK, SV for each of the three dimensions by
double-clicking on String, and entering the values. Rename each dimension value
CC, CK, and SV accordingly:
15. Change the Properties of the dimensions CC, CK, and SV, modifying the Else
condition from ELSE NULL to ELSE ZERO.
16. Go to INPUT-variables-Dimensions and click on the New button four times to create
four dimension values. Drag TWM_ACCOUNTS.tran_date to each of the four
dimension values.
17. Drag and drop a Quarter of Year (Calendar) SQL Element on top of each instance of
tran_date in the four dimension values.
18. Drag and drop an Equals (Comparison) SQL Element on top of each Quarter of Year
instance in the four dimension values.
19. Drag and drop a Number (Literal) SQL Element into the second argument of the
Equals. Specify a number of 1-4 for each of the four dimension values by double-
clicking on Number, and entering the values. Rename each dimension value Q1-Q4
accordingly:
20. Go to INPUT-dimensions and apply the dimension values to the variables as follows:
• acct – CK, CC, SV
• bal – CK, CC, SV
• nbr_trans – Q1, Q2, Q3, Q4:
22. Specify the Join Paths from TWM_CUSTOMER to each of the following by
selecting a table in the Join Path from Anchor Table To: and clicking on the
Wizard button. Specify the following Join Paths:
23. Go to OUTPUT-storage, and select Store the tabular output of this analysis in the
database. Specify that a Table should be created named twm_tutorials_vc2.
For this example, the Variable Creation Analysis generated the following results. Once again, the
SQL is not shown:
Data
cust_id CK_acct SV_acct CC_acct CK_bal SV_bal CC_bal Q1_nbr Q2_nbr Q3_nbr Q4_nbr
1362480 1.00 1.00 1.00 54.77 196.73 4.08 113 17 17 10
1362481 0.00 0.00 0.00 0.00 0.00 0.00 0 0 0 0
1362484 1.00 1.00 1.00 50.46 374.50 108.74 113 27 20 27
1362485 1.00 0.00 1.00 26.34 0.00 463.16 13 18 50 90
1362486 1.00 1.00 0.00 1656.14 58.12 0.00 12 10 13 15
1362487 1.00 1.00 1.00 707.41 2.38 481.00 17 25 25 36
1362488 1.00 0.00 0.00 122.42 0.00 0.00 26 39 27 7
1362489 1.00 1.00 1.00 79.60 52.69 4.49 56 51 44 5
1362492 1.00 0.00 1.00 443.84 0.00 476.92 3 42 64 21
1362496 0.00 1.00 0.00 0.00 251.06 0.00 3 3 3 3
Variable Transformation
Introduction
One aspect of creating an analytic data set to be used as input to a data mining algorithm is the
transformation of variables into a format useful to the algorithm. In general, transformations that
are reasonably performed as part of SQL expressions have been included in the Variable Creation
function, whereas transformations that require a more elaborate SQL structure are provided in the
Variable Transformation function. Specifically, transformations in the Variable Transformation
function may require calculating global aggregates or more complex measures in derived tables,
or may include a separate null replacement transformation as a preprocessing step using a
preliminary volatile table. Variable Transformation is however limited to operating on a single
input table.
The Variable Transformation function makes it possible to specify at one time any mixture of
transformations for any number of columns in a single input table. The user may also specify that
columns from the input table be retained unchanged, or retained with a different name and/or
type. The result is a new table or view based on the same or transformed columns from the input
table.
In order to use the Variable Transformation analysis, the user selects a single input table and
then, on a column-by-column basis, selects what transformation or action, if any, to perform.
The user may choose any of the offered transformations and/or a simple copy or Retain
operation. That is, any input table column may be included in the output table, as is or with a
different name or type, whether or not it is transformed. By default, the result column name is
the same as the input column name, unless multiple result columns are produced (as with the
design coding transformation). If a specific type is specified, the retained or transformed
column is cast to that type.
Anchor columns are included automatically in the result table, so they should not be included as
retained columns. Note that it is the user’s responsibility to ensure that result column names do
not conflict with each other.
The user may also specify that a null transformation be performed in a preprocessing step prior to
the requested transformation. In this case the null transformation is produced in a volatile table
that is then automatically referenced by the generated SQL, both by the transformation SQL and
by any derived aggregates the transformation may require.
It is possible that the user may specify more transformations than can be performed in a single
analysis. This can happen either because the maximum number of columns allowed by Teradata
is exceeded (256 in V2R4.1 and 2048 in V2R5), or because the generated SQL is simply too large
or complex. If this sort of failure occurs, the user must split up the transformations into multiple
analyses and either add a join step or rely on the Build Data Set analysis to join the output tables
together.
Bin Code
Bin Coding is useful when it is desired to replace a continuous numeric column with a categorical
one. Bin coding produces ordinal values, i.e. numeric categorical values where order is
meaningful. It uses the same techniques used in Histogram analysis, allowing the user to choose
between equal-width bins, equal-width bins with a user specified minimum and maximum range,
bins with a user specified width, evenly distributed bins, or bins with user-specified boundaries as
follows.
If the minimum and maximum are specified, all values less than the minimum are put into “bin
0,” while all values greater than the maximum are put into “bin N+1.” The same is true when the
boundary option is specified.
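The behavior described above can be sketched as follows for equal-width bins with a user-specified minimum and maximum (a hypothetical Python helper, not the product’s actual SQL; exact boundary handling in the generated SQL may differ):

```python
def bin_code(x, minimum, maximum, bins):
    """Equal-width bin coding over [minimum, maximum]; a sketch
    mirroring the described behavior, not the tool's actual SQL."""
    if x < minimum:
        return 0                      # below range -> "bin 0"
    if x > maximum:
        return bins + 1               # above range -> "bin N+1"
    width = (maximum - minimum) / bins
    # values exactly at the maximum fall into the last bin
    return min(int((x - minimum) / width) + 1, bins)

# 4 bins of width 25 over [0, 100]
print([bin_code(v, 0, 100, 4) for v in (-5, 10, 25, 99, 100, 130)])
# -> [0, 1, 2, 4, 4, 5]
```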
Derive
Derive allows you to enter simple expressions based upon columns within a table. For example, if
you know that all values are positive or zero, the Derive Analysis can be used to add one to the
column and take the natural logarithm of it. The Derive expression may be specified in a
structured way as in the Variable Creation function, and may include any functions or operators
supported by the Variable Creation function except a reference to another variable. It may also
include free-formatted SQL text in all or part of an expression, making it possible to use
constructs not supported by the expression builder. Of course, care should be taken in using this
feature to create a valid expression, since validation is not performed on the SQL within the free-
format text string.
Special handling is given to aggregation functions if they appear in the user-defined expression.
Any requested aggregation function is computed over the entire input table (limited of course by
the where clause if specified as an expert option) in one of the global aggregate derived tables
shared by the other transformation functions. The aggregation is then treated as a constant in the
user-defined expression. And although the user-defined expression may include ordered
analytical functions, it may not include an aggregate within an ordered analytical function.
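A minimal sketch of the two points above, in Python rather than generated SQL: the chapter’s LN(x + 1) example, and an aggregate computed once over the whole input and then treated as a constant in the per-row expression (function names are illustrative):

```python
import math

def derive_log1p(x):
    """The chapter's example derive expression: LN(x + 1),
    safe when all values are positive or zero."""
    return math.log(x + 1)

def derive_share_of_total(values):
    """Sketch of aggregate handling: SUM(x) is computed once over the
    whole input (as in a global-aggregate derived table) and then used
    as a constant in the per-row expression x / SUM(x)."""
    total = sum(values)               # global aggregate, computed once
    return [v / total for v in values]

print(derive_log1p(0.0))              # 0.0
print(derive_share_of_total([1, 1, 2]))
```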
Design Code
Design coding is useful when a categorical data element must be re-expressed as one or more
meaningful numeric data elements. Many classes of analytical algorithms from the statistical and
artificial intelligence communities require variables, inputs, or outputs to be numeric and
numerically meaningful. Design coding accomplishes this, roughly speaking, by creating a binary
numeric field for each categorical data value. Design coding is offered in two forms, one known as dummy-coding
and the other as contrast-coding. A “Values” function is provided to select the possible values
from the input table.
In “dummy-coding”, a new column is produced for each listed value, with a value of 0 or 1
depending on whether that value is assumed by the original column. Alternately, given a list of
values to “contrast-code” along with a “reference value”, a new column is produced for each
listed value, with a value of 0 or 1 depending on whether that value is assumed by the original
column, or a value of –1 if that original value is equal to the reference value.
When using “Dummy Coding,” if a column assumes n values, new columns may be created for
all n values (or for only n-1 values, because the nth column will be perfectly correlated with the
first n-1 columns). When using “Contrast Coding,” only n-1 or fewer new columns may be
created from a categorical column with n values.
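Dummy-coding and contrast-coding as described above can be sketched as follows (hypothetical Python helpers, not the product’s generated SQL):

```python
def dummy_code(x, values):
    """One 0/1 column per listed value; sketch of "dummy-coding"."""
    return [1 if x == v else 0 for v in values]

def contrast_code(x, values, reference):
    """One column per listed value: 1 if that value is assumed,
    -1 in every column when the original value equals the
    reference value, else 0; sketch of "contrast-coding"."""
    if x == reference:
        return [-1] * len(values)
    return [1 if x == v else 0 for v in values]

# e.g. marital status codes 1..3, with 4 as the reference value
print(dummy_code(2, [1, 2, 3]))        # [0, 1, 0]
print(contrast_code(4, [1, 2, 3], 4))  # [-1, -1, -1]
```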
Recode
Recoding a categorical data column is most often done to “re-express” existing values of a
column (variable) into some new “coding scheme”. Additionally, it is also done to correct data
quality problems and to focus an analysis on a particular value. It allows for mapping individual
values, NULL values or any number of remaining values (ELSE option) to a new value, a NULL
value or the same value. A “Values” function is provided to select the possible values from the
input table.
Rescale
Rescaling limits the upper and/or lower boundaries of the data in a continuous numeric column
using a linear rescaling function based on maximum and/or minimum data values. It may be
useful with algorithms that require or work better with data within a certain range. Rescale is only
valid on numeric columns, and not columns of type date.
The user may supply new minimum and maximum values (lower, upper) to form new variable
boundaries. If only the lower boundary is supplied, the variable is aligned to this value; or if only
an upper boundary value is specified, the variable is aligned to that value. If a requested column
has a constant value (max and min are the same), then the transformation will fail with an SQL
error.
ƒ(x, l, r) = l + (x − min(x)) · (r − l) / (max(x) − min(x))   (when both the lower bound l and upper bound r are specified)
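A sketch of this rescaling function in Python (an illustration of the formula above, not the generated SQL):

```python
def rescale(x, xmin, xmax, lower, upper):
    """Linear rescaling of x, observed in [xmin, xmax], onto
    [lower, upper]. For a constant column (xmin == xmax) this
    raises, just as the generated SQL fails with an SQL error."""
    return lower + (x - xmin) * (upper - lower) / (xmax - xmin)

# map values observed in [10, 50] onto [0, 1]
print([rescale(v, 10, 50, 0, 1) for v in (10, 30, 50)])  # [0.0, 0.5, 1.0]
```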
Retain
The retain option allows you to copy a column as is, along with any transformed columns, into
the final analytic data set. When using this option, you may include any input table column, as is
or with a different name or type, in the output table, without actually “transforming” it. By
default, the result column name is the same as the input column name. If a specific type is
specified, it results in casting the retained column.
Null Replacement
NULL value replacement is offered as a transformation function. A literal value, the mean,
median, mode or an imputed value joined from another table can be used as the replacement
value. The median value can be requested with or without averaging of two middle values when
there is an even number of values. The replacement value can also be the analytic data set’s target
date value. Literal value replacement is supported for numeric, character and date data types.
Mean value replacement is supported for columns of numeric type or date type, with special
coding required for date type. Median without averaging, mode and imputed value replacement
are valid for any supported type, with distinct SQL generated for computing the median value of
numeric, date and other type columns. Median with averaging is however supported only for
numeric and date type columns.
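A rough Python sketch of the replacement choices above, assuming the substitute value is computed from the non-null values (the `how` names here are illustrative, not the product’s option names; `median_low` roughly mirrors median without averaging):

```python
from statistics import mean, median_low, mode

def replace_nulls(values, how="mean", literal=None):
    """Sketch of null replacement: the substitute is computed from the
    non-null values (or given literally) and filled in for None."""
    present = [v for v in values if v is not None]
    fill = {"mean": lambda: mean(present),
            "median": lambda: median_low(present),
            "mode": lambda: mode(present),
            "literal": lambda: literal}[how]()
    return [fill if v is None else v for v in values]

print(replace_nulls([1.0, None, 3.0], how="mean"))  # [1.0, 2.0, 3.0]
```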
Sigmoid
A Sigmoid transformation provides rescaling of continuous numeric data in a more sophisticated
way than the Rescaling transformation function. In a Sigmoid transformation a numeric column is
transformed using a type of sigmoid or s-shaped function. One of these, called a logit function,
produces a continuously increasing value between 0 and 1. Another, called the modified logit
function, is twice the logit minus 1 and produces a value between –1 and 1. A third, called the
hyperbolic tangent function, also produces a value between –1 and 1. (Note that the logit function
is the same as the function previously called the sigmoid function, and the hyperbolic tangent
function is the same as the math function of the same name.) These non-linear transformations
are generally more useful in data mining than a linear Rescaling transformation.
Logit:
ƒ(x) = 1 / (1 + e^(−x))
Modified logit:
ƒ(x) = 2 · [1 / (1 + e^(−x))] − 1 = (1 − e^(−x)) / (1 + e^(−x))
Hyperbolic tangent:
ƒ(x) = (e^(2x) − 1) / (e^(2x) + 1)
Note that for absolute values of x greater than or equal to 36, the value of the sigmoid function is
effectively 1 for positive arguments or 0 for negative arguments, within about 15 digits of
significance.
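The three functions can be sketched directly from the formulas above (a Python illustration; `math.tanh` is used for the hyperbolic tangent):

```python
import math

def logit(x):
    """Logit transformation: continuously increasing, in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def modified_logit(x):
    """Twice the logit minus 1: values in (-1, 1)."""
    return 2.0 * logit(x) - 1.0

def hyperbolic_tangent(x):
    """tanh(x) = (e^(2x) - 1) / (e^(2x) + 1): values in (-1, 1)."""
    return math.tanh(x)

print(logit(0))              # 0.5
print(modified_logit(0))     # 0.0
# for |x| >= 36 the logit is effectively 1 (or 0) to ~15 digits
print(round(logit(36), 15))  # 1.0
```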
Z Score
Like a Sigmoid transformation, a Z-Score transformation provides rescaling of continuous
numeric data in a more sophisticated way than a Rescaling transformation. In a Z-Score
transformation, a numeric column is transformed into its Z-score based on the mean value and
standard deviation of the data in the column. It transforms each column value into the number of
standard deviations from the mean value of the column. This non-linear transformation is
generally more useful in data mining than a linear Rescaling transformation.
For a value, the number of standard deviations away from the mean is calculated as:
ƒ(x) = ( x − (1/n) Σ xᵢ ) / sqrt( (1/n) Σ xᵢ² − ( (1/n) Σ xᵢ )² )
where each sum Σ runs over i = 1 to n.
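A sketch of the Z-score formula above in Python, writing the population standard deviation as the square root of the mean of squares minus the square of the mean (an illustration, not the generated SQL):

```python
import math

def z_scores(values):
    """Transform each value into its number of (population) standard
    deviations from the column mean, per the formula above."""
    n = len(values)
    m = sum(values) / n
    sd = math.sqrt(sum(v * v for v in values) / n - m * m)
    return [(v - m) / sd for v in values]

# mean 5, population standard deviation 2
print(z_scores([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]))
# -> [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```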
2. In the resulting Add New Analysis dialog box, click on ADS under Categories and then under
Analyses double-click on Variable Transformation:
3. This will bring up the Variable Transformation dialog in which you can define INPUT /
OUTPUT options and initiate any of the Variable Transformation functions, i.e.:
Retain
Bin Code
Derive
Design Code
Null Replacement
Recode
Rescale
Sigmoid
Z Score
Available Databases
All databases which are available for the Variable Transformation analysis.
Available Tables
All tables within the Source Database which are available for the Variable
Transformation analysis.
Available Columns
All columns within the selected table which are available for the Variable
Transformation analysis.
‘Transformations…’ Window
Move column(s) into this window for the Variable Transformation analysis to execute
against. First, highlight the function you wish to use in this window, for example Bin
Code:
Select column(s) by highlighting them in the Available Columns window and then
clicking on the arrow button to move highlighted column(s) into the
‘Transformations…’ window. Columns may also be dragged and dropped into the
appropriate folder.
The same right-click menu options are offered for the Columns selector on the left side of the
input screen as are offered for other input screens (refer to the Analysis Input Screen topic
in Using Teradata Warehouse Miner). Also, the following right-click options are available
within the Transformations window.
Double-Back-Arrow Button
Clicking on the button with two arrows pointing to the left will remove all
transformations from the Transformations window. A prompt is given before removing
the transformations, which are removed only if OK is clicked in response.
Add Button
Clicking on the Add button leads to a dialog from which transformations may be selected
from loaded analyses to add as copies to the current analysis.
Available Analyses
This drop down list contains all of the Variable Transformation analyses currently
loaded in the Project window, including those in other projects.
Available Transformations
These are the transformations in the currently selected analysis, filtered by type if a
specific type is selected in the selector immediately above this one. (Note that a
Derive transformation that references more than one column cannot be added, even if
it appears as an available transformation.) Select one or more transformations to add.
Column To Transform
This drop-down selector contains all of the possible columns in the table being
transformed. By default the column with name matching that being transformed in
the selected transformation to add will be selected. If a column with matching name
does not exist the user must select an appropriate column to transform.
OK/Cancel/Apply
Each time the Apply button is clicked, a copy of the currently selected
transformations is added and a status message is given. The Apply button is then
disabled until another transformation or column to transform is selected.
The dialog can be exited at any time by clicking the OK or Cancel button.
If OK is clicked, the currently selected transformations will be added unless the
Apply button is disabled.
Reorder Button
Clicking on the Reorder button leads to a dialog from which transformations in the
current analysis may be reordered for output purposes.
Move to Top
This option moves all the selected transformations to the top of the list.
Move to Bottom
This option moves all the selected transformations to the bottom of the list.
Restore Initial Order
This option reorders the transformations to match the order when the dialog was
displayed.
Order by Input Columns
Properties Button
The Properties button leads to a dialog from which properties or default properties may
be set, as described in the following sections.
In the ‘Transformations…’ window, click on the column that was added when the transformation
was requested, for example column cust_id under Bin Code:
With column highlighted, click on the Properties button to bring up the Properties dialog:
(Tip: You can also double-click on the column name to bring up the Properties dialog.)
The default properties for each type of transformation are saved along with the analysis so that
they will be available if changes are made to the analysis at a later time.
In the ‘Transformations…’ window, click on the folder associated with the type of transformation
you want to set default properties for:
With the folder highlighted, click on the Properties button to bring up the Properties dialog for
the selected transformation type (Bin Code in the example below):
(Tip: You can also double-click on the folder to bring up the Default Properties dialog.)
Output
For most transformations, the Properties dialog will have an Output tab. (For Retain
transformations, the Properties dialog has no tabs but directly displays Output options.) Clicking
on Output leads to the following display and options:
Output Type:
When this field appears it lets you select output type. The default is Generate
Automatically, but you can also select the following types. (Depending on the type
selected, one or more length fields may also be presented.)
BYTEINT
CHAR
DATE
DECIMAL
FLOAT
INTEGER
SMALLINT
TIME
TIMESTAMP
VARCHAR
Column Attributes:
One or more column attributes can be entered here in a free-form manner to be used
when an output table is created. They are placed as-entered following the column
name in the CREATE TABLE AS statement. This can be particularly useful when
requesting data compression for an output column, which might look like the following:
COMPRESS NULL.
Description:
An optional description may be specified for each transformation.
Null Replacement
For most transformations, the Properties dialog will have a Null Replacement tab. Click on
Null Replacement to display options for replacing null values within the column.
On this screen you can elect to replace null values by clicking the checkbox, and then specifying
what null values are to be replaced with. The choices are:
Imputed Value
You will then need to select a column in the Imputed Column field. Click the down-
arrow beside the Imputed Column field to display available columns. (You may need
to expand tree items to drill down to individual columns.)
Literal Value
The value as specified in the Literal Value field. Literal value replacement is
supported for numeric, character and date data types.
Mean
Average value - Mean value replacement is supported for columns of numeric type or
date type, with special coding provided for date type.
Median
The median value can be requested with averaging of two middle values when there
is an even number of values. Supported only for numeric and date type columns.
Median (No Averaging)
The median value can be requested without averaging of two middle values when
there is an even number of values.
Mode
The most frequently occurring value.
Target Date
Literal Target Date as specified on the INPUT-target date panel.
After setting values/options on the Properties dialog, click on OK to close the Properties dialog.
Then continue specifying INPUT and OUTPUT parameters described further in this chapter.
Properties - Derive
If doing a Derive transformation there will be a Derive tab on the Properties dialog. Click on
Derive to access a variation of the Variable Creation input screen. It has been altered to initially
contain a single variable consisting of the column that the Derive transformation is based upon. It
has also been altered so that the input table cannot be changed and so that a new variable cannot
be entered or the existing one deleted.
Special handling is given to aggregation functions if they appear in the user-defined expression.
Any requested aggregation function is computed over the entire input table (limited of course by
the where clause if specified as an expert option) in one of the global aggregate derived tables
shared by the other transformation functions. The aggregation is then treated as a constant in the
user-defined expression. And although the user-defined expression may include ordered
analytical functions, it may not include an aggregate within an ordered analytical function.
Special handling is also given when specifying the default properties for a Derive transformation
in the Default Properties dialog. A single variable called <default column> is initially provided.
Wherever it appears in the expression created by the user, it will be replaced by the selected
column that was used to define a specific Derive transformation. (If more instances of <default
column> are needed, the initially provided instance can be copied by dragging it with the control
key held down). This makes the default Derive transformation behave like a template for a
custom transformation.
After setting values/options on the Properties dialog, click on OK to close the Properties dialog.
Then continue specifying INPUT and OUTPUT parameters described further in this chapter.
Values to Encode
Value
A list of values for which “dummy-codes” or “contrast-codes” will be generated.
If the Contrast Coding option is selected, the Reference Value must not be
listed. Double-click in the area shown to enter the values.
Column
The desired name of the result of the Design Coding Analysis. A default name
is provided if the values are loaded with the Values… button. The data type
generated is BYTEINT.
Values
Brings up the design code wizard which determines the distinct values of the column
being design coded, and assigns default column names of <value>_<column name>
(for example, 123_Department). These columns can be renamed by highlighting them
and typing over the current name.
Special handling is necessary for the default properties of a Design Code transformation. Since
the column to be transformed is not yet known, column prefixes are associated with specific
values rather than column names. Then, when the default properties are applied to a specific
column, the column name is appended to the default prefixes. For example, if the value 0 is
associated with the prefix "0_", when the default properties are applied to the column "amount", 0
is associated with the column "0_amount".
After setting values/options on the Properties dialog, click on OK to close the Properties dialog.
Then continue specifying INPUT and OUTPUT parameters described further in this chapter.
Properties - Recode
If doing a Recode transformation there will be a Recode tab on the Properties dialog. Click on
Recode to access the following options:
Values to Recode:
Create a list of categorical values to transform from one value to another. Use the Add
(and Remove) buttons as necessary to build a list or use the Values button.
From
List existing values within column to recode. These are the “Old” values to be
replaced by new values below. For example: 0, ELSE, NULL
To
New values to replace the corresponding old values, one for one. For example: N, Y, N.
In this example, a column containing 0 and other values is changed into a Y/N
column by recoding 0 to N, all other values to Y, and NULL (unknown) values to N.
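The recode mapping in this example can be sketched as follows (a hypothetical Python helper, not the product’s generated SQL):

```python
def recode(x, mapping, null_value=None, else_value=None):
    """Sketch of Recode: map listed "From" values to "To" values, with
    separate handling for NULL and an ELSE for all remaining values."""
    if x is None:
        return null_value
    return mapping.get(x, else_value)

# The example above: 0 -> N, NULL -> N, everything else -> Y
print([recode(v, {0: "N"}, null_value="N", else_value="Y")
       for v in (0, 7, None)])  # ['N', 'Y', 'N']
```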
After setting values/options on the Properties dialog, click on OK to close the Properties dialog.
Then continue specifying INPUT and OUTPUT parameters described further in this chapter.
Properties - Rescale
If doing a Rescale transformation there will be a Rescale tab on the Properties dialog. Click on
Rescale to access the following options:
After setting values/options on the Properties dialog, click on OK to close the Properties dialog.
Then continue specifying INPUT and OUTPUT parameters described further in this chapter.
Properties - Sigmoid
If doing a Sigmoid transformation there will be a Sigmoid tab on the Properties dialog. Click on
Sigmoid to access the following options:
Statistical Computation:
The choices are:
Logit
Modified Logit
Hyperbolic Tangent
After setting values/options on the Properties dialog, click on OK to close the Properties dialog.
Then continue specifying INPUT and OUTPUT parameters described further in this chapter.
The purpose of this screen is to specify the columns that comprise the primary key of the input
table or view being transformed. (This is required only when null value replacement is requested
in one of the requested transformations.)
If input comes from a table the primary index columns of the table will be selected by default. To
change these columns, or to enter them initially if input is from a view, use the selectors as
described below.
Available Tables:
Pull-down with the name of the input table or view.
Available Columns:
All columns within the table or view selected in Available Tables. Highlight those
columns which comprise the primary key of the table or view and either drag and drop
them to Selected Primary Key Columns, or use the right arrow button > to move them
over.
Selected Primary Key Columns:
All columns within the table or view that constitute the primary key (that is, that uniquely
identify each row). If undesired columns were moved into this area, highlight those
columns and either drag and drop them back to Available Columns, or use the left arrow
button < to move them back.
If a Target Date was used for NULL value replacement, it can be set here. The default value is the
current date, and can be changed by either typing in another date, specifying month, day and year
separately, or selecting a date with the standard Windows calendar control as shown above.
Option to generate SQL WHERE clause(s) to restrict rows selected for analysis.
Use the Teradata EXPLAIN feature to display the execution plan for this analysis
Option to generate a SQL EXPLAIN SELECT statement, which returns a Teradata
Execution Plan.
On this screen, select the columns which comprise the primary index of the output table:
Available Columns
A list of columns which comprise the index of the resultant table if an Output Table is
used.
Primary Index Columns
Select columns by highlighting them and then either dragging and dropping them into the
Primary Index Columns window, or clicking on the arrow button to move highlighted
columns into the Primary Index Columns window.
The results of the completed query are returned in this Data Viewer page. This page has the
properties of the Data page discussed in the Chapter on Using the Teradata Warehouse Miner
Graphical User Interface. With the exception of the Explain Select Result Option, these results
will match the tables described below in the Output Column Definition section, depending upon
the parameters chosen for the analysis.
The generated SQL is returned as text which can be copied, pasted, or printed.
3. Let all of the transformation functions properties default, except as follows. Double
click on the variable name to bring up the Properties screen:
Click on the Design Code tab on the Properties screen and then click on the Values
button to bring up the Design Code values Wizard:
Select both F and M by highlighting them and hitting the Add> button. Hit Finish to
exit the Wizard.
5. The default values of F_gender and M_gender are given for the values of F and M
respectively. Highlight those values and type in Females and Males accordingly:
6. marital_status - Recode
Click on the Recode tab on the Properties screen and then click on the Values button
to bring up the Recode values Wizard:
Select 1-4 by highlighting them and hitting the Add> button. Hit Finish to exit the
Wizard.
8. age - Rescale
9. Go to OUTPUT-storage, and select Store the tabular output of this analysis in the
database. Specify that a Table should be created named twm_tutorials_vt1.
For this example, the Variable Transformation Analysis generated the following results. Note that
the SQL is not shown for brevity:
Data
The purpose of analytic data set functions is to build a data set table or view. Each Variable
Creation and Variable Transformation analysis creates a table or view to be joined together into a
final data set table. This duty is performed by the Build ADS analysis.
The Build ADS analysis has similar functionality to the Join analysis in the Reorganization group
of analyses. However, it is distinguished by these differences.
• A join table or view is not required, so that it may operate on a single table or view.
• Tables are joined together via Join Paths as in a Variable Creation analysis, but without
Anchor Columns (refer to the section Variable Creation – Input – anchor table).
• By using Join Paths, Build ADS allows the use of Cross Join as a Join Style.
• By using Join Paths, the Join Style can be set differently for different tables.
• By using Join Paths, comparison operators may be set individually in Join Steps.
It should be pointed out that although the Variable Creation analysis can be used in place of
Build ADS, Build ADS is simpler and easier to use in the functions it performs.
2. In the resulting Add New Analysis dialog box, click on to highlight ADS under Categories,
and then under Analyses double-click on Build ADS:
3. This will bring up the Build ADS dialog in which you will enter INPUT and OUTPUT options
to parameterize the analysis as described in the next sections.
Available Databases
All the databases which are available for the Build ADS Analysis.
Available Tables
All the tables within the Source Database that are available for the Build ADS Analysis.
Available Columns
All the columns within the selected table that are available for the Build ADS Analysis.
Selected Columns
Select columns by highlighting them and then either dragging and dropping them into the
Selected Columns window, or clicking on the arrow button to move highlighted columns
into the Selected Columns window.
This screen performs the same function it does for the Variable Creation analysis with the
exception that the selector for Anchor Columns is not used. Refer to the section Variable
Creation – Input – Anchor Table for details.
This screen provides the option to generate SQL WHERE clause(s) to restrict rows selected for
analysis (for example: cust_id > 0).
It may be useful to note that if a WHERE clause condition is specified on the "inner" table of a
join (i.e. a table that contributes only matched rows to the results), the join is logically equivalent
to an Inner Join, regardless of whether an Outer type is specified. (In a Left Outer Join, the left
table is the "outer" table and the right table is the "inner" table.)
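This inner-table rule can be sketched in a few lines. The following is a pure-Python illustration with invented rows, not SQL generated by Build ADS:

```python
# Hypothetical data: "left" rows without a match in "right" would normally
# survive a Left Outer Join as NULL-extended rows.
left = [{"cust_id": 1}, {"cust_id": 2}, {"cust_id": 3}]
right = [{"cust_id": 1, "bal": 100}, {"cust_id": 3, "bal": -5}]

def left_outer_join(l_rows, r_rows, key):
    """Left outer join: unmatched left rows survive with NULL (None) columns."""
    out = []
    for lr in l_rows:
        matches = [rr for rr in r_rows if rr[key] == lr[key]]
        if matches:
            out.extend({**lr, **rr} for rr in matches)
        else:
            out.append({**lr, "bal": None})  # NULL-extended row
    return out

def inner_join(l_rows, r_rows, key):
    return [{**lr, **rr} for lr in l_rows for rr in r_rows if rr[key] == lr[key]]

# A WHERE-style condition such as "bal > 0" on the inner table discards every
# NULL-extended row, so the outer join collapses to an inner join.
outer_then_filter = [r for r in left_outer_join(left, right, "cust_id")
                     if r["bal"] is not None and r["bal"] > 0]
inner_then_filter = [r for r in inner_join(left, right, "cust_id")
                     if r["bal"] > 0]
```

Both lists contain only the matched, positive-balance customer, which is why the Outer specification is logically irrelevant once the inner table is filtered.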
Use the Teradata EXPLAIN feature to display the execution path for this analysis
Option to generate a SQL EXPLAIN SELECT statement, which returns a Teradata
Execution Plan.
On this screen, select the columns which comprise the primary index of the output table. Select:
Available Columns
A list of the columns available for selection as the primary index of the resultant table if
an Output Table is used.
Primary Index Columns
Select columns by highlighting and then either dragging and dropping into the Primary
Index Columns window, or click on the arrow button to move highlighted columns into
the Primary Index Columns window.
The results of the completed query are returned in a Data page within Results. This page has the
properties of the Data page discussed in the Chapter on Using the Teradata Warehouse Miner
Graphical User Interface. With the exception of the Explain Select Result Option, these results
will match the tables described below in the Output Column Definition section, depending upon
the parameters chosen for the analysis.
The generated SQL is returned here as text which can be copied, pasted, or printed.
twm_tutorials_vt1.marital_status
twm_tutorials_vt1.Females
twm_tutorials_vt1.Males
twm_tutorials_vt1.zavg_cc_tran_amt
(From Variable Creation Tutorial #2)
twm_tutorials_vc2.CC_acct
twm_tutorials_vc2.CC_bal
twm_tutorials_vc2.CK_acct
twm_tutorials_vc2.CK_bal
twm_tutorials_vc2.SV_acct
twm_tutorials_vc2.SV_bal
twm_tutorials_vc2.Q1_nbr_trans
twm_tutorials_vc2.Q2_nbr_trans
twm_tutorials_vc2.Q3_nbr_trans
twm_tutorials_vc2.Q4_nbr_trans
Anchor Table TWM_CUSTOMER
Inner Join to twm_tutorials_vt1 on cust_id
Inner Join to twm_tutorials_vc2 on cust_id
Go to OUTPUT-storage, and select Store the tabular output of this analysis in the
database. Specify that a Table should be created named twm_tutorials_bads1.
For this example, the Build ADS Analysis generates a table of 747 rows with the 19 columns
above joined together, containing the same cust_id values as the TWM_CUSTOMER table.
Note that the SQL is not shown for brevity.
Refresh
The Refresh Analysis is provided as a means to re-execute a chain of referenced analyses with a
different set of user specified parameters without modifying the original analyses. It falls under
the ADS umbrella because it is designed to allow the user to refresh an analytic data set, however
in addition to ADS Analyses it may also be used to refresh Score Analyses.
Creating an analytic data set can require a lot of thought and result in many steps of creating
variables and reorganizing data. There can be multiple tables joined by complicated join paths,
sophisticated arithmetic formulas, as well as the dimensioning of variables. With the use of
Analysis References, that provide a means to feed the output of a previous analysis into a
subsequent analysis, the result can be a complex string of analyses that make up the creation of a
final analytic data set. As the source data changes over time, it might be necessary to modify the
parameters used in generating the analytic data set. Apart from Refresh, there are two ways to do
this. The first is to reproduce the entire set of analyses used to generate the analytic data set
with the new, modified parameters. This is not ideal because a complicated set of analyses can
take a significant amount of time to reproduce when only a few things need to change. The
second is to change the original analyses themselves with new parameters. The problem with
this is that the original ADS template is then permanently changed.
With the Refresh Analysis, the original analyses can be re-executed with the modified parameters
without affecting the original parameters used. If any of the parameters are not selected to be
changed, then the original values are used. When Refresh is run, the analysis to be refreshed is
executed (along with any analyses that it references) using the new parameters specified within
Refresh. Beyond this, one of the most powerful features of the Refresh analysis is that the
referenced analyses generate only the columns needed by the analysis being refreshed.
2. In the resulting Add New Analysis dialog box, with ADS highlighted on the left, double-click
on the Refresh icon:
3. This will bring up the Refresh dialog in which you will enter INPUT options to parameterize
the analysis as described in the next section.
Available Analyses
Select a single analysis from the list of all of the analyses in the current project which are
available for the Refresh Analysis.
Modify Output
Check this box to change the output database and/or output table of the analysis to be
refreshed.
Database Name
The name of the output database of the analysis to be refreshed.
Table/View Name
The name of the output table or view of the analysis to be refreshed.
Results - Refresh
On the Refresh dialog click on RESULTS (note that the RESULTS tab will be grayed-
out/disabled until after the analysis is completed):
Tutorial – Refresh
Refresh – Example
(Note: The following example will contain a Variable Creation, which will then be input into
the Refresh Analysis)
7. Drag a LESS THAN OR EQUALS (Comparison) onto the first empty argument, and a
GREATER THAN (Comparison) onto the second empty argument.
8. Drag a DATE DIFFERENCE (Date and Time) onto the first empty argument of each
comparison operator.
9. Drag a TARGET DATE (Literals) onto the first empty argument of each Date Difference
and drag TWM_CREDIT_TRAN.tran_date onto the second empty argument of each
Date Difference.
10. Drag a NUMBER (30) (Literal) onto the empty argument of the LESS THAN OR
EQUALS, and drag a NUMBER (0) (Literal) onto the empty argument of the GREATER
THAN.
11. Go to the INPUT-dimensions tab and apply the dimension to the variable in the following
way.
12. Specify the Join Paths from TWM_CUSTOMER to each of the following by selecting a
table in the Join Path from Anchor Table To: and clicking on the Wizard button.
Specify the following Join Paths:
TWM_CUSTOMER.cust_id - TWM_CREDIT_TRAN.cust_id
Run the Analysis. View the generated SQL (which has been generated by the Variable
Creation Analysis, but modified by the Refresh Analysis) to see how the target date, output
table, and anchor table have been changed.
3. Matrix Functions
The matrix functions must operate on numeric data. Columns of type DATE will not produce
meaningful results. NULL values are handled via listwise or pairwise deletion in the Matrix
analysis.
These functions are valid for any of the supported data reduction matrix types, namely
correlation, covariance, sums of squares and cross products, and corrected sums of squares and
cross products. Internally the Matrix analysis stores the matrix as an extended sums of squares
and cross products matrix, with an additional column containing a constant value, 1. The actual
conversion to another type, if requested, is computed in the Export Matrix analysis.
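As a rough sketch of this design (the data and variable names here are invented), an extended SSCP matrix built over a leading constant-1 column already contains n and every plain sum, so the corrected, covariance, and correlation forms can all be derived from it afterward, as the Export Matrix analysis does:

```python
# Sketch (assumed data): store one extended SSCP matrix X'X, where X carries a
# leading constant-1 column; derive other matrix types from it on demand.
import math

rows = [(1.0, 2.0), (2.0, 3.0), (4.0, 7.0)]      # two variables, n = 3
X = [(1.0,) + r for r in rows]                   # prepend the constant 1

# Extended SSCP matrix: entry [i][j] = sum over observations of X_i * X_j.
p = len(X[0])
sscp = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]

n = sscp[0][0]                            # 1*1 summed over all rows
sum_x, sum_y = sscp[0][1], sscp[0][2]     # the constant row holds plain sums

# Corrected SSCP entry for (x, y): sum(x*y) - sum(x)*sum(y)/n.
csscp_xy = sscp[1][2] - sum_x * sum_y / n
# Covariance and correlation follow from the corrected entries.
cov_xy = csscp_xy / (n - 1)
cor_xy = csscp_xy / math.sqrt((sscp[1][1] - sum_x ** 2 / n)
                              * (sscp[2][2] - sum_y ** 2 / n))
```

Nothing is rebuilt when a different matrix type is requested; only this conversion step changes.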
Matrix
Build an extended Sums of Squares and Cross-Products (SSCP) data reduction matrix.
Optionally, restart the Matrix process upon a failure or when a previously-executed
Matrix was stopped.
Export Matrix
Convert or export the resultant matrix and build either a SAS data step, a Teradata table,
or just view the results. Valid matrices include:
• Pearson-product moment correlations (COR)
• Covariances (COV)
• Sums of squares and cross-products (SSCP)
• Corrected Sums of squares and cross-products (CSSCP)
Matrix Analysis
The Matrix analysis will process the input data so that one of the following data reduction
matrices can be exported via the Export Matrix analysis:
Correlation
The Pearson Product-Moment Correlation value of the pairwise combinations of each
column within the selected table. This is calculated as follows, for each pairwise
combination of variables X and Y:
ƒ(x,y) = ((n ⋅ ∑x⋅y) − ∑x ⋅ ∑y) / √((n ⋅ ∑x² − (∑x)²) ⋅ (n ⋅ ∑y² − (∑y)²))
Covariance
The Covariance value of the pairwise combinations of each column within the selected
table. This is calculated as follows, for each pairwise combination of variables X and Y:
ƒ(x,y) = (∑x⋅y)/(n − 1) − (∑x ⋅ ∑y)/(n ⋅ (n − 1))
Sums of Squares and Cross Products
The SSCP value of the pairwise combinations of each column within the selected table. This is
calculated as follows, for each pairwise combination of variables X and Y:
ƒ(x,y) = ∑x⋅y
Corrected Sums of Squares and Cross Products
The CSSCP value of the pairwise combinations of each column within the selected table. This is
calculated as follows, for each pairwise combination of variables X and Y:
ƒ(x,y) = ∑x⋅y − (∑x ⋅ ∑y)/n
The matrix functions must operate on numeric data. Columns of type DATE will not produce
meaningful results.
An option is provided for list-wise versus pair-wise deletion or omission of values which are
NULL. With list-wise deletion, the default option, if the value of any column to be included in
matrix calculations is NULL, the entire row is omitted during matrix calculations. Alternatively,
if pair-wise deletion is chosen, only pairs of values involving a NULL are ignored, not entire
rows. The danger in this case is that when later analysis is performed on the matrix, it is possible
that mathematical irregularities will be found due to the calculations being made over different
numbers of observations.
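A minimal sketch of the two deletion modes, with None standing in for NULL and the three-column rows invented for illustration:

```python
# Three selected columns; two rows contain a NULL in different places.
rows = [(1.0, 2.0, 3.0),
        (None, 3.0, 1.0),
        (4.0, 5.0, None),
        (2.0, 2.0, 2.0)]

# List-wise deletion (the default): drop the entire row if any selected
# column is NULL, so every pair is computed over the same observations.
listwise = [r for r in rows if all(v is not None for v in r)]

# Pair-wise deletion: for each pair of columns, drop only the observations
# where either member of that pair is NULL.
def pairwise(rows, i, j):
    return [(r[i], r[j]) for r in rows if r[i] is not None and r[j] is not None]

# List-wise deletion leaves 2 rows for every pair, while pair-wise deletion
# keeps 3 observations per pair -- but a *different* 3 rows for each pair,
# which is the source of the irregularities noted above.
```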
The Matrix analysis has restart capabilities as well. If a system failure occurs, or the Matrix
analysis is stopped by the end-user, it can be restarted, beginning its calculations at the point of
stoppage.
Note that the name of the Matrix analysis will be used to fetch the matrix values from the
database for those functions that are dependent upon a matrix – namely, Export Matrix, Linear
Regression and Factor Analysis.
2. In the resulting Add New Analysis dialog box, click on Matrix Functions under Categories
and then under Analyses double-click on Matrix:
3. This will bring up the Matrix dialog in which you will enter INPUT and OUTPUT options to
parameterize the analysis as described in the next sections.
Available Databases
All the databases which are available for the Matrix analysis.
Available Tables
All the tables within the Source Database that are available for the Matrix analysis.
Available Columns
All the columns within the selected table that are available for the Matrix analysis.
Selected Columns
Select columns by highlighting and then either dragging and dropping into the Selected
Columns window, or click on the arrow button to move highlighted columns into the
Selected Columns window.
Null Handling
Provides option for list-wise versus pair-wise deletion, used for omission of values which
are NULL.
Pairwise Deletion
Only pairs of values involving a NULL are ignored, not entire rows.
Listwise Deletion
If the value of any column to be included in the matrix is NULL, the entire row is
omitted during matrix calculations.
Matrix Width
The width of the matrix results. Width is the number of columns processed with each
SQL statement.
Number of Connections
The number of threads or simultaneous connections to the data source. Multiple sessions
may speed the SQL execution process.
Results - Matrix
The results from running the Matrix analysis are persisted within the Metadata model, and are not
returned to the front-end. Results can be viewed using the Export Matrix analysis (next section in
this chapter).
Tutorial - Matrix
Matrix Example #1
Parameterize a Matrix analysis as follows. Note that this matrix will be used in the Linear
Regression and Factor Analysis Tutorials in subsequent chapters:
TWM_CUSTOMER_ANALYSIS.avg_ck_tran_cnt
TWM_CUSTOMER_ANALYSIS.avg_sv_tran_amt
TWM_CUSTOMER_ANALYSIS.avg_sv_tran_cnt
TWM_CUSTOMER_ANALYSIS.cc_rev
Analysis Name: Customer_Analysis_Matrix
There are no viewable results generated as a result of executing the Matrix analysis. The results
will be viewed via the Export Matrix analysis tutorial (later in this chapter). Save the Matrix
analysis with the above mentioned name “Customer_Analysis_Matrix” for use in the Linear
Regression and Factor analysis tutorials.
Matrix Example #2
There are no viewable results generated as a result of executing the Matrix analysis. The results
will be viewed via the Export Matrix analysis tutorial. Save the Matrix analysis with the above
mentioned name “Customer_Analysis_Matrix_Short” for use during the Export Matrix tutorial.
Export Matrix
The Export Matrix analysis will export the matrix data values built by the Matrix analysis in one
of the following forms. (Note that the form is not specified when the matrix is built, yet the
matrix can be requested in any form when it is exported.)
• SAS DataStep
• Teradata Table
• Viewable Results
If a SAS data step script is created to build a “special” (matrix) SAS data set, the script will
produce, when executed with a SAS application, a data set with the same name as the SAS file
name. This function automatically appends “.sas” to the end of the requested output (script)
name, and SAS will create a .log file when the script is executed.
If a table containing the matrix is created, the table will contain one column for each column used
to build the matrix, with the same name as the original column, or the alias, if any, which was
given to the Matrix analysis. In addition, an XIDX column is added to the front of the result table,
along with an XCOL column containing the name of the original column or alias.
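The layout described above might be sketched as follows; the values and variable names are hypothetical, and only the XIDX/XCOL convention comes from the text:

```python
# Hypothetical 2x2 correlation matrix exported to a table: one row per matrix
# row, keyed by an XIDX index column, named by an XCOL column, with one data
# column per original variable.
names = ["cc_rev", "ck_bal"]
matrix = [[1.0, 0.42],
          [0.42, 1.0]]

table = [dict([("XIDX", i + 1), ("XCOL", name)] + list(zip(names, row)))
         for i, (name, row) in enumerate(zip(names, matrix))]
```

Each resulting row pairs the original column (or alias) names with that matrix row's values, so the table can be read back without any external description of the matrix.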
To view the correlation, covariance, SSCP or CSSCP matrix, specify no Output Options on the
analysis parameters tab. After the analysis has executed, click on the Results tab to view the
matrix.
2. In the resulting Add New Analysis dialog box, click on Matrix Functions under Categories
and then under Analyses double-click on Export Matrix:
3. This will bring up the Export Matrix dialog in which you will enter INPUT and OUTPUT
options to parameterize the analysis as described in the next sections.
Available Matrices
All the matrices within the Metadata Database that have been previously built with the
Matrix analysis and have been saved to metadata are available to export with the Export
Matrix analysis. These are identified by the analysis name of the Matrix analysis.
Selected Matrix
The Matrix analysis name of the matrix to export.
Matrix Type
Provides options for the specific type of matrix to export.
Correlation
Export the matrix values as Pearson-product moment correlations.
Covariance
Export the matrix values as Covariances.
SSCP
Export the matrix values as an extended Sums of squares and cross-products, with the
column of constant 1’s labeled INTERCEPT.
CSSCP
Export the matrix values as Corrected Sums of squares and cross-products.
Output Options
Create a SAS DataStep based on this Matrix
Build the matrix results within a SAS DataStep script.
Use truncated (8 character) Column Names
Run the analysis and edit the resultant SSCP_Values.sas SAS data step script:
Run the analysis and view the results with either QueryMan or the SQL Node by executing the
following queries:
SHOW TABLE <result_db>.CSSCP_Matrix;
SELECT * from <result_db>.CSSCP_Matrix order by 1;
Results
Click on the Results tab to see the following Matrix Report:
4. Scoring
PMML Scoring
Predictive Model Markup Language (PMML) is an XML standard being developed by the Data
Mining Group, a vendor-led consortium established in 1998 to develop data-mining standards.
NCR co-developed the initial PMML specification along with Angoss, Magnify, SPSS and The
National Center for Data Mining at the University of Illinois at Chicago.
PMML enables the definition and subsequent sharing of predictive models between applications.
It represents and describes data mining and statistical models, as well as some of the operations
required for cleaning and transforming data prior to modeling. PMML aims to provide enough
infrastructure for an application to be able to produce a model (the PMML producer) and another
application to consume it (the PMML consumer) simply by reading the PMML data file. This
means that a model developed in a desktop data-mining tool can be deployed or scored against an
entire data warehouse.
Feature: Data Dictionary
Function: Defines the data to the model and specifies each data attribute’s type and value range.
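As an illustration of the consumer side, a minimal hand-written fragment in the style of a PMML Data Dictionary can be read with a standard XML parser. The fragment below is an assumption for illustration, not output from Teradata Warehouse Miner:

```python
import xml.etree.ElementTree as ET

# Minimal PMML-like fragment; element and attribute names follow the PMML
# DataDictionary convention (DataField with name, optype, dataType).
pmml = """
<PMML version="2.0">
  <DataDictionary numberOfFields="2">
    <DataField name="cust_id" optype="continuous" dataType="integer"/>
    <DataField name="ccacct" optype="categorical" dataType="string"/>
  </DataDictionary>
</PMML>
"""

root = ET.fromstring(pmml)
# A consumer's first step: learn each field's name and declared type.
fields = {f.get("name"): f.get("dataType")
          for f in root.find("DataDictionary").findall("DataField")}
```

A consumer that knows each field's name and type from the Data Dictionary can then validate that the table to be scored supplies matching columns.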
Each PMML construct supports a mechanism for extending the content of a model. Liberal use of
such “extensions” requires that vendors who produce PMML-based models collaborate closely
with vendors who wish to consume that PMML. Please refer to the Teradata Warehouse Miner
Release Definition document for details about the products and product versions supported for
PMML consumption in Teradata ADS Generator and Teradata Warehouse Miner.
Although PMML is a great step forward, it has limitations beyond extensions, notably its lack of
encapsulation of the process of cleaning, transforming and aggregating data. Teradata recognized
this limitation early on—if the PMML document could not represent the analytic variables that
were input to the analytic tools, it would be nearly impossible to consume PMML for scoring
predictive models. This is because the deployment (scoring phase) of a predictive model requires
the existence of the same variables upon which the model was built. For this reason, the PMML
Scoring analysis is included in both the Teradata ADS Generator as well as Teradata Warehouse
Miner.
2. In the resulting Add New Analysis dialog box, click on Scoring under Categories and then
under Analyses double-click on PMML Scoring:
3. This will bring up the PMML Scoring dialog in which you will enter INPUT and OUTPUT
options to parameterize the analysis as described in the next sections.
Select Filename
The fully qualified name of the XML file containing the PMML model to be scored. A
filename can either be entered here or loaded using the Browse button.
Note that when a saved analysis with a valid model is first loaded into the project space
its model is embedded in the analysis and the displayed filename reflects the file the
model was originally built from, even if it resided on another client machine. Hovering
the mouse over the filename will display the original filename, computer name and
modified date.
Modify
Select this button to remove the embedded model from the analysis and return to
the standard browse filename selection input method. Once selected however, the
model is taken from a file rather than the previous embedded model. (NOTE: If
the analysis isn't saved the next load of the analysis will still contain the previous
embedded model.)
Browse
Bring up the Standard Windows location dialogue in order to navigate to the file
containing the PMML model.
view >>
Once the XML file containing the PMML model is selected (or there is an
embedded model), the view >> hyperlink is enabled. The model can be viewed
by clicking this link.
Available Databases
All available source databases that have been added through Connection Properties.
Available Tables
The tables available for scoring are listed in this window, though all may not strictly
qualify: the input table or tables to be scored must contain the same column names used
in the original analysis.
Available Columns
The columns available for scoring are listed in this window.
Selected Columns:
Index Columns
Note that the Selected Columns window is actually a split window for specifying
Index and/or Retain columns if desired. If a table is specified as input, the primary
index of the table is defaulted here, but can be changed. If a view is specified as
input, an index must be provided.
Retain Columns
Other columns within the table being scored can be appended to the scored table, by
specifying them here. Columns specified in Index Columns may not be specified
here.
Output Table:
Database name
The name of the database.
Table name
The name of the scored output table to be created.
Create Output table using the FALLBACK keyword
If a table is selected, it will be built with FALLBACK if this option is selected.
Generate the SQL for this analysis, but do not execute it
If this option is checked the SQL to score this PMML model will be generated.
Maximum SQL statement size allowed (default 64000):
The SQL statements generated will not exceed this max value in characters.
Generate as a stored procedure with name:
If this option is checked the SQL produced will be generated in the form of a
stored procedure having the name that was given.
A sample of rows from the scored table is displayed here – the size determined by the setting
specified by Maximum result rows to display in Tools-Preferences-Limits. By default, the
index of the table being scored as well as the dependent column prediction are in the scored table
– additional columns as specified in the OUTPUT panel may be displayed as well.
1. RegressionContinuousPMML.xml
A Linear Regression model which predicts a continuous outcome.
2. DecisionTreeDiscretePMML.xml
A Decision Tree model which predicts a discrete outcome.
3. RegressionDiscretePMML.xml
A Logistic Regression model which predicts a discrete outcome.
4. NeuralMLPDiscretePMML.xml
A MLP Neural Network model which predicts a discrete outcome.
5. ClusterPMML.xml
A Cluster model which predicts which of 20 clusters a customer should be assigned to.
Parameterize a PMML Scoring Analysis to score a Linear Regression model which predicts a
continuous outcome as follows:
Run the analysis, and click on Results when it completes. For this example, the PMML Scoring
Analysis generated the following results. A single click on each page name populates Results
with the item. SQL is always produced and displayed, but is not shown here for brevity.
Report
Resulting Scored Table Name score_reg_1
Number of Rows in Scored File 747
Data
cust_id cc_rev
1362527 -3.09123331353583
1363078 17.1112717566361
1362588 8.28237095448635
1363486 27.7070270772696
1362752 53.7221660256401
1362893 -3.32443782325574
1363017 14.7337070009494
1363444 15.8410540579199
1362548 35.790895539682
1362487 11.3670140503415
… …
… …
… …
Parameterize a PMML Scoring Analysis to score a Decision Tree model which predicts a discrete
outcome as follows:
Run the analysis, and click on Results when it completes. For this example, the PMML Scoring
Analysis generated the following results. A single click on each page name populates Results
with the item. SQL is always produced and displayed, but is not shown here for brevity.
Report
Resulting Scored Table Name score_tree_1
Number of Rows in Scored File 747
Data
Parameterize a PMML Scoring Analysis to score a Logistic Regression model which predicts a
discrete outcome as follows:
Run the analysis, and click on Results when it completes. For this example, the PMML Scoring
Analysis generated the following results. A single click on each page name populates Results
with the item. SQL is always produced and displayed, but is not shown here for brevity.
Report
Resulting Scored Table Name score_reg_2
Number of Rows in Scored File 747
Data
cust_id ccacct P_ccacct1 P_ccacct0
1362527 0 0.125740481096571 0.874259518903429
1363078 1 0.861086203667224 0.138913796332776
1362588 1 0.723429148501114 0.276570851498886
1363486 0 0.125034199014627 0.874965800985373
1362752 0 0.419312298702164 0.580687701297836
1362893 1 0.970060355675886 0.0299396443241139
1363017 1 0.999980678896465 1.93211035354291E-05
1363444 0 0.173538764837706 0.826461235162294
1362548 0 0.265964538752992 0.734035461247008
1362487 1 0.872345777062174 0.127654222937826
… … … …
… … … …
… … … …
Parameterize a PMML Scoring Analysis to score a MLP Neural Network model which predicts a
discrete outcome as follows:
Run the analysis, and click on Results when it completes. For this example, the PMML Scoring
Analysis generated the following results. A single click on each page name populates Results
with the item. SQL is always produced and displayed, but is not shown here for brevity.
Report
Resulting Scored Table Name score_nn_2
Number of Rows in Scored File 747
Data
cust_id ccacct
1362527 0
1363078 1
1362588 1
1363486 0
1362752 1
1362893 1
1363017 1
1363444 0
1362548 1
1362487 1
… …
… …
… …
Parameterize a PMML Scoring Analysis to score a Cluster model which predicts which of 20
clusters this customer should be assigned to as follows:
Run the analysis, and click on Results when it completes. For this example, the PMML Scoring
Analysis generated the following results. A single click on each page name populates Results
with the item.
Report
Resulting Scored Table Name score_cluster_1
Data
cust_id Cluster
1362527 10
1363078 10
1362588 7
1363486 1
1362752 19
1362893 7
1363017 7
1363444 7
1362548 1
1362487 7
… …
… …
… …
5. Publishing
Publishing Overview
The Publish Analysis is provided as a means to save an analytic model by storing the SQL
generated by an associated Score Analysis and/or ADS analysis into Publish Tables (metadata
tables used by the Model Manager application). When a Score Analysis is selected as input into a
Publish Analysis, the SQL that was generated by that Score Analysis is stored in such a way that
Model Manager can replace key components of that SQL and re-execute it, making it possible to
effectively re-use a published model (the SQL template) on different sets of data.
Analysis References
The Publish Analysis makes use of the Analysis References feature in the following way.
Because one of the parameters of input is another analysis, it is in effect referencing that analysis.
When that analysis is selected as input, the Publish Analysis then manages the execution of any
analyses that are references of the input analysis. For example, it is a distinct possibility that the
input into the final Score Analysis will be a series of Reorganization or ADS Analyses linked
together via Analysis References. A possible scenario would be a Variable Creation Analysis that
is referenced by (input into) a Join, and then a Sample. The resulting analytic data set (ADS)
might then be used as the input to a Score Analysis. In this scenario, because each analysis is
dependent upon the previous one, the SQL from each analysis will be published (stored in the
Publish Tables) in the proper order of execution so that it will work when re-executed via Model
Manager. This ensures that all of the SQL necessary to generate the ADS and resulting analytic
model will be captured.
For an analysis to be available for publishing, it must store its tabular output in the database as a
table or view.
The anchor table of the last Variable Creation analysis within the chain of referenced analyses to
be published will be stored as the published anchor table. If that anchor table is the output table of
another Variable Creation analysis within the chain of referenced analyses to be published, the
publish will fail with the following error message:
The anchor table of the last Variable Creation analysis must be changed to the output table of a
different analysis (not a Variable Creation), or to a permanent table or view in Teradata for the
publish to be successful.
2. In the resulting Add New Analysis dialog box, with Publish highlighted on the left, double-
click on the Publish icon:
3. This will bring up the Publish dialog in which you will enter INPUT options to parameterize
the analysis as described in the next section.
By clicking on the button in the bottom center of the input screen, a pop up
window will appear that contains the following information that will be stored in the Publish
Tables:
Publish Date
The date that Publishing occurs, automatically set to the current date.
Expiration Date
The date that the model expires, set on the input screen by the user.
ADS Output Database
The database that was used to store the results of the ADS Analysis (if applicable)
ADS Output Table
The table that contains the results of the ADS Analysis (if applicable)
Score Output Database
The database that was used to store the results of the Score Analysis (if applicable)
Score Output Table
The table that contains the results of the Score Analysis (if applicable)
Model Variables
A list of the variables that were used in the model along with their descriptions.
Score Columns
A list of the columns that are generated in the output of the score (if applicable), along with their
descriptions.
By running the analysis, the information needed by Model Manager to re-use the model will be
stored in the Publish Tables within the Publish Database.
Results - Publish
On the Publish dialog click on RESULTS (note that the RESULTS tab will be grayed-
out/disabled until after the analysis is completed):
Select either the report or SQL tab to view the report or the SQL generated by the execution of
the Publish analysis.
Tutorial – Publish
Publish – Example
The following example contains a Variable Creation analysis that is referenced by a PMML Score
and is then published.
2. Select all the columns in the input table into the Variables panel.
3. Go to OUTPUT-storage, and select Store the tabular output of this analysis in the
database. Specify that a Table should be created named twm_tutorials_vc1.
Parameterize a PMML Scoring Analysis named PMML Scoring1 to score a Decision Tree model
which predicts a discrete outcome as follows:
Input:
Select Input Source Analysis
Available Analyses Variable Creation1
Available Tables twm_publish_vc1
Select File Name DecisionTreeDiscretePMML.xml
(located in Scripts\PMML UDF Install under the
directory where the application is installed)
Index Columns cust_id
Output – Storage:
Result Table Name twm_publish_score1
Model Output Options:
P_ccacct1 Enabled
P_ccacct0 Enabled
Click on the button in the bottom center of the input screen. This will open a pop
up window. By clicking on the button within the pop up window you will see the
following screens:
The final Publish Results screen shows the Score SQL to be Published (not shown here).
Click on the button to execute the Publish Analysis and store the information in the
Publish Tables.
Click on the Results tab to view what was published. Select the report tab to view the report
portion as shown below, and the SQL tab to view the SQL (not shown here).