Teradata Analytics Platform Guide for the Aster User

Learn Data Science
Teradata Employee

The new Teradata Analytics Platform is being released soon, and with comes a Machine Learning & Graph engine allowing Teradata users to use over 180 prebuilt analytic functions to transform, prepare, analyze, and visualize multi-structured data to address a wide number of business use cases, such as customer churn, path to purchase, marketing attribution, product affinity, fraud, marketing optimization, and more.  This is a replacement for the Aster Classic box and allows the Aster functions to be used in Teradata. 

 

For the Aster user, this means it's time to dust of the Teradata SQL. This article includes tips and tricks for the Aster user who is mirgrating to the Teradata Analytics Platform.

 

Teradata Database Features  

 

Case Sensitivity  

Teradata is not always case sensitive by default: this is a system-wide feature that is controlled by DBA’s. As Aster is case sensitive, unexpected results may be found, especially when joining or aggregating data. 

drop table case_test; 
create table case_test ( id int, word char(3) )primary index(id); 
insert into case_test (1, 'dog'); 
insert into case_test (2, 'DOG'); 
insert into case_test (3, 'Dog'); 
insert into case_test (4, 'DOG'); 
insert into case_test (5, 'DOG'); 
insert into case_test (6, 'Dog'); 
 
--System with Case Insensitive as default 
select word, count(*) 
from case_test 
group by word; 
-- word   Count(*)  
-- ------ --------  
-- Dog           6 
 
--Get “Aster like” results in a single query 
select word (casespecific), count(*) 
from case_test 
group by word (casespecific); 
-- word   Count(*)  
-- ------ --------  
-- dog           1 
-- DOG           3 
-- Dog           2 
 
--Get “Aster like” results for a table 
drop table case_test; 
create table case_test ( id int, word char(3) casespecific )primary index(id); 
insert into case_test (1, 'dog'); 
insert into case_test (2, 'DOG'); 
insert into case_test (3, 'Dog'); 
insert into case_test (4, 'DOG'); 
insert into case_test (5, 'DOG'); 
insert into case_test (6, 'Dog'); 
 
select word, count(*) 
from case_test 
group by word; 
-- word   Count(*)  
-- ------ --------  
-- DOG           3 
-- Dog           2 
-- dog           1 

  

Teradata Analytics Platform Features  

 

Running Analytic Functions 

There are two ways to run the default machine learning functions, either with the explicit @coprocessor syntax or by using the pre-setup function aliases. The Machine Learning Functions documentation uses the alias syntax. This is the recommended way to run functions, especially as parameter names and how functions run has become dramatically more consistent in the new syntax.  

 

By not using an alias you are essentially writing Aster Classic code which is pushed down to the coprocessor. This means that, for Aster Experts, the non-alias method of running functions will seem much more natural.  

 

Running Native Analytic Functions 

There are currently 9 functions that run natively in the Teradata database. These functions can also be explicitly pushed to run in the coprocessor by using the @coprocessor syntax discussed in Running Analytic Functions. In the Machine Learning documentation, you will see these functions with the @coprocessor syntax as this documentation is strictly on running functions in the Aster Engine.  

 

Function Calls Must Be Named 

Calls to the Machine Learning Engine require an alias to successfully run. Once you have solved the error message it makes sense, but when it’s new it is not particularly clear to the layman.  

--Sample Dataset 
drop table alias_test; 
create table alias_test ( 
id int 
,words varchar(100) 
)primary index(id); 
 
insert into alias_test (1, 'Hey, diddle, didle'); 
insert into alias_test (2, 'The cat and the fiddle,'); 
insert into alias_test (3, 'The cow jumped over the moon;'); 
insert into alias_test (4, 'The little dog laughed'); 
insert into alias_test (5, 'To see such sport,'); 
insert into alias_test (6, 'And the dish ran away with the spoon.'); 
 
--Forgetting to alias this SQL call 
select * from nGram( 
on alias_test 
using 
TextColumn('words') 
Grams('3') 
); 
--Executed as Single statement.  Failed [3707 : 42000] Syntax error, expected something like a name or a Unicode delimited identifier or an 'UDFCALLNAME' keyword between ')' and ';'.  
--Elapsed time = 00:00:00.234  
--  
--STATEMENT 1: Unknown failed.  
 
select * from nGram( 
on alias_test 
using 
TextColumn('words') 
Grams('3') 
) as ng; 
 
-- id words                                 ngram                      n frequency  
-- -- ------------------------------------- -------------------------- - ---------  
--  2 The cat and the fiddle,               Snippet:the cat and        3         1 
--  2 The cat and the fiddle,               Snippet:cat and the        3         1 
-- ... 

 

Coprocessor Knows the Past  

Parameter names of various functions have changed a lot between Aster Classic 6.10, 6.20, 6.21, and the Teradata Analytics Platform. This is for the better – there is now extreme consistency where in the past naming conventions changed from function to function. If you are getting stuck translating old Aster Classic code to new Teradata Analytics Platform syntax it may be because you are using a dated parameter name.  

The @coprocessor syntax is backwards compatible. It is not recommended to use this syntax, but this is a fun fact for when you are getting stuck while debugging. 

--Sample Dataset 
drop table alias_test; 
create table alias_test ( id int, words varchar(100) )primary index(id); 
insert into alias_test (1, 'Hey, diddle, didle'); 
insert into alias_test (2, 'The cat and the fiddle,'); 
insert into alias_test (3, 'The cow jumped over the moon;');/ 
insert into alias_test (4, 'The little dog laughed'); 
insert into alias_test (5, 'To see such sport,'); 
insert into alias_test (6, 'And the dish ran away with the spoon.'); 
 
--Oops, we used the Text Column parameter name from 6.20 
select * from nGram( 
on alias_test 
using 
Text_Column('words') 
Grams('3') 
) as ng; 
--Executed as Single statement.  Failed [4382 : HY000] Argument Text_Column is not defined in the function mapping definition.  
--Elapsed time = 00:00:00.041  
--STATEMENT 1: Select Statement failed. 
 
--Let's try the other syntax 
select *  
from nGram@coprocessor( 
on alias_test 
using 
Text_Column('words') Grams('3') 
) as ng; 
-- id words                                 ngram                      n frequency  
-- -- ------------------------------------- -------------------------- - ---------  
--  5 To see such sport,                    Snippet:to see such        3         1 
 
--Ah, time to confirm parameter names in the documentation 
select *  
from nGram( 
on alias_test 
usingTextColumn('words') Grams('3') 
) as ng; 
-- id words                                 ngram                      n frequency  
-- -- ------------------------------------- -------------------------- - ---------  
--  4 The little dog laughed                Snippet:the little dog     3         1 
-- ... 

 

You’re Always Using Multiple Input Tables 

You must give each table an alias when you are using a function with multiple inputs so that the function knows what data is in each table. When you are using a function that optionally can have multiple inputs you must still alias the single input table for the function to run successfully. This is only when using the alias version of function calls. 

--Build a table with numbers from 1-100 
drop table sample_test; 
create table sample_test ( 
id int 
)primary index(id); 
 
insert into sample_test 
select top 100 day_of_calendar 
from sys_calendar.calendar; 
 
SELECT *  
FROM Sampling ( 
ON sample_test  
PARTITION BY ANY 
USING 
SampleFraction ('0.1') 
) AS samp; 
--Executed as Single statement.  Failed [4380 : HY000] Input argument name  is not defined in function mapping.  
--Elapsed time = 00:00:00.044  
--  
--STATEMENT 1: Select Statement failed. 
 
SELECT * FROM Sampling ( 
ON sample_test AS "Input"  
PARTITION BY ANY 
USING 
SampleFraction ('0.1') 
) AS samp; 
 
-- id  
-- --  
-- 33 
-- 16 
--  7 
-- 91 
-- 36 
-- 62 
-- 22 
-- 75 

 

Custom Functions Need .JSON 

Custom SQL-MR functions that were written for Aster Classic can successfully run in the Teradata Analytics Platform. The .jar file(s) for the function must be in a .zip file with a .json file that explains to the Aster Engine how the function runs.  

The following fields are required in the .json file 

  • function_name 
  • function_type: sqlmr, driver, or graph. non-driver can also be used, and is the same a sqlmr 
  • short_description: include to prevent json null error 
  • long_description: include to prevent json null error 

Some functions require one or more input tables, use parameters that reference a column in an input table, or have partitioned tables. For non-partitioned tables use "PartitionByAny". 

{ 
  "function_name": "", 
  "function_type": "[sqlmr|driver|graph]", 
  "short_description": "", 
  "long_description": "" 
} 
... 
"input_tables": [ 
    { 
      "requiredInputKind": [ 
        "PartitionByKey" 
      ], 
      "isOrdered": false, 
      "partitionByOne": false, 
      "name": "input", 
      "alternateNames": [], 
      "isRequired": true, 
      "rDescription": "This table defining the input training data.", 
      "description": "This table defining the input training data.", 
      "datatype": "TABLE_ALIAS", 
      "allowsLists": false 
    } 
  ], 
  ... 

Some functions take one or more parameters. These can be string literals, numbers, or references to an input table.

They can be a single value or a list of values. 

… 
"argument_clauses": [ 
    { 
      "targetTable": [ 
        "input" 
      ], 
      "checkDuplicate": false, 
      "allowedTypes": [], 
      "allowedTypeGroups": [ 
        "STRING" 
      ], 
      "requiredLength": 1, 
      "matchLengthOfArgument": "", 
      "allowPadding": true, 
      "name": "text_column", 
      "isRequired": true, 
      "rDescription": "Specifies the name of the input table column that contains the text.", 
      "description": "Specifies the name of the input table column that contains the text.", 
      "datatype": "COLUMNS", 
      "allowsLists": false 
    }, 
    { 
      "name": "split_by", 
      "isRequired": true, 
      "rDescription": "Specifies how to split paragraphs.", 
      "description": "Specifies how to split paragraphs.", 
      "datatype": "STRING", 
      "allowsLists": false 
    } 
  ] 
…  

 

Useful Code to Know 

 

Information on Alias Functions 

The alias names of functions use a consistent naming convention. This means that for some functions the name is different than in Aster Classic. A table showing these changes can be found in the Teradata Analytics Platform User Guide.  

Here is how to get a list of all Aliases set up in the system, you’ll notice that each function exists for each user in the system 

select  
trim(DatabaseName), trim(TableName) 
from dbc.Tables 
where TableKind = '2' 
order by TableName; 
 
--DatabaseName TableName                       
-- ------------ ------------------------------  
-- user6        AdaBoost                       
-- user2        AdaBoost                       
-- user10       AdaBoost                       

You can use the RequestText column in this table to see all parameter names of a function, see if any of the parameters are key words, see if the input table needs an alias, and so forth. 

select RequestText 
from dbc.Tables 
where TableKind = '2' 
and trim(TableName) = 'SVMSparse' 
order by TableName; 
 
--CREATE FUNCTION MAPPING SVMSparse 
--FOR SparseSVMTrainer SERVER coprocessor 
--USING 
--InputTable IN TABLE, 
--ModelTable OUT TABLE, 
--SampleIdColumn,AttributeColumn,LabelColumn,ValueColumn 
--,"Hash",HashBuckets,Cost,Bias,ClassWeights 
--,MaxStep,Epsilon,Seed; 

 

Information on the Coprocessor Functions 

Use this SQL statement to get a list of all objects that are in the public schema.  

HELP FOREIGN SCHEMA "public"@coprocessor; 
-- objectname                         objecttype     owner         
-- ---------------------------------- -------------- ------------  
-- accumcompact                       function       user5        
-- accumCompact.zip                   installed file user5        
-- adaboost_drive                     function       db_superuser 
-- adaboost_predict                   function       db_superuser 
-- addoneplayer                       function       db_superuser 
-- allpairsshortestpath               function       db_superuser 
-- amlgenerator                       function       db_superuser 
-- antiselect                         function       db_superuser 

Use this SQL statement to get a list of all objects that are in the logged-in user’s private schema. 

HELP FOREIGN SCHEMA "user5"@coprocessor; 
-- objectname                          objecttype owner  
-- ----------------------------------- ---------- -----  
-- hmm_output_a_2348579604570333316_1  table      user5 
-- hmm_output_a_2348579604570333316_2  table      user5 

Object Type 

  • Function: default machine learning functions and custom aster functions 
  • Installed File: zip files of custom functions, custom dictionaries, model files 
  • Table: temporary tables that are used in during run time. If the function is successful these are cleaned up after it finished running; if the function fails these tables are deleted by a daemon that runs daily 

Owner 

  • DB_Superuser: this object was installed natively and is part of the base package 
  • Other: the username that installed this function or file 

Use this SQL statement to get information on running a function using the @coprocessor syntax 

help foreign function SparseSVMTrainer@coprocessor; 
-- Function Help                                                                                                                                                                                                                                                     
-- Function Name:                                                                                                                                                                                                                                                   
-- SparseSVMTrainer                                                                                                                                                                                                                                                 
--                                                                                                                                                                                                                                                                  
-- Function Type:                                                                                                                                                                                                                                                   
-- driver                                                                                                                                                                                                                                                           
--                                                                                                                                                                                                                                                                  
-- Short Description:                                                                                                                                                                                                                                               
-- This function trains a SVM model according to the training set iteratively.                                                                                                                                                                                      
--                                                                                                                                                                                                                                                                  
-- Long Description:                                                                                                                                                                                                                                                
-- This function is used to train a SVM model for sparse input data.  The training is a iterative process controlled by the paramters.  We may choose project the attributes to a hash space or explictly record all  the attributes as a list during the training. 
--                                                                                                                                                                                                                                                                  
-- Usage Syntax:                                                                                                                                                                                                                                                    
-- SparseSVMTrainer@coprocessor(                                                                                                                                                                                                                                    
-- ON(...) as InputTable                                                                                                                                                                                                                                            
-- AttributeColumn('COLUMN_NAMES')                                                                                                                                                                                                                                  
-- [ClassWeights('STRING')]                                                                                                                                                                                                                                         
-- [Bias('DOUBLE')]                                                                                                                                                                                                                                                 
-- [Seed('LONG')]                                                                                                                                                                                                                                                   
-- SampleIdColumn('COLUMN_NAMES')                                                                                                                                                                                                                                   
-- [MaxStep('INTEGER')]                                                                                                                                                                                                                                             
-- [Epsilon('DOUBLE')]                                                                                                                                                                                                                                              
-- LabelColumn('COLUMN_NAMES')                                                                                                                                                                                                                                      
-- [Hash('BOOLEAN')]                                                                                                                                                                                                                                                
-- [Cost('DOUBLE')]                                                                                                                                                                                                                                                 
-- [ValueColumn('COLUMN_NAMES')]                                                                                                                                                                                                                                    
-- [HashBuckets('INTEGER')]                                                                                                                                                                                                                                         
-- ModelTable('TABLE_NAME')                                                                                                                                                                                                                                         
-- )                                                                                                                                                                                                                                                                
--                                                                                                                                                                                                                                                                  
-- Input Columns:                                                                                                                                                                                                                                                   
-- inputtable<sampleidcolumn,attributecolumn,[valuecolumn],labelcolumn>                                                                                                                                                                                             
--                                                                                                                                                                                                                                                                  
-- Output Columns:                                                                                                                                                                                                                                                  
-- modeltable<classid,weights>                                                                                                                                                                                                                                      
--                                                                                                                                                                                                                                                                  
-- Function Owner:                                                                                                                                                                                                                                                  
-- db_superuser                                                                                                                                                                                                                                                     
--                                                                                                                                                                                                                                                                  
-- Creation Time:                                                                                                                                                                                                                                                   
-- 2018-05-15 01:26:09.900375                                                                                                                                                                                                                                       
--                                                                                                                                                                                                                                                                  
-- Function Version:                                                                                                                                                                                                                                                
-- 7.0_rel_1.2_r20180409                                                                                                                                                                                                                                            
--                                                                                                                                                                                                                                                                  
-- Interfaces Implemented:                                                                                                                                                                                                                                          
-- Partition function                                                                                                                                                                                                                                               
--                                                   

 

Installing a Custom Function 

Custom SQL-MR functions that were written for Aster Classic can successfully run in the Teradata Analytics Platform. It is important to note that there may be a native Teradata capability which may provide the same results and run faster than a custom function. Many permissions are required, check the Teradata Analytics Platform User Guide for a list of these. Function zip files require a .json file that explains how to run the function.  

create table user5.id_test( 
val char(3) 
)primary index(val); 
insert into user5.id_test values ('dog'); 
insert into user5.id_test values ('cat'); 
insert into user5.id_test values ('bat'); 
 
--Move the file from your local machine to the TD box 
CALL SYSUIF.INSTALL_FILE( 
'id' --alias for Aster Engine 
,'id.zip' 
 
--c: the file path is to the local machine, not TD 
--b: bianry file 
,'cb!C:\Users\mt186048\Desktop\custom_mr\id.zip' 
); 
 
--Install the function onto the Aster Engine in the public schema 
CALL PM.INSTALL_AFUNCTION_TO_PUBLIC('id'); 
 
--Install the function onto the Aster Engine into your user's schema 
--CALL PM.INSTALL_AFUNCTION('id'); 
 
--Run the function, must use "@coprocessor" as no alias is set up 
SELECT *  
FROM id@coprocessor ( 
ON user5.id_test 
) as id; 
-- val       id  
-- --------- --  
-- dog        3 
-- bat        2 
-- cat        7 
 
--Uninstall the function from the Aster Engine 
CALL PM.REMOVE_AFUNCTION_FROM_PUBLIC('id'); 
--CALL PM.REMOVE_AFUNCTION('id'); 
 
--Remove file from the Teradata System 
CALL SYSUIF.REMOVE_FILE( 
'id' 
,0 --check that file is not being used before we delete it 
); 

 

Michelle Tanco – Senior Data Scientist – Think Big Analytics, Data Science Deliveries – michelle.tanco@teradata.com 

1 Comment
LZ
Teradata Employee

Really helpful, thanks a lot!