06-02-2015
05:55 AM


Please take a look at this comprehensive list of analytic operators available in Aster.

Aster 6.10 | ||

Analytic Category | Analytic | Description |

Time Series, Path, and Attribution Analysis | Attribution | The attribution operator is often used in web page analysis. Companies would like to assign weights to pages before certain events, such as a 'click' or a 'buy'. This attribution function enables you to calculate attributions by using a wide range of distribution models. |

Association Analysis | WSRecommender | Item-based, collaborative filtering function that uses a weighted-sum algorithm to make recommendations (for example, items or products that users should consider purchasing). |

Beta Functions | Degrees | This function generates the in-degree and out-degree tables for a directed graph. For an undirected graph, it just generates one degree table. This function can also generate the augmented edges table, which you can use in other graph functions. |

Beta Functions | Rectangle_Finder | This function finds rectangles in an undirected graph. The job of enumerating rectangles (4-cycles) is similar to that of enumerating triangles. |

Beta Functions | Triangle_Finder | This function finds triangles in an undirected graph. It is a driver function that calls the triangleFinderMap and triangleFinderReduce functions. The Triangle_Finder function generates a table listing the triangles in the graph. |

Cluster Analysis | Canopy | A simple, fast, accurate method for grouping objects into preliminary clusters. Each object is represented as a point in a multidimensional feature space. Canopy clustering is often used as an initial step in more rigorous clustering techniques, such as k-means clustering. |

Cluster Analysis | KMeansPlot | Clusters new data points around the cluster centroids generated by the k-Means function. |

Cluster Analysis | Minhash | A probabilistic clustering method that assigns a pair of users to the same cluster with probability proportional to the overlap between the set of items that these users have bought (this relationship between users and items mimics various other transactional models). |

Data Transformation | Antiselect | Returns all columns except the columns specified. |

Data Transformation | Apache Log Parser | Parses Apache log file content and extracts multiple columns of structural information, including search engines and search terms. |

Data Transformation | IdentityMatch | Tries to match enterprise customers with user records provided by external data sources. |

Data Transformation | IpGeo | Lets you map IP addresses to location information. You can use this information to identify the geographical location of a visitor. This information includes country, region, city, latitude, longitude, ZIP code, and ISP. |

Data Transformation | JSONParser | The JSONParser function is a tool used to extract the element name and text from JSON strings and output them into a flattened relational table. |

Data Transformation | Multicase | Extends the capability of the SQL CASE statement by supporting matches to multiple options. The function iterates through the input data set only once and emits a result each time a match occurs, whereas CASE emits a result on its first match and then moves on to the next row. |

Data Transformation | MurmurHash | Computes the hash value of the input columns. |

Data Transformation | OutlierFilter | Removes outliers from a data set. |

Data Transformation | Pack | Takes data from multiple columns and packs it into a single “packed” column. |

Data Transformation | Pivot | Pivots data stored in rows into columns. |

Data Transformation | PSTParserAFS | Parses Personal Storage Table (PST) files which store email in Microsoft software such as Microsoft Outlook and Microsoft Exchange Client. |

Data Transformation | Unpack | Takes data from a single “packed” column and expands it to multiple columns. |

Data Transformation | Unpivot | Converts columns into rows. |

Data Transformation | XMLRelation | The XMLRelation function is a tool for extracting most XML content (element name, text and attribute values) and structural information from XML documents into a relational table. |

Graph Analysis | AllPairsShortestPath | Computes the shortest distances between all combinations of the specified source and target vertices. |

Graph Analysis | Betweenness | Determines betweenness, a type of centrality measurement, for every vertex in the input graph. |

Graph Analysis | Closeness | Computes closeness and k-degree scores for each specified source vertex in a graph. |

Graph Analysis | EigenvectorCentrality | Calculates the centrality (relative importance) of each node in a graph. |

Graph Analysis | LocalClusteringCoefficient | Analyzes the structure of a network. |

Graph Analysis | nTree | Builds and traverses tree structures on all worker nodes. |

Graph Analysis | PageRank | Computes PageRank for a directed graph. |

Naive Bayes | NaiveBayesMap | Generates a model from training data. |

Naive Bayes | NaiveBayesPredict | This function uses the model generated by the NaiveBayesReduce function to predict the outcomes for a test set of data. |

Naive Bayes | NaiveBayesReduce | Generates a model from training data. |

Pattern Matching | nPath | Teradata Aster nPath is a function for pattern matching that allows you to specify a pattern in an ordered collection of rows, specify additional conditions on the rows matching these symbols, and extract useful information from these row sequences. |

Statistical Analysis | ConfusionMatrix | Defines a specific table layout that allows visualization of the performance of an algorithm, typically a supervised learning one. |

Statistical Analysis | ConfusionMatrixPlot | Generates a plot of the output of the ConfusionMatrix function as a real table with precision, recall, false alarm rate, miss rate, and fmeasure for each class, as well as the corresponding micro average value for all the classes. |

Statistical Analysis | EMAVG | Computes the average over a number of points in a time series while applying an exponentially decaying damping (weighting) factor to older values so that more recent values are given a heavier weight in the calculation. |

Statistical Analysis | FMeasure | Calculates the accuracy of a test. |

Statistical Analysis | GLMPredict | Scores input data using the model generated by the Stats GLM function. |

Statistical Analysis | Histogram | Counts the number of occurrences of a given data value that fall into each of a series of user-defined bins. |

Statistical Analysis | KNN | Uses the kNN algorithm to classify data objects based on their proximity to training objects with known classification. |

Statistical Analysis | Percentile | Finds percentiles on a per group basis. |

Statistical Analysis | SMAVG | Computes the average over a number of points in a series. |

Statistical Analysis | VWAP | Computes the average price of a traded item (usually an equity share) over a specified time interval. |

Statistical Analysis - Enhanced Histogram Function | Hist_Map | The Hist_Map function organizes data into bins, with automatic or manual bin breaks |

Statistical Analysis - Enhanced Histogram Function | Hist_Reduce | If you want to output multiple histograms by groups, use PARTITION BY groupby_columns instead of PARTITION BY 1. The value of groupby_columns should be the same as the value used in Hist_Map |

Statistical Analysis - LARS Functions | LARS | Selects the most important variables one by one and fits the coefficients dynamically. The LARS function implements a model selection algorithm. |

Statistical Analysis - LARS Functions | LARSPredict | LARSPredict takes in new data and the model generated by LARS, and outputs the predictions. |

Stream API | Stream | Allows users to run scripts and functions written in various languages including Python, Ruby, Perl, C#, and R. |

Text Analysis | Levenshtein Distance | Computes the Levenshtein distance between two text values, that is, the number of edits needed to transform one string into the other, where edits include insertions, deletions, or substitutions of individual characters. |

Text Analysis | Named Entity Recognition (NER) | Named entity recognition (NER) is the process of finding instances of specified entities in text (for example, person, location, and organization). It has functions to train, evaluate, and apply models that perform this analysis. |

Text Analysis | nGram | Tokenizes (or splits) an input stream and emits n multi-grams based on the specified delimiter and reset parameters. This function is useful for performing sentiment analysis, topic identification, and document classification. |

Text Analysis | PoSTagger | Tags the parts-of-speech of input text. |

Text Analysis | Sentenizer | Extracts the sentences in the input paragraphs |

Text Analysis | Sentiment Extraction Functions | The sentiment extraction functions enable the process of deducing a user's opinion (positive, negative, neutral) from text-based content. |

Text Analysis | Text Classifier | Chooses the correct class label for a given text input. |

Text Analysis | Text_Parser | A general tool for working with text fields that can tokenize an input stream of words, optionally stem them, and then emit the individual words and the count for each word. |

Text Analysis | TextChunker | Divides text into phrases and assigns each phrase a tag identifying its type. |

Text Analysis | TextMorph | Provides lemmatization, a basic tool in text analysis. The TextMorph function outputs a standard form of the input words. |

Text Analysis | TextTagging | The TextTagging function tags input tuples according to user-defined rules. These rules comprise logical and text processing operators. |

Text Analysis | TF_IDF | Evaluates the importance of a word within a specific document, weighted by the number of times the word appears in the entire corpus of documents. |

Text Analysis | WMAVG | Computes the average over a number of points in a time series while applying an arithmetically-decreasing weighting to older values. |

Time Series, Path, and Attribution Analysis | DTW | Dynamic time warping (DTW) is a function that measures the similarity between two sequences that vary in time or speed. |

Time Series, Path, and Attribution Analysis | DWT | Implements Mallat’s algorithm, an iterative algorithm in the Discrete Wavelet Transform (DWT) field, and is designed to apply wavelet transforms to multiple sequences simultaneously. |

Time Series, Path, and Attribution Analysis | DWT2D | Implements wavelet transforms on two-dimensional input, and simultaneously applies the transforms on multiple sequences. |

Time Series, Path, and Attribution Analysis | FrequentPaths | Mines for patterns that appear more than a certain number of times in the sequence database. The difference between sequential pattern mining and frequent pattern mining is that the former works on time sequences where the order of items must be kept. |

Time Series, Path, and Attribution Analysis | IDWT | Applies inverse wavelet transformation on multiple sequences simultaneously. IDWT is the inverse of DWT. |

Time Series, Path, and Attribution Analysis | IDWT2D | Simultaneously applies inverse wavelet transforms on multiple sequences. IDWT2D is the inverse function of DWT2D. |

Time Series, Path, and Attribution Analysis | Path_Analyzer | The path_analyzer function automates path analysis. This function acts as a wrapper function of the path_generator, path_start, and path_summarizer functions. You can use this function to perform clickstream analysis of common sequences of user pageviews on websites. |

Time Series, Path, and Attribution Analysis | Path_Generator | This function takes as input a set of paths where each path is a route (series of pageviews) taken by a user from start to end. For each path, it generates the correctly formatted sequence and all possible sub-sequences for further analysis by the Path Summarizer function. The first element in the path is the first page a user could visit. The last element of the path is the last page visited by the user. |

Time Series, Path, and Attribution Analysis | Path_Start | Generates all the children for a particular parent and sums up their count. Note that the input data has to be partitioned by the parent column. |

Time Series, Path, and Attribution Analysis | Path_Summarizer | The output of the Path Generator function is the input to this function. This function sums counts on nodes. A “node” can be either a plain sub-sequence or an exit sub-sequence. An exit sub-sequence is one in which the sequence and the sub-sequence are the same. Exit sub-sequences are denoted by appending '$' to the end of the sequence. |

Time Series, Path, and Attribution Analysis | SAX | Symbolic Aggregate approXimation (SAX) transforms original time series data into symbolic strings. Once this transformation is complete, the data is more suitable for many additional types of manipulation, both because of its smaller size and the relative ease with which patterns can be identified and compared. |

Time Series, Path, and Attribution Analysis | Sessionization | Sessionization is the process of mapping each click in a clickstream to a unique session identifier. One can define a session as a sequence of clicks by a particular user where no more than n seconds pass between successive clicks (that is, if we don't see a click from a user for n seconds, we start a new session). |

Visualization Functions | CfilterViz | CfilterViz is a multiple-input partition SQL-MR function that visualizes the output of the cfilter SQL-MR function. This function uses the Sigma visualization module to generate Sigma graphs. Additionally, this function lets you specify these types: GEXF, Graphviz. |

Visualization Functions | NpathViz | NpathViz is a SQL-MR function that visualizes the output of the Teradata Aster nPath SQL-MR function. NpathViz generates these visualization types: Sankey, Tree, Sigma, Chord, GEXF, and Graphviz. |

Statistical Analysis | LinReg | Outputs the coefficients of the linear regression model represented by the input matrices. |

Statistical Analysis | GLM | GLM performs linear regression analysis for any of a number of distribution functions using a user-specified distribution family and link function. Supported models in Aster Database are ordinary linear regression, logistic regression (logit model), and Poisson log-linear model. |

Using the GLM function for each value of the response is equivalent to multinomial regression. See: http://www.theanalysisfactor.com/logistic-regression-models-for-multinomial-and-ordinal-variables/

Using the GLM function for each value of the response with a proportional model is equivalent to ordinal regression. See: http://www.theanalysisfactor.com/logistic-regression-models-for-multinomial-and-ordinal-variables/

Statistical Analysis | LARS | Least Angle Regression (LARS) and Least Absolute Shrinkage and Selection Operator (LASSO) are attractive variants of linear regression that select the most important variables, one by one, and fit the coefficients dynamically. |

R-Engine | In-database Aster/R | |

Various | SQL or In-database Aster/R | |

Statistical Analysis | PCA | Principal component analysis (PCA) is a common unsupervised learning technique that is useful for both exploratory data analysis and dimensionality reduction. |

Decision Trees | Single Decision Tree Functions | These Single Decision Tree Functions let you create a predictive model without creating multiple decision trees. Only one decision tree is created. |

Decision Trees | Random Forest Functions | The Random Forest Functions let you create a predictive model based on a combination of the CART algorithm for training decision trees, and the ensemble learning method of bagging. |

Association Analysis | Basket_Generator/cfilter | Basket_generator generates sets or “baskets” of items that occur together in records in data, typically transaction records or web page logs. The cfilter function performs collaborative filtering, to find items or events that are frequently paired with other items or events. |

Cluster Analysis | Kmeans/Canopy | Kmeans is a simple unsupervised learning algorithm that solves the well-known clustering problem. The procedure follows a simple and easy way to classify a given data set through a certain number of clusters (assume k clusters) fixed a priori. The goal is to define k centroids, one for each cluster. Canopy clustering is a very simple, fast, and accurate method for grouping objects into preliminary clusters. Each object is represented as a point in a multidimensional feature space. |

Text Analysis | LDA Functions | The LDA functions build a topic model based on the supplied training data and parameters and estimate the topic distribution for each document based on the generated model. One of the LDA functions displays the readable information of the model. |

Text Analysis | TextTokenizer | The TextTokenizer function extracts tokens (or text segments) from text. Examples of tokens are words, punctuation marks, and numbers. |

Various | SQL, percentile, correlation, histogram, sample | |

Statistical Analysis | Correlation | Computes a global correlation between any pair of columns from a table. |

Statistical Analysis | Distribution Matching | Carries out hypothesis testing and finds the best matching distribution for the data. |

Core technologies | Compressed storage/bitwise indices | |

Data Transformation | Various | |

Misc | pmml_reader | This field-developed function can read a PMML model file, apply it to a table of data, and produce predictions. |

Statistical Analysis | Principal Component Analysis | Principal component analysis (PCA) is a common unsupervised learning technique that is useful for both exploratory data analysis and dimensionality reduction. It is often used as the core procedure for factor analysis. |

Time Series, Path, and Attribution Analysis | CMAVG | The Cumulative Moving Average (CMAVG) function computes the average of a value from the beginning of a series. |

Predictive Analysis Text Analysis | Naive Bayes & Naïve Bayes Text Classifier | Determines the classification of data objects based on the Naive Bayes algorithm, which takes into account the classification probability based on the training data set and additional input variables. |

Statistical Analysis | Support Vector Machines | Consists of three functions: (1) SparseSVMTrainer: builds a predictive model according to a training set. (2) SparseSVMPredictor: gives a prediction for each sample in the test set. (3) SVMModelPrinter: displays the readable information of the model. |

Statistical Analysis | Approximate Distinct Count | Computes an approximate global distinct count of the values in the specified column or combination of columns. Based on probabilistic counting algorithms, this algorithm counts the approximate distinct values for any number of columns or combination of columns, while scanning the table only once. |

Text Analysis | TF, ngram, sentenizer, lda, etc | Functions to analyze values as specific aggregate |

R-Engine | In-database Aster/R | Aster connection to R functions |

Statistical Analysis | Sample | Draws rows randomly from the input relation. The function offers two sampling schemes. |

Various | SQL, nc_*, vacuum, analyse, backup, etc | Functions to clean and optimize the tables within Aster |

Statistical Analysis: HMM | HMMUnsupervisedLearner | The HMMUnsupervisedLearner function generates multiple HMM models simultaneously, where each model is learned from a set of time-ordered sequences, where each sequence is represented as a vertex. (Hidden Markov Model is a statistical model describing the evolution of observable events that depend on internal factors, which are not directly observable.) |

Statistical Analysis: HMM | HMMSupervisedLearner | The HMMSupervisedLearner function generates multiple HMM models simultaneously, where each model is learned from a set of time-ordered sequences. (Hidden Markov Model is a statistical model describing the evolution of observable events that depend on internal factors, which are not directly observable.) |

Statistical Analysis: HMM | HMMEvaluator | The HMMEvaluator function measures the probabilities of one or more newly occurring sequences with respect to each trained HMM. (Hidden Markov Model is a statistical model describing the evolution of observable events that depend on internal factors, which are not directly observable.) |

Statistical Analysis: HMM | HMMDecoder | The HMMDecoder function finds the state sequence with the highest probability, given the learned model and observed sequences. (Hidden Markov Model is a statistical model describing the evolution of observable events that depend on internal factors, which are not directly observable.) |

Statistical Analysis | VectorDistance | This function measures the distance between sparse vectors (for example, TF-IDF vectors) in a pairwise manner. |

Statistical Analysis | LRTEST | This function performs the likelihood ratio test for two GLM models. |

Graph Analysis | Modularity | The Modularity function discovers communities in input graphs. |

Graph Analysis | pSalsa | The pSALSA (personalized SALSA) function is a SQL-GR function that evaluates the similarity of nodes in a bipartite graph according to their proximity. It can be used for recommendation. |

Graph Analysis | Shapley Value | The Shapley Value queries and helper functions compute the Shapley Value, a measure of the value of individuals in a coalition. |

Geometry | GeometryLoader | The GeometryLoader function fetches various file-based geospatial files from AFS, parses them, and stores them in Aster Database. |

Geometry | PointInPolygon | The PointInPolygon (Location Point in Polygon) function is a geometry function that takes as input a list of location points and a list of polygons. |

Geometry | GeometryOverlay | The GeometryOverlay function calculates the result of overlaying two geometries as specified by the overlay operator. |

Data Transformation | URIUnpack | This function breaks up a hierarchical uniform resource identifier (URI) into its constituent components and extracts the values of the parameters specified by the function. |

Data Transformation | URIPack | The URIPack function reconstructs encoded hierarchical URI strings that were unpacked by the URIUnpack function. |

Data Transformation: Statistical Analysis/ Scaling | ScaleMap | This function retrieves statistical information. |

Data Transformation: Statistical Analysis/ Scaling | Scale | This function is a multiple-input function that generates scaled values for the entire input data set. |

Data Transformation: Statistical Analysis/ Scaling | ScalePrinter | This function generates the statistical information for the entire data set. |

Data Transformation: Statistical Analysis/ Scaling | PartitionScale | This function scales the sequences in each partition independently. |

Time Series: Shapelets | ShapeletMasker | The function emits the sax_word, its index in the input time series and the result of the mask sax_code_mask. The output of the function is used to generate candidates for shapelets. |

Time Series: Shapelets | ShapeletFrequencyFinder | This function counts the number of times each masked sax word appears in each class representative of the time series. |

Time Series: Shapelets | ShapeletStrengthFinder | This function operates on the output of ShapeletFrequencyFinder and computes the distinguishing power (strength) of shapelets. |

Time Series: Shapelets | ShapeletFinder | This function emits the shapelets in the original, un-encoded time series format for the given training data set. |
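The Levenshtein Distance entry in the list above defines the distance as the number of single-character insertions, deletions, or substitutions needed to turn one string into the other. As a rough illustration of that definition (plain Python, not the Aster function or its SQL-MR syntax; the name `levenshtein` is made up for this sketch), the classic dynamic-programming approach could look like this:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to transform `a` into `b`."""
    # previous[j] = distance between the processed prefix of `a`
    # and the first j characters of `b`.
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # turning i characters into "" takes i deletions
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep on a match)
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
```

Only two rows of the distance table are kept at a time, so memory grows with the length of the second string rather than with the product of both lengths.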
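The SMAVG, CMAVG, WMAVG, and EMAVG entries above differ only in how they weight older values. The sketch below contrasts the four ideas in plain Python. It is not the Aster implementation: the function names, the `window` and `alpha` parameters, and the choice to seed the exponential average with the first value are all assumptions made for illustration.

```python
from typing import List, Sequence


def simple_ma(values: Sequence[float], window: int) -> List[float]:
    """Unweighted mean of the last `window` points."""
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]


def cumulative_ma(values: Sequence[float]) -> List[float]:
    """Mean of everything seen since the beginning of the series."""
    out, total = [], 0.0
    for n, v in enumerate(values, start=1):
        total += v
        out.append(total / n)
    return out


def weighted_ma(values: Sequence[float], window: int) -> List[float]:
    """Weights decrease arithmetically with age: newest=window, oldest=1."""
    denom = window * (window + 1) / 2
    out = []
    for i in range(window - 1, len(values)):
        chunk = values[i - window + 1:i + 1]
        out.append(sum(w * v for w, v in enumerate(chunk, start=1)) / denom)
    return out


def exponential_ma(values: Sequence[float], alpha: float) -> List[float]:
    """Weights decay exponentially with age; seeded with the first value
    (an assumption of this sketch)."""
    out: List[float] = []
    for v in values:
        out.append(v if not out else alpha * v + (1 - alpha) * out[-1])
    return out


if __name__ == "__main__":
    series = [1.0, 2.0, 3.0, 4.0]
    print(simple_ma(series, 2))
    print(cumulative_ma(series))
    print(weighted_ma(series, 2))
    print(exponential_ma(series, 0.5))
```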

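The Sessionization entry above describes a simple rule: a user's clicks stay in the same session as long as no more than n seconds pass between successive clicks. A minimal sketch of that rule in plain Python follows. It is not the Aster function; the names `sessionize` and `timeout`, and the `(user, timestamp_seconds)` tuple layout, are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Click = Tuple[str, float]  # (user id, timestamp in seconds)


def sessionize(clicks: Iterable[Click], timeout: float) -> List[Tuple[str, float, int]]:
    """Return (user, timestamp, session_id) rows, starting a new session
    whenever more than `timeout` seconds separate a user's clicks."""
    last_seen: Dict[str, float] = {}   # user -> timestamp of previous click
    session_of: Dict[str, int] = {}    # user -> current session id
    next_id = 0
    rows = []
    # Process each user's clicks in time order.
    for user, ts in sorted(clicks, key=lambda c: (c[0], c[1])):
        if user not in last_seen or ts - last_seen[user] > timeout:
            session_of[user] = next_id
            next_id += 1
        last_seen[user] = ts
        rows.append((user, ts, session_of[user]))
    return rows


if __name__ == "__main__":
    sample = [("u1", 0), ("u1", 20), ("u1", 100), ("u2", 5)]
    for row in sessionize(sample, timeout=30):
        print(row)
```

Sorting by user and then by timestamp plays the role that partitioning by user and ordering by time would play in a database query.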