Aster Analytics Catalog and Brief Description

Learn Aster
Teradata Employee

Please take a look at this comprehensive list of analytic operators available in Aster.

Aster 6.10
Analytic CategoryAnalyticAnalytic -- Description
Time Series, Path, and Attribution AnalysisAttributionThe attribution operator is often used in web page analysis. Companies would like to assign weights to pages before certain events, such as a 'click' or a 'buy'. This attribution function enables you to calculate attributions by using a wide range of distribution models.
Association AnalysisWSRecommenderItem-based, collaborative filtering function that uses a weighted-sum algorithm to make recommendations (for example, items or products that users should consider purchasing).
Beta FunctionsDegreesThis function generates the in-degree and out-degree tables for a directed graph. For an undirected graph, it just generates one degree table. This function can also generate the augmented edges table, which you can use in other graph functions.
Beta FunctionsRectangle_FinderThis function finds rectangles in an undirected graph. The job of enumerating rectangles (4-cycles) is similar to that of enumerating triangles.
Beta FunctionsTriangle_FinderThis function finds triangles in an undirected graph. It is a driver function that calls the triangleFinderMap and triangleFinderReduce functions. The Triangle_Finder function generates a table listing the triangles in the graph.
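To make the idea of triangle enumeration concrete, here is a minimal single-machine sketch in Python. It is not the Triangle_Finder implementation and does not follow its map/reduce split; the function name `find_triangles` and the edge-list input format are assumptions made for illustration only.

```python
from itertools import combinations


def find_triangles(edges):
    """Return every triangle (a, b, c) with a < b < c in an undirected graph."""
    neighbors = {}
    for u, v in edges:
        if u == v:
            continue  # self-loops cannot be part of a triangle
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    triangles = set()
    for u, nbrs in neighbors.items():
        # Only look at neighbors larger than u so each triangle is found once.
        higher = sorted(n for n in nbrs if n > u)
        for v, w in combinations(higher, 2):
            if w in neighbors[v]:
                triangles.add((u, v, w))
    return sorted(triangles)


# Hypothetical edge list used only for this example.
edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (3, 5), (5, 6)]
print(find_triangles(edges))
```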
Cluster AnalysisCanopyA simple, fast, accurate method for grouping objects into preliminary clusters. Each object is represented as a point in a multidimensional feature space. Canopy clustering is often used as an initial step in more rigorous clustering techniques, such as k-means clustering.
Cluster AnalysisKMeansPlotA function that clusters new data points around the cluster centroids generated by the k-Means function.
Cluster AnalysisMinhashA probabilistic clustering method that assigns a pair of users to the same cluster with probability proportional to the overlap between the set of items that these users have bought (this relationship between users and items mimics various other transactional models).
Data TransformationAntiselectReturns all columns except the columns specified.
Data TransformationApache Log ParserParses Apache log file content and extracts multiple columns of structural information, including search engines and search terms.
Data TransformationIdentityMatchTries to match enterprise customers with user records provided by external data sources.
Data TransformationIpGeoLets you map IP addresses to location information. You can use this information to identify the geographical location of a visitor. This information includes country, region, city, latitude, longitude, ZIP code, and ISP.
Data TransformationJSONParserThe JSONParser function is a tool used to extract the element name and text from JSON strings and output them into a flattened relational table.
Data TransformationMulticaseExtends the capability of the SQL CASE statement by supporting matches to multiple options. The function iterates through the input data set only once and emits a result for every match, whereas CASE emits a result for the first match only and then moves on to the next row.
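The difference between first-match and all-match semantics can be sketched in Python. This is an illustration of the behavior described above, not Aster syntax; the helper names `sql_case` and `multicase` and the `rules` list are hypothetical.

```python
def sql_case(value, rules):
    """Mimic CASE: return only the label of the first rule that matches."""
    for label, predicate in rules:
        if predicate(value):
            return [label]
    return []


def multicase(value, rules):
    """Mimic Multicase: return the label of every rule that matches."""
    return [label for label, predicate in rules if predicate(value)]


# Hypothetical, deliberately overlapping rules on an age value.
rules = [
    ("infant", lambda age: age < 1),
    ("minor", lambda age: age < 18),
    ("adult", lambda age: age >= 18),
]

print(sql_case(0, rules))   # first match only
print(multicase(0, rules))  # every match
```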
Data TransformationMurmurHashComputes the hash value of the input columns.
Data TransformationOutlierFilterRemoves outliers from a data set.
Data TransformationPackTakes data from multiple columns and packs it into a single “packed” column.
Data TransformationPivotPivots data stored in rows into columns.
Data TransformationPSTParserAFSParses Personal Storage Table (PST) files which store email in Microsoft software such as Microsoft Outlook and Microsoft Exchange Client.
Data TransformationUnpackTakes data from a single “packed” column and expands it to multiple columns.
Data TransformationUnpivotConverts columns into rows.
Data TransformationXMLRelationThe XMLRelation function is a tool for extracting most XML content (element name, text and attribute values) and structural information from XML documents into a relational table.
Graph AnalysisAllPairsShortestPathComputes the shortest distances between all combinations of the specified source and target vertices.
Graph AnalysisBetweennessDetermines betweenness, a type of centrality measurement, for every vertex in the input graph.
Graph AnalysisClosenessComputes closeness and k-degree scores for each specified source vertex in a graph.
Graph AnalysisEigenvectorCentralityCalculates the centrality (relative importance) of each node in a graph.
Graph AnalysisLocalClusteringCoefficientAnalyzes the structure of a network.
Graph AnalysisnTreeBuilds and traverses tree structures on all worker nodes.
Graph AnalysisPageRankComputes PageRank for a directed graph.
Naive BayesNaiveBayesMapGenerates a model from training data.
Naive BayesNaiveBayesPredictThis function uses the model generated by the NaiveBayesReduce function to predict the outcomes for a test set of data.
Naive BayesNaiveBayesReduceGenerates a model from training data.
Pattern MatchingnPathTeradata Aster nPath is a function for pattern matching that allows you to specify a pattern in an ordered collection of rows, specify additional conditions on the rows matching these symbols, and extract useful information from these row sequences.
Statistical AnalysisConfusionMatrixDefines a specific table layout that allows visualization of the performance of an algorithm, typically a supervised learning one.
Statistical AnalysisConfusionMatrixPlotGenerates a plot of the output of the ConfusionMatrix function as a real table with precision, recall, false alarm rate, miss rate, and fmeasure for each class, as well as the corresponding micro average value for all the classes.
Statistical AnalysisEMAVGComputes the average over a number of points in a time series while applying an exponentially decaying damping (weighting) factor to older values so that more recent values are given a heavier weight in the calculation.
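As a rough illustration of exponential weighting, the sketch below computes an exponential moving average in Python. It assumes the series is seeded with its first value and uses a single smoothing factor `alpha`; the EMAVG function's own arguments and seeding rule may differ, and the name `exponential_moving_average` is hypothetical.

```python
def exponential_moving_average(values, alpha):
    """Return the running exponential moving average of `values`.

    Assumption: the first average equals the first value; each later one is
    alpha * current + (1 - alpha) * previous average.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in the interval (0, 1]")
    averages = []
    for x in values:
        if not averages:
            averages.append(float(x))
        else:
            averages.append(alpha * x + (1 - alpha) * averages[-1])
    return averages


print(exponential_moving_average([10, 20, 30], alpha=0.5))
```

A larger `alpha` makes the average react faster to recent values; a smaller one smooths more heavily.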
Statistical AnalysisFMeasureCalculates the accuracy of a test.
Statistical AnalysisGLMPredictScores input data using the model generated by the Stats GLM function.
Statistical AnalysisHistogramCounts the number of occurrences of a given data value that fall into each of a series of user-defined bins.
Statistical AnalysisKNNUses the kNN algorithm to classify data objects based on their proximity to training objects with known classification.
Statistical AnalysisPercentileFinds percentiles on a per group basis.
Statistical AnalysisSMAVGComputes the average over a number of points in a series.
Statistical AnalysisVWAPComputes the volume-weighted average price of a traded item (usually an equity share) over a specified time interval.
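The volume-weighted average price is the total traded value divided by the total traded volume. A minimal Python sketch follows; the `(price, volume)` tuple layout and the name `vwap` are assumptions for illustration, and the time-interval handling of the Aster function is left out.

```python
def vwap(trades):
    """Volume-weighted average price of a list of (price, volume) trades."""
    total_volume = sum(volume for _, volume in trades)
    if total_volume == 0:
        raise ValueError("total traded volume must be non-zero")
    total_value = sum(price * volume for price, volume in trades)
    return total_value / total_volume


# Hypothetical trades: 100 shares at 10.0 and 300 shares at 12.0.
print(vwap([(10.0, 100), (12.0, 300)]))
```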
Statistical Analysis - Enhanced Histogram FunctionHist_MapThe Hist_Map function organizes data into bins, with automatic or manual bin breaks.
Statistical Analysis - Enhanced Histogram FunctionHist_ReduceIf you want to output multiple histograms by groups, use PARTITION BY groupby_columns instead of PARTITION BY 1. The value of groupby_columns should be the same as the value used in Hist_Map.
Statistical Analysis - LARS FunctionsLARSSelect most important variables one by one and fit the coefficients dynamically. The LARS function implements a model selection algorithm.
Statistical Analysis - LARS FunctionsLARSPredictLARSPredict takes in the new data and the model generated by LARS, and outputs the predictions.
Stream APIStreamAllows users to run scripts and functions written in various languages including Python, Ruby, Perl, C#, and R.
Text AnalysisLevenshtein DistanceComputes the Levenshtein distance between two text values, that is, the number of edits needed to transform one string into the other, where edits include insertions, deletions, or substitutions of individual characters.
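The edit-distance definition above can be sketched with the standard dynamic-programming recurrence. This is a plain Python illustration, not the Aster function; the name `levenshtein` is chosen here for the example.

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions, or substitutions
    needed to turn string `a` into string `b`."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))
```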
Text AnalysisNamed Entity Recognition (NER)Named entity recognition (NER) is the process of finding instances of specified entities in text (for example, person, location, and organization). It has functions to train, evaluate, and apply models that perform this analysis.
Text AnalysisnGramTokenizes (or splits) an input stream and emits n multi-grams based on the specified delimiter and reset parameters. This function is useful for performing sentiment analysis, topic identification, and document classification.
Text AnalysisPoSTaggerTags the parts-of-speech of input text.
Text AnalysisSentenizerExtracts the sentences in the input paragraphs.
Text AnalysisSentiment Extraction FunctionsThe sentiment extraction functions enable the process of deducing a user's opinion (positive, negative, neutral) from text-based content.
Text AnalysisText ClassifierChooses the correct class label for a given text input.
Text AnalysisText_ParserA general tool for working with text fields that can tokenize an input stream of words, optionally stem them, and then emit the individual words and the count of each word's appearances.
Text AnalysisTextChunkerDivides text into phrases and assigns each phrase a tag identifying its type.
Text AnalysisTextMorphProvides lemmatization, a basic tool in text analysis. The TextMorph function outputs a standard form of the input words.
Text AnalysisTextTaggingThe TextTagging function tags input tuples according to user-defined rules. These rules comprise logical and text processing operators.
Text AnalysisTF_IDFEvaluates the importance of a word within a specific document, weighted by the number of times the word appears in the entire corpus of documents.
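One common way to compute such a score is sketched below in Python. It assumes term frequency is the term count divided by the document length and inverse document frequency is `log(N / df)`; the TF_IDF function may use a different variant, and the name `tf_idf` and the token-list input are illustrative only.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: score} dict per document.

    Assumed variant: tf = count / document length, idf = log(N / df),
    where df is the number of documents containing the term.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document

    scores = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical, already tokenized documents.
docs = [["aster", "graph", "graph"], ["aster", "text"]]
for doc_scores in tf_idf(docs):
    print(doc_scores)
```

Under this variant, a term that appears in every document (here "aster") scores zero, because it does not help distinguish one document from another.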
Text AnalysisWMAVGComputes the average over a number of points in a time series while applying an arithmetically-decreasing weighting to older values.
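An arithmetically decreasing weighting can be sketched as follows: within a window of `window` points, the newest point gets weight `window`, the one before it `window - 1`, and so on down to 1. The Python below is an illustration under that assumption; the name `weighted_moving_average` is hypothetical and the Aster function's arguments are not reproduced.

```python
def weighted_moving_average(values, window):
    """Weighted moving average with weights 1..window (newest point heaviest)."""
    if window < 1:
        raise ValueError("window must be at least 1")
    weights = range(1, window + 1)  # oldest point gets 1, newest gets `window`
    weight_sum = sum(weights)
    averages = []
    for end in range(window, len(values) + 1):
        chunk = values[end - window:end]
        weighted = sum(w * x for w, x in zip(weights, chunk))
        averages.append(weighted / weight_sum)
    return averages


print(weighted_moving_average([3, 6, 9], window=2))
```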
Time Series, Path, and Attribution AnalysisDTWDynamic time warping (DTW) is a function that measures the similarity between two sequences that vary in time or speed.
Time Series, Path, and Attribution AnalysisDWTImplements Mallat’s algorithm, which is an iterative algorithm in the Discrete Wavelet Transform (DWT) field, and is designed to apply wavelet transform on multiple sequences simultaneously.
Time Series, Path, and Attribution AnalysisDWT2DImplements wavelet transforms on two-dimensional input, and simultaneously applies the transforms on multiple sequences.
Time Series, Path, and Attribution AnalysisFrequentPathsMines for patterns that appear more than a certain number of times in the sequence database. The difference between sequential pattern mining and frequent pattern mining is that the former works on time sequences where the order of items must be kept.
Time Series, Path, and Attribution AnalysisIDWTApplies inverse wavelet transformation on multiple sequences simultaneously. IDWT is the inverse of DWT.
Time Series, Path, and Attribution AnalysisIDWT2DSimultaneously applies inverse wavelet transforms on multiple sequences. IDWT2D is the inverse function of DWT2D.
Time Series, Path, and Attribution AnalysisPath_AnalyzerThe path_analyzer function automates path analysis. This function acts as a wrapper function of the path_generator, path_start, and path_summarizer functions. You can use this function to perform clickstream analysis of common sequences of user pageviews on websites.
Time Series, Path, and Attribution AnalysisPath_GeneratorThis function takes as input a set of paths where each path is a route (series of pageviews) taken by a user from start to end. For each path, it generates the correctly formatted sequence and all possible sub-sequences for further analysis by the Path Summarizer function. The first element in the path is the first page a user could visit. The last element of the path is the last page visited by the user.
Time Series, Path, and Attribution AnalysisPath_StartGenerates all the children for a particular parent and sums up their count. Note that the input data has to be partitioned by the parent column.
Time Series, Path, and Attribution AnalysisPath_SummarizerThe output of the Path Generator function is the input to this function. This function is used to sum counts on nodes. A “node” can be either a plain sub-sequence or an exit sub-sequence. An exit sub-sequence is one in which the sequence and the sub-sequence are the same. Exit sub-sequences are denoted by appending '$' to the end of the sequence.
Time Series, Path, and Attribution AnalysisSAXSymbolic Aggregate approXimation (SAX) transforms original time series data into symbolic strings. Once this transformation is complete, the data is more suitable for many additional types of manipulation, both because of its smaller size and the relative ease with which patterns can be identified and compared.
Time Series, Path, and Attribution AnalysisSessionizationSessionization is the process of mapping each click in a clickstream to a unique session identifier. One can define a session as a sequence of clicks by a particular user where no more than n seconds pass between successive clicks (that is, if we don't see a click from a user for n seconds, we start a new session).
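The session rule described above (start a new session when more than n seconds pass between a user's successive clicks) can be sketched in Python. The `(user, timestamp)` input layout, the per-user session counter, and the name `sessionize` are assumptions for illustration; this is not the Aster function's interface.

```python
def sessionize(clicks, timeout):
    """Assign a per-user session number to each (user, timestamp) click.

    A new session starts when more than `timeout` seconds separate a click
    from the same user's previous click. The pair (user, session number)
    serves as the unique session identifier in this sketch.
    """
    last_seen = {}  # user -> (timestamp of previous click, session number)
    result = []
    for user, ts in sorted(clicks):
        if user in last_seen:
            prev_ts, session = last_seen[user]
            if ts - prev_ts > timeout:
                session += 1
        else:
            session = 0
        last_seen[user] = (ts, session)
        result.append((user, ts, session))
    return result


# Hypothetical clickstream: timestamps are seconds since some start time.
clicks = [("u1", 0), ("u1", 20), ("u1", 100), ("u2", 5), ("u1", 110)]
for row in sessionize(clicks, timeout=30):
    print(row)
```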
Visualization FunctionsCfilterVizCfilterViz is a multiple-input partition SQL-MR function that visualizes the output of the cfilter SQL-MR function. This function uses the Sigma visualization module to generate Sigma graphs. Additionally, this function lets you specify these types: GEXF, Graphviz.
Visualization FunctionsNpathVizNpathViz is a SQL-MR function that visualizes the output of the Teradata Aster nPath SQL-MR function. NpathViz generates these visualization types: Sankey, Tree, Sigma, Chord, GEXF, and Graphviz.
Statistical AnalysisLinRegOutputs the coefficients of the linear regression model represented by the input matrices.
Statistical AnalysisGLMGLM performs linear regression analysis for any of a number of distribution functions using a user-specified distribution family and link function. Supported models in Aster Database are ordinary linear regression, logistic regression (logit model), and Poisson log-linear model.
Using the GLM function for each value of the response is equivalent to multinomial regression.
Using the GLM function for each value of the response with a proportional model is equivalent to ordinal regression.
See: http://www.theanalysisfactor.com/logistic-regression-models-for-multinomial-and-ordinal-variables/
Statistical AnalysisLARSLeast Angle Regression (LARS) and Least Absolute Shrinkage and Selection Operator (LASSO) are attractive variants of linear regression that select the most important variables, one by one, and fit the coefficients dynamically.
R-EngineIn-database Aster/R
VariousSQL or In-database Aster/R
Statistical AnalysisPCAPrincipal component analysis (PCA) is a common unsupervised learning technique that is useful for both exploratory data analysis and dimensionality reduction.
Decision TreesSingle Decision Tree FunctionsThese Single Decision Tree Functions let you create a predictive model without creating multiple decision trees. Only one decision tree is created.
Decision TreesRandom Forest FunctionsThe Random Forest Functions let you create a predictive model based on a combination of the CART algorithm for training decision trees, and the ensemble learning method of bagging.
Association AnalysisBasket_Generator/cfilterBasket_generator generates sets or “baskets” of items that occur together in records in data, typically transaction records or web page logs.
The cfilter function performs collaborative filtering, to find items or events that are frequently paired with other items or events.
Cluster AnalysisKmeans/CanopyKmeans is a simple unsupervised learning algorithm that solves the well-known clustering problem. The procedure follows a simple and easy way to classify a given data set through a certain number of clusters (assume k clusters) fixed a priori. The goal is to define k centroids, one for each cluster.
Canopy clustering is a very simple, fast, and accurate method for grouping objects into preliminary clusters. Each object is represented as a point in a multidimensional feature space.
Text AnalysisLDA FunctionsThe LDA functions build a topic model based on the supplied training data and parameters and estimate the topic distribution for each document based on the generated model. One of the LDA functions displays the readable information of the model.
Text AnalysisTextTokenizerThe TextTokenizer function extracts tokens (or text segments) from text. Examples of tokens are words, punctuation marks, and numbers.
VariousSQL, percentile, correlation, histogram, sample
Statistical AnalysisCorrelationComputes a global correlation between any pair of columns from a table.
Statistical AnalysisDistribution MatchingCarries out hypothesis testing and finds the best matching distribution for the data.
Core technologiesCompressed storage/bitwise indices
Data TransformationVarious
Miscpmml_readerThis field-developed function reads a PMML model file, applies it to a table of data, and produces predictions.
Statistical AnalysisPrincipal Component AnalysisPrincipal component analysis (PCA) is a common unsupervised learning technique that is useful for both exploratory data analysis and dimensionality reduction. It is often used as the core procedure for factor analysis.
Time Series, Path, and Attribution AnalysisCMAVGThe Cumulative Moving Average (CMAVG) function computes the average of a value from the beginning of a series.
Predictive Analysis, Text AnalysisNaive Bayes & Naive Bayes Text ClassifierDetermines the classification of data objects based on the Naive Bayes algorithm, which takes into account the classification probability based on the training data set and additional input variables.
Statistical AnalysisSupport Vector MachinesConsists of three functions: (1) SparseSVMTrainer: builds a predictive model according to a training set. (2) SparseSVMPredictor: gives a prediction for each sample in the test set. (3) SVMModelPrinter: displays the readable information of the model.
Statistical AnalysisApproximate Distinct CountComputes an approximate global distinct count of the values in the specified column or combination of columns. Based on probabilistic counting algorithms, this algorithm counts the approximate distinct values for any number of columns or combination of columns, while scanning the table only once.
Text AnalysisTF, ngram, sentenizer, lda, etcFunctions to analyze values as specific aggregate
R-EngineIn-database Aster/RAster connection to R functions
Statistical AnalysisSampleDraws rows randomly from the input relation. The function offers two sampling schemes.
VariousSQL, nc_*, vacuum, analyse, backup, etcFunctions to clean and optimize the tables within Aster
Statistical Analysis: HMMHMMUnsupervisedLearnerThe HMMUnsupervisedLearner function generates multiple HMM models simultaneously, where each model is learned from a set of time-ordered sequences, where each sequence is represented as a vertex. (Hidden Markov Model is a statistical model describing the evolution of observable events that depend on internal factors, which are not directly observable.)
Statistical Analysis: HMMHMMSupervisedLearner The HMMSupervisedLearner function generates multiple HMM models simultaneously, where each model is learned from a set of time-ordered sequences. (Hidden Markov Model is a statistical model describing the evolution of observable events that depend on internal factors, which are not directly observable.)
Statistical Analysis: HMMHMMEvaluatorThe HMMEvaluator function measures the probabilities of one or more newly observed sequences with respect to each trained HMM. (Hidden Markov Model is a statistical model describing the evolution of observable events that depend on internal factors, which are not directly observable.)
Statistical Analysis: HMMHMMDecoder The HMMDecoder function finds the state sequence with the highest probability, given the learned model and observed sequences. (Hidden Markov Model is a statistical model describing the evolution of observable events that depend on internal factors, which are not directly observable.)
Statistical Analysis VectorDistanceThis function measures the distance between sparse vectors (for example, TF-IDF vectors) in a pairwise manner.
Statistical Analysis LRTESTThis function performs the likelihood ratio test for two GLM models.
Graph Analysis ModularityThe Modularity function discovers communities in input graphs.
Graph Analysis pSalsaThe pSALSA (personalized SALSA) function is a SQL-GR function that evaluates the similarity of nodes in a bipartite graph according to their proximity. It can be used for recommendation.
Graph Analysis Shapley ValueThe Shapley Value queries and helper functions compute the Shapley Value, a measure of the value of individuals in a coalition.
Geometry GeometryLoaderThe GeometryLoader function fetches various file-based geospatial files from AFS, parses them, and stores them in Aster Database.
Geometry PointInPolygonThe PointInPolygon (Location Point in Polygon) function is a geometry function that takes as input a list of location points and a list of polygons.
Geometry GeometryOverlayThe GeometryOverlay function calculates the result of overlaying two geometries as specified by the overlay operator.
Data Transformation URIUnpackThis function breaks up a hierarchical uniform resource identifier (URI) into its constituent components and extracts the values of the parameters specified by the function.
Data Transformation URIPackThe URIPack function reconstructs encoded hierarchical URI strings that were unpacked by the URIUnpack function.
Data Transformation: Statistical Analysis/ ScalingScaleMapThis function retrieves statistical information.
Data Transformation: Statistical Analysis/ ScalingScaleThis function is a multiple-input function that generates scaled values for the entire input data set.
Data Transformation: Statistical Analysis/ ScalingScalePrinterThis function generates the statistical information for the entire data set.
Data Transformation: Statistical Analysis/ ScalingPartitionScaleThis function scales the sequences in each partition independently.
Time Series: ShapeletsShapeletMaskerThe function emits the sax_word, its index in the input time series and the result of the mask sax_code_mask. The output of the function is used to generate candidates for shapelets.
Time Series: ShapeletsShapeletFrequencyFinderThis function counts the number of times each masked sax word appears in each class representative of the time series.
Time Series: ShapeletsShapeletStrengthFinderThis function operates on the output of ShapeletFrequencyFinder and computes the distinguishing power (strength) of shapelets.
Time Series: ShapeletsShapeletFinderThis function emits the shapelets in the original, un-encoded time series format for the given training data set.