Aster nPath Guide

Teradata Aster is an analytic platform that embeds MapReduce analytic processing with data stores. •Embedded MapReduce analytic processing and unique SQL-MapReduce® framework •Massively parallel data store for multistructured data •Intuitive tools and SQL-MapReduce libraries for rapid analytic development
Teradata Employee

Aster nPath Guide

As we've seen in our introduction to the power of Aster's nPath (Using Aster Express: Act 3, On the Road to nPath), the nPath function allows you to perform regular pattern matching over a sequence of rows. With it, you can find sequences of rows that match a pattern you’ve specified and easily extract information from these matched PATTERNs using SYMBOLs that represent the matched rows in the pattern.

nPath uses regular expressions because they are simple, widely understood, and flexible enough to express most search criteria. While most uses of regular expressions focus on matching patterns in strings of text; nPath enables matching patterns in sequences of rows.

nPath is powerful and for simple examples is very easy to use.  But as we go through our nPath tutorial articles and start looking at more complex examples, I thought a quick detour would be helpful to take a deeper dive into the syntax options.

nPath Syntax

nPath, wrapped within a SQL expression, has the following syntax:



ON { table_name | view_name | ( query ) }

PARTITION BY expression [, ...]

ORDER BY expression [ ASC | DESC ] [, ...]


PATTERN ( 'pattern_of_symbols' )

SYMBOLS ( symbol_predicate AS symbol [, ...] )

RESULT ( aggregate_function( expression OF symbol ) AS alias [, ...] )

)[, ... ]

[WHERE ... ]

[GROUP BY ... ]

[HAVING ... ]

[ORDER BY ... ]

[LIMIT ... ]

[OFFSET ... ]

As you can see, there's quite a bit of power and flexibility in the nPath syntax, both in the parameters for nPath itself, as well as the SQL statement wrapper.  Let's take a closer look at the nPath parameters.

nPath Parameters

The ON clause specifies the input relation, which can be a table, view, or query. If you provide a query, you must enclose it in parentheses.

The PARTITION BY expression defines the scope of a partition of input rows over which nPath searches for pattern matches.

The ORDER BY expression specifies the sort-order of the input rows.

The MODE clause indicates whether matched PATTERNs may overlap.

  • In OVERLAPPING match mode, nPath finds every occurrence of the pattern in the partition, regardless of whether it might have been part of a previously found match.
  • In NONOVERLAPPING match mode, nPath begins the next pattern search at the row that follows the last PATTERN match.

The PATTERN clause defines the sequence of rows nPath uses for its searches.

You express the pattern using symbols and operators.

The following are examples of operators may be used in a pattern.

.                is followed by; the cascade operator.
The expression A.B means “A is followed by B”.

| or; alternative

? occurs at most once

* occurs zero or more times

+ occurs at least once.

The precedence of operators is, from highest to lowest:

    1. Cascade operator (“.”)

    2. Alternative operator (“|”)

    3. Frequency operators (“?”, “*”, “+”)

Operators with equal precedence associate left to right.

The SYMBOLS clause defines the row-elements in the pattern, expressed as a comma-separated list of symbol definitions.

Each symbol definition is written in the form “symbol_predicate AS symbol” where symbol_predicate is an SQL predicate and symbol is a case-insensitive string you’ll use to represent rows that match this predicate.

It’s common to define each symbol as just one or two uppercase letters, since short symbols are easy to read when assembled into a pattern expression.

For example, a SYMBOLS clause for analyzing website visits might look like this:


pagetype = 'homepage' AS H,

pagetype <> 'homepage' AND pagetype <> 'checkout' AS PP,

pagetype = 'checkout' AS CO


A symbol is applied to a row only if the row satisfies the symbol’s predicate. If a null value is encountered when trying to match the symbol predicate, it’s treated as a non-match.

A symbol may be associated with the predicate “true”, meaning that the symbol can match any row.

Note that the predicates for different symbols may overlap, and therefore multiple symbols may match the same row.

In your symbol predicate, you can compare the current row to a preceding row to determine if it is considered a match for the symbol.  These are called “LAG expressions”, which take the following 2 formats:

     LAG ( expression-prev, lag-rows [, default-value ] ) operator expression-current 


     expression-current operator LAG ( expression-prev, lag-rows, [ default_value ] ) 


expression-current is the name of the column from the current row, or an expression operating on this column.

the operator can be >, >=, <, <=, =, or !=

expression-prev is the name of the column from the previous row, or an expression operating on this column.

lag-rows is the number of rows to count back from the current row to reach the row we designate as the earlier row. For example, to compare with the immediately preceding row, use “1”.

default-value is the optional value to be used when there are no rows which can be designated as “earlier row,” in such a case the default-value will be evaluated on the current row and used in place of the expression-prev.


The RESULT clause defines the output columns of this nPath query as a comma-separated list of expressions.

The RESULT clause is evaluated once for each matched PATTERN in the partition. In other words, nPath generates one output row per PATTERN match.

In the RESULT clause, each expression operates on one or more symbols, and each expression is followed by the alias to be applied to this column of output.

The form of an output column definition in the RESULTS clause is:

       aggregate_function ( expression OF symbol ) AS alias 

Since each symbol represents all the rows that matched that symbol’s predicate in this particular matched PATTERN, you must specify an expression (often just the column name) to state what values you want to retrieve from the matched rows, and then apply an aggregate function to the results of that expression, in order to generate a single, useful value from the set of matched rows in the symbol.

Aggregate functions can be COUNT, SUM, MAX, MIN, AVG, and the special nPath sequence aggregates FIRST, LAST, FIRST_NOTNULL, LAST_ NOTNULL, MAX_CHOOSE, and MIN_CHOOSE over each matched pattern.

For example, imagine that we want to count how many product pages a web visitor viewed during a visit to our website.  To do this, our output column definition in the RESULT clause might look like the following.

Let’s assume we’ve defined the symbol PP to represent rows that record a user’s views of our product pages on the website:

    COUNT ( * OF PP ) AS count_product_pages_visited

Full nPath Syntax Example

To complete this quick overview, here's an nPath Query that pulls together much of the syntax options that we've covered:

ON aggregate_clicks
PARTITION BY sessionid
ORDER BY clicktime ASC
PATTERN ( 'H+.D*.X*.P1.P2+' )
SYMBOLS ( 'true' AS X,
pagetype = 'home' AS H,
pagetype <> 'home' AND pagetype <> 'checkout' AS D,
pagetype = 'checkout' AS P1,
pagetype = 'checkout' AND productprice > 100 AND productprice > LAG (productprice, 1, 100::REAL ) AS P2
FIRST ( productprice OF P1 ) AS first_product,
MAX_CHOOSE ( productprice, productname OF P2 ) AS max_product,
FIRST ( sessionid OF P2 ) AS sessionid

We'll continue adding more nPath examples in the articles to follow.  This syntax overview will hopefully be a valuable reference to help you understand the example queries and also guide you as you create your own nPath analytics on your own data.

And as always, have fun!

Tags (2)

Re: Aster nPath Guide

I've run into an issue using the LAG statement in nPath. There appears to be a bug. If I don't declare the datatype of the default value in the lag statement , LAG(page,1,'none') instead of LAG(page,1,'none'::varchar(10), I get a java error:

at com.asterdata.ncluster.sqlmr.sdk.testrunner.ValueHolde rComparator$StringComparator.compareCore(ValueHolderCo
at com.asterdata.ncluster.sqlmr.sdk.testrunner.ValueHolde rComparator$
at com.asterdata.ncluster.sqlmr.sdk.testrunner.ValueHolde rComparator$
at npath.evaluateSymbolsPredicate(
at npath.matchForwards(
at npath.operateOnPartition(
at com.asterdata.ncluster.sqlmr.internal.SwigRunner.runOp eratingOnPartitionedInput(
at com.asterdata.ncluster.sqlmr.internal.SwigRunner.runOp eratingTask(
at com.asterdata.ncluster.sqlmr.internal.SwigRunner.runTa sk(
at com.asterdata.ncluster.sqlmr.internal.FunctionThread.r unFunction(
at com.asterdata.ncluster.sqlmr.internal.FunctionThread.r un(
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask (
at java.util.concurrent.ThreadPoolExecutor$