Aster

Teradata Aster is an analytic platform that embeds MapReduce analytic processing with data stores.
•Embedded MapReduce analytic processing and unique SQL-MapReduce® framework
•Massively parallel data store for multistructured data
•Intuitive tools and SQL-MapReduce libraries for rapid analytic development

turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

06-18-2012
10:50 AM

06-18-2012
10:50 AM

This presentation will cover some of the finer points on Aster’s nPath function. Here’s the overview:

- Working with
**ACCUMULATE** - Understanding
**MODE**(overlapping | nonoverlapping) - Experimenting with Operator precedence and parentheses
- Number of occurrences options
- Using Multiple
**WHERE**clauses and composite PATTERNs

If you haven’t read Teradata Mike’s Aster nPath Guide first, now is a good time. It provides a great foundation of these concepts and is compulsory before we dig into the more advanced stuff. However, at the risk of redundancy, here’s the nPath Operators we'll be using in our labs.

Operators | Definition |

| Is followed by. Sometimes called the Cascade operator. The expression A.B means ‘A followed by B’ |

| | Or. Sometimes called Alternative operator |

| Occurs at most once (0 to 1 occurrence) |

| Occurs zero or more times ( >=0 occurrence) |

| Occurs at least once (>=1 occurrence) |

It worth noting the Operators above are 'greedy' in nPath. This means they are matched from left-to-right and as many rows as possible that can fit in the wildcards will be. You will see examples of this below.

We will be using the **ACCUMULATE** aggregate in our **RESULTS** clause so it’s important we understand how it works (Note this keyword can be used in a number of Aster functions so it’s not just limited to nPath). **ACCUMULATE **provides a list of input columns that will be passed as-is to the result set when conditions are True for that Pattern. It will be in the form of a comma delimited string of sequenced values enclosed in brackets. I use it for troubleshooting when I want to understand the nPath logic. Suppose you have the following table:

c1 | c2

-----|-----

1 | a

1 | a

1 | b

1 | b

1 | c

1 | c

2 | a

2 | a

2 | b

2 | b

2 | c

2 | c

You run the following nPath code:

select * from npath

(on (select * from overlap)partition by c1 order by c2

mode (nonoverlapping)

pattern ('a.b')

symbols (c2='a' as A, c2='b' as B) -- using lower case to match data

result (count(* of any(A,B)) as Ct_Any_AB, accumulate(c2 of any(A,B))));

The result set is:

ct_any_ab | accumulate

---------------|------------------

2 | [a, b]

2 | [a, b]

Basically nPath starts walking the partitioned ordered rows in top-to-bottom fashion looking for True conditions to return in the answer set. In our example, the nPath mechanics works as follows:

**Row 3** – False (no match between row3 and row4). Move on to row 4

**Row 4** – True (found match between row 4 and row 5) with RESULTS = 2 [a, b]. Move on to row 6

**Row 6 **– False (no match between row 6 and row 7). Move on to row 7 ......

I think you get the idea by now. Eventually row 10 and row 11 would result in another True which would return the second 2 [a, b].

So that’s a quick overview of how nPATH’s **ACCUMULATE **works. I don’t want to muddy the waters, but I would be amiss without mentioning a few things in the above example.

- Unlike Teradata, Aster is case-specific. If I would have replaced the SYMBOLS line with this: symbols(c2='
**A**' as A, c2='**B**' as B) then the result set would have been empty since there would have been no matches. So be extremely careful with case in your code; and for that matter when INSERTING data

- Notice in the sequence above that once a True was found in Row 2, the next row number to be checked was Row 4, and not Row 3. That was due to the
**mode(nonoverlapping)**clause. We delve into this topic next but just wanted to point it out

OK, we’ve got our feet wet. It’s time to dive into some more detailed examples. Let’s go.

The **MODE **clause indicates whether the matched PATTERNs may overlap. You get to choose between **OVERLAPPING **or **NONOVERLAPPING **as shown below.

select * from npath (on (select * from overlap )

partition by c1 order by c2

mode (NONOVERLAPPING | OVERLAPPING) -- only get to pick one

pattern ('A+.B.B.C')

symbols(c2='a' as A,c2='b' as B,c2='c' as C)

result (first(c1 of ANY(A,B,C)), accumulate(c2 of ANY(A,B,C))));

In **NONOVERLAP** match mode, nPath begins the next pattern search at the row that follows the last PATTERN match (in our case, next pattern search would start at row 8).

In **OVERLAPPING** match mode, nPath finds every occurrence of the pattern in the partition, regardless of whether it might have been part of a previously found match. This means that, in OVERLAPPING mode, one row can match multiple symbols in a given matched PATTERN (in this case the next pattern match starts at row 4).

But don’t take my word for it (I’m from Missouri so you have to ‘Show Me’). Let’s do the lab using the same 12-row data set we used earlier. Since my pattern is **('A+.B.B.C')**, this means we are looking for a PATTERN of at least 1 A, followed by 2 B’s, followed by a C.

With the code running **mode (NONOVERLAPPING)** , here’s the output.

first | accumulate

----------|-----------------------

2 | [a, a, b, b, c]

1 | [a, a, b, b, c]

Repeat, only this time using **mode (OVERLAPPING)** you get 4 rows for the output.

first | accumulate

-----------|-----------------------

2 | [a, b, b, c]

2 | [a , a, b, b, c]

1 | [a, b, b, c]

1 | [a , a, b, b, c]

Of course, **OVERLAPPING **will not guarantee you have more output rows than **NONOVERLAPPING**. It all depends on the data. Suppose I were to run the above code with a slightly different pattern as follows:

**pattern ('A.B.B.C') **then I would get the same answer regardless of mode type since there’s only one match in sequence.

accumulate

----------------------

[a, b, b, c]

It is crucial to know the precedence of Operators from highest to lowest. Here it is:

- Cascade ( . )
- Alternate ( | )
- Frequency ( ?, *, + )

If you wish to change order of execution, parenthesis is the way to do it.

Here’s the table we’ll be running our queries against:

I want to create a query that finds all combinations of **A,B** or **A,C**. Looking at the table above, I figured I would get 3 hits (Row 1-2, Row 4-5, Row 6-7). Here's the code:

select * from npath

(on (select * from npathBetween)

partition by c2 order by c1 mode (nonoverlapping)

pattern ('A.B|C') -- notice the lack of parens here

symbols(c3='A' as A, c3='B' as B, c3='C' as C)

result(count(* of A) as Ct_A,

accumulate(c3 of any(A,B)) as Any_AB,

accumulate(c3 of any(A,C)) as Any_AC));

Hmmm. I only had 2 hits (A,B) and was missing the (A,C) hit. I could not understand why I did not have the A,C combination in my result set. After going back to the User’s Guide, I figured it out. I had forgotten that (** .** ) takes precedence over (** |** ) . So my logic was looking for PATTERNS of **A.B** rows or **C** rows as follows:

**Row 1-2** is A.B Match (A.B)

**Row 3** is C Match (C)

**Row 4-5** is A.B Match (A.B)

**Row 6** is A (with no following B) No Match

**Row 7** is C Match (C)

… when I was really wanted PATTERNS of **A.B** or **A.C**. To fix the problem, I added paretheses to the code:

select * from npath

(on (select * from npathBetween)

partition by c2 order by c1

mode (nonoverlapping)

pattern ('(A.(B|C))') -- add Parens

symbols(c3='A' as A,c3='B' as B,c3='C' as C)

result(count(* of A) as Ct_A, -- Ct=2

count(* of C) as Ct_C, -- Ct=2

accumulate(c3 of any(A,B)) as Any_AB,

accumulate(c3 of any(A,C)) as Any_AC));

Sure enough I got my 3 hits. 2 hits for A,B pattern and 1 for A,C pattern.

Ahh, that’s more like it. Here you can see the 3 hits of 2 **A,B** and 1 **A,C**. The bottom line; it is a best practice to use parentheses in your PATTERN string. It takes away any ambiguity that may exist and makes it more readable for others looking at your code.

If you wish to search for an exact number of occurrences that totals more than one, the nPath coding can become tedious. There are 3 options you can use:

- ' (x){a} ' Exactly A number of occurrences of X
- ' (x){a,} ' At least A number of occurrences of X
- ' (x){a,b} ' A to B occurrences of X

For example, suppose you wish to search for sub-sequence of (A.B|C) at least 4 times. You could code it as: ** PATTERN ( 'X. (Y.Z) . (A.B|C) . (A.B|C) . (A.B|C) . (A.B|C) * ' )** but that’s pretty messy.

An alternative is the following syntax: **PATTERN ( 'X. (Y.Z) . (A.B|C) {4,} ' )**

The only thing to be aware of is the need for your occurring PATTERN to be in enclosed in parentheses with the single tics on the outside of the entire PATTERN. Here’s an example of doing it the right way: **PATTERN ('X.(A.(B|C)){1,3}') **

It is possible to have multiple **WHERE **clauses in your nPath code. Their position in the statement determines if they filter INPUT rows or OUTPUT rows.

The below **WHERE **clause in the second SELECT filters **INPUT **rows where c1 = 1 or c1 =2. That’s simple enough.

select * from npath

(on (select * from overlap WHERE c1 in (1,2))

partition by c1 order by c2 mode (overlapping)

pattern ('A?.B?.C?')

symbols (c2='a' as A,c2='b' as B,c2='c' as C)

result(first(c1 of ANY(A,B,C)), accumulate(c2 of ANY(A,B,C))));

I'm a big believer in using WHERE to filter an many Input rows as possible. It's a much better performer than scanning the entire Table's rows.

If you wanted to later filter **Output **rows, you can do that with a 2^{nd} **WHERE **clause as demonstrated.

select * from npath

(on (select * from overlap WHERE c1 in (1,2))

partition by c1 order by c2 mode (overlapping)

pattern ('A?.B?.C?')

symbols (c2='a' as A, c2='b' as B, c2='c' as C)

result(first(c1 of ANY(A,B,C)), accumulate(c2 of ANY(A,B,C))as After_nPath))

WHERE After_nPath = ('[a, b]') or After_nPath = ('[b, c]');

Note I could have accomplished the same thing by having composite PATTERN search instead of using the 2nd **WHERE **clause. Following code produces the same output as above.

select * from npath

(on (select * from overlap WHERE c1 in (1,2))

partition by c1 order by c2 mode (overlapping)

pattern ('(A.B) | (B.C)')

symbols (c2='a' as A, c2='b' as B, c2='c' as C)

result(first(c1 of ANY(A,B,C)), accumulate(c2 of ANY(A,B,C))));

Ensure you have space after the comma ( ie: ** [a, b]** ). If you write it as **[a,b]** , it will return 0 rows.

Be extremely careful when writing composite PATTERNS as I have notice (what I think) is a bug in the code. If you were to write as: **PATTERN('A.B' | 'B.C')**, you get back a message along the lines '**The connection to the server was lost**.' You will not get a result set until you fix the code. I believe this is because you must have only one set of single tics in PATTERN.

As you can see, there’s lot of functionality baked into the nPath. I consider it one of Aster’s premier functions since it has so many uses and can provide deep insights into customer’s behavior such as buying trends and patterns.

Keep your eyes peeled for more nPath articles. In particular, we’ll be talking about the **LAG **function next which can be embedded in **SYMBOLS **clause. It can answer such questions as: “Find customers who view a product and then buy it with 1 minute on my web page.” That would be a valuable piece of information if I were a retailer.

And if you are interested in running these labs now, here's the **CREATE TABLE **statment and **INSERTs **to help you along:

create table overlap (c1 int, c2 text) distribute by hash(c1);

insert into overlap values (1, 'a');

insert into overlap values (1, 'a');

insert into overlap values (1, 'b');

insert into overlap values (1, 'b');

insert into overlap values (1, 'c');

insert into overlap values (1, 'c');

insert into overlap values (2, 'a');

insert into overlap values (2, 'a');

insert into overlap values (2, 'b');

insert into overlap values (2, 'b');

insert into overlap values (2, 'c');

insert into overlap values (2, 'c');

11 REPLIES

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

04-20-2013
07:28 AM

04-20-2013
07:28 AM

Hi Mark,

My customer has just setup a Terdata Aster POC environement. I have fairly good knowledge about Teradata working as a software engineer in Teradata Labs in the past and now as a full time DBA but I am pretty new to Aster Technology. Could you please suggest some reference document as a starting point to understand Aster Architecture. Is there anything similar to Teradata reference documentation in Aster?

Thanks,

Ravi

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

04-20-2013
08:53 AM

04-20-2013
08:53 AM

Go to http://info.teradata.com to download the Aster user manuals.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

04-26-2013
07:45 AM

04-26-2013
07:45 AM

cannot run teh query when u have a select * frpom npathBetween in the query, is this a function or a keyword in aster or a table?

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

04-26-2013
07:53 AM

04-26-2013
07:53 AM

The FROM clause is where the function name resides. In our examples above, the FROM will always have 'npath' it. The ON clause is where you specify the Input table. So NPATH is constant and the ON clause can be any table name you wish to specify.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

04-26-2013
09:57 AM

04-26-2013
09:57 AM

Sorry if i was not clear in what i was asking, th eexample which was listed above

select * from npath

(on (select * from npathBetween)

partition by c2 order by c1

mode (nonoverlapping)

pattern ('(A.(B|C))') -- add Parens

symbols(c3='A' as A,c3='B' as B,c3='C' as C)

result(count(* of A) as Ct_A, -- Ct=2

count(* of C) as Ct_C, -- Ct=2

accumulate(c3 of any(A,B)) as Any_AB,

accumulate(c3 of any(A,C)) as Any_AC));

when i execute it i get the error saying that

ERROR: relation "npathbetween" does not exist

why?

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

05-27-2013
04:41 PM

05-27-2013
04:41 PM

The user of the previous question might have figured out by now that there is a missing table not explicitly mentioned/provided in the tutorial above - for the impatient the missing table/data is:

create table npathBetween (c1 int, c2 int, c3 text) distribute by hash(c1);

insert into npathBetween values (1, 1,'A');

insert into npathBetween values (2, 1,'B');

insert into npathBetween values (3, 1,'C');

insert into npathBetween values (4, 1,'A');

insert into npathBetween values (5, 1,'B');

insert into npathBetween values (6, 1,'A');

insert into npathBetween values (7, 1,'C');

{cf. sect. 3 above}

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

06-18-2013
05:02 AM

06-18-2013
05:02 AM

Hi Mark,

Could you give examples of result and accumulate more clearly and mode aswell.

Thanx

Marshal

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

06-18-2013
07:20 AM

06-18-2013
07:20 AM

Instead of re-creating the wheel, the best source of information for nPath is located at:

Download the Aster Analytic Functions User guide. It's all there.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

12-27-2013
01:36 AM

12-27-2013
01:36 AM

I have used the below code:-

1 2 3 4 5 6 | `select` `* ` `from` `npath` `(` `on` `(` `select` `* ` `from` `overlap)partition ` `by` `c1 ` `order` `by` `c2 ` `mode (nonoverlapping)` `pattern (` `'a.b'` `)` `symbols (c2=` `'a'` `as` `A, c2=` `'b'` `as` `B) ` `-- using lower case to match data` `result (` `count` `(* ` `of` `any` `(A,B)) ` `as` `Ct_Any_AB, accumulate(c2 ` `of` `any` `(A,B))));` |

But is giving me the following error:-

__ERROR__: The supplied statement could not be completed due to an internal error at databse worker node.

Please help me resolve this issue.

Thanks

Anuj

Copyright © 2004-2015 Teradata Corporation. Your use of this Teradata website is governed by the Privacy Policy and the Terms of Use, including your rights to materials on this website, the rights you grant to your submissions to this website, and your responsibilities regarding your conduct on this website.

The Privacy Policy and Terms of Use for this Teradata website changed effective September 8, 2016.