nPath: OVERLAPPING vs. NONOVERLAPPING

Learn Data Science
Teradata Employee

nPath is a very powerful sequential time-series analytic for pathing and pattern analysis. One of the most common questions I receive, and one of the most confusing, is: what is the difference between MODE (OVERLAPPING) and MODE (NONOVERLAPPING)? In the example statement later in this post, the MODE (NONOVERLAPPING) alternative appears as a comment next to MODE (OVERLAPPING). In this blog I will walk through the data and show you how the two modes differ.

It is very easy to understand if you think of the concept of shingling. Yes, shingling as in text analytics, and shingling as on the roof of a house, especially one in south Florida where the shingles are stacked on top of each other. OVERLAPPING and NONOVERLAPPING work in a very similar way: OVERLAPPING SUPPORTS SHINGLED RESULTS. NONOVERLAPPING DOES NOT. The difference is drastic and can really change the size of your result set.
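If the text-analytics flavor of shingling is new to you, here is a minimal Python sketch of the idea. The helper names `shingles` and `chunks` are hypothetical, chosen only for illustration, and the fixed window size is a simplification: nPath matches are not fixed-length, so treat this as an analogy rather than a description of how nPath works internally.

```python
def shingles(tokens, size):
    """Overlapping windows: slide forward one token at a time."""
    return [tokens[i:i + size] for i in range(len(tokens) - size + 1)]


def chunks(tokens, size):
    """Non-overlapping windows: jump forward a full window at a time."""
    return [tokens[i:i + size] for i in range(0, len(tokens) - size + 1, size)]


# Illustrative click sequence (not the full sample partition).
tokens = ["search", "search_results", "view_product", "checkout"]

print(shingles(tokens, 2))
# [['search', 'search_results'], ['search_results', 'view_product'], ['view_product', 'checkout']]

print(chunks(tokens, 2))
# [['search', 'search_results'], ['view_product', 'checkout']]
```

In the shingled output every interior token shows up in more than one window. In the chunked output each token is used exactly once.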

Please review the source data for our nPath query. This is a simulated retail clickstream dataset. It shows one partition, that is, the clicks for a single cookie and session. Each row is a page that was clicked, with the date and time the click occurred. Ignore product_id and search_word for this example.

cookie_id, session_id | product_id | page           | search_word | datestamp
360                   | 1005       | view_product   |             | 6/11/2009 13:29
360                   | 0          | search         |             | 6/11/2009 13:34
360                   | 0          | search_results |             | 6/11/2009 13:38
360                   | 1005       | search_results |             | 6/11/2009 13:41
360                   | 1005       | search_results |             | 6/11/2009 13:43
360                   | 1003       | view_product   |             | 6/11/2009 13:45
360                   | 1003       | checkout       |             | 6/11/2009 13:46
360                   | 1003       | thank_you      |             | 6/11/2009 13:49
360                   | 1002       | view_product   |             | 6/11/2009 13:51

See the nPath example below. It looks for patterns within each partition (cookie_id and session_id) using PATTERN ('PAGE+.CONVERSION'). The symbol PAGE matches any row whose page value is not 'home'. The symbol CONVERSION matches any row where page = 'checkout'. So a match must contain at least one PAGE row, followed by and ending with a CONVERSION row. Note that a 'checkout' row also satisfies the PAGE condition, so it can match either symbol. The nPath query uses the datestamp field to sort the clicks in ascending order.

SELECT cookie_id, session_id, path
FROM nPath(
    ON retail_web_clicks
    PARTITION BY cookie_id, session_id
    ORDER BY datestamp
    MODE (OVERLAPPING)  -- or MODE (NONOVERLAPPING)
    PATTERN ('PAGE+.CONVERSION')
    SYMBOLS (
        page <> 'home' AS PAGE,
        page = 'checkout' AS CONVERSION
    )
    RESULT (
        FIRST (cookie_id OF PAGE) AS cookie_id,
        FIRST (session_id OF PAGE) AS session_id,
        ACCUMULATE (page OF ANY (PAGE, CONVERSION)) AS path
    )
) n;

If we use MODE (OVERLAPPING), nPath returns every path that leads to the checkout click event, one for each possible starting row. Notice that the results are repeating sets of shingles: each path is the previous one minus its first click. They overlap.

OVERLAPPING
cookie_id, session_id | path
360                   | [view_product, search, search_results, search_results, search_results, view_product, checkout]
360                   | [search, search_results, search_results, search_results, view_product, checkout]
360                   | [search_results, search_results, search_results, view_product, checkout]
360                   | [search_results, search_results, view_product, checkout]
360                   | [search_results, view_product, checkout]
360                   | [view_product, checkout]

Now let's look at the MODE (NONOVERLAPPING) result from the same data set. Notice that it includes only the full path, not the shingled sub-paths contained within it.

NONOVERLAPPING
cookie_id, session_id | path
360                   | [view_product, search, search_results, search_results, search_results, view_product, checkout]
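To see why one mode returns six paths and the other returns one, here is a rough Python simulation of the two modes on the page values from the sample partition. This is only a sketch and not how nPath is implemented. The function names (`match_from`, `find_paths`) are hypothetical, the matcher is hard-coded for the pattern PAGE+.CONVERSION, and it assumes the longest match is taken from each starting row.

```python
# Page values for the sample partition, already sorted by datestamp.
PAGES = [
    "view_product", "search", "search_results", "search_results",
    "search_results", "view_product", "checkout", "thank_you", "view_product",
]


def is_page(page):
    """PAGE symbol: page <> 'home'."""
    return page != "home"


def is_conversion(page):
    """CONVERSION symbol: page = 'checkout'."""
    return page == "checkout"


def match_from(pages, start):
    """Return the end index of the longest PAGE+.CONVERSION match
    beginning at `start`, or None if no match begins there."""
    best = None
    i = start
    while i < len(pages) and is_page(pages[i]):
        nxt = i + 1
        if nxt < len(pages) and is_conversion(pages[nxt]):
            best = nxt  # rows start..i are PAGE, row nxt is CONVERSION
        i += 1
    return best


def find_paths(pages, overlapping):
    """Collect matched paths; only the restart position differs by mode."""
    paths = []
    start = 0
    while start < len(pages):
        end = match_from(pages, start)
        if end is None:
            start += 1
            continue
        paths.append(pages[start:end + 1])
        # OVERLAPPING: retry from the very next row.
        # NONOVERLAPPING: resume after the last row of the match.
        start = start + 1 if overlapping else end + 1
    return paths


print(len(find_paths(PAGES, overlapping=True)))   # 6
print(len(find_paths(PAGES, overlapping=False)))  # 1
```

The only line that changes between the two modes is where the search restarts, and that one choice is the difference between one row and six rows in the result.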

Just to be complete, let's include the explanation from the Analytics Foundation Guide:

MODE Clause

The MODE clause indicates whether matched PATTERNs may overlap. After finding one sequence of rows that matches the specified pattern, Teradata Aster nPath looks for the next match.  To begin the next pattern search, the choice of the starting row depends on the match mode you have chosen:

• In OVERLAPPING match mode, Teradata Aster nPath finds every occurrence of the pattern in the partition, regardless of whether it might have been part of a previously found match. This means that, in OVERLAPPING mode, one row can match multiple symbols in a given matched PATTERN.

• In NONOVERLAPPING match mode, Teradata Aster nPath begins the next pattern search at the row that follows the last PATTERN match. Note that this is the default behavior of many commonly used pattern matching utilities like the popular grep utility in UNIX systems.
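The grep comparison in the guide can be made concrete with an ordinary regular expression. The sketch below is an analogy under my own assumptions: each click is encoded as a single letter (a hypothetical `encode` helper), and PAGE+.CONVERSION is translated to the regex `[PC]+C`, because a 'checkout' row also satisfies the PAGE condition. Python's `re.finditer` returns non-overlapping matches, like grep, and wrapping the pattern in a lookahead makes it try every starting position.

```python
import re


def encode(page):
    """Map a page value to one letter so a click path becomes a string."""
    if page == "home":
        return "H"
    if page == "checkout":
        return "C"
    return "P"


PAGES = [
    "view_product", "search", "search_results", "search_results",
    "search_results", "view_product", "checkout", "thank_you", "view_product",
]
encoded = "".join(encode(p) for p in PAGES)  # "PPPPPPCPP"

# PAGE is anything but 'home' (P or C); CONVERSION is 'checkout' (C).
PATTERN = r"[PC]+C"

# Non-overlapping: the scan resumes after the end of each match.
nonoverlapping = [m.group(0) for m in re.finditer(PATTERN, encoded)]

# Overlapping: a zero-width lookahead retries the pattern at every position.
overlapping = [m.group(1) for m in re.finditer(r"(?=(" + PATTERN + r"))", encoded)]

print(nonoverlapping)  # ['PPPPPPC']
print(overlapping)     # ['PPPPPPC', 'PPPPPC', 'PPPPC', 'PPPC', 'PPC', 'PC']
```

The six overlapping strings line up with the six OVERLAPPING paths shown earlier, and the single non-overlapping string corresponds to the one NONOVERLAPPING path.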

Cheers:

Thuma