npath failed unexpectedly

Aster
Not applicable

npath failed unexpectedly

My nPath Query keeps failing with a cryptic error message:

SQL-MR function NPATH failed unexpectedly. The following is information that may be useful to the developer of NPATH: java.lang.IllegalStateException: Db can't be registered more than once at com.asterdata.ncluster.sqlmr.util.dbc.DbFactoryProviderImpl$UserMemInfo.register(DbFactoryProviderImpl.java:637) at com.asterdata.ncluster.sqlmr.util.dbc.DbFactoryProviderImpl$SnapshotSortedMapDbImpl.open(DbFactoryProviderImpl.java:177) at com.asterdata.ncluster.util.kvstore.KVLinkedList.open(KVLinkedList.java:188) at com.asterdata.ncluster.aggregator.accumulatecdistinct.AccumulateCDistinct.<init>(AccumulateCDistinct.java:72) at ColumnHandle.newAggFunction(ColumnHandle.java:783) at ColumnHandle.update(ColumnHandle.java:630) at HybridStateResult.updateResultCol(HybridStateResult.java:236) at npath.updateResult(npath.java:1185) at npath.matchForwards(npath.java:1697) at npath.patternMatch(npath.java:2101) at npath.operateOnMultipleInputs(npath.java:506) at com.asterdata.ncluster.sqlmr.internal.SwigRunner.runOperatingOnMultipleInputs(SwigRunner.java:990) at com.asterdata.ncluster.sqlmr.internal.SwigRunner.runOperatingTask(SwigRunner.java:848) at com.asterdata.ncluster.sqlmr.internal.SwigRunner.run(SwigRunner.java:663) at com.asterdata.ncluster.sqlmr.internal.FunctionThread.runFunction(FunctionThread.java:123) at com.asterdata.ncluster.sqlmr.internal.FunctionThread.run(FunctionThread.java:59) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) In addition, more information may be available in the 'SQL-MapReduce Logs' portion of the Aster nCluster Management Console (click 'View Logs' from the Process Detail page).

When running the very same Query on a small selection of the Data, it runs fine. It is only when I run it on the full Dataset (~350 million rows) that it fails.

Here's the query:

DROP TABLE IF EXISTS schweizerjul.prips_npath_pdhdz_220517;
 
CREATE TABLE schweizerjul.prips_npath_pdhdz_220517
 DISTRIBUTE BY REPLICATION
 AS
 SELECT
    *
from
nPath (
    ON schweizerjul.prips_sth_large
    PARTITION BY avis_sendungs_id, year
    ORDER BY sth_scan_zeitpunkt
    SYMBOLS (
        sth_sto_nummer = 8 AND sth_stg_nummer = 1 and sth_stg_sds_nummer = 45 AS PE,
        sth_sto_nummer = 5 AND sth_stg_nummer = 2 and sth_stg_sds_nummer = 65 AS PA,
        sth_sto_nummer = 3 AND sth_stg_sds_nummer = 65 AS DR,                      
        sth_sto_nummer = 6 AND sth_stg_sds_nummer = 30 AS H,
        sth_sto_nummer = 3 AND sth_stg_sds_nummer = 45  AS DZ,
        sth_stg_sds_nummer = 80 AS L,
        sth_sto_nummer = 2 AS S,
        sth_sto_nummer = 4 and sth_stg_sds_nummer = 90 AS TZ,
        sth_sto_nummer = 4 and sth_stg_sds_nummer = 60 as TX,
        sth_sto_nummer = 8 and sth_stg_sds_nummer = 90 AS PZ
        )
    PATTERN('PE.PA.(DR|DZ|H|S)*.(TZ|TX|PZ)')
    RESULT(
        FIRST(avis_sendungs_id OF ANY (PE, PA, DR, DZ,H, TZ, TX, PZ)) as sendungs_id,
        FIRST(year OF ANY  (PE, PA, DR, DZ,H, TZ, TX, PZ)) as year,
        FIRST(avis_min_scan_zeitpunkt OF ANY(PE, PA, DR, DZ,H, TZ, TX, PZ)) as min_scan_zeitpunkt,
        FIRST(CAST(sth_scan_zeitpunkt as date) OF ANY(PE)) as PE_date,
        FIRST(CAST(sth_scan_zeitpunkt as date) OF ANY(TZ, TX, PZ)) as ZSV_date,
        ACCUMULATE (CDISTINCT sth_sto_nummer OF ANY (PE, PA, DR, DZ,H, TZ, TX, PZ)) AS path,
        FIRST(CAST(sth_scan_zeitpunkt AS date) - CAST(avis_min_scan_zeitpunkt AS date) OF ANY(TZ, TX, PZ)) as rundays,
        FIRST(sth_scan_zeitpunkt - avis_min_scan_zeitpunkt OF ANY(PE)) as PE,
        FIRST(sth_scan_zeitpunkt - avis_min_scan_zeitpunkt OF ANY(PA)) as PA,
        FIRST(sth_scan_zeitpunkt - avis_min_scan_zeitpunkt OF ANY(DR)) as DR,        
        FIRST(sth_scan_zeitpunkt - avis_min_scan_zeitpunkt OF ANY(H)) as H,       
        FIRST(sth_scan_zeitpunkt - avis_min_scan_zeitpunkt OF ANY(DZ)) as DZ,      
        FIRST(sth_scan_zeitpunkt - avis_min_scan_zeitpunkt OF ANY(TZ, TX, PZ)) as ZSV,
        FIRST(sth_org_id OF ANY(PE)) as oid_PS,
        FIRST(sth_org_id OF ANY(DR)) as oid_DR,
        FIRST(sth_org_id OF ANY(H)) as oid_H,
        FIRST(sth_org_id OF ANY(DZ)) as oid_DZ
        )
    MODE (NONOVERLAPPING)
    )
WHERE
path = '[8, 5, 3, 6, 3, 4]' AND
pe = '00:00:00' AND
-- Check whether holidays are in the runtime of the parcel
-- using http://stackoverflow.com/questions/325933/determine-whether-two-date-ranges-overlap
(    ZSV_date <  '03/25/2016' OR
    (PE_date > '03/28/2016' AND ZSV_date <  '05/01/2016') OR
    (PE_date > '05/05/2016' AND ZSV_date < '05/15/2016') OR
    (PE_date > '05/16/2016' AND ZSV_date < '05/26/2016') OR
    (PE_date > '05/26/2016' AND ZSV_date < '10/03/2016') OR
    (PE_date > '10/03/2016' AND ZSV_date < '10/31/2016') OR
    (PE_date > '11/02/2016' AND ZSV_date < '12/24/2016') OR
    (PE_date > '01/06/2017'))
    ;

The use case is that I investigate how long parcels take for various steps of a delivery process. I use nPath to gather the info for each parcel individually. Each individual Parcel's record consists of 10 to 20 rows, so there is not a huge skew.

5 REPLIES 5
Teradata Employee

Re: npath failed unexpectedly

What's the query ? I think you might have a big skew in the partition by <column> where the reducer is running out of memory - just a hunch. Worth checking.

Not applicable

Re: npath failed unexpectedly

Hi, I expanded my questions by providing the query and background information on the use case. I don't think I have a huge skew, but I will try to re-run the query and actively monitor the memory usage.

Anyhow, should a MR Algorithm not be independent from available memory at any given node? Or do I lack insight into the inner workings of nPath?

Teradata Employee

Re: npath failed unexpectedly

MR works partition by partition. So it will not use the memory for the entire query. However, I do not know what your setup is, hence my hunch. What's your size of your cluster and how many nodes/memory do you have? Can you confirm? 

Also, the error seems to occur when CDISTINCT  is used which seems to be a bug. Can you remove CDISTINCT and see if the query runs?

Not applicable

Re: npath failed unexpectedly

If I remove CDISTINCT, the query runs indeed. Is there any timeframe within which we can expect the bug to be removed and are fixes rolled out to users of older versions of the nCluster software, too?

Highlighted
Teradata Employee

Re: npath failed unexpectedly

I suggest you file a TAYS ticket (support ticket) and it will make through our roadmap, depending on the urgency of this request.