Function quantile throws a numeric overflow error in a large table

I have a large table that holds about 33 billion records. I need to run a simple quantile function, but it fails with a numeric overflow error.

SELECT acct_id, QUANTILE(10, acct_id) AS decile FROM MY_TABLE;

I understand the default numeric data type in Teradata is INTEGER; with a table this size I need BIGINT. I know how to add a CAST for other functions like COUNT or SUM, but I don't know how to do a CAST for the QUANTILE function. Any suggestions are appreciated.
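For reference, this is the kind of CAST I mean for COUNT (illustrative only, using the same table name as above):

SELECT CAST(COUNT(*) AS BIGINT) AS row_cnt FROM MY_TABLE;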

3 REPLIES
Teradata Employee

Re: Function quantile throws a numeric overflow error in a large table

QUANTILE is deprecated; Teradata recommends you use standard SQL instead:

(RANK() OVER (ORDER BY ACCT_ID) -1)*10/COUNT(*) OVER() as decile
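Plugged into your original query, that would look something like this (a sketch using the same table and column names):

SELECT acct_id,
       (RANK() OVER (ORDER BY acct_id) - 1) * 10 / COUNT(*) OVER () AS decile
FROM MY_TABLE;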

But that doesn't solve your issue. Like QUANTILE, RANK is internally limited to INTEGER. You could potentially implement your own window aggregate UDF, or you can do it in two OLAP stages using functions such as SUM that permit larger values:

SELECT acct_id,
       (SUM(CAST(CASE WHEN acct_id = prev_acct_id THEN 0 ELSE 1 END AS BIGINT))
            OVER (ORDER BY acct_id, prev_acct_id ROWS UNBOUNDED PRECEDING) /* computes Dense_Rank */
        - 1) * 10
       / SUM(CAST(1 AS BIGINT)) OVER () AS decile
FROM (
      SELECT acct_id,
             MIN(acct_id) OVER (ORDER BY acct_id
                                ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS prev_acct_id
      FROM MY_TABLE
     ) AS dt;

Junior Contributor

Re: Function quantile throws a numeric overflow error in a large table

Check whether you can avoid the two OLAP steps: a plain SUM(CAST(1 AS BIGINT)) OVER (...) running total might be OK (though if the data is not unique, rows with the same value may end up in two different quantiles).
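Something along these lines (a sketch, ordering by acct_id as in the original query):

SELECT acct_id,
       (SUM(CAST(1 AS BIGINT))
            OVER (ORDER BY acct_id ROWS UNBOUNDED PRECEDING) - 1) * 10
       / SUM(CAST(1 AS BIGINT)) OVER () AS decile
FROM MY_TABLE;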

Btw, TD 15.10 finally fixes this: CASTing to BIGINT actually works.
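I read that as the cast finally being honoured for the ranking itself, i.e. something like this fragment (untested sketch):

CAST(RANK() OVER (ORDER BY acct_id) AS BIGINT)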

And there's a COUNT MODE in dbscontrol to switch to either NUMBER or BIGINT globally.

Re: Function quantile throws a numeric overflow error in a large table

Thank you, Fred and Dnoeth. I tried the OLAP option, but it takes too long. I worked with my business user and reduced the amount of data we are pulling; that works for now.

For Dnoeth's suggestion, I will pass it on to our Teradata team.

Thank you all again.