I have a large table and it holds about 33 billion records. I need to run simple quantile function but it failed with a numeric overflow error.
SELECT acct_id, QUANTILE (10, ACCT_ID) AS decile FROM MY_TABLE
I understand the default data type of Teradata is integer. In my case, I will need a BIGINT to work with my table. I know how to make a CAST for other functions like COUNT or SUM. But don't know how to do a CAST for QUANTILE function. Any suggestions are appreciated.
QUANTILE is deprecated; Teradata recommends you use standard SQL instead:
(RANK() OVER (ORDER BY ACCT_ID) -1)*10/COUNT(*) OVER() as decile
But that doesn't solve your issue. Like QUANTILE, RANK is internally limited to INTEGER. You could potentially implement your own window aggregate UDF, or you can do it in two OLAP stages using functions such as SUM that permit larger values:
(SUM(CAST(CASE WHEN acct_id = prev_acct_id THEN 0 ELSE 1 END AS BIGINT))
OVER (ORDER BY acct_id, prev_acct_id ROWS UNBOUNDED PRECEDING) /* computes Dense_Rank */
-1) * 10 / SUM(CAST(1 as BIGINT)) OVER () as decile
MIN(acct_id) OVER (ORDER BY acct_id ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) as prev_acct_id
) as dt;
Check if you can avoid two OLAP-steps, SUM(CAST(1 AS BIGINT)) OVER (...) might be ok (if the data is not unique rows with the same value might be assigned to two quantiles)
Btw, TD15.10 finally fixes this, CASTing to BIGINT actually works.
And there's a COUNT MODE in dbscontrol to switch to either NUMBER or BIGINT globally.
Thank you Fred and Dnoeth. I tried the OLAP option, but it takes too long. I worked with my busessiness user and reduced the amount of data we are pulling. That works for now.
For Dnoeth's suggestion, I will pass it on to our Teradata team.
Thank you all again.