Enthusiast

QUANTILE(100) x 50 times...

Hello,

I'm not a great SQL coder (a lot of my analysis is automatically generated SQL), but occasionally I have to write some SQL by hand to perform tricky analysis. I'm trying to figure out an efficient method to create percentiles for each of 50 columns (within the same table).

By this I mean that I have 50 columns (stuff like call_count, sms_count, voice_count, etc.). I need to allocate an integer percentile to each row: the rows containing the lowest value in a column will get 1%, whilst those with the highest values will get 100%. I will be creating 50 additional columns, one percentile column for each existing column. There are 3.5 million rows.

I thought I could perform QUANTILE(100, ) on the 50 columns in a single statement. Running the analysis against just one column gives me a result in 30 seconds, but the EXPLAIN for 50 columns quotes millions of hours to completion... I presume I would run out of spool space long before that process completed.
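For illustration, the single-statement version I had in mind is roughly of this shape (usage_summary and cust_key are just placeholder names, and as I understand it QUANTILE(100, x) numbers its buckets 0 through 99, so the result would need shifting by 1 to get the 1-100 range I'm after):

SELECT cust_key,
       QUANTILE(100, call_count)  AS call_count_pctl,
       QUANTILE(100, sms_count)   AS sms_count_pctl,
       QUANTILE(100, voice_count) AS voice_count_pctl
       -- ...and so on for the remaining 47 columns
FROM usage_summary;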

So, I tried running 50 separate QUANTILE(100, ) INSERT queries instead, which all succeed very quickly. I now have 50 tables, each containing 3.5 million rows and two columns (the key and the percentile). But now I can't seem to inner join the 50 tables on their unique, non-null key (the EXPLAIN seems to have hung).
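Each of those 50 separate queries is along these lines (pctl_call_count is a placeholder name for one of the 50 two-column result tables):

INSERT INTO pctl_call_count (cust_key, call_count_pctl)
SELECT cust_key,
       QUANTILE(100, call_count)
FROM usage_summary;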

So, I have two questions:
- Any suggestions for another way to create percentiles for 50 columns?
- Any reason why an inner join against 50 tables, all with the same unique indexed primary key, would be a problem?

Thanks

Tim

2 REPLIES
Senior Apprentice

Re: QUANTILE(100) x 50 times...

Hi Tim,

"I thought I could perfrom QUANTILE(100, ) upon the 50 columns in a single statement. Running the analysis against just one column gives me a result in 30 secs. EXPLAIN for 50 columns quotes millions of hours to completion... I presumed I will run out of spool space long before that process completed."

I've never seen an estimated time for an OLAP step, so this might be due to something else.
Could you post your query?

" - Any suggestions for another way to create percentiles for 50 columns?"

No, I'd go with the 50 QUANTILEs within one statement. Spool might be a problem, but a 50-way join also needs a lot of spool.

" - Any reason why an inner join against 50 tables, all with the same unique indexed primary key, would be a problem?"

No.

Dieter
Enthusiast

Re: QUANTILE(100) x 50 times...

Thank you for your input Dieter.

I tried running an EXPLAIN last night for the 50-table join query, and it took over 45 minutes for the EXPLAIN to complete.

The EXPLAIN feedback seemed ok, so I set the query running. The query itself completed in 55 minutes.

It looks like joining many tables together causes trouble for the Teradata parser, or whatever component plans the query.

Anyhow, running the 50 QUANTILEs separately and then joining the tables together seems to work best; the final join is sketched below.
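In case it helps anyone else, the join is roughly of this shape (only the first three of the 50 percentile tables are shown, and the table and key names are placeholders):

SELECT t.cust_key,
       p01.call_count_pctl,
       p02.sms_count_pctl,
       p03.voice_count_pctl
       -- ...remaining 47 percentile columns
FROM usage_summary t
INNER JOIN pctl_call_count  p01 ON p01.cust_key = t.cust_key
INNER JOIN pctl_sms_count   p02 ON p02.cust_key = t.cust_key
INNER JOIN pctl_voice_count p03 ON p03.cust_key = t.cust_key
-- ...remaining 47 joins, one per percentile table
;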

Thanks

Tim