what is best way to query huge table primary partitioned by date and indexed by id and date

Analytics

what is best way to query huge table primary partitioned by date and indexed by id and date

I'm in telecommunications.

I can run a query which completes in several hours (against 1 month of data), but I am looking for a way to optimise my query to make it run faster. I have a generic-ish question about how I should index a cross-join table or optmise my query.

- bit of background -
I'm working on a huge detail table with billions of rows, containing call detail rows (cdr's). The table is primary partitioned by date and primary indexed by id (phone number) and date. There are many rows with the same id and date (multiple phone calls in a single day).

At the moment in one single query I create a cross-join with a few million id's (unique) each duplicated by 30 days (from sys_calendar). This cross-join part then inner-joins to the huge detail table. The query completes in several hours, although Explain looks great and says it should take a few minutes :) I'm using all the correct indexes etc and no DBA can fault the query.

I tried to break this query down so that the cross-join was a separate step and outputted the data into a table with primary index of id (phone number) and date. I hoped the query would run faster if I index the cross-join table to match the huge detail table. Creating the volatile cross-join table takes a few minutes. I then inner join this volatile cross-join table to the huge detail table, but after 8 hours (out of normal office hours) it hadn't completed and i killed it.

- question -
Any advice on how I might get a huge cross-join query against several billion rows to run faster?

Thanks

Tim
3 REPLIES

Re: what is best way to query huge table primary partitioned by date and indexed by id and date

Update. I think I've found the problem.

When I inner join the cross-join table with the huge detail table I had the order different from what I expected.

For example I think this is correct;
FROM IPSHARE.TMANNS_HERDS_CJ CJ
INNER JOIN IPVIEWS.mediated_call_hist MCH
ON CJ.medtd_service_no = MCH.medtd_service_no

But my mistake I had this;

FROM IPVIEWS.mediated_call_hist MCH
INNER JOIN IPSHARE.TMANNS_HERDS_CJ CJ
ON CJ.medtd_service_no = MCH.medtd_service_no

I'll run the query tonight, but I believe the order of the inner join is important in the performance.

Can anyone confirm or help explain exactly why this would be?

Thanks

Tim

Re: what is best way to query huge table primary partitioned by date and indexed by id and date

Hello I work with Teradata in a company of telecommunication,
Your query is ok, but we need know how many tables are joined in your query.

For example I think this is correct;
FROM IPSHARE.TMANNS_HERDS_CJ CJ
INNER JOIN IPVIEWS.mediated_call_hist MCH
ON CJ.medtd_service_no = MCH.medtd_service_no

But my mistake I had this;

FROM IPVIEWS.mediated_call_hist MCH
INNER JOIN IPSHARE.TMANNS_HERDS_CJ CJ
ON CJ.medtd_service_no = MCH.medtd_service_no

This is equals:
FROM IPVIEWS.mediated_call_hist MCH, IPSHARE.TMANNS_HERDS_CJ CJ
where CJ.medtd_service_no = MCH.medtd_service_no

Re: what is best way to query huge table primary partitioned by date and indexed by id and date

I agree.