Performance

Database
Enthusiast

Performance

Hi Friends,

I am trying to fogure out what would be the underlying cause for not able to retrieve any records while doing a select. Just to give a little background .

We have a table called table_consumer in which the primary index defined for that table is a surrogate key(Sequencly generated). To be frank i am not sure why they define this to be the primary index. There is no other or any index defined on that table.

The table is having a record count of 90million records.

while i am doing this check to see if this table has any duplicates using count(*) group by1,2,3,4 having count(*)>1, it throws me with the spool space issue.

Also while i am firing the query below it again throws me with the spool space issue. Appreciate any help as to what could be done here or tweeked.

SELECT 


A.SOURCE_ID

 FROM dbt.table_CONSUMER  A 

 INNER JOIN

(

SELECT  SOURCE_ID,DATA_MONTH, TIME_PERIOD,BUSINESS_OWNER,NUMBER_OF_MONTHS,START_DATA_DATE,DATA_REFRESH_TYPE,

RANK() OVER (PARTITION BY DATA_SOURCE_ID ORDER BY TIME_PERIOD DESC)  AS NUMBER_OF_ROWS_MONTH_ORDER

FROM

(SELECT B.SOURCE_ID, B.DATA_MONTH, (TIME_PERIOD- EXTRACT(DAY FROM TIME_PERIOD) + 1) TIME_PERIOD,

NUMBER_OF_MONTHS, BUSINESS_OWNER, START_DATA_DATE, DATA_REFRESH_TYPE

FROM dbv.view_CONSUMER SRC

INNER JOIN

dbv.PAIM_PROCESSING B

ON 

SRC.SOURCE_ID=B.SOURCE_ID

AND B.NUMBER_OF_ROWS=1

AND (TIME_PERIOD- EXTRACT(DAY FROM TIME_PERIOD) + 1)<=B.START_DATA_DATE

AND (TIME_PERIOD- EXTRACT(DAY FROM TIME_PERIOD) + 1) <=B.PROCESS_DATA_MONTH

AND SRC.ACTIVE_FLAG='Y'

AND B.DATA_REFRESH_TYPE NOT IN ('FULL REFRESH')

GROUP BY 1,2,3,4,5,6,7) B1

)C

ON

A.SOURCE_ID=C.SOURCE_ID

group by A.DATA_SOURCE_ID

The skew factor of the table A is only about 0.04%

1 REPLY
Enthusiast

Re: Performance

Hi John,

For the above query which you are facing spool space issue, i have few questions?

1. Any idea why there is no alias used in TIME_PERIOD- EXTRACT(DAY FROM TIME_PERIOD? Even for this also NUMBER_OF_MONTHS, BUSINESS_OWNER, START_DATA_DATE, DATA_REFRESH_TYPE. Try to use alias since you are joining two tables and the same column shouldnot exist in both tables.

SELECT B.SOURCE_ID, B.DATA_MONTH, (TIME_PERIOD- EXTRACT(DAY FROM TIME_PERIOD) + 1) TIME_PERIOD,

NUMBER_OF_MONTHS, BUSINESS_OWNER, START_DATA_DATE, DATA_REFRESH_TYPE

FROM dbv.view_CONSUMER SRC

INNER JOIN

dbv.PAIM_PROCESSING B

 

2. You have selected A.SOURCE_ID in the select list but it's not used in group by. I am not sure without having source id if you are able to run successfully?

ON

A.SOURCE_ID=C.SOURCE_ID

group by A.DATA_SOURCE_ID

3. dbt.table_CONSUMER  A  -- Inner join has view dbname for this table but select clause has the dbt dbname. Any specific reason bcz sometimes in views there used to be some filter condition?

4. Does START_DATA_DATE & PROCESS_DATA_MONTH has the same value or value range as this one -- TIME_PERIOD- EXTRACT(DAY FROM TIME_PERIOD) + 1?

AND (TIME_PERIOD- EXTRACT(DAY FROM TIME_PERIOD) + 1)<=B.START_DATA_DATE

AND (TIME_PERIOD- EXTRACT(DAY FROM TIME_PERIOD) + 1) <=B.PROCESS_DATA_MONTH

For the initial question regarding surrogate key, do you have any master table which holds consumer details and that has consumer id as PI?