try the following in case all columns are not null
Insert into new table
where keyid in (
qualify key_id = min(key_id) as key_id over (partition by PROCESS_DATA_MONTH !! DATA_SOURCE_ID !! DATA_SOURCE_CODE !! LOGICAL_DATA_SOURCE_CODE GEOGRAPHY_CODE)
in case a column can be null you might change this column to coalesce(cast(columnname as varchar(100)) , 'XXXXnotNULLxxxx')...
insert into newtable
will also work and avoid the olap f(x)... the same note re: null columns applies
Hai thanks a lot...
Can you come again on the null values?? What if i had a NULL in the geography code? Like below..
Key id PROCESS_DATA_MONTH DATA_SOURCE_ID DATA_SOURCE_CODE LOGICAL_DATA_SOURCE_CODE GEOGRAPHY_CODE
2123 6/1/2014 720 998,649 901,584 ?
2124 6/1/2014 720 998,649 901,584 ?
2125 6/1/2014 720 998,649 901,584 ?
I am trying to fetch the records leavign all the duplicates by the below 2 methods. Both are consuming time:(
while doing an explain the first query estimate time is around 45 min and for the 2nd query it is not displaying the estimated time.
Total count of the records in the table is around 31million.
Please help if any alternative.
All of them that is writen above is true. You can use min or max aggregation for surrogate keys and than you should group the other columns, or you can qualify the data as patition by DATA_SOURCE_ID, DATA_SOURCE_CODE order by Key_id (asc or desc what you would like to see as surrogate key) .
Also when you create surrogate key tables source identifier column and primary key column of the source system must be unique primary index, if you need to access data from surrogate key tables (usually semantic layers use surrogate key tables to extend datamarts with source primary key columns) with surrogate key columns I recomend sparse index with surrogate key column primary index.
Did you actually run the queries and checked DBQL?
Unless the PI of the table is the key_id the GROUP BY should be fast.
Btw, how can you insert duplicate values in a table, is there no Primary Key or does the loading process ignore it?
You have to remember that the hours, minutes, seconds in an explain plan are pronounced KOST. There is no relevance to wall clock time. Those measure labels in the explain plan are labels to describe the relative cost of each step, NO how long it will take.