How to change the Primary Index of a table holding approx. 13 TB of data ?


How to change the Primary Index of a table holding approx. 13 TB of data ?

This is my first post here !!


We have a huge table in Teradata holding the transnational data sizing approximately 13 TB. As a typical old Data Warehouse enhancement case, we have now found that the PI selection made in the past for this table is not the best. And due to this, we are facing lots of performance issues with the SQL's pulling data from this table.

So the very first idea that we wanted to implement is to load the data in temp table, alter the PI of the existing table or create a new table with new PI and load the data from temp table to new \ altered table.

But the challenge here is the table is live and this solution will not be the best due the size of the data movement. Also, other way we thought about delta load (to new table) - delta delete (from main table). But this also can not be the best workaround as the table is live and it will involve too much of efforts in data movement as well as in matching the row counts in source and target table.

I need your ideas on this scenario, how can I change the PI of this table by making small efforts without any system downtime.


Re: How to change the Primary Index of a table holding approx. 13 TB of data ?



Just to clarify things, you cannot change the PI of a table which contains data. You have to build a new table, as (I think) you've worked out that is not always easy.


You may have to 'stop processing' for a period of time, the trick will be to make that period as short as possible.


I think your basic options are:

1) create a new table, insert/select from old to new, rename old table to 'really old', rename new table to old.

To ease the data checking requirements, make sure that the insert/select uses a READ lock (not ACCESS lock) on the old table and put the first rename into the same transaction as the insert/select.

This will require at least twice the disk space of the existing table because you'll have the old table, eventually the new table and a spool file.


2) export all the data, drop the old table, create the new definition and then load.

But your data will be unavailable during this process, so may not meet all your requirements.


3) create a new table with the planned definition.

  change your ETL process(es) to load into this new table

  create a view which UNION ALL between the old and new table

  gradually insert/select from old to new and delete the copied data from the old.

Some thoughts to try and make this less painful:

- I'm assuming that your old table has PPI on it and usually you are loading new data into 'the current' partition (e.g. today's) . In which case each insert/select by partition (or a range of partitions), starting with those that are typically not going to be updated.

- If there is no PPI (one could probably ask "why not?") then add a NUSI on your selection column ( a VONUSI if it is a date column) which should make the insert/select and delete run quicker.

- With no PPI then you may need a check constraint on each table (usually on a date column) so that queries against the 'union all' view only access the table they need, although the view references both.


When you've finally got all of your data across, change the view to reference only the new table.


You can choose whether you rename your new table as the old name. If you do this will be a point (very quick) when you have to stop access to this data.





Ward Analytics Ltd - information in motion