We are looking at optimizing our DataWarehouse load process. Following is a core requirement for any table that we load:
Currently, we make copies of production tables, load/append new data and rename them on the scheduled date.
One drawback of this approach is that it requires additional space since we have to maintain multiple copies of tables. Many of our tables are very large – 5TB+
An approach I have taken in the past with Oracle (in a different project) is as follows:
Database Engine: Oracle
Frequency of loading: Monthly
This approach has couple of advantages:
We would love your feedback. Thanks.
Fact tables of this size are commonly row-partiioned. Partitioning doesn't mean the same thing in Teradata as in Oracle: rows in the same partition are stored together, but they are still spread over all the AMPs. In other words, each parallel processor's storage has partitions within it. This enables partition-level locking, the default locking level in Teradata DML on partitioned tables.
So if you defined a partitioning primary index on year-month, you could update a partition (perhaps in this case just insert rows to a predefined but empty partition) without locking the whole table. There would be only one table and no need for a union view. I'm just not sure if this lock would be escalated to a table lock during a mass insert such as insert-select or merge. Maybe qualifying the insert with a predicate on the year-month would be enough to tell the optimizer you are just updating that partition, or maybe you would have to export the updates and use singleton inserts via TPT Stream. There are others on this forum who could elaborate on that.
Thank you. This might help with lock management. Another requirement we have is that users must not see the newly loaded data until it is officially released. Is there a way to disable and enable data in a given partition?
Shielding the new data can easily be done by only providing access through a view, which includes a filter on the date.
Using the same Partitioning key for this filter makes it very efficient.
To make new data available, requires an update of the view.