I have a requirement in my project that the view has to be available to the users 24*7 with the latest data.
I am maintaining Type 1 records in my base table (from which the view is built), and the table is upserted every time we receive data from the source, using a BTEQ script.
My question here is this:
Let's say I have a record in my base table with primary key P1, first name, last name, and balance.
P1 JACK MARY 100
On my next batch load, if I get a matching primary key record for the same P1 record in my base table plus 1 million new records, then I have to update the P1 record and insert the rest as new records.
Will the one updated row show up in the view only after my whole bteq runs, or will it be reflected as and when the row is processed? Another way of asking: is it a row-level commit or a batch-level commit?
Essentially, I do not want users who are querying for primary key P1 to see old data just because the rest of the 999,999 records are still being inserted; this is a real-time system and users expect the latest data.
If you are operating in Teradata mode and allowing dirty reads of the table (access locking or row access locking), your users should be able to see the data as the table is being loaded.
If the data is loaded into a load table, updated, and then copied into your base table using bteq, the base table will be locked for writing during the upsert operation and all the updates committed on completion.
During the whole of this operation the data will be locked out from other runs (including user queries) unless you use access locking within your views or directly in your user SQL. If you do use access locking, then you may get inconsistent data, including two rows (before and after images) or no rows. The chance is very slight, but a one-in-a-million chance is a real risk when you are doing a million operations!
To prevent it, do not use access locking on sensitive tables and use TPump to do the upsert operation. That is the main tool which will easily allow the low-level commit granularity you require. It means that you have to do the load from an external file directly to the target (in your terminology, base) table.
You could do a similar thing with bteq using an Import operation on an external file - this will give you the same level of control, but at greater development cost and much greater operational overhead.
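To make the row-level commit behaviour concrete, a BTEQ import of this kind might look roughly like the sketch below. The file name, data types, and logon are my own illustrative assumptions; the PI/FIRSTNAME/LASTNAME/BALANCE columns follow the example earlier in the thread. In Teradata session mode, each atomic UPDATE ... ELSE INSERT request submitted under .REPEAT is its own implicit transaction, so rows become visible to access-lock readers one at a time rather than at the end of the batch:

```sql
.LOGON tdpid/username,password;
.IMPORT DATA FILE = customer_updates.dat;
.REPEAT *
USING (p_key  VARCHAR(10),
       f_name VARCHAR(30),
       l_name VARCHAR(30),
       bal    DECIMAL(12,2))
UPDATE <DATABASENAME>.<BASETABLENAME>
   SET FIRSTNAME = :f_name,
       LASTNAME  = :l_name,
       BALANCE   = :bal
 WHERE PI = :p_key
ELSE INSERT INTO <DATABASENAME>.<BASETABLENAME>
       (PI, FIRSTNAME, LASTNAME, BALANCE)
VALUES (:p_key, :f_name, :l_name, :bal);
.LOGOFF;
```

This is the greater-operational-overhead route: one request per input row, so a million rows means a million separate requests.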
Even if I use row access locking, wouldn't the entire transaction (the UPSERT in my case) in bteq have to complete before the data is committed to the base table?
I presume atomicity doesn't allow partial commits within a transaction (the UPSERT query in my bteq)? If that's true, how could the latest rows be reflected in the view while the batch process is still running?
I would need to run a lot of data quality checks and other operations on the file I receive.
So I thought of the following structure:
1) Load the file into a staging table using MLOAD/FASTLOAD.
2) Run a data quality bteq and load the good records into a TEMPORARY TABLE.
3) Run a bteq to load from the temporary table to the base (target) table.
So I am not sure I can use TPump, and for such huge volumes I don't think TPump is a good utility anyway.
Please correct me if I am wrong.
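For what it is worth, step 1 of that outline could be sketched as a FastLoad script along these lines - the tdpid, file name, delimiter, and staging table/column names are assumptions of mine, not from the post:

```sql
.LOGON tdpid/username,password;

/* FastLoad needs an empty target table and its own error tables */
DROP TABLE <DATABASENAME>.STG_CUSTOMER_ERR1;
DROP TABLE <DATABASENAME>.STG_CUSTOMER_ERR2;
DELETE FROM <DATABASENAME>.STG_CUSTOMER;

.SET RECORD VARTEXT "|";
DEFINE p_key  (VARCHAR(10)),
       f_name (VARCHAR(30)),
       l_name (VARCHAR(30)),
       bal    (VARCHAR(15))
FILE = customer_feed.dat;

BEGIN LOADING <DATABASENAME>.STG_CUSTOMER
      ERRORFILES <DATABASENAME>.STG_CUSTOMER_ERR1,
                 <DATABASENAME>.STG_CUSTOMER_ERR2;
INSERT INTO <DATABASENAME>.STG_CUSTOMER VALUES (:p_key, :f_name, :l_name, :bal);
END LOADING;
.LOGOFF;
```

The data quality bteq in step 2 then reads only from STG_CUSTOMER, so none of this touches the table the users are querying.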
If your bteq job does any sort of table-to-table copy (Insert/ Update/ Delete/ Upsert) then your target is locked for the duration of the copy. Access Locking will allow others to read the table, but at the risk of momentary inconsistencies.
The easiest way is probably a carry-forward approach: upsert an identical copy of the table and switch your views to the copy after the update is complete. Do not try renaming the copy tables - if you do, there will be a point in time when no table exists under the correct name! And make sure any dictionary updates (REPLACE VIEW, etc.) are kept to a minimum.
With this method there will be a brief moment when user access is delayed: if a user query is running while your update is trying to replace the view, the REPLACE VIEW will wait until that query has finished with the view.
A second (high development/ high availability) approach is to load to staging and quality check as you suggest above, but then extract two files - one for updates and one for inserts. Use TPump to load the updates and MultiLoad to load the inserts. (MultiLoad is still deemed unsuitable for volume updates by many, but unless you are talking of tens of millions of records it is not significantly worse than FastLoad plus Insert/ Select.)
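As a rough sketch only (the session/pack settings, names, and file format here are illustrative guesses to be checked against the TPump manual), the update leg of that second approach might look like:

```sql
.LOGTABLE <DATABASENAME>.TPUMP_RESTART_LOG;
.LOGON tdpid/username,password;
.BEGIN LOAD SESSIONS 8 PACK 20 SERIALIZE ON
       ERRORTABLE <DATABASENAME>.TPUMP_ERR;
.LAYOUT CUST_LAYOUT;
  .FIELD p_key  * VARCHAR(10);
  .FIELD f_name * VARCHAR(30);
  .FIELD l_name * VARCHAR(30);
  .FIELD bal    * VARCHAR(15);
.DML LABEL UPD_CUST;
  UPDATE <DATABASENAME>.<BASETABLENAME>
     SET FIRSTNAME = :f_name,
         LASTNAME  = :l_name,
         BALANCE   = :bal
   WHERE PI = :p_key;
.IMPORT INFILE updates.dat FORMAT VARTEXT '|'
        LAYOUT CUST_LAYOUT APPLY UPD_CUST;
.END LOAD;
.LOGOFF;
```

TPump applies these as ordinary row-hash-locked DML in small packs, which is what gives the fine commit granularity without locking the whole table.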
Give your manager the development estimate for option 2, and he may well reconsider the damage of momentary inconsistency!
Thanks for the response. I am struggling a little to understand the carry forward approach.
I have the following understanding from your approach:
To start with, let's say I have a staging table T1, a target table T2, and an identical target table T3.
Now I have a view definition in place defined on T2. When I run my bteq, I would be upserting table T2 while the view still points to T2.
Once the UPSERT is done, I run another job to upsert the identical table (T3) and switch my view to point to T3 once that job or bteq has run.
On the next instance, when I run my upsert bteq for table T2, my view definition still points to T3. But then, when I run my job or bteq to upsert T3 from T2, we again end up with the same problem: a bteq is running against T3 while the view is still pointing to T3. Please let me know if I am missing something here.
You need two sets of views - one for loading, another for user access.
You have a user view pointing to T3 in your scenario and you run the upsert into T2 via the load view.
At this point, users are reading T3 while you upsert T2.
At the end of the process, you switch the user view(s) to T2 and load view to T3.
You then run the upsert again - this time it updates T3. (If you are updating a high percentage of the table, it may be faster to Delete All and Insert/ Select - you will need to benchmark this if the percentage of records updated is over c. 10%.)
The next night (assuming daily update) users are reading T2 while you upsert T3 with the next data files...... and the loop repeats.
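The nightly swap could be expressed in SQL along these lines (MYDB, the T2/T3 table names, and the view names are placeholders of mine):

```sql
/* Tonight: users read T3 via the user view, the load writes to T2 */
REPLACE VIEW MYDB.CUSTOMER_USER_VW AS
  LOCKING ROW FOR ACCESS
  SELECT * FROM MYDB.T3;

REPLACE VIEW MYDB.CUSTOMER_LOAD_VW AS
  SELECT * FROM MYDB.T2;

/* ... run the upsert through CUSTOMER_LOAD_VW, i.e. into T2 ... */

/* After the load completes, swap: users now read the fresh T2,
   and tomorrow's load will target T3 */
REPLACE VIEW MYDB.CUSTOMER_USER_VW AS
  LOCKING ROW FOR ACCESS
  SELECT * FROM MYDB.T2;

REPLACE VIEW MYDB.CUSTOMER_LOAD_VW AS
  SELECT * FROM MYDB.T3;
```

Only the two REPLACE VIEW statements at the end touch what users see, so the exposure window is just the dictionary update, not the whole load.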
You may try the steps below:
STEP 1) Create a view on the table as below:
REPLACE VIEW <DATABASENAME>.<BASETABLENAME>_VW AS
LOCKING ROW FOR ACCESS
SELECT * FROM <DATABASENAME>.<BASETABLENAME>;
STEP 2) Use a MERGE statement on this view in the BTEQ as below:
MERGE INTO <DATABASENAME>.<BASETABLENAME>_VW TGT
USING (<put here the SELECT statement that is used for the INSERT or UPDATE of the base table>) SRC
ON TGT.PI = SRC.PI
WHEN MATCHED THEN
UPDATE SET FIRSTNAME = SRC.FIRSTNAME, LASTNAME = SRC.LASTNAME, BALANCE = SRC.BALANCE
WHEN NOT MATCHED THEN
INSERT (SRC.PI, SRC.FIRSTNAME, SRC.LASTNAME, SRC.BALANCE);
STEP 3) Allow end users to query the VIEW, not the base table.