The data mart maintained in our project loads data from multiple countries into a single (regional) data warehouse. By design, we have a separate staging database for each country, where each table is loaded via MultiLoad or FastLoad. Once this is complete, the data is moved from the country-specific staging tables to the regional DB tables using BTEQ inserts. Many country processes run in parallel, so the data loading from staging to the regional tables almost invariably happens in parallel.
There are approximately 15 tables in the regional DB. To avoid deadlock and contention, we implemented a (very basic) semaphore mechanism: before country 1 populates table TAB1, it first checks (in another table, which maintains the lock status for all 15 tables) whether TAB1 is free for loading. If it is free, the process updates the lock status table, inserts into TAB1, and once the insert is complete, releases the lock (by updating the lock status table again). Meanwhile, the country 2 process that wants to insert into the same table goes to sleep and polls again after 2 minutes to check the status.
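For context, the acquire/release steps look roughly like this (a sketch only - the lock table, column names, and database names here are illustrative, not our actual DDL):

```sql
-- Hypothetical lock status table: one row per regional table,
-- LOCK_FLAG = 'Y' means some country process is currently loading it.
UPDATE LOCK_STATUS
SET LOCK_FLAG = 'Y', LOCKED_BY = 'COUNTRY1'
WHERE TABLE_NAME = 'TAB1'
  AND LOCK_FLAG = 'N';

-- If no row was updated, the lock was already held:
-- sleep 2 minutes and poll again. Otherwise proceed:
INSERT INTO REGIONAL_DB.TAB1
SELECT * FROM STG_COUNTRY1.TAB1;

-- Release the lock.
UPDATE LOCK_STATUS
SET LOCK_FLAG = 'N', LOCKED_BY = NULL
WHERE TABLE_NAME = 'TAB1';
```

Since the waiting processes just poll on a timer, whichever one happens to poll first after the release gets the lock - hence the non-FIFO behaviour described below.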
Because this semaphore mechanism is not FIFO, it has created a lot of delays, and some of the critical processes now wait for a very long time.
I wanted to understand what happens if I don't have a semaphore mechanism in place. What if, say, 50 processes try to load TAB1 at the same instant? I understand that Teradata manages contention using an internal FIFO mechanism, but:
1. Is there a chance of a deadlock?
2. What is the maximum number of processes which can wait in the Teradata queue?
3. Can some of these (waiting) processes be prioritized to pre-empt the FIFO sequence (maybe by creating a different database user)?
All country processes' data are independent of each other - the PI row-hashes created would be different for each process. Also, the data volumes range from 2,000 to 13 million rows.
What I also thought of was to split the regional table TAB1 into country-specific tables (partitioned by country code) and then create a unified view on top of all these tables using UNION ALL.
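To illustrate the idea (table and view names are hypothetical), each country would load its own table without contention, and readers would query the view:

```sql
-- One physical table per country; each country process loads only its own,
-- so there is no lock contention between countries.
-- A view unifies them for downstream consumers:
REPLACE VIEW REGIONAL_DB.TAB1_V AS
SELECT * FROM REGIONAL_DB.TAB1_C1
UNION ALL
SELECT * FROM REGIONAL_DB.TAB1_C2
UNION ALL
SELECT * FROM REGIONAL_DB.TAB1_C3;
```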
Please share your suggestions.
With best regards
How are those inserts processed: as a single transaction or one transaction per table?
Does the process only insert, or does it also select from those tables?
All locks within TD are always placed/processed FIFO, but to avoid your semaphore process you could utilize a Queue Table:
Each process simply inserts a row into the QT after all its staging tables are loaded (prioritization can be done via the insert timestamp), and the reading process (usually a Stored Procedure) submits all the inserts.
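A minimal sketch of that setup (queue table name and payload columns are illustrative):

```sql
-- In a queue table the first column must be the Queue Insert Timestamp (QITS).
CREATE TABLE LoadQueue, QUEUE
( qits       TIMESTAMP(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6)
, country_cd CHAR(2)      NOT NULL
, target_tbl VARCHAR(128) NOT NULL
);

-- Producer: each country process signals readiness once staging is loaded.
INSERT INTO LoadQueue (country_cd, target_tbl) VALUES ('C1', 'TAB1');

-- Consumer (e.g. inside a Stored Procedure): blocks until a row is
-- available, then removes and returns it in QITS (i.e. FIFO) order.
SELECT AND CONSUME TOP 1 * FROM LoadQueue;
```

The consumer then runs the actual Staging->Target insert for the country/table named in the consumed row, serializing the loads in arrival order without any polling.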
I was able to follow one of your posts to implement the parallel inserts using Teradata queue tables.
However, there are still some challenges I am facing with the implementation (some of which are specific to my current design):
1. How to capture the logs? Currently, the BTEQ statements generate the logs; if I replace them with a stored procedure, how will I be able to log the query results?
2. Is there a way I can maintain a log table so that for every Staging->Target DML, I can capture the activity count? Within the queue table implementation, I am using a stored procedure which calls DBC.SysExecSQL.
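For question 2, one pattern I am considering (a sketch - the log table, variable, and source/target names are made up) is to read ACTIVITY_COUNT right after each DML statement inside the procedure and write it to my own log table. Note this is for static SQL; whether ACTIVITY_COUNT reflects statements submitted dynamically via DBC.SysExecSQL would need to be verified on the release in use.

```sql
-- Inside the stored procedure body, after each Staging->Target insert:
INSERT INTO REGIONAL_DB.TAB1
SELECT * FROM STG_C1.TAB1;

-- ACTIVITY_COUNT holds the row count of the previous DML statement.
SET v_rows = ACTIVITY_COUNT;

INSERT INTO ETL_LOG (log_ts, step_name, row_cnt)
VALUES (CURRENT_TIMESTAMP(6), 'STG_C1.TAB1 -> TAB1', :v_rows);
```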
3. Currently, the completion of my ETL job (which calls the BTEQ for inserts) signifies the end of the load operation; however, with the stored procedure called internally to perform the inserts, how do I capture the completion of the load?
4. My batch window is currently a long one - it spans 12 hours for ~20 countries. How do I schedule the stored procedure? (There may be cases wherein the first few countries complete within the first few hours and, after a gap, some other countries start their load process.)
With best regards,
Can someone please advise how we can trace a stored procedure? If, let's say, I do an insert within an SP and it fails, how do I find out what the error was?
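One common pattern (a sketch - the procedure, variables, and error-log table here are hypothetical) is to declare a condition handler that captures the error code and message text and writes them to a log table before the procedure exits:

```sql
REPLACE PROCEDURE RegionalLoad()
BEGIN
  DECLARE v_code INTEGER;
  DECLARE v_msg  VARCHAR(512);

  -- On any SQL error, capture the diagnostics and log them, then exit.
  DECLARE EXIT HANDLER FOR SQLEXCEPTION
  BEGIN
    SET v_code = SQLCODE;  -- capture before it is reset by later statements
    GET DIAGNOSTICS EXCEPTION 1 v_msg = MESSAGE_TEXT;
    INSERT INTO ETL_ERR_LOG (err_ts, err_code, err_msg)
    VALUES (CURRENT_TIMESTAMP(6), :v_code, :v_msg);
  END;

  INSERT INTO REGIONAL_DB.TAB1
  SELECT * FROM STG_C1.TAB1;
END;
```

The caller can then check ETL_ERR_LOG (or have the procedure return the code/message via OUT parameters) to see what failed and why.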
With best regards