We've semi-recently moved from SQL Server to Teradata and queries which ran quite will in SS are running out of spool space in Teradata. We have several tables with millions to billions of rows of data. The DBAs say only one index and it must be unique. Each of the tables have a single column UPI which is a sequential (identity) number.
The problem I need to solve: Queries constant spool errors in Teradata which run fine in SQL Server with multiple indexes.
Background: One table contains calls, queries were taking too long and were killed. We pull all data by CallDate and they finally added a partition on CallDate. In addition to CallDate data is also pulled by the person taking the call or who is transferring the call. The columns in the table are foreign keys to our employee table. The employee table's UPI is also a sequential number. Linking the calls to the employee table is straightforward, but when we need to filter calls taken by people of certain job codes. Some stat
Questions: We can't have more than one index, so what is the solution? Can there be more than one partition? Statistics were (supposed to) added the columns used in joins and where predicate. How are statistics going to help?
Thanks in advance!
Statistics help the optimizer develope better plans, but you are right, this sounds like a shema designed to fail.
PI's should be determined not just for uniqueness but more importantly joins.
Having SIs or JIs on secondary access paths is common practice.
Is the one Index criteria based on ETL not wanting to handle drop/create of indexes?
Your foreign key joins should be supported with soft ri.
And all of these supported with statstics for PI, SI, JI, Soft RI and any other columns used in joins and query qualifications.
If you have any Teradata PS on site have them look into this COMPASS asset to help
I'm not sure why they only allow one index, just that when my platform team created a secondary index they got told to remove it and put a partition in.
What are these: JI, Soft RI
Your statement "tables have a single column UPI which is a sequential (identity) number." tells me you never get any local joining, as the PI's can never be used for joining. This is the wrong way to design your data model.
Especially on large CDR tables, it is critical to design the table(s) with PI's that can be used for joining.
In your case, all tables are (almost) always pulled into spool, redistributed, then joined.
That link is Teradata internal - David said it was for a Teradata consultant to reference.
Carrying this further, depending on how many employees are answering how many calls per day, if the call table is partitioned by date, it may be reasonable to make employee ID the Non-Unique Primary Index (NUPI) of the call table, especially if you are always joining it to the employee table. Another possibility is a Join Index (JI) on calls and employees - if it contained employee name and job code, and those were the only columns referenced in the join, then the employee table would not be accessed. Defining Soft Referential Integrity (Soft RI) between these and other tables that are joined can help the optimizer know that there is a logical (if not formally defined) foreign key releationship. Study the Database Design manual from info.teradata.com for more ideas.
So depending on the SI the partitioning may have been a better option, that by itself does not preclude adding other indexes.
JI is Join Index, Soft RI is Soft Relational Integrity
@FGroenen You've confirmed much of what I've suspected, but wasn't sure in the "Teradata world". I would love to have more indexes and the tables setup the way the data is pulled. Unfortunately, the DBAs only allow one index (I have *no* idea why) and it must be the "key". The only time the key is used is when joining it to a dimension table which is a "child" table and contains more details about the fact record.
I plan on taking this information back to them and make sure there is a dialog around this information.