I got a table with many values in Primary Index column. I still need this column as PI.
Can someone suggest how to deal with the situation?
Multi-column PI isnt appropiate in this case.
I think you mean "I got a table with many NULLs in Primary Index column"?
Assuming so, always start with volume:
- How many rows will contain NULL?
- How many rows in the table?
If you have lots of rows with the same PI data value then the table will be skewed. This will have some effect on disk space usage and will affect any type of scan processing - full table scan, partition scans (if the table is PPI'd).
You say that "I still need this column as PI.". I'm assuming that this is because of join and/or selection criteria in queries.
It is often possible to gain acceptable read performance even without an appropriate PI by using the other indexing options that Teradata provides. Have you thought about:
- Single Table Join Index (STJI)
- Hash Index (HI)
With any of these in place you could then choose a PI that gives good distribution and use the index to provide the fast join / selection processing.
Another possible solution:
Instead of NULL you assign a random value outside of the range of possible values during load,
e.g. only positive customer numbers get a random negative value.
Of course, for joins you need to exclude the negative values (you can't join them anyway) or use Outer Joins and for display you must apply a CASE to get the original NULL.