I have a performance issue with a query. Below is the sample query where sales table does not have any index on cust_ky, but rep has cust_ky as a NUPI. Volume of rep is about 13 million and sales is 100 million.
select r.rep_nm, s.qty
from sales s join rep r on s.cust_ky = r.cust_ky
1. What should be the ideal join strategy for this case?
2. Is a NUSI on sales.cust_ky going to help?
#1: if there are statistics on the join columns you should trust the optimizer to pick the best plan
#2: no, Teradata rarely uses SIs for joins (unless it's a very small subset of the data)
is there a PPI on available that you could use, I only see 2 full table scans - also, reduce the data amount by using derived tables
This would go for HASH join where smaller table (REP) is duplicated on all AMPs based on hashing CUST_KEY.
If it is a frequently used query, then creating hash index on REP table is a good option.
This would enhance the select query performance, but would consume more space and may cause longer DML operations on REP table.