I'm running into a speed issue with one of my joins on large datasets.
CREATE TABLE table_A (
) PRIMARY INDEX (table_a_id);

CREATE TABLE table_B (
) PRIMARY INDEX (table_b_id);
I am trying to join a large table (10,000,000+ rows) with a lookup table as efficiently as possible, using a BIGINT column (lookup_idx). However, the lookup table isn't indexed by lookup_idx, but by two columns that define a range (start_lookup_idx, end_lookup_idx). None of the ranges overlap.
from table_A a
left join table_B b on a.lookup_idx >= b.start_lookup_idx
and a.lookup_idx <= b.end_lookup_idx
This query runs far too slowly. I know that algorithmically, Teradata shouldn't have to do a full table scan for every lookup if it's using ordered indexes, but that's what it's doing. Does anyone know how to make this query perform correctly?
Currently I have a workaround: I built an intermediate table that denormalizes all possible lookup_idx values and has a foreign key into table_B. That query runs fast enough, but the intermediate table is much too large for me to hold onto indefinitely.
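Roughly, the workaround looks like this (a sketch only; lookup_expanded is an illustrative name, not my actual DDL):

-- One row per possible lookup_idx, carrying a foreign key into table_B.
CREATE TABLE lookup_expanded (
    lookup_idx BIGINT NOT NULL,
    table_b_id BIGINT NOT NULL
) PRIMARY INDEX (lookup_idx);

-- Populated by expanding every value in each [start_lookup_idx, end_lookup_idx]
-- range of table_B. The lookup then becomes a fast equality join:
SELECT a.*, e.table_b_id
FROM table_A a
LEFT JOIN lookup_expanded e
  ON a.lookup_idx = e.lookup_idx;

The problem is that expanding every value in every range is what makes the table too large to keep.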
Can you share the EXPLAIN output of your query?
Which stats are collected?
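For example, the stats that would typically matter for this join (a sketch; adjust to your actual column names):

-- Stats on the join columns help the optimizer estimate the range join.
COLLECT STATISTICS ON table_A COLUMN (lookup_idx);
COLLECT STATISTICS ON table_B COLUMN (start_lookup_idx);
COLLECT STATISTICS ON table_B COLUMN (end_lookup_idx);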
just for my understanding...
all lookup_idx values between start_lookup_idx and end_lookup_idx will have the same lookup value. Where and how is lookup_idx set in table_A?
If all lookup_idx values between start_lookup_idx and end_lookup_idx were set to start_lookup_idx (or all to end_lookup_idx), you would be able to do an equality join with the same result - and maybe benefit from stats...
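The idea as a sketch: if lookup_idx in table_A already held the start of its range, the range predicate collapses to an equality join:

-- Assumes a.lookup_idx has been normalized to the start of its range.
SELECT a.*, b.table_b_id
FROM table_A a
LEFT JOIN table_B b
  ON a.lookup_idx = b.start_lookup_idx;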
This is most probably a product join. Is there a way to introduce a "table_b_id" field into table_A, from the LDM perspective?
It might be tricky to update table_A during ETL when the lookup table table_B is updated, but it might be better than lots of queries with product joins.
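A sketch of what that ETL step could look like (assuming a table_b_id column added to table_A; some_lookup_value is an illustrative column name):

-- Resolve each range once during the load; the product join happens
-- here, not in every downstream query.
UPDATE a
FROM table_A a, table_B b
SET table_b_id = b.table_b_id
WHERE a.lookup_idx BETWEEN b.start_lookup_idx AND b.end_lookup_idx;

-- All later queries use the cheap equality join:
SELECT a.*, b.some_lookup_value
FROM table_A a
LEFT JOIN table_B b
  ON a.table_b_id = b.table_b_id;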
p.s. 10M rows is not a large table for Teradata in most cases. ;)