Joining large data on ranged indexies

Database

Joining large data on ranged indexies


Hey Guys,


I'm running into speed issue with one of my joins on large datasets.


CREATE TABLE table_A (


  table_a_id INTEGER,


  ...


  lookup_idx INTEGER


) PRIMARY INDEX(table_a_id)


CREATE TABLE table_B (


  table_b_id INTEGER,


  start_lookup_idx INTEGER,


  end_lookup_idx INTEGER,


  some_data VARCHAR(50),


  ...


) PRIMARY INDEX(table_b_id)


I am trying to join a large table (10,000,000+ rows) with a lookup table in the most efficient way possible with a BIGINT (lookup_idx).  However the lookup table isn’t indexed by lookup_idx, but by two columns that give a range (start_lookup_idx, end_lookup_idx).  There are no overlaps for any of the ranges.


select 


  a.table_a_id,


  b.some_data


from table_A a


  left join table_B b on a.lookup_idx >= b.start_lookup_idx 


    and a.lookup_idx <= b.end_lookup_idx


This query runs far too slow.  I know that algorithmically, teradata shouldn’t have to do a full table scan for every lookup if it's using ordered indexies, but that’s what it’s doing.  Does anyone know how to make this query work correctly?


Currently I have a work around where I have built an intermediate table which denormalizes all possible lookup_idx values and has a foreign key into table_B.  This query runs fast enough, but the intermediate table is much to large for me to hold onto indefinitely.


Any thoughts?


3 REPLIES
Supporter

Re: Joining large data on ranged indexies

Can you share the explain of your query?

Which stats are collected?

just for my understanding...

all  lookup_idx between start_lookup_idx and end_lookup_idx will have the same lookup value. Where and how is lookup_idx set in table_a?

If you all lookup_idx between all  between start_lookup_idx and end_lookup_idx  would be set to  start_lookup_idx (or all to  end_lookup_idx ) you would be able to do an equal join with the same result - and maybe benefit from stats...

Teradata Employee

Re: Joining large data on ranged indexies

This is most probably a product join.    Is there a way to introduce "table_b_id" field into TableA, from the LDM perspective?  

It might be tricky to update the tableA during ETL when the lookup tableB is updated, but it might be better than lots of queries with product joins.

p.s. 10mln rows is not a large table for Teradata in most cases. ;)

Supporter

Re: Joining large data on ranged indexies

By the way - how many rows contain your lookup table?