How to force duplication of the rows from a small table on all AMPs

Database
Teradata Employee

How to force duplication of the rows from a small table on all AMPs

Hello Everyone,

We are have a problem with one of our queries in which we join a large table with a small table on a 2000 AMP system. During query execution the rows from the small table gets distributed on all AMPs. When the other large table is joined to the small table the rows from the larger table are redistributed on the join column. But the problem is some of the values in the join column are lumpy and because of this some AMPs get really skewed when the tables join and we run out of spool space. Increasing the spool limit is not option for us. We retrieve around 3 million rows from the large table.
The same query when we run on a smaller system with 150 AMPs the rows from the smaller table gets duplicated on all AMPs. So we don't have see this issue on the smaller system. The volume of data is the same in both the systems.
We have run all the required statistics and checked all the PIs.
We would like to know if there is a way to force the duplication of the small table on the 2000 AMP system. It would be great if someone could help us with this issue.
Please let me know if you need more details.
13 REPLIES
Senior Apprentice

Re: How to force duplication of the rows from a small table on all AMPs

There's an old trick when you don't want an index to be used (and this is working for most DBMSs):
Do a calculation on that column, e.g.
large_table l join small_table s on l.col = s.col + 0
or s.col || '' or coalesce(s.col, s.col)

See if this results in a different plan...

Dieter
Enthusiast

Re: How to force duplication of the rows from a small table on all AMPs

Yes. Doing any manipulations on the index would prevent it from being used.

Dieter,

Will preventing an index usage result in duplication ?

Is there any relation between index usage and join strategy(Duplication/Redistribution) used for the small table?

I've run queries in EDW where i have used indexes, but still the small table got duplicated.

Regards,
Annal T
Senior Apprentice

Re: How to force duplication of the rows from a small table on all AMPs

Hi Annal,
this is just to "help" the optimizer to forget about the index. This might change the plan, that's why you have to check explain.

Dieter
Enthusiast

Re: How to force duplication of the rows from a small table on all AMPs

Collecting stats should help in this case.

Generally collecting stats on the small table would help the optimiser realise that the table is actually small and might make it go for duplication.

Make sure you have collected stats on the table present in the bigger system.

Regards,
Annal T

Re: How to force duplication of the rows from a small table on all AMPs

All,

I understand that this discussion is old but my problem is new. I am using Teradat 13.10.

I have a large table eg: tablea with NUPI as (col1,col2,col3) . This table has approx 400 million rows.

Another table is smaller tableb with UPI as (cols1).

My query is

select col1,col2,col3,col4,col5 (all from tablea) from tablea inner join tableb

on tablea.col1=tableb.cols1

I am trying to force the redistribution of the smaller table, but for some reasons, the larger table gets redistributed.

I even tried to use the above logic and did the join using (table1.col1 = tableb.cols1 + 0) but it didn't help either.

I tried forcing the output into a table having similar primary index as the larger table but it didn't help either.

Can anyone suggest anything?

Thanks in advance

Ayush

Enthusiast

Re: How to force duplication of the rows from a small table on all AMPs

Hi Ayush,

The reason that your tablea is redistributed is because the join is on UPI and no index. TableA has a three columns NUPI, and in the join condition you are only using one column, so basically no index will be used. On the other hand, tableb has a UPI and thus the optimizer is trying to redistribute the rows according to the UPI table join column.

You can either revisit the NUPI column selections for tablea because that is hammering the execution plan, while you can probably make some other column of tableb which qualifies for the index value, and create a USI on tableb.cols1

Re: How to force duplication of the rows from a small table on all AMPs

Aftab,

Thanks for the response and the suggestion.

What if I move all the data from the smaller table 'tableb' which has only 15 million rows into another table and create an altogether different PI.which won't have the col1 included....? Then in the join should it ignore the PI of tableb and start distributing it?

I don't have authority of changing the index of tablea, which is an enterprize wide table and won't be modified for my needs only :)

Ayush

Enthusiast

Re: How to force duplication of the rows from a small table on all AMPs

Hi Ayush,

In that case the join will cost more as you are joining two non-index columns. TD will be redistributing the data for both of the tables on the joining columns to make sure that the data is on the same amp while joining.

As long as tableb has the UPI on the join column, it won't be redistributed! Is it possible that you can create the SI on the tablea.col1 only? The subtable rows will be on the same amp as tableb and will help the optimizer...

JI is one of the things that could help your query run faster. But not sure whether you will be allowed to create a JI on tablea or not!

Re: How to force duplication of the rows from a small table on all AMPs

Aftab,

Thanks again. yes you were right. I tried the above approach and it still didn't help. The larger table is still being redistributed.

There is already a secondary index in the larger table col1, so I guess Optimizer is doing the best it can, though it sounds illogical to me.

JI might not be an option because the smaller table is deleted and reloaded every week, so not sure how much extra burden it will create everytime this is done.

Thanks

Ayush