General

turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

04-15-2014
02:47 PM

04-15-2014
02:47 PM

Hi,

I have two tables (both tables have 4 integer columns) one table with 70k records and other with 1.1 million records. I need to join both of them without any condition based on three columns. I have primary index based on three columns. So the skew of these tables is not more than 7. When I run the query I get no spool space error.

I ran the query in Visual Explain, I see the query is estimated to take 7 seconds only. Whle the query is running, when i checked (select * from dbc.DiskSpace where DatabaseName =user). I see CurrentSpool for one AMP increases to the max spool value and I get the spool error. But the data distribution is almost even in both the tables, not sure why currentspool is increasing only for one Amp.

Any help will be appreciated.

Thanks,

Bharani

9 REPLIES

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

04-15-2014
11:36 PM

04-15-2014
11:36 PM

What are the demographics of these tables .... e.g unique values per column? No more spool-space error occurs when it is running out of space on 1 AMP, so the way you write the SQL also matter.

Can you share your query?

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

04-16-2014
09:00 AM

04-16-2014
09:00 AM

Table1 has col1, col2, col3, index is compiste index based on these three columns.

Table2 has col1, col2, col3, here too index is compiste index based on these three columns.

select t1.col4, t2.col4 from Table1 t1

inner join Table2 t2 ON t1.col1 = t2.col1 and t1.col2 = t2.col2 and t1.col3 = t2.col3

It is a simple inner join of these tables on these columns to get the result. Per column wise, the unique values are less, but if you combine three columns then there are many unique values. Skew is only 3 in each table.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

04-16-2014
09:17 AM

04-16-2014
09:17 AM

sel hashamp(hashbucket(hashrow(col1,col2,col3))),count(*) from table1 group by 1 order by 2 desc;

sel hashamp(hashbucket(hashrow(col1,col2,col3))),count(*) from table1 group by 2 order by 2 desc

Please check from above if one of the amps is overloaded. i assume that it is a NUPI on both the tables so one of the amps would be heavily loaded. Also please collect the stats on the 2 tables for all the column combination.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

04-16-2014
08:19 PM

04-16-2014
08:19 PM

hi

i new to teradata, now i am working on to reduce skew. here is my details

initially my table has 2 PI having skew 60

now i create a table on tmp database with 3 PI which has skew 3.5 only

shall we go with this for reduction of skew or i need to check for col stats. can anyone tell me about the use col stats

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

04-17-2014
01:42 AM

04-17-2014
01:42 AM

Which kind of table is it? Is it for staging? Or DWH table? Row-count?

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

04-17-2014
06:24 AM

04-17-2014
06:24 AM

hi

it is incremental table

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

04-17-2014
08:26 AM

04-17-2014
08:26 AM

As i mentioned in the other post below, skew should not be the only factor deciding the PI but the higher the skew the more the time consumed for working on the skewed data.

http://forums.teradata.com/forum/database/how-to-reduce-skew-factor-on-stage-tables-with-nopi

Collect Stats is primarily to indicate to the optimiser on the size of the tables, the number of values in the column etc. to make it take the most optimisex plan for executing a query. for example, let us take a million row table being joined to a 100 row table, there are many ways that this can be done.i shall take a couple of ways to explain. If the optimiser knows that one of them is huge(million rows) and the other is a very small table, it can go with either duplicating the small table on all amps and in some cases even a product join of the 2 tables can be done. if the optimiser knows that both the tables are considerably huge then it will join the table in the spool and then redistribute across all the amps etc., if the join is aPI based join then it will be building it in the amps and the like steps.

Collect stats provides a way to identify the demographics of the tables and get the best plan for the optimiser.

HTH!!

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

04-18-2014
11:26 AM

04-18-2014
11:26 AM

Skewfactor can also have a larger values if the data-set is small. How many AMPs you have on the system? And can you execute and share result of bellow SQL:

SELECT col1, col2, col3, COUNT(*) FROM table1 GROUP BY 1,2,3 HAVING COUNT(*) > 1;

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Get Direct Link
- Email to a Friend
- Report Inappropriate Content

04-22-2014
08:01 AM

04-22-2014
08:01 AM

You table is NUPI table. when you join two tables with NUPI your output will be multiple of your NUPI. for example

Table A is having 10 rows (3 columns (NUPI)) and table B is having 10 rows (3 columns (NUPI)). when you join Table A and Table B with NUPI, your output will be 100 rows.

Check the count for each unique value (NUPI).