How to use COLLECT STATISTICS in the query

General
Enthusiast

How to use COLLECT STATISTICS in the query

Hello folks,

I am not familiar with COLLECT STATISTICS in Teradata but now I need to use it but don't know how within the query.

Can someone help me how should I use it?

I have checked :

DIAGNOSTIC HELPSTATS ON FOR SESSION;

EXPLAIN

 and got explanation: 

The contents of Spool 1 are sent back to the user as the result of
statement 1. The total estimated time is 4 minutes and 46 seconds.
BEGIN RECOMMENDED STATS FOR FINAL PLAN->
-- "COLLECT STATISTICS COLUMN (COUNTRY_CODE
,IND_CODE , CUST_NAME, CUST_NO) ON table1" (High
Confidence)
-- "COLLECT STATISTICS COLUMN (PROD_ID) ON
table2" (High Confidence)
-- "COLLECT STATISTICS COLUMN (PROD_ID) ON
table3" (High Confidence)
-- "COLLECT STATISTICS COLUMN (TR_CODE) ON table4"
(High Confidence)
-- "COLLECT STATISTICS COLUMN (END_DATE) ON
table5" (High Confidence)
-- "COLLECT STATISTICS COLUMN (START_DATE) ON
table5" (High Confidence) and so on and so forth.

 

Now, where I need to put these "collect statements" within the query?

Thank you in advance.

4 REPLIES 4
Ambassador

Re: How to use COLLECT STATISTICS in the query

You cut and paste the COLLECT statement (of course without the double quotes) into a query window and run it.

 

But remember DIAGNOSTIC HELPSTATS is like asking your kids What do you want for christmas?

Implementing all those stats is overkill, you should start with those you think are most usefull (those where the estimates for WHERE-conditions are way off) and rerun Explain after every new stats to see if the plan changes.

 

Are you trying to fix a known bad query?

If you got access to QueryLogStepsV it's much easier, because you can see the first step with totally wrong estimates (the following bad estimates are usually based on this).  Then there's another DIAGNOSTIC VERBOSEEXPLAIN ON FOR SESSION (it's hard to believe but Explain can be more elaborate), which is very usefull in combination with HELPSTATS, because it shows the missing stats for each step. Sometimes you only collect this specific stat and fix the query.

Enthusiast

Re: How to use COLLECT STATISTICS in the query

@dnoethThank you for your prompt answer but unfortunately I do not have access to QueryLogStepsV.

I am just querying in SQL Assistant or SQL Studio in order to get data for my needs. Now, I am facing issue with spool space and I am trying to optimize my query.

Next, based on your suggestion, I have put COLLECTIN STATS at the beginning of the query, like below:

COLLECT STATISTICS COLUMN (COUNTRY_CODE,IND_CODE , CUST_NAME, CUST_NO) ON table1 (High Confidence)
COLLECT STATISTICS COLUMN (PROD_ID) ON table2 (High Confidence)
COLLECT STATISTICS COLUMN (PROD_ID) ON table3 (High Confidence)
COLLECT STATISTICS COLUMN (TR_CODE) ON table4 (High Confidence)
COLLECT STATISTICS COLUMN (END_DATE) ON table5 (High Confidence)
COLLECT STATISTICS COLUMN (START_DATE) ON table5 (High Confidence)
 

SELECT 	t1.prod_id,
	t1.country_code,t1.cust_name, 
        t1.cust_nbr,t1.ind_code,
	t2.bla,bla
	t3.bla,blaCOUNT(DISTINCT t4.inst_nbr) 
	from   table1 t1
        join table2 t2 on t1.id=t2.id
        join table3 t3 on t3.id = t1.id
        join table4 t4 on t4.id = t1.id
        left join table5 t5 on t5.id=t4.id
WHERE condition1
AND condition2
AND condition3
AND condition4

So, when I run this I got an error:  Syntax error: expected something between the word 'table1' and '('. 

Really, I am newbie with collecting stats so if you could assist me a little bit more, I would appreciate your help.

Tags (1)
Enthusiast

Re: How to use COLLECT STATISTICS in the query

 
Ambassador

Re: How to use COLLECT STATISTICS in the query

The (High Confidence) is not part of the Collect, it's just info about how usefull this will be for the optimizer, replace it with a semicolon.

Did you check Explain if there's a Select-step with a totally wrong estimated? Start with the stats matching this WHERE-condition.

 

Is there a step with a huge spool causing which might cause that error or does Explain look ok?

 



I am just querying in SQL Assistant or SQL Studio in order to get data for my needs. Now, I am facing issue with spool space and I am trying to optimize my query.

If you query tables from your DWH there's a high probability that you don't have the Collect Stats access right, only for the tables you created, e.g. in a datalab.

 

Can you show the actual explain?