Aggregate Join Index

MM
Enthusiast

Aggregate Join Index

Hello,
I have an Aggregate Join Index (AJI) defined on a table; it contains only SUMs.
The table is empty. The AJI has the same index as the base table, and that index is almost unique for the records I want to load. I also defined the columns as NOT NULL; nevertheless, the explain shows that the optimizer is not aware of this fact.
I want to load 40,000,000 well-distributed records. The load of the base table takes 5 seconds.
The update of the AJI lasts forever. The grouping of the AJI is rather granular, so it would also be populated with 40,000,000 records.
Is this just how it is, or is there a possibility to change it? Is the AJI updating record by record? That would maybe explain the long runtime.

 

7 REPLIES
Senior Apprentice

Re: Aggregate Join Index

Hi,

I realise that you already have the AJI defined, but could you share an Explain of the 'create AJI'?
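(For reference, prefixing the DDL with EXPLAIN parses and optimizes the request without executing it, so the build plan can be captured on a system where the index doesn't exist yet. A minimal sketch with placeholder object names:)

EXPLAIN
CREATE JOIN INDEX demo_db.demo_aji AS
SELECT client_id, item_sid, SUM(qty_total) AS qty_total
FROM demo_db.demo_base
GROUP BY 1,2
PRIMARY INDEX (client_id, item_sid);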

 

You said that the AJI has the same index as the base table; I assume you're talking about the Primary Index.

 

Probably more importantly, is the Primary Index of the AJI (and table) included within the GROUP BY columns? And what are the data types of the GROUP BY columns? Big VARCHARs will typically slow down aggregation processing.
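(For checking that quickly, the dictionary SHOW statements return the full DDL, including PI columns and data types; the object names here are placeholders:)

SHOW TABLE your_db.your_base_table;
SHOW JOIN INDEX your_db.your_aji;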

 

Cheers,

Dave

Ward Analytics Ltd - information in motion
www: http://www.ward-analytics.com
MM
Enthusiast

Re: Aggregate Join Index

Hello,
might this help?

Best Regards Martin

 


CREATE MULTISET TABLE f_lew_bil.t_agg_deliv_base2 ,NO FALLBACK,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     CHECKSUM = DEFAULT,
     DEFAULT MERGEBLOCKRATIO
     (
      client_id SMALLINT NOT NULL,
      item_sid INTEGER NOT NULL,
      dat_formdate DATE FORMAT 'YY/MM/DD' NOT NULL,
      loc_sid_store INTEGER NOT NULL,
      src_wh_id INTEGER,
      deliverytype_id INTEGER NOT NULL,
      orig_store_nr INTEGER NOT NULL,
      cy_month INTEGER NOT NULL,
      sig_nr VARCHAR(20) CHARACTER SET UNICODE CASESPECIFIC NOT NULL,
      foodaction_fg BYTEINT NOT NULL,
      qty_total_del DECIMAL(18,4),
      val_rtl_rl_del DECIMAL(18,4),
      val_rtl_rl_net_del DECIMAL(19,5),
      qty_piece_del_agg DECIMAL(18,4),
      val_piece_del DECIMAL(18,4),
      val_piece_net_del DECIMAL(19,5),
      val_pr_diff_del DECIMAL(18,8),
      val_pr_diff_net_del DECIMAL(38,10),
      src_fg BYTEINT
     )
PRIMARY INDEX (client_id, item_sid, loc_sid_store, dat_formdate);


CREATE JOIN INDEX f_lew_bil.i_aji_deliv_store_day_item2 AS
SELECT
  a.client_id,
  a.loc_sid_store,
  a.item_sid,
  a.dat_formdate,
  a.cy_month,
  a.deliverytype_id,
  a.orig_store_nr,
  a.foodaction_fg,
  SUM(a.qty_total_del)       AS qty_total_del,
  SUM(a.val_rtl_rl_del)      AS val_rtl_rl_del,
  SUM(a.val_rtl_rl_net_del)  AS val_rtl_rl_net_del,
  SUM(a.qty_piece_del_agg)   AS qty_piece_del,
  SUM(a.val_piece_del)       AS val_piece_del,
  SUM(a.val_piece_net_del)   AS val_piece_net_del,
  SUM(a.val_pr_diff_del)     AS val_pr_diff_del,
  SUM(a.val_pr_diff_net_del) AS val_pr_diff_net_del
FROM f_lew_bil.t_agg_deliv_base2 a
GROUP BY 1,2,3,4,5,6,7,8
PRIMARY INDEX (client_id, item_sid, loc_sid_store, dat_formdate);

 

EXPLAIN
INSERT INTO f_lew_bil.t_agg_deliv_base2
SELECT *
FROM f_lew_bil.v_mat_wawi_mov_item_fv_delivery;

.......

    the AMPs. The size of Spool 27 is estimated with no confidence to
    be 17,980,076 rows (2,049,728,664 bytes). The estimated time for
    this step is 4.22 seconds.
22) We do an all-AMPs JOIN step from Spool 26 (Last Use) by way of an
    all-rows scan, which is joined to Spool 27 (Last Use) by way of an
    all-rows scan. Spool 26 and Spool 27 are joined using a single
    partition hash join, with a join condition of ("(loc_sid =
    st_loc_sid) AND (client_id = client_id)"). The result goes into
    Spool 13 (all_amps) (compressed columns allowed), which is
    redistributed by hash code to all AMPs to all AMPs. Then we do a
    SORT to order Spool 13 by row hash. The size of Spool 13 is
    estimated with no confidence to be 17,980,076 rows (2,661,051,248
    bytes). The estimated time for this step is 13.00 seconds.
23) We do an all-AMPs MERGE into f_lew_bil.t_agg_deliv_base2 from
    Spool 13. The size is estimated with no confidence to be
    17,980,076 rows. The estimated time for this step is 2 minutes
    and 9 seconds.
24) We do an all-AMPs SUM step to aggregate from Spool 13 (Last Use)
    by way of an all-rows scan, and the grouping identifier in field 1.
    Aggregate Intermediate Results are computed globally, then placed
    in Spool 31. The size of Spool 31 is estimated with no confidence
    to be 13,485,057 rows (2,589,130,944 bytes). The estimated time
    for this step is 51.86 seconds.
25) We do an all-AMPs RETRIEVE step from Spool 31 (Last Use) by way of
    an all-rows scan into Spool 29 (all_amps) (compressed columns
    allowed), which is redistributed by hash code to all AMPs to all
    AMPs. Then we do a SORT to order Spool 29 by row hash. The size
    of Spool 29 is estimated with no confidence to be 13,485,057 rows
    (2,373,370,032 bytes). The estimated time for this step is 11.16
    seconds.
26) We do an all-AMPs UPDATE from
    F_LEW_BIL.I_AJI_DELIV_STORE_DAY_ITEM2 by way of Spool 29 (Last
    Use) with a residual condition of (
    "(((F_LEW_BIL.I_AJI_DELIV_STORE_DAY_ITEM2.Field_1033 IS NULL) AND
    (F_LEW_BIL.I_AJI_DELIV_STORE_DAY_ITEM2.Field_1033 IS NULL )) OR
    (F_LEW_BIL.I_AJI_DELIV_STORE_DAY_ITEM2.Field_1033 =
    F_LEW_BIL.I_AJI_DELIV_STORE_DAY_ITEM2.Field_1033 )) AND
    ((((F_LEW_BIL.I_AJI_DELIV_STORE_DAY_ITEM2.Field_1032 IS NULL) AND
    (F_LEW_BIL.I_AJI_DELIV_STORE_DAY_ITEM2.Field_1032 IS NULL )) OR
    (F_LEW_BIL.I_AJI_DELIV_STORE_DAY_ITEM2.Field_1032 =
    F_LEW_BIL.I_AJI_DELIV_STORE_DAY_ITEM2.Field_1032 )) AND
    ((((F_LEW_BIL.I_AJI_DELIV_STORE_DAY_ITEM2.Field_1031 IS NULL) AND
    (F_LEW_BIL.I_AJI_DELIV_STORE_DAY_ITEM2.Field_1031 IS NULL )) OR
    (F_LEW_BIL.I_AJI_DELIV_STORE_DAY_ITEM2.Field_1031 =
    F_LEW_BIL.I_AJI_DELIV_STORE_DAY_ITEM2.Field_1031 )) AND
    ((((F_LEW_BIL.I_AJI_DELIV_STORE_DAY_ITEM2.Field_1030 IS NULL) AND
    (F_LEW_BIL.I_AJI_DELIV_STORE_DAY_ITEM2.Field_1030 IS NULL )) OR
    (F_LEW_BIL.I_AJI_DELIV_STORE_DAY_ITEM2.Field_1030 =
    F_LEW_BIL.I_AJI_DELIV_STORE_DAY_ITEM2.Field_1030 )) AND
    ((((F_LEW_BIL.I_AJI_DELIV_STORE_DAY_ITEM2.Field_1029 IS NULL) AND
    (F_LEW_BIL.I_AJI_DELIV_STORE_DAY_ITEM2.Field_1029 IS NULL )) OR
    (F_LEW_BIL.I_AJI_DELIV_STORE_DAY_ITEM2.Field_1029 =
    F_LEW_BIL.I_AJI_DELIV_STORE_DAY_ITEM2.Field_1029 )) AND
    ((((F_LEW_BIL.I_AJI_DELIV_STORE_DAY_ITEM2.Field_1028 IS NULL) AND
    (F_LEW_BIL.I_AJI_DELIV_STORE_DAY_ITEM2.Field_1028 IS NULL )) OR
    (F_LEW_BIL.I_AJI_DELIV_STORE_DAY_ITEM2.Field_1028 =
    F_LEW_BIL.I_AJI_DELIV_STORE_DAY_ITEM2.Field_1028 )) AND
    ((((F_LEW_BIL.I_AJI_DELIV_STORE_DAY_ITEM2.Field_1027 IS NULL) AND
    (F_LEW_BIL.I_AJI_DELIV_STORE_DAY_ITEM2.Field_1027 IS NULL )) OR
    (F_LEW_BIL.I_AJI_DELIV_STORE_DAY_ITEM2.Field_1027 =
    F_LEW_BIL.I_AJI_DELIV_STORE_DAY_ITEM2.Field_1027 )) AND
    (((F_LEW_BIL.I_AJI_DELIV_STORE_DAY_ITEM2.Field_1026 IS NULL) AND
    (F_LEW_BIL.I_AJI_DELIV_STORE_DAY_ITEM2.Field_1026 IS NULL )) OR
    (F_LEW_BIL.I_AJI_DELIV_STORE_DAY_ITEM2.Field_1026 =
    F_LEW_BIL.I_AJI_DELIV_STORE_DAY_ITEM2.Field_1026 ))))))))"). The
    size is estimated with no confidence to be 13,485,057 rows. The
    estimated time for this step is 1 minute and 36 seconds.
27) We spoil the parser's dictionary cache for the table.
28) Finally, we send out an END TRANSACTION step to all AMPs involved
    in processing the request.
-> No rows are returned to the user as the result of statement 1.

 

Senior Apprentice

Re: Aggregate Join Index

Hi Martin,

 

Thanks. So no obvious problems with data types or indexing.

 

One possibility comes to mind. Is it feasible within your processing to drop/create the AJI each time you populate the base table? I think this would avoid transient journaling on the 'AJI build' part of the processing.
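As a sketch (object names taken from your posts; the rebuild reuses your existing CREATE JOIN INDEX statement):

-- 1. Drop the AJI so the bulk insert only has to maintain the base table:
DROP JOIN INDEX f_lew_bil.i_aji_deliv_store_day_item2;

-- 2. Load the base table; with no AJI present there is no AJI maintenance
--    and no transient journaling for it:
INSERT INTO f_lew_bil.t_agg_deliv_base2
SELECT *
FROM f_lew_bil.v_mat_wawi_mov_item_fv_delivery;

-- 3. Rebuild the AJI in a single aggregation pass over the loaded table
--    (same CREATE JOIN INDEX statement as posted above).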

 

In your original post you asked "Is the AJI updating record by record?" - no. In your explain plan it looks like step #26 is where the AJI is updated, i.e. a single set-based step.

 

What release of TD are you on?

 

Cheers,

Dave

 

 

Ward Analytics Ltd - information in motion
www: http://www.ward-analytics.com
MM
Enthusiast

Re: Aggregate Join Index

Hi Dave,
since you asked whether the PI columns are included in the grouping columns, I made all of the AJI's grouping columns its PI columns, and then it worked. :-)
I'm on 15.10.
Nevertheless it's strange behaviour. Maybe Teradata does a product join if too few of the grouping columns are part of the PI??
No good explanation, but with all columns it works (the revised definition is sketched below). Thanks for your help!!! I will be here tomorrow (Austrian time :-)).
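A sketch of the change, assuming the definition posted earlier in the thread (only the PRIMARY INDEX clause differs; the elided SUM columns are unchanged):

CREATE JOIN INDEX f_lew_bil.i_aji_deliv_store_day_item2 AS
SELECT a.client_id, a.loc_sid_store, a.item_sid, a.dat_formdate,
       a.cy_month, a.deliverytype_id, a.orig_store_nr, a.foodaction_fg,
       SUM(a.qty_total_del) AS qty_total_del
       -- ... remaining SUM columns exactly as in the original definition ...
FROM f_lew_bil.t_agg_deliv_base2 a
GROUP BY 1,2,3,4,5,6,7,8
-- The PI now covers all eight grouping columns:
PRIMARY INDEX (client_id, loc_sid_store, item_sid, dat_formdate,
               cy_month, deliverytype_id, orig_store_nr, foodaction_fg);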

Best Regards

Martin

Senior Apprentice

Re: Aggregate Join Index

Hi Martin,

Well, if it's working for you then that is good.

I don't understand why making the PI = GROUP BY columns should speed this up. Normally it is good enough to have the PI columns as part of the GROUP BY columns.

 

The only thing that immediately comes to mind is the number of rows per PI value in your original definition. That might bring up a need for 'duplicate row checking', but (a) I don't know whether that happens for AJIs (in principle it doesn't, but I'm not certain) and (b) from your original post the granularity of rows in the AJI is about the same as for the base table - and that builds in 5 seconds.

 

(Another) one of life's mysteries...

 

Cheers,

Dave

Ward Analytics Ltd - information in motion
www: http://www.ward-analytics.com
MM
Enthusiast

Re: Aggregate Join Index

Hi Dave,
I found out that this was not the reason.
Now I can describe my observations in more detail.
These findings do not depend on whether the PI of the AJI consists of all grouping columns or not.
Finding 1: inserting 10,000,000 records is fast (the AJI update step).
Finding 2: inserting 100,000,000 records lasts forever (the AJI update step).
Finding 3: adding 100,000,000 records to already existing keys/groups is fast.
"Why" would be interesting; a sketch of the three cases is below.
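
A minimal sketch of the three experiments (f_lew_bil.t_stage is a hypothetical staging table with the same layout as the base table; the SAMPLE sizes just stand in for the row counts above):

-- Finding 1: 10,000,000 rows into the empty base table -> AJI update step is fast.
DELETE FROM f_lew_bil.t_agg_deliv_base2 ALL;
INSERT INTO f_lew_bil.t_agg_deliv_base2
SELECT * FROM f_lew_bil.t_stage SAMPLE 10000000;

-- Finding 2: 100,000,000 rows into the empty base table -> AJI update step lasts forever.
DELETE FROM f_lew_bil.t_agg_deliv_base2 ALL;
INSERT INTO f_lew_bil.t_agg_deliv_base2
SELECT * FROM f_lew_bil.t_stage SAMPLE 100000000;

-- Finding 3: another 100,000,000 rows whose keys/groups already exist in the AJI -> fast.
INSERT INTO f_lew_bil.t_agg_deliv_base2
SELECT * FROM f_lew_bil.t_stage SAMPLE 100000000;
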
Have a good day
Martin
Senior Apprentice

Re: Aggregate Join Index

Hi Martin,

Thanks for the info. Yes, knowing why this difference exists would be interesting.

Cheers,

Dave

Ward Analytics Ltd - information in motion
www: http://www.ward-analytics.com