Required help


I have a table of about 1,799,412,009 rows (roughly 1.8 billion) with a size of 133 GB.

Now I have created another table to store the aggregated results, and I am populating it with the following INSERT ... SELECT:

Insert Into dbatemp.O_ACNT_BAL_SP
SELECT
OAN_A ,
OAN_ODT_J ,
OAN_OBT_J ,
OAN_OAA_J ,
OAN_OCU_J ,
OAN_OPD_J ,
OAN_OSS_J ,
OAN_OAD_J , OAN_DOMCL_OCN_J , OAN_OMS_J ,
OAN_CRETN_ODT_J , OAN_FILE_OSX_N ,
Min(OAN_PARTN_KEY_1_X) OVER (PARTITION BY OAN_PARTN_KEY_1_X ORDER BY OAN_PARTN_KEY_1_X), -- PPI COLUMN
Max(OAN_PARTN_KEY_1_X) OVER (PARTITION BY OAN_PARTN_KEY_1_X ORDER BY OAN_PARTN_KEY_1_X)  -- PPI COLUMN
FROM dbatemp.O_ACNT_BAL_test_sp
Group By OAN_A,OAN_ODT_J ,
OAN_OBT_J ,
OAN_OAA_J ,
OAN_OCU_J ,
OAN_OPD_J ,
OAN_OSS_J ,
OAN_OAD_J , OAN_DOMCL_OCN_J , OAN_OMS_J ,
OAN_CRETN_ODT_J , OAN_FILE_OSX_N,
OAN_PARTN_KEY_1_X

=============================================================
Attached explain
1) First, we lock a distinct dbatemp."pseudo table" for read on a
RowHash to prevent global deadlock for dbatemp.O_ACNT_BAL_test_sp.
2) Next, we lock dbatemp.O_ACNT_BAL_test_sp for read.
3) We do an all-AMPs SUM step to aggregate from
dbatemp.O_ACNT_BAL_test_sp by way of an all-rows scan with no
residual conditions, and the grouping identifier in field 1.
Aggregate Intermediate Results are computed globally, then placed
in Spool 4. The input table will not be cached in memory, but it
is eligible for synchronized scanning. The aggregate spool file
will not be cached in memory. The size of Spool 4 is estimated
with no confidence to be 1,349,044,830 rows. The estimated time
for this step is 1 hour and 4 minutes.
4) We do an all-AMPs STAT FUNCTION step from Spool 4 (Last Use) by
way of an all-rows scan into Spool 8 (Last Use), which is assumed
to be redistributed by value to all AMPs. The result rows are put
into Spool 1 (group_amps), which is built locally on the AMPs.
5) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1.

This INSERT is taking a very long time: it has been running for more than 2 hours and still has not completed. I am using a PPI column. Has anybody faced this situation in the past? Please help me optimize this, as in future we need to test the same approach with even larger tables and aggregations.
10 REPLIES

Re: Required help

Hi Sushant,
The all-AMP STAT FUNCTION step for MIN() OVER (PARTITION BY ...) and MAX() OVER (PARTITION BY ...) on billions of rows is the reason this query runs for so long. Avoid those window functions; simple MIN and MAX aggregates should serve your purpose. And if you have partitioned this table, you should make use of that PPI in your queries: here a full table scan (FTS) is happening, and an FTS on a table this big is not recommended. Also collect stats on the non-aggregate columns.
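
For example, collecting stats might look like this (a rough sketch only; the exact column list should match whatever your queries actually filter and group on):

COLLECT STATISTICS ON dbatemp.O_ACNT_BAL_test_sp COLUMN (OAN_A);
COLLECT STATISTICS ON dbatemp.O_ACNT_BAL_test_sp COLUMN (OAN_PARTN_KEY_1_X);
COLLECT STATISTICS ON dbatemp.O_ACNT_BAL_test_sp COLUMN (PARTITION);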

Thanks,
Aravind Hegde

Re: Required help

Thanks Arvind

But that did not work either. Earlier the query had been written without the analytic (OVER) functions.

Query before using the analytical functions:
SELECT
OAN_A ,
OAN_ODT_J ,
OAN_OBT_J ,
OAN_OAA_J ,
OAN_OCU_J ,
OAN_OPD_J ,
OAN_OSS_J ,
OAN_OAD_J , OAN_DOMCL_OCN_J , OAN_OMS_J ,
OAN_CRETN_ODT_J , OAN_FILE_OSX_N ,
Min(OAN_PARTN_KEY_1_X ) ,
Max(OAN_PARTN_KEY_1_X)
FROM dbatemp.O_ACNT_BAL_test_sp
Group By 1,2,3,4,5,6,7,8,9,10,11,12

1) First, we lock a distinct dbatemp."pseudo table" for read on a
RowHash to prevent global deadlock for dbatemp.O_ACNT_BAL_test_sp.
2) Next, we lock dbatemp.O_ACNT_BAL_test_sp for read.
3) We do an all-AMPs SUM step to aggregate from
dbatemp.O_ACNT_BAL_test_sp by way of an all-rows scan with no
residual conditions, and the grouping identifier in field 1.
Aggregate Intermediate Results are computed globally, then placed
in Spool 3. The input table will not be cached in memory, but it
is eligible for synchronized scanning. The aggregate spool file
will not be cached in memory. The size of Spool 3 is estimated
with no confidence to be 1,349,559,007 rows. The estimated time
for this step is 1 hour and 3 minutes.
4) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of
an all-rows scan into Spool 1 (group_amps), which is built locally
on the AMPs. The result spool file will not be cached in memory.
The size of Spool 1 is estimated with no confidence to be
1,349,559,007 rows. The estimated time for this step is 3 minutes
and 9 seconds.
5) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1. The total estimated time is 1 hour and 6 minutes.

Still it was taking a long time; this version had earlier taken 24 hours to insert the rows.

One more thing: we are using compression on the columns of these tables.

Re: Required help

Hi,
I don't think you can get this query to finish in less than 30 minutes.
The explain plan itself is estimating 1 hr 3 min for the SUM step alone.
We also have this kind of scenario in our project, so we always make use of range partitioning on big tables (more than 10 million records); see the sketch below.
Also, time the SELECT query on its own in your case before going for the INSERT.
Compression doesn't have any bad effect on retrieving rows, but the INSERT may become a bit slower because of it.
Collect statistics on the table so that the optimizer can estimate the rows with high confidence, or at least low confidence; your explain plan currently says 'no confidence'.
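
If the target table is not already range-partitioned, this is roughly what I mean (a sketch only; the table name, data types, and key ranges here are illustrative guesses, not your real DDL):

CREATE MULTISET TABLE dbatemp.O_ACNT_BAL_SP_part
( OAN_A             DECIMAL(18,0) NOT NULL,
  OAN_ODT_J         DATE,
  OAN_PARTN_KEY_1_X INTEGER NOT NULL
  -- ... remaining columns as in your existing table ...
)
PRIMARY INDEX (OAN_A)
PARTITION BY RANGE_N (OAN_PARTN_KEY_1_X BETWEEN 1 AND 1000 EACH 10);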

Thanks,
Aravind Hegde

Re: Required help

After using simple MIN and MAX without the OVER (PARTITION BY ...) clause, it is still taking a very long time.

Re: Required help

Since your original post mentioned the INSERT taking a long time, I think the places to look will be the Transient Journal (i.e. the destination table is not empty), a skewed PI on the destination table, or a NUPI on the destination table with too many duplicate values for the PI columns.

So the solution would depend on which of these is the case, and on other factors such as the size of the destination table.

A common solution for the TJ problem is to create a new table, then insert the UNION [ALL] of the destination table's contents and your SELECT query into that new table (sketch below).

For a skewed PI, you need to weigh your options for choosing other columns as the PI (a big topic).

For NUPI collisions, you can try making the table MULTISET (beware of the catch in the form of duplicate records).

So the solution to your issue could be a combination of these ...
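
Roughly like this (a sketch only; the _new table name is made up, and the aggregate SELECT is the simple MIN/MAX version you posted above):

CREATE TABLE dbatemp.O_ACNT_BAL_SP_new AS dbatemp.O_ACNT_BAL_SP WITH NO DATA;

INSERT INTO dbatemp.O_ACNT_BAL_SP_new
SELECT * FROM dbatemp.O_ACNT_BAL_SP
UNION ALL
SELECT OAN_A, OAN_ODT_J, OAN_OBT_J, OAN_OAA_J, OAN_OCU_J, OAN_OPD_J,
       OAN_OSS_J, OAN_OAD_J, OAN_DOMCL_OCN_J, OAN_OMS_J,
       OAN_CRETN_ODT_J, OAN_FILE_OSX_N,
       MIN(OAN_PARTN_KEY_1_X), MAX(OAN_PARTN_KEY_1_X)
FROM dbatemp.O_ACNT_BAL_test_sp
GROUP BY 1,2,3,4,5,6,7,8,9,10,11,12;

-- then drop the old table and RENAME TABLE dbatemp.O_ACNT_BAL_SP_new TO dbatemp.O_ACNT_BAL_SP;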

Re: Required help

Arvind

My skew factor for the destination table is very small (1 percent), but do you still think it might be one of the reasons?
I took the count of rows per AMP (listed below, with a sample query after the list); it is almost the same on every AMP. But I am still confused about why it is having an issue. And as you have seen, the explain is also showing the 1 hr 30 min estimated time for the query. Is there anything related to a DBSControl parameter I need to check, for example the PPI cache threshold, which is 500 by default?
0 21446463
1 21443006
2 21446649
3 21437362
4 21446691
5 21446651
6 21453561
7 21441903
8 21443987
9 21441492
10 21438708
11 21439666
12 21440235
13 21438873
14 21444468
15 21451069
16 21424977
17 21410793
18 21413002
19 21421911
20 21420538
21 21413215
22 21407801
23 21417619
24 21415536
25 21416238
26 21412892
27 21421752
28 21417597
29 21411652
30 21416799
31 21419783
32 21414542
33 21419582
34 21412584
35 21403173
36 21415544
37 21411985
38 21420465
39 21414476
40 21416950
41 21419612
42 21416304
43 21406138
44 21419915
45 21417261
46 21410815
47 21420203
48 21405137
49 21418547
50 21409454
51 21418300
52 21416681
53 21422146
54 21417109
55 21425976
56 21412537
57 21417414
58 21418664
59 21415493
60 21414348
61 21413535
62 21423653
63 21419196
64 21420049
65 21419086
66 21416358
67 21414865
68 21412593
69 21417350
70 21419253
71 21412657
72 21414413
73 21410654
74 21415250
75 21420230
76 21415978
77 21405743
78 21413490
79 21418091
80 21421048
81 21423696
82 21425232
83 21425345
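
(For reference, per-AMP counts like the above can be produced with something along these lines, assuming purely for illustration that OAN_A is the PI column of the destination table:)

SELECT HASHAMP(HASHBUCKET(HASHROW(OAN_A))) AS amp_no, COUNT(*) AS row_cnt
FROM dbatemp.O_ACNT_BAL_SP
GROUP BY 1
ORDER BY 1;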

Re: Required help

Joe mentioned hash collisions. What is the primary index of your target table and if it is not a unique primary index, how unique is it (how many rows do you expect to have per primary index value)?

A high number of hash collisions caused by a "not very unique" primary index will make a fairly large number of inserts take a very long time.

It looks like you have a lot of columns in your "GROUP BY" clause. Are most (if not all) of these columns in the primary index of your target table?

If you're not able to create a pretty unique primary index, changing the table to MULTISET will improve performance. It would allow dups in your table, but if this is the only insert happening to the table, you wouldn't have duplicates because your GROUP BY will eliminate duplicates.
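
To see how unique the PI really is, a quick check along these lines might help (assuming, purely for illustration, that OAN_A is the PI of the target table):

SELECT TOP 100 OAN_A, COUNT(*) AS rows_per_pi_value
FROM dbatemp.O_ACNT_BAL_SP
GROUP BY 1
ORDER BY 2 DESC;

If the top counts run into the thousands, duplicate-row checking and hash collisions on a SET table are a likely cause of the slow INSERT.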

Re: Required help

Barry

I am using a PI (a non-unique Primary Index, not a UPI).

But my explain showed me that the estimated time for this is 1 hr and 30 sec, yet when we execute the INSERT it takes far longer and does not complete at all.

Re: Required help

The times displayed by explain are merely the optimizer's best guesses (based on whatever "insider" information the statistics provide, if any). They are not the final authority on how long the SQL is going to run.

Besides, the explain you posted is for the SELECT query only.

And even if you did an explain for the INSERT, it wouldn't tell you much about hash collisions and all that.

So there are two things to measure here:

How long does your SELECT query take to run on its own (without the INSERT)?

How much extra time is added when the INSERT is included?

That's the kind of information which would help you figure out where the bottleneck is and what you need to do about it.

A kind of divide-and-conquer approach.
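
One way to split the measurement (a sketch; the _scratch table name is made up) is to land the SELECT in a throw-away table first and then time the real INSERT separately:

-- Step 1: time the aggregate SELECT on its own
CREATE TABLE dbatemp.O_ACNT_BAL_SP_scratch AS
( SELECT OAN_A, OAN_ODT_J, OAN_OBT_J, OAN_OAA_J, OAN_OCU_J, OAN_OPD_J,
         OAN_OSS_J, OAN_OAD_J, OAN_DOMCL_OCN_J, OAN_OMS_J,
         OAN_CRETN_ODT_J, OAN_FILE_OSX_N,
         MIN(OAN_PARTN_KEY_1_X) AS min_partn_key, MAX(OAN_PARTN_KEY_1_X) AS max_partn_key
  FROM dbatemp.O_ACNT_BAL_test_sp
  GROUP BY 1,2,3,4,5,6,7,8,9,10,11,12
) WITH DATA PRIMARY INDEX (OAN_A);

-- Step 2: time the INSERT into the real target separately
INSERT INTO dbatemp.O_ACNT_BAL_SP
SELECT * FROM dbatemp.O_ACNT_BAL_SP_scratch;

If step 1 is fast and step 2 is the part that crawls, the problem is on the target table (TJ, PI skew, or hash collisions) rather than in the aggregation itself.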