Copying giant table is very slow

Apprentice

Re: Copying giant table is very slow

Hi Klaus,

 

Thanks for that info.

 

So the Explain estimate is 14 hours. How long did the actual run take? Or how long did you allow it to run before cancelling the query?

 

Or are you basing your 'too slow' estimate on the estimated time shown in the Explain?

 

If the answer to my last question is 'Yes', then that is not a good idea. The times shown in an Explain plan usually bear no relationship to the real elapsed time of the query.

For example, this time doesn't take account of machine load, priority, etc. The estimated time should only be used as a comparative measure between two pieces of SQL that have the same functionality and run against the same data (basically two ways of writing the same SQL).
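
To make the comparative use concrete, here's a minimal sketch (the table names are placeholders, not your actual objects): generate the Explain for two equivalent formulations and compare their estimates against each other, not against the clock.

-- Two equivalent ways to copy a table; compare their Explain estimates
-- with each other rather than with expected wall-clock time.
EXPLAIN INSERT INTO tgt_tab SELECT * FROM src_tab;
EXPLAIN CREATE TABLE tgt_tab_2 AS (SELECT * FROM src_tab) WITH DATA;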

 

The Explain plan that you show is exactly what I'd expect to see for this processing.

 

Cheers,

Dave

 

Ward Analytics Ltd - information in motion
www: http://www.ward-analytics.com
Enthusiast

Re: Copying giant table is very slow

The table has the default 65,535 partitions, with data in 143 of them.
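
(If you want to verify the populated-partition count yourself, a quick sketch with a placeholder table name: the system PARTITION column reports each row's partition number.)

-- Rows per populated partition; the number of result rows is the
-- number of non-empty partitions ('src_tab' is a placeholder).
SELECT PARTITION AS part_no, COUNT(*) AS row_cnt
FROM src_tab
GROUP BY 1
ORDER BY 1;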

Here are the input table stats.

 

       Unique Values	Column Names
       4,357,183,253	*
                 143	PARTITION
          13,483,051	AR_ID
                  10	CR_DB_IND_CL
               2,071	EFF_DT
               1,911	END_DT
                   2	INT_APL_TP_CL
                   2	INT_RATE_CSZ_F
                  41	INT_TP_CL
                 143	LDB_ID
                  11	SEQ_ID
       4,357,183,253	AR_ID,EFF_DT,INT_TP_CL,LDB_ID,SEQ_ID
          13,483,051	PARTITION,AR_ID
          85,255,729	AR_ID,INT_TP_CL,LDB_ID,SEQ_ID
          13,599,979	AR_ID,LDB_ID
                 426	PPN_EV_ID
             394,245	EFF_DT,END_DT
          13,599,979	PARTITION,AR_ID,LDB_ID
          74,917,253	AR_ID,LDB_ID,INT_TP_CL
          74,917,253	AR_ID,INT_TP_CL
              10,582	INT_BAL_MX_AMT
Enthusiast

Re: Copying giant table is very slow

Hi Dave

I cancelled the original CREATE TABLE AS query after two hours because of the DBC block.

I have just executed the INSERT/SELECT query and will leave it running for at least an hour to see if the Explain is just way off.

I would expect this kind of FastPath INSERT/SELECT query to be done within an hour? It's 'only' 280 GB going through the fast path.
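
(For context, a minimal sketch with placeholder names: a fast-path, block-at-a-time INSERT/SELECT generally requires an empty target table that is structurally compatible with the source.)

-- Empty copy of the source, then a fast-path-eligible INSERT/SELECT.
-- Drop any secondary indexes before the load and recreate them after,
-- as done later in this thread.
CREATE TABLE tgt_tab AS src_tab WITH NO DATA;
INSERT INTO tgt_tab SELECT * FROM src_tab;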

/Klaus

Apprentice

Re: Copying giant table is very slow

Hi Klaus,

 

Thanks. What you might also want to do (using another session) is:

- run "SELECT TIME,SUM(CURRENTPERM) FROM DBC.TableSizeV WHERE DATABASENAME = '**bleep**' AND TableName = 'yyy';"

- run "LOCKING TABLE yyy FOR ACCESS SELECT TIME,COUNT(*) FROM yyy;"

 

'xxx' is the name of your database and 'yyy' is the name of your table.

 

This will give an idea of the growth of data in your table, from both a row count and a perm space perspective.

 

Cheers,

Dave

Ward Analytics Ltd - information in motion
www: http://www.ward-analytics.com
Enthusiast

Re: Copying giant table is very slow

INSERT/SELECT was done in 01:13:35

 

Now to recreate USI and STATS
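
(A sketch of what that step looks like; the index columns are my assumption, inferred from the unique five-column combination in the stats listing above, and 'tgt_tab' is a placeholder.)

-- Recreate the USI on the copy, then recollect statistics.
CREATE UNIQUE INDEX (AR_ID, EFF_DT, INT_TP_CL, LDB_ID, SEQ_ID) ON tgt_tab;
COLLECT STATISTICS COLUMN (AR_ID) ON tgt_tab;
COLLECT STATISTICS COLUMN (PARTITION) ON tgt_tab;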

 

/Klaus

Enthusiast

Re: Copying giant table is very slow

Looking at Viewpoint during the INSERT/SELECT, I am a bit surprised to see that only two AMPs are doing work.

This snapshot is from the query's halfway mark (0:32:00).

98.7% CPU and I/O skew.

 

[Attached screenshots: INSERT PIC 1.JPG, INSERT PIC 2.JPG]
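
(For anyone wanting to cross-check Viewpoint's per-AMP picture from SQL, a sketch assuming a placeholder user name: dbc.AMPUsageV accumulates CPU and I/O per AMP by user and account, so take a reading before and after, or watch the deltas.)

-- Per-AMP CPU and disk I/O accumulated for the loading user.
SELECT Vproc, SUM(CpuTime) AS cpu_secs, SUM(DiskIO) AS disk_ios
FROM dbc.AMPUsageV
WHERE UserName = 'LOAD_USER'   -- placeholder
GROUP BY 1
ORDER BY 2 DESC;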

Apprentice

Re: Copying giant table is very slow

So if only 2 AMPs are doing any work, that would explain why it is so slow.

 

This has to be down to the PI of the target table and the number of unique values it has. You might be REALLY unlucky and have lots of unique values which all hash to only 2 AMPs, but that's unlikely...

 

However, from what you've shown earlier, column AR_ID is the PI, there are ~13.5M unique values (13,483,051 in your stats), and your skew is ~1%. Only two AMPs doing the processing doesn't make sense (at least to me...).
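
One way to rule out a freak hashing pattern (a sketch; 'src_tab' is a placeholder for your source table) is to compute the target AMP for each PI value directly with the standard hash functions:

-- Row count per AMP for the PI column AR_ID.
SELECT HASHAMP(HASHBUCKET(HASHROW(AR_ID))) AS amp_no, COUNT(*) AS row_cnt
FROM src_tab
GROUP BY 1
ORDER BY 2 DESC;

If only two AMP numbers dominate here, the hashing really is the problem; otherwise look elsewhere.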

 

Sorry to ask, but are you really sure about the skew of the table? Please try the following:

SELECT 100.00 * AVG(currentperm) / MAX(currentperm) AS par_eff
FROM dbc.tablesizev
WHERE databasename = 'x'
AND TABLENAME = 'y';

If the skew is 1%, then this should come back with an answer of around 99%.

 

I think you said that you're on TD15.x, so it isn't a 2-AMP hash map...

 

Unless there are row-hash locks on every other AMP in the system, causing the processing on those AMPs to block, I'm struggling to understand why this might happen.

 

Just a thought (and I'm not sure if this is even possible): can you check for ARC locks on this table? Use Viewpoint / Remote Console / Query Session to display these.

 

Cheers,

Dave

 

Ward Analytics Ltd - information in motion
www: http://www.ward-analytics.com
Enthusiast

Re: Copying giant table is very slow

Hi Dave

The output of your skew query is 99.50, i.e. under 1% skew.

No ARC Locks on the table.

Here are four screenshots from Viewpoint at the 5 min, 15 min, 25 min and 55 min marks.

The query uses all AMPs for the first 10 minutes.

From that point it only uses 2 AMPs for the remainder, and the CPU and I/O counters stop in their tracks.

This points to massive data skew, but the skew number for the table just doesn't back that up??? I'm really puzzled.

Thanks for spending your spare time helping me out.

 

[Attached screenshots: 5.JPG, 15.JPG, 25.JPG, 55.JPG]

Apprentice

Re: Copying giant table is very slow

Puzzled? You and me both!

 

Any Join or Hash Indexes defined on the target table?

Any triggers defined on the target table?
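
(For reference, one way to check both from the data dictionary, with a placeholder database name: join indexes, hash indexes and triggers each have their own TableKind code in dbc.TablesV.)

-- TableKind: 'I' = join index, 'N' = hash index, 'G' = trigger.
SELECT TableName, TableKind
FROM dbc.TablesV
WHERE DatabaseName = 'mydb'   -- placeholder
AND TableKind IN ('I', 'N', 'G');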

 

Apart from that I think you might need to speak to TD CS about this.

 

Sorry I couldn't help any further.

 

Dave

Ward Analytics Ltd - information in motion
www: http://www.ward-analytics.com
Teradata Employee

Re: Copying giant table is very slow

A couple of paths to explore:

- The NUSI is fairly non-unique. Even though there is no skew in the table as a whole, are there AR_IDs that have a LOT more rows than the average of ~323 rows per NUSI value? And are there many LDB_IDs for a single AR_ID, or is one AR_ID contained within one LDB_ID? (A quick way to check both is sketched after this list.)

- Other workload. If there is other, very skewed work on the platform, then this job will be slowed down on the AMPs where that skewed work is running and stretched out there. The other AMPs will complete their share, while these have to keep running to finish.

- As Dave suggested, ask Teradata CS to health-check the hardware. We have had cases of "slow LUNs" where a disk or controller was acting poorly without raising any fault indications, and a memory chip that is beginning to fail can also cause a performance issue in a node. Point them at the two nodes identified as the skewed ones.
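
Following up on the first point, a quick sketch for checking the value distribution ('src_tab' is a placeholder for the input table):

-- Top AR_IDs by row count; compare against the ~323 rows/value average.
SELECT TOP 20 AR_ID, COUNT(*) AS row_cnt
FROM src_tab
GROUP BY AR_ID
ORDER BY row_cnt DESC;

-- And how many LDB_IDs a single AR_ID spans.
SELECT TOP 20 AR_ID, COUNT(DISTINCT LDB_ID) AS ldb_cnt
FROM src_tab
GROUP BY AR_ID
ORDER BY ldb_cnt DESC;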