FastLoad Performance Tuning for large volumes of input data

Enthusiast

FastLoad Performance Tuning for large volumes of input data

Hi,

I am facing a problem loading data from an input file. The data loads successfully into the target table in Teradata, but the time taken for the load is very high.
Some important information:

The system is MP RAS UNIX

The input records are read as VARTEXT

**** 15:55:45 Number of FastLoad sessions requested = 8

**** 15:55:51 Number of AMPs available: 10

TOTAL RECORD LENGTH = 3502

Total number of records inserted is 778100

Total CurrentPerm of the table after successful loading is 378,142,720

Time taken to load is approx. 8 hrs (about 1 hr per 100,000 records)

Can someone help me increase the speed of this task for the above scenario?

Thanks,

Enthusiast

Re: FastLoad Performance Tuning for large volumes of input data

Could you give us some insight into your target table? What is the primary index of your target table and how unique is the primary index?
Enthusiast

Re: FastLoad Performance Tuning for large volumes of input data

Target table is created as below:

CREATE TABLE fssLoad_t(
UnixTimestamp char(24),
transDateTime TIMESTAMP NOT NULL,
UserID char(6) NOT NULL,
ErrorCode Integer,
InterfaceType Integer,
RowLimit Integer,
RowsFound Integer,
BytesRetrieved Integer,
RowsReturned Integer,
SizeReturned Integer,
DBRespTime Decimal(10,2),
TotalTransTime Decimal(10,2),
ConstrainedTime Decimal(10,2),
GSEQueryCost Integer,
TotalCPUCost Integer,
SQLString varchar(3000),
ClientAppID varchar(25),
ClientApplVersion char(35),
ClientObjectID char(35),
ClientObjectVersion char(35),
TopEndProd varchar(10),
GSEServerVer char(15),
GSEDicVer char(15),
GSEViews varchar(50) ,
OraInstNames varchar(30),
ReqOptimised char(10),
NoInstruments Integer default NULL,
NoFetches smallint default NULL
)
PRIMARY INDEX(UserID);

As you can see above, there is a NUPI on UserID.

Thanks,
Enthusiast

Re: FastLoad Performance Tuning for large volumes of input data

Hi,

As stated in your query, you have a NUPI defined on the table. You have not mentioned whether the table is SET or MULTISET. Assuming it is SET: Teradata performs a duplicate-row check on a SET table that has no unique index defined, so a SET table with many rows per NUPI value has terrible insert performance. Create the table as MULTISET and see the change in performance.
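For example, only the first line of the DDL changes; this is just a sketch, with the column list exactly as you posted it:

CREATE MULTISET TABLE fssLoad_t(
/* ...same column definitions as in your post... */
)
PRIMARY INDEX(UserID);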
Hope this helps.

Thanks
Angel
Enthusiast

Re: FastLoad Performance Tuning for large volumes of input data

Thanks Angel.

I created the table as MULTISET and did a test run with 100,000 records. There was a slight improvement in speed, but the load still takes a little less than an hour for that set of records.

Please help me out in further improving the performance.

Thanks,
Enthusiast

Re: FastLoad Performance Tuning for large volumes of input data

You may want to check the skew factor for the table.
You can use the following query, or use Teradata Administrator to find it.

It is best to have a skew factor of about 5-8 for large tables.
That is only achievable by selecting a unique enough primary index, which ensures the data is evenly distributed over the AMPs.
Another way to reduce the run time is to load the data into a staging table that has a composite PI and then do an INSERT..SELECT into the target. In any case, there will be a redistribution of rows when you do the INSERT..SELECT; see the sketch below.
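A minimal sketch of the staging approach, assuming the same column list as fssLoad_t (the composite PI columns here are only an illustration; pick columns that distribute your data evenly):

CREATE MULTISET TABLE fssLoad_stg(
/* ...same column definitions as fssLoad_t... */
)
PRIMARY INDEX(UserID, transDateTime);

/* FastLoad into fssLoad_stg, then redistribute into the target: */
INSERT INTO fssLoad_t
SELECT * FROM fssLoad_stg;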

Hope this helps.

SEL
DatabaseName
,TableName
,(100 - (CAST(AVG(CurrentPerm) AS DECIMAL(18,2)) / MAX(CurrentPerm) * 100)) AS SkewFactor
FROM DBC.TableSize
WHERE DatabaseName IN ('DB_Name')
AND TableName = 'TableName'
GROUP BY DatabaseName, TableName
ORDER BY DatabaseName, TableName;
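(The expression compares the average CurrentPerm per AMP against the largest CurrentPerm on any AMP, so a value near 0 means the rows are spread almost evenly across the AMPs; larger values mean more skew.)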
Enthusiast

Re: FastLoad Performance Tuning for large volumes of input data

Thanks Vinay.

Your reply was very informative. I checked the skew factor and it falls between 5 and 8.
I have a requirement constraint that I can't create a unique PI. In fact, the records cannot support a unique PI.
I tried your second suggestion of a composite PI. But even though the distribution across the AMPs was even, it still takes the same time to load 100,000 records.

Kindly let me know if any other information is required.

If someone can tell me the optimum time the FastLoad utility should take to load an input file whose data occupies about 100,000,000 bytes of perm space in Teradata, along with the configuration needed for that optimal performance, it would be of great help.

Please help me.

Thanks,
Enthusiast

Re: FastLoad Performance Tuning for large volumes of input data

Hmmm...

Can you post more information?
* Network bandwidth details (client/host)
* Current workload details
* Number of sessions used by FastLoad (see the sketch after this list)
* Actual process - are you dropping and recreating the table, or truncating it?
* If you are truncating, you may want to use the DELETE..ALL option (this may not be the problem, as you seem to have localized the problem to the FastLoad script itself)
* Perhaps some snippets from your FastLoad log
* It may very well be a network issue in reading the file, if the file is on a file server
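On the sessions point, a common rule of thumb is one FastLoad session per AMP. A minimal script fragment, with placeholder logon values (the SESSIONS command must appear before LOGON):

SESSIONS 10; /* e.g. one session per AMP on a 10-AMP system */
LOGON tdpid/username,password;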

I don't think loading 1 GB of data should be an issue.
It is typical to load hundreds of GB of data without performance impact (I may be going overboard here).

Let's see if others have better suggestions than this.

Vinay
Enthusiast

Re: FastLoad Performance Tuning for large volumes of input data

Please find below some snippets from the FastLoad output log:

**** 10:28:43 Processing starting at: Tue Jul 4 10:28:43 2006
**** 10:28:43 Character set has been set to: ASCII
**** 10:28:43 Tenacity Enabled: 3 hour(s)
**** 10:28:43 Sleep Minutes Set: 5 minute(s)
**** 10:28:43 Buffer size has been set to 49 KBytes

0001 SHOW VERSION;

FastLoad Version 07.05.00.002 for UNIX 5.4 running Streams TCP/IP

0003 RECORD 1 thru 100000;

**** 10:28:43 Starting record number set to : 1
**** 10:28:43 Ending record number set to : 100000

**** 10:28:43 Current CLI or RDBMS allows maximum row size: 64K
**** 10:28:43 Character set for this job: ASCII
**** 10:28:45 Number of FastLoad sessions connected = 10
**** 10:28:45 FDL4808 LOGON successful

Here I dropped the destination and error tables.

Here I created the destination table as in the previous post, but as MULTISET and with PRIMARY INDEX(UserID, transDateTime).

**** 10:28:50 Now set to read 'Variable-Length Text' records

**** 10:28:52 Number of AMPs available: 10
**** 10:28:52 BEGIN LOADING COMPLETE

**** 10:28:52 FDL4803 DEFINE statement processed

TOTAL RECORD LENGTH = 3502

Here I did INSERT INTO...VALUES...

**** 10:28:53 Number of recs/msg: 14
**** 10:28:53 Starting to send to RDBMS with record 1
**** 10:33:34 Sending row 10000
**** 10:39:44 Sending row 20000
**** 10:44:58 Sending row 30000
**** 10:49:53 Sending row 40000
**** 10:54:34 Sending row 50000
**** 10:59:52 Sending row 60000
**** 11:05:08 Sending row 70000
**** 11:10:47 Sending row 80000
**** 11:16:22 Sending row 90000
**** 11:22:01 Sending row 100000
**** 11:22:52 Finished sending rows to the RDBMS

0015 END LOADING;

**** 11:22:54 END LOADING COMPLETE

Total Records Read = 100000
- skipped by RECORD command = 0
- sent to the RDBMS = 100000
Total Error Table 1 = 0 ---- Table has been dropped
Total Error Table 2 = 0 ---- Table has been dropped
Total Inserts Applied = 99987
Total Duplicate Rows = 13

Start: Tue Jul 4 11:22:53 2006
End : Tue Jul 4 11:22:54 2006

**** 11:22:58 Logging off all sessions
**** 11:22:59 Total processor time used = '1.44 Seconds'
. Start : Tue Jul 4 10:28:43 2006
. End : Tue Jul 4 11:22:59 2006
. Highest return code encountered = '0'.
**** 11:22:59 FDL4818 FastLoad Terminated

From the above it is clear that almost all of the time is consumed in sending the rows: roughly 54 minutes for 100,000 rows, i.e. about 31 rows per second, or a little over 2 data messages per second at 14 records per message, while the FastLoad client itself used only 1.44 seconds of processor time. Checkpointing seemed a possible concern, but there was no improvement even without checkpoints.
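For reference, the checkpoint interval in FastLoad is set on the BEGIN LOADING statement. A sketch, with illustrative error-table names:

BEGIN LOADING fssLoad_t
ERRORFILES fssLoad_e1, fssLoad_e2
CHECKPOINT 100000; /* omit the CHECKPOINT clause to run without checkpoints */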

There is not much workload on the NCR node.

Thanks,