Is there any way to get the .NET provider to perform anywhere near as well as FastLoad for loading data? We have several applications, in C#, that need to monitor directories and then load those files into a Teradata warehouse. I would prefer not to have to shell out to call FastLoad. Currently I am reading/parsing the file into a DataSet and then using a DataAdapter to update the table. Performance is nowhere near what FastLoad accomplishes. Does anyone have any suggestions/tips/tricks for improving the .NET provider's performance?
I tried the multiple data tables (with individual TDConnections) route also. It was actually slower and sometimes I received a deadlock error. My guess is that the INSERT statements generated by the DataAdapter Update method were getting in the way of each other (?). I don't mind shifting into C++ for access to the CLI. Is there any documentation available on how to program to it? Ideally, if I went down this road I would like to create a .NET wrapper DLL for the CLI.
I wrote "empty staging tables". That is, each thread/connection/DataTable will load into a separate empty table.
For example, let's assume that you want to load 1,000,000 rows. You can partition the rows across 30 threads. Each thread will read X rows into a DataTable, establish a connection to the Teradata database, create a new table, and insert the rows into that table.
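A minimal sketch of that per-thread staging approach, assuming the .NET Data Provider for Teradata (Teradata.Client.Provider); the table names, connection string, and the `ReadAndPartitionFiles` helper are placeholders, not working code from anyone's project:

```csharp
using System;
using System.Data;
using System.Threading;
using Teradata.Client.Provider;

class StagingLoader
{
    // Each worker gets its own connection, its own empty staging table,
    // and its own slice of the rows to load.
    static void LoadPartition(DataTable rows, int partition, string connStr)
    {
        using (TdConnection conn = new TdConnection(connStr))
        {
            conn.Open();

            string staging = "stage_load_" + partition;   // placeholder name
            using (TdCommand create = conn.CreateCommand())
            {
                // Empty staging table with the same shape as the target.
                create.CommandText =
                    "CREATE TABLE " + staging + " AS target_table WITH NO DATA";
                create.ExecuteNonQuery();
            }

            // Let a DataAdapter generate the INSERT for the staging table.
            using (TdDataAdapter adapter = new TdDataAdapter(
                       "SELECT * FROM " + staging, conn))
            using (TdCommandBuilder builder = new TdCommandBuilder(adapter))
            {
                adapter.UpdateBatchSize = 0;  // pack as many rows per request as fit
                adapter.Update(rows);
            }
        }
    }

    static void Main()
    {
        string connStr = "Data Source=tdprod;User Id=loader;Password=...;";
        DataTable[] partitions = ReadAndPartitionFiles(30);  // hypothetical helper

        Thread[] threads = new Thread[partitions.Length];
        for (int i = 0; i < partitions.Length; i++)
        {
            int p = i;  // capture the loop variable for the closure
            threads[p] = new Thread(() => LoadPartition(partitions[p], p, connStr));
            threads[p].Start();
        }
        foreach (Thread t in threads) t.Join();
    }

    // File reading/parsing elided; returns one DataTable per partition.
    static DataTable[] ReadAndPartitionFiles(int n) { return new DataTable[0]; }
}
```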
Later you can use a single connection to INSERT-SELECT from the 30 staging tables into the target table.
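The consolidation step might look like this, assuming 30 staging tables named `stage_load_0` through `stage_load_29` (placeholder names) with the same shape as the target:

```csharp
using System.Data;
using Teradata.Client.Provider;

class Consolidate
{
    // One connection moves all staged rows into the target table.
    static void MergeStaging(string connStr, int stagingCount)
    {
        using (TdConnection conn = new TdConnection(connStr))
        {
            conn.Open();
            using (TdCommand cmd = conn.CreateCommand())
            {
                for (int p = 0; p < stagingCount; p++)
                {
                    cmd.CommandText =
                        "INSERT INTO target_table SELECT * FROM stage_load_" + p;
                    cmd.ExecuteNonQuery();

                    // Staging tables are disposable once merged.
                    cmd.CommandText = "DROP TABLE stage_load_" + p;
                    cmd.ExecuteNonQuery();
                }
            }
        }
    }
}
```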
I wrote "C++/CLI". In this case CLI refers to "Common Language Infrastructure".
The Teradata Parallel Transporter Application Programming Interface (Teradata PT API) is a set of application programming interfaces used to load and export data to and from Teradata systems. Teradata PT API enables an application to access the Teradata Database using proprietary Teradata load and export protocols (Teradata FastLoad, Teradata FastExport, and Teradata MultiLoad). Unlike the Teradata utilities and Teradata Parallel Transporter that are driven by scripts, Teradata PT API is a functional library that is part of your applications. This allows the applications to have more control during the load and export processes.
My experience relates to extracting data so it may not be 100% applicable so take it for what it's worth :)
I recently completed a project to compare the extract performance characteristics of TPT API, FastExport, the .NET Provider (v1.02), and ODBC. Without getting into all of the details, I was able to achieve raw extract performance with the .NET provider equal to, and in some cases (depending on the volume of the extract) better than, what I could get with TPT API and FastExport.
Bottom line, physical location of the server in relation to the Teradata servers is critical. The .NET provider is essentially a "single session" approach, whereas I can increase the number of sessions in TPT and FExp. Across the WAN (multiple network hops), multiple sessions in TPT/FExp will drastically outperform .NET; however, on the same LAN (or a max of 2 hops), multiple sessions in TPT/FExp actually showed a significant decrease in performance (possibly related to thread blocking due to more data arriving than the utilities can handle). In this scenario, where TPT/FExp were forced to run with a single session, .NET performed just as fast and in some cases even faster.
I did write managed C++ wrappers around the TPT API (for extract only), but my code was not the source of the performance drop, as the stand-alone utilities by themselves behaved exactly the same.
I was reading in the Orange Book, Indices and Active Ingest, that using iterative arrays can achieve 1000+ rows/sec throughput. I have not been able to find anything more about how to code/implement this. Has anyone else read/heard of using iterative arrays with the .NET Provider?
I normally set the UpdateBatchSize to 0 now for my DataAdapter. From previous discussions in this forum, it appears that the window size of the data going to the database is roughly 1 MB. So when you set UpdateBatchSize to 0, it will put as many rows as it can into that 1 MB window.

How you set up the UPDATE statement for the DataAdapter will affect your performance. By default, if you let the framework create the UPDATE statement, it tries to compare the values of every column, and as such must transport a lot more data than you may need. That can eat into the 1 MB window pretty quickly. On large updates, I manually set the UpdateCommand so that it uses just the primary index columns and the fields I need to change. Using this, I have been able to pack a lot more rows into the 1 MB window.

For inserts, you pretty much get what you get, with one caveat: if you have default values in your destination table and can use them, do so. In a nutshell, any technique that reduces the amount of information per row that needs to be sent (and thus gives you more rows per window) will give you better performance.

What I would really like to see implemented in the .NET provider is something similar to the SqlBulkCopy feature that the SQL Server provider has.
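A hedged sketch of a hand-built UpdateCommand along those lines, sending only the primary index column and the one changed field; `target_table`, `cust_id`, and `status` are placeholder names, and `conn`/`changedRows` are assumed to already exist:

```csharp
using System.Data;
using Teradata.Client.Provider;

// conn: an open TdConnection; changedRows: a DataTable holding modified rows.
TdDataAdapter adapter = new TdDataAdapter(
    "SELECT cust_id, status FROM target_table", conn);
adapter.UpdateBatchSize = 0;  // fill the ~1 MB request window with as many rows as fit

// The Teradata provider uses "?" positional parameter markers.
TdCommand update = conn.CreateCommand();
update.CommandText = "UPDATE target_table SET status = ? WHERE cust_id = ?";
update.Parameters.Add("status", TdType.VarChar, 10, "status");
update.Parameters.Add("cust_id", TdType.Integer, 0, "cust_id");

// Required for ADO.NET batch updates (UpdateBatchSize != 1).
update.UpdatedRowSource = UpdateRowSource.None;
adapter.UpdateCommand = update;

adapter.Update(changedRows);
```

With only two small columns per row going over the wire instead of a before/after image of every column, far more rows fit into each 1 MB window.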