Fetch performance: TeradataStudio vs. Python teradata library

Teradata Studio
Fan

Fetch performance: TeradataStudio vs. Python teradata library

Hi,

 

We have an application that needs to offload data on a daily basis from Teradata. The application is written in Python and uses the Python Teradata library (version 15.10.0.21). Functionally it works, but downloading of data is really slow. Execution of the query is OK-ish, about 2 minutes (table has >400M rows), however the actual offloading of the records is slow.

 

For example:

  • offloading 1.1M rows of data takes ~1 hour using the python application (data volume 660MB)
  • whereas, offloading the exact same data using Teradata Studio only takes 15 minutes

So offloading using Teradata Studio is 4 times faster (both situations were run from the same machine, using the same connection to Teradata).

 

Can someone explain why Teradata Studio is so much faster? Is there some (undocumented) setting that we need to do in the Python library to make it faster? Are there alternatives?

 

I also tried sqlalchemy i.c.m. with tdodbc1620__linux_indep.16.20.00.50-1 driver and that performed 10% slower than the Python Teradata library. 

Any help is appreciated. 

 

Regards,

Gero

2 REPLIES 2
Enthusiast

Re: Fetch performance: TeradataStudio vs. Python teradata library

What you are probably seeing with Studio is the fact that it is JDBC based and the "export" is using the JDBC fastexport feature depending on how you told it to export the data.   If you are dumping a table and used the right click on table name -> Data->Export Data then that is going to use the fastexport feature that is built into JDBC, same way the smartloader process uses fastload to read a file and load it to the database.   If you used Studio and just ran the query and said "export reults to a file" it would probably take as long as the python process did.    The python drivers do not have the ability to use the load utilities such as fastload/fastexport at this time.   Although i hope they do eventually add that capability or way to invoke the TPT processes at the least.

Teradata Employee

Re: Fetch performance: TeradataStudio vs. Python teradata library

Please try the "teradatasql" Python package, which is our native Python DBI Driver.

 

pip install teradatasql

 

Documentation and sample programs are available at https://github.com/Teradata/python-driver

 

(In contrast, the "teradata" Python package provides a wrapper around the Teradata ODBC Driver.)

 

The teradatasql package does not offer FastExport support yet, but we are working on that.

 

In the meantime, we are interested to hear your feedback on the regular SQL result set fetch performance of the teradatasql package. Thanks!