Teradata Python Module

Tools
Tools covers the tools and utilities you use to work with Teradata and its supporting ecosystem. You'll find information on everything from the Teradata Eclipse plug-in to load/extract tools.
Enthusiast

Re: Teradata Python Module

This module is awesome and I've been able to run queries and return to Pandas for data analysis, which is great! I'm having an issue with writing queries that create volatile tables. I keep getting an error: (3932, '[25000] [Teradata][ODBC Teradata Driver][Teradata Database] Only an ET or null statement is legal after a DDL Statement. ')

 

Can anyone help with this?

 

import teradata as td
import pandas as pd
from teradata import tdodbc

 

udaExec = td.UdaExec (appName="dpull", version="1.0",logConsole=False)

with udaExec.connect(method="ODBC", dsn="tdata",username="un", password="pw") as session:

 

query = """
CREATE VOLATILE TABLE vt_1
,NO FALLBACK, NO BEFORE JOURNAL, NO AFTER JOURNAL AS

(
SELECT
id
,seq_num
,key

from src as src

)
WITH DATA PRIMARY INDEX (id)
ON COMMIT PRESERVE ROWS;

 

SELECT
vt.id
,vt.seq_num
,vt.key
,t2.adj_rank

FROM vt_1 as vt

inner join table2 as t2

on vt.id = t2.id

"""
df = pd.read_sql(query,session)

Enthusiast

Re: Teradata Python Module

Does the query work on Teradata Assistant or BTEQ?

You could split the query:

query1 = """
CREATE VOLATILE TABLE vt_1
,NO FALLBACK, NO BEFORE JOURNAL, NO AFTER JOURNAL AS

(
SELECT
id
,seq_num
,key

from src as src



)
WITH DATA PRIMARY INDEX (id)
ON COMMIT PRESERVE ROWS; """



query2 =
"""

SELECT
vt.id
,vt.seq_num
,vt.key
,t2.adj_rank

FROM vt_1 as vt

inner join table2 as t2

"""

session.excute(query1)
df = pd.read_sql(query2,session)

Because you are in the same session you VOL table will still be accessible.

Looking at your second SELECT are we not missing ON conditions of our link between vt_1 and t2?
Enthusiast

Re: Teradata Python Module

Thanks so much for your help! I tried your script and got a different error. I did have a typo missing the ON in the example query. If I run the query I receive:

 


File "\teradata\tdodbc.py", line 192, in getDiagnosticInfo
"Invalid handle passed to SQLGetDiagRecW.")

InterfaceError: Invalid handle passed to SQLGetDiagRecW.

 

I'm not sure if it's assistant or BTEQ, how would I know?

Enthusiast

Re: Teradata Python Module

Okay, I must've had some issue with my SQL code, because that worked!! Thanks so much for the help. You can actually do several queries following the logic you provided:

 

session.execute(query1)
session.execute(query2)
df = pd.read_sql(query3,session)

Enthusiast

Re: Teradata Python Module - help with udaexec.ini

I can't seem to get the module to recognize my udaexec.ini config file. I made sure it's saved in the same directory as the ipython notebook that I'm using to call the query. Below is my initial code and I have the file configured based on the instructions provided. Any idea what I could

 

 

import teradata as td
import pandas as pd
from teradata import tdodbc

udaExec = td.UdaExec()

with udaExec.connect("{$udaexec.ini}") as session:
InterfaceError: ('CONFIG_ERROR', 'Unable to resolve "{$udaexec.ini}".  Parameter not found: \'udaexec.ini\'.  If parameter substitution is not intended, escape \'$\' by adding another \'$\'.')

 

Teradata Employee

Re: Teradata Python Module - help with udaexec.ini

Two things wrong:

The $ needs to be outside the { and the value inside the { } needs to be a keyword parameter name found in the [CONFIG] section within the udaexec.ini file - not the name of the file itself.

 

Enthusiast

Re: Teradata Python Module - help with udaexec.ini

Thank you for your help! I realized the typo after looking at it, but the rest wasn't quite as straightforward, but was able to figure it out. I realize I had to enter my data source name/address in the "system" parameter and then use the below:

 

udaExec = td.UdaExec(appConfigFile="udaexec.ini")

with udaExec.connect("${dataSourceName}") as session:
Visitor

Re: Teradata Python Module

Thanks for making this module. 

 

I'm having some difficulty with the data types I receive from my queries. Unfortunately, the keys associated with some tables in my database are 18 digit integers. When I write a query, the last couple of digits seem to be randomly assigned. I understand that this has to do with the floating point accuracy in python. Is there a way that I can assign particular columns to be pulled as another type like character or integer?

 

Here's what I'm doing:

 

 

udaExec = teradata.UdaExec (appName="HelloWorld",
                            version="1.0",
                            logConsole=False)
 
session = udaExec.connect(method="odbc", dsn="tdprod1",
                          username="...", 
                          password="...")
query = "SELECT * FROM ..."

#Reading query to df
pat = pd.read_sql(query,session)

From my googling, it seems like the 'dataTypeConverter' argument to 'connect' can modify how the data types are transformed when they come back to python but I'm struggling.

 

Any help would be greatly appreciated. Thank you!

Teradata Employee

Re: Teradata Python Module

Your query could CAST the key value to CHAR/VARCHAR at the database end.

 In theory you could write your own conversion function but that seems like the hard way.

 

Or you might consider switching from the "teradata" module to the current Teradata SQL Driver for Python (teradatasql). That driver maps Teradata integer data types to Python int rather than decimal.decimal.

Re: Teradata Python Module

I had exactly the same problem, so I wrote my own coverter, which is rather trivial. Teradatasql looks great, but still immature in my opinion (last time I chekcked it you could not select the transaction mode, and would not do many other things that are pretty basic to me, but I'm really looking forward).

 

Here my Conversion thing. In addition to the int/float thing I am fixing a bug in python 3.6 on windows for converting dates. 

 

"""
Custom utilities for data conversion between teradata and python.
Written by Otto Fajardo march 2018
"""
__author__ = "fajardoo"

from teradata.datatypes import *
from teradata import util
from teradata.api import *


NUMBER_TYPES = ("BYTEINT", "BIGINT", "DECIMAL", "DOUBLE", "DOUBLE PRECISION",
                "INTEGER", "NUMBER", "SMALLINT", "FLOAT", "INT", "NUMERIC",
                "REAL")

INT_TYPES = ("BYTEINT", "BIGINT", "INTEGER", "SMALLINT", "INT")

FLOAT_TYPES = ("FLOAT", "DOUBLE", "DOUBLE PRECISION", "REAL", "DECIMAL", "NUMBER", "NUMERIC")

BINARY_TYPES = ("BLOB", "BYTE", "VARBYTE")



class CustomDataTypeConverter(DefaultDataTypeConverter):
    """
    Handles conversion of result set data types into python objects.
    The differences with the default one (in teradata.datatypes)
    * Numeric types are by default converted to either float or int.
      There are two parameters in the constructor: useFloat and useInt, if those are false, the values are converted
      to Decimal.
      There are three numeric types in the default that are not classified as either float or int types: DECIMAL,
      NUMBER and NUMERIC. Here we are classifying these three as floats, so all types are either float or int.
    * Times and dates conversion use a different formula, as the default one (datetime.datetime.fromtimestamp) has bugs
      it fails on windows python 3.6 when using timestamps < 8600, and it fails on linux when using very big negative
      intergers.
    """

    def __init__(self, useFloat=True, useInt=True):
        """
        Constructor of the class
        Parameters
        ----------
        useFloat : bool
            use Float type instead of Decimal
        useInt : bool
            use int type instead of Decimal
        """

        self.useFloat = useFloat
        self.useInt = useInt

    def tera_timestamp_to_datetime(self, timestamp):
        """
        Converts a teradata timestamp (unix timestamp where 0 is 1 Jan 1970 00:00:00, but expressed in milliseconds
        instead of seconds) to a datetime.datetime object.

        Parameters
        ----------
        timestamp: float: teradata timestamp

        Returns
        -------
        server_datetime: datetime.datetime

        """
        # this is the original one, it works on unix, but not in python 3.6 on windows (python bug)
        # temp = datetime.datetime.fromtimestamp(float(value) // 1000)

        temp = datetime.datetime(1970, 1, 1, 0, 0) + (datetime.timedelta(seconds=(float(timestamp) // 1000)))
        server_datetime = temp.replace(microsecond=timestamp % 1000 * 1000)
        return server_datetime

    def convertValue(self, dbType, dataType, typeCode, value):
        """Converts the value returned by the database into the desired
         python object."""
        logger.trace(
            "Converting \"%s\" to (%s, %s).", value, dataType, typeCode)
        if value is not None:
            if typeCode == NUMBER:
                try:
                    return NUMBER(value)
                except:
                    # Handle infinity and NaN for older ODBC drivers.
                    if value == "1.#INF":
                        return NUMBER('Infinity')
                    elif value == "-1.#INF":
                        return NUMBER('-Infinity')
                    else:
                        return NUMBER('NaN')
            elif typeCode == float:
                return value if not util.isString else float(value)
            elif typeCode == int:
                return value if not util.isString else int(value)
            elif typeCode == Timestamp:
                if util.isString(value):
                    return convertTimestamp(value)
                else:
                    temp2 = self.tera_timestamp_to_datetime(value)
                    return temp2
            elif typeCode == Time:
                if util.isString(value):
                    return convertTime(value)
                else:
                    temp2 = self.tera_timestamp_to_datetime(value)
                    return temp2.time()
            elif typeCode == Date:
                if util.isString(value):
                    return convertDate(value)
                else:
                    temp2 = self.tera_timestamp_to_datetime(value)
                    return temp2.date()
            elif typeCode == BINARY:
                if util.isString(value):
                    return bytearray.fromhex(value)
            elif dataType.startswith("INTERVAL"):
                return convertInterval(dataType, value)
            elif dataType.startswith("JSON") and util.isString(value):
                return json.loads(value, parse_int=decimal.Decimal,
                                  parse_float=decimal.Decimal)
            elif dataType.startswith("PERIOD"):
                return convertPeriod(dataType, value)
        return value

    def convertType(self, dbType, dataType):
        """Converts the data type to a python type code."""
        typeCode = STRING
        if dataType in NUMBER_TYPES:
            typeCode = NUMBER
            if self.useFloat and dataType in FLOAT_TYPES:
                typeCode = float
            if self.useInt and dataType in INT_TYPES:
                typeCode = int
        elif dataType in BINARY_TYPES:
            typeCode = BINARY
        elif dataType.startswith("DATE"):
            typeCode = Date
        elif dataType.startswith("TIMESTAMP"):
            typeCode = Timestamp
        elif dataType.startswith("TIME"):
            typeCode = Time
        return typeCode

 

In order to use it you have to something like this:

 

dtype_converter = CustomDataTypeConverter(useFloat=True, useInt=True)

session = udaExec.connect( 'all the other parameters as usual',
                                       dataTypeConverter = dtype_converter)