Teradata & SQLAlchemy: Introduction

SQLAlchemy

SQLAlchemy is an open source python library that provides an abstraction over relational databases, making them easier to work with. It provides a SQL expression language for generating SQL queries, and an object relational mapper that lets you express database queries in terms of classes and the relationships between them. Using SQLAlchemy has several advantages:

(1) Write generic python code to work across different database backends.
(2) Interact with the database through python objects instead of writing SQL.
(3) Expose data to powerful python libraries.
(4) It can support the features of the underlying database.
(5) It's free.
 

The first and second points are especially useful if you work with many databases but don't want to write a separate SQL query for each backend that does essentially the same thing. That said, SQLAlchemy is another layer of abstraction over the database, so it is one more "language" to learn, but it goes a long way. SQLAlchemy includes built-in support for databases such as SQLite, PostgreSQL, and MySQL. To use it effectively you should still know how the underlying database works; with SQLAlchemy, however, interacting with it is much easier.

The third point is where SQLAlchemy pays off most: data in a database can easily be exposed to the rest of the python ecosystem, and processed data can be stored back just as easily. The fourth point matters because you typically want the features of the underlying database available to your python users without the abstraction getting in the way. SQLAlchemy achieves this by relying on dialect libraries like sqlalchemy-teradata to implement that expressivity.

In summary, SQLAlchemy is an open source, database-agnostic, unifying way to interact with databases in python. It exposes data to the python ecosystem and is expressive enough to support the features of the underlying database. Visit the SQLAlchemy website for a full list of key features, testimonials, and organizations using it.

 


Table of Contents

  1. Getting Started
  2. Connecting to the database: Engine, Connection, and Dialect
  3. MetaData, Tables, and Columns
  4. Dialects
  5. Data Manipulation Language
  6. Table Reflection
  7. Contributing and Next Steps

 


Getting Started

SQLAlchemy allows users to interact with relational databases using the Python Database API (henceforth DBAPI). The DBAPI is an interface that different database vendors implement to allow users to connect and interact with the database in python.

Teradata implements the DBAPI 2.0 spec in the teradata library. There is an excellent article on it that I recommend going through. We will give a high level overview of some of the concepts in SQLAlchemy, the Teradata Dialect, and how you use it with the Teradata Database.
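For a feel of what that DBAPI layer looks like on its own, here is a minimal sketch using the teradata module directly (the system name and credentials are placeholders; the usage follows the module's documented UdaExec pattern):

import teradata

# UdaExec handles application-level configuration and logging for the teradata module
udaExec = teradata.UdaExec(appName='IntroExample', version='1.0', logConsole=False)

# method='odbc' selects the tdodbc DBAPI implementation
with udaExec.connect(method='odbc', system='hostname',
                     username='sqlalc_user', password='sqlalc_user') as session:
    for row in session.execute('SELECT InfoKey, InfoData FROM dbc.dbcinfo'):
        print(row)

SQLAlchemy sits on top of exactly this kind of connection and cursor machinery, which is why the rest of the article rarely touches it directly.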

This article encourages you to get involved by trying the examples. If you don't have python installed, I recommend the anaconda distribution of python since it has many useful libraries pre-installed; if you want something more lightweight, there is also miniconda. Otherwise, all you really need is python 2.7 and pip installed. For Mac users, sudo easy_install pip seems to work for installing pip.

Next we'll need the sqlalchemy-teradata library:

pip install sqlalchemy-teradata

The teradata and sqlalchemy python libraries will also be installed if they aren't already installed on your system. The teradata python library has a module that implements the DBAPI over ODBC (tdodbc). The sqlalchemy-teradata package uses this DBAPI implementation in order to process queries. Thus the sqlalchemy-teradata library is a bridge between SQLAlchemy and tdodbc, allowing access to the Teradata database from SQLAlchemy.

The Teradata ODBC driver is needed to use the sqlalchemy-teradata library since it relies on tdodbc. The driver is available for Linux, Mac OSX, and Windows.

In general, databases will have different access mechanisms (e.g. ODBC, JDBC, etc.) as well as different python DBAPI implementations. As a result, there are generally multiple combinations of Databases and DBAPI implementations that can be used with SQLAlchemy. This variability is handled by the Dialect class in SQLAlchemy. We'll learn more about it in the Dialects section, but in short, it provides a way to handle specific DBAPI interactions with the underlying database.
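The URL passed to create_engine encodes that choice. Its general form is dialect+driver://username:password@host:port/database; when the driver part is omitted, the dialect's default DBAPI is used. For example (host and credentials are placeholders; both URL shapes appear later in this article):

from sqlalchemy import create_engine

# dialect only: the teradata dialect picks its default DBAPI (tdodbc)
engine_td = create_engine('teradata://user:password@hostname:22/')

# dialect+driver: explicitly pair sqlite with the pysqlite DBAPI
engine_lite = create_engine('sqlite+pysqlite:///example.db')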


Overview

 

The sqlalchemy-teradata library implements interfaces provided by the sqlalchemy library to define interactions with the Teradata Database.

 


Connecting to the database: Engine, Connection, and Dialect

In order to execute queries against a database, you need a connection to it. SQLAlchemy manages connections using the sqlalchemy.engine.base.Engine object. An Engine can execute raw SQL queries, provide an explicit connection to execute queries, and manage a pool of connections.

 

from sqlalchemy import create_engine

user = 'sqlalc_user'
pasw = user
host = 'hostname'

# connect
td_engine = create_engine('teradata://' + user + ':' + pasw + '@' + host + ':22/')

# execute raw sql
sql = 'select * from dbc.usersV'
result = td_engine.execute(sql)

 

In the example above, an Engine instance was created using the create_engine function. You'll notice that I passed a string URL to the function: it specifies the information necessary to connect to the database, such as the database and DBAPI combination to use, the credentials, the host to connect to, and a port (22 in this example). The credentials refer to a database user and password that exist in the Teradata database.

Connections and components

 

In the diagram above, there are three python libraries at work to make the execution successful. The sqlalchemy.engine package has a Connection class that encapsulates the database connection mechanism and explicitly maintains the scope of a connection provided by the DBAPI. In our case, it wraps the underlying connection implementation in teradata.tdodbc. The TeradataDialect class simply tells the engine which implementation of the Connection class to use. Together these pieces plug Teradata-specific database interactivity into SQLAlchemy.

Once the engine instance is created, we specify the SQL as a string for the engine to execute. It will start a connection for us using the underlying DBAPI Connection implementation, call the cursor to execute the SQL, get the results into memory, and finally close the connection. Starting up connections can be expensive so the engine will typically keep a pool of them for threads to re-use. The engine manages the connections for you.

We can keep a connection open, use it to execute queries, and close it ourselves.

 

conn = td_engine.connect()
conn.execute(sql)
conn.close()

 

Closing a connection simply returns it to the connection pool. In order to configure connection pools you can pass parameters to the create_engine function. In this article we simply stick to the default parameters.
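For example, the following sketch uses the standard create_engine pooling arguments (the numbers here are arbitrary):

td_engine = create_engine('teradata://' + user + ':' + pasw + '@' + host + ':22/',
                          pool_size=10,       # connections kept open in the pool
                          max_overflow=20,    # extra connections allowed under load
                          pool_recycle=3600)  # recycle connections older than an hour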

 


Fetching Results

The results of the execution are kept in the result variable. It is a ResultProxy object which allows you to interact with the rows returned from the query. The ResultProxy manages the rows and is a SQLAlchemy extension of the underlying ODBC cursor implementation. It has methods to fetch rows from the result set and stores metadata on the rows it has.

 

# print the first 5 column names of the result
keys = result.keys()
print(keys[:5])

[u'UserName',
u'CreatorName',
u'PasswordLastModDate',
u'PasswordLastModTime',
u'OwnerName']

 

ResultProxy objects have convenience methods to fetch rows in various ways, similar to the DBAPI cursor.fetch*() methods for reading data from a result set.

 

first_row = result.fetchone()     # Single Row
next_3_rows = result.fetchmany(3) # List of Rows

 

Typical usage of the ResultProxy is to iterate over it, processing each row.

 

# iterate over the rows and print the first 5 columns of each
for row in next_3_rows:
    print(row[:5])
    print

(u'user',
u'SQLALC_USER',
datetime.date(2016, 6, 20),
datetime.time(16, 43, 51),
u'sqlalc_user')

(u'sqlalchemy',
u'SQLALC_USER',
datetime.date(2016, 6, 20),
datetime.time(16, 43, 51),
u'sqlalc_user')

(u'dba',
u'SQLALC_USER',
datetime.date(2016, 6, 20),
datetime.time(16, 43, 51),
u'sqlalc_user')

 

The first_row and row variables are RowProxy objects (or, in the case of next_3_rows, a list of them). Similar to the ResultProxy, a RowProxy is a wrapper that provides methods to make processing rows easier.

 

print(first_row.keys()[:5])
assert first_row.UserName == first_row['UserName']

[u'UserName',
u'CreatorName',
u'PasswordLastModDate',
u'PasswordLastModTime',
u'OwnerName']

 

Typically you'd execute queries generated by functions and objects provided by SQLAlchemy rather than raw strings, as we'll see next.

 


MetaData, Tables, and Columns

 

The MetaData object collects information about your database schema by keeping track of tables created in SQLAlchemy. It can be dynamically bound to a database connection or engine in order to enable operations against the elements that it contains. These operations typically need access to the database, for example, creating a table associated with the MetaData object. Any table created in SQLAlchemy must have a MetaData associated with it.

As an example we'll create a simplified 'Department' table with three column attributes: 'deptno', 'name', and 'location'.

 

from sqlalchemy.sql.schema import Table, Column
from sqlalchemy import MetaData
from sqlalchemy.types import Integer, String, Unicode

meta = MetaData()
meta.bind = td_engine

dept = Table('Department', meta,
    Column('deptno', Integer, nullable=False, unique=True),
    Column('name', Unicode(256), nullable=False),
    Column('location', Unicode(256), nullable=False)
)

 

We bound the MetaData object by setting the meta.bind attribute to the engine managing our connection to the Teradata database. This allows us to implicitly execute SQL statements against the table like insert and select -- all without needing to explicitly obtain a connection from the engine. Then we defined a table using the Table and Column constructs provided by SQLAlchemy. Once a table is defined, we can see how it will be created using the CreateTable class.

 

from sqlalchemy.schema import CreateTable

# compile and print a CreateTable construct representing the CREATE TABLE DDL
def show_create_table(table, engine):
    print(CreateTable(table).compile(engine))

show_create_table(dept, td_engine)

CREATE TABLE "Department"  (
deptno INTEGER NOT NULL,
name VARCHAR(256) CHAR SET UNICODE NOT NULL,
location VARCHAR(256) CHAR SET UNICODE NOT NULL,
UNIQUE (deptno)
)

 

MetaData and Tables

SQLAlchemy separates python schema objects (like Table, Column, and Index) from its DDL representation. A Schema Definition Language is provided to access metadata on these objects, give their DDL representation, define constraints, and more. We've used it above to get the Teradata DDL for creating that table. We instantiate the CreateTable object with our Table instance and call the compile method with an engine that uses the TeradataDialect provided by sqlalchemy-teradata to generate the DDL.

 

Notice that the constraints on the columns of the table are displayed above. They are set by the named arguments in the Column constructor, where we also pass the column name and type. Here is the table that actually gets defined once the DDL is processed by the database:

 

bteq> show table department;
CREATE SET TABLE SQLALC_USER.department ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT,
DEFAULT MERGEBLOCKRATIO
(
deptno INTEGER NOT NULL,
name VARCHAR(256) CHARACTER SET UNICODE NOT CASESPECIFIC NOT NULL,
location VARCHAR(256) CHARACTER SET UNICODE NOT CASESPECIFIC NOT NULL,
UNIQUE ( deptno ))
;

 

The CreateTable object only represents the Create Table DDL that will be sent to the database to be executed. It doesn't trigger the creation of dept. Since dept is associated to the MetaData object meta we can create the table using create_all which will trigger the creation of all tables associated with meta.

 

# call create method on all tables in meta
meta.create_all()

 

As the name suggests, create_all will try to create the tables that are associated with the MetaData object. By default, it first checks whether the table already exists in the database. Table creation done in this way respects table dependencies and foreign key relationships so that tables are created in the right order. Using the meta.drop_all method behaves similarly but drops tables instead. The table can also be created explicitly like so:

 

 

dept.create(bind=td_engine)
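Both create_all and drop_all accept standard arguments controlling this behavior; a quick sketch (shown for illustration only, since the rest of the article assumes the tables still exist):

# only issue CREATE TABLE for tables missing from the database (the default)
meta.create_all(checkfirst=True)

# drop every table associated with meta, respecting dependency order
# meta.drop_all()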

 

You can think of the MetaData object as a collection of tables. There are ways to access the MetaData object associated with a table and vice versa:

 

# list of tables associated with meta
assert dept in meta.sorted_tables

# MetaData object associated with Table
assert dept.metadata == meta

# retrieve tables contained in meta; the key is given by the Table.key attribute
assert meta.tables[dept.key] == dept

 

# a table also exposes its columns through the c attribute
print(dept.c.keys())
print(dept.c.values())
assert dept.c.deptno == dept.c['deptno']

['deptno',
'name',
'location']

[Column('deptno', Integer(), table=<Department>, nullable=False),
Column('name', Unicode(length=256), table=<Department>, nullable=False),
Column('location', Unicode(length=256), table=<Department>, nullable=False)]

 

We specified a particular engine to get the DDL specific to a particular database backend. It is possible to use other database engines to generate DDL specific to those databases. SQLAlchemy includes dialects for several databases in the library itself; since python has SQLite built in, we can use that as an example.

 

# start a sqlite database in our current directory
from sqlite3 import dbapi2 as sqlite
sqlite_engine = create_engine('sqlite+pysqlite:///sqlalchemy_example.db', module=sqlite)

 

The concept of reusing SQLAlchemy objects with different database engines is particularly useful when we want to work with multiple databases at a time. We can keep the same python code and just switch out the underlying engine to get different database specific output.

 

# compile the table with different database backends
for e in [td_engine, sqlite_engine]:
    print(e.name)
    show_create_table(dept, e)

'teradata'

CREATE TABLE "Department" (
deptno INTEGER NOT NULL,
name VARCHAR(256) CHAR SET UNICODE NOT NULL,
location VARCHAR(256) CHAR SET UNICODE NOT NULL,
UNIQUE (deptno)
)

'sqlite'

CREATE TABLE "Department" (
deptno INTEGER NOT NULL,
name VARCHAR(256) NOT NULL,
location VARCHAR(256) NOT NULL,
UNIQUE (deptno)
)

 

The table we defined is fairly generic with respect to its column types and constraints. Other dialects may provide types that aren't compatible with other backends. In fact, some constructs are dialect specific, so they can't be reused across different databases, or even across dialects of the same database; this usually shows up as a compilation failure at the SQLAlchemy level.
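When a single column needs different types per backend, SQLAlchemy's standard with_variant method attaches a dialect-specific type to a generic fallback. A small sketch (the Text-on-sqlite choice is purely illustrative):

from sqlalchemy.types import Unicode, Text

# generic Unicode everywhere, but plain Text when compiled against sqlite
flexible = Unicode(256).with_variant(Text(), 'sqlite')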

To extend our example, we'll create a simplified Employee table more specific to Teradata's DDL syntax.

 

# create the table in sqlite
dept.create(bind=sqlite_engine)

 

from sqlalchemy.schema import ForeignKey
from sqlalchemy_teradata.compiler import TDCreateTablePost as post

# table using Teradata-specific dialect keywords
emp = Table('Employee', meta,
    Column('empno', Integer, nullable=False),
    Column('name', Unicode(256), nullable=False),
    Column('deptno', Integer, ForeignKey('Department.deptno'), nullable=False),
    prefixes=['set'],
    teradata_post_create=post().primary_index(unique=True, cols=['empno'])
)

 

show_create_table(emp, td_engine)

CREATE set TABLE "Employee"  (
empno INTEGER NOT NULL,
name VARCHAR(256) CHAR SET UNICODE NOT NULL,
deptno INTEGER NOT NULL,
FOREIGN KEY(deptno) REFERENCES "Department" (deptno)
)
unique primary index( empno )

 

# create emp table using bound meta
emp.create()

 

Dialect-specific constructs are being used to create this table. Let's check out the CREATE TABLE DDL in bteq:

bteq> show table employee;
CREATE SET TABLE SQLALC_USER.employee ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT,
DEFAULT MERGEBLOCKRATIO
(
empno INTEGER NOT NULL,
name VARCHAR(256) CHARACTER SET UNICODE NOT CASESPECIFIC NOT NULL,
deptno INTEGER NOT NULL,
FOREIGN KEY ( deptno ) REFERENCES sqlalc_user.Department ( deptno ))
UNIQUE PRIMARY INDEX ( empno );

The teradata_postfixes keyword is dialect specific; it adds CREATE TABLE options after the table name but before the left parenthesis, in the order they are provided. Examples include options like FALLBACK, JOURNALING, etc. The arguments to this keyword are processed in TeradataDDLCompiler.postfix, found in the compiler module of sqlalchemy-teradata.

The prefixes variable allows you to add any prefix statements such as MULTISET or VOLATILE. This variable is a built-in keyword already provided in SQLAlchemy. The strings provided are appended in the order they are given.

The teradata_post_create variable is a dialect keyword used to add syntax elements corresponding to PRIMARY INDEXES, AMPS, PARTITION BY, etc. The arguments to this keyword get processed in TeradataDDLCompiler.post_create_table.

Currently, classes in the compiler.py module provide ways to populate the dialect-specific keywords. TDCreateTablePost is used to populate the teradata_post_create keyword; it is a generative object that lets you chain options together to build the DDL. Using these extensions is optional, but they provide useful examples for understanding the compilation process. You can also provide your own object to teradata_post_create and add handler methods to process it in post_create_table, and similarly for teradata_postfixes. A sketch combining these keywords follows.
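Putting the keywords together, a table definition might look like the following sketch (the 'fallback' string for teradata_postfixes is assumed from the description above; only the primary_index option shown earlier is used for teradata_post_create):

proj = Table('Project', meta,
    Column('projno', Integer, nullable=False),
    prefixes=['SET'],                    # rendered between CREATE and TABLE
    teradata_postfixes=['fallback'],     # rendered after the table name
    teradata_post_create=post().primary_index(unique=True, cols=['projno'])
)
show_create_table(proj, td_engine)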

 


Dialects

Many databases have their own implementations of the SQL standard or add their own extensions to it, and DBAPI implementations vary as well. SQLAlchemy abstracts all of this using what is called a Dialect: an abstract class that provides the behaviors controlling how SQL gets created and executed for a specific database. The TeradataDialect implements Dialect.

Teradata Dialect

The dialect is used by the sqlalchemy.engine to decide how to compile python constructs down to the actual query to execute. It also tells the engine what Connection implementation to use. The dialect sits in between the Engine and the DBAPI Connection implementation. A Connection created on behalf of the Engine will refer to this Dialect to handle SQL rendering and other behaviors such as preparing identifiers (like table and column names) and handling the context of an execution from start to finish. SQL rendering is handled largely by the compiler classes.
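You can peek at what the engine resolved to directly; dialect, name, and driver are standard Engine attributes:

print(td_engine.dialect)   # the TeradataDialect instance in use
print(td_engine.name)      # 'teradata'
print(td_engine.driver)    # name of the DBAPI driver module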


Compilers

Different dialects typically have their own implementations of the compilers offered by SQLAlchemy. The sqlalchemy-teradata package contains implementations of several compilers to handle SQL rendering; these classes have methods that generate SQL from the SQLAlchemy objects they are given and the information those objects contain. The compilers implemented are shown in the figure above. The implementation consists of overriding various visit methods, which are called at various points in the compilation process until the object is encoded into SQL. Database-specific SQL rendering is thus handled and isolated in these methods.
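To see where a visit method fits in, SQLAlchemy's standard compiler extension lets you override the rendering of a construct for one dialect. A toy sketch (the TIMESTAMP(0) rendering is invented purely for illustration; the Teradata dialect already handles its own type rendering):

from sqlalchemy.ext.compiler import compiles
from sqlalchemy.types import DateTime

# override how the generic DateTime compiles, but only for the teradata dialect
@compiles(DateTime, 'teradata')
def compile_datetime_td(type_, compiler, **kw):
    return 'TIMESTAMP(0)'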


Types

SQLAlchemy types provide a mapping between python types and database types. SQLAlchemy ships with generic types, which are common enough to work across different dialects; individual dialects can define their own types, override the compilation of an existing type, or both. The sqlalchemy-teradata package supports all the generic types provided by SQLAlchemy 1.0.10.
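You can ask a generic type how each dialect will render it using the standard TypeEngine.compile method; the outputs below match the DDL we saw earlier:

from sqlalchemy.types import Unicode

print(Unicode(256).compile(dialect=td_engine.dialect))
# VARCHAR(256) CHAR SET UNICODE

print(Unicode(256).compile(dialect=sqlite_engine.dialect))
# VARCHAR(256)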

 


Data Manipulation Language

Once a table is created we can start interacting with it! Empty tables are not very interesting. Insert, update, and delete statements can be built using functions from the core expression language or using methods on the target Table; the two approaches are equivalent. Once a statement is built, we use our Connection to execute it.

 

def show_dml(stmt):
    print(stmt)
    print(stmt.compile().params)

 

from sqlalchemy import insert, update, delete

# build the insert dml
ins_stmt = insert(dept).values(deptno=1, name='Research', location='San Diego')
show_dml(ins_stmt)

# execute the insert statement
td_engine.execute(ins_stmt)

INSERT INTO "Department" (deptno, name, location) VALUES (?, ?, ?)
{'deptno': 1, 'name': 'Research', 'location': 'San Diego'}

 

from sqlalchemy import select

# select from the table
sel_stmt = select([dept])
print(sel_stmt)
res = td_engine.execute(sel_stmt)
print(res.fetchone())

SELECT "Department".deptno, "Department".name, "Department".location 
FROM "Department"
(Decimal('1'), u'Research', u'San Diego')

 

Python objects provided by SQLAlchemy represent portions of a SQL expression; we saw this with the CreateTable object. Here we use the select, insert, update, and delete functions, which are part of the SQL Expression Language, to render DML statements. They create objects that can be built up dynamically, adding constraints and changing values.
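Because these are ordinary python objects, a statement can be assembled incrementally, for example adding where clauses one at a time:

# start from a bare select and tack constraints on as needed
stmt = select([dept])
stmt = stmt.where(dept.c.location == 'San Diego')
stmt = stmt.where(dept.c.deptno == 1)
print(stmt)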

 

# operations can also be done in a table-centric manner
sel_dept = dept.select().where(dept.c.name == 'Research & Analytics')
sel_emp = emp.select()

ins = emp.insert().values(empno=1, name='Mark', deptno=1)
upd = dept.update().where(dept.c.deptno == 1).values(name='Research & Analytics')

 

show_dml(sel_dept)

SELECT "Department".deptno, "Department".name, "Department".location 
FROM "Department"
WHERE "Department".name = ?
{u'name_1': 'Research & Analytics'}

 

show_dml(sel_emp)

SELECT "Employee".empno, "Employee".name, "Employee".deptno 
FROM "Employee"
{}

 

show_dml(ins)

INSERT INTO "Employee" (empno, name, deptno) VALUES (?, ?, ?)
{'deptno': 1, 'name': 'Mark', 'empno': 1}

 

show_dml(upd)

UPDATE "Department" SET name=? WHERE "Department".deptno = ?
{u'deptno_1': 1, 'name': 'Research & Analytics'}

 

The DML statements are built by calling the appropriate Table method, which returns a sqlalchemy.sql.expression object expressing SQL DML against that table. For example, an Insert expression is created either with the Table's insert method or with the insert function; both return an expression.Insert object representing the insert statement to execute. We see the generated DML by printing ins_stmt, along with its params by compiling the object and reading the params attribute. Then we actually execute it using the connection.

 

conn = td_engine.connect()

conn.execute(ins)
res = conn.execute(sel_emp)
print(res.fetchall())

conn.execute(upd)
res = conn.execute(sel_dept)
print(res.fetchall())

[(Decimal('1'), u'Mark', Decimal('1'))]
[(Decimal('1'), u'Research & Analytics', u'San Diego')]

 

Typically we'll want to insert more than one row at a time. To do that, we pass the insert statement along with a list of dictionaries, one per row, to the execute method of a Connection obtained from our engine.

 

emp_data = [{'empno': k, 'name': v, 'deptno': 1}
            for k, v in {2: 'Michael', 3: 'Alan', 4: 'Scott'}.items()]
td_conn = td_engine.connect()
ins_stmt = emp.insert()
td_conn.execute(ins_stmt, emp_data)

 

sel_stmt = emp.select()
for row in sel_stmt.execute():
    print(row)

(Decimal('3'), u'Alan', Decimal('1'))
(Decimal('4'), u'Scott', Decimal('1'))
(Decimal('1'), u'Mark', Decimal('1'))
(Decimal('2'), u'Michael', Decimal('1'))

 

Check out the docs for inserts, selects, updates, and deletes to find out more.

 


Table Reflection

 

MetaData objects can be created without being bound to an engine. This is useful if you want to load a table definition from a database into a SQLAlchemy Table, bind it to a specific database engine, and then perform operations on that table. This is called 'reflection'.

 

new_meta = MetaData()  # create an unbound MetaData object

 

# create a table in sqlite
from sqlalchemy import Date

movs = Table('Movies', new_meta,
    Column('title', Unicode(256), nullable=False),
    Column('id', Integer, primary_key=True, nullable=False),
    Column('director', Unicode(256)),
    Column('release_date', Date, nullable=False),
    Column('genre', String(128), nullable=False)
)
movs.create(bind=sqlite_engine)

 

# reflect the table from sqlite
movies = Table('Movies', new_meta, autoload=True, autoload_with=sqlite_engine)

# show the create table ddl in sqlite
show_create_table(movies, sqlite_engine)

CREATE TABLE "Movies" (
title VARCHAR(256) NOT NULL,
id INTEGER NOT NULL,
director VARCHAR(256),
release_date DATE NOT NULL,
genre VARCHAR(128) NOT NULL,
PRIMARY KEY (id)
)

 

We reflect a table called 'Movies' from the sqlite database using the autoload and autoload_with arguments. The autoload variable enables reflection. The autoload_with variable specifies the target database via the engine instance we created. Once the table is loaded into SQLAlchemy, the dialect provided by the engine will have processed it so that we can do all the usual things such as create it inside the Teradata database.

 

# create the table in teradata
new_meta.create_all(td_engine)

# show the create table ddl rendered for teradata
show_create_table(movies, td_engine)

CREATE TABLE "Movies"  (
"title" VARCHAR(256) CHAR SET UNICODE NOT NULL,
id INTEGER NOT NULL,
director VARCHAR(256) CHAR SET UNICODE,
release_date DATE NOT NULL,
genre VARCHAR(128) NOT NULL,
PRIMARY KEY (id)
)

 

Reflecting objects in the database allows you to load information on that object from a database schema. Depending on how the object is defined, it is possible to simply load it into other databases. If the object is a table that has dependencies on other tables in the database, those tables will also be loaded into the MetaData object. In addition it is possible to override the column definitions in the reflected table.
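Overriding a column during reflection looks like the following sketch; explicitly declared columns take precedence over the reflected definitions (a fresh MetaData is used here since 'Movies' is already registered in new_meta):

override_meta = MetaData()

# redefine genre explicitly while the remaining columns are autoloaded
movies2 = Table('Movies', override_meta,
    Column('genre', Unicode(128), nullable=False),
    autoload=True, autoload_with=sqlite_engine
)
print(movies2.c.genre.type)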

 


Contributing and Next Steps

The sqlalchemy-teradata library is still an ongoing project. Contributions on github are welcome!

I suggest you follow along with these tutorials, connecting to a Teradata database instead:

The SQLAlchemy Core tutorial builds upon this article and gives a more complete guide to the SQL expression language. The ORM tutorial can be done independently and is focused on object mappings to tables as well as session management. There is an article by Mike Bayer, the creator of SQLAlchemy, that is also enlightening: SQLAlchemy-AOSA.
