In July 2015, Eric shared insights on the Teradata Python Module, which supports the DevOps approach, at http://developer.teradata.com/tools/reference/teradata-python-module. The article describes the main functions and shows basic examples of how to use them. In the meantime, the discussions have clarified some questions and also revealed some minor bugs, which were fixed promptly.
When reading the article, it immediately got my interest, but as so often, the examples do not show all the details or how the different aspects work together. Therefore, I decided to give it a try and started to develop a small application. The goal was to come up with functionality currently not available in Teradata that would also not be easy to implement in pure SQL or stored procedures.
I chose the pivot challenge, i.e. converting rows to columns. The unpivot function is available as a table function, but I am currently not aware of a pivot function. It is quite clear what needs to be done in SQL for a specific pivot transformation; the challenge is to come up with a generic (i.e. parameter-driven) solution in SQL.
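To illustrate the idea, here is a minimal sketch of how a parameter-driven pivot statement could be generated. All names (`build_pivot_sql`, the table and column names) are illustrative assumptions, not taken from the actual solution; the classic pattern is one aggregated CASE expression per pivot value:

```python
# Hypothetical sketch of generic pivot-SQL generation -- function and
# parameter names are illustrative, not from the TDPythonPivot repository.

def build_pivot_sql(table, key_col, pivot_col, value_col, pivot_values,
                    agg="MAX", null_replace="NULL"):
    """Generate a pivot SELECT: one output column per distinct pivot value."""
    case_cols = ",\n  ".join(
        f"{agg}(CASE WHEN {pivot_col} = '{v}' THEN {value_col} "
        f"ELSE {null_replace} END) AS {pivot_col}_{v}"
        for v in pivot_values
    )
    return (f"SELECT\n  {key_col},\n  {case_cols}\n"
            f"FROM {table}\nGROUP BY {key_col};")

print(build_pivot_sql("sales", "region", "quarter", "revenue", ["Q1", "Q2"]))
```

Because the pivot values and column names arrive as parameters, the same code serves any table, which is exactly what is hard to achieve in pure SQL.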
The outcome of the PoC development confirmed my initial thoughts, and I hope this approach is picked up at many customer sites to replace current BTEQ scripts. The approach would also allow converting many stored procedures or macros into small generic programs. It might also allow better process integration of complex transformations into existing ETL tools.
The solution I came up with is available on GitHub at https://github.com/d2k/TDPythonPivot. It uses the REST interface to communicate with the database. You can either download the zip file or clone the repository. The code is shared under the MIT license.
The process needs two config files (in fact three, but I never touched the udaexec.ini).
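For readers unfamiliar with the module: the UdaExec framework of the Teradata Python Module reads its settings from layered INI files. A minimal application config could look like the sketch below; all host, user, and section names are illustrative placeholders, not values from the repository:

```ini
; Hypothetical application config for the pivot tool -- all values
; here are illustrative, not taken from the TDPythonPivot repository.

[CONFIG]
appName=TDPythonPivot
appVersion=1.0
logConsole=False

; A named data source; the application connects via the REST interface.
[tdrest]
method=rest
host=restserver.example.com
system=tdsystem
username=pivotuser
password=changeme
```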
The application comes with three demo parameter settings.
To avoid duplicate-row errors and to allow linking process metadata to the data, the table contains two technical columns: the application name and the run_id, which can be linked to the DBQL query band information.
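A sketch of what such a target table could look like, with the two technical columns in front of the pivoted data columns. The helper and the data-column names are hypothetical; only the `application` and `run_id` columns come from the description above:

```python
# Illustrative sketch: target DDL including the two technical columns
# described above. Helper name and data columns are hypothetical.

def build_target_ddl(table, data_columns):
    """CREATE TABLE with technical columns for process-metadata linkage."""
    tech_cols = [
        "application VARCHAR(30) NOT NULL",  # which application loaded the row
        "run_id INTEGER NOT NULL",           # links rows to DBQL via query band
    ]
    cols = ",\n  ".join(tech_cols + data_columns)
    return f"CREATE TABLE {table} (\n  {cols}\n);"

print(build_target_ddl("pivot_result",
                       ["region VARCHAR(10)", "q1_revenue DECIMAL(18,2)"]))
```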
The DevOps approach really worked out well. I started with a pure SQL generation approach. After finishing that, I added the create-table functionality and finally the Insert/Select function, all driven via parameter settings. Changes are tracked in Git, and the commit history has been maintained (though not all of it on GitHub).
Refactoring could be done easily, which improved the code over time.
The parameter-driven approach worked well. Both required config files are handled by the module and do not require separate handling.
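The Teradata Python Module merges its layered INI files itself; the stdlib sketch below merely illustrates the layering idea with `configparser` (the section and key names are made up for the example): later files override earlier keys, so a job-specific file only needs to state what differs from the base config.

```python
# Stdlib illustration of layered config files -- not the module's own
# mechanism, just the same idea with configparser. Names are made up.
import configparser

base = "[pivot]\ntable = demo\nnull_replace = 0\n"
job = "[pivot]\ntable = sales\n"   # job file overrides only the table name

cfg = configparser.ConfigParser()
cfg.read_string(base)
cfg.read_string(job)   # later reads override earlier keys

print(cfg["pivot"]["table"])         # value from the job file
print(cfg["pivot"]["null_replace"])  # value inherited from the base file
```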
So again, I think this is the way to go ;-)
Comments are welcome.
To-do list: add the option to specify different NULL replacement values when column lists are given to be denormalized.