Looking for guidance on building a Teradata warehouse from scratch.


Looking for guidance on building a Teradata warehouse from scratch.

We have a shiny new Teradata appliance that we're about to embark on developing a new data warehouse with (FSLDM format). We have DataStage to land the data into Acquisition but I can find little information on best practices for setting up my Agile team to succeed when it comes to the in-database ELT processes e.g.

  • is it better to use GCFR with hand-coded transform views or use a tool like Wherescape RED?
  • if we stick with GCFR, is there a good tool for simplifying the build of the transform views?
  • how do we set up continuous integration and automated unit testing?

Re: Looking for guidance on building a Teradata warehouse from scratch.

Hi rorydph,

Agile methodologies are quite suitable for EDW projects (and nearly all software development projects). Defining the MVP can be challenging before you've completed LDM selection and the source mapping of your major systems. I'm assuming that your project is related to financial services, so the mapping of your Customer Information System and at least one Product System is probably the bare minimum...even that may not really be sufficient.

The Good news is you're using FSLDM and Agile. That's going to make things much easier.

Now, to your questions:

1. GCFR (formally TCF) will take care of business dates, job runs, re-runs, restarts, etc. This alone will save you a great deal of time and effort (and $$$) as you can't really have a production EDW without a control framework in place. Personally, I'm a little old-school and have a preference for hand-written  transform code using standard wrappers, templates and interfaces with GCFR than an off-the-shelf ETL/ELT tool. That being said, Wherescape RED is an excellent tool (one of the best, IMHO) as it persists all transformation logic within the TD system itself. RED code is stored as standard SQL within Stored Procs. A tool like RED will be of value to you, especially if you've enaged lower-cost outsourced resources that probably don't have the technical or communication skills you thought they would. In summary, either hand-written SQL or RED is suitable for use with GCFR - depending on the skills of your team.

2. I may have already answered this one; it doesn't really matter what control framework you use (buy or build) you're going to have to implement custom transformation logic in one form or another. Best practice is leaning toward SQL stored as Stored Procs OR Views in the EDW, executed by some control framework or another. Larger and more complex transformations will benefit from having multiple steps, temporary tables, etc. These workflows may be better suited to multi-step, Stored Proc executed logic. Simple tranforms are suitable to views. It all depends on the complexity of the transforms and volume of data. That being said, integration with GCFR is quite straightforward, assuming the tool you use (if using a tool) can call GCFR objects appropriately. There's no getting around the discovery and requirements definition for the transformations required to map your source systems into FSLDM (or any model). This process takes time and requires SMEs from your source system teams as well as a Data Modeler with FS/FSLDM experience.

3. Continuous integration and automated testing are key foundations of Agile methodologies. You will need to build the appropriate logic (scripts/code) to vaidate all checked-in code each day (or each cycle) against your control framework, wrappers, interfaces, protocols, etc - any technological dependencies that exist in your environment. The objective here is not to test the logic of what's been written, but rather that it compiles, calls standard interfaces properly, etc. Building this framework will take a little time, but it will be well worth it. Once complete, you'll be able to integrate these functions with your Agile tools to faciliate state/readiness visibility across your team(s). 

Good luck with your project,