Metadata is a much over-looked requirement of data warehousing, primarily since it's not straightforward; though this difficulty can be eased by use of appropriate tools, such as Ab Initio's EME. The following article looks at the various types of metadata, and the requirements and objectives each meets, then considers some benefits of, and methods for, using a) a metadata tool and b) EME, specifically.
Metadata are critical to a successful data warehouse, constituting the repository for technical and business information about the warehouse.
Metadata has three primary major goals:
Manage the components necessary to provide an understanding of the DW and the enterprise through its content.
Build and Administer:
Manage the components necessary to efficiently define, build and operate the DW. DW metadata includes the structure and contents of the operational data that is used to execute DW jobs and perform the audit, balance and control aspects of the DW environment. This goal requires well-defined metadata about both data and processes.
Manage the components necessary to enable self-service use of the DW that is efficient, accurate and repeatable.
The diagram below represents how and end-to-end (E2E) solution should integrate and present data, in the most usable way. Best practice is to store and view metadata in one place.
End-to-end implementation of the above metadata framework can be achieved using the Ab Initio EME / EMMS solution.
This solution integrates both technical and business metadata and enables you to grasp the entirety of your data processing — from operations to analytical systems.
Meta model implementation
The standard Ab Initio EME meta model needs to be expanded and created in the Teradata database; that way we can have one central data store for all metadata in the DWH platform.
An example metadata model to support this is shown below.
The base schema version of the Ab Initio EME needs to be expanded based on the above model, in order to support all business requirement of the DW platform.
The diagram below shows the base schema that is provided with Ab Initio EME.
In general terms the meta model should be able to provide the following information:
Additionally, best practice requires that the meta model should have a wider scope than the current DW, since DW projects can grow rapidly.
Extending the EME Base model will provide the opportunity to leverage all technologies, and create dependencies between them. At this stage Teradata see the following sources of metadata information:
All of the above sources will have to be loaded and stored in the metadata repository, with all dependencies and relationships to each other. Additionally Ab Initio can parse Teradata SQL or BTEQ scripts automatically when they are executed using the “Execute SQL” component, which allows the use of Ab Initio as a primary execution engine for an 'extract, transform, load and transform' (ETLT) approach. Using EME, this does not require any manual process in terms of metadata integration, as they will be parsed automatically by the Ab Initio component.
This capability of Ab Initio will allow automatic maintenance and understanding of Teradata SQL jobs, in terms of what is happening with the data.
Dependency analysis is supported through SQL SELECT statements back to the original source field; a dataset object representing the original source table must be present in the data store. In order to perform dependency analysis on SELECT statements, the EME must capture a snapshot of the database schema for each of the tables that are referenced.
For example, consider the following SQL statement:
FROM customers, orders
WHERE customers.id = orders.customer_id;
To perform dependency analysis for this query, the EME has to know the names and data types for all columns in the CUSTOMERS and ORDERS tables. The GDE captures this information when the output DML for the SELECT statement is generated. For situations in which metadata was imported from the database through a connector graph, the development team will have to create dataset definitions for database tables. This can be done by use of the appropriate connector graph in the $AB_HOME/connectors directory of the GDE Component Organizer. You must set up each connector graph so that it loads tables and columns. Loading Keys and Indexes is optional. If you load Keys, they will be profiled when you run a profile job using the EME objects that you generate from the dataset definition. Note, however, that the Data Profiler ignores Indexes.
When the graph is analysed, this information is used to create table datasets in the data store catalogue. Analysis of the SQL statement by the EME then allows the GDE to link columns in the table datasets with fields in the output record format of the INPUT TABLE component.
Additionally we should load the following data from their sources, to support the end-to-end metadata approach
This portfolio of information will provide all data that are needed to support data lineage requirements.
Ab Initio EME use
Ab Initio EME can be logically divided into:
The DW project will have to define an Ab Initio EME extended schema which will be stored in the Ab Initio system, but since a particular implementation may be using several different technologies for ETLT, Business Intelligence and Business Glossary, the physical database representation will need to be maintained as well. This is why, in the above section, Teradata is proposing two expansions of the metadata schemas: Ab Initio base, plus additional expansion.
These two models will need to be synchronised, so that installations can leverage the reporting capabilities of the EME solution.
Ab Initio EME user access (Desktop Portal and Web Interface)
The Ab Initio EME Web Interface is a user application for browsing, editing, and mapping metadata in an EME data store. This interface is suitable for metadata architects and others responsible for managing metadata within DW organisation (for example data stewards).
The image below gives an example of how the web interface can present metadata related to database objects:
The EME Desktop Portal provides all the functionality of the EME Web Interface to Enterprise Metadata (using different interface), as well as the following additional futures:
The following diagram illustrates how users interact with an EME data store’s metadata: