XML Document Loading

Extensibility
Extensibility covers the mechanisms by which you, as the user or developer, can extend the functionality of the Teradata Database, for example with the use of User Defined Functions, or UDFs.
Teradata Employee

XML Document Loading

Teradata XML Services (TXS) extends the database’s XML capabilities focusing on transforming data between its XML and relational representations. Two key components of XML Services are:

  • XML Shredding, the process of extracting data from XML documents and loading it into the database
  • XML Publishing, the process of transforming the results of SQL queries into XML documents.

In addition to these, TXS also includes a number of functions for common XML processing such as parsing, validation, XSL (XML Stylesheet Language) transformations and XPath queries. Also included in the package are the Apache Xerces (XML Parser) and Xalan (XSLT Transformer) libraries. If users find that the XML processing functionality they are looking for is not included out of the box, they can write their own UDFs utilizing these libraries.

This post focuses on loading XML documents and their content into the database. Documents can be stored in their original string representation, or the data contained in the documents can be shred into target database tables, or you can do both.

XML Documents as character data

Documents are stored in their string representation when it is important to preserve them verbatim (for example for audit, non-repudiation etc), or when the documents are always retrieved as a whole. In these cases, the documents will be stored into a column with a character data type (CHAR, VARCHAR, CLOB) column. If the document can be stored to a VARCHAR typed column (document size limit permitting), any of the Teradata load utilities can be used to load it.

If the document needs to be stored as a CLOB, BTEQ (v12.00.00.04 or later) can be used as follows:

create table STAGE (id VARCHAR(10), xml CLOB)
import vartext ‘|’ LOBCOLS=1 file = ‘load.txt’
. repeat *
using (a clob as deferred, b varchar(32))
insert into STAGE values (:b, :a);

 where the contents of the file load.txt would be something along the lines of:

doc1.xml|ID1
doc2.xml|ID2

Teradata SQL Assistant can also be used for loading XML documents into CLOB columns. When working with CLOBs, first enable Large Object support (From the ODBC Datasource Administrator, select the Teradata driver and click the configure button, then click options and check the checkbox "Use Native Large Object Support".) To insert an XML document into a CLOB column, select the File->Import Data menu option and execute the SQL insert statement for inserting data into the table with the CLOB column as follows:

insert into STAGE values (“someid”, ?C)

A pop-up window will let you select a file. This file should contain the path to the XML document you want to load.

Shredding

To extract data from XML documents and insert into database tables, TXS provides a number of variants of shredding procedures.

Simple Shredding (XPATH_SHRED)

The simplest use case is one in which the XML structure is simple, the data from the XML document is to be inserted without transformation into a single database table. For this kind of simple use case, XPath based shredding is appropriate. Consider a sample XML document shown below:

<CustomerList>
<Customer id=”1001”>
<CustomerName>John Doe</CustomerName>
<Address1>101 Maple Street</Address1>
<City>San Diego</City>
<State>CA</State>
<ZipCode>11111</ZipCode>
</Customer>
<Customer id=”2002”>
<CustomerName>Jane Doe</CustomerName>
<Address1>101 Maple Street</Address1>
<City>San Diego</City>
<State>CA</State>
<ZipCode>11111</ZipCode>
</Customer>


</CustomerList>

This is a simple structure which represents a list of customers. If the intent is to extract these customer records and populate a table with that data, a call to the XPATH_SHRED table function will be appropriate. The parameters to XPATH_SHRED include

  • The row path expression: a path identifying the repeating element in the XML structure (the Customer element). One row is created for each Customer element in the
  • The column path expressions: a group of paths identifying each of the columns of the rows being generated (id. CustomerName, Address1, City, State, ZipCode).
select x.ID, x.C1, x.C2, x.C3, x.C4
from stage, table(sysxml.XMLXPATH_SHRED0(
stage.a, stage.b,
'',
'/CustomerList/Customer' ,
'./@id {1}, ./CustomerName/text() {2}, ./Address1/text() {3}, ./City/text() {4}')) x
where stage.a = x.Id

The result of this query for the document given above will be

1 2002 Jane Doe 101 Maple Street San Diego
1 1001 John Doe 101 Maple Street San Diego

Shredding Workflow

For any shredding other than the simplest form described above, the developer has to first design the mapping between the XML and the relational structures and then registers this mapping with the database. The user next invokes the shredding operation by calling one of the shredding stored procedures. The shredding sps are classified into two kinds based on the shredding mode:

  1. Object Mode (a single document shredded at a time). Examples include SHRED_LOB, SHRED_VC
  2. Mini-batch Mode (a group of documents are shredded at a time and set based operations are used to apply database updates). Example include SHRED_BATCH0, AS_SHRED_BATCH

Mapping Definition

The mapping definition is stored either as an XSLT stylesheet document or an XML Schema document. Mappings are defined either by manually editing the schema or stylesheet document, or visually using a stylesheet mapping tools (Teradata’s Eclipse Plugin, Data Direct’s Stylus Studio and Axizon’s Tiger are currently supported). See the Orange Book for more details of how to define a mapping. The mapping is then registered with the database using the SP_REGISTER_SCHEMA or SP_REGISTER_STYLESHEET stored procedure.

Run Time

Once a mapping is registered, it can be uniquely identified by a registered URI. In most cases, users will load the XML document(s) into a staging table and then invoke a shredding stored procedure to shred the documents into target tables. The choice of the shredding SP will be influenced by the characteristics of the documents being shred and the database version. For instance, if the documents being shred are large (10s of Megabytes up to 2GB), AS_SHRED_BATCH (the streamed shredding procedure) is the way to go. This SP is available on TD12 and later. For prior database versions SHRED_BATCH in conjunction with the XMLSPLIT table function would be appropriate (please see Orange Book for details). Here is a sample call to one of the batch shredding stored procedures. The input parameters are:

  • A SQL query that makes the input documents available for shredding
  • The mapping definition
  • Username/password for user account with requisite permissions for updating the target tables

A sample call to the streamed shredding procedure would be:

call sysxml.AS_SHRED_BATCH ('select id, xmlDoc as res from stage', 'shredMapping', '', 'user' ,'password', resultcode);

Teradata XML Services provides a number of options for loading XML documents and their contents into the Teradata database. Users should choose the option most suitable for their XML document structure and size.