XML Query

Extensibility
Extensibility covers the mechanisms by which you, as the user or developer, can extend the functionality of the Teradata Database, for example with the use of User Defined Functions, or UDFs.
Highlighted
Teradata Employee

XML Query

Querying XML

Teradata Database v14.10 implements the XML type as a means for representing XML values in the context of the SQL query language. To support querying such values, TD14.10 also implements the XQuery query language, version 1.0. XQuery is a standard for an XML Query Language, defined by W3C. The normative specification of XQuery can be found at W3C (http://www.w3.org/TR/xquery/). This article will provide a quick introduction to XQuery for the sake of completeness, but tutorials on XQuery can be found elsewhere on the web (for example, http://www.w3schools.com/xquery/).

A Quick Introduction to XQuery

Let’s start with a simple example. Continuing with use case from the first article in this series, the sample queries in this article will work against documents that look like the following:

?xml version="1.0" encoding="UTF-8"?>

<Customers>

  <Customer C_CUSTKEY="11001" C_NAME="LCDs R Us">

    <C_ADDRESS>10001 Cary Parkway</C_ADDRESS>

    <C_NATIONKEY>21</C_NATIONKEY>

    <C_PHONE>31-389-986-4741</C_PHONE>

    <C_ACCTBAL>7140.81</C_ACCTBAL>

    <C_MKTSEGMENT>Electronics</C_MKTSEGMENT>

    <C_COMMENT>Manufacturer of Liquid Crystal Displays</C_COMMENT>

  </Customer>

  <Customer C_CUSTKEY="21001" C_NAME="Chips Electronics">

    <C_ADDRESS>2101 Stevens Creek Pkwy</C_ADDRESS>

    <C_NATIONKEY>21</C_NATIONKEY>

    <C_PHONE>31-389-986-4741</C_PHONE>

    <C_ACCTBAL>7140.81</C_ACCTBAL>

    <C_MKTSEGMENT>Electronics</C_MKTSEGMENT>

    <C_COMMENT>E-Commerce website owner</C_COMMENT>

  </Customer>

</Customers>

As in first article in this series, we will assume that these sample documents are loaded into a table named T_XMLDOCS, which is defined as below:

CREATE TABLE T_XMLDOCS(

    ID     VARCHAR(30),

    XMLDOC XML

);

Path Expressions

We will start with a simple query that retrieves all the “Customer” elements in these documents. We can use the XMLEXTRACT function to achieve that.

SELECT id, xmldoc.XMLEXTRACT('/Customers/Customer', NULL)

FROM T_XMLDOCS;

The XQuery query string in this example is “/Customers/Customer”. This is an example of a “Path Expression”, a type of XQuery expression that uses a path notation to traverse down an XML document tree to find nodes of interest within that tree. In this case, we are asking that we start at the root of the tree, traverse to the Customers element, and down from that element to the Customer element, and return all such elements that we encounter.

Predicates

We can get more selective about the elements we retrieve by specifying a “predicate”. A predicate is a condition we can impose at any stage of the traversal described above, to select paths on which to traverse. Say we are interested only in the Customers who are in the “FINANCE” segment, we can execute the following query:

SELECT id, xmldoc.XMLEXTRACT('/Customers/Customer[C_MKTSEGMENT = "FINANCE"]', NULL)

FROM T_XMLDOCS;

The portion of the query in the square brackets ([C_MKTSEGMENT = “FINANCE]) is the predicate. It specifies that the Customer elements we are interested in are the ones which have a child element named C_MKTSEGMENT with a text node under with a value of “FINANCE”.

A positional predicate can be used to select a path based on the position of a node. For instance, if we are interested in getting the first Customer element under the “Customers” element in each document, the query would look as following:

SELECT id, xmldoc.XMLEXTRACT('/Customers/Customer[1]', NULL)

FROM T_XMLDOCS;

The FLWOR expression

This is one of the more popular and versatile type of expression in XQuery and merits a mention here. The FLWOR acronym results from the various keywords that are part of the syntax of this expression (for, let, where, order by, return). Say we wanted to return the Customers who have an Account Balance greater than $1000, sorted in descending order of the balance. We can achieve it as follows:

SELECT id, xmldoc.XMLEXTRACT(

'for $cust in /Customers/Customer

 where $cust/C_ACCTBAL > 1000

 order by $cust/C_ACCTBAL descending

 return $cust', NULL)

FROM T_XMLDOCS;

Node Constructors

XQuery provides mechanisms for creating different types of XML nodes (element, attributes, text nodes, comments etc.). Element Constructors are used for creating new elements. The following example wraps the entire results of the previous query under a new root level node named “HighValueCustomers”.

SELECT id, xmldoc.XMLEXTRACT(

'<HighValueCustomers>

 {for $cust in /Customers/Customer

 where $cust/C_ACCTBAL > 1000

 order by $cust/C_ACCTBAL descending

 return $cust}

 </HighValueCustomers>', NULL)

FROM T_XMLDOCS;

This would be a way to make an XQuery query return a well-formed XML value (as opposed to the previous query which returned a sequence of Customer elements which will be a well-formed XML value – i.e., if you try to parse the string representation of the result, the XML parser will return an error, in this case indicating that you cannot have more than one element at the root level).

The XMLQUERY Function

All the examples so far use the XMLEXTRACT function, but the SQL/XML specification defines another function named XMLQUERY which provides other options for XQuery evaluation. The XMLQUERY function takes an XQuery query, and zero or more parameters to the XQuery query, and returns the results of the evaluation of that query. The fact that this function accepts zero or more parameters means that the query can act on more than one XML value (as opposed to XMLEXTRACT which only acts on the current XML instance). It also means that the XQuery query can be parameterized with other values that can be passed as part of the SQL query. The last example can be rewritten to use the XMLQUERY function as follows:

SELECT id, XMLQUERY('<HighValueCustomers>{ 
 for $cust in $inputdoc/Customers/Customer
 where $cust/C_ACCTBAL > 1000
 order by $cust/C_ACCTBAL descending
 return $cust
 }</HighValueCustomers>' PASSING BY VALUE xmldoc as inputdoc)
FROM T_XMLDOCS;

Note the clause PASSING BY VALUE xmldoc as inputdoc which passes the value in the xmldoc column as a parameter named inputdoc to the query. The query in turn refers to this parameter as $inputdoc in the path expression $inputdoc/Customers/Customer.

Casting XQuery results to other SQL Data Types

Both the XMLEXTRACT method and the XMLQUERY function return values of XML type. The XML type implements casts to various SQL data types like VARCHAR, INTEGER, FLOAT, DATE etc., which makes it easy to extract values from these results. Say we are interested in retrieving the account balances of the first customer in each document, the following query could be used:

SELECT id, CAST(xmldoc.XMLEXTRACT('/Customers/Customer[1]/C_ACCTBAL', NULL) AS DECIMAL(10,2)) as acctbal

FROM T_XMLDOCS;

 One caveat specific to casts to string data types (e.g. VARCHAR or CLOB) is that such a cast results in a string which is the concatenation of all the text nodes under that node while performing in an inorder traversal. For example, the following query:

SESELECT CAST(new XML('<a><b>b</b><x>xyz</x><c>c</c></a>') AS VARCHAR(32) CHARACTER SET UNICODE) as strcast;

will return the string “b xyz c”.

Memory Usage and (Semi) Streaming

XQuery cannot be evaluated in a single pass over a document. This requires that the document be loaded into memory during evaluation. With the limitations resulting from the dbscontrol parameter XMLMemoryLimit (described in the first article of this series), this is often a problem for large XML documents. To remedy this, we introduced an additional parameter to the XMLEXTRACT function to allow us to stream to an extent. XML documents often get large due to a repeating structure in the XML document. For instance, in our example documents, the “Customer” element is the repeating structure. If we had a large document that contained millions of customer records, we will have a very large document which cannot be loaded into memory within the memory constraints of the XMLMemoryLimit setting. In that case, we can specify a filter expression for the XMLEXTRACT method which identifies the repeating structure. Say we wanted to retrieve the Phone numbers of all our finance customers from a large document that contains millions of customer records:

SESELECT id, xmldoc.XMLEXTRACT('/Customers/Customer', '.[C_MKTSEGMENT = "FINANCE"]/C_PHONE', NULL)
FROM T_XMLDOCS;

 

This article provided an overview of XQuery. In addition, a number of resources including books and tutorials on the web are available for learning the XQuery language. looked at some examples of evaluating XQuery queries in the Teradata Database against XML Type instances. XQuery is also used in other functions like XMLTABLE which will discussed in subsequent articles on XML Shredding.