New Aster Text Capabilities


Our world is changing, but our reliance on documents seems to only be increasing. Whether a relic of the pen and paper days, the dried ink of the typewriter age, or the magnetic decay of electrons of the modern word processor, the document is not dead[1].

As an artifact, the document serves many purposes, chief among them a historical, unchanging account: medical records, business contracts, insurance claims. While the medium may have changed, the uses have stayed the same. As a technology, on the other hand, the document has only created problems: non-standardized and proprietary formats, lost formats, and a mix of structured and unstructured containers. Technology advances while the issues remain.

Being able to get at these historical records has become big business. Increased regulatory compliance requirements in every industry have created a need for document warehousing and retrieval, but the man-hours spent on manual processing are unnecessary. We have the technology; let’s make use of it.

At Teradata, we recognize the importance of delivering a solution to this need. Our customers are among the largest in their industries and face the threat of imminent regulatory checks daily; we have therefore developed a new, custom solution for them.

First, we’ve created a function for Aster that can read text from nearly 300 formats, including office documents, images, PDFs, code files, and even metadata in media formats. A simple SQL SELECT statement is all you need:

SELECT * FROM document_from_afs(
    ON (SELECT 1)
    PATH ('/home/docs/')
);


The function returns one row per document, containing the extracted text.


If you have a corpus of scanned documents, we’ve also introduced a custom OCR solution that reads text out of images, puts a structure on it, and makes the output available to our Aster analytics.


Imagine being able to parse customer details, claims information, and contracts between businesses and individuals out of those stacks of paper. You could store them on our redundant, distributed file system or load them into a table, tie them to the same customer record in your Enterprise Data Warehouse, and then use our suite of analytics to get the most complete view of your customer you’ve ever had.

This is the power of the new document capabilities afforded by Teradata Aster: a one-stop shop for integrating the paper, document, and relational views of your business.

If you would like more information, please feel free to contact me at


[1] Baker, M.; Miksik, C., "The document is dead," Professional Communication Conference, 1996. IPCC '96 Proceedings. Communication on the Fast Track., International, pp. 56-61, 18-20 Sep 1996. doi: 10.1109/IPCC.1996.552581