A Practical Approach to HL7 and Teradata Aster

Learn Data Science
Teradata Employee

What is HL7:


Health Level-7 or HL7 refers to a set of international standards for transferring clinical and administrative data between hospital information systems. These standards focus on the application layer, which is "layer 7" in the OSI model.

Document Purpose:

This document reviews the complexity of macro (large-scale) analytics on HL7 data structures.

Challenges of HL7 and Data Analytics:

HL7 data comes in a variety of formats, and this makes analysis challenging.  It is simple to look at one instance of an HL7 medical record, but what happens when an organization wants to review tens of millions of instances?  The details below show how challenging it can be to work with HL7, because the data format does not support easy "load and go" capability for advanced analytics.

The HL7 Format:


1.       Records (HL7 segments) are terminated by a carriage return (\r).  Converting these to \n is simple (dos2unix).

2.       The fields within each record are delimited by a pipe (“|”).

3.       Each record has its own structure based on its type, so the different record types cannot be normalized to fit into a single table.

4.       The records in a file arrive in a specific sequence, so the order of the records is critical.

5.       A patient episode will be made up of many ordered record types.

6.       Within a patient episode, each record type carries a counter of the number of records of that type.

7.       As far as I could see, there is no natural business or surrogate key that spans a patient episode.  Nothing in the OBX record type, which contains the notes, serves as such a key.

8.       Each patient episode starts with an ‘MSH’ record.  There is no trailing record to delimit a patient episode's set of records, so an episode ends only where the next ‘MSH’ record begins (or at the end of the file).
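The record layout above can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the team's Aster implementation: the function names split_episodes and keyed_records and the sample message are hypothetical. The sketch normalizes \r terminators to \n and splits each record on the pipe. Because no trailer exists, it starts a new episode at every MSH record. It also generates a surrogate episode_id plus a sequence number, so the record order survives once the data leaves the file.

```python
from collections.abc import Iterator

def split_episodes(raw: str) -> Iterator[list[list[str]]]:
    """Yield each patient episode as a list of records (each record is a list of fields)."""
    # Item 1: normalize \r\n and bare \r record terminators to \n.
    text = raw.replace("\r\n", "\n").replace("\r", "\n")
    episode: list[list[str]] = []
    for line in text.split("\n"):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("|")  # Item 2: fields are pipe-delimited
        if fields[0] == "MSH" and episode:
            # Item 8: there is no trailer, so the next MSH closes the previous episode.
            yield episode
            episode = []
        episode.append(fields)
    if episode:
        yield episode  # the last episode ends at the end of the input

def keyed_records(raw: str) -> Iterator[tuple[int, int, str, list[str]]]:
    """Yield (episode_id, seq, record_type, fields) for every record."""
    # Item 7: no natural key exists, so number the episodes (hypothetical surrogate key).
    for episode_id, episode in enumerate(split_episodes(raw), start=1):
        # Item 4: keep each record's position so the order is not lost downstream.
        for seq, fields in enumerate(episode, start=1):
            yield episode_id, seq, fields[0], fields

# Hypothetical two-episode sample with \r terminators.
sample = (
    "MSH|^~\\&|SENDAPP|SENDFAC\r"
    "PID|1||12345\r"
    "OBX|1|TX|||Ejection fraction 60%\r"
    "MSH|^~\\&|SENDAPP|SENDFAC\r"
    "PID|1||67890\r"
)

for episode_id, seq, record_type, _ in keyed_records(sample):
    print(episode_id, seq, record_type)
```

On the sample, episode 1 contains MSH, PID, and OBX at positions 1 to 3, and episode 2 contains MSH and PID. The sketch does not split fields further on the component (^) or repetition (~) separators. Any records that appear before the first MSH would form an episode of their own, which a real loader would probably flag.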

The HL7 record is a multi-structured data format, as described in the previous section of this document.  It is difficult to find patterns in even a few of these medical records, let alone tens of millions.  The team transformed the HL7 data into a format that was easy to search.  We used the Aster Data System both for data preparation and loading and for advanced analysis.  We were also able to link back to other data elements stored in HL7 and capture metadata about the patient, the patient visit, the medical facility, the doctors, and many other areas.  This pattern can be applied well beyond finding ejection fraction, but this document focuses on that one topic: ejection fraction.
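One way to make records of different types searchable in a single table is a narrow "one row per field" layout: (episode_id, seq, record_type, field_no, value). The sketch below illustrates this under our own assumptions. The article does not describe the team's actual Aster table layout, and the function name flatten and the sample are hypothetical. In the MSH record the pipe itself is field MSH-1, so the first value after "MSH|" is MSH-2, and the code offsets MSH field numbers accordingly.

```python
def flatten(raw: str) -> list[tuple[int, int, str, int, str]]:
    """Return one (episode_id, seq, record_type, field_no, value) row per non-empty field."""
    rows = []
    episode_id = 0
    seq = 0
    text = raw.replace("\r\n", "\n").replace("\r", "\n")
    for line in text.split("\n"):
        if not line.strip():
            continue
        fields = line.split("|")
        record_type = fields[0]
        # Each MSH opens a new episode; records before the first MSH get an episode of their own.
        if record_type == "MSH" or episode_id == 0:
            episode_id += 1
            seq = 0
        seq += 1
        # MSH-1 is the "|" separator itself, so the first split value after "MSH" is MSH-2.
        first_field_no = 2 if record_type == "MSH" else 1
        for field_no, value in enumerate(fields[1:], start=first_field_no):
            if value:  # skip empty fields to keep the table small
                rows.append((episode_id, seq, record_type, field_no, value))
    return rows

# Hypothetical single-episode sample.
sample = "MSH|^~\\&|SENDAPP\rPID|1||12345\rOBX|1|TX|||LV ejection fraction 55%\r"
rows = flatten(sample)

# Searching is now a simple filter (or a WHERE clause once loaded into a database).
hits = [r for r in rows if r[2] == "OBX" and "ejection" in r[4].lower()]
print(hits)
```

Here the match lands in field 5 of the OBX record, which in HL7 v2 is the observation value. A narrow layout trades table width for row count, which suits a parallel database better than one wide table per record type.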

CASE STUDY: Finding Ejection Fraction in the Sea of 10 Million HL7 EMR Instances:


PROBLEM:  Find all of the instances of Ejection Fraction test results hidden in tens of millions of HL7 medical records.
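The core of the search is pulling numeric EF values out of free-text results, such as OBX observation text. Below is a minimal regular-expression sketch. The pattern, the function name extract_ef, and the choice to report a range such as 55-60% as its midpoint are our assumptions, not the team's method. Real clinical notes phrase EF in many more ways, so a production pattern would need tuning against actual data.

```python
import re

# Hypothetical pattern: an EF keyword, up to 20 non-digit filler characters,
# then a one- or two-digit value or range ("55-60", "60 to 65"), optional "%".
EF_PATTERN = re.compile(
    r"\b(?:ejection\s+fraction|LVEF|EF)\b\D{0,20}?(\d{1,2})"
    r"(?:\s*(?:-|to)\s*(\d{1,2}))?\s*%?",
    re.IGNORECASE,
)

def extract_ef(text: str) -> list[float]:
    """Return every EF value found in text; a range is reported as its midpoint."""
    values = []
    for match in EF_PATTERN.finditer(text):
        low = int(match.group(1))
        high = int(match.group(2)) if match.group(2) else low
        values.append((low + high) / 2)
    return values

for note in [
    "LV ejection fraction is estimated at 55-60%.",
    "EF 35%",
    "No cardiac findings.",
]:
    print(note, "->", extract_ef(note))
```

Limiting the value to two digits keeps dates and other long numbers from matching. It also means a value written as 100 would be misread, which is one of the cases a real pattern would need to handle.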


The ejection fraction (EF) is an important measurement of how well your heart is pumping out blood, and it is used in diagnosing and tracking heart failure.


  • A normal heart's ejection fraction may be between 55% and 70%.
  • An EF between 40% and 55% indicates damage, perhaps from a previous heart attack, but it may not indicate heart failure.
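Once EF values have been extracted, the two ranges above can be applied as a simple labeling step. This is a hypothetical sketch: the function name classify_ef and the labels are ours. Both listed ranges include 55, so treating 55 as normal is an assumption. Values outside 40 to 70 are left uncategorized because the ranges above do not cover them.

```python
def classify_ef(ef_percent: float) -> str:
    """Label an EF percentage using only the two ranges listed above."""
    # Assumption: both listed ranges include 55, so 55 is treated as normal here.
    if 55 <= ef_percent <= 70:
        return "normal"
    if 40 <= ef_percent < 55:
        return "possible damage"
    # The text does not define other ranges, so they are left uncategorized.
    return "outside the listed ranges"

for ef in (62, 55, 45, 30):
    print(ef, classify_ef(ef))
```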

FINDINGS:  What the team was able to determine, or at least correlate, was that women have hearts that are physically stronger than men's.  Women also had less variance in their ejection fraction results throughout their lives.  This is what the data showed, but the team is not sure there is enough evidence to prove this point.  However, it was interesting how much could be determined just by looking at HL7 text data statistically.
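The kind of comparison behind these findings, mean and variance of EF by gender, can be sketched with Python's statistics module. This is a hypothetical illustration. It assumes each reading has already been joined to the patient's sex, which in HL7 v2 is carried in PID-8 (Administrative Sex). The readings below are made-up toy values, not the study's data. As the team notes, a difference in variance alone does not prove anything; a formal statistical test would be needed.

```python
from collections import defaultdict
from statistics import mean, variance

def summarize_by_group(readings):
    """Map each group to (count, mean EF, sample variance of EF)."""
    groups = defaultdict(list)
    for group, ef in readings:
        groups[group].append(ef)
    summary = {}
    for group, values in groups.items():
        # Sample variance needs at least two readings.
        var = variance(values) if len(values) > 1 else float("nan")
        summary[group] = (len(values), mean(values), var)
    return summary

# Made-up toy readings for illustration only.
toy = [("F", 60), ("F", 62), ("M", 50), ("M", 60)]
print(summarize_by_group(toy))
```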


Within a few hours, the team was able to load and prepare a highly unstructured set of HL7 medical records and turn its text into statistics.  We were then able to use a tool such as Tableau to visualize ejection fraction readouts that would normally be hidden deep inside this text.  See below for examples.
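Visualization tools such as Tableau can read a flat CSV file, so the last preparation step can be as simple as writing one row per extracted reading. The following is a minimal sketch using Python's csv module. The column names, the function name write_readings_csv, and the sample rows are hypothetical.

```python
import csv
import io

def write_readings_csv(rows, out) -> None:
    """Write (episode_id, gender, ef_percent) rows as CSV with a header row."""
    writer = csv.writer(out)
    writer.writerow(["episode_id", "gender", "ef_percent"])
    writer.writerows(rows)

# Hypothetical readings written to an in-memory buffer.
buf = io.StringIO()
write_readings_csv([(1, "F", 57.5), (2, "M", 35.0)], buf)
print(buf.getvalue())
```

When writing to a real file instead of a buffer, open it with open(path, "w", newline=""), as the csv module documentation recommends.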

We are just scratching the surface of this topic and of its potential to help healthcare providers do amazing things with data.