Monitoring ETL applications with Unity Ecosystem Manager


The Analytical Ecosystem can be quite complex. It usually consists of multiple managed servers hosting instances of databases, ETL servers, infrastructure services and application servers. Monitoring and managing applications in this environment can be very challenging. What is the current state of your hardware, how are your applications doing, how many jobs have finished successfully, how many have failed and why: these are the types of questions database administrators typically ask themselves. Now, with the addition of Hadoop infrastructure components to the ecosystem, monitoring has become even harder. Unity Ecosystem Manager helps users answer those questions and perform any necessary maintenance tasks.

Environment

Today a job can consist of steps that process both structured and semi-structured data at the same time. Imagine a user application that needs to process constantly arriving web logs, extract critical user data and insert it into a database.  The user needs to move files into HDFS, run a Map-Reduce job to extract the data, convert it into a relational DB format and finally insert the data into a database table.

The 14.10 release of Unity Ecosystem Manager allows monitoring of all aspects of a user application, including jobs, tables, hardware (servers) and software (daemons). In its latest version, Unity Ecosystem Manager provides a unified view of the entire Analytical Ecosystem in a single web interface. The new Ecosystem Explorer portlet, deployed on Teradata Viewpoint, shows user applications, jobs, tables, servers and daemons, with the ability to see all application dependencies at the same time.

A user can configure and monitor a web log processing application from the Ecosystem Manager user interface. Here is a sample configuration for a web log processing application viewed from the Ecosystem Manager Explorer Application perspective. Using the dependency buttons, a user can view an application and its dependencies within a single view.

The Unity Ecosystem Manager Sendevent API supports passing information about a job, table, server or application to the Ecosystem Manager repository. This simple mechanism lets users monitor the operational aspects of these components so that they can perform the necessary management tasks. At the same time, Ecosystem Manager can automatically discover many pieces of the Analytical Ecosystem infrastructure. For example, installing the Ecosystem Manager client on a Linux box will automatically send information about the server, as well as about Teradata ETL jobs such as TPT or Data Mover processes, to the Ecosystem Manager server.

How to track a Hadoop Job

To track Hadoop jobs, communication with the Job Tracker daemon needs to be established. Normally the Job Tracker configuration resides in /etc/hadoop/conf/mapred-site.xml. Here's a sample entry containing the port on which the daemon listens for requests:

<property>
    <name>mapred.job.tracker</name>
    <value>hortonworks-dev-training.localdomain:50300</value>
</property>
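
As a quick sanity check, the same address can also be read programmatically instead of being hard-coded. Below is a minimal sketch; the file path is simply the default location mentioned above:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;

    // Load mapred-site.xml and print the Job Tracker address it defines
    Configuration conf = new Configuration();
    conf.addResource(new Path("/etc/hadoop/conf/mapred-site.xml"));
    System.out.println("JobTracker address: " + conf.get("mapred.job.tracker"));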

A simple Java program can connect to the Job Tracker via an interface called JobClient, which permits getting information about all jobs. Here's the javadoc for the Hadoop JobClient API: https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/JobClient.html. The getAllJobs() method returns an array of job statuses, and from there it is possible to get individual task reports, which contain data such as the finish times of Map and Reduce tasks. The JobStatus object contains the job name, start time and run state (RUNNING, SUCCEEDED, FAILED, PREP and KILLED).

    Configuration conf = new Configuration(); // Hadoop job configuration pointing at the Job Tracker host and port
    conf.set("mapred.job.tracker", "http://153.64.26.135:50300");
    JobClient client = new JobClient(new JobConf(conf));

    // Ask the Job Tracker for the status of every job it knows about
    JobStatus[] jobStatuses = client.getAllJobs();
    for (JobStatus jobStatus : jobStatuses)
        // getEnumStatus() is a small helper that maps the numeric run state to its name
        System.out.println("JobID: " + jobStatus.getJobID().toString()
                + ", status: " + getEnumStatus(jobStatus.getRunState()));

Having obtained this critical job metadata, a user can track the job and make the necessary management decisions, such as stopping one of the related jobs or changing an application state. To act on those decisions, the user sends the job metadata to the Ecosystem Manager repository via the sendevent interface.

    SendEvents se = new SendEvents();
    if (jobStatus.getRunState() == 1) // RUNNING
        se.execute(getHostName(), jobStatus.getJobID().toString(), "START");

To send events to an Ecosystem Manager server from a Java program, a user needs the Ecosystem Manager SendEvent Java API. The steps include installing the EM client software (the agent and publisher packages) and running the service configuration script:

/opt/teradata/client/em/bin/emserviceconfig.sh JAVA_HOME Primary_EM_Repository Secondary_EM_Repository Ecosystem_Name Server_Type

After configuring the Ecosystem Manager services, a user can compile the job-tracking Java program together with the SendEvent API calls. To do that, the following jar files containing the Hadoop and Ecosystem Manager libraries need to be added to the Java CLASSPATH environment variable:

/usr/lib/hadoop/hadoop-core.jar:/usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop/lib/commons-lang-2.4.jar:/usr/lib/hadoop/lib/jetty-6.1.26.jar:/usr/lib/hadoop/lib/jetty-util-6.1.26.jar:$EM_HOME/lib/messaging-api.jar:$EM_HOME/lib/em.jar

A sample statement to compile the program:

/opt/teradata/jvm64/jdk6/bin/javac -cp $CLASSPATH TestTracker.java

A sample statement to run the program:

/opt/teradata/jvm64/jdk6/bin/java -cp /usr/lib/hadoop/hadoop-core.jar:/usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop/lib/commons-lang-2.4.jar:/usr/lib/hadoop/lib/jetty-6.1.26.jar:/usr/lib/hadoop/lib/jetty-util-6.1.26.jar:/usr/lib/hadoop/lib/commons-configuration-1.6.jar:/opt/teradata/client/em/lib/log4j-1.2.9.jar:$CLASSPATH TestTracker

The job-tracking Java program runs as a daemon, so there is a while(true) loop in the main method, which can only be terminated with a kill command or a system exit call:

    public static void main(String[] args) throws Exception {
        while (true) {
            run(args);          // scan the Job Tracker and send events (see the sketch below)
            Thread.sleep(4000); // poll every 4 seconds
        }
    }

Once started, the program continues to scan the Hadoop Job Tracker and send messages about jobs.
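
For reference, here is a minimal sketch of what the run() method invoked by this loop might contain, combining the JobClient polling and SendEvents calls shown earlier. The startedJobs set, the reuse of getHostName(), and the mapping of run states to START and END events are illustrative assumptions rather than a prescribed implementation:

    import java.util.HashSet;
    import java.util.Set;

    // Illustrative sketch only: poll the Job Tracker and forward job state changes
    // to the Ecosystem Manager repository. "client" is the JobClient built earlier;
    // startedJobs avoids sending a START event more than once per job.
    private static final Set<String> startedJobs = new HashSet<String>();

    private static void run(String[] args) throws Exception {
        SendEvents se = new SendEvents();
        for (JobStatus jobStatus : client.getAllJobs()) {
            String jobId = jobStatus.getJobID().toString();
            if (jobStatus.getRunState() == JobStatus.RUNNING && startedJobs.add(jobId))
                se.execute(getHostName(), jobId, "START"); // job seen running for the first time
            else if (jobStatus.getRunState() == JobStatus.SUCCEEDED && startedJobs.remove(jobId))
                se.execute(getHostName(), jobId, "END");   // job completed successfully
        }
    }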

Here's a sample set of START events that would be issued so that the application and its related components (jobs, tables and servers) become visible. Note that the START events must be followed by END events for the jobs to show as completed:

sendevent --et START --jid CopyLogFilesJob --tds sdll7949.labs.teradata.com  -t MLOAD --sc CONT -w WLAUOWID1 --app WebLogApp

sendevent --et START --jid MRExtractJob --tds sdll7949.labs.teradata.com  -t EXTRACT --sc CONT -w WLAUOWID2 --app WebLogApp

sendevent --et START --jid InsertJob --tds sdll6128  -t MLOAD --sc CONT -w WLAUOWID1 --app WebLogApp --wdb UserDB --wtb UserTB -v 50

User interface

While the jobs are running, and even after they finish, a user can review their statuses in the Ecosystem Manager user interface. The job execution report and the server metric report share the same timeline, which gives the user a unique opportunity to debug jobs and answer questions such as why a job failed or missed its SLA.

From the Jobs view, a user can drill down to see individual job events.

Finally, a user can view the table into which the extracted data (50 rows in this example) was inserted.

Conclusion

With Unity Ecosystem Manager's monitoring capabilities it is easy to make sense of a complicated Analytical Ecosystem. Specifically, this article shows how to monitor different ETL jobs, whether they are Hadoop jobs or other types. It also demonstrates the new Unity Ecosystem Manager user interface, including the ability to see all components of an Analytical Ecosystem (applications, jobs, tables and servers) in a single integrated view.

7 REPLIES
Enthusiast

Re: Monitoring ETL applications with Unity Ecosystem Manager

Is Unity Ecosystem Manager a monitoring tool only, or can we also schedule jobs based on applications and maintain dependencies, whether they are Hadoop jobs, ETL jobs or database jobs?

Cheers,

Raja

Teradata Employee

Re: Monitoring ETL applications with Unity Ecosystem Manager

Raja,

We are working on adding a workflow engine integrated with a scheduler to Unity Ecosystem Manager.

Regards,

Dmitriy

Enthusiast

Re: Monitoring ETL applications with Unity Ecosystem Manager

The reason I ask is that a few years back I worked on Ab Initio projects where we used Control-M, Autosys and, in some projects, Dollar-U for scheduling. Later Ab Initio came up with its own scheduling tool, and I see other ETL vendors shipping integrated scheduling tools as well, such as SAP DS. In Teradata we have Teradata Query Scheduler; I have not worked with it, but I see that it is easy to configure.

I can see a huge cost-saving way for the clients here.

So my point is: in Unity Ecosystem Manager, will it be Teradata Query Scheduler plus a Hadoop job scheduler?

Thanks and regards,

Teradata Employee

Re: Monitoring ETL applications with Unity Ecosystem Manager

Raja,

We plan to have a generic workflow engine that is job-type agnostic. As long as the user can provide the job location and parameters, the engine will be able to run it regardless of what type of job it is.

Regards,

Dmitriy

Enthusiast

Re: Monitoring ETL applications with Unity Ecosystem Manager

Nice article. A couple of questions:

- Does this Hadoop integration work with any other Hadoop cluster (MapR, Cloudera, Hortonworks)?

- Is there any future integration to include Aster as well, for example adding a data load job for Aster?

Teradata Employee

Re: Monitoring ETL applications with Unity Ecosystem Manager

Hi Teratarun,

It should work with any Apache-based Hadoop cluster. Details such as the Job Tracker daemon port may change, but the API is common. I've tested with Hortonworks.

We are working on adding Aster integration paths in future releases.

Regards,

Dmitriy

Teradata Employee

Re: Monitoring ETL applications with Unity Ecosystem Manager

Where can I acquire the EM client software (agent and publisher packages)?