Loading Data from Oracle to Hive using TDCH-TPT

Enthusiast

Loading Data from Oracle to Hive using TDCH-TPT

Hi,

We have TPT 15.10.xx.xx and TDCH 1.4. TPT is installed on Linux (CentOS release 6.8), and we run the tbuild command from the Hadoop edge node where TPT is installed. Loading data from Teradata to Hive using TDCH-TPT works fine, and we are able to load data into the Hive table. Now we want to read data from an Oracle source (using the ODBC operator) and write it to a Hive table (using the DATACONNECTOR CONSUMER operator). In the DataConnector operator we are using the HadoopHost, HadoopJobType (Hive), HadoopFileFormat (ORCFile), HadoopTargetDatabase, HadoopTargetTable and HadoopUser attributes.

1) Is it possible to read data from Oracle and write it to a Hive table?

2) While running the job, we are getting the error below:

17/06/16 02:53:01 INFO processor.IDataStreamInputProcessor: input preprocessor com.teradata.connector.idatastream.processor.IDataStreamInputProcessor starts at: 1497595981229
17/06/16 02:53:01 INFO processor.IDataStreamInputProcessor: the teradata connector for hadoop version is: 1.4.1
17/06/16 02:53:01 WARN tool.ConnectorJobRunner: com.teradata.connector.common.exception.ConnectorException: The output post processor returns 1
17/06/16 02:53:01 INFO tool.ConnectorImportTool: com.teradata.connector.common.exception.ConnectorException: One or more of the types defined in the IDataStream schema are currently unsupported by the IDataStream plugin
    at com.teradata.connector.common.tool.ConnectorJobRunner.runJob(ConnectorJobRunner.java:140)
    at com.teradata.connector.common.tool.ConnectorImportTool.run(ConnectorImportTool.java:67)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
    at com.teradata.connector.common.tool.ConnectorImportTool.main(ConnectorImportTool.java:745)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
17/06/16 02:53:01 INFO tool.ConnectorImportTool: job completed with exit code 60005

Can someone help us with this issue?

Thanks & Regards,
Arpan.
Enthusiast

Re: Loading Data from Oracle to Hive using TDCH-TPT

Hi All,

Can someone please help us with this?

Thanks & Regards,
Arpan.
Teradata Employee

Re: Loading Data from Oracle to Hive using TDCH-TPT

Can you please provide more information?

Can you post the script?

We would like to know the schema types you are providing.

 

-- SteveF
Enthusiast

Re: Loading Data from Oracle to Hive using TDCH-TPT

Hi Steve,

Thanks a lot for your response. 

Below is the TPT Control File.

DEFINE JOB EXPORT_ORA_TABLE_TO_FILE
DESCRIPTION 'export EXPORT_ORA_TABLE_TO_FILE'
     (
        DEFINE SCHEMA SCHEMA_COL_BASE
            (
                                C1   NUMBER(4),
                                C2   NUMBER(4),
                                C3   NUMBER(4),
                                C4   NUMBER(2),
                                C5   CHAR(3),
                                C6   VARCHAR(25),
                                C7   VARCHAR(19),
                                C8   VARCHAR(19),
                                C9   VARCHAR(19),
                                C10  NUMBER(3),
                                C11  NUMBER(3),
                                C12  VARCHAR(19),
                                C13  CHAR(16),
                                C14  VARCHAR(19),
                                C15  VARCHAR(19)
                        );

        DEFINE OPERATOR o_ODBCOper
        TYPE ODBC
        SCHEMA SCHEMA_COL_BASE
        ATTRIBUTES (
            VARCHAR UserName            = @UserName
           ,VARCHAR UserPassword        = @UserPassword
           ,VARCHAR DSNName             = @DSNName
           ,VARCHAR PrivateLogName      = 'ODBCloadlog'
           ,VARCHAR SelectStmt          = @SelectClause || @SourceSchema || @SourceTable
           ,VARCHAR TraceLevel          = 'none'
           ,INTEGER DataBlockSize       = 2048
        );

DEFINE OPERATOR o_FileWritter
        TYPE DATACONNECTOR CONSUMER
        SCHEMA SCHEMA_COL_BASE
        ATTRIBUTES (
         VARCHAR PrivateLogName         = 'DataConnector'
        ,VARCHAR TraceLevel             = 'none'
        ,VARCHAR HadoopHost             = @HadoopHost
        ,VARCHAR HadoopJobType          = @HadoopJobType
        ,VARCHAR HadoopFileFormat       = @HadoopFileFormat
        ,VARCHAR HadoopTargetDatabase   = 'default'
        ,VARCHAR HadoopTargetTable      = @HadoopTable
        ,VARCHAR HadoopUser             = @HadoopUser
        );
        APPLY TO OPERATOR (o_FileWritter[@LoadInst])
           SELECT * FROM OPERATOR (o_ODBCOper[@ReadInst] ATTRIBUTES (SelectStmt=@SelectClause || @SourceSchema || @SourceTable || ' WHERE C2 > = 0 and C2 < 2000;'))
UNION ALL
SELECT * FROM OPERATOR (o_ODBCOper[@ReadInst] ATTRIBUTES (SelectStmt=@SelectClause || @SourceSchema || @SourceTable || ' WHERE C2 > = 2000 and C2 < 3000;'))
UNION ALL
SELECT * FROM OPERATOR (o_ODBCOper[@ReadInst] ATTRIBUTES (SelectStmt=@SelectClause || @SourceSchema || @SourceTable || ' WHERE C2 > = 3000 and C2 < 4000;'))
UNION ALL
SELECT * FROM OPERATOR (o_ODBCOper[@ReadInst] ATTRIBUTES (SelectStmt=@SelectClause || @SourceSchema || @SourceTable || ' WHERE C2 > = 4000 and C2 < 5000;'))
UNION ALL
SELECT * FROM OPERATOR (o_ODBCOper[@ReadInst] ATTRIBUTES (SelectStmt=@SelectClause || @SourceSchema || @SourceTable || ' WHERE C2 > = 5000 and C2 < 6000;'))
UNION ALL
SELECT * FROM OPERATOR (o_ODBCOper[@ReadInst] ATTRIBUTES (SelectStmt=@SelectClause || @SourceSchema || @SourceTable || ' WHERE C2 > = 6000 and C2 < 7000;'))
UNION ALL
SELECT * FROM OPERATOR (o_ODBCOper[@ReadInst] ATTRIBUTES (SelectStmt=@SelectClause || @SourceSchema || @SourceTable || ' WHERE C2 > = 7000 and C2 < 9999;'));
     )
;

Below is the tbuild command:

tbuild -h 100M -f <FullPathToTPTControlFile> -v <FullPathToLogonFile> -u " UserName='<OracleUserName>' , UserPassword='<OracleUserPassword>' , DSNName='<DSNNameDefinedInODBCINI>' , load_op=o_ODBCOper , LoadInst=1 , ReadInst=7 , HadoopHost='<EdgeNodeWhereTPTIsRunning>' , HadoopJobType='hive', HadoopFileFormat='ORCfile', HadoopTable='<HadoopTargetTable>' , HadoopUser='<HadoopUser>'  , LOAD_DTS='2017-06-16 08:21:34' , Format='DELIMITED', TextDelimiter='$^$' , SkipRows=0 , SourceSchema='<OracleSourceSchema>' , SourceTable='<OracleSourceTable>', SelectClause='SELECT C1 ,C2 ,C3 ,C4 ,C5 ,C6 ,TO_CHAR(C7,''YYYY-MM-DD HH24:MI:SS'') ,TO_CHAR(C8,''YYYY-MM-DD HH24:MI:SS'') ,TO_CHAR(C9,''YYYY-MM-DD HH24:MI:SS'') ,C10 ,C11 ,TO_CHAR(C12,''YYYY-MM-DD HH24:MI:SS'') ,C13 ,TO_CHAR(C14,''YYYY-MM-DD HH24:MI:SS'') ,TO_CHAR(C15,''YYYY-MM-DD HH24:MI:SS'') FROM '" ORA_TDCH_TPT 
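Just to spell out how the SelectStmt gets composed: after tbuild substitutes the job variables, the first ODBC instance ends up running something along these lines (the schema and table names below are placeholders, and I'm assuming @SourceSchema carries the trailing dot, e.g. 'MYSCHEMA.'):

SELECT C1, C2, C3, C4, C5, C6,
       TO_CHAR(C7,  'YYYY-MM-DD HH24:MI:SS'),
       TO_CHAR(C8,  'YYYY-MM-DD HH24:MI:SS'),
       TO_CHAR(C9,  'YYYY-MM-DD HH24:MI:SS'),
       C10, C11,
       TO_CHAR(C12, 'YYYY-MM-DD HH24:MI:SS'),
       C13,
       TO_CHAR(C14, 'YYYY-MM-DD HH24:MI:SS'),
       TO_CHAR(C15, 'YYYY-MM-DD HH24:MI:SS')
FROM MYSCHEMA.MY_SOURCE_TABLE
WHERE C2 >= 0 AND C2 < 2000;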

Thanks & Regards,

Arpan.

Teradata Employee

Re: Loading Data from Oracle to Hive using TDCH-TPT

Please set the “HADOOP_ROOT_LOGGER” environment variable on the node from which you are running the TDCH job:

 

export HADOOP_ROOT_LOGGER=DEBUG,console

 

That should give us a little more debug output so we can hopefully see what is going on.

 

Thanks!

-- SteveF
Teradata Employee

Re: Loading Data from Oracle to Hive using TDCH-TPT

After speaking with the TDCH developers, I have learned that the TDCH "iDataStream" plugin (a plugin to TDCH used by TPT) does not support the NUMBER data type.

And unfortunately, this is not documented.

You will have to change your TPT schema to use a different data type (such as DOUBLE).

 

 

-- SteveF
Enthusiast

Re: Loading Data from Oracle to Hive using TDCH-TPT

Hi Steve,

Thank you for your response. Once I changed the data type from NUMBER to INTEGER, the previous issue was resolved. But now the job fails with an "IndexOutOfBoundsException". Below is the log:

17/07/08 05:15:52 INFO mapreduce.Job: Running job: job_1499445795569_0653
17/07/08 05:16:05 INFO mapreduce.Job: Job job_1499445795569_0653 running in uber mode : false
17/07/08 05:16:05 INFO mapreduce.Job: map 0% reduce 0%
17/07/08 05:16:17 INFO mapreduce.Job: Task Id : attempt_1499445795569_0653_m_000001_0, Status : FAILED
Error: java.lang.IndexOutOfBoundsException
    at java.nio.ByteBuffer.wrap(ByteBuffer.java:375)
    at com.teradata.connector.idatastream.serde.IDataStreamSerDe.deserialize(IDataStreamSerDe.java:292)
    at com.teradata.connector.common.ConnectorCombineInputFormat$ConnectorCombinePlugedinInputRecordReader.getCurrentValue(ConnectorCombineInputFormat.java:531)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.getCurrentValue(MapTask.java:535)
    at org.apache.hadoop.mapreduce.task.MapContextImpl.getCurrentValue(MapContextImpl.java:75)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.getCurrentValue(WrappedMapper.java:86)
    at com.teradata.connector.common.ConnectorMMapper.run(ConnectorMMapper.java:122)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)

Can you please advise what we should do to resolve this error?

Thanks & Regards,
Arpan.
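P.S. For reference, the only lines that changed in the schema are roughly these (everything else is exactly as in the script posted earlier):

        C1   INTEGER,    /* was NUMBER(4) */
        C2   INTEGER,    /* was NUMBER(4) */
        C3   INTEGER,    /* was NUMBER(4) */
        C4   INTEGER,    /* was NUMBER(2) */
        C10  INTEGER,    /* was NUMBER(3) */
        C11  INTEGER,    /* was NUMBER(3) */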
Enthusiast

Re: Loading Data from Oracle to Hive using TDCH-TPT

Hi Steve,

Can you please help us on the issue?

 

Thanks & Regards,

Arpan.

Enthusiast

Re: Loading Data from Oracle to Hive using TDCH-TPT

Hi Steve,

I'm able to write data to HDFS now, but there are a few problems:

1) Whatever path I define in the "HadoopTargetPaths" attribute is created as a directory, and within that directory files named like "part-m-00000.snappy" and "part-m-00001.snappy" are created (see the listing after this list). Is there any way I can specify the file name that should be created? I'm using HadoopNumMappers = 2.

2) My observation is that whenever I try to use the INTEGER data type (the source column is NUMBER, which is not supported by iDataStream), I get the IndexOutOfBounds error. What data type should I use instead of NUMBER? I tried DOUBLE, but it does not seem to work either.

3) Even though "part-m-00000.snappy" is created, the data written into the file does not look right. I'm using HadoopFileFormat='textfile', HadoopJobType='hdfs', HadoopNumMappers=2 and HadoopSeparator='|'. Does the created file need to be converted to some other format, or am I missing something?
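To illustrate point 1, the target path ends up looking roughly like this (the paths below are placeholders for my actual ones), and the only workaround I can think of so far is renaming the part files after the job finishes:

$ hadoop fs -ls /user/arpan/target_path
/user/arpan/target_path/part-m-00000.snappy
/user/arpan/target_path/part-m-00001.snappy

$ hadoop fs -mv /user/arpan/target_path/part-m-00000.snappy /user/arpan/target_path/my_file_0.snappy
$ hadoop fs -mv /user/arpan/target_path/part-m-00001.snappy /user/arpan/target_path/my_file_1.snappy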

 

Please help.

 

Thanks & Regards,

Arpan.

 

 

Enthusiast

Re: Loading Data from Oracle to Hive using TDCH-TPT

Hi Steve,

I was able to get the data out of the snappy file; snappy is the default compression defined in mapred-site.xml. If I use hadoop fs -text <Fully_Qualified_File_Name>, I can see the actual data values.
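Concretely, this is what I ran (the path is a placeholder for my actual target path), and it prints the pipe-delimited rows as expected:

hadoop fs -text /user/arpan/target_path/part-m-00000.snappy | head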

But is there any way I can suppress the compression?

Also, there are many options available on the TDCH command line, but only a few of them are exposed through TDCH-TPT. Are there any undocumented job variables for those options?

E.g. -conf, -classname, -url, -accesslock, -method, -throttlemappers, -sourcetimestampformat, -targettimestampformat, -stringtruncate, etc.

I'm trying to compare SQOOP import with TDCH-TPT.
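For example, from the command line I can drive the TDCH import tool directly with something like the following (just a sketch: the jar path and all values are placeholders, and the option names are as I understand them from the TDCH documentation, so I may have some of them slightly off), whereas in TDCH-TPT only a handful of HadoopXxx attributes seem to be exposed:

hadoop jar /path/to/teradata-connector-1.4.1.jar com.teradata.connector.common.tool.ConnectorImportTool \
    -url jdbc:teradata://<tdpid>/database=<db> \
    -username <user> -password <password> \
    -jobtype hive -fileformat orcfile \
    -method split.by.hash \
    -throttlemappers true \
    -sourcetimestampformat "yyyy-MM-dd HH:mm:ss" \
    -sourcetable <src_table> -targettable <tgt_table>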

 

Thanks & Regards,

Arpan.