How to resolve garbled characters when using teradata-connector-1.3.3 to export data from Teradata to HDFS

Hi All,

I am using teradata-connector-1.3.3 with the following versions:

->Teradata 15.0

->HDP 2.1

I am trying to import data from Teradata tables into HDFS.

I created a table in the Teradata database:

CREATE MULTISET TABLE martinpoc.example3_td3 ,NO FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     CHECKSUM = DEFAULT,
     DEFAULT MERGEBLOCKRATIO
     (
      c1 INTEGER,
      c2 VARCHAR(100) CHARACTER SET UNICODE NOT CASESPECIFIC)
PRIMARY INDEX ( c1 );

I inserted a row containing Chinese text into this table:

INSERT INTO martinpoc.example3_td3 (c1, c2) VALUES ('1', '蔡先生');

I then ran the following Teradata connector command to export the data from Teradata to HDFS:

hadoop com.teradata.hadoop.tool.TeradataImportTool -libjars $LIB_JARS -url jdbc:teradata://192.168.65.132/CHARSET=UTF8,database=martinpoc -username martinpoc -password martin -jobtype hdfs -sourcetable example3_td3 -separator ',' -targetpaths /user/martin/example3_td3 -method split.by.hash -splitbycolumn c1

The export succeeded, but the Chinese text in the resulting HDFS file is garbled.
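
One way to narrow down where the text gets damaged is to look at the raw bytes of the exported file instead of what a terminal displays. The following is only a minimal Python sketch, assuming the output has been copied to a local machine (for example with hdfs dfs -get) and one line of it has been read as bytes. The helper name classify and the sample lines are hypothetical and are not part of the Teradata connector.

```python
# Hypothetical diagnostic helper; not part of the Teradata connector.
EXPECTED_TEXT = "蔡先生"  # the value inserted into column c2


def classify(raw: bytes) -> str:
    """Report how the bytes of one exported line relate to the expected text."""
    expected_utf8 = EXPECTED_TEXT.encode("utf-8")
    if expected_utf8 in raw:
        # The file holds correct UTF-8; the viewer or terminal is the likelier culprit.
        return "utf-8 ok"
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        # The bytes were written in some other encoding.
        return "not valid utf-8"
    # Well-formed UTF-8, but the characters were already replaced before writing.
    return "valid utf-8, wrong characters"


def hex_dump(raw: bytes) -> str:
    """Render bytes as space-separated hex pairs for visual inspection."""
    return " ".join(f"{b:02x}" for b in raw)


if __name__ == "__main__":
    samples = [
        "1,蔡先生\n".encode("utf-8"),  # what a clean export would contain
        "1,???\n".encode("utf-8"),      # characters lost in conversion
        "1,蔡先生\n".encode("gbk"),    # same text in a different encoding
    ]
    for line in samples:
        print(hex_dump(line), "->", classify(line))
```

If the real file falls into the first case, the data in HDFS is intact and the display tool is the thing to check; the other two cases point at a conversion problem somewhere between the database and the file.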

Has anyone had the same experience, and how can I resolve it?

Thanks and best regards,

Martin

Re: How to resolve garbled characters when using teradata-connector-1.3.3 to export data from Teradata to HDFS

Hi All,

I downloaded and read the tutorial document, Teradata Connector for Hadoop Tutorial v1 0 final.pdf, and found Chapter 8.6, "When should charset be specified in JDBC URL?"

It states: "If the column of the Teradata table is defined as Unicode (UTF-8), then you should specify the same character set in the JDBC URL. Otherwise, it will result in wrong encoding of transmitted data, and there will be no exception thrown."

However, the document does not include a sample or example of how to specify the character set. Does anyone know how to specify it?
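
For reference, the Teradata JDBC driver takes connection parameters as comma-separated NAME=VALUE pairs after the host name, and CHARSET=UTF8 is one such pair, so the -url value in the command from the original post already follows that pattern. The sketch below only illustrates the URL shape in Python; teradata_jdbc_url is a hypothetical helper, not something provided by the connector or the driver.

```python
def teradata_jdbc_url(host: str, **params: str) -> str:
    """Build a URL of the form jdbc:teradata://host/NAME=VALUE,NAME=VALUE.

    Hypothetical helper for illustration; parameters are emitted in the
    order they are passed.
    """
    base = f"jdbc:teradata://{host}"
    if not params:
        return base
    pairs = ",".join(f"{name}={value}" for name, value in params.items())
    return f"{base}/{pairs}"


if __name__ == "__main__":
    # Host and database are the ones used in the original command.
    url = teradata_jdbc_url("192.168.65.132", CHARSET="UTF8", database="martinpoc")
    print(url)  # jdbc:teradata://192.168.65.132/CHARSET=UTF8,database=martinpoc
```

Since the command already passes CHARSET=UTF8, the character set setting in the URL may not be the only factor, and checking how the output file is being viewed could be worthwhile.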

Best Regards,

Martin