I am using teradata-connector-1.3.3 with the following versions:
I am trying to import data from Teradata tables into HDFS.
I create a table in the Teradata database:
CREATE MULTISET TABLE martinpoc.example3_td3 ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT
(
-- c1 definition assumed (missing from the post); it appears to mirror c2
c1 VARCHAR(100) CHARACTER SET UNICODE NOT CASESPECIFIC,
c2 VARCHAR(100) CHARACTER SET UNICODE NOT CASESPECIFIC)
PRIMARY INDEX ( c1 );
I insert a row containing a Chinese word into this table:
INSERT INTO martinpoc.example3_td3(c1,C2) VALUES ('1','蔡先生');
I run the following command, which calls the Teradata connector to import data from Teradata into HDFS:
hadoop com.teradata.hadoop.tool.TeradataImportTool -libjars $LIB_JARS -url jdbc:teradata://192.168.65.132/CHARSET=UTF8,database=martinpoc -username martinpoc -password martin -jobtype hdfs -sourcetable example3_td3 -separator ',' -targetpaths /user/martin/example3_td3 -method split.by.hash -splitbycolumn c1
The import from Teradata to HDFS succeeds, but the Chinese characters in the HDFS output look garbled.
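For what it's worth, garbling like this usually means UTF-8 bytes were re-interpreted in a single-byte charset somewhere in the pipeline. A minimal sketch (plain Python, not the connector itself; Latin-1 is just an assumed example of a mis-set session charset) of what that mojibake looks like for the inserted value:

```python
# -*- coding: utf-8 -*-
# Demonstrate how "蔡先生" becomes mojibake if its UTF-8 bytes are
# decoded with a wrong single-byte charset (Latin-1 assumed here).
original = "蔡先生"
utf8_bytes = original.encode("utf-8")    # e8 94 a1 e5 85 88 e7 94 9f
garbled = utf8_bytes.decode("latin-1")   # what a mis-set charset would display
print(utf8_bytes.hex())
print(repr(garbled))

# The damage is reversible as long as the raw bytes were not altered:
roundtrip = garbled.encode("latin-1").decode("utf-8")
print(roundtrip == original)
```

Comparing the raw bytes of the HDFS file against the expected UTF-8 sequence above can tell you whether the data was transcoded during transfer or merely displayed with the wrong charset.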
Has anyone run into the same problem, and how did you resolve it?
Thanks and best regards.
I downloaded and read the tutorial document, Teradata Connector for Hadoop Tutorial v1.0 final.pdf, and found Chapter 8.6, "When should charset be specified in JDBC URL?"
It says: "If the column of the Teradata table is defined as Unicode (UTF-8), then you should specify the same character set in the JDBC URL. Otherwise, it will result in wrong encoding of transmitted data, and there will be no exception thrown."
But the document doesn't give a sample of how to specify the character set. Do you know how to specify it?
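As far as I can tell, the character set is specified as a CHARSET=&lt;name&gt; name-value pair in the parameter list after the host in the Teradata JDBC URL, which is what the import command above already does. A small sketch of building such a URL (the helper function is hypothetical; host and database values are the ones from this post):

```python
# Hypothetical helper that assembles a Teradata JDBC URL with an explicit
# session charset, in the form jdbc:teradata://host/PARAM=value,PARAM=value
def teradata_url(host, database, charset="UTF8"):
    return "jdbc:teradata://%s/CHARSET=%s,database=%s" % (host, charset, database)

# Reproduces the URL used in the import command above.
print(teradata_url("192.168.65.132", "martinpoc"))
```

Note that CHARSET controls the JDBC session encoding between client and server; it is separate from the CHARACTER SET UNICODE clause in the table DDL, and the tutorial's point is that the two should agree.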