teradataimporttool charset problem

Fan

Hi.

I am importing data from Teradata to Hadoop with "Teradata Connector for Hadoop (Command Line Edition): Cloudera" v1.2:

http://downloads.teradata.com/download/connectivity/teradata-connector-for-hadoop-command-line-editi...

I have a table like this:

create table testtable (
  id int not null,
  value varchar(50),
  text varchar(200),
  PRIMARY KEY (id)
);

And I have inserted this data:

insert into testtable values (1, '#1€', 'aá');

insert into testtable values (2, '#2€', 'eé');
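For reference when inspecting the export, here is a minimal Java sketch (the class name ExpectedUtf8 is hypothetical) that prints the UTF-8 bytes each character field should contain if the non-ASCII characters survive: the euro sign (U+20AC) should appear as e2 82 ac, á (U+00E1) as c3 a1 and é (U+00E9) as c3 a9.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: prints the UTF-8 bytes each exported field should
// contain if non-ASCII characters survive (euro sign U+20AC,
// a-acute U+00E1, e-acute U+00E9).
final class ExpectedUtf8 {
    // Formats bytes as lowercase, space-separated hex pairs, e.g. "23 31".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source ASCII-only, so the result does not
        // depend on the encoding javac assumes for the file.
        String[] fields = {"#1\u20AC", "a\u00E1", "#2\u20AC", "e\u00E9"};
        for (String f : fields) {
            System.out.println(hex(f.getBytes(StandardCharsets.UTF_8)));
        }
        // Prints:
        // 23 31 e2 82 ac
        // 61 c3 a1
        // 23 32 e2 82 ac
        // 65 c3 a9
    }
}
```

Comparing these values against a hex dump of the HDFS output (for example, `hadoop fs -cat` piped into `od -An -tx1`) shows whether each character arrived as its multi-byte UTF-8 sequence or as a single substitute byte.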

The import job completes without errors:

export USERLIBTDCH=/usr/lib/tdch/teradata-connector-1.2.jar

hadoop jar $USERLIBTDCH com.teradata.hadoop.tool.TeradataImportTool -classname com.teradata.jdbc.TeraDriver -url jdbc:teradata://teradataServer/DATABASE=test,CHARSET=UTF8 -username dbc -password dbc -jobtype hdfs -fileformat textfile -targetpaths /temp/hdfstable -sourcetable testtable -splitbycolumn id

But the resulting file in HDFS contains:

1 #1? a?
2 #2? e?

How can I import "special" (non-ASCII) characters from Teradata to Hadoop as UTF-8? If I use the JDBC driver directly (e.g. from a Java program), it works fine, so the problem seems to be in the connector...
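The output has exactly one '?' per non-ASCII character. One mechanism that produces this pattern, offered only as a hypothesis and not as a confirmed description of what the connector does, is a correctly decoded Java String being encoded with a charset that cannot represent the euro sign, á or é: `String.getBytes(Charset)` substitutes the charset's replacement byte ('?' for US-ASCII) for each unmappable character. The class name LossyEncodingDemo is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothesis sketch (not confirmed for the connector): encoding a correctly
// decoded String with a charset lacking these characters yields one '?'
// per character, matching the "#1? a?" output.
final class LossyEncodingDemo {
    // String.getBytes(Charset) replaces every unmappable character with the
    // charset's replacement bytes ('?' for US-ASCII), then we decode back.
    static String roundTrip(String s, Charset cs) {
        return new String(s.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        String value = "#1\u20AC"; // '#1' followed by the euro sign
        String text = "a\u00E1";   // 'a' followed by a-acute
        System.out.println(roundTrip(value, StandardCharsets.US_ASCII)); // #1?
        System.out.println(roundTrip(text, StandardCharsets.US_ASCII));  // a?
        System.out.println(roundTrip(text, StandardCharsets.UTF_8).equals(text)); // true
        // Writers that are not given an explicit charset use this default;
        // on older JVMs under a POSIX/C locale it is often US-ASCII.
        System.out.println("default charset: " + Charset.defaultCharset());
    }
}
```

If the last line reports something other than UTF-8 on the nodes that run the import tasks, the JVM default charset is worth a closer look; whether the connector actually depends on it is an assumption this sketch does not verify.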

Enthusiast

Re: teradataimporttool charset problem

I am also curious whether, for these characters, it would work to change DATABASE=test,CHARSET=UTF8 to DATABASE=test,CHARSET=UTF16.
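A Java sketch of why the session character set alone may not change the result (the class name SessionCharsetDemo is hypothetical): a UTF8 session and a UTF16 session transfer different bytes, but assuming the driver decodes both into Java Strings, the resulting characters are identical, so the output file depends on how those Strings are encoded when they are written.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch: the same text transferred as UTF-8 or UTF-16 bytes decodes to an
// identical Java String, so the session CHARSET by itself should not affect
// how the String is later encoded into the output file.
final class SessionCharsetDemo {
    static boolean sameAfterDecoding(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        byte[] utf16 = original.getBytes(StandardCharsets.UTF_16BE);
        // The bytes on the wire differ...
        boolean bytesDiffer = !Arrays.equals(utf8, utf16);
        // ...but both decode back to the same characters.
        String fromUtf8 = new String(utf8, StandardCharsets.UTF_8);
        String fromUtf16 = new String(utf16, StandardCharsets.UTF_16BE);
        return bytesDiffer && fromUtf8.equals(fromUtf16) && fromUtf8.equals(original);
    }

    public static void main(String[] args) {
        System.out.println(sameAfterDecoding("#1\u20AC")); // true
        System.out.println(sameAfterDecoding("a\u00E1"));  // true
    }
}
```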

Re: teradataimporttool charset problem

Mardan, we have exactly the same problem.

The Cloudera Connector for Teradata (CDH4) works correctly, but we are interested in using "Teradata Connector for Hadoop (Command Line Edition): Cloudera" because it is the connector recommended for performance.

Have you managed to import special characters yet?

Fan

Re: teradataimporttool charset problem

With "DATABASE=test,CHARSET=UTF16" I get the same resulting file.

The character columns of my Teradata table are of the Unicode character type:

select columnname, chartype from dbc.columns where tablename = 'testtable';

 ColumnName  CharType
 ----------  --------
 id                 0
 text               2
 value              2
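The CharType codes can be decoded with a small lookup. The mapping below (1 = LATIN, 2 = UNICODE, 3 = KANJISJIS, 4 = GRAPHIC, 5 = KANJI1) is an assumption based on our reading of Teradata's Data Dictionary documentation and should be checked against your release; 0 is simply what the query above returns for the integer column. Since UNICODE columns hold UTF-16 code units, the sketch also lists the code units expected for the inserted values, for comparison with what the server holds. The class name CharTypeInfo is hypothetical.

```java
// Hypothetical helper: decodes DBC.Columns CharType codes (mapping assumed,
// verify for your release) and lists UTF-16 code units of a value.
final class CharTypeInfo {
    static String charTypeName(int code) {
        switch (code) {
            case 0: return "non-character (e.g. the INTEGER id column above)";
            case 1: return "LATIN";
            case 2: return "UNICODE";
            case 3: return "KANJISJIS";
            case 4: return "GRAPHIC";
            case 5: return "KANJI1";
            default: return "unknown (" + code + ")";
        }
    }

    // Lists the UTF-16 code units of s as 4-digit uppercase hex, e.g. "0061 00E1".
    static String utf16Units(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("text: " + charTypeName(2)); // text: UNICODE
        System.out.println(utf16Units("#1\u20AC"));      // 0023 0031 20AC
        System.out.println(utf16Units("a\u00E1"));       // 0061 00E1
    }
}
```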

Teradata Employee

Re: teradataimporttool charset problem

What is the hex dump of the '?' in the HDFS file? Is it the UTF-8 0x1A replacement character, or something else? Is there a byte-order mark, or is UTF-16 assumed to be little-endian?
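A sketch for answering these questions from a local copy of the exported file (the class name QuestionMarkClassifier is hypothetical). It reports any byte-order mark and, for each '?', 0x1A, NUL or non-ASCII byte, whether it is a literal 0x3F, the 0x1A SUB character, the U+FFFD replacement character (ef bf bd), a valid UTF-8 sequence, or a byte that is not valid UTF-8. The UTF-8 check is simplified and does not reject overlong or surrogate encodings.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Sketch: classify suspicious bytes in an exported file, reported as
// "offset: finding". Simplified UTF-8 validation (overlongs not rejected).
final class QuestionMarkClassifier {
    static boolean startsWith(byte[] d, int... prefix) {
        if (d.length < prefix.length) return false;
        for (int k = 0; k < prefix.length; k++) {
            if ((d[k] & 0xff) != prefix[k]) return false;
        }
        return true;
    }

    // True if bytes i+1 .. i+len-1 exist and are UTF-8 continuation bytes (10xxxxxx).
    static boolean hasContinuation(byte[] d, int i, int len) {
        if (i + len > d.length) return false;
        for (int k = 1; k < len; k++) {
            if ((d[i + k] & 0xC0) != 0x80) return false;
        }
        return true;
    }

    static List<String> classify(byte[] d) {
        List<String> out = new ArrayList<>();
        int i = 0;
        if (startsWith(d, 0xEF, 0xBB, 0xBF)) { out.add("0: UTF-8 BOM"); i = 3; }
        else if (startsWith(d, 0xFE, 0xFF)) { out.add("0: UTF-16BE BOM"); i = 2; }
        else if (startsWith(d, 0xFF, 0xFE)) { out.add("0: UTF-16LE BOM"); i = 2; }
        while (i < d.length) {
            int b = d[i] & 0xff;
            // Sequence length implied by a UTF-8 lead byte (0 = not a lead byte).
            int len = (b >= 0xC2 && b <= 0xDF) ? 2
                    : (b >= 0xE0 && b <= 0xEF) ? 3
                    : (b >= 0xF0 && b <= 0xF4) ? 4 : 0;
            if (b == 0x3F) {
                out.add(i + ": 0x3F literal '?'");
            } else if (b == 0x1A) {
                out.add(i + ": 0x1A SUB");
            } else if (b == 0x00) {
                out.add(i + ": 0x00 NUL (possible UTF-16)");
            } else if (len > 0 && hasContinuation(d, i, len)) {
                // Payload bits of the lead byte, then 6 bits per continuation byte.
                int cp = b & (0xFF >> (len + 1));
                for (int k = 1; k < len; k++) cp = (cp << 6) | (d[i + k] & 0x3F);
                out.add(i + (cp == 0xFFFD ? ": U+FFFD replacement character"
                                          : String.format(": UTF-8 U+%04X", cp)));
                i += len;
                continue;
            } else if (b >= 0x80) {
                out.add(i + String.format(": 0x%02X not valid UTF-8 (Latin-1?)", b));
            }
            i++;
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // With no argument, classify a small demo buffer ("a?").
        byte[] data = args.length > 0 ? Files.readAllBytes(Paths.get(args[0]))
                                      : new byte[] {0x61, 0x3F};
        for (String finding : classify(data)) System.out.println(finding);
    }
}
```

Run it on a part file copied out of /temp/hdfstable with `hadoop fs -get` (the part-file name depends on the job). Findings of the form "0x3F literal '?'" would indicate the characters were already replaced before the bytes reached HDFS, while "not valid UTF-8" bytes would point to a single-byte encoding such as Latin-1.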