I am importing data from Teradata to Hadoop with "Teradata Connector for Hadoop (Command Line Edition): Cloudera" v1.2:
I have a table like this:
And I have inserted this data:
The import job works normally:
But the resulting file in hdfs:
1 #1? a?
2 #2? e?
How can I import "special" (non-ASCII) characters from Teradata to Hadoop as UTF-8? If I use the JDBC driver directly (e.g. from a Java program), the characters come through correctly, so the problem seems to be in the connector...
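For what it's worth, one hypothesis for the literal `?` characters is that somewhere in the connector's write path the text is re-encoded with a single-byte default charset using substitution. A minimal sketch of that failure mode (in Python, purely as an illustration; the connector's actual code and my table's actual values are assumptions here):

```python
# Hypothetical illustration, NOT the connector's actual code: encoding
# UTF-8 text through an ASCII pipeline with substitution turns every
# non-ASCII character into a literal '?', similar to the HDFS output.
source = "á é"  # assumed sample values; the real table data is not shown above

mangled = source.encode("ascii", errors="replace").decode("ascii")
print(mangled)  # -> "? ?"
```

If that is what is happening, the bytes in HDFS would contain a real 0x3F question mark, which a hexdump would confirm.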
I am also curious whether, for these characters, changing the connection string from DATABASE=test,CHARSET=UTF8 to DATABASE=test,CHARSET=UTF16 makes any difference. Does that work?
Mardan, we have exactly the same problem.
The Cloudera Connector for Teradata on CDH4 works correctly, but we are interested in using the "Teradata Connector for Hadoop (Command Line Edition): Cloudera" because it is the connector recommended for performance.
Have you managed to import special characters yet?
With "DATABASE=test,CHARSET=UTF16" I get the same resulting file.
The columns of my Teradata table are defined with the UNICODE character type (CharType 2 in dbc.columns):
select columnname, chartype from dbc.columns where tablename = 'testtable';
What is the hexdump of the '?' in the HDFS file? Is it a literal 0x3F question mark, the 0x1A SUB control character, the UTF-8 encoding of the Unicode replacement character U+FFFD (EF BF BD), or something else? And when you use CHARSET=UTF16, is there a byte-order mark, or is the data assumed to be little-endian UTF-16?
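To make the hexdump comparison concrete, here is a sketch (in Python, as an aid for reading the dump; the accented sample character 'á' is an assumption, since the original table data is not shown) of the byte patterns each hypothesis would produce:

```python
import unicodedata

# Byte patterns to look for in a hexdump of the HDFS file:
intact      = "á".encode("utf-8")       # c3 a1    -> data survived as UTF-8
replacement = "\ufffd".encode("utf-8")  # ef bf bd -> U+FFFD, a decoder gave up
sub_char    = b"\x1a"                   # 1a       -> ASCII SUB control character
question    = b"?"                      # 3f       -> an encoder substituted '?'

# The "a?" / "e?" pattern in the output could also be an NFD-decomposed
# character: base letter 'a' followed by a combining acute accent
# (U+0301) that was then mangled on its own.
decomposed = unicodedata.normalize("NFD", "á")
print(decomposed.encode("unicode_escape"))  # -> b'a\\u0301'
```

Whichever sequence actually appears narrows down whether the corruption happened on encode (0x3F), on decode (EF BF BD or 0x1A), or only in normalization.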