I am creating a TPT script to load some characters into a table. I know that if I set the session character set to UTF8, the data will load. But I am unable to set the session character set to UTF8 from my TPT script.
I tried saving the ".txt" file of my TPT script in UTF-8 format and used "USING CHARACTER SET UTF8" before DEFINE JOB, along with the "tbuild -e utf8" option. But it gives me this error: RDBMS Error 6705: An illegally formed character string was encountered during translation.
Am I doing anything wrong, or is there anything else I am missing?
Experts, please let me know your answers.
The USING CHARACTER SET UTF8 clause applies only to the data, not to the script.
The script does not have to be encoded in UTF8 in order for you to load UTF8 data into Teradata.
If there is nothing special in the script that requires it to be encoded in UTF8, just create the script in ASCII.
The -e command-line option tells TPT that the script is encoded in something other than ASCII.
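For example, the script itself can stay in plain ASCII while the data is UTF8. A sketch (the file and job names are made up):

    /* job script saved as plain ASCII; the UTF8 here describes the data */
    USING CHARACTER SET UTF8
    DEFINE JOB load_utf8_data
    ...
    ;

and on the command line:

    tbuild -f load_utf8_data.txt           /* no -e needed; the script is ASCII */
    tbuild -f load_utf8_data.txt -e UTF8   /* only if the script file itself is saved in UTF-8 */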
As for the error, the content of the data has one or more characters that are not acceptable to Teradata.
Teradata does not support every single UTF8 character sequence.
Where is the data coming from?
It is actually coming from an Oracle source. The thing is, people on my team tried to load it through the Informatica TPT API. There are two types of input data: type 1 has Chinese characters, and when UTF8 is used with INFA-TPT, it loads. But there are also MS-Latin (Windows-1252) characters, and they do not load with UTF8; when the MS-Latin (1252) code page is used within INFA-TPT, they do load.
So I thought of creating a TPT script (instead of an INFA-TPT job) to use the TRANSLATE function and see the result.
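What I have in mind is something along these lines (just a sketch; I have not verified it against my data, and the column and table names are placeholders):

    /* replace untranslatable characters with the error character
       instead of letting the row fail */
    SELECT TRANSLATE(src_col USING LATIN_TO_UNICODE WITH ERROR)
    FROM   src_table;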
Let me try your option. Meanwhile, if you have any other suggestions in this regard, they would be really helpful.
Hi Feinholz and others,
I have made a little progress on this. Now I know the problem is with some extended ASCII characters.
For example, ASCII codes 10, 13, 188, etc.
I can find each individual character with the ASCII function and change it with the CHR function, but I cannot change the entire string, as those functions do not work that way.
I also found the hexadecimal values of the string with the CHAR2HEXINT function, and I am able to manually change hexadecimal codes such as '000A00BC'XC back to characters.
But I am not sure how, within the job, to change the hex values back to characters on the fly.
For example: SELECT ''''||CHAR2HEXINT('1/4')||''''XC to change the entire string back to characters. But it did not work.
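To show the part that does work for me, one character at a time (column and table names are placeholders; I am assuming the ASCII and CHR functions are available on your release):

    SELECT ASCII('x');                         /* numeric code of one character */
    SELECT CHR(188);                           /* one character back from its code */
    SELECT CHAR2HEXINT(my_col) FROM my_table;  /* whole string as hex, e.g. 000A00BC... */

I suspect the whole-string attempt above fails because 'XC is a literal notation, so it cannot be applied to the result of an expression at run time.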
I hope I have been able to explain the problem.
Please help me in this regard.
Can anyone please tell me how to load extended ASCII characters into TD under a UTF8 session charset?
Is there any way to insert '000A00BC'XC into Teradata through a function of some kind, where I can use a column name instead of the hexadecimal string literal?
Any help or guidance is appreciated.
Which Unicode characters are you referring to? Use U+ code point notation, for example U+000A (<control>) and U+00BC (VULGAR FRACTION ONE QUARTER).
These characters should load from UTF8 into either the Unicode or the Latin server character set. Note that Unicode covers the various Latin scripts: Basic Latin (ASCII), Latin-1, and Latin Extended (see http://www.unicode.org/charts/).
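If you want to check this outside of your load tool, here is a minimal test from a session logged on with the UTF8 client character set (the table name is illustrative):

    CREATE TABLE chk_utf8 (
      txt VARCHAR(100) CHARACTER SET UNICODE
    );

    /* in a UTF8 session the one-quarter sign arrives as the two
       bytes C2 BC and should store as the code point U+00BC */
    INSERT INTO chk_utf8 VALUES ('¼');

    SELECT txt, CHAR2HEXINT(txt) FROM chk_utf8;   /* expect 00BC */

If that succeeds, the 6705 error is coming from how your tool encodes the data stream, not from the characters themselves.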
Thanks for the reply.
Well, that is exactly the problem. I too thought that defining the column as Unicode and using UTF8 as the session charset would work.
But we are supposed to load the data through the ETL tool Informatica, where we are using the UTF8 code page.
Now, the problems we are facing:
1. The source is Oracle.
2. In the same source table we have two types of columns:
Columns with extended ASCII characters (U+000A <control> and U+00BC VULGAR FRACTION ONE QUARTER inside a large string)
Columns with Asian characters, such as Chinese.
3. When we use the MS-Latin code page and load into a Unicode column, the ASCII characters get loaded, but the Asian characters become garbled.
4. On the other hand, when the UTF8 code page is used, the Asian characters get loaded, but these 8-bit ASCII characters are rejected with error 6705.
I am looking for a solution that will load both in a single pass.
I tried changing the 8-bit ASCII characters to hexadecimal format using the CHAR2HEXINT function, but then how do I change those hex values back to ASCII characters using a function (rather than the ''XC literal format)?
I hope I have been able to explain the actual problem. Any solution or guidance is appreciated.
All columns in the source table need to use the same character set, and in this case a Unicode character set is the only kind that supports both Chinese and Latin, so UTF8 is a good choice. There should be a way in Informatica to convert the Latin data to UTF8 before the load.
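If it turns out to be easier to convert on the Oracle side instead, one option is Oracle's CONVERT function. A sketch (it assumes the affected columns are stored as Windows-1252, so check the source NLS settings first; the names are placeholders):

    SELECT CONVERT(latin_col, 'AL32UTF8', 'WE8MSWIN1252')
    FROM   src_table;

Either way, the goal is that everything reaching TPT is already valid UTF8.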
I would like to share an issue I am currently facing with a Teradata Parallel Transporter script on z/OS. I have a TPT job that loads file data into a table, but the file volume is high.
I am using the DataConnector operator as the producer to read the file. I set the producer instance count to 2, but it uses only one instance to read the file.
For example: Instance 1 reading file 'DD:PTYIN'. I have used the attribute MultipleReaders='Y', but then the TPT job disconnects during the acquisition phase.
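For reference, the relevant part of my script looks roughly like this (a sketch from memory; the schema and operator names are changed):

    DEFINE OPERATOR FILE_READER
    DESCRIPTION 'Read the input file'
    TYPE DATACONNECTOR PRODUCER
    SCHEMA INPUT_SCHEMA
    ATTRIBUTES
    (
      VARCHAR FileName        = 'DD:PTYIN',
      VARCHAR Format          = 'Text',
      VARCHAR OpenMode        = 'Read',
      VARCHAR MultipleReaders = 'Y'
    );

    APPLY ... TO OPERATOR (LOAD_OPERATOR[1])
    SELECT * FROM OPERATOR (FILE_READER[2]);   /* two reader instances requested */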
Kindly help on this!
Thanks in advance.