I recently had a requirement to export data to files in UTF8 format.
I've created TPT scripts with the "USING CHARACTER SET UTF8" header and defined the output schema with column lengths double or triple those of the source. After running the job (on UNIX) and checking the output file, it is being reported as ISO-8859 and not UTF8.
Here's an example of the TPT content:
USING CHARACTER SET UTF8
DEFINE JOB sample
DEFINE SCHEMA FILE_OUT
DESCRIPTION 'schema for output file'
(
  COL1 VARCHAR(300),
  COL2 VARCHAR(200),
  ...
)
DEFINE OPERATOR Producer_Query
TYPE EXPORT
SCHEMA FILE_OUT
ATTRIBUTES
(
  VARCHAR UserName=**bleep**,
  VARCHAR Pass=**bleep**,
  VARCHAR SelectStmt='SELECT CAST(COL1 AS VARCHAR(50)), CAST(COL2 AS VARCHAR(50)) ... ... FROM TABLE;'
)
Did I miss anything?
How are you determining that the output is ISO-8859? Does the text contain extended characters?
If all the bytes are in the x'00'-x'7F' range, there is no difference between ISO-8859-1 and UTF-8.
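To illustrate the point about the x'00'-x'7F' range, here is a minimal Python sketch. The sample string is made up for illustration and is not taken from the actual export; it only shows that plain ASCII text produces the same bytes under either encoding.

```python
# Hypothetical sample row; any text limited to the x'00'-x'7F' range behaves the same.
sample = "COL1 value,COL2 value,12345"

latin1_bytes = sample.encode("iso-8859-1")
utf8_bytes = sample.encode("utf-8")

# True when every byte is in the 7-bit ASCII range.
all_seven_bit = all(b <= 0x7F for b in utf8_bytes)

# True when both encodings yield the identical byte sequence.
identical = latin1_bytes == utf8_bytes

print(all_seven_bit, identical)
```

For input like this, no tool can tell the two encodings apart from the bytes alone.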
I'm running the file command in UNIX to get the file character set.
For the record/data characters: they are all normal, plain text without any extended characters, which I think is why the exported file is still being reported as ISO-8859. The only unusual character is my delimiter "§", which, from what I found when searching, still falls under ISO-8859.
So, just to confirm (please correct me if I'm wrong): if CHARACTER SET UTF8 is used in the TPT script but the exported data doesn't contain extended characters, the file will not be created as UTF8?
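One thing that may be worth checking is how the "§" delimiter is stored, since it is outside the 7-bit range. The Python sketch below only shows how that one character encodes in each character set; it makes no claim about what TPT actually writes. Dumping the real file with a hex viewer would show which of the two forms is present.

```python
delimiter = "\u00a7"  # the section sign, "§"

as_latin1 = delimiter.encode("iso-8859-1")  # one byte
as_utf8 = delimiter.encode("utf-8")         # two bytes

print(as_latin1.hex())  # a7
print(as_utf8.hex())    # c2a7
```

If the file contains a lone x'A7' byte between fields, that byte is not valid UTF-8 on its own, which would be consistent with a tool reporting ISO-8859. If it contains x'C2A7', the delimiter was written as UTF-8.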
ASCII is a subset of UTF8, and I suspect it is a subset of ISO-8859 as well.
If there are no extended characters, is the data file all single-byte ASCII?
If so, that is still considered UTF8.
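A rough way to verify this independently of the file command is to test the bytes directly. This is only a sketch: the `classify` helper is a hypothetical name, and in practice you would pass it the bytes read from the exported file (opened in binary mode).

```python
def classify(data: bytes) -> str:
    """Roughly classify a byte string as 'ascii', 'utf-8', or 'not utf-8'."""
    # Pure 7-bit data is valid ASCII and therefore also valid UTF-8.
    if all(b <= 0x7F for b in data):
        return "ascii"
    # Otherwise it is UTF-8 only if the high bytes form valid sequences.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "not utf-8"
    return "utf-8"


print(classify(b"plain text only"))
print(classify("a\u00a7b".encode("utf-8")))
print(classify("a\u00a7b".encode("iso-8859-1")))
```

A result of "ascii" means the file is already valid UTF8, as noted above; "not utf-8" means at least one byte outside the 7-bit range was written in a single-byte encoding.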