I'm fairly new to the Teradata world, although I have more than 10 years of experience with DB Development. I have one source file that needs to be loaded to Teradata - this file contains, amongst others, German and Greek language characters.
I've done a bit of reading regarding character sets in Teradata, and according to what I've read, I've set the source file encoding to UTF8. When calling FastLoad, I invoke the program with the "-c UTF 8" flag in order to force the encoding to UTF8.
Would like to highlight that my .NET/ODBC drivers are set for UTF8 sessions as well. Now, after loading the file, the German characters (ß, ö, ü, ä) are displayed as they should in SQL assistant, but I can't say the same for the Greek characters... ALL Greek characters get displayed as �.
This is the underlying DDL for the target table (altough being created through FastLoad):
CREATE MULTISET TABLE DwSandpitRomi.SEO_MARKETING_DAILY_TMP ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT,
REFERENCE_DATE VARCHAR(10) CHARACTER SET LATIN NOT CASESPECIFIC,
DOMAIN_URL VARCHAR(100) CHARACTER SET LATIN NOT CASESPECIFIC,
SEARCH_QUERY VARCHAR(500) CHARACTER SET LATIN NOT CASESPECIFIC,
NO PRIMARY INDEX ;
The FastLoad script is the following:
DROP TABLE DwSandpitRomi.SEO_MARKETING_DAILY_TMP;
CREATE MULTISET TABLE DwSandpitRomi.SEO_MARKETING_DAILY_TMP
NO PRIMARY INDEX ;
SET RECORD VARTEXT "<|>";
FILE=C:\ROMI\Andre\SEO\Full Extract\2017\Dec 2017.csv
INSERT INTO DwSandpitRomi.SEO_MARKETING_DAILY_TMP(
Can any help with this one? This is starting to make me feel crazy...
Your target table is defined with LATIN columns, but you need UNICODE.
Simply override the default:
CREATE MULTISET TABLE DwSandpitRomi.SEO_MARKETING_DAILY_TMP ( REFERENCE_DATE VARCHAR(10), DOMAIN_URL VARCHAR(100) CHARACTER SET UNICODE, SEARCH_QUERY VARCHAR(200) CHARACTER SET UNICODE, AVERAGE_POSITION FLOAT, CLICKS INTEGER, IMPRESSIONS INTEGER) NO PRIMARY INDEX ;
The data does get loaded now, but I can't make sense of the characters being displayed:
DE - sÃ¼per lig wetten
GR - ÎºÎ¿Ï÷Ï Î¿Î½Î¹ Ï¸Ï×Î¿Î¹Ï‡Î·Î¼Î±Ï×Î¿Ï¸
And joins do work. Could this be my .NET charset definition? It's currently set for UTF8 as well.
Those strange characters seem to be a display problem of correctly loaded data.
You need to
- specify the correct encoding of the input file (is it actually UTF-8 or maybe UTF-16)?
- set your .NET/ODBC session to Unicode (either UTF-9 or UTF-16)
- use a character set which can actually display all Unicode characters (most of the pre-Windows 10 fonts can't)
Seems you did all of those...
You still have a bad input definition, if you use VARTEXT every input field must be VARCHAR (and the column length for UTF-8 must be tripled:
DEFINE REFERENCE_DATE VARCHAR(10), DOMAIN_URL VARCHAR(300), SEARCH_QUERY VARCHAR(600), AVERAGE_POSITION VARCHAR(20), CLICKS VARCHAR(11), IMPRESSIONS VARCHAR(11)
Btw, using TPT would be probably easier.
Create the table first and then use
tdload -c UTF-8 -d"<|>" -f"C:\ROMI\Andre\SEO\Full Extract\2017\Dec 2017.csv" -t SEO_MARKETING_DAILY_TMP --TargetWorkingDatabase DwSandpitRomi -S -h yoursystem -u youruser -p yourpassword
The -S option saves the created scipt in the logs directory of your TPT install, probably
C:\Program Files\Teradata\Client\xx.xx\Teradata Parallel Transporter\logs
If this works you can use a generic TPT script and simply pass somejob variables