Moving unicode data from Oracle to Teradata using TPT and the ODBC Stream Operator

Enthusiast


My Oracle table has one CHAR(10 BYTE) column.

 

--DROP TABLE "SQL_SANDBOX"."UNICODE_CHAR_BYTE10_TEST";
CREATE TABLE "SQL_SANDBOX"."UNICODE_CHAR_BYTE10_TEST"
   (  "CUSTOMER_NAME" CHAR(10 BYTE) NOT NULL ENABLE
   ) SEGMENT CREATION IMMEDIATE
  PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS LOGGING
  STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
  PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  TABLESPACE "USERS";
  --The "umbrella" is 3 bytes.
INSERT INTO "SQL_SANDBOX"."UNICODE_CHAR_BYTE10_TEST"  VALUES('☂'); --3 bytes
INSERT INTO "SQL_SANDBOX"."UNICODE_CHAR_BYTE10_TEST"  VALUES('☂☂'); --6 bytes
INSERT INTO "SQL_SANDBOX"."UNICODE_CHAR_BYTE10_TEST"  VALUES('☂☂☂'); --9 bytes
INSERT INTO "SQL_SANDBOX"."UNICODE_CHAR_BYTE10_TEST"  VALUES('☂☂☂1'); --10 bytes (three 3-byte characters and one 1-byte character)

 

We can have a combination of ASCII and Unicode.
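
To double-check the byte counts noted in the INSERT comments, one option on the Oracle side (just a sketch against the test table above; LENGTH and LENGTHB are standard Oracle functions) is to compare character and byte lengths:

--Compare character vs. byte lengths of the test rows.
--RTRIM strips the blank padding that the fixed-length CHAR(10 BYTE) column adds.
SELECT CUSTOMER_NAME,
       LENGTH(RTRIM(CUSTOMER_NAME))  AS CHAR_COUNT,
       LENGTHB(RTRIM(CUSTOMER_NAME)) AS BYTE_COUNT
  FROM "SQL_SANDBOX"."UNICODE_CHAR_BYTE10_TEST";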

 

My Teradata table is defined like this.

 

--DROP TABLE "SQL_SANDBOX"."UNICODE_CHAR_BYTE10_TEST";
CREATE MULTISET TABLE SQL_SANDBOX.UNICODE_CHAR_BYTE10_TEST ,FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT,
DEFAULT MERGEBLOCKRATIO
(
CUSTOMER_NAME CHAR(10) CHARACTER SET UNICODE NOT CASESPECIFIC NOT NULL)
PRIMARY INDEX ( CUSTOMER_NAME );
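
Once rows land in Teradata, a similar sanity check (again only a sketch against the table above; CHARACTER_LENGTH and TRIM are standard Teradata functions) shows how many characters were actually stored:

--Character counts after loading; TRIM removes the blank padding of the fixed-length CHAR(10) column.
SELECT CUSTOMER_NAME,
       CHARACTER_LENGTH(TRIM(CUSTOMER_NAME)) AS CHAR_COUNT
FROM SQL_SANDBOX.UNICODE_CHAR_BYTE10_TEST;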

 

Here is my TPT script (I'm using the DataDirect Wire Protocol drivers).

My understanding is that for UTF-16 you need to multiply the character count by two; for UTF-8 you multiply by three (and make sure the resulting number is even).
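
Applied to this example, that rule of thumb works out as follows (a sketch of the arithmetic only, matching the CHAR(20) used in the schema below):

--USING CHARACTER SET UTF16: CHAR(10) x 2 = CHAR(20) in the TPT schema
--USING CHARACTER SET UTF8 : CHAR(10) x 3 = CHAR(30) in the TPT schema (already an even number)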

 

USING CHARACTER SET UTF16
DEFINE JOB "Load To Teradata"
(
DEFINE SCHEMA SourceTable
(
CUSTOMER_NAME CHAR(20)
);

DEFINE OPERATOR ODBCOperator
TYPE ODBC
SCHEMA SourceTable
ATTRIBUTES (
VARCHAR SelectStmt = 'SELECT CUSTOMER_NAME FROM SQL_SANDBOX.UNICODE_CHAR_BYTE10_TEST'
,VARCHAR DSNName = 'Oracle 64'
,VARCHAR UserName = 'SYSTEM'
,VARCHAR UserPassword = '<SOURCE_PASSWORD>'
);

DEFINE OPERATOR StreamOperator
TYPE Stream
SCHEMA *
ATTRIBUTES (
VARCHAR Tdpid = 'td1510vm'
,VARCHAR UserName = 'dbc'
,VARCHAR UserPassword = '<TARGET_PASSWORD>'
,VARCHAR WorkingDatabase = 'SQL_SANDBOX'
,VARCHAR MacroDatabase = 'SQL_SANDBOX'
,VARCHAR ErrorTable = 'SQL_SANDBOX.UNICODE_CHAR_BYTE10_TEST_ET'
,VARCHAR LogTable = 'SQL_SANDBOX.UNICODE_CHAR_BYTE10_TEST_tlog'
,VARCHAR DateForm = 'ANSIDATE'
);

APPLY 'INSERT INTO SQL_SANDBOX.UNICODE_CHAR_BYTE10_TEST(:CUSTOMER_NAME);'
TO OPERATOR (StreamOperator[1])
SELECT * FROM OPERATOR (ODBCOperator[1]);
);

 

The main job step terminates with status 8.


Job step MAIN_STEP terminated (status 8)
Job UNICODE_CHAR_BYTE10_TEST_LoadJob terminated (status 8)

I've also tried to move CHAR(10 CHAR) from Oracle to Teradata and run into the same status 8.

 

Anyone know what I'm doing wrong?  I'm using Teradata 15.10 and TPT 15.10.  Oracle 11.2.0.2.0.

 

Thanks for your help!

4 REPLIES
Visitor

Re: Moving unicode data from Oracle to Teradata using TPT and the ODBC Stream Operator

It may be that the umbrella is not among the characters supported in Teradata's UTF16 implementation. See the UDFs available at:

 

http://downloads.teradata.com/download/tools/unicode-tool-kit

 

There is additional information available there.

Enthusiast

Re: Moving unicode data from Oracle to Teradata using TPT and the ODBC Stream Operator

Yes.  Thank you.  We considered that this character may be just a little too strange and may not be supported.

 

However, we're having issues with basic extended ASCII as well, such as è and á, when moving from Oracle to Teradata after we add USING CHARACTER SET UTF8 or UTF16 to our script. We're able to move these to SQL Server and DB2 via BulkCopy from Oracle, but Teradata produces an error  :/

 

I'll take a look at this tool kit.  I'm not sure how much it will help with this TPT issue, but I'll give it a shot to see what I can learn.  Appreciate the help!

 

Enthusiast

Re: Moving unicode data from Oracle to Teradata using TPT and the ODBC Stream Operator

We can actually insert the umbrella into a Teradata CHAR(10) CHARACTER SET UNICODE column using INSERT INTO, so I'm not understanding why we can't do the same with TPT. When we run our select statement on Oracle it shows the ☂ character, so I'm assuming the data is coming off of Oracle okay via the ODBC operator; however, there is some issue with inserting into Teradata, and it may lead back to the schema?

 

We just don't know... and would appreciate any ideas.  Thanks again!

 

CREATE MULTISET TABLE SQL_SANDBOX.UNICODE_CHAR_BYTE10_TEST ,FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT,
DEFAULT MERGEBLOCKRATIO
(
CUSTOMER_NAME CHAR(10) CHARACTER SET UNICODE NOT CASESPECIFIC NOT NULL)
PRIMARY INDEX ( CUSTOMER_NAME );

 

INSERT INTO "SQL_SANDBOX"."UNICODE_CHAR_BYTE10_TEST"  VALUES('☂');

Enthusiast

Re: Moving unicode data from Oracle to Teradata using TPT and the ODBC Stream Operator

Basically, what we have found is that CHAR(N CHAR) in Oracle can hold up to 4 bytes per character, and because of this we need to multiply every CHAR(N CHAR) column length by 4.

 

So CHAR(10 CHAR) needs to be CHAR(40) on Teradata, both in the table and in the TPT schema, in order to move CHAR(10 CHAR) via the Stream operator to Teradata. This is also true when using BulkCopy to move to other systems; we're moving Oracle data to about 5 or 6 other systems.
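
Put concretely, with that x4 rule the example table and schema from this thread would be sized like this (only a sketch under that assumption; the table, column, and schema names are the ones used above, with the DDL going on Teradata and the DEFINE SCHEMA in the TPT script):

CREATE MULTISET TABLE SQL_SANDBOX.UNICODE_CHAR_BYTE10_TEST ,FALLBACK
(
CUSTOMER_NAME CHAR(40) CHARACTER SET UNICODE NOT CASESPECIFIC NOT NULL /* 10 CHAR x 4 */
)
PRIMARY INDEX ( CUSTOMER_NAME );

DEFINE SCHEMA SourceTable
(
CUSTOMER_NAME CHAR(40) /* 10 characters x 4 bytes per character */
);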

 

If you have CHAR(10 BYTE) data it will fit in a regular CHAR(10) column; the issue only appears when using the character semantics in Oracle, not the byte semantics. Also, other data types, such as VARCHAR2 or CLOB, don't seem to be affected. We're not putting USING CHARACTER SET UTF8/UTF16 in our script because we've decided to only support extended ASCII at this time, and for that it seems to work well.
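
For reference, the distinction looks like this on the Oracle side (an illustrative sketch; T_BYTE and T_CHAR are hypothetical table names, and the 40-byte maximum assumes an AL32UTF8 database character set):

CREATE TABLE T_BYTE (C CHAR(10 BYTE)); --at most 10 bytes, fits a Teradata CHAR(10)
CREATE TABLE T_CHAR (C CHAR(10 CHAR)); --at most 10 characters, up to 40 bytes, needs CHAR(40)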

 

Best Regards,

 

Todd