Non UTF8 character issue in teradata to Aster

Aster
New Member

Non UTF8 character issue in teradata to Aster

I am using following for for a analysis requirement. I am trying to create a Aster table from a table in Teradata.

create table prdd_prd.mns_app_review distribute by hash(commentid) compress low as 
SELECT * FROM load_from_teradata(ON public.MR_DRIVER TDPID('###') USERNAME('###')  PASSWORD('$$$') NUM_INSTANCES('4') SPOOLMODE('Spool')  QUERY_TIMEOUT(3600) 
QUERY('sel * from  NTL_PRD_QMTBLS.mns_rev'));

 

The above code is throwing following error.

Spoiler
Executed as Single statement. Failed [34 : HY000] [AsterData][ASTERJDBCDSII](34) ERROR: string (byte length=338) contains non-UTF8 character (starting with byte = 0x0) at offset 0 () 

 

I am assuming there is some non utf charecter in the data. 

I think i have options 

1) Convert the data in the Teradata table to UTF8. Is there a way to do that ? 

SELECT REGEXP_REPLACE(COLUMN,?????

2) Create an Aster table to accept all the data 

I tried multiple character set nothing works 

 

Can anyone please help?

 

Regards,
Amit

 

3 REPLIES

Re: Non UTF8 character issue in teradata to Aster

Hi,

I am having the same issue and I am listing my efforts here in hope of finding the solution.

Like Amit, I am also loading data from Teradata, and the data I am loding are email addresses.

So, first I created a view in Teradata based on the follwoing select statement to exclude the email addresses that are not in the format of email and contains charachters that are not in the ASCII codes:

select ema_adr 
from my_table where ema_adr not in
(
select a11.ema_adr from ( select ema_adr, oreplace(trim(TRANSLATE(ema_adr USING LATIN_TO_UNICODE WITH ERROR)),'''','') as ema_adr_ver1, regexp_replace(ema_adr_ver1,'([\._-]|[0-9]|[a-z]|[A-Z]){1,50}@([\._-]|[0-9]|[a-z]|[A-Z]){1,50}[\.]([a-z]|[A-Z]){1,20}','',1,0,'i') as ema_adr_ver2, case when length(ema_adr_ver2)=0 then 1 else 0 end as ema_adr_ver_3 from my_table where ema_adr_ver_3 = 0 group by 1,2,3 )a11
group by a11.ema_adr
)

It seems it is working fine in Teradata, however when I create a view in Aster it is not working. 

I applied the same logic in Aster to create a view ( that is loading data from the Teradata view):

 

select 
 regexp_replace(ema_adr,'([\._-]|[0-9]|[a-z]|[A-Z]){1,50}@([\._-]|[0-9]|[a-z]|[A-Z]){1,50}[\.]([a-z]|[A-Z]){1,20}','','i') as ema_adr_ver1,
case
  when length(regexp_replace(ema_adr,'([\._-]|[0-9]|[a-z]|[A-Z]){1,50}@([\._-]|[0-9]|[a-z]|[A-Z]){1,50}[\.]([a-z]|[A-Z]){1,20}','','i'))=0
  then 1
  else 0
end as ema_adr_ver_2  
from Teradata_View
group by 1,2

But it is not working and I am getting the following error:

  

[AsterData][ASTERJDBCDSII](34) ERROR: string (byte length=5) contains non-UTF8 character (starting with byte = 0x0) at offset 0 ()

 

I really appreciate any help. 

 

Thank you,

Ardeshir

Teradata Employee

Re: Non UTF8 character issue in teradata to Aster

Skip Rows with Data Errors using SKIP_ERROR_RECORDS

When using load_from_teradata, the connector can encounter an error when parsing a row with data that is valid in Teradata but is not supported in Aster Database. For example, the Teradata database allows a CHAR column in a table to have null characters (characters with code 0). However, such data is not supported in CHAR columns in Aster. When running load_From_teradata on this table, the following error is returned:
Error: String (byte length=8) contains non-UTF8 character (starting with byte = 0x0)

Rows that produce this type of error can be bypassed by specifying the argument clause SKIP_ERROR_RECORDS('YES') in the load_from_teradata() function call.

 

Please review Aster Database user guide for further details on supported datatypes.

Re: Non UTF8 character issue in teradata to Aster

Thank you Margarete!