How to get strings containing Latin-encoded characters with code points between 128 and 255

Enthusiast

Hi All,

We are having trouble eliminating characters that were wrongly encoded and loaded into the database.

  • The source file is in UTF-8, but we loaded the data by reading it as Latin and writing it to Latin-encoded Teradata columns.

This has resulted in special characters appearing in the data, such as the following (this is not a complete list; we just don't know which other characters fall into this category):

('%¿%', '%É%', '%…%', '%é%', '%á%','%ñ%', '%ú%', '%ì%', '%ó%', '%ñ%', '%ó%','%¿%')
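To see why such characters show up, here is a minimal Python sketch (not Teradata code) that reads UTF-8 bytes as Latin. It assumes the Latin columns behave like ISO-8859-1 for bytes 128 to 255, which may not match Teradata's LATIN set exactly; the helper name `utf8_read_as_latin1` is hypothetical.

```python
# Sketch: UTF-8 bytes reinterpreted as Latin-1 (ISO-8859-1).
# Assumption: Teradata LATIN behaves like ISO-8859-1 for bytes 128-255.


def utf8_read_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then decode those same bytes as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


for original in ["\u00e9", "\u00f1", "\u00bf", "\u2026", "abc"]:
    garbled = utf8_read_as_latin1(original)
    # Show the code point of every character the Latin reading produced.
    print(f"{original!r} -> {garbled!r} code points {[ord(c) for c in garbled]}")
```

Every byte of a multi-byte UTF-8 sequence is 0x80 or higher, so each character produced this way has a code point above 127, while plain ASCII text passes through unchanged. That is why filtering on code points above 127 catches these values.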

  • Now I want to identify these kinds of special characters, so I looked at the Latin encoding. Latin can encode code points 0 to 255, and I see that 0 to 127 are the English-like (ASCII) characters.

That said, I want to see all the strings in a column that contain Latin-encoded characters with a value greater than 127.
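The check being asked for can be sketched in Python as a filter over column values. The sample `rows` list and the function name are hypothetical stand-ins for the real column; in the database itself the same test would have to be written in SQL.

```python
# Sketch: flag values containing any character with a code point above 127.
# `rows` is a hypothetical stand-in for the contents of the column.


def has_code_point_above_127(value: str) -> bool:
    """True if any character in value has a code point greater than 127."""
    return any(ord(ch) > 127 for ch in value)


rows = ["plain ASCII", "caf\u00e9", "\u00c3\u00a9", "12345"]
flagged = [r for r in rows if has_code_point_above_127(r)]
print(flagged)
```

On Python 3.7 and later, `not value.isascii()` performs the same test.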

Thanks,

Ranjan

Teradata Employee

It sounds like UTF-8 data has been loaded into a Latin column, probably by using an ASCII session to pass the characters through without error. Removing the non-ASCII characters would result in data loss. Instead, you can use the Unicode Tool Kit UDFs to convert the UTF-8 data in a Latin column to the Unicode server character set, which is encoded in UTF-16. See:

http://downloads.teradata.com/download/tools/unicode-tool-kit

-Dave
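The point that the data can be converted rather than stripped can be illustrated outside the database. This Python sketch does not use the Unicode Tool Kit UDFs; it assumes each stored value is UTF-8 bytes reinterpreted as ISO-8859-1 and reverses that, leaving anything that does not round-trip cleanly untouched. The function name is hypothetical.

```python
# Sketch: undo "UTF-8 read as Latin-1" instead of deleting characters.
# Assumption: stored text is UTF-8 bytes reinterpreted as ISO-8859-1.


def repair_utf8_stored_as_latin1(value: str) -> str:
    try:
        # Recover the original bytes, then decode them as the UTF-8 they were.
        return value.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not a clean round trip (genuine Latin-1 text, or a character
        # outside Latin-1), so leave the value as it is.
        return value


print(repair_utf8_stored_as_latin1("caf\u00c3\u00a9"))  # café
print(repair_utf8_stored_as_latin1("caf\u00e9"))        # café (unchanged)
```

This is a heuristic: genuine Latin-1 text that happens to form valid UTF-8 (for example a literal "Ã©") would also be converted, so the results should be reviewed before any rows are updated.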

Enthusiast

Thanks, David! However, other character encoding/decoding steps happen while the file is being moved, before the final data is loaded into the table.

For now, my goal is to identify any string in a table column that contains Latin code points greater than 127.

How can I write a query that returns this result set?

Enthusiast

For now, my goal is to identify ANY string in a table column that contains Latin code points greater than 127.

How can I write a query that returns this result set?

Teradata Employee

REGEXP_SUBSTR should solve the problem.
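A regular-expression version of the check, in the spirit of the REGEXP_SUBSTR suggestion, is sketched below in Python. The character class `[^\x00-\x7F]` matches any character above code point 127; it is Python `re` syntax, and whether Teradata's REGEXP_SUBSTR accepts the same `\x` escapes is an assumption to verify. The helper names and sample values are hypothetical.

```python
import re

# Matches any single character outside 7-bit ASCII, i.e. code point > 127.
# Python `re` syntax; Teradata regex escape support is assumed, not confirmed.
NON_ASCII = re.compile(r"[^\x00-\x7F]")


def first_non_ascii(value: str):
    """Return the first character above code point 127, or None if there is none."""
    match = NON_ASCII.search(value)
    return match.group(0) if match else None


def all_non_ascii(value: str) -> list:
    """Return every character above code point 127, in order of appearance."""
    return NON_ASCII.findall(value)


# Hypothetical sample values standing in for the column contents.
for value in ["plain ASCII", "Se\u00f1or", "caf\u00c3\u00a9"]:
    found = all_non_ascii(value)
    print(value, first_non_ascii(value), [ord(ch) for ch in found])
```

Listing the matched characters with their code points also helps build the full list of "special characters" that the original question could not enumerate.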

Enthusiast

A sample query would really help!

Thanks,

Ranjan