what country data exists in Unicode column

KN
Enthusiast

Usually we define a table with the Unicode character set specified at the column level, so we know those columns can hold multibyte data.

However, how do I know which languages are present in those Unicode columns? Are there any UDFs or functions to find out?

Example:

A table has 100 rows:

10 rows - Chinese data
10 rows - Japanese data
10 rows - Korean data
10 rows - German data
20 rows - Spanish data

Teradata Employee

Re: what country data exists in Unicode column

Hi.

 

There is no easy way. I'd suggest checking for language-specific characters in a CASE expression (for example, 'ñ' for Spanish, 'ß' for German, and so on), but it is not bulletproof, since German and Spanish share most of their characters. Maybe this approach will work better for the other languages. In any case, the marker characters must actually exist in the text for this to work, so you have to choose them carefully.

 

This could also work if you search for common words instead of characters.  
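
To make the marker-character idea concrete, here is a minimal sketch in Python rather than SQL, so it can be run anywhere. The marker sets and the function name `guess_language_by_markers` are assumptions chosen for illustration, not a vetted list; the same first-match logic is what a CASE expression would express.

```python
# Hypothetical marker characters, for illustration only.
# Order matters: the first language with a matching marker wins,
# just like the first matching WHEN branch in a CASE expression.
MARKERS = [
    ("Spanish", "ñ¿¡"),
    ("German", "ßäö"),
]


def guess_language_by_markers(text):
    """Return the first language whose marker character occurs in text, else None."""
    lowered = text.lower()
    for language, chars in MARKERS:
        if any(ch in lowered for ch in chars):
            return language
    return None
```

A row whose text contains none of the markers comes back as `None`, which is exactly the weakness described above: the check only works when a marker happens to be present.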

 

A better solution would be to add a column to the table that stores the language of the text (IMHO).

 

HTH.

 

Cheers.

 

Carlos.

 

Teradata Employee

Re: what country data exists in Unicode column

You need to use Unicode regular expressions with script properties (e.g., ...\p{Han}... to match Chinese characters). The regular expression engine could be one of the regexp_* SQL functions, or a shell grep invoked through a script table operator.

 

For more details on the syntax:

https://www.regular-expressions.info/refunicode.html
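
As a rough illustration of classifying text by Unicode script, here is a Python sketch. Python's built-in `re` module does not support `\p{...}`, so this sketch swaps in a different technique: it looks at each character's Unicode name via the standard `unicodedata` module. The prefix table, the function names, and the decision order are assumptions made for this example; it does not show what the regexp_* functions accept.

```python
import unicodedata
from collections import Counter

# Assumed mapping from Unicode character-name prefixes to script labels.
SCRIPT_PREFIXES = {
    "CJK UNIFIED IDEOGRAPH": "Han",
    "HIRAGANA": "Hiragana",
    "KATAKANA": "Katakana",
    "HANGUL": "Hangul",
    "LATIN": "Latin",
}


def script_counts(text):
    """Count characters per script, based on their Unicode names."""
    counts = Counter()
    for ch in text:
        name = unicodedata.name(ch, "")  # "" for unnamed characters
        for prefix, script in SCRIPT_PREFIXES.items():
            if name.startswith(prefix):
                counts[script] += 1
                break
    return counts


def guess_language_by_script(text):
    """Guess a language from the scripts present; a heuristic, not a detector."""
    counts = script_counts(text)
    if counts["Hiragana"] or counts["Katakana"]:
        return "Japanese"
    if counts["Hangul"]:
        return "Korean"
    if counts["Han"]:
        return "Chinese"  # could also be Japanese written only in kanji
    if counts["Latin"]:
        return "Latin-script (language undetermined)"
    return None
```

Script detection separates Chinese, Japanese, and Korean reasonably well, but it cannot tell German from Spanish, since both use the Latin script. For those, the marker-character or common-word check is still needed.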