Partial String Matching In Teradata SQL

Database

Partial String Matching In Teradata SQL

Hi all,

I have 2 tables in Teradata. Some of the contents of the tables are shown as follows :

Table A

Names

-----------

XYZ Limited

Mr John Smith

WWW Charity Foundtion

XYZ Co

Mr John Smith Charity Foundation

Table B

Keyword                                   Entity

--------------                              ------------------

Limited                                    COMPANY

Charity                                    CHARITY

Mr                                           INDIVIDUAL

Co                                           COMPANY

I need to map the names present in Table A to the entity  in Table B based on the keyword search in Table B.

So my output would become 

Names                                                        Entity

-----------                                                    -----------

XYZ Limited                                                COMPANY

Mr John Smith                                            INDIVIDUAL

WWW Charity Foundtion                             CHARITY

XYZ Co                                                       COMPANY

Mr John Smith Charity Foundation              CHARITY

The following points are to be adhered while doing the keyword search.

  1. The search is case insensitive
  2. The precedence of search is COMPANY followed by CHARITY followed by INDIVIDUAL. So Mr John Smith Charity Foundation is classified as a charity even though it has "Mr" in it
  3. The search should be for whole words only. Words are considered to space delimited. So Mr John Smith is INDIVIDUAL wheras MrJohnSmith or Mr,John,Smith are UNKNOWN. Similary XYZ Co is a COMPANY whereas ZICo is UNKNOWN.

Could you please guide me on how to write a SQL statement in Teradata to get the above Output. I cannot use a Procedure or a UDF since I do not have the access to the create one. A UDF would have been best in this case but my hands are tied.

It would be helpful if you could help me with the solution.

Thanks and Regards,

Prithwish

3 REPLIES
Enthusiast

Re: Partial String Matching In Teradata SQL

Try this..

create volatile table tableA (names varchar(50))on commit preserve rows;

insert into tableA values ('XYZ Limited');
insert into tableA values ('Mr John Smith');
insert into tableA values ('WWW Charity Foundtion');
insert into tableA values ('ABC limited & co');
insert into tableA values ('Mr. limited co charity');
insert into tableA values ('Mr. co charity');
insert into tableA values ('Mr.cocharity');

sel * from tableA;

create volatile table tableB (keyword varchar(50), entity varchar(50))on commit preserve rows;

insert into tableB values ('Limited','COMPANY');
insert into tableB values ('Charity','CHARITY');
insert into tableB values ('Mr','INDIVIDUAL');
insert into tableB values ('Co','COMPANY');

sel * from tableB;

sel names,
case when names like any ('% Limited%' or '% Limited %' or '%Limited %') then 'Company'
when names like any ('% Charity%' or '% Charity %' or '%Charity %') then 'Charity'
when names like any ('% Mr%' or '% Mr %' or '%Mr %') then 'Individual'
when names like any ('% Co%' or '% Co %' or '%Co %') then 'Company'
else 'unknown'
end as entity from tableA, tableB
group by 1,2;

Re: Partial String Matching In Teradata SQL

Hi Sakti.

These values are just a subset of millions of values. So writing a case statement will not be possible. Also "%Co %" would treat ZiCo Ferenandes as a Company which shouldm not be the case.

In Oracle the following could be done by

select * from (  
select
names,
min(entity) over (partition by names order by decode(entity,'Company','1','Charity','2','Individual','3','4')) entity
from tableA join tableB on (
regexp_like(tableA.names, '(\W|^)' || tableB.keyword || '(\W|$)', 'i')
)
)
group by names, entity
;

But this doesn't work in Teradata.

Enthusiast

Re: Partial String Matching In Teradata SQL

Can you please explain this regexp_like(tableA.names, '(\W|^)' || tableB.keyword || '(\W|$)', 'i') part in your query?