Identifying multiple capital letters in a row in a larger string

Database
New Member

Identifying multiple capital letters in a row in a larger string

Any help would be much appreciated on figuring out how to identify Acronyms within a text field that has mixed upper and lower case letters.
For example, we might have "we used the BBQ sauce on the Chicken" I need my query to select "BBQ" and nothing else in the cell.

Additionally, There could be multiple capitalized string per row and The output should include the uppercase string.

It's a fairly small dataset, so even an imperfect list would be a huge improvement of current process.

Any ideas are much appreciated!!
1 REPLY
Junior Contributor

Re: Identifying multiple capital letters in a row in a larger string

Copied from the Stackoverflow answer:

 

Defining acronyms as all uppercase words with 2 to 5 characters with a '\b[A-Z]{2,5}\b' regex:

WITH cte AS
( -- using @Andrew's Volatile Table 
SELECT * 
FROM vt
-- only rows containing acronyms
WHERE RegExp_Similar(col1, '.*\b[A-Z]{2,5}\b.*') = 1 
)
SELECT
outkey,
tokenNum,
CAST(RegExp_Substr(Token, '[A-Z]*') AS VARCHAR(5)) AS acronym -- 1st uppercase word 
--,token
FROM TABLE
( RegExp_Split_To_Table
( cte.id,
cte.col1,
-- split before an acronym, might include additional characters after
-- [^A-Z]*? = any number of non uppercase letters (removed)
-- (?= ) = negative lookahead, i.e. check, but don't remove
'[^A-Z]*?(?=\b[A-Z]{2,5}\b)',
'' -- defaults to case sensitive
) RETURNS
( outKey INT,
TokenNum INT,
Token VARCHAR(30000) -- adjust to match the size of your input column 
)
) AS t
WHERE acronym <> ''