I understand the query would work if only it had aggregate functions in the sub-query. But in this case there is no aggregate function used and as I understand the inner query will give multiple values which needs to be compared with PROC1-10.
Note that the sub-query is correlated, it is joining column PROC1 in the main table (t1 in my example code) to the CODE column in the sub-query table (USR_RACSTER.CHL_2017).
To use your sample data values:
We still need the MAX function to ensure that the sub-query only returns a single row. if there were multiple rows in the sub-query table that contained the same value as PROC1 this would fail if we didn't have the MAX function.
Another way to look at this may be to think about the original logic. Taking a snippet of Laura's original code:
WHEN PROC1 IN (SELECT CODE FROM USR_RACSTER.CHL_2017 WHERE VALUE_SET_NAME IN('Chlamydia Tests') AND CODE_SYSTEM IN ('CPT','HCPCS'))
Just looking at the sub-query, this could return a list of values.
However, only one of those values can match to the value in PROC1 (obviously).
So what is actually being asked is
"if the value in PROC1 is in one or more rows in the "USR_RACSTER.CHL_2017" table rows then "WHERE VALUE_SET_NAME IN('Chlamydia Tests')" then this is a match.
What I've done is code a join in the sub-query which forces that match to happen.
As I said above, we still need the MAX function because we need to ensure that the sub-query returns a single value (one column, one row).
I completely agree with your solution if the query is a correlated sub-query. I think there is a confusion here. Laura's first post doesnt show "AND code = a.Proc1" while later again in the later post we could see this join. So if later post by Laura is correct then Dave's solution should work. We need to figure out which is the correct query. The one at the first post or the later one. My solution is totally based on the first post. Hey Laura, can you confirm on this?
I added the "code = t1.proc1" so that:
- I could make it a correlated sub-query, so that
- I could use this TD feature , so that
- I can convert Laura's original SQL to something valid in TD which will give the correct results.
You are correct that this depends on the original functionality, that is (obviously) what we need to replicate. I believe that my solution using the correlated sub-query approach does that.
I'm onsite wth a customer today but I'll see if I can create some sample data and then post that along with the results of my processing.
The following (I think) demonstrates that the user of correlated scalar sub-queries (CSSQ) works for this process.
This example simply looks at columns PROC1 and PROC2. If it gives the correct answer then just extend it for columns PROC3 - PROCn.
-- Create a couple of tables to mimic the relevant parts of the originals -- Column 'seqnbr' is simply an identifier for each row in the main table so that we can validate results DROP TABLE vt1; CREATE SET VOLATILE TABLE vt1 (seqnbr INTEGER NOT NULL ,proc1 INTEGER ,proc2 INTEGER ,code_system CHAR(10) ) UNIQUE PRIMARY INDEX(seqnbr) ON COMMIT PRESERVE ROWS; CREATE SET VOLATILE TABLE CHL_2017 (VALUE_SET_NAME CHAR(30) ,code INTEGER )ON COMMIT PRESERVE ROWS;
Now load some simple data - but based on the original post I think it is acceptable.
DELETE FROM vt1; INSERT INTO vt1 VALUES(1,10,100,'CPT'); INSERT INTO vt1 VALUES(2,5,101,'CPT'); INSERT INTO vt1 VALUES(3,5,200,'CPT'); DELETE FROM chl_2017; INSERT INTO chl_2017 VALUES('Chlamydia Tests',10); INSERT INTO chl_2017 VALUES('Chlamydia Tests',20); INSERT INTO chl_2017 VALUES('Chlamydia Tests',100); INSERT INTO chl_2017 VALUES('Chlamydia Tests',200);
Now run the following query that uses the CSSQ feature and the result using the above data:
SEL t1.* ,(CASE WHEN proc1 = (SELECT MAX(code) FROM chl_2017 WHERE code = t1.proc1 AND VALUE_SET_NAME IN('Chlamydia Tests')) AND CODE_SYSTEM IN ('CPT','HCPCS') THEN '1' WHEN proc2 = (SELECT MAX(code) FROM chl_2017 WHERE code = t1.proc2 AND VALUE_SET_NAME IN('Chlamydia Tests')) AND CODE_SYSTEM IN ('CPT','HCPCS') THEN '1' ELSE '0' END) AS ChlamScreen FROM vt1 AS t1 ORDER BY 1;
Using my sample data this gives:
seqnbr proc1 proc2 code_system ChlamScreen 1 10 100 CPT 1 2 5 101 CPT 0 3 5 200 CPT 1
Laura, given my sample data, is this the result that you want?
Having a look at multiple possible solutions is a good idea.
I understand that "The sub query will return multiple rows", but for any single row in the table containing column PROC1, that column can only contain a single value (assuming it isn't an ARRAY data type), so you only need one value returned from the sub-query.
Over to you.
Dave's solution is a exact rewrite of the logic, but when you look at Explain it will be horrible.
Btw, the logic in SQL Server is way too complicated, this should work, too:
( SELECT CODE FROM USR_RACSTER.CHL_2017 WHERE VALUE_SET_NAME IN('Chlamydia Tests')
AND CODE_SYSTEM IN ('CPT','HCPCS')
) IN (Proc1, Proc2, Proc3, ...) THEN '1' ELSE'0' END
The bad news is, due to unknown reasons Teradata doesn't allow column names or calculations for IN :-(
IMHO a single Left Join based on rewriting IN using OR provides the same result and should have good performance:
SELECT DISTINCT -- distinct is needed because there might be multiple matches
t1.*, CASE WHEN chl.code IS NOT NULL THEN '1' ELSE '0' END FROM vt1 AS t1 LEFT JOIN chl_2017 AS chl ON (chl.code = t1.proc1 OR chl.code = t1.proc2 OR
...) AND chl.VALUE_SET_NAME IN ('Chlamydia Tests') AND chl.CODE_SYSTEM IN ('CPT','HCPCS')