Hi, I am new to SQL and to Teradata so I am sorry if this is a stupid question. I studied real-time programming in school using C. Recently I saw a SQL statement in a BTEQ that didn't sit right with me. Unfortunately I don't have access to Teradata at work. I will have to wait until I get home to play with it but I can't find any fault in my logic. So here goes.
The purpose of the BTEQ is to extract data and it uses a select statement. It extracts a lot of columns from many tables in a large database, so I think performance is an issue that should be considered. Just a short example would be:
select col1, col2, case when substr(col3,1,2) in ('AA','BB','CC') then 1 when substr(col3,1,2) in ('AB','BC','CD') then 2 when substr(col3,1,2) in ('AC','BD','CE') then 3 from table1 where ...
The thing that I don't like is that it appears to call substr 3 times in my example for the millions of rows in the table. In the real script it is about 10 times and a similar proccess is repeated for a few columns.
So, finally my question, what is the performance of an alias? How is it handled? Here is my idea:
select col1, col2, case substr(col3,1,2) as temp when temp in ('AA','BB','CC') then 1 when temp in ('AB','BC','CD') then 2 when temp in ('AC','BD','CE') then 3 from table 1 where ...
I did some research about case statement. Apparently it can act like a case in C that I am used to where you have a variable and check if it is equal to things; or in SQL it can evaluate a boolean expression every time.
So, is it possible to only call substr once and store it in a place holder? Or am I thinking to much in C and an SQL alias is just to pretty up the code but only replaces the alias when it is parsed? Or maybe it is smart enough that it only executes the substr once in the original statement, like how you can't reference column aliases in group by.
All I can find online about aliases is that they help clean up the code by not making you type everything out repeatedly, or rename a column when it is output.
The optimizer should be smart enough not to re-calculate the same substring repeatedly. If you look at explain you will see (on TD12) that the optimizer just replaces the alias with the base calculation, so an alias is just conveniant, nothing else than a search&replace done in a text editot.
In Teradata an alias can be used in any place, so "total" can be used in HAVING, but this is an SQL extension which is AFAIK only available in Access, too.
Your "idea" query is not valid SQL but using an old Teradata extension you could write: case when substr(col3,1,2) (named temp) in ('AA','BB','CC') then 1 when temp in ('AB','BC','CD') then 2 when temp in ('AC','BD','CE') then 3
According to ANSI SQL you need a Derived Table: select case when temp in ('AA','BB','CC') then 1 when temp in ('AB','BC','CD') then 2 when temp in ('AC','BD','CE') then 3 ... from (select substr(col3,1,2) as temp from ...) as dt
The optimizer will probably resolve that nested statement and procude the same plan.