How to Write the correct syntax for Join

Database

How to Write the correct syntax for Join

q) Use COUNT and DISTINCT to determine how many distinct skus there are in the skuinfo, skstinfo, and trnsact tables. Which skus are common to all tables, or unique to specific tables?

A) I was trying to find the solution to the above q in TERA DATA. The first part was simple i was able to run three commands and got the distinct skus (stock keeping unit).

Now to find the common sku in all three table i am running the command but having errors :

SELECT COUNT(DISTINCT a.sku),COUNT(DISTINCT b.sku),COUNT(DISTINCT c.sku)
FROM skuinfo a INNER JOIN skstinfo b INNER JOIN trnsact c
ON a.sku=b.sku AND b.sku=c.sku;

**Error Occurred . . .**
[com.teradata.commons.datatools.sqlparsers.common.ParseException: Encountered ";" at line 3 column 31. Was expecting one of: "and" ... "at" ... "cross" ... "day" ... "full" ... "hour" ... "inner" ... "join" ... "left" ... "minute" ... "month" ... "on" ... "or" ... "right" ... "second" ... "timezone_hour" ... "timezone_minute" ... "year" ... "||" ... "(" ... "**" ... "+" ... "-" ... "*" ... "/" ... "mod" ... "." ... "[" ... ]

Why is there also an error if i use **Where** in Place of ON? Please let me know why is my syntax not working, though most forums say the query is right. Thanks



Tags (1)
8 REPLIES
Senior Apprentice

Re: How to Write the correct syntax for Join

Nobody will tell that this query is right (otherwise they don't know about SQL).

You join three tables but the second ON is missing.

And when you explicit join syntax the ON is mandatory, so putting the conditions in WHERE instead of ON must fail.

Re: How to Write the correct syntax for Join

Hi Dieter,

Thanks for the reply but i have the condition by AND statement.

I modified the query:-

SELECT COUNT(DISTINCT a.sku),COUNT(DISTINCT b.sku),COUNT(DISTINCT c.sku)

FROM skuinfo a INNER JOIN skstinfo b

ON a.sku=b.sku,

skstinfo b INNER JOIN trnsact c

ON  b.sku=c.sku;

But the Error persists

Error Code - 3870 

Error Message - [Teradata Database] [TeraJDBC 15.10.00.05] [Error 3870] [SQLState HY000] Alias name cannot match another table/alias name in FROM clause.

Senior Apprentice

Re: How to Write the correct syntax for Join

That's the same query you already got on StackOverflow:

SELECT COUNT(DISTINCT a.sku),COUNT(DISTINCT b.sku),COUNT(DISTINCT c.sku)
FROM skuinfo a
JOIN skstinfo b
ON a.sku=b.sku
JOIN trnsact c
ON b.sku=c.sku;

You should start learning basic SQL syntax first.

But it's not a good/correct way to get what you want. 

Re: How to Write the correct syntax for Join

Hi Dieter,

I have a question that is related to the question posed by rahul1987 (see above, 4 weeks ago); i.e., it arises from a large Teradata exercise that's provided by a 5-week SQL course offered by Coursera / Duke University.   My request for your help comes about because: (1) the course does not provide a solution to the exercise, and (2) because the query below resulted in the following Teradata error:

     Error Code - 2646

     Error Message - [Teradata Database] [TeraJDBC 15.10.00.05] [Error 2646] [SQLState HY000] No more spool space in DUKESQLMOOC3335.

Please note the “spool space” problem does not occur when I apply only 2 INNER JOINs (in fact, the query runs very efficiently).  However, the error occurs when I add a third INNER JOIN.

Here's the exercise question and the query I wrote to try to answer it:

Q1: Which stores are COMMON to the following tables: strinfo, store_msa, skstinfo, and trnsact?

SELECT DISTINCT s.store

FROM strinfo s INNER JOIN store_msa m ON m.store=s.store

                       INNER JOIN skstinfo k ON k.store=s.store

                       INNER JOIN trnsact t ON t.store=s.store

ORDER BY s.store ASC;

Interestingly, the following query, which has 3 LEFT JOINs and involves the same 4 tables, works very efficiently (i.e., in approximately 7 seconds).

Q2: Which stores are UNIQUE to the following tables: strinfo, store_msa, skstinfo, and trnsact?

SELECT DISTINCT s.store

FROM strinfo s LEFT JOIN store_msa m ON m.store=s.store

                      LEFT JOIN skstinfo k ON k.store=s.store

                      LEFT JOIN trnsact t ON t.store=s.store

WHERE m.store IS NULL AND k.store IS NULL AND t.store IS NULL

ORDER BY s.store ASC;

Any suggestions on where I’ve gone wrong or on how I can improve the above two queries will be much appreciated.

Best,

Gary

Enthusiast

Re: How to Write the correct syntax for Join

Hi,

The first thing is spool space issue occurs mostly when there is a product join in the query. As you are joining the store table with Transactions table, it may go for a product join. In your case, it seems that joining columns contain null values, if you add the below condition to first query, it will fix the issue.

WHERE m.store IS NOT NULL AND k.store IS NOT NULL AND t.store IS NOT NULL

You can check the explain plan to see what join strategy Teradata is using to resolve these inner joins.

To get the explain plan, write explain before the query e.g. explain select * from table;

in the second query, you have added a where condition with IS null condition, there may be very small data set left after the where condition is applied so this join works faster.

Khurram

Re: How to Write the correct syntax for Join

Hi Khurram,

Thank you for your comments - I have a feeling you're correct regarding the product join creating problems for the query.  As suggested, I included the WHERE clause; however, I encountered the same Teradata error message (i.e., 2646).  Here's the revised query:

SELECT DISTINCT s.store

FROM strinfo s INNER JOIN store_msa m ON m.store=s.store

                       INNER JOIN skstinfo k ON k.store=s.store

                       INNER JOIN trnsact t ON t.store=s.store

WHERE m.store IS NOT NULL AND k.store IS NOT NULL AND t.store IS NOT NULL

ORDER BY s.store ASC;

The output from placing the EXPLAIN command before SELECT in the above query indicates very long times are required to process parts of the query:

7) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an


all-rows scan, which is joined to Spool 3 (Last Use) by way of an


all-rows scan. Spool 2 and Spool 3 are joined using a hash join


of 4 partitions, with a join condition of ("STORE = STORE"). The


result goes into Spool 5 (all_amps), which is built locally on the


AMPs. The result spool file will not be cached in memory. The


size of Spool 5 is estimated with no confidence to be


433,162,790,291 rows (9,096,418,596,111 bytes). The estimated time for this step is 1 hour and 54 minutes. 

 


8) We do an all-AMPs JOIN step from Spool 4 (Last Use) by way of an


all-rows scan, which is joined to Spool 5 (Last Use) by way of an


all-rows scan. Spool 4 and Spool 5 are joined using a product


join, with a join condition of ("(STORE = STORE) AND ((STORE =


STORE) AND ((STORE = STORE) AND (STORE = STORE )))"). The result


goes into Spool 1 (group_amps), which is redistributed by the hash


code of (UA_DILLARDS.s.STORE) to all AMPs. Then we do a SORT to


order Spool 1 by the sort key in spool field1 eliminating


duplicate rows. The result spool file will not be cached in


memory. The size of Spool 1 is estimated with no confidence to be


270 rows (6,750 bytes). The estimated time for this step is 16 hours and 10 minutes.


9) Finally, we send out an END TRANSACTION step to all AMPs involved


in processing the request.


-> The contents of Spool 1 are sent back to the user as the result of


statement 1. The total estimated time is 18 hours and 4 minutes.

At this point, it looks like my attempt to answer the course exercise question is correct - I can see nothing that I did incorrectly in assembling the query (particularly after including the suggested WHERE clause).  Perhaps the data processing procedures involved with three (3) INNER JOINs are causing the query to exceed the computational resources of Coursera's Teradata platform.

Again thank you, Khurram, for your input.

Best regards,

Gary

Teradata Employee

Re: How to Write the correct syntax for Join

Think about what the query you wrote is doing. You are joining each row from the first table with all rows in the second table that have the same Store, then doing it again with the third table and fourth table. In your EXPLAIN above, after step 7 you may have generated nearly half a trillion rows, and in step 8 you will multiply by yet another factor before duplicates are removed (and a very small number of rows returned).

Using DISTINCT to remove duplicates is not good design; it's better not to introduce unwanted duplicates in the first place:

SELECT s.store

FROM (select distinct store from strinfo) s /* If store is already unique in this table, you would not need the subquery */

INNER JOIN (select distinct store from store_msa) m ON m.store = s.store

INNER JOIN (select distinct store from skstinfo) k ON k.store = s.store

INNER JOIN (select distinct store from trnsact) t on t.store = s.store

order by s.store;

Re: How to Write the correct syntax for Join

Nice analysis and explanation.  Thank you Fred.  I'll keep your query design comment regarding DISTINCT in mind for the future.  Being new to SQL, I'm learning new things every day.

Best,

Gary