Difference between Create table Statements

Enthusiast

Hi All

I want to understand whether there will be a difference in performance, or on any other front, between the following two approaches.

Any inputs will be of great help. I am dealing with huge fact tables and many volatile tables that need to be created in a stored procedure.


1. Creating a Volatile table / permanent table first and then inserting into it

Ex:

STEP1:

CREATE VOLATILE TABLE #TABLE1,

NO FALLBACK, NO JOURNAL, NO LOG

(COLUMN1 DEF)

PRIMARY INDEX (COLUMN1)

ON COMMIT PRESERVE ROWS;

STEP2:

INSERT INTO #TABLE1 (

SELECT COLUMN1 FROM TABLE2

);

2. Creating the table on the fly from the results of a SELECT statement

CREATE VOLATILE TABLE #TABLE1,

NO FALLBACK, NO JOURNAL, NO LOG

AS (

SELECT COLUMN1 FROM TABLE2

) WITH DATA

PRIMARY INDEX (COLUMN1)

ON COMMIT PRESERVE ROWS;


Thanks !

AnandC

9 REPLIES
Junior Contributor

Re: Difference between Create table Statements

Hi AnandC,

There should be no difference in speed.

Dieter

Enthusiast

Re: Difference between Create table Statements

Hi Dieter,

I have experienced (many times) that if I follow method 1 as AnandC described, the data gets inserted much faster than if I follow method 2. I have always tried it with permanent tables, though. I just tried it again with a permanent table before writing this, and method 1 still worked much faster than method 2. I observed this especially with huge data volumes.

Do you know how this could be possible, or is there something I am missing?

Thanks.

Junior Contributor

Re: Difference between Create table Statements

Did you check the actual resource usage in DBQL?

Whenever I did, it was the same CPU/IO for both versions.

Of course, there might be differences, but in most cases they could be explained by SET vs. MULTISET or by different PIs that the author forgot about.
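As a sketch of the kind of DBQL check meant here (assuming query logging is enabled, and using the standard DBC view and column names):

```sql
-- Sketch only: compare the resource usage of the two approaches in DBQL.
-- Assumes DBQL query logging is enabled for this user/session.
SELECT QueryText
     , AMPCPUTime       -- total AMP CPU seconds for the request
     , TotalIOCount     -- total logical I/Os for the request
FROM dbc.QryLogV
WHERE UserName = USER
  AND StartTime >= CURRENT_TIMESTAMP - INTERVAL '1' HOUR
ORDER BY StartTime DESC;
```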

If you look at the EXPLAIN, both are quite similar, but for permanent tables there's one significant difference:

CREATE TABLE AS SELECT first merges the data and then creates the table header, so the table is only visible when the load is finished. That's why I prefer #1: you can see the table grow in dbc.TableSizeV (or do a COUNT(*) with an ACCESS lock) and you get better control over the DDL.
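For example, a minimal sketch (using the table names from this thread) of watching the target table grow while the INSERT of method 1 runs:

```sql
-- Sketch only: monitor the target table while the INSERT runs.
-- With CREATE TABLE AS SELECT the table only becomes visible at the end.
SELECT SUM(CurrentPerm)
FROM dbc.TableSizeV
WHERE DatabaseName = 'DATABASE1'
  AND TableName    = 'TABLE2';

-- Or count the rows without waiting for the write lock:
LOCKING ROW FOR ACCESS
SELECT COUNT(*) FROM DATABASE1.TABLE2;
```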

I usually prefer to do a CREATE TABLE AS SELECT ... WITH NO DATA, followed by a SHOW TABLE. Then I modify the DDL to my needs and use that.
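A minimal sketch of that workflow, using the example names from this thread:

```sql
-- 1. Let the system derive the structure without copying any rows:
CREATE TABLE DATABASE1.TABLE2 AS
  ( SELECT COLUMN1 FROM DATABASE1.TABLE1 )
WITH NO DATA;

-- 2. Look at the DDL the system generated:
SHOW TABLE DATABASE1.TABLE2;

-- 3. Drop the table, adjust the DDL (PI, SET/MULTISET, NOT NULL, ...)
--    and run the hand-tuned CREATE TABLE followed by INSERT ... SELECT.
DROP TABLE DATABASE1.TABLE2;
```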

I mainly do #2 for SPs with dynamic SQL, when I really don't know the definition in advance.

And if there actually were a more efficient way to do the insert in one of the cases, why would the other case use a slower implementation?

Dieter 


Enthusiast

Re: Difference between Create table Statements

Thanks very much for the information, Dieter.

As you mentioned, I am going to use the CREATE statements in huge stored procedures; there are going to be at least 50 volatile tables created and dropped when the session ends. Moreover, they are going to hold and process billions of rows daily.

I am going to choose option #2, which I was already inclined towards.

Thanks !

Anand

Enthusiast

Re: Difference between Create table Statements

I am afraid I do not get a similar EXPLAIN in both cases.

Here is the example. I am trying to insert data from table 1 to table 2. Table 1 is a multiset table. There are around 90 million records in table 1.

Please take a look at the EXPLAIN of CREATE TABLE ... AS (SELECT ...), and check step 4 in particular.

 1) First, we lock a distinct DATABASE1."pseudo table" for read on a

     RowHash to prevent global deadlock for DATABASE1.TABLE1.

  2) Next, we lock DATABASE1.TABLE2 for exclusive use, and we

     lock DATABASE1.TABLE1 for read.

  3) We create the table header.

  4) We do an all-AMPs RETRIEVE step from DATABASE1.TABLE1 by way

     of an all-rows scan with no residual conditions into Spool 1

     (all_amps), which is redistributed by the hash code of (

     DATABASE1.TABLE1.COLUMN1) to all AMPs.  Then we do a SORT to

     order Spool 1 by row hash.  The input table will not be cached in

     memory, but it is eligible for synchronized scanning.  The result

     spool file will not be cached in memory.  The size of Spool 1 is

     estimated with high confidence to be 49,607,857 rows (

     4,563,922,844 bytes).  The estimated time for this step is 2

     minutes and 6 seconds.

  5) We do an all-AMPs MERGE into DATABASE1.TABLE2 from Spool

     1 (Last Use).  The size is estimated with high confidence to be

     49,607,857 rows.  The estimated time for this step is 1 second.

  6) We lock a distinct DBC."pseudo table" for read on a RowHash for

     deadlock prevention, we lock a distinct DBC."pseudo table" for

     write on a RowHash for deadlock prevention, we lock a distinct

     DBC."pseudo table" for write on a RowHash for deadlock prevention,

     and we lock a distinct DBC."pseudo table" for write on a RowHash

     for deadlock prevention.

  7) We lock DBC.Indexes for write on a RowHash, we lock DBC.DBase for

     read on a RowHash, we lock DBC.TVFields for write on a RowHash, we

     lock DBC.TVM for write on a RowHash, and we lock DBC.AccessRights

     for write on a RowHash.

  8) We execute the following steps in parallel.

       1) We do a single-AMP ABORT test from DBC.DBase by way of the

          unique primary index.

       2) We do a single-AMP ABORT test from DBC.TVM by way of the

          unique primary index.

       3) We do an INSERT into DBC.Indexes (no lock required).

       4) We do an INSERT into DBC.TVFields (no lock required).

       5) We do an INSERT into DBC.TVFields (no lock required).

       6) We do an INSERT into DBC.TVFields (no lock required).

       7) We do an INSERT into DBC.TVFields (no lock required).

       8) We do an INSERT into DBC.TVFields (no lock required).

       9) We do an INSERT into DBC.TVFields (no lock required).

      10) We do an INSERT into DBC.TVFields (no lock required).

      11) We do an INSERT into DBC.TVFields (no lock required).

      12) We do an INSERT into DBC.TVFields (no lock required).

      13) We do an INSERT into DBC.TVFields (no lock required).

      14) We do an INSERT into DBC.TVFields (no lock required).

      15) We do an INSERT into DBC.TVFields (no lock required).

      16) We do an INSERT into DBC.TVFields (no lock required).

      17) We do an INSERT into DBC.TVFields (no lock required).

      18) We do an INSERT into DBC.TVFields (no lock required).

      19) We do an INSERT into DBC.TVFields (no lock required).

      20) We do an INSERT into DBC.TVFields (no lock required).

      21) We do an INSERT into DBC.TVFields (no lock required).

      22) We do an INSERT into DBC.TVFields (no lock required).

      23) We do an INSERT into DBC.TVFields (no lock required).

      24) We do an INSERT into DBC.TVFields (no lock required).

      25) We do an INSERT into DBC.TVFields (no lock required).

      26) We do an INSERT into DBC.TVFields (no lock required).

      27) We do an INSERT into DBC.TVFields (no lock required).

      28) We do an INSERT into DBC.TVFields (no lock required).

      29) We do an INSERT into DBC.TVFields (no lock required).

      30) We do an INSERT into DBC.TVFields (no lock required).

      31) We do an INSERT into DBC.TVFields (no lock required).

      32) We do an INSERT into DBC.TVFields (no lock required).

      33) We do an INSERT into DBC.TVM (no lock required).

      34) We INSERT default rights to DBC.AccessRights for

          DATABASE1.TABLE2.

  9) Finally, we send out an END TRANSACTION step to all AMPs involved

     in processing the request.

  -> No rows are returned to the user as the result of statement 1.

Now, take a look at the EXPLAIN of the CREATE TABLE followed by INSERT INTO ...

  1) First, we lock a distinct DATABASE1."pseudo table" for write on a

     RowHash to prevent global deadlock for DATABASE1.TABLE2.

  2) Next, we lock a distinct DATABASE1."pseudo table" for read on a

     RowHash to prevent global deadlock for DATABASE1.TABLE1.

  3) We lock DATABASE1.TABLE2 for write, and we lock

     DATABASE1.TABLE1 for read.

  4) We do an all-AMPs MERGE into DATABASE1.TABLE2 from

     DATABASE1.TABLE1.  The size is estimated with no confidence

     to be 9,361,440 rows.  The estimated time for this step is 1

     second.

  5) We spoil the parser's dictionary cache for the table.

  6) Finally, we send out an END TRANSACTION step to all AMPs involved

     in processing the request.

  -> No rows are returned to the user as the result of statement 1.

Teradata Employee

Re: Difference between Create table Statements

The only difference is that in option 1 you have the leverage to tweak/select the correct index.

I have seen scenarios where option 2 takes hours to process the data, while option 1 with the correct selection of index takes just a matter of seconds.

The latest occurrence of this happened at one of my clients yesterday. :)

HTH!

Junior Contributor

Re: Difference between Create table Statements

Adeel is right,

Check the PIs of both tables; they are different. In the second case the PI was exactly the same as in the source table. :-)

A wrong PI, or a default SET table instead of MULTISET, are the common reasons for bad performance.

Dieter

Enthusiast

Re: Difference between Create table Statements

That is true; the PI in the second case is the same as that of table 1, since I took the definition of table 1 to create table 2.

However, I thought that when you do CREATE TABLE AS SELECT, the new table would get the exact same PI as the table you are creating from, so in both options the newly created table should have the same PI. But I guess that is not the case with option 1, and now it makes sense why option 1 is taking longer in some cases.

Thank you both for the clarification.
Junior Contributor

Re: Difference between Create table Statements

Which PI should the optimizer choose when you write a multi-table SELECT?

When you do a CREATE TABLE AS SELECT you will lose all index information (probably resulting in a NUPI on the first column) and any column's NOT NULL definition, plus SET/MULTISET is reset to the session default (SET for Teradata sessions, MULTISET for ANSI sessions).
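As a hedge against those defaults, the table properties can be stated explicitly in the CTAS itself (a sketch using the example names from this thread):

```sql
-- Sketch only: spell out MULTISET and the PI instead of relying on
-- session defaults or on what the optimizer derives from the SELECT.
CREATE MULTISET VOLATILE TABLE TABLE1 AS
  ( SELECT COLUMN1 FROM TABLE2 )
WITH DATA
PRIMARY INDEX (COLUMN1)
ON COMMIT PRESERVE ROWS;
```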

Dieter