SELECT RANDOM RECORDS from big table

Enthusiast

SELECT RANDOM RECORDS from big table

Hi Experts,

I want to select 10000 random records from one big table with 8 billion records and insert them into another table with the same structure (same PI).

I have tried TOP 10000 and SAMPLE 10000.
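Roughly, the two attempts looked like this (a simplified sketch only, using the same anonymised names that appear in the plans below; the real query lists the actual columns):

   /* Attempt 1: TOP n without ORDER BY */
   INSERT INTO TARGET_DATABASE.SOURCE_TABLE_06nov
   SELECT TOP 10000 *
   FROM SOURCE_DATABASE.SOURCE_TABLE;

   /* Attempt 2: SAMPLE with a row count */
   INSERT INTO TARGET_DATABASE.SOURCE_TABLE_06nov
   SELECT *
   FROM SOURCE_DATABASE.SOURCE_TABLE
   SAMPLE 10000;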

In both cases the explain plan shows the optimizer redistributing and sorting all 8 billion records on the PI columns and only then taking the 10000 records. :(

Each time the query fails with an out-of-spool-space error.

Please help.
4 REPLIES
Senior Apprentice

Re: SELECT RANDOM RECORDS from big table

Hi Arint,

can you show your actual query and its Explain?

Both SAMPLE and TOP (without ORDER BY) should not redistribute and/or sort.

Enthusiast

Re: SELECT RANDOM RECORDS from big table

Hi Dieter, thanks a lot for your quick response. I have pasted the plans below.

Please pardon me if I misunderstand the query plans. Thanks again.

Explain with TOP 10000 (without ORDER BY):

  1) First, we lock a distinct SOURCE_DATABASE."pseudo table"
     for read on a RowHash to prevent global deadlock for
     SOURCE_DATABASE.SOURCE_TABLE.
  2) Next, we lock a distinct TARGET_DATABASE."pseudo table" for
     write on a RowHash to prevent global deadlock for
     TARGET_DATABASE.SOURCE_TABLE_06nov.
  3) We lock SOURCE_DATABASE.SOURCE_TABLE for read,
     and we lock TARGET_DATABASE.SOURCE_TABLE_06nov for
     write.
  4) We do an all-AMPs STAT FUNCTION step from
     SOURCE_DATABASE.SOURCE_TABLE by way of an
     all-rows scan with no residual conditions into Spool 5 (Last Use),
     which is assumed to be redistributed by value to all AMPs.  The
     result rows are put into Spool 3 (all_amps) (compressed columns
     allowed), which is built locally on the AMPs.  The size is
     estimated with high confidence to be 8,419,707,707 rows (
     5,169,700,532,098 bytes).
  5) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of
     an all-rows scan with a condition of ("Field_118 <= 1000") into
     Spool 1 (all_amps) (compressed columns allowed), which is
     redistributed by the hash code of (
     SOURCE_DATABASE.SOURCE_TABLE.COL1,
     SOURCE_DATABASE.SOURCE_TABLE.COL2,
     SOURCE_DATABASE.SOURCE_TABLE.COL3) to all
     AMPs.  Then we do a SORT to order Spool 1 by row hash.  The result
     spool file will not be cached in memory.  The size of Spool 1 is
     estimated with high confidence to be 8,419,707,707 rows (
     5,068,664,039,614 bytes).  The estimated time for this step is 4
     hours and 6 minutes.
  6) We do an all-AMPs MERGE into
     TARGET_DATABASE.SOURCE_TABLE_06nov from Spool 1 (Last
     Use).  The size is estimated with high confidence to be
     8,419,707,707 rows.  The estimated time for this step is 76 hours
     and 30 minutes.
  7) We spoil the parser's dictionary cache for the table.
  8) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> No rows are returned to the user as the result of statement 1.

Explain with SAMPLE 10000 (without ORDER BY):

  1) First, we lock a distinct SOURCE_DATABASE."pseudo table"
     for read on a RowHash to prevent global deadlock for
     SOURCE_DATABASE.SOURCE_TABLE.
  2) Next, we lock a distinct TARGET_DATABASE."pseudo table" for
     write on a RowHash to prevent global deadlock for
     TARGET_DATABASE.SOURCE_TABLE_06nov.
  3) We lock SOURCE_DATABASE.SOURCE_TABLE for read,
     and we lock TARGET_DATABASE.SOURCE_TABLE_06nov for
     write.
  4) We do an all-AMPs SAMPLING step from
     SOURCE_DATABASE.SOURCE_TABLE by way of an
     all-rows scan with no residual conditions into Spool 1 (all_amps)
     (compressed columns allowed), which is redistributed by the hash
     code of (
     SOURCE_DATABASE.SOURCE_TABLE.COL1,
     SOURCE_DATABASE.SOURCE_TABLE.COL2,
     SOURCE_DATABASE.SOURCE_TABLE.COL3) to all
     AMPs.  Then we do a SORT to order Spool 1 by row hash.  Samples
     are specified as a number of rows.
  5) We do an all-AMPs MERGE into
     TARGET_DATABASE.SOURCE_TABLE_06nov from Spool 1 (Last
     Use).  The size is estimated with high confidence to be
     8,419,707,707 rows.  The estimated time for this step is 76 hours
     and 30 minutes.
  6) We spoil the parser's dictionary cache for the table.
  7) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> No rows are returned to the user as the result of statement 1.

Senior Apprentice

Re: SELECT RANDOM RECORDS from big table

Hi Arint,

the SAMPLE step is not sorting the full table; only the resulting sampled rows are redistributed (and sorted by row hash) for the insert into the target PI. The estimated number of rows shown in the plan is just misleading.

The TOP indicates you're running an older version of Teradata where this was poorly optimized.

Better go with SAMPLE :)
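Something along these lines should do it (just a sketch, reusing the anonymised names from your Explain; only the 10000 sampled rows go through the redistribution and row-hash sort, not the full 8 billion):

   INSERT INTO TARGET_DATABASE.SOURCE_TABLE_06nov
   SELECT *
   FROM SOURCE_DATABASE.SOURCE_TABLE
   SAMPLE 10000;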

Enthusiast

Re: SELECT RANDOM RECORDS from big table

Thanks a lot Dieter, you are right as always. I am working on old TD 12.00. :P I will go with your suggestion.