Way to optimize query for bigdata processing


Hi,

I need to process data based on the following query.

SELECT MAX(id)
FROM (
    SELECT TOP 1000000 id
    FROM table1
    WHERE id BETWEEN 1100101 AND 1100228
      AND id > 23434
    ORDER BY id) AS t;

Output: 445544

The output of this query (445544) then replaces the previous id value (23434), and the query is executed again:

SELECT MAX(id)
FROM (
    SELECT TOP 1000000 id
    FROM table1
    WHERE id BETWEEN 1100101 AND 1100228
      AND id > 445544
    ORDER BY id) AS t;


This is executed around 200 times, depending on the number of records, but it takes a long time when the record count is very high. Is there any way I can optimize this query to get a better response time?
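The iterative process above can be sketched as a driver loop. This is a hypothetical Python simulation over an in-memory id list rather than table1 (the function name, batch size, and starting id are illustrative assumptions taken from the queries above); it shows why the approach is slow: each iteration effectively re-scans and re-sorts the qualifying rows.

```python
BATCH = 1_000_000

def batch_boundaries(ids, start=23434, batch=BATCH):
    """Return the MAX(id) of each successive TOP-`batch` slice above `start`,
    mimicking the repeated SELECT MAX(id) FROM (SELECT TOP batch ...) queries."""
    ids = sorted(ids)
    boundaries = []
    last = start
    while True:
        # TOP batch ... WHERE id > last ORDER BY id
        chunk = [i for i in ids if i > last][:batch]
        if not chunk:
            break
        last = max(chunk)  # SELECT MAX(id) FROM (...) AS t
        boundaries.append(last)
    return boundaries
```

With a small batch size for illustration, `batch_boundaries(range(1, 11), start=0, batch=4)` returns `[4, 8, 10]`: the boundary of each slice of four ids, plus the maximum of the final partial slice.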

The EXPLAIN plan is as follows.

Next, we do an All-AMPs RETRIEVE step from table1 accessing 59 partitions into Spool 4427, which is built locally on the AMPs.

3) We do an All-AMPs STAT FUNCTION step from Spool 4427 (Last Use) by way of an all-rows scan into Spool 4430, which is assumed to be redistributed by value to all AMPs. The result rows are put into Spool 4428, which is redistributed by hash code to all AMPs.

4) We do an All-AMPs RETRIEVE step from Spool 4428 (Last Use) by way of an all-rows scan into Spool 4426, which is built locally on the AMPs.

5) We do a SUM step to aggregate from Spool 4426 (Last Use) by way of an all-rows scan. Aggregate Intermediate Results are computed globally, then placed in Spool 4435.

6) We do an All-AMPs RETRIEVE step from Spool 4435 (Last Use) by way of an all-rows scan into Spool 4433, which is built locally on the AMPs.

I need to find a way in which the data can be retained in spool, so that each subsequent query only has to run the TOP and MAX steps.

Thanks.


1 REPLY
Junior Contributor

Re: Way to optimize query for bigdata processing

As you access two months of a table partitioned by day, why don't you simply process all the rows for a single day at a time, i.e. 59 batches?

Alternatively, the following returns all batch boundaries in a single query, although they might not be exactly the same ids as in your iterative process:

SELECT id
FROM table1
WHERE date_col BETWEEN 1100101 AND 1100228
QUALIFY
ROW_NUMBER() OVER (ORDER BY id) MOD 1000000 = 999999
ORDER BY id;