Data redistributed to a single amp while doing count(*) from subquery: top n with order by

Database
Highlighted
Scout

Data redistributed to a single amp while doing count(*) from subquery: top n with order by

Hi,

I've encountered a behaviour which I can't explain or don't really understand. I'm definitely missing something here and I would like to fill the gap in my knowledge.

 

When I run the following query I encounter SELECT Failed. 2646: (-2646)No more spool space in username.

It took no more than 3 minutes to get the error message.

 

SELECT CAST(COUNT(*) AS BIGINT)
FROM (SELECT TOP 10 * FROM DB_NAME.TABLE_NAME_ ORDER BY ROW_NUMBER_, SOURCE_FILE_NAME) t;

I'm aware that this query is a test for the sake of it.

 

Some basic system information:

288 amp system with TD version/release:16.20.29.01 installed.

I didn't have a chance to test it on different systems yet.

 

Table definition (I've removed columns which I think are irrelevant).

CREATE MULTISET TABLE DB_NAME.TABLE_NAME ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT,
DEFAULT MERGEBLOCKRATIO
(
...other columns
ROW_NUMBER_ DECIMAL(18,0),
SOURCE_FILE_NAME VARCHAR(256) CHARACTER SET UNICODE,
...other columns
)
PRIMARY INDEX ( ROW_NUMBER_ );

- The PRIMARY INDEX ( ROW_NUMBER_ ) isn't unique, but ensures an even distribution of data across the amps. There exactly 16 rows per value.

Actually the (SOURCE_FILE_NAME, ROW_NUMBER_) duo is the primary key and therefore is unique, with SOURCE_FILE_NAME storing 16 values total.

- Neither of the columns have NULL values - which is ofc good.

- Stats on the whole table are up to date

 

The dbc.diskspace showed that everything is being pushed into a single AMP (vproc number being different for each try) who had ~29.5 GB of spool consumed with all other amps laying on zero.

Runtime information from viewpoint confirms it.

Spool space: 29,385,619,500
Hot AMP spool: 29,385,619,500
Spool skew: 99.65278


The explain plan:

1) First, we lock DB_NAME.TABLE_NAME_ for read on a 
reserved RowHash to prevent global deadlock. 
2) Next, we lock DB_NAME.TABLE_NAME_ for read. 
3) We do an all-AMPs STAT FUNCTION step from 
DB_NAME.TABLE_NAME_ by way of an all-rows scan with no 
residual conditions into Spool 6 (Last Use), which is 
redistributed by hash code to all AMPs. The result rows are put 
into Spool 4 (all_amps), which is built locally on the AMPs. This 
step is used to retrieve the TOP 10 rows. The size is estimated 
with high confidence to be 10 rows (5,390 bytes). The estimated 
time for this step is 19 minutes and 7 seconds. 
(...)

Now watching in viewpoint the actual query running gives the same idea - spool on a single amp is rapidly consumed during step 3:

Step 3 High Confidence Active Active
Estimated Time: 0:19:06.585 Estimated Rows: 10
Actual Time:Actual Rows:
We do an All-AMPs STAT FUNCTION step from PS_RWD_IMSPM_OMOP.MEASUREMENT
in TD_Map1 by way of an all-rows scan into Spool 587,
which is redistributed by hash code to all AMPs.
The result rows are put into Spool 585, which is redistributed by hash code to all AMPs.

I don't know what exactly is being hashed.

First I thought that we're doing a sort by ROW_NUMBER_, SOURCE_FILE_NAME, so this would be it (both of the columns). I did:

SELECT hashamp(hashbucket(hashrow(SOURCE_FILE_NAME, ROW_NUMBER_))) as ampno, count(1) as cnt_
FROM DB_NAME.TABLE_NAME_ group by ampno
order by cnt_;

But the result was as predicted: even distribution with difference between most and least used amp being a fraction of a percent. And this is when I lost my track...

 

When I run the subquery:

SELECT TOP 10 * FROM DB_NAME.TABLE_NAME_ ORDER BY ROW_NUMBER_, SOURCE_FILE_NAME;

The explain plan for step 3 is basically the same with the redistribution part as well. The only difference being the next statement "result rows put into spool (group_amps)" -  instead of (all_amps) in the original. I'm quite convinced that it's not the problem. This query completes within 35 seconds and returns (of course) 10 rows.

 

Why redistribution to a single amp is happening? I've run out of ideas...

 

Any help appreciated.

 

APo