I am attempting to explore data on my company's views. I don't have direct access to the underlying tables. Most tables contain roughly 5-10 billion rows. When I try:
SELECT TOP 10 * FROM [myView]
I get the error "SELECT Failed. 2646: No more spool space on [my username]"
How do I select only a few records from these massive views?
The way your doing it will work or you can use "Sample"
Have you viewed the view definition? If you have access to select from it you can run "Show View myview to check if is is a complex view which may need more spool to build your answerset. If it is not a complex view check with your DBA to ensure you have spool assigned.
Rglass - thanks for the tip. The SHOW VIEW revealed that the [myView] is not complex. I do believe I have spool assigned because i am able to "SELECT TOP 10 *" on smaller views (less than 2 billion records) successfully. I am new to teradata, but i am hoping not to "load" billions of records before showing only 10.
I ran the EXPLAIN feature, which returned this:
SELECT TOP 10 *
1) First, we lock [myTable] in view [myView] for access.
2) Next, we do an all-AMPs RETRIEVE step from [myTable] in view
[myView] by way of an all-rows scan with a condition of (
"([myTable] in view [myView].Column1 = 'I ') AND
(([myTable] in view [myView].Column2 = 'NO ') OR
([myTable] in view [myView].Column2 = 'NA '))")
into Spool 2 (all_amps) (compressed columns allowed), which is
built locally on the AMPs. The input table will not be cached in
memory, but it is eligible for synchronized scanning. The result
spool file will not be cached in memory. The size of Spool 2 is
estimated with low confidence to be 12,023,256,733 rows (
6,179,722,341,888 bytes). The estimated time for this step is 13
minutes and 16 seconds.
3) We do an all-AMPs STAT FUNCTION step from Spool 2 by way of an
all-rows scan into Spool 5, which is redistributed by hash code to
all AMPs. The result rows are put into Spool 1 (group_amps),
which is built locally on the AMPs. This step is used to retrieve
the TOP 10 rows. One AMP is randomly selected to retrieve 10
rows. If this step retrieves less than 10 rows, then execute step
4. The size is estimated with low confidence to be 10 rows (
4) We do an all-AMPs STAT FUNCTION step from Spool 2 (Last Use) by
way of an all-rows scan into Spool 5 (Last Use), which is
redistributed by hash code to all AMPs. The result rows are put
into Spool 1 (group_amps), which is built locally on the AMPs.
This step is used to retrieve the TOP 10 rows. The size is
estimated with low confidence to be 10 rows (16,490 bytes).
5) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
Do you see anything that concerns you? - or is it simply not possible to SELECT TOP 10 * on a large view without asking DBAs to increase my spool space (which i suspect they wont do).
the view is not complex, but there's a WHERE-condition in it and that's enough to make the TOP slow :-(
Both TOP and SAMPLE are always applied as last step and any WHERE/JOIN/GROUP BY will be done before, in your case resulting in 6+TB spool. The DBA will not increase your spool, but might grant you access to the base tables.
But big tables are usually partitioned, so adding a WHERE-condition matching the PPI will greatly reduce the spool size.
If you don't know about the partition definition you might run a
select * from dbc.PartitioningConstraintsV
where DatabaseName = myDatabase and TableName = myTable;
I am not sure if the 'select top 10 * ...' holds for Aster too. When I try running the code snippet:
SELECT TOP 10 *
ORDER BY col_name
I get the following error:
Executed as Single statement. Failed [34 : 42000] [AsterData][ASTERJDBCDSII](34) ERROR: syntax error at or near "10" ()
Elapsed time = 00:00:00.156
STATEMENT 1: SELECT Statement failed.
Can anyone suggest what is the correct syntax for aster?
Thanks in advance,