We were trying to get a sample of 200K records. Initially we were testing if we could get those samples using "top". But it threw a spool space error. Later on we got those samples using rank ( as 1 of the methods ) .
What we want to know is that why "top" did not work in this case? Is there any limit/threshold for it? Would the same happen if we use "sample"? How do each 1 of them exactly work?
Teradata strongly recommends that you give sufficient space to DBC to be used as spool. There are two ways to do this: set aside permanent space to remain unused or create an empty database to hold the space.
The advantage of creating a database to hold spool space means that space will always be set aside and available for spool usage and will not be mixed with the space allocated for DBC.
Reserving a specific amount of space from your total space in DBC helps ensure the entire system will run smoothly and properly because if DBC runs out of spool space, you may run into some problems with performance, unsuccessful queries, or user logon.
While TOP is designed for speed it may be overloading a single (or a couple) AMP causing your spool to skew by going to a few AMPs to get the data. SAMPLE would almost certainly be an all AMP operation and you may find a more even spool distribution but will sacrifice speed due to the statistical nature of SAMPLE.
ROW_NUMBER() would also work but as an OLAP function with an ORDER BY it will probably be fairly CPU intensive.