I have a question regarding SAMPLE. SInce this thread has some discussion regarding the SAMPLE i will put my doubts here.
How does sample select the rows, does it select rows randomly or in the order of hash or rowid? .Suppose, if we specify sample .25, does it select the first 25% rows or does it pick up rows randomly?
regarding this threads question, looks to me that we are interested in the first few rows or some particular row based on Row ID. How does using sample help in this case. Please let me know if Iam missing something here
My understanding is that it picks the rows randomly (i.e. not by hash ID or ROWID). However, you can decide whether you want it to take a random sample by AMP (i.e. approximately the same number of rows from each AMP) or across AMPs. The default is for it to take the same number of rows (or approximately) for each AMP. This is also faster. However, this would not be a "true" random sample of the table since rows that are on AMPs with fewer rows would have a higher percentage chance of being selected than rows that are on AMPs with more rows. However, the default method would probably work for most applications. If you need to get a "true" sample, then you need to specify "RANDOMIZED ALLOCATION".
The reason that I brought up the "sample" in the first place was that it appeared that in the query, dixon was trying to get 1/4 of the rows by doing a "mod 4" on "ROWNUM" and then selecting that rows that had a remainder of 0. This would get you about 1/4 of the rows, assuming that "ROWNUM" is sequential. So, I was just suggesting another method of getting 1/4 of the rows if that's what he was trying to do.
This thread has some useful information..Thanks for the posts. Anyway,the question remains unanswered here. Any thoughts how to retrieve a SAMPLE ROW in Teradata.I understand that SAMPLE n will give the RANDOM n% from the AMPs (evenly or randomly depending on the initial allcation). But how can I target the EXACT ROW that I am interested in?..
Hi suman, One possibility would be to use the Ranking function in conjunction with a qualify statement for the number. Just Rank the various rows on your key and order it by the key sequence. Then use the Qualify rank = x function to pull the row number. There is no direct equivalent of the ROWID function in Oracle here but this is an effective work around. But as you can see, if the table is large, the partitioned window function causes quite a burden on the system resources. So use the function judicially and also ensure that the ranking keys and ordering key sequence are smartly chosen so that you always get the rank as a unique number rather than having multiple rows with the same rank. I would be glad to help with any further details if you need.