Hi I have a table like below
CREATE MULTISET TABLE EMPLOYEE
PRIMARY INDEX ( emp_id );
PARTITION BY (
emp_id BETWEEN 1 AND 1000 EACH 100
, NO RANGE
where PI column is also part of range partition.
I wanted to know how data will be sorted in each partition.
Each partition of 100 emp_id's will keep sorted data as 1..100 in 1st partition 101-200 in 2nd partition and so on or each partition will hold random 100 emp_ids.
I want to use this ID as part of between clause in my SQL and table has very have number of records crossing 300 million.
Yes, each partition will hold values as you have described. But also remember that the data is distributed across the AMPs using the PI column (in your case emp_id).
So what is physically going to happen is:
- assume you have 1000 employees with unique values for emp_id and 20 AMPs
- each AMP will hold approx 50 rows
- the table definition allows for 10 partitions - and each AMP may contain data in any of those 10 partitions.
Having loaded your data, you then run a query with a WHERE clause such as:
WHERE emp_id BETWEEN 98 AND 257
This query will use all AMPs.
On each AMP the query will only search 3 partitions (containing emp_id's 1-100, 101-200, 201-300)
(Note that if any of the partitions are empty then those partitions will not be searched).
- if the emp_id column is used in the search criteria for your (important) queries then that is a good choice for PPI.
- you said that "sorted data as 1..100 in 1st partition 101-200 in 2nd partition and so on or each partition will hold random 100 emp_ids". It is easiest to think about the data as being sorted into each partition - just remember that each logical partition (i.e. what you define in the table definition) is spread across every AMP in the system (or MAP if you're on 16.10 and higher!).