Hi Teradatares :-)
I am new to Teradata database. Currently I am thiking of creating "big" table that should grow 200 mln rows each week. After a year data it will be 10 000 000 000 rows
After reading some best practice, documentation and forum, that's what I kind of understand.
Create table test
( sample_population_date date,
Primary index (sample_population_date)
Partition by range_n (sample_population_date between 01/01/2013 and 01/01/2010 each interval '1' day)
This table will contain campaign population (list of users) that is going to be run everyday) and select queries are going to be affecting one partition.
I want to take benefit of single AMP performance(over nAMPs query) and partitioning.
For this reason I want to Primary index to distribute rows for the same date (campaign) to be store on single amp, additionallly I am paritioning by that date to have many "full" partitions on single AMP.
Single partition size in rows is 200 000 000.
Is it the best performance big table design in teradata? Is there any better way of achiving one big table that will perform for hundreds of million rows in very fast way?
Any negative consequence for this approach like stitistics collections?
no, no, no, no, don't do that!
The number of rows per PI-value should be as low as possible, as a rule of thumb they should fit on a datablock, i.e. up to a three or four digit number. When the table is partitioned you might get that number per partition. Otherwise you need some serious reasons, but never millions of rows in one partition on a single AMP.
Everything would be really slow, e.g. reading all the rows for a given date would read data from a single AMP's disks instead of all disks in the system.
You need a better PI and sample_population_date might be part of it.
Please read the Database Design manual, there's a module on Primry Indexes including the criteria for selecting a PI.
Btw, what is your DBMS background? Oracle, SQL Server, etc.?
Ok, I understand that would be to much for single AMP to handle millions of rows.( I am new to TD so I prefered to ask :-) )
I will make PI distributed as evenly as possible , i.e. using ( userid , campaign_name ) that should be unique combination accross table and partition campaign date to allow partitioning prunning.
When that comes to single AMP query, then is it adviced only for a simple quering using Pirmary Index?
I did Oracle mostly, Postgres(Amazon Redshift), SQL Server (not a big fan)
assume i have a 1 millions records file target table is having columns like(id,name,gender)
gender column we are mentioned PI(NUPI).
source gender data like(Male,Female,Unknown),
how many AMP's storing the 1 million records