Blog

The best minds from Teradata, our partners, and customers blog about relevant topics and features.

turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

04-29-2012
08:46 PM

- Subscribe to RSS Feed
- Mark as New
- Mark as Read
- Bookmark
- Subscribe
- Email to a Friend
- Printer Friendly Page
- Report Inappropriate Content

04-29-2012
08:46 PM

A quintillion is 10 raised to the power of 18 (that is, 1 followed by 18 zeros). 9.2 quintillion (or more precisely, 9,223,372,036,854,775,807) is the new limit in Teradata Database 14.0 for the number of combined partitions. Note that this is the same as the maximum BIGINT value (which is a signed 8-byte integer). While tables that are defined with 65535 or less combined partitions still use 2-byte partitioning, if the number of defined combined partitions exceeds 65535 for a table, the table is created with 8-byte partitioning. 8-byte partitioning allows up to 9.2 quintillion combined partitions to be defined for the table.

With 8-byte partitioning, up to 62 levels of partitioning can be defined as compared to the 15 levels for 2-byte partitioning.

For 8-byte partitioning, the system-derived columns PARTITION and PARTITION#L*n* have BIGINT data type rather than the INTEGER data type for 2-byte partitioning (in either case, *n *can be between 1 and 62).

The row size increases by 8 bytes for 8-byte partitioning as compared to not partitioning and row size increases by 4 or 6 bytes for 8-byte partitioning as compared to 2-byte partitioning. Rowid references to an 8-byte partitioned object in spool, secondary indexes, join indexes, hash indexes grow 6 bytes over a 2-byte partitioned base object.

**Example 1: **Non-nullable DATE Range

CREATE TABLE t1 (a INT, d1 DATE NOT NULL, ...)

PRIMARY INDEX (a)

PARTITION BY RANGE_N(d1 BETWEEN DATE '0001-01-01'

AND DATE '9999-12-31'

EACH INTERVAL '1' DAY);

This defines 3,652,059 row partitions (the maximum number of ranges that can be defined for DATE). There is no need for a NO RANGE, UNKNOWN, or NO RANGE OR UNKNOWN partition since d1 is specified as NOT NULL (and, therefore, the test value cannot be null) and all the possible values of d1 fall within the specified ranges. In this case, it is recommended not to specify these partitions.

**Example 2: **Nullable DATE Range

CREATE TABLE t2 (a INT, d2 DATE, ...)

PRIMARY INDEX (a)

PARTITION BY RANGE_N(d2 BETWEEN DATE '0001-01-01'

AND DATE '9999-12-31'

EACH INTERVAL '1' DAY,

UNKNOWN);

This defines 3,652,060 row partitions (the maximum number of useful row partitions that can be defined for DATE). Note that a NO RANGE partition could be defined such that there are 3,652,061 partitions defined. However, the NO RANGE partition would always be empty since the ranges in the above RANGE_N cover all possible values for a DATE column. Therefore, there is no need to specify a NO RANGE partition (and it is recommended not to specify one).

**Example 3: **Non-nullable BIGINT Range

CREATE TABLE t3 (a INT, b1 BIGINT NOT NULL, ...)

PRIMARY INDEX (a)

PARTITION BY

RANGE_N(b1 BETWEEN 1

AND 9223372036854775805

EACH 1)

This defines 9,223,372,036,854,775,805 row partitions (the maximum number of *ranges* allowed*)*. Two partitions are reserved for NO RANGE and UNKNOWN (or NO RANGE OR UNKNOWN). There is no need for an UNKNOWN or NO RANGE OR UNKNOWN partition since b1 is specified as NOT NULL and, therefore, the test value cannot be null. A NO RANGE row partition may or may not be desirable depending on whether out of range values should be allowed in the table or not.

**Example 4: **Nullable BIGINT Range

CREATE TABLE t4 (a INT, b2 BIGINT, ...)

PRIMARY INDEX (a)

PARTITION BY

RANGE_N(b2 BETWEEN 1

AND 9223372036854775805

EACH 1,

UNKNOWN)

This defines 9,223,372,036,854,775,806 row partitions (the maximum number of ranges + 1 for the UNKNOWN partition). One partition is reserved for the NO RANGE partition. If b2 (that is, the test value) is null for a row, the row goes in the UNKNOWN partition. If b2 (that is, the test value) is not null and has a value that is not in the specified ranges, an error occurs since a NO RANGE partition is not defined. A NO RANGE partition could be added or the UNKNOWN partition replaced with a NO RANGE OR UNKNOWN partition to allow row with out of range values to have a partition in which to go. Such a partition may or may not be desirable depending on whether out of range values should be allowed in the table or not.

**Example 5: **Multilevel Row Partitioning

CREATE TABLE t5 (a INT, ts1 TIMESTAMP(3) WITH TIME ZONE NOT NULL, productid INT NOT NULL, ...)

PRIMARY INDEX (a)

PARTITION BY (

RANGE_N(ts1 BETWEEN TIMESTAMP '1950-01-01 00:00:00.000+00:00'

AND TIMESTAMP '2100-12-31 23:23:59.999+00:00'

EACH INTERVAL '1' DAY),

RANGE_N(productid BETWEEN 1 AND 100000 EACH 1) );

This defines 5,515,200,000 combined partitions with 55,152 row partitions for the 1^{st} partitioning level 100,000 row partitions for the 2^{nd} partitioning level. There is no need for an UNKNOWN or NO RANGE OR UNKNOWN partition since ts1 is specified as NOT NULL and, therefore, the test value cannot be null. A NO RANGE partition may or may not be desirable for each of the partitioning levels depending on whether out of range values should be allowed in the table or not. When partitioning with TIMESTAMP it is recommended to use TIMESTAMP WITH TIME ZONE.

With the large number of combined partitions that can be defined, it is expected that a high percentage of combined partitions will be empty. Also, it is recommended for multilevel partitioning that a nonempty combined partition on an AMP should consist of 10 or more data blocks. If nonempty combined partitions are less than 10 data blocks, the table may be over partitioned and you may need to consider a different partitioning (fewer levels of partitioning or fewer partitions per level) or no partitioning. For example on a 200 AMP system with a 200 gigabyte table defining 100,000 combined partitions[1] and with 100 rows per data block and 100 data blocks per each nonempty combined partition per AMP, about 99% of the defined 100,000 combined partitions would be empty. If all 100,000 combined partitions on each AMP were each nonempty with 100 rows per data block and 100 data blocks, there would be 200 billion rows; if each row was 100 bytes, the primary data would be 20 petabytes. Most likely then it is not the case that every combined partition would be populated. In other words, if considering a multidimensional use of multilevel partitioning, not all combinations of dimension values would actually occur. In some cases, you may find that 65535 combined partitions is plenty and you only need to use 2-byte partitioning.

[1] This is a relatively small number of combined partitions compared to the limit of 9,223,372,036,854,775,807. That is, this defines only about 0.0000000000011% of the possible combined partitions that could be defined.

With a large number of combined partitions for a primary-indexed table or join index, primary index access, joins, and aggregations may be expensive unless there is effective row partition elimination such that only a small number of nonempty combined partitions are accessed. If such operations are too expensive, other plans may be chosen and the primary index may be only usable for distribution of rows to the AMPs. Sliding-window joins may not be feasible since performance degrades if the number of partitions in a window is much less than the number of nonempty, noneliminated partitions that need to be joined. Rowkey-based merge joins, if there are many levels of partitioning, may not be feasible; this type of join requires an equality condition between all the partitioning columns and all the primary index columns of the primary index of the two tables.

Along with the new limit, TD 14.0 includes other partitioning enhancements such as direct RANGE_N partitioning on TIMESTAMP or BIGINT, ADD option, etc. which I'll discuss in subsequent blog entries. And of course, TD 14.0 includes column partitioning as part of the Teradata Columnar capabilities (see my earlier blog entry on Teradata Columnar).

- Tags:
- database
- partitioning

7 Comments

You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.