Teradata Useful Information Q/A for freshers and Expert Levels

Database
Enthusiast

Teradata Useful Information Q/A for freshers and Expert Levels

What is Teradata & how does it differ from other conventional Systems?

Teradata is a relational database that can store billions of rows and petabytes (1 petabyte=1000 terabytes) of data. The architecture of the system makes it possible and provides the flexibility to access & process the data in a faster way.

It differs from other conventional database systems from its architecture to the processing speed. Easy scalability and its fault tolerance keep the demand high for this system.

Please follow the post i ll be updating the information on regular basis you can post your comments i ll be answering here

27 REPLIES
Enthusiast

Re: Teradata Useful Information Q/A for freshers and Expert Levels

What are the main components of Teradata System?

Teradata has 3 main components which do wonder to the world of data management & Storage.

It has

1. PE (Parsing Engine) : Acts as a gate keeper to the Teradata Systems and manages all sessions, interprets the SQL statements for any errors, manages the access rights for the user, defines a least expensive optimizer plan for the query to execute and sends the request to AMP via Bynet.

2. Message Passing Layer (Bynet)  : Carries messages between the AMPs and PEs, provides Point-to-Point and Broadcast communications, Merging answer sets back to the PE and Making Teradata parallelism possible

3. AMP (Access Module Processor) : AMP is the heart of Teradata which does most of the operations for data storage and retrieval. It also takes care of  finding the rows requested, Lock management of the tables and rows, Sorting rows, Aggregating columns, Join processing etc.

Enthusiast

Re: Teradata Useful Information Q/A for freshers and Expert Levels

What are Primary Index and Primary Key in Teradata?

Unlike other database systems, Teradata distributes the data based on PI (Primary Index). PI is defined at the time of table creation and database automatically takes the first column as the PI if the PI is not mentioned explicitly.

Since the data distribution is based on PI, it is wise to choose a PI that evenly distributes the data among the AMP. 

For example, if Table A has two columns like below and we have 5 AMPs in the System.

ID            Gender

1              Male

2              Male

3              Male

4              Male

5              Female

If we choose ID as PI, since the values are distinct all 5 rows are distributed evenly across all 5 AMPs. But if GENDER has been chosen as PI , we have only 2 distinct values and data will be stored in only 2 AMPS leaving other 3 AMPS empty and idle.

Note: Same Value of PI will be stored in the same AMP.

Primary Key is a concept that uniquely identifies a particular row of a table.

Enthusiast

Re: Teradata Useful Information Q/A for freshers and Expert Levels

What are the types of PI (Primary Index) in Teradata?

There are two types of Primary Index. Unique Primary Index ( UPI) and Non Unique Primary Index (NUPI). By default, NUPI is created when the table is created. Unique keyword has to be explicitly given when UPI has to be created.

UPI will slower the performance sometimes as for each and every row , uniqueness of the column value has to be checked and it is an additional overhead to the system but the distribution of data will be even.

Care should be taken while choosing a NUPI so that the distribution of data is almost even . UPI/NUPI decision should be taken based on the data and its usage.

Enthusiast

Re: Teradata Useful Information Q/A for freshers and Expert Levels

How to Choose Primary Index(PI) in Teradata?

Choosing a Primary Index is based on Data Distribution and Join frequency of the Column. If a Column is used for joining most of the tables then it is wise to choose the column as PI candidate.

For example, We have an Employee table with EMPID and DEPTID and this table needs to be joined to the Department Table based on DEPTID.

It is not a wise decision to choose DEPTID as the PI of the employee table. Reason being, employee table will have thousands of employees whereas number of departments in a company will be less than 100. So choosing EMPID will have better performance in terms of distribution.

Enthusiast

Re: Teradata Useful Information Q/A for freshers and Expert Levels

How the data is distributed among AMPs based on PI in Teradata?

•       Assume a row is to be inserted into a Teradata table

•       The Primary Index Value for the Row is put into the Hash Algorithm

•       The output is a 32-bit Row Hash

•       The Row Hash points to a bucket in the Hash Map.The first 16 bits of the Row Hash of is used to locate a bucket in the Hash Map

•       The bucket points to a specific AMP

•       The row along with the Row Hash are delivered to that AMP

When the AMP receives a row it will place the row into the proper table, and the AMP checks if it has any other rows in the table with the same row hash. If this is the first row with this particular row hash the AMP will assign a 32-bit uniqueness value of 1. If this is the second row hash with that particular row hash, the AMP will assign a uniqueness value of 2. The 32-bit row hash and the 32-bit uniqueness value make up the 64-bit Row ID. The Row ID is how tables are sorted on an AMP.

This uniqueness value is useful in case of NUPI's to distinguish each BUPI value.

Both UPI and NUPI is always a One AMP operation as the same values will be stores in same

Enthusiast

Re: Teradata Useful Information Q/A for freshers and Expert Levels

How Teradata retrieves a row?

For example, a user runs a query looking for information on Employee ID 100. The PE sees that the Primary Index Value EMP is used in the SQL WHERE clause.

Because this is a Primary Index access operation, the PE knows this is a one AMP operation. The PE hashes 100 and the Row Hash points to a bucket in the Hash Map that represents AMP X. AMP X  is sent a message to get the Row Hash and make sure it’s EMP 100.

Enthusiast

Re: Teradata Useful Information Q/A for freshers and Expert Levels

What are Secondary Indexes (SI) , types of SI and disadvantages of Secondary Indexes in Teradata?

Secondary Indexes provide another path to access data. Teradata allows up to 32 secondary indexes per table. Keep in mind; row distribution of records does not occur when secondary indexes are defined. The value of secondary indexes is that they reside in a subtable and are stored on all AMPs, which is very different from how the primary indexes (part of base table) are stored. Keep in mind that Secondary Indexes (when defined) do take up additional space.

Secondary Indexes are frequently used in a WHERE clause. The Secondary Index can be changed or dropped at any time. However, because of the overhead for index maintenance, it is recommended that index values should not be frequently changed.

There are two different types of Secondary Indexes, Unique Secondary Index (USI), and Non-Unique Secondary Index (NUSI). Unique Secondary Indexes are extremely efficient. A USI is considered a two-AMP operation. One AMP is utilized to access the USI subtable row (in the Secondary Index subtable) that references the actual data row, which resides on the second AMP.

A Non-Unique Secondary Index is an All-AMP operation and will usually require a spool file. Although a NUSI is an All-AMP operation, it is faster than a full table scan.

Secondary indexes can be useful for:

•       Satisfying complex conditions

•       Processing aggregates

•       Value comparisons

•       Matching character combinations

•       Joining tables

Enthusiast

Re: Teradata Useful Information Q/A for freshers and Expert Levels

How are the data distributed in Secondary Index Subtables in Teradata?

When a user creates a Secondary Index, Teradata automatically creates a Secondary Index Subtable. The subtable will contain the:

•       Secondary Index Value

•       Secondary Index Row ID

•       Primary Index Row ID

When a user writes an SQL query that has an SI in the WHERE clause, the Parsing Engine will Hash the Secondary Index Value. The output is the Row Hash, which points to a bucket in the Hash Map.

That bucket contains an AMP number and the Parsing Engine then knows which AMP contains the Secondary Index Subtable pertaining to the requested USI information. 

The PE will direct the chosen AMP to look-up the Row Hash in the Subtable. The AMP will check to see if the Row Hash exists in the Subtable and double check the subtable row with the actual secondary index value. Then, the AMP will pass the Primary Index Row ID back up the BYNET network. This request is directed to the AMP with the base table row, which is then easily retrieved.

Enthusiast

Re: Teradata Useful Information Q/A for freshers and Expert Levels

What are the types of JOINs available in Teradata?

Types of JOINs are  :  Inner Join, Outer Join (Left, Right, Full), Self Join, Cross Join and Cartesian Joins.

The key things to know about Teradata and Joins

•       Each AMP holds a portion of a table.

•       Teradata uses the Primary Index to distribute the rows among the AMPs.

•       Each AMP keeps their tables separated from other tables like someone might keep clothes in a dresser drawer.

•       Each AMP sorts their tables by Row ID.

•       For a JOIN to take place the two rows being joined must find a way to get to the same AMP.

•       If the rows to be joined are not on the same AMP, Teradata will either redistribute the data or duplicate the data in spool to make that happen.