Teradata Useful Information Q/A for freshers and Expert Levels

Database
Enthusiast

Re: Teradata Useful Information Q/A for freshers and Expert Levels

What are the types of Join Strategies available in Teradata?

Join Strategies are used by the optimizer to choose the best plan to join tables based on the given join condition.

•       Merge (Exclusion)

•       Nested

•       Row Hash

•       Product (including Cartesian Product joins)

There are different types of merge join strategies available. But in general , while joining two tables the data will be redistributed or duplicated across all AMPs to make sure joining rows are in the same AMPs.

If the two tables are joined based on PI, no redistribution/duplication will happen as the rows will be in the same AMP and performance will be better.  If one table PI is used and Other table PI not used, redistribution/duplication of the table will happen based on the table size.In these cases Secondary Indexes will be helpful.

Enthusiast

Re: Teradata Useful Information Q/A for freshers and Expert Levels

Explain types of re-distribution of data happening for joining of columns from two tables in Teradata?

Case 1 - P.I = P.I joins

Case 2 - P.I = non Index joins

Case 3 - non Index = non Index joins

Case1 - there is no redistribution of data over amp's. Since amp local joins happen as data are present in same AMP and need not be re-distributed. These types of joins on unique primary index are very fast.

Case2 - data from second table will be re-distributed on all amps since joins are happening on PI vs. non Index column. Ideal scenario is when small table is redistributed to be joined with large table records on same amp

case3 - data from both the tables are redistributed on all AMPs. This is one of the longest processing queries; Care should be taken to see that stats are collected on these columns

Enthusiast

Re: Teradata Useful Information Q/A for freshers and Expert Levels

What is Partitioned Primary Index (PPI) in Teradata?

Partitioned primary index is physically splitting the table into a series of subtables, one for every partitioning value. When a single row is accessed, it looks first at the partitioning value to determine the subtable, then at the primary index to calculate the rowhash for the row(s).

For example, we have PPI on a MONTH Column, the rows of particular months are all sorted with in the same partition and whenever data is accessed for particular month, it will retrive the data in a faster way.

It helps to avoid full table scans.

Enthusiast

Re: Teradata Useful Information Q/A for freshers and Expert Levels

What are the advantages and disadvantages of PPI in Teradata?

Advantages:

·      Range queries don’t have to utilize a Full Table Scan.

·      Deletions of entire partitions are lightning fast.

·      PPI provides an excellent solution instead of using Secondary Indexes

·      Tables that hold yearly information don’t have to be split into 12 smaller tables to avoid Full Table Scans (FTS). This can make modeling and querying easier.

·      Fastload and Multiload work with PPI tables, but not with all Secondary Indexes.

Disadvantages:

·      A two-byte Partition number is added to the ROW-ID and it is now called a ROW KEY. The two-bytes per row will add more Perm Space to a table.

·      Joins to Non-Partitioned Tables can take longer and become more complicated for Teradata to perform.

·      Basic select queries utilizing the Primary Index can take longer if the Partition number is not also mentioned in the WHERE clause of the query.

·      You can’t have a Unique Primary Index (UPI) if the Partition Number is not at least part of the Primary Index. You must therefore create a Unique Secondary Index to maintain uniqueness.

Enthusiast

Re: Teradata Useful Information Q/A for freshers and Expert Levels

Volatile and Global Temporary Tables in Teradata?

Volatile tables are temporary tables that are materialized in spool and are unknown to the Data Dictionary.

A volatile table may be utilized multiple times and in more than one SQL statement throughout the life of a session. This feature allows for additional queries to utilize the same rows in the temporary table without requiring the rows to be rebuilt.

Volatile tables are local to session and the tables are dropped once the session is disconnected.

ON COMMIT PRESERVE ROWS option should be mentioned at the time of table creation. It means that at the end of a transaction, the rows in the volatile table will not be deleted. The information in the table remains for the entire session. Users can ask questions to the volatile table until they log off. Then the table and data go away.

Global Temporary Tables are similar to volatile tables in that they are local to a user’s session. However, when the table is created, the definition is stored in the Data Dictionary. In addition, these tables are materialized in a permanent area known as Temporary Space. Because of these reasons, global tables can survive a system restart and the table definition will not discarded at the end of the session. However, when a system restarts, the rows inside the Global Temporary Table will be removed. Lastly, Global tables require no spool space. They use Temp Space.

Statistics can be collected in both of the tables in TD13 Version. Previously Collecting Stats on Volatile tables are not allowed.

Enthusiast

Re: Teradata Useful Information Q/A for freshers and Expert Levels

SubQuery and Correlated Subquery in teradata?

Sub queries and Correlated Sub queries are two important concepts in Teradata and used most of the times.

The basic concept behind a subquery is that it retrieves a list of values that are used for comparison against one or more columns in the main query. Here the subquery is executed first and based on the result set, the main query will be executed.

For example,

Select empname,deptname from employee where empid IN ( select empid from salarytable where salary>10000).

In the above query, empid will be choosen first based on the salary in the subquery and main query will be executed based on the result subset.

Correlated Subquery is an excellent technique to use when there is a need to determine which rows to SELECT based on one or more values from another table.It combines subquery processing and Join processing into a single request.

It first reads a row from the main query and then goes into the subquery to find the rows that match the specified column value.Then it goes for the next row from the main query. This process continues untill all the qualifying rows from MAIN query.

For example,

select empname,deptno, salary

from employeetable as emp

where

salary=(select max(salary) from employeetable as emt where emt.deptno=emp.deptno)

Above query returns the highest paid employee from each department. This is also one of the scenario based questions in teradata.

Enthusiast

Re: Teradata Useful Information Q/A for freshers and Expert Levels

How to calculate the tablesize , database size  and free space left in a database in teradata?

DBC.TABLESIZE and DBC.DISKSPACE are the systems tables used to find the space occupied.

Below Query gives the table size of each tables in the database and it will be useful to find the big tables in case of any space recovery.

SELECT         DATABASENAME,

TABLENAME,

SUM(CURRENTPERM/(1024*1024*1024)) AS "TABLE SIZE"

FROM

DBC.TABLESIZE

WHERE

DATABASENAME = <'DATABASE_NAME'> AND TABLENAME = <'TABLE_NAME'>

GROUP BY 1,2;

Below query gives the total space and free space available in a database.

SELECT

        DATABASENAME DATABASE_NAME,

        SUM(MAXPERM)/(1024*1024*1024) TOTAL_PERM_SPACE,

        SUM(CURRENTPERM)/(1024*1024*1024) CURRENT_PERM_SPACE,

        TOTAL_PERM_SPACE-CURRENT_PERM_SPACE as FREE_SPACE

 FROM

        DBC.DISKSPACE

 WHERE

        DATABASENAME = <'DATABASE_NAME'>

group by 1;

Enthusiast

Re: Teradata Useful Information Q/A for freshers and Expert Levels

What are the Performance improvement techniques available in Teradata?

First of all use EXPLAIN plan to see how the query is performing. Keywords like Product joins, low confidence are measures of poor performance.

Make Sure, STATS are collected on the columns used in WHERE Clause and JOIN columns. If STATS are collected , explain plan will show HIGH CONFIDENCE This tells the optimizer about the number of rows in that table which will help the optimizer to choose the redistribution/duplication of smaller tables.

Check the joining columns & WHERE Clause whether PI, SI or PPI are used.

Check whether proper alias names are used in the joining conditions.

Split the queries into smaller subsets in case of poor performance.

Enthusiast

Re: Teradata Useful Information Q/A for freshers and Expert Levels

What does” Pseudo Table” Locks mean in EXPLAIN Plan in Teradata?


It is a false lock which is applied on the table to prevent  two users from getting conflicting locks with all-AMP requests.

PE will determine an particular AMP to manage all AMP LOCK requests for given table and Put Pseudo lock on the table.

Enthusiast

Re: Teradata Useful Information Q/A for freshers and Expert Levels

Can you compress a column which is already present in table using ALTER in Teradata?

No,   We cannot use ALTER command to compress the existing columns in the table.

A new table structure has to be created which includes the Compression values and data should be inserted into Compress column table.

Please note - ALTER can be used only to add new  columns with compression values to table.