How to read all data from a large table with minimum locking

Senior Apprentice

Re: How to read all data from a large table with minimum locking

Hi,

 

Just to clarify a couple of points that you've raised - basically they are not 'problems'.

1) Locking. You (quite correctly) are thinking about the locking impact of your processing on other processing. However, if you use the LOCKING ... FOR ACCESS modifier that 'problem' goes away. The only normal processing that would get blocked while you hold an access lock is DDL (ALTER TABLE, CREATE INDEX, etc.).
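For example, the modifier is just prefixed to the request - the database and table names here are made up:

LOCKING TABLE mydb.big_table FOR ACCESS
SELECT *
FROM mydb.big_table;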

2) Yes, you can get a list of indexed columns from a system table, and that query will be very stable across releases of TD. There will be changes as you go up releases, but those tend to be minor. For instance, retrieving the basic information about the columns in indexes looks something like this:

select databasename
      ,tablename
      ,columnname
      ,indexnumber
      ,columnposition
      ,indextype
from dbc.indicesv
where ...;

That query will probably run on all releases of TD from 12 to 16.2. And even when you do have to change it, it is very easy to design your application to handle changes to system tables based on the TD version.
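For example, you can pick the right variant of the query at run time by checking the version first - just a sketch, your application would branch on the returned string:

select infodata
from dbc.dbcinfov
where infokey = 'VERSION';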

Cheers,

Dave

 

Ward Analytics Ltd - information in motion
www: http://www.ward-analytics.com
Junior Contributor

Re: How to read all data from a large table with minimum locking

If this is an important task and your main concern is running out of spool, simply ask your DBA to increase your spool.
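The DBA-side change is a one-liner, something like this (user name and size made up):

MODIFY USER my_user AS SPOOL = 500e9 BYTES; -- roughly 500 GB of spool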

 

Overall resource usage will be way lower when you scan the table only once vs. multiple/lots of accesses.

Enthusiast

Re: How to read all data from a large table with minimum locking


dnoeth wrote:

If this an important task and your main concern is running out of spool simply ask your DBA to increase your spool.


It's not an important task. I can certainly request the spool to be increased, but it definitely can't be big enough to accommodate every single table - some tables may be huge. So some form of pagination is necessary anyway. And since it is required, I'd rather make the queries return smaller sets so that 1) we don't need too much memory on the server side for spool and 2) the data flow is not held up for too long.

 


dnoeth wrote:

 

Overall resource usage will be way lower when you scan the table only once vs. multiple/lots of accesses.


What I'm concerned about now is the spool space. I don't care much about CPU/network/RAM usage. Not sure if that's justified though. I'll probably need to check.

Enthusiast

Re: How to read all data from a large table with minimum locking

1. Create a table to hold all the primary index column values:

CREATE TABLE AGG_PRIMARY_INDEX AS
( SELECT <PI column(s)> AS PI_VAL
        ,COUNT(*) AS ROW_CNT
  FROM <source table>
  GROUP BY <PI column(s)>
) WITH DATA;

2. Alter AGG_PRIMARY_INDEX to add one more column, COMMENT. Keep it VARCHAR(20).
3. Shell scripting:
Iterate through AGG_PRIMARY_INDEX and call a BTEQ export with PI_VAL as the parameter.
Use ACCESS locking on the table while you SELECT.
Use PI_VAL in the WHERE clause.
(A rough BTEQ sketch follows the comments below.)
4. With every loop, update the COMMENT column of AGG_PRIMARY_INDEX with the value COMPLETE.

Comments:
1. You will not end up locking the table for others (locking using ACCESS).
2. The SELECT will be fast as it is on the PI.
3. The export will be in parts. (You can even load a newly created table with partitioning.)
4. With each loop you will have a log of how much data you have actually exported.
5. In case of failure at any stage you can restart from the point of failure.
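The per-iteration BTEQ part could look roughly like this - just a minimal sketch: the logon, database, table, column and file names are invented, the PI is assumed to be a single column, export_status stands in for the COMMENT column from step 2, and the driver shell script would substitute the current PI_VAL and output file name each time round the loop:

.LOGON mytdpid/myuser,mypassword

/* Export the rows for one PI value under an access lock */
.EXPORT DATA FILE = big_table_part_0001.dat

LOCKING TABLE mydb.big_table FOR ACCESS
SELECT *
FROM mydb.big_table
WHERE pi_col = '<current PI_VAL>';

.EXPORT RESET

/* Step 4: mark this slice as done in the driver table */
UPDATE AGG_PRIMARY_INDEX
SET export_status = 'COMPLETE'
WHERE PI_VAL = '<current PI_VAL>';

.LOGOFF
.QUIT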

Junior Contributor

Re: How to read all data from a large table with minimum locking


m4r3k wrote:

... some tables may be huge. So some form of pagination is necessary anyway. And since it is required, I'd rather make the queries return smaller sets so that 1) we don't need too much memory on the server side for spool and 2) the data flow is not held up for too long.

 


You're probably not going to return all rows to the client, so the spool containing the answer set will be small anyway; don't worry too much about intermediate spool. Processing large amounts of data is what Teradata is used for.

 

Can you describe exactly what you're actually trying to do?

Teradata Employee

Re: How to read all data from a large table with minimum locking

If it is desired to always read all rows and all columns of the table, then see the NO SPOOL option of the Export operator in TPT. This will not spool the source table, just pass it back to the operator. The result can then be landed into a file or written to a pipe. It cannot be used if there are selection criteria, aggregation or order by, it must be a simple read of the entire table.
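A minimal TPT job using that option might look roughly like this - purely a sketch: the system, credentials, table and two-column schema are all invented, and SpoolMode is the Export operator attribute I believe controls spooling, so check the attribute list for your TPT release:

DEFINE JOB export_whole_table
DESCRIPTION 'Full-table export without spooling the source table'
(
  DEFINE SCHEMA big_table_schema
  (
    col1 INTEGER,
    col2 VARCHAR(50)
  );

  /* Producer: reads the whole table and passes rows straight to the consumer */
  DEFINE OPERATOR export_op
  TYPE EXPORT
  SCHEMA big_table_schema
  ATTRIBUTES
  (
    VARCHAR TdpId        = 'my_td_system',
    VARCHAR UserName     = 'my_user',
    VARCHAR UserPassword = 'my_password',
    VARCHAR SpoolMode    = 'NoSpool',
    VARCHAR SelectStmt   = 'SELECT * FROM mydb.big_table;'
  );

  /* Consumer: lands the rows in a file (could also be a named pipe) */
  DEFINE OPERATOR file_writer
  TYPE DATACONNECTOR CONSUMER
  SCHEMA *
  ATTRIBUTES
  (
    VARCHAR FileName = 'big_table.dat',
    VARCHAR Format   = 'Formatted',
    VARCHAR OpenMode = 'Write'
  );

  APPLY TO OPERATOR (file_writer)
  SELECT * FROM OPERATOR (export_op);
);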

 

It would be very good to seriously consider whether the logic desired can be implemented in SQL statements and pushed down to the database engine. Teradata's greatest strength is in applying complex operations to large data sets, returning only the answers needed by the application. Exporting all the data to perform the operations outside does not make use of the strength of the technology that has been implemented.
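For example, if the client only needed per-customer totals, something like this (table and column names are made up) returns a small answer set instead of all the raw rows:

LOCKING TABLE mydb.big_table FOR ACCESS
SELECT customer_id
      ,COUNT(*)    AS txn_cnt
      ,SUM(amount) AS total_amount
FROM mydb.big_table
GROUP BY customer_id;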

Enthusiast

Re: How to read all data from a large table with minimum locking


dnoeth wrote: 

You're probably not going to return all rows to the client, so the spool containing the answer set will be small anyway; don't worry too much about intermediate spool. Processing large amounts of data is what Teradata is used for.


In fact I want to return all rows to the client. 

 


Can you describe exactly what you're actually trying to do?


I roughly described the problem in this post: http://community.teradata.com/t5/Database/How-to-read-all-data-from-a-large-table-with-minimum-locki...

Enthusiast

Re: How to read all data from a large table with minimum locking


ToddAWalter wrote:

If it is desired to always read all rows and all columns of the table, then see the NO SPOOL option of the Export operator in TPT. This will not spool the source table, just pass it back to the operator. The result can then be landed into a file or written to a pipe. It cannot be used if there are selection criteria, aggregation or order by, it must be a simple read of the entire table.


I looked a bit at TPT and FastExport. In this article I found that the "NO SPOOL" option in FastExport is only for small result sets, while I need huge results. I'm not sure if those constraints also apply to TPT, but that's what I presume.

I also could not find any detailed information on how TPT or FastExport deal with locking. There is a LOCKING modifier which ensures that the locks are kept until all data is moved to the spool. But I presume that even if no LOCKING modifier is applied, the query uses read locks. I'm not sure if there is a way of disabling that (leaving only an access lock on the rows).

Junior Contributor

Re: How to read all data from a large table with minimum locking


m4r3k wrote: 

In fact I want to return all rows to the client. 


And what is this client supposed to do with those rows?

Process it better/faster/nicer than a database system?

 

You're trying to reinvent the wheel.

Enthusiast

Re: How to read all data from a large table with minimum locking

dnoeth wrote:

Process it better/faster/nicer than a database system?

Such responses do not really help me.

I don't know how it's going to be processed. Processing is not the subject of this issue. All I need is to read the content of the whole table/database as raw (binary or textual) data.

If it were possible to implement it on the Teradata side with SQL or a UDF I would certainly do that. But it's not possible.