Read consistency compared to Oracle

Fan


Dear all, this is my first post. I have worked primarily on Oracle for 20 years and, very recently, DB2.

Oracle offers a high level of read consistency: writers don't block readers, readers don't block writers, and readers never see uncommitted data (i.e. no "dirty reads").

So if I were to do this in Oracle:

Session 1:
CREATE TABLE emp(empno NUMBER, deptno NUMBER, sal NUMBER);
INSERT INTO emp VALUES (1, 10, 1000);
COMMIT;
INSERT INTO emp VALUES (2, 10, 1000);

--Now I do not commit and start a new session in parallel:
Session 2:
SELECT SUM(sal) FROM emp WHERE deptno=10;
SUM(sal)
--------
1000
--Here I only get to see the committed data and the ongoing write does not block the read


Now I return to session 1:

INSERT INTO emp VALUES (3, 10, 1000);
--The active read session does not block the write

 

And return to session 2:
SELECT SUM(sal) FROM emp WHERE deptno=10;
SUM(sal)
--------
1000
--I still get to see only the committed data and the ongoing write does not block the read

 

Can I achieve the same results in Teradata? Do readers block writers (or vice versa), and when they don't, will I see dirty data?

I do not have access to a Teradata instance, so I would be grateful if somebody could try this and confirm.

Cheers

3 REPLIES
Senior Apprentice

Re: Read consistency compared to Oracle

Hi,

 

As with so many things (particularly in Teradata), "it depends"...

 

Firstly, Teradata supports two transaction modes, 'Teradata' (also called BTET) and ANSI, and within 'Teradata mode' there is a further subdivision.
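For reference, the session mode is chosen at logon time. In BTEQ it might look like this (a sketch; 'mysystem' and 'myuser' are placeholders):

```sql
-- Choose the mode before logging on (BTET = Teradata mode, ANSI = ANSI mode).
-- The setting applies to the whole session.
.SET SESSION TRANSACTION BTET;
.LOGON mysystem/myuser;
```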

 

Some basic rules apply across all of the above:

- an 'active reader' (i.e. an ongoing transaction that has placed a read lock on an object) will block a different transaction that wishes to place a write lock on the same object.

- vice-versa is also true

- if a reader wants to access an object which is currently locked by a writer then it can use a 'dirty read' (aka an 'access lock' in Teradata terminology).
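As a sketch, the 'access lock' is requested with a LOCKING modifier placed in front of the query (a row-level variant also exists):

```sql
-- Read with an access lock: does not wait for, and is not blocked by, write locks.
LOCKING TABLE emp FOR ACCESS
SELECT SUM(sal) FROM emp WHERE deptno = 10;
```

Bear in mind that an access-lock read will happily return rows from an uncommitted transaction, hence 'dirty read'.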

 

And it may vary depending on what version of TD you're using because of a specific feature - which I'll come to later.

 

However let's start with default (and probably the most common) transaction mode, Teradata.

 

When running in Teradata mode, by default each SQL request is a transaction by itself. So, taking your example from above, what would happen is:

 

Session 1:
CREATE TABLE emp(empno NUMBER, deptno NUMBER, sal NUMBER);

(the database will commit at this point)
INSERT INTO emp VALUES (1, 10, 1000);
(the dbms will commit at this point)

INSERT INTO emp VALUES (2, 10, 1000);
(the dbms will commit at this point)

--Now I do not commit and start a new session in parallel: (as noted, in Teradata BY DEFAULT this is already committed by the dbms)

 

Session 2:
SELECT SUM(sal) FROM emp WHERE deptno=10;
SUM(sal)
--------
2000   <<< different answer because all data is committed, there are no locks on table 'emp', all data is seen by the reader.

 

Now start changing things:

Session 1:

CREATE TABLE emp(empno NUMBER, deptno NUMBER, sal NUMBER);

(the database will commit at this point)

BEGIN TRANSACTION; (or usually BT; for short)

INSERT INTO emp VALUES (1, 10, 1000);

INSERT INTO emp VALUES (2, 10, 1000);

Assuming that session 1 is still logged on, the transaction is still active, the locks are still in place:

Session 2:

SELECT SUM(sal) FROM emp WHERE deptno=10;

- this query will block because in Teradata a writer (the txn in session#1) will block a reader.

 

If the query from Session 2 was changed to the following:

LOCKING TABLE emp FOR ACCESS

SELECT SUM(sal) FROM emp WHERE deptno=10;

The query will run and will return an answer of 2000 - from all rows.

 

FYI - the 'BT;' command starts what is referred to as an explicit transaction, which can be terminated cleanly by an 'ET;' command. This is the functional equivalent of COMMIT; for Teradata-mode explicit transactions.
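So a complete explicit transaction in Teradata mode would be bracketed like this (a sketch of the pattern above):

```sql
BT;                                   -- BEGIN TRANSACTION: locks accumulate from here
INSERT INTO emp VALUES (1, 10, 1000);
INSERT INTO emp VALUES (2, 10, 1000);
ET;                                   -- END TRANSACTION: commits and releases the locks
```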

 

The alternative to 'Teradata mode' is 'ANSI mode', but the base rules about what data is available to a reader are the same. What changes (for this discussion) is when a txn (I got bored writing 'transaction' the whole time!) starts and when it ends.

 

So using your earlier example:

Session 1:

CREATE TABLE emp(empno NUMBER, deptno NUMBER, sal NUMBER);

COMMIT;

 

INSERT INTO emp VALUES (1, 10, 1000);


INSERT INTO emp VALUES (2, 10, 1000);

At this point the txn is still active and therefore the rows are locked. They can only be read by using the 'dirty read' mechanism described above.

FYI - from a syntax perspective the 'dirty read' can be specified by using 'LOCKING ... FOR ACCESS' or by 'SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL READ UNCOMMITTED'. The latter lets you set it once for your session, whereas the 'LOCKING' modifier has to be placed on each request.
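For example, to make every subsequent read in the session a 'dirty read' (a sketch, reusing the emp table from above):

```sql
-- Applies to the rest of the session, so no per-request LOCKING modifier is needed:
SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

SELECT SUM(sal) FROM emp WHERE deptno = 10;   -- now runs as an access-lock read
```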

 

So the above are the basics for locking between transactions in a Teradata environment. Now we get to the bit that is relatively new.

 

In Teradata 15.10 a new feature was introduced called 'load isolation' - which I think is effectively what you're looking for here.

  • With this feature, the reader does not get blocked by the writer and the reader only sees the committed data.
  • The table has to be defined as a 'load isolated' table.
  • The writer and reader SQL syntax has to specify that this is 'load isolated' writing and reading.
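As a rough sketch of what that looks like (from memory - please check the 15.10 manuals for the exact options on your release):

```sql
-- The table is created as load isolated:
CREATE TABLE emp,
     WITH CONCURRENT ISOLATED LOADING FOR ALL
(empno NUMBER, deptno NUMBER, sal NUMBER);

-- A reader then asks for committed rows only; it is not blocked by an
-- in-flight isolated load and never sees its uncommitted rows:
LOCKING TABLE emp FOR LOAD COMMITTED
SELECT SUM(sal) FROM emp WHERE deptno = 10;
```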

Does that help ?

 

Cheers,

Dave

Ward Analytics Ltd - information in motion
www: http://www.ward-analytics.com
Fan

Re: Read consistency compared to Oracle

That's a great reply, thanks.

 

Now I need to figure out how this can all fit together in Informatica, where ETL jobs will be reading changes from the source systems and writing them to the FSLDM EDW, and other ETL jobs will be reading changes from the FSLDM EDW and writing them to the data marts, all whilst the users are running reports/queries on the data marts.

 

So the objective is to achieve a real (or near real) time data warehouse.

 

An Active Data Warehouse allegedly solves this problem, although I could not find any literature on the inner workings of the architecture. I suppose it works on the basis of load isolation?

Senior Apprentice

Re: Read consistency compared to Oracle

As I said, 'load isolation' is relatively new to the Teradata world - and I suspect that not many customer sites are using it.

 

Traditionally, most customers that I've worked with haven't worried too much about this 'dirty read' aspect (one or two have, but most haven't). I think that is largely because historically the data was loaded overnight and end users queried the data during the day, so there was no overlap of processing.

 

One customer where we did have the overlap resolved the issue by (effectively) implementing 'load isolation' manually. They had a 'run number' column in the relevant tables, and all access was through views which would only allow access to data from previous 'run numbers'.
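In other words, something along these lines (all object and column names here are hypothetical):

```sql
-- run_control records the last fully loaded run; the view only exposes
-- rows from runs that have completed, so in-flight loads are invisible.
REPLACE VIEW v_emp AS
SELECT empno, deptno, sal
FROM   emp_hist
WHERE  run_no <= (SELECT MAX(completed_run_no) FROM run_control);
```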

 

You might want to talk to your Teradata account team; they will have a better understanding of your requirements and processing than I do and can probably steer you in the right direction.

 

Cheers,

Dave

Ward Analytics Ltd - information in motion
www: http://www.ward-analytics.com