I am using the batch insert mode of TDCH to move data from Hadoop to Teradata. The job runs for some time but then fails with "Transaction ABORTed due to deadlock". The data set is fairly large, about 10M rows. With 3 mappers the job hits this error.
I ran the same job with 1 mapper and it succeeded. Is there any option by which I can stop the deadlocks from happening?
TDCH version: 1.4.3. TD DB version: 15.
What does the activity look like on YARN on your Hadoop cluster? Does the YARN queue you're running this in have enough resources available to spin up 3 containers? Furthermore, does the job make any progress before deadlocking, or does it deadlock right away?
The deadlock only happens when multiple sessions are inserting rows with the same PI (primary index) value. Batch INSERT statements take write locks at the RowHash level, and every row with a given PI value hashes to the same RowHash. So when two sessions each hold some RowHash locks and then both wait on a RowHash lock the other already holds, Teradata resolves the circular wait by aborting one transaction as a deadlock. In other words, you most likely have rows with duplicate PI values spread across your sessions.
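To make the contention concrete, here is a minimal sketch in plain Python. The 10-bucket CRC32 hash is a stand-in for Teradata's real RowHash algorithm, and the mapper batches and PI values are made up; the point is only that duplicate PI values from different mappers land on the same lock:

```python
import zlib

# Toy model of RowHash contention: each row's primary index (PI) value
# hashes to a bucket, and a batch INSERT needs a write lock per bucket.
# Bucket count and hash function are stand-ins, not Teradata's algorithm.
NUM_BUCKETS = 10

def row_hash(pi_value):
    # Deterministic stand-in for hashing a PI value to a RowHash.
    return zlib.crc32(pi_value.encode()) % NUM_BUCKETS

# Two TDCH mapper sessions, each with a batch of rows keyed by PI value.
mapper_1 = ["cust_42", "cust_17"]
mapper_2 = ["cust_42", "cust_99"]   # duplicate PI "cust_42"

buckets_1 = {row_hash(pi) for pi in mapper_1}
buckets_2 = {row_hash(pi) for pi in mapper_2}

# Buckets both sessions must write-lock. If each session already holds
# other locks while waiting on a shared bucket, that circular wait is
# exactly what Teradata aborts as a deadlock.
contended = buckets_1 & buckets_2
print("contended RowHash buckets:", contended)
```

With a single mapper there is only one session, so no two lock holders can wait on each other, which is why the 1-mapper run succeeded.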
When a fastload method is used instead, multiple sessions can load in parallel without deadlocking.
So please use internal.fastload instead of batch.insert.
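As a sketch, the change is just the -method option on the TDCH export tool. The jar path, JDBC URL, credentials, source path, and table name below are placeholders you would replace with your own; also note that FastLoad-based methods generally require an empty target table:

```shell
# TDCH export from HDFS to Teradata using internal.fastload instead of
# batch.insert. Jar location, URL, credentials, and paths are placeholders.
hadoop jar /usr/lib/tdch/1.4/lib/teradata-connector-1.4.3.jar \
  com.teradata.connector.common.tool.ConnectorExportTool \
  -url jdbc:teradata://tdhost/database=mydb \
  -username myuser \
  -password mypass \
  -jobtype hdfs \
  -fileformat textfile \
  -sourcepaths /user/hive/warehouse/my_table \
  -targettable my_table \
  -method internal.fastload \
  -nummappers 3
```

internal.fastload funnels all mappers into a single FastLoad job on the database side, so the per-RowHash locking of individual INSERTs no longer applies.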