My requirement is to move data from a set of tables (average size ~2 TB per table) from Teradata to Hadoop.
Sqoop is taking a lot of resources and time.
An idea occurred to me (it might not be a great one):
Could we extract the data to files in parts using TPT or FastExport, and then load those files into the HDFS cluster with a proper delimiter?
If yes, can I just go AMP-wise?
That is to say, extract using the query below:
SEL * FROM databasename.tablename
WHERE HASHAMP(HASHBUCKET(HASHROW(indexcolumns))) = ampnumber;
(Mine is a 250-AMP architecture.)
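If you go this route, the per-AMP predicates can be generated with a small script. A minimal sketch, assuming the HASHAMP filter above; the table and index-column names are placeholders to adjust for your schema:

```python
# Generate one extract query per AMP for a 250-AMP system.
# TABLE and INDEX_COLS are hypothetical names; replace with your own.
TABLE = "databasename.tablename"
INDEX_COLS = "indexcolumns"
NUM_AMPS = 250

def amp_queries(table=TABLE, index_cols=INDEX_COLS, num_amps=NUM_AMPS):
    """Yield one SELECT per AMP, filtering rows hashed to that AMP."""
    for amp in range(num_amps):
        yield (
            f"SELECT * FROM {table} "
            f"WHERE HASHAMP(HASHBUCKET(HASHROW({index_cols}))) = {amp};"
        )

# Each query can then be fed to a separate TPT/FastExport job.
for q in amp_queries():
    print(q)
```

Note that this only splits the export; whether it actually runs faster depends on how many concurrent utility sessions Teradata allows you.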
Or, if there is any faster way to move data from Teradata to Hadoop, please share it.
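For comparison, Sqoop can also parallelize the split itself rather than going AMP-wise. A sketch of such an invocation; the hostname, credentials, table, and column names are placeholders, and this assumes the Teradata JDBC driver is on the Sqoop classpath:

```shell
# Hypothetical parallel import: one mapper per split on the index column.
# All names here are placeholders for your environment.
sqoop import \
  --connect jdbc:teradata://tdhost/DATABASE=databasename \
  --driver com.teradata.jdbc.TeraDriver \
  --username dbuser -P \
  --table tablename \
  --split-by indexcolumn \
  --num-mappers 250 \
  --fields-terminated-by '|' \
  --target-dir /staging/tablename
```

If it is available to you, the Teradata Connector for Hadoop (TDCH) generally outperforms a plain JDBC-based Sqoop import for bulk extracts of this size.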
Could you share whether the resource bottleneck is on the Teradata side?
If it is on Hadoop, who cares :-)
Can you please provide additional information, such as the following?
Please find the details below.
Is this a one time activity? -- Yes
Are Teradata and Hadoop on the same network? -- I believe it's on the Hadoop side.
What kind of Teradata extraction tool are you using? (Any throttles on utilities?) -- Plain Sqoop, which runs an export.
What bottlenecks are you seeing? -- Slowness in loading; the Teradata end is quick and responsive when we write directly to a file.
Is it during the extraction phase or the load into HDFS? -- Mainly at the loading end.
Are you loading into any Hive Schema? -- Yes
Any Hive repository issues? -- No, it's an empty system, dedicated to us.
Please share your DBS/HDP versions -- Teradata is 15.1, HDP is the latest Hortonworks release with the tap2 framework.
Agreed, it's smooth at the Teradata end. I just wanted to see if we can move the data faster, as Sqoop is going to be time-consuming.
I suggest you engage both a Hadoop admin (not sure Hadoop DBAs exist) and the Teradata DBAs to troubleshoot this issue.
One thing that comes to mind is to remove replication in Hadoop, i.e. set the replication factor to 1.
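The replication factor can be set per command at write time, or lowered on an already-loaded directory. A sketch; the file and directory paths are placeholders:

```shell
# Write with a single replica to cut write amplification during the bulk load.
# The -D dfs.replication=1 override applies only to this command's writes.
hdfs dfs -D dfs.replication=1 -put table_part_000.dat /staging/tablename/

# Or lower replication on data already in HDFS (-w waits for completion).
hdfs dfs -setrep -w 1 /staging/tablename
```

Remember to restore replication afterwards (e.g. `-setrep 3`), since a single replica gives you no fault tolerance for the loaded data.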