TPT 14.10 output to named pipe and then gzip to final files

Enthusiast

I tried to write TPT output to named pipes and then use gzip to compress each stream into its final file:

gzip < /var/tmp/fact_abc_segment.fifo-1 > ~/data/fact_abc_segment.1.fastload.gz

gzip < /var/tmp/fact_abc_segment.fifo-8 > ~/data/fact_abc_segment.8.fastload.gz

EXPORT_OPERATOR: connecting sessions
EXPORT_OPERATOR: The RDBMS retryable error code list was not found
EXPORT_OPERATOR: The job will use its internal retryable error codes
FILE_WRITER: TPT19424 pmGetPos failed. Request unsupported by Access Module (24)
FILE_WRITER: TPT19424 pmGetPos failed. Request unsupported by Access Module (24)
FILE_WRITER: TPT19307 Fatal error checkpointing data.
FILE_WRITER: TPT19424 pmGetPos failed. Request unsupported by Access Module (24)
FILE_WRITER: TPT19307 Fatal error checkpointing data.
FILE_WRITER: TPT19307 Fatal error checkpointing data.
FILE_WRITER: TPT19424 pmGetPos failed. Request unsupported by Access Module (24)
FILE_WRITER: TPT19003 TPT Exit code set to 12.

FILE_WRITER: TPT19003 ECI operator ID: FILE_WRITER-6640
!WARNING! FIFO pipe detected
**** 16:50:28 Starting to send rows to file '/var/tmp/fact_abc_segment.fifo-1'
FILE_WRITER: TPT19222 Operator instance 1 processing file '/var/tmp/fact_abc_segment.fifo-1'.
!WARNING! FIFO pipe detected
**** 16:50:28 Starting to send rows to file '/var/tmp/fact_abc_segment.fifo-2'
!WARNING! FIFO pipe detected
FILE_WRITER: TPT19222 Operator instance 2 processing file '/var/tmp/fact_abc_segment.fifo-2'.
!WARNING! FIFO pipe detected
!WARNING! FIFO pipe detected

CheckPoint No. 1 started.
MAIN_STEP INSERT_1[0001] Success FILE_WRITER 8 1 CHECKPOINT-Started 16:50:30 0.0000 0.0000 65000 0 0 0 6
0 0 N Y
MAIN_STEP INSERT_1[0002] Success FILE_WRITER 8 2 CHECKPOINT-Started 16:50:30 0.0000 0.0000 65000 0 0 0 6
0 0 N Y
!ERROR! Can't position pipe
MAIN_STEP INSERT_1[0003] Success FILE_WRITER 8 3 CHECKPOINT-Started 16:50:30 0.0000 0.0000 65000 0 0 0 6
0 0 N Y
!ERROR! Can't position pipe
NOT returning position for file=/var/tmp/fact_abc_segment.fifo-1
!ERROR! Can't position pipe
MAIN_STEP INSERT_1[0004] Success FILE_WRITER 8 4 CHECKPOINT-Started 16:50:30 0.0000 0.0000 65000 0 0 0 6
0 0 N Y
NOT returning position for file=/var/tmp/fact_abc_segment.fifo-2

I keep getting the "pmGetPos" error.

Enthusiast

Re: TPT 14.10 output to named pipe and then gzip to final files

TPT_INFRA: TPT03720: Error: Checkpoint command failed with 48
!ERROR! Can't position pipe
NOT returning position for file=/var/tmp/fact_abc_segment.fifo-5
FILE_WRITER: TPT19424 pmGetPos failed. Request unsupported by Access Module (24)
FILE_WRITER: TPT19307 Fatal error checkpointing data.
NOT returning position for file=/var/tmp/fact_abc_segment.fifo-7
FILE_WRITER: TPT19424 pmGetPos failed. Request unsupported by Access Module (24)
!ERROR! 'Request unsupported by Access Module'
NOT returning position for file=/var/tmp/fact_abc_segment.fifo-8
!ERROR! pmGetPos rc=24

Here is the wrapper shell script:

# Locate the TPT installation unless TWB_ROOT is already set.
if [[ -z "$TWB_ROOT" ]]
then
    TWB_ROOT=$(find /opt/teradata/client -type d -name tbuild | sort -n | head -1)
    # Export so that tbuild, a child process, can see the library path.
    export LD_LIBRARY_PATH=$TWB_ROOT/lib
fi

data_path=/mnt/n001/temp/FACT_ABC_SEGMENT
data_date=$(date +'%Y%m%d%H')
fifo_path=/var/tmp
file_count=8

# Create one named pipe per writer instance and start a background gzip on each.
for i in $(seq 1 "$file_count"); do
    fifo_pipe=$fifo_path/fact_abc_segment.fifo-${i}
    [[ -p "$fifo_pipe" ]] || mkfifo "$fifo_pipe"
    gzip -3 < "$fifo_pipe" > "${data_path}/fact_abc_segment.${i}.fastload.gz" &
done

echo "Ready to call TPT"
tbuild -z 60 -h 128M -C \
    -f ~/tpt/export.fact_abc_segment.sql \
    -v ~/tpt/export.fact_abc_segment.jobvars \
    -u "DataFileCount=${file_count}" \
    fmcs-$data_date
Teradata Employee

Re: TPT 14.10 output to named pipe and then gzip to final files

What are you trying to accomplish here?

TPT can both read and write gzip files.

Might be easier to do that than to use pipes.

-- SteveF
Enthusiast

Re: TPT 14.10 output to named pipe and then gzip to final files

Oh, I have an old wrapper shell script that compresses TPT output. We recently upgraded to TTU 14.10, and it no longer works.

How do I write to *.gz directly? I tried specifying FileName = xxx.gz, but gzip cannot read the output file.

$ cat tpt/export.fact_abc_segment.jobvars 
UserName='tpt_reader'
UserPassword='tpt_password'
TdpId='tddev'
TechnicalSubjectArea='segment'
DataFilePath='/mnt/data/fastload_local/FACT_ABC_SEGMENT'
DataFileName='fact_abc_segment.gz'

SourceTableName='FACT.FACT_ABC_SEGMENT'

The output files look like this:

$ ls -1
fact_abc_segment.gz-1
fact_abc_segment.gz-2
fact_abc_segment.gz-3
fact_abc_segment.gz-4
fact_abc_segment.gz-5
fact_abc_segment.gz-6
fact_abc_segment.gz-7
fact_abc_segment.gz-8

$ gzip -d -c fact_abc_segment.gz-2

gzip: fact_abc_segment.gz-2: not in gzip format

The file content looks like an uncompressed FastLoad-formatted indicator file:

$ hexdump -C -n 128 fact_abc_segment.gz-1
00000000 12 00 00 dd 42 11 00 e5 4b f0 03 00 00 00 00 7a |....B...K......z|
00000010 00 00 00 04 0a 12 00 00 dd 42 11 00 32 8e 2d 12 |.........B..2.-.|
00000020 00 00 00 00 7a 00 00 00 04 0a 12 00 00 dd 42 11 |....z.........B.|
00000030 00 99 cd 8b 12 00 00 00 00 7a 00 00 00 04 0a 12 |.........z......|
00000040 00 00 dd 42 11 00 c6 d6 38 01 00 00 00 00 7a 00 |...B....8.....z.|
00000050 00 00 04 0a 12 00 00 dd 42 11 00 3e 6f de 03 00 |........B..>o...|
00000060 00 00 00 7a 00 00 00 04 0a 12 00 00 dd 42 11 00 |...z.........B..|
00000070 7e e5 f7 03 00 00 00 00 7a 00 00 00 04 0a 12 00 |~.......z.......|
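
A quick way to confirm this without decompressing anything is to look at the first two bytes: every gzip stream starts with the magic bytes 1f 8b, while the dump above starts with 12 00. The following is a minimal sketch of that check. The function name is_gzip and the demo file names are made up for illustration and are not part of TPT or of the original job.

```shell
#!/usr/bin/env bash
set -eu

# Succeed only if the file starts with the gzip magic bytes 0x1f 0x8b.
is_gzip() {
    local magic
    magic=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}

# Self-contained demo on throwaway files (hypothetical names).
demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
printf 'hello\n' | gzip > "$demo_dir/real.gz"
# Same first four bytes as the hexdump above (12 00 00 dd), written in octal.
printf '\022\000\000\335' > "$demo_dir/fake.gz-1"

for f in "$demo_dir"/*; do
    if is_gzip "$f"; then
        echo "$(basename "$f"): gzip"
    else
        echo "$(basename "$f"): not gzip"
    fi
done
```

Run against the real export directory, the same loop would flag each fact_abc_segment.gz-N file individually.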
Enthusiast

Re: TPT 14.10 output to named pipe and then gzip to final files

Hi Steven,

The original goal is to generate *.gz files directly, to save disk space and reduce network transfer overhead later. Typically, the *.gz files are 3 to 5 times smaller than the plain FastLoad output.

Here is the TPT template:

USING CHARACTER SET UTF8
DEFINE JOB EXPORT_TO_FASTLOAD_FORMAT
DESCRIPTION 'Export from ' || @SourceTableName || ' to the INDICDATA file: ' || @DataFileName
(
    DEFINE SCHEMA DATA_FILE_SCHEMA
    (
        "DATE_ID" IntDate,
        "MEMBER_ID" BigInt,
        "CUSTOM_SEGMENT_ID" Int,
        "PRIORITY" ByteInt
    );

    DEFINE OPERATOR EXPORT_OPERATOR
    TYPE EXPORT
    SCHEMA DATA_FILE_SCHEMA
    ATTRIBUTES
    (
        VARCHAR PrivateLogName = @SourceTableName || '_log',
        VARCHAR TdpId = @TdpId,
        VARCHAR UserName = @UserName,
        VARCHAR UserPassword = @UserPassword,
        VARCHAR QueryBandSessInfo = 'Action=TPT_EXPORT; Format=Fastload;',
        VARCHAR SpoolMode = 'noSpool',
        INTEGER MaxDecimalDigits = 18,
        VARCHAR DateForm = 'INTEGERDATE',
        VARCHAR SelectStmt = 'select * from ' || @SourceTableName
    );

    DEFINE OPERATOR FILE_WRITER
    TYPE DATACONNECTOR CONSUMER
    SCHEMA *
    ATTRIBUTES
    (
        VARCHAR PrivateLogName = 'indicdata_writor_log',
        VARCHAR DirectoryPath = @DataFilePath,
        VARCHAR FileName = @DataFileName,
        VARCHAR Format = 'Formatted',
        VARCHAR OpenMode = 'Write',
        VARCHAR IndicatorMode = 'Y'
    );

    APPLY TO OPERATOR (FILE_WRITER[@DataFileCount])
    SELECT * FROM OPERATOR (EXPORT_OPERATOR[@NumOfReader]);
);

Thank you so much for the help!

Enthusiast

Re: TPT 14.10 output to named pipe and then gzip to final files

It seems that a gzip file is generated if I write to a single output file with a ".gz" file extension.

But I need to generate multiple *.gz files, in order to:

  • utilize multiple CPU cores to compress the data stream: gzip is quite CPU-intensive, and a single gzip process can easily use up 100% of a core
  • load more efficiently with TPT's multiple readers: with multiple *.gz files, the readers don't all have to read from the beginning of the same big *.gz file

TPT used to allow writing to named pipes on Linux, so we could launch multiple gzip processes in the background to compress the data stream.
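
For readers unfamiliar with the pattern, here is a minimal, self-contained sketch of it. It assumes a stand-in producer (seq) in place of TPT, and the scratch directory, file names, and stream count are hypothetical. It only shows the plumbing: one named pipe and one background gzip per stream.

```shell
#!/usr/bin/env bash
set -eu

work_dir=$(mktemp -d)            # hypothetical scratch directory
trap 'rm -rf "$work_dir"' EXIT
stream_count=4                   # assumed number of parallel streams

pids=""
for i in $(seq 1 "$stream_count"); do
    mkfifo "$work_dir/stream.fifo-$i"
    # Each gzip blocks until a writer opens its pipe, then compresses in its own process.
    gzip -3 < "$work_dir/stream.fifo-$i" > "$work_dir/stream.$i.gz" &
    pids="$pids $!"
done

# Stand-in producer: the export job would write here; this emits 1000 lines per pipe.
for i in $(seq 1 "$stream_count"); do
    seq 1 1000 > "$work_dir/stream.fifo-$i"
done

# Wait for every compressor to see end-of-file and exit before using the output.
for pid in $pids; do
    wait "$pid"
done

for i in $(seq 1 "$stream_count"); do
    echo "stream $i: $(gzip -dc "$work_dir/stream.$i.gz" | wc -l) lines"
done
```

Note that a pipe is a pure byte stream with no seekable position, which is consistent with the "Can't position pipe" messages in the log when a checkpoint is taken.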

Enthusiast

Re: TPT 14.10 output to named pipe and then gzip to final files

┌nmon─12c──────[H for help]───Hostname=xxxx-xxxxx───Refresh= 2secs ───21:03.35─────────────────────────────────────────┐
│ CPU Utilisation ─────────────────────────────────────────────────────────────────────────────────────────────────────│
│ +-------------------------------------------------+ │
│CPU User% Sys% Wait% Idle|0 |25 |50 |75 100| │
│ 1 1.0 6.1 0.5 92.4|sss > │
│ 2 2.0 2.0 3.0 93.1|W > │
│ 3 0.5 0.5 0.5 98.5| > │
│ 4 1.0 2.0 0.0 97.0| > │
│ 5 0.0 0.6 57.4 42.0|WWWWWWWWWWWWWWWWWWWWWWWWWWWW > │
│ 6 3.0 1.0 4.5 91.5|UWW > │
│ 7 0.0 0.0 0.0 100.0| > │
│ 8 1.0 0.5 0.0 98.5| > │
│ 9 100.0 0.0 0.0 0.0|UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU> │
│10 2.5 0.5 0.0 97.0|U > │
│11 0.0 0.0 0.0 100.0| > │
│12 0.5 0.0 0.0 99.5| > │
│13 0.5 0.5 0.0 99.0| > │
│14 3.0 0.5 96.5 0.0|UWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW> │
│15 1.2 2.5 1.2 95.1|s > │
│16 5.2 5.2 2.1 87.6|UUssW > │
│17 0.0 1.1 0.0 98.9| > | │
│18 3.0 4.0 1.5 91.6|Us > │
│19 0.0 0.5 0.0 99.5| > | │
│20 3.0 1.5 0.0 95.5|U > │
│21 0.0 0.0 0.0 100.0| > │
│22 0.0 1.6 0.0 98.4| > | │
│23 0.0 0.0 0.0 100.0| > | │
│24 0.0 0.0 0.0 100.0| > | │
│ +-------------------------------------------------+ │
│Avg 4.8 1.1 7.7 86.5|UUWWW > | │
│ +-------------------------------------------------+ │
│ Network I/O ─────────────────────────────────────────────────────────────────────────────────────────────────────────│
│I/F Name Recv=KB/s Trans=KB/s packin packout insize outsize Peak->Recv Trans │
│ lo 0.0 0.0 0.0 0.0 0.0 0.0 14.7 14.7 │
│ eth0 26250.0 83880.6 46598.8 59032.9 576.8 1455.0 749912.6 290677.8 │
│ eth1 0.1 0.1 1.5 0.5 60.0 179.0 1.8 0.1 │
│ bond0 26250.1 83880.7 46600.3 59033.4 576.8 1455.0 749914.4 290677.8 │
│──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────│

The nmon output above shows the client machine while TPT is writing to a single *.gz file without using a named pipe. One of the cores is maxed out, but the Teradata server could pump much more data to the client machine.

Teradata Employee

Re: TPT 14.10 output to named pipe and then gzip to final files

You can specify multiple instances of the DataConnector operator (as the file writer) and use the -C option on the command line. TPT will round-robin the data to each instance and each instance will write to its own file.

This will help you generate multiple .gz files (all should be roughly the same size).
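
As a rough illustration of why round-robin distribution yields files of similar size, the sketch below deals sample rows to several gzip processes in turn. It is not how TPT is implemented: dealing row by row is an assumption made for the demo (the thread does not say what unit TPT distributes), and the directory, file names, and writer count are hypothetical.

```shell
#!/usr/bin/env bash
set -eu

out_dir=$(mktemp -d)             # hypothetical scratch directory
trap 'rm -rf "$out_dir"' EXIT
writers=3                        # assumed number of writer instances

# Deal 9000 sample rows to the writers in turn; each writer is its own gzip process.
seq 1 9000 | awk -v n="$writers" -v dir="$out_dir" '
{
    part = (NR - 1) % n + 1
    if (!(part in cmd))
        cmd[part] = "gzip -3 > \"" dir "/part." part ".gz\""
    print | cmd[part]
}
END { for (p in cmd) close(cmd[p]) }'

# Report how many rows and how many compressed bytes each writer produced.
for f in "$out_dir"/part.*.gz; do
    echo "$(basename "$f"): $(gzip -dc "$f" | wc -l) rows, $(wc -c < "$f") bytes"
done
```

Each part receives the same number of rows, so the compressed sizes differ only by how well each slice happens to compress.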

-- SteveF
Enthusiast

Re: TPT 14.10 output to named pipe and then gzip to final files

Hi Steven,

When I use "APPLY TO OPERATOR (FILE_WRITER[@DataFileCount])" with -C to generate multiple output files, every file gets a ***.gz-n extension, but the actual file contents are uncompressed Formatted FastLoad data rather than compressed Formatted FastLoad data.

However, when I output to a single file, the content is indeed compressed with the gzip codec.

Is this a bug or intended behavior?

Teradata Employee

Re: TPT 14.10 output to named pipe and then gzip to final files

Yup. Found a bug.

The DC operator is looking at the .gz-x extension and not the .gz extension (prior to appending the instance number).

Plus, putting the instance number on the file extension means the user would have to rename their files prior to loading. Not ideal. We will have to fix that too.

Thanks for your patience.

-- SteveF
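
Until a fix is available, one possible interim workaround is to compress the mislabeled files after the export and move the instance number in front of the extension. The sketch below assumes the *.gz-N files are uncompressed, as observed earlier in the thread. The fixed_name helper and the name-N.gz naming scheme are hypothetical choices for illustration, not the naming the eventual fix will use.

```shell
#!/usr/bin/env bash
set -eu

# Map "name.gz-N" to "name-N.gz" so that ".gz" is the real extension again.
fixed_name() {
    local base=${1%.gz-*}        # strip the trailing ".gz-N"
    local n=${1##*.gz-}          # keep only the instance number
    printf '%s-%s.gz\n' "$base" "$n"
}

# Self-contained demo: uncompressed stand-ins with the misleading extension.
demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
for i in 1 2 3; do
    printf 'row %s\n' "$i" > "$demo_dir/fact_demo.gz-$i"
done

# One background gzip per file; remove the original only if compression succeeded.
for f in "$demo_dir"/*.gz-*; do
    ( gzip -3 < "$f" > "$(fixed_name "$f")" && rm -- "$f" ) &
done
wait

ls "$demo_dir"
```

This still uses several cores for compression, but unlike the named-pipe approach it needs enough disk space to hold the uncompressed files temporarily.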