Transform data set


Transform data set

[font=Verdana]Hi Guys,

I was wondering if someone could give advice as to whether the following can be done.

I have sets of data that look like this:

Access Number 7582244A
Attribute 1 1312487
Attribute 2 John Harris'
Attribute 3 025527423'
Attribute 4 English
Attribute 5 Balance Inquiry
Access Number 25564641b
Attribute 1 326489
Attribute 2 Amy Adam
Attribute 3 03025554
Attribute 4 Spanish
Attribute 5 Card Activation
Access Number 4854344v
Attribute 1 125656552
Attribute 2 Xyz Holding
Attribute 3 063255655
Attribute 4 English
Attribute 5 Balance Inquiry

: the data is only available in wordpad and generated direct from source system. This is daily transaction data and average data size per day >1gb.

Need to transform the data into:
[size=4]Access Number Attribute 1 Attribute 2 Attribute 3 Attribute 4 Attribute 5
7582244A 1312487 John Harris 025527423' English Balance Inquiry
25564641b 326489 Amy Adam 03025554 Spanish Card Activation
4854344v 125656552 Xyz Holding 063255655 English Balance Inquiry[/size]

Can this be done?

Thanx all.[/font]

Re: Transform data set

The following two scripts assume that your lines are all in the correct sequence - an Access Number followed by 5 Attribute values.
It first changes the "Access Number" header to a newline, removes the "Attribute 1" etc headers using sed.
It pipes the result to awk, taking each group of 6 values (one per line) and transforms them to a single line.

I assumed your input data is in indata.txt; output goes to outdata.txt. I put a "~" between each value to allow loadiing from a separated file - like csv, but I used "~". (Any character can be used but the character should not be present in any of your fields.)

SETUP - One off

cat > transform.sed << EOF
1,$s/Access Number //
1,$s/Attribute . //

cat > transform.awk << EOF
{print $1,"~",$2,"~",$3,"~",$4,"~",$5,"~",$6}


sed -f transform.sed indata.txt | awk -f transform.awk > outdata.txt

If your data layout cannot be guaranteed, or if you are not on Linux/ Unix, a small perl program could also do the job.

Re: Transform data set

If you are on windows or if you'd like to be platform independet you can download perl from and run a perl script to do the job:

#!/usr/bin/perl -w
use strict;
my ($col_value, @row, $out_line);

#read each row of input from STDIN
while (<>) {
if ( m/\A #match for start of row ...
\w+\s\w+\s #... followed by two words
(.*) #capture the rest into $1
/xms ) {
if (scalar @row == 5) { #read 6 col_values (array is zero-indexed)
$out_line = join "~", @row; #separate the 6 col_values by "~"
print "$out_line"; #print line to STDOUT
undef(@row); #reset array for the next rows
else {
push @row, $col_value; #add col_value to the array