Re: complex text file manipulation
From: William Park (opengeometry_at_yahoo.ca)
Date: 07/28/05
- Next message: Chris F.A. Johnson: "Re: complex text file manipulation"
- Previous message: John W. Krahn: "Re: Q: efficient way to locate "bad characters" in a text file"
- In reply to: oraustin_at_hotmail.com: "complex text file manipulation"
- Next in thread: William Park: "Re: complex text file manipulation"
- Reply: William Park: "Re: complex text file manipulation"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Date: Wed, 27 Jul 2005 18:21:00 -0400
In <comp.unix.shell> oraustin@hotmail.com wrote:
> I've got a real puzzler here (well for me at least) - it's accounting
> data that's provided as a text file with fixed with fields -
> the problem is one invoice is split over multiple lines.
> To view the following line-wrap needs to be off and the font spacing
> needs to even.
> ------------------------------------------------------------------------------------------------------------------------------------
> REG. REG VENDOR'S NR. VENDOR INVOICE INVOICE
> INVOICE NET VAT % DESCR.
> DATE sEQ. NR VENDOR'S NAME VAT REG. NR. DATE NR.
> AMOUNT AMOUNT AMOUNT EXEMP.
> ------------------------------------------------------------------------------------------------------------------------------------
> 30/01/04 0000125 1149 00488160011 30/12/03 23091533
> 320.14 266.78 53.36 20.00
> MARLENI & ROADSI P.A. 0000138784
> 30/01/04 0000126 1683 00777280157 31/12/03 8300030488
> 1,550.80 812.18 81.22 10.00
> NERFFE' ITALIANA SPA 0000138785
> 547.83 109.57 20.00
>
> 1.00 0.00 0.00 ART 7
> 30/01/04 0000127 2098 02411210582 31/12/03 01138
> 68.00 61.82 6.18 10.00
> CEGGI GEGGDO-CERBFTTO DE 0000138786
>
> The eventual goal is to load this data into a database so I'd like to
> convert this data to a CSV (or equivalent).
> Each new invoice begins with dd/mm/yy, the name of the vendor occurs on
> the next row, I need to sum the net amount column for each
> invoice (the end of which is marked by the occurence of a new line
> begining with a date)
>
> My question is qhat tool/s should I be using to crack th is one and
> could anyone help me get going please.
> I know it's a lot to ask. But believe me I'm going to be do this stuff
> day in day out so it won't be long before I can contribute
> to the newsgroup as well as request.
> Thanks Oliver
NET AMOUNT is 8th field, so easy enough. But, complication is that
- you have empty line in the middle of 2nd record
- vendor's name has spaces
- number has comma (1,550.80)
First, I assume you want CSV that looks something like
30/01/04,..............,266.78,...,MARLENI & ROADSI P.A.,...
30/01/04,...,"1,550.80",812.18,...,NERFFE' ITALIANA SPA,...
30/01/04,..............,61.82,...,CEGGI GEGGDO-CERBFTTO DE,...
Assuming your input file is 'file.in', I would do
rm xx*
csplit -z file.in '/^[0-9][0-9]/' '{*}'
for i in xx*; do
out=()
while read -A; do
case $1 in
[A-Za-z]*) set -- "${*:1:$#-1}" "${!#}" ;;
esac
pp_append -a out "${@|.csvquote}"
done < $i
echo "${out[*]|,,}"
done > file.out
The output file 'file.out' is 3 lines, like
30/01/04,0000125,1149,00488160011,30/12/03,23091533,320.14,266.78,53.36,20.00,
MARLENI & ROADSI P.A.,0000138784
30/01/04,0000126,1683,00777280157,31/12/03,8300030488,"1,550.80",812.18,81.22,10.00,
NERFFE' ITALIANA SPA,0000138785,547.83,109.57,20.00,1.00,0.00,0.00,ART,7
30/01/04,0000127,2098,02411210582,31/12/03,01138,68.00,61.82,6.18,10.00,
CEGGI GEGGDO-CERBFTTO DE,0000138786
Then, you can extract 8th field and add them up,
while read -A -C; do
echo $8
done < file.out | awk '{sum += $1} END {print sum}'
Now, to run this on Window natively, you obviously need to compile and
install Bash and other Linux tools. <www.cygwin.com> is the only way I
know, but have no personal experience. Even if you can compile and
install Linux stuff on Windows, I would recommend setting up a Linux
box, and
- share a partition with Windows box. That way, Linux can do the
data processing, and Windows can use the result.
- put the script into /etc/inetd.conf. It works like web server.
You connect to a port in Linux box, and send your data. The
script will run, and spit out the result back to you.
- put the script into ~/.procmailrc. It works like spam filtering.
You send email to Linux box, and the script will send the result
back as email.
-- William Park <opengeometry@yahoo.ca>, Toronto, Canada ThinFlash: Linux thin-client on USB key (flash) drive http://home.eol.ca/~parkw/thinflash.html BashDiff: Super Bash shell http://freshmeat.net/projects/bashdiff/ -- William Park <opengeometry@yahoo.ca>, Toronto, Canada ThinFlash: Linux thin-client on USB key (flash) drive http://home.eol.ca/~parkw/thinflash.html BashDiff: Super Bash shell http://freshmeat.net/projects/bashdiff/
- Next message: Chris F.A. Johnson: "Re: complex text file manipulation"
- Previous message: John W. Krahn: "Re: Q: efficient way to locate "bad characters" in a text file"
- In reply to: oraustin_at_hotmail.com: "complex text file manipulation"
- Next in thread: William Park: "Re: complex text file manipulation"
- Reply: William Park: "Re: complex text file manipulation"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Relevant Pages
|