Re: complex text file manipulation

From: William Park (opengeometry_at_yahoo.ca)
Date: 07/28/05


Date: Wed, 27 Jul 2005 18:21:00 -0400

In <comp.unix.shell> oraustin@hotmail.com wrote:
> I've got a real puzzler here (well for me at least) - it's accounting
> data that's provided as a text file with fixed with fields -
> the problem is one invoice is split over multiple lines.
> To view the following line-wrap needs to be off and the font spacing
> needs to even.
> ------------------------------------------------------------------------------------------------------------------------------------
> REG. REG VENDOR'S NR. VENDOR INVOICE INVOICE
> INVOICE NET VAT % DESCR.
> DATE sEQ. NR VENDOR'S NAME VAT REG. NR. DATE NR.
> AMOUNT AMOUNT AMOUNT EXEMP.
> ------------------------------------------------------------------------------------------------------------------------------------
> 30/01/04 0000125 1149 00488160011 30/12/03 23091533
> 320.14 266.78 53.36 20.00
> MARLENI & ROADSI P.A. 0000138784
> 30/01/04 0000126 1683 00777280157 31/12/03 8300030488
> 1,550.80 812.18 81.22 10.00
> NERFFE' ITALIANA SPA 0000138785
> 547.83 109.57 20.00
>
> 1.00 0.00 0.00 ART 7
> 30/01/04 0000127 2098 02411210582 31/12/03 01138
> 68.00 61.82 6.18 10.00
> CEGGI GEGGDO-CERBFTTO DE 0000138786
>
> The eventual goal is to load this data into a database so I'd like to
> convert this data to a CSV (or equivalent).
> Each new invoice begins with dd/mm/yy, the name of the vendor occurs on
> the next row, I need to sum the net amount column for each
> invoice (the end of which is marked by the occurence of a new line
> begining with a date)
>
> My question is qhat tool/s should I be using to crack th is one and
> could anyone help me get going please.
> I know it's a lot to ask. But believe me I'm going to be do this stuff
> day in day out so it won't be long before I can contribute
> to the newsgroup as well as request.
> Thanks Oliver

NET AMOUNT is 8th field, so easy enough. But, complication is that
    - you have empty line in the middle of 2nd record
    - vendor's name has spaces
    - number has comma (1,550.80)

First, I assume you want CSV that looks something like

    30/01/04,..............,266.78,...,MARLENI & ROADSI P.A.,...
    30/01/04,...,"1,550.80",812.18,...,NERFFE' ITALIANA SPA,...
    30/01/04,..............,61.82,...,CEGGI GEGGDO-CERBFTTO DE,...

Assuming your input file is 'file.in', I would do

    rm xx*
    csplit -z file.in '/^[0-9][0-9]/' '{*}'
    for i in xx*; do
        out=()
        while read -A; do
            case $1 in
                [A-Za-z]*) set -- "${*:1:$#-1}" "${!#}" ;;
            esac
            pp_append -a out "${@|.csvquote}"
        done < $i
        echo "${out[*]|,,}"
    done > file.out

The output file 'file.out' is 3 lines, like

    30/01/04,0000125,1149,00488160011,30/12/03,23091533,320.14,266.78,53.36,20.00,
        MARLENI & ROADSI P.A.,0000138784
    30/01/04,0000126,1683,00777280157,31/12/03,8300030488,"1,550.80",812.18,81.22,10.00,
        NERFFE' ITALIANA SPA,0000138785,547.83,109.57,20.00,1.00,0.00,0.00,ART,7
    30/01/04,0000127,2098,02411210582,31/12/03,01138,68.00,61.82,6.18,10.00,
        CEGGI GEGGDO-CERBFTTO DE,0000138786

Then, you can extract 8th field and add them up,

    while read -A -C; do
        echo $8
    done < file.out | awk '{sum += $1} END {print sum}'

Now, to run this on Window natively, you obviously need to compile and
install Bash and other Linux tools. <www.cygwin.com> is the only way I
know, but have no personal experience. Even if you can compile and
install Linux stuff on Windows, I would recommend setting up a Linux
box, and

    - share a partition with Windows box. That way, Linux can do the
      data processing, and Windows can use the result.

    - put the script into /etc/inetd.conf. It works like web server.
      You connect to a port in Linux box, and send your data. The
      script will run, and spit out the result back to you.

    - put the script into ~/.procmailrc. It works like spam filtering.
      You send email to Linux box, and the script will send the result
      back as email.

-- 
William Park <opengeometry@yahoo.ca>, Toronto, Canada
ThinFlash: Linux thin-client on USB key (flash) drive
	   http://home.eol.ca/~parkw/thinflash.html
BashDiff: Super Bash shell
	  http://freshmeat.net/projects/bashdiff/
-- 
William Park <opengeometry@yahoo.ca>, Toronto, Canada
ThinFlash: Linux thin-client on USB key (flash) drive
	   http://home.eol.ca/~parkw/thinflash.html
BashDiff: Super Bash shell
	  http://freshmeat.net/projects/bashdiff/


Relevant Pages

  • Re: Simply Linux Invoice Program
    ... c, c++ and java. ... > I'm looking for a simple invoice program that runs in linux. ...
    (comp.programming)
  • Re: Simply Linux Invoice Program
    ... > suitable program for my needs. ... > I'm looking for a simple invoice program that runs in linux. ... to use this as learning experience, ...
    (comp.programming)
  • Re: Simply Linux Invoice Program?
    ... > After several weeks of searching, I haven't been able to find a ... > suitable program for my needs. ... > I'm looking for a simple invoice program that runs in linux. ...
    (alt.os.linux)
  • Simply Linux Invoice Program
    ... After several weeks of searching, I haven't been able to find a ... suitable program for my needs. ... I'm looking for a simple invoice program that runs in linux. ...
    (comp.programming)