Re: Parse irregular data, dump into delimited text file
From: William James (w_a_x_man_at_yahoo.com)
Date: 11/27/05
- Next message: Rocky Jr.: "Re: ksh under Cygwin"
- Previous message: Dan Mercer: "Re: ksh under Cygwin"
- In reply to: d_at_rren.cymraeg.org: "Parse irregular data, dump into delimited text file"
- Next in thread: d_at_rren.cymraeg.org: "Re: Parse irregular data, dump into delimited text file"
- Reply: d_at_rren.cymraeg.org: "Re: Parse irregular data, dump into delimited text file"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Date: 27 Nov 2005 13:52:39 -0800
d@rren.cymraeg.org wrote:
> I've been given an MS Word document containing information to input into
> a database. I've knocked it into shape using various Unix tools, e.g.
> sed, cut, etc., so now I have data in a plain-text file like this:
>
> name|address|postcode|telephone
>
> The address field contains data in an irregular form, e.g.:
> 12, the high street, town, place, biggerplace
> The vicarage, town place
>
> I need to be able to format address field above ready for importing
> into another database. In this new database, I have 3 fields for
> address (address1, address2, address3).
>
> So my problem is how to cut this address data and then put it back in a
> text file with a delimiter. address3 should contain only one word;
> however, address1 and address2 may contain more than one word. When
> filling in the fields, data should be added from left to right, i.e. in
> the order address1, then address2, then address3. If an address field
> is left blank, that is not a problem, as it will be handled by the
> mailmerge software.
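Since fields may be blank (the question mentions blank address fields), splitting needs a little care: Ruby's String#split drops trailing empty fields unless it is given a negative limit. A minimal sketch, assuming records follow the name|address|postcode|telephone layout quoted above; the `FIELDS` constant, the `parse_record` method, and the sample record are hypothetical, for illustration only:

```ruby
# Hypothetical helper: parse one pipe-delimited record into named fields.
# Field names follow the name|address|postcode|telephone layout.
FIELDS = %i[name address postcode telephone].freeze

def parse_record(line)
  # A limit of -1 keeps trailing empty fields instead of dropping them.
  values = line.chomp.split("|", -1)
  FIELDS.zip(values).to_h
end

record = "name1|12, high street||"  # hypothetical: postcode and telephone blank
p record.split("|")      # => ["name1", "12, high street"]
p record.split("|", -1)  # => ["name1", "12, high street", "", ""]
p parse_record(record)
```

Without the -1, a record with a blank last field would come back one or more columns short, which shifts every later field when the line is re-joined.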
This input:
name1|12, high street, town, place, biggerplace|postcode1|telephone1
name2|The vicarage, town, place|postcode2|telephone2
name3|The vicarage, place|postcode2|telephone2
produces this output:
name1|12 high street|town, place|biggerplace|postcode1|telephone1
name2|The vicarage|town|place|postcode2|telephone2
name3|The vicarage|place||postcode2|telephone2
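The question also asks that address3 hold only one word. The sample output above satisfies that, but splitting on commas alone does not guarantee it: a last part such as "bigger place" would pass straight through into address3. A small sketch of a check over the output, assuming the six-field name|address1|address2|address3|postcode|telephone layout shown above; `address3_single_word?` and the `name4` row are hypothetical:

```ruby
# Hypothetical check: does the address3 field (index 3) hold at most one word?
# Assumes output rows look like name|address1|address2|address3|postcode|telephone.
def address3_single_word?(row)
  address3 = row.chomp.split("|", -1)[3].to_s
  # An empty address3 is allowed; the mailmerge software handles blanks.
  address3.split.size <= 1
end

rows = [
  "name1|12 high street|town, place|biggerplace|postcode1|telephone1",
  "name3|The vicarage|place||postcode2|telephone2",
  "name4|1 Main Road|town|bigger place|postcode4|telephone4" # hypothetical bad row
]

rows.each do |row|
  warn "address3 has more than one word: #{row}" unless address3_single_word?(row)
end
```

Only the `name4` row is reported (on standard error); what to do with such rows, e.g. moving the extra words into address2, is left open here.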
The language is Ruby:
# Read each line of the file(s) given on the command line
# (or standard input if none are given).
ARGF.each_line do |line|
  # Remove the newline at the end of the string.
  line.chomp!
  # Split the string into an array of fields. The -1 limit keeps
  # trailing empty fields (e.g. a blank telephone number).
  array = line.split("|", -1)
  # Split the address field on commas, removing surrounding
  # whitespace.
  address = array[1].to_s.split(/\s*,\s*/)
  # I'm assuming that if the first part of the address is
  # a number, it should be combined with the next part.
  if address[0] =~ /\A\d+\z/
    address[0..1] = address[0..1].join(" ")
  end
  if address.size > 3
    # Combine all but the first and last parts into one part.
    address[1..-2] = address[1..-2].join(", ")
  else
    # If we have fewer than 3 parts, tack on empty strings.
    address.push("") while address.size < 3
  end
  array[1] = address
  # Assuming that the output field separator is "|".
  puts array.flatten.join("|")
end
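To check the approach against the sample input and output above without going through files, the same steps can be wrapped in a method. This is a sketch, not part of the original reply: `fix_address_fields` and `SAMPLES` are hypothetical names, it uses `split("|", -1)` so trailing blank fields survive, and it uses `String#match?` (Ruby 2.4 or later) for the house-number test.

```ruby
# Sketch: the address transformation as a pure method.
# `fix_address_fields` is a hypothetical name.
def fix_address_fields(line)
  # -1 keeps trailing empty fields (e.g. a blank telephone).
  fields = line.chomp.split("|", -1)
  # Split the address on commas, trimming whitespace around them.
  address = fields[1].to_s.split(/\s*,\s*/)
  # Join a leading house number with the part that follows it.
  if address.first.to_s.match?(/\A\d+\z/)
    address[0..1] = address[0..1].join(" ")
  end
  if address.size > 3
    # Merge everything between the first and last parts into address2.
    address[1..-2] = address[1..-2].join(", ")
  else
    # Pad to exactly three address fields with empty strings.
    address.push("") while address.size < 3
  end
  fields[1] = address
  fields.flatten.join("|")
end

# Sample rows from the post: input => expected output.
SAMPLES = {
  "name1|12, high street, town, place, biggerplace|postcode1|telephone1" =>
    "name1|12 high street|town, place|biggerplace|postcode1|telephone1",
  "name2|The vicarage, town, place|postcode2|telephone2" =>
    "name2|The vicarage|town|place|postcode2|telephone2",
  "name3|The vicarage, place|postcode2|telephone2" =>
    "name3|The vicarage|place||postcode2|telephone2"
}.freeze

SAMPLES.each do |input, expected|
  actual = fix_address_fields(input)
  puts "#{actual == expected ? 'ok  ' : 'FAIL'} #{actual}"
end
```

Keeping the expected rows next to the method makes it cheap to add awkward cases from the real file before running the script over all of it.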