Re: AWK does a proper replace but whipes out other characters
From: Ed Morton (morton_at_lsupcaemnt.com)
Date: 05/14/04
- Next message: Barry Margolin: "Re: [Slightly OT] - Socket Security"
- Previous message: Kevin Collins: "Re: [Slightly OT] - Socket Security"
- In reply to: Liza: "AWK does a proper replace but whipes out other characters"
- Next in thread: rakesh sharma: "Re: AWK does a proper replace but whipes out other characters"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Date: Thu, 13 May 2004 19:03:50 -0500
Liza wrote:
> Hi,
> I have a file where tokens are separated by commas and strings are
> enclosed in double quotes, while numbers are not. Some double quoted
> strings have commas inside them and that ruins my processing since I
> pick out tokens based on the separator i.e. comma.
> Here is a sample line:
>
> "abcd","efg","Google, Inc","555,777,"abc",99999
>
> I need to kill the comma in "Google, Inc" or replace it with
> whitespace or a dot.
> This usually happens in column 3 as in the example above.
>
> I came up with the following awk command (I'm a newbie so I know for
> all of you this is very simple)
>
> awk -F '","' '{gsub (",",".", $3);print}' file
> This replaces "Google, Inc" with Google.Inc but it deletes "," between
> the tokens.
> Can you please help me to just replace one comma?
> Also, what if it weren't just column 3 that has the problem? What if
> other columns had commas inside the double quotes? How can I
> deal with that? But if you could help me with one column, that'll do.
>
> Lastly, I need to save all the other lines and the fixed lines in the
> same or some other file.
>
> Thank you in advance for your help.
>
> N.K.
Run the file through this first:
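gawk 'BEGIN { FS = "" }   # gawk extension: empty FS makes each character a field
{
    numQuotes = 0
    for (i = 1; i <= NF; i++) {
        if ($i == "\"") {
            numQuotes++
        }
        # Print every character except a comma seen while inside quotes,
        # i.e. after an odd number of double quotes on this line.
        if ( ! (($i == ",") && (numQuotes % 2) ) ) {
            printf "%s", $i
        }
    }
    print ""
}'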
This discards any commas that occur within quotes. It does assume
that the quotes are always balanced (i.e. every opening quote has a
matching closing quote before the end of the field), which is not the
case in your sample line:
"abcd","efg","Google, Inc","555,777,"abc",99999
Note the odd number of double quotes above. If your real data looks
like that, then you need to look for some other method of identifying
the commas you want removed.
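Before relying on the balanced-quotes assumption, it may be worth listing the lines that break it. The following is only a sketch: the file name data.csv and the two sample lines are made up for illustration, and it assumes a double quote never appears as literal data inside a field.

```shell
# Hypothetical sample data; "data.csv" is a placeholder name.
printf '%s\n' \
  '"abcd","efg","Google, Inc","555",777,"abc",99999' \
  '"abcd","efg","Google, Inc","555,777,"abc",99999' > data.csv

# gsub() returns the number of replacements it made, so replacing each
# double quote with itself ("&") counts the quotes without altering $0.
# Lines with an odd count are printed with their line number.
unbalanced=$(awk '{ n = gsub(/"/, "&") } n % 2 { print NR ": " $0 }' data.csv)
printf '%s\n' "$unbalanced"
```

Any line reported here would need fixing by hand (or by some other rule) before the quote-counting approach can be trusted on it.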
Ed.
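
The original question also asked about replacing the comma with a dot rather than deleting it, and about saving the result to another file. A possible variant of the same quote-counting idea is sketched below. It is not the script from the reply above: it walks the line with substr() instead of using the gawk-only empty FS, so it should run under any POSIX awk. The file names input.csv and fixed.csv and the sample lines are placeholders, and it still assumes balanced quotes on every line.

```shell
# Hypothetical sample data; both file names are placeholders.
printf '%s\n' \
  '"abcd","efg","Google, Inc","555",777,"abc",99999' \
  'plain,123,"a,b,c",4' > input.csv

awk '{
    out = ""
    inQuotes = 0
    for (i = 1; i <= length($0); i++) {
        c = substr($0, i, 1)
        if (c == "\"")
            inQuotes = !inQuotes      # toggle on every double quote
        if (c == "," && inQuotes)
            c = "."                   # swap the comma instead of dropping it
        out = out c
    }
    print out                         # every line is written, changed or not
}' input.csv > fixed.csv

cat fixed.csv
```

Because every line is printed whether or not it was changed, fixed.csv ends up holding both the untouched lines and the corrected ones. Changing "." to " " in the script would substitute whitespace instead.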