Re: AWK does a proper replace but wipes out other characters

From: Ed Morton (morton_at_lsupcaemnt.com)
Date: 05/14/04


Date: Thu, 13 May 2004 19:03:50 -0500


Liza wrote:
> Hi,
> I have a file where tokens are separated by commas and strings are
> enclosed in double quotes, while numbers are not. Some double quoted
> strings have commas inside them and that ruins my processing since I
> pick out tokens based on the separator i.e. comma.
> Here is a sample line:
>
> "abcd","efg","Google, Inc","555,777,"abc",99999
>
> I need to kill the comma in "Google, Inc" or replace it with
> whitespace or a dot.
> This usually happens in column 3 as in the example above.
>
> I came up with the following awk command (I'm a newbie so I know for
> all of you this is very simple)
>
> awk -F '","' '{gsub (",",".", $3);print}' file
> This replaces "Google, Inc" with Google.Inc but it deletes "," between
> the tokens.
> Can you please help me to just replace one comma?
> Also, what if it wasn't just column 3 that has a problem? What if
> other columns had commas inside the double quotes? How can I
> deal with that? But if you could help me with one column that'll do.
>
> Lastly, I need to save all the other lines and the fixed lines in the
> same or some other file.
>
> Thank you in advance for your help.
>
> N.K.

Run the file through this first:

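gawk 'BEGIN { FS = "" }    # empty FS (gawk extension): each character is a field
{
        numQuotes = 0
        for (i = 1; i <= NF; i++) {
                if ($i == "\"") {
                        numQuotes++
                }
                # drop a comma only while inside quotes (odd quote count so far)
                if ( ! (($i == ",") && (numQuotes % 2)) ) {
                        printf "%s", $i
                }
        }
        print ""
}' file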

and it'll discard any commas that occur within quotes. It does assume
that the quotes are always balanced (i.e. every opening quote has a
matching closing quote before the end of the line), which is not the
case for your input line:

"abcd","efg","Google, Inc","555,777,"abc",99999

Note the odd number of double quotes above. If your real data looks like
that, then you need to look for some other method of identifying the
commas you want removed.
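
If you would rather replace the embedded commas with a dot (or a space)
than delete them, and want lines with unbalanced quotes flagged instead
of silently mangled, something along these lines might do. It is only a
sketch: the function name fix_commas and the variables repl, inQuotes
and out are made up for illustration, the sample line is your line with
the missing quote after 777 added, and it walks the line with substr()
so it does not depend on gawk's empty-FS behaviour.

```shell
# fix_commas: hypothetical helper, not part of the original script.
# Replaces commas inside double quotes with the string given as $1
# (default "."). Lines with an odd number of quotes are printed
# unchanged and reported on stderr.
fix_commas() {
    awk -v repl="${1:-.}" '
    {
        out = ""
        inQuotes = 0
        for (i = 1; i <= length($0); i++) {
            c = substr($0, i, 1)
            if (c == "\"") inQuotes = !inQuotes      # toggle on every quote
            if (c == "," && inQuotes) c = repl       # comma inside quotes
            out = out c
        }
        if (inQuotes) {
            # odd number of quotes: cannot tell which commas are embedded
            printf "line %d: unbalanced quotes, left unchanged\n", NR | "cat 1>&2"
            print
        } else {
            print out
        }
    }'
}

# Sample line with balanced quotes (assumed correction of the original)
printf '%s\n' '"abcd","efg","Google, Inc","555,777","abc",99999' | fix_commas
# prints: "abcd","efg","Google. Inc","555.777","abc",99999
```

Use it as "fix_commas < file > newfile" for a dot, or "fix_commas ' '"
for a space. All lines, fixed or not, end up in newfile, and the
warnings on stderr tell you which ones still need attention.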

        Ed.


