Re: retain_quoted v0.003

From: Chris F.A. Johnson (cfajohnson_at_gmail.com)
Date: 10/11/05


Date: Tue, 11 Oct 2005 17:09:15 -0400

On 2005-10-11, John Kelly wrote:
> retain_quoted v0.003
>
> What is it?
>
> A sed script which escapes blanks and backslashes inside quoted
> strings.
[snip]
> There is yet a problem, though.
>
> As mentioned, read does not understand quoted strings. So inside the
> variable holding the value found in column 4, we have a pair of quotes
> surrounding the string. To the user, this is unacceptable, because he
> wanted the quotes only for preserving the literal value of the string,
> not for accompanying it into the variable. The empty string in column
> 2 confirms this notion.
>
> To solve the complete problem, we need an unquote function, which may
> be forthcoming any day now. :-)

   I thought that had already been dealt with.
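
    In case it is useful, here is a minimal sketch of such a function in
    C. The name unquote and its in-place interface are assumptions made
    for illustration, not anything taken from the earlier thread. It
    strips one matching pair of surrounding quotes and leaves the
    contents untouched.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: strip one matching pair of surrounding quotes
   from s, in place.  Returns 1 if quotes were removed, 0 otherwise. */
int unquote(char *s)
{
  size_t len;

  assert(s != NULL);
  len = strlen(s);

  /* need an opening and a closing quote of the same kind */
  if ( len < 2 ) return 0;
  if ( s[0] != '"' && s[0] != '\'' ) return 0;
  if ( s[len - 1] != s[0] ) return 0;

  memmove(s, s + 1, len - 2);   /* shift the contents over the opening quote */
  s[len - 2] = '\0';            /* drop the closing quote */
  return 1;
}
```

    Called on a buffer holding "hello world" with its quotes, it leaves
    hello world; a string without a matching pair is left alone.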

> In the meantime, here is "retain_quoted," the sed script. It begins
> with version 0.003, superseding the formerly named "sed escapes" which
> ended with version 0.002.
>
> Version 0.003 has significant improvements. It handles quoted strings
> like the shell, so double quotes can be escaped inside a double quoted
> string. And it's remarkably more efficient, handling whole substrings
> instead of one character at a time.
>
> It's commented to be understood, even though it is "sed." And I'm not
> joking. :-D

    This script actually confirms my dislike of sed for anything much
    beyond simple search and replace. I'd prefer to write this in C.

    The following program is not thoroughly tested and may need
    tweaking, but it only took a few minutes to write:

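#include <stdio.h>

int main(void)
{
  int c;
  int inq = 0;      /* 1 while inside a double-quoted string */
  int escaped = 0;  /* 1 if the previous character was an unescaped backslash */

  while ( (c = getchar()) != EOF ) {
    switch (c) {
      case '"':
        if ( escaped == 1 ) escaped = 0;   /* \" is a literal quote */
        else if ( inq == 0 ) inq = 1;      /* opening quote */
        else inq = 0;                      /* closing quote */
        break;

      case ' ':
        if ( inq == 1 ) putchar( '\\');    /* escape a blank inside quotes */
        escaped = 0;
        break;

      case '\\':
        ++escaped;                         /* reset below once it reaches 2 */
        break;

      default:
        escaped = 0;   /* a backslash escapes only the next character */
        break;
    }
    putchar( c );
    if ( escaped >= 2 ) escaped = 0;       /* \\ is a literal backslash */
  }
  return 0;
}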
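
    The program above only looks at double quotes and spaces. A sketch
    of a variant closer to what the sed script is described as doing
    (single and double quotes, tabs as well as spaces, and backslashes
    escaped inside the quotes) might look like the following. The
    function name escape_quoted and the string-to-string interface are
    assumptions for illustration. Unlike the sed script, it treats an
    unterminated quote as running to the end of the input.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical function: copy in to out, putting a backslash before every
   blank (space or tab) and every backslash found inside a quoted string.
   out must have room for 2 * strlen(in) + 1 bytes. */
void escape_quoted(const char *in, char *out)
{
  char quote = 0;    /* the open quote character, or 0 outside quotes */
  int escaped = 0;   /* 1 if the previous character was an unescaped backslash */

  assert(in != NULL && out != NULL);

  for ( ; *in != '\0'; ++in ) {
    char c = *in;

    if ( quote == 0 ) {
      /* outside quotes: copy as is; an unescaped quote opens a string */
      if ( !escaped && (c == '"' || c == '\'') ) quote = c;
      escaped = ( c == '\\' && !escaped );
    }
    else if ( c == quote && !escaped ) {
      quote = 0;                       /* closing quote */
    }
    else {
      /* inside quotes: escape blanks and backslashes */
      if ( c == ' ' || c == '\t' || c == '\\' ) *out++ = '\\';
      /* as in the shell, a backslash escapes only inside double quotes */
      escaped = ( quote == '"' && c == '\\' && !escaped );
    }
    *out++ = c;
  }
  *out = '\0';
}
```

    Keeping the open quote character in one variable, rather than a
    flag per quote type, is what lets a single quote inside a double
    quoted string (and vice versa) pass through as ordinary text.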

> #!/bin/sed -rf
>
> # I, John Kelly, the author of this original work, hereby release it to
> # the public domain. Do with it what you wish, except complain; it has
> # no warranty of any kind. Such effected Tuesday, October 11, 2005.
>
> # append priming newline
> s/$/\n/
>
> : scan_string
> /^\n/b exit
>
> # find opening quote
> /^(([^"'\n]*([^\"'\n]|\\"|\\'))*)("|')(.*)/{
>
> # rotate it to the front
> s//\4\5\1/
>
> # find closing single quote
> /^(')[^'\n]*\1/{
> s//&\n/
> b retain_quoted
> }
>
> # find closing double quote
> /^(")(([^"\n]*([^\"\n]|\\"))*)\1/{
> s//&\n/
> b retain_quoted
> }
>
> # did not find closing quote
> b unquoted
>
> : retain_quoted
> h # save string
> s/(.).*/\1/ # isolate quote character
> x # swap it for string
> G # append it to string
> s/(.*).(.)/\1\2/ # remove extraneous newline
> h # save string
> s/^(.)([^\n]*)\1\n.*/\2/ # isolate quotation
> s/[\[:blank:]]/\\&/g # insert backslash escapes
> x # swap quotation for string
> s/^([^\n]*)\n// # strip old quotation from string
> x # swap string for quotation
> G # append string to quotation
> s/^([^\n]*)\n(.*)(.)/\3\1\3\n\2/ # restore quotation marks
>
> s/^([^\n]*)\n(.*)/\2\1/ # rotate quoted to the rear
> b scan_string
> }
>
> : unquoted
> s/^([^\n]*)(.*)/\2\1/ # rotate unquoted to the rear
>
> : exit
> # remove priming newline
> s/^.//

-- 
    Chris F.A. Johnson                     <http://cfaj.freeshell.org>
    ==================================================================
    Shell Scripting Recipes: A Problem-Solution Approach, 2005, Apress
    <http://www.torfree.net/~chris/books/cfaj/ssr.html>