Re: Bug in #! processing - One More Time

From: Garance A Drosehn (gad_at_FreeBSD.org)
Date: 05/15/05

  • Next message: Garance A Drosehn: "Re: Bug in #! processing - One More Time"
    Date: Sun, 15 May 2005 05:24:35 -0400
    To: Maxim Sobolev <sobomax@FreeBSD.org>
    
    

    At 7:05 PM +0300 5/13/05, Maxim Sobolev wrote:
    >Attached please find patch which rips any special processing
    >of command line arguments. It should put FreeBSD into the very
    >same ship with the rest of unices and linuces out there.
    >
    >...This corresponds to the case (3) from the Garance's original
    >message.
    >
    >I've run this change through buildworld.

    Fwiw, this is the patch that I have been working with for the
    same purpose. I might trim down the comments some more. This
    includes a kludge to recognize '#!#<' as a way to trigger the
    historical parsing behavior. I have another version which also
    recognizes '#!#+' as a way to trigger a whole bunch of debug
    messages, but that's probably of no interest to anyone but me.

    This patch is also at:

    http://people.freebsd.org/imgact_shell.diff

    and the file you'd end up with by applying the patch is at:

    http://people.freebsd.org/imgact_shell.c

    This has installed and tested in i386 and powerpc, and sometime
    on Sunday I'll have it installed on a sparc64 machine. (it's
    building now, but my ultra10 machine is much much slower than my
    i386 and my Mac-mini...).

    Index: imgact_shell.c
    ===================================================================
    RCS file: /home/ncvs/src/sys/kern/imgact_shell.c,v
    retrieving revision 1.32
    diff -u -r1.32 imgact_shell.c
    --- imgact_shell.c 25 Feb 2005 10:17:53 -0000 1.32
    +++ imgact_shell.c 15 May 2005 09:07:13 -0000
    @@ -36,6 +36,15 @@
      #include <sys/imgact.h>
      #include <sys/kernel.h>

    +#define KEEP_OLDCODE 1
    +#if BYTE_ORDER == LITTLE_ENDIAN /* temp for OLD_CODE kludge */
    +#define DBG_MAGIC 0x2B23 /* #+ in "little-endian" */
    +#define OLD_MAGIC 0x3C23 /* #< */
    +#else
    +#define DBG_MAGIC 0x232B /* #+ in big-endian */
    +#define OLD_MAGIC 0x233C /* #< */
    +#endif
    +
      #if BYTE_ORDER == LITTLE_ENDIAN
      #define SHELLMAGIC 0x2123 /* #! */
      #else
    @@ -43,15 +52,66 @@
      #endif

      /*
    - * Shell interpreter image activator. An interpreter name beginning
    - * at imgp->args->begin_argv is the minimal successful exit requirement.
    + * At the time of this writing, MAXSHELLCMDLEN == PAGE_SIZE. This is
    + * significant because the caller has only mapped in one page of the
    + * file we're reading. This code should be changed to know how to
    + * read in the second page, but I'm not doing that just yet...
    + */
    +#if MAXSHELLCMDLEN > PAGE_SIZE
    +#error "MAXSHELLCMDLEN is larger than a single page!"
    +#endif
    +
    +/**
    + * Shell interpreter image activator. An interpreter name beginning at
    + * imgp->args->begin_argv is the minimal successful exit requirement.
    + *
    + * If the given file is a shell-script, then the first line will start
    + * with the two characters `#!' (aka SHELLMAGIC), followed by the name
    + * of the shell-interpreter to run, followed by zero or more tokens.
    + *
    + * If there are *any* tokens, then we start up the interpreter such that
    + * it will see:
    + * arg[0] -> The name of interpreter as specified after `#!' in the
    + * first line of the script. The interpreter name must
    + * not be longer than MAXSHELLCMDLEN bytes.
    + * arg[1] -> *If* there are any additional tokens on the first line,
    + * then we add a new arg[1], which is a copy of the rest of
    + * that line. The copy starts at the first token after the
    + * interpreter name. We leave it to the interpreter to
    + * parse the tokens in that value.
    + * arg[x] -> the full pathname of the script. This will either be
    + * arg[2] or arg[1], depending on whether or not tokens
    + * were found after the interpreter name.
    + * arg[x+1] -> whatever arguments that were specified on the original
    + * command line (if the user had specified any).
    + *
    + * This processing is described in the execve(2) man page.
    + */
    +
    +/*
    + * HISTORICAL NOTE: From 1993 to mid-2005, FreeBSD parsed out the tokens as
    + * found on the first line of the script, and setup each token as a separate
    + * value in arg[]. This extra processing did not match the behavior of other
    + * OS's, and caused a few subtle problems. For one, it meant the kernel was
    + * deciding how those values should be parsed (wrt characters for quoting or
    + * comments, etc), while the interpreter might have other rules for parsing.
    + * It also meant the interpreter had no way of knowing which arguments came
    + * from the first line of the shell script, and which arguments were specified
    + * by the user on the command line.
    + *
    + * Luckily, only few things in the base system would notice that non-standard
    + * processing (mainly /bin/sh and /usr/bin/env). And for programs which are
    + * not in the base system, the "newer" behavior matches how NetBSD, OpenBSD,
    + * Linux, Solaris, AIX, IRIX, and some other Unixes have always setup the
    + * arg-list for the interpreter. So if a program can handle this behavior on
    + * any of those other OS's, it should be able to handle it for FreeBSD too.
       */
      int
      exec_shell_imgact(imgp)
              struct image_params *imgp;
      {
              const char *image_header = imgp->image_header;
    - const char *ihp;
    + const char *ihp, *interpb, *interpe, *maxp, *optb, *opte;
              int error, offset;
              size_t length, clength;
              struct vattr vattr;
    @@ -79,14 +139,34 @@
              if (error)
                      return (error);

    - clength = (vattr.va_size > MAXSHELLCMDLEN) ?
    - MAXSHELLCMDLEN : vattr.va_size;
    + /*
    + * Copy shell name and arguments from image_header into a string
    + * buffer. Remember that the caller has mapped only the
    + * first page of the file into memory.
    + */
    + clength = (vattr.va_size > PAGE_SIZE) ? PAGE_SIZE : vattr.va_size;
    +
    + maxp = &image_header[clength];
    + ihp = &image_header[2];
    +#if KEEP_OLDCODE
    + /*
    + * XXX - Temporarily provide a quick-and-dirty way to get the
    + * older, non-standard option-parsing behavior, just in case
    + * someone finds themselves in an emergency where they need it.
    + * This will not be documented. It is only for initial testing.
    + */
    + if (*(const short *)ihp == OLD_MAGIC)
    + ihp += 2;
    + else
    + goto new_code;
    + interpb = ihp;
    +
              /*
               * Figure out the number of bytes that need to be reserved in the
               * argument string to copy the contents of the interpreter's command
               * line into the argument string.
               */
    - ihp = &image_header[2];
    + ihp = interpb;
              offset = 0;
              while (ihp < &image_header[clength]) {
                      /* Skip any whitespace */
    @@ -152,7 +232,7 @@
               * Loop through the interpreter name yet again, copying as
               * we go.
               */
    - ihp = &image_header[2];
    + ihp = interpb;
              offset = 0;
              while (ihp < &image_header[clength]) {
                      /* Skip whitespace */
    @@ -174,8 +254,96 @@
                      imgp->args->begin_argv[offset++] = '\0';
                      imgp->args->argc++;
              }
    + goto common_end;
    +new_code:
    +#endif
    + /*
    + * Find the beginning and end of the interpreter_name. If the
    + * line does not include any interpreter, or if the name which
    + * was found is too long, we bail out.
    + */
    + while (ihp < maxp && ((*ihp == ' ') || (*ihp == '\t')))
    + ihp++;
    + interpb = ihp;
    + while (ihp < maxp && ((*ihp != ' ') && (*ihp != '\t') && (*ihp != '\n')
    + && (*ihp != '\0')))
    + ihp++;
    + interpe = ihp;
    + if (interpb == interpe)
    + return (ENOEXEC);
    + if ((interpe - interpb) >= MAXSHELLCMDLEN)
    + return (ENAMETOOLONG);
    +
    + /*
    + * Find the beginning of the options (if any), and the end-of-line.
    + * Then trim the trailing blanks off the value. Note that some
    + * other operating systems do *not* trim the trailing whitespace...
    + */
    + while (ihp < maxp && ((*ihp == ' ') || (*ihp == '\t')))
    + ihp++;
    + optb = ihp;
    + while (ihp < maxp && ((*ihp != '\n') && (*ihp != '\0')))
    + ihp++;
    + opte = ihp;
    + while (--ihp > interpe && ((*ihp == ' ') || (*ihp == '\t')))
    + opte = ihp;

              /*
    + * We need to "pop" (remove) the present value of arg[0], and "push"
    + * either two or three new values in the arg[] list. To do this,
    + * we first shift all the other values in the `begin_argv' area to
    + * provide the exact amount of room for the values added. Set up
    + * `offset' as the number of bytes to be added to the `begin_argv'
    + * area, and 'length' as the number of bytes being removed.
    + */
    + offset = interpe - interpb + 1; /* interpreter */
    + if (opte != optb) /* options (if any) */
    + offset += opte - optb + 1;
    + offset += strlen(imgp->args->fname) + 1; /* fname of script */
    + length = (imgp->args->argc == 0) ? 0 :
    + strlen(imgp->args->begin_argv) + 1; /* bytes to delete */
    +
    + if (offset - length > imgp->args->stringspace)
    + return (E2BIG);
    +
    + bcopy(imgp->args->begin_argv + length, imgp->args->begin_argv + offset,
    + imgp->args->endp - (imgp->args->begin_argv + length));
    +
    + offset -= length; /* calculate actual adjustment */
    + imgp->args->begin_envv += offset;
    + imgp->args->endp += offset;
    + imgp->args->stringspace -= offset;
    +
    + /*
    + * If there was no arg[0] when we started, then the interpreter_name
    + * is adding an argument (instead of replacing the arg[0] we started
    + * with). And we're always adding an argument when we include the
    + * full pathname of the original script.
    + */
    + if (imgp->args->argc == 0)
    + imgp->args->argc = 1;
    + imgp->args->argc++;
    +
    + /*
    + * The original arg[] list has been shifted appropriately. Copy in
    + * the interpreter name and options-string.
    + */
    + length = interpe - interpb;
    + bcopy(interpb, imgp->args->buf, length);
    + *(imgp->args->buf + length) = '\0';
    + offset = length + 1;
    + if (opte != optb) {
    + length = opte - optb;
    + bcopy(optb, imgp->args->buf + offset, length);
    + *(imgp->args->buf + offset + length) = '\0';
    + offset += length + 1;
    + imgp->args->argc++;
    + }
    +
    +#if KEEP_OLDCODE
    +common_end:
    +#endif
    + /*
               * Finally, add the filename onto the end for the interpreter to
               * use and copy the interpreter's name to imgp->interpreter_name
               * for exec to use.

    -- 
    Garance Alistair Drosehn     =      gad@gilead.netel.rpi.edu
    Senior Systems Programmer               or   gad@FreeBSD.org
    Rensselaer Polytechnic Institute;             Troy, NY;  USA
    _______________________________________________
    freebsd-arch@freebsd.org mailing list
    http://lists.freebsd.org/mailman/listinfo/freebsd-arch
    To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org"
    

  • Next message: Garance A Drosehn: "Re: Bug in #! processing - One More Time"

    Relevant Pages

    • Re: Problems with FreeBSD on Poweredge 6300
      ... > started to move away from Redhat in favor of FreeBSD, ... > I also tried booting Redhat 9.0 and it works fine. ... so I can't boot up to patch it. ...
      (freebsd-current)
    • [PATCH i386] Live Patching Function on 2.6.11.7
      ... The patch was over 50k, so I separate it to each architecture and in line.. ... Live patch command loads the patch modules to target process's memory ... * param:regs register struct ...
      (Linux-Kernel)
    • [RFC PATCH] Address spaces as independent objects
      ... Below is a patch which allows address spaces to be created, ... PTRACE_SWITCH_MM takes a file descriptor in data and makes the child ... +struct user_regs { ...
      (Linux-Kernel)
    • [RFC PATCH] Address spaces as independent objects
      ... Below is a patch which allows address spaces to be created, ... PTRACE_SWITCH_MM takes a file descriptor in data and makes the child ... +struct user_regs { ...
      (Linux-Kernel)
    • Re: Announce: BDB-BASIC Release 0.60 Released
      ... It seems that release 0.60 has busted database ... Patch 2 corrects a couple of other bugs in the ... choose to create/execute in the interpreter. ... in shell I/O redirected fashion if required. ...
      (comp.lang.basic.misc)