(patch for Bash) regex case statement

From: William Park (opengeometry_at_yahoo.ca)
Date: 01/08/04


Date: 8 Jan 2004 04:49:59 GMT

Following up on my previous patch for regex conditional tests, I've
extended regex matching to the 'case' statement.

If any portion of the pattern is quoted, then the quotes are removed in
the same manner as for a here-document delimiter, and regular expression
matching is done via the builtin command 'match'. If the pattern is not
quoted, then the normal glob matching is used.

Eg.
    case abc123 in
        abc*) ... ;; --> glob
        [a-z]*[0-9]) ... ;; --> glob
        'abc') ... ;; --> regex
        '[a-z].*[0-9]') ... ;; --> regex
    esac
where each of the four patterns would match 'abc123' on its own (as
usual, only the first matching clause is executed). This is functionally
equivalent to
    if [[ abc123 == abc* ]]; then
        ...
    elif [[ abc123 == [a-z]*[0-9] ]]; then
        ...
    elif [[ abc123 =~ 'abc' ]]; then
        ...
    elif [[ abc123 =~ '[a-z].*[0-9]' ]]; then
        ...
    fi
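
On a shell without this patch, the same glob/regex split can be
approximated with a small helper. This is only a sketch under stated
assumptions: 'case_match' is a hypothetical name, the words 'glob' and
'regex' stand in for unquoted and quoted patterns, and 'grep -E' stands
in for the 'match' builtin.

```shell
# case_match STRING KIND PATTERN
# Hypothetical helper, not part of the patch.
# KIND is "glob" (stands in for an unquoted case pattern) or
# "regex" (stands in for a quoted case pattern).
case_match() {
    if [ "$2" = regex ]; then
        # POSIX extended regex, unanchored: succeeds if any part of
        # STRING matches PATTERN.
        printf '%s\n' "$1" | grep -Eq -- "$3"
    else
        # Unquoted $3 is used as a glob and must match the whole STRING.
        case $1 in
            $3) return 0 ;;
            *)  return 1 ;;
        esac
    fi
}

case_match abc123 glob  'abc*' && echo "glob  abc*: match"
case_match abc123 regex 'abc'  && echo "regex abc:  match"
case_match abc123 glob  'abc'  || echo "glob  abc:  no match"
```

The last two calls show why the distinction matters: as a regex, 'abc'
matches because it may match any part of the string, while as a glob it
has to match the whole string and so fails.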

For more detail about regex tests, see
    http://groups.google.com/groups?selm=btgihg%246gdcq%241%40ID-99293.news.uni-berlin.de

-- 
William Park, Open Geometry Consulting, <opengeometry@yahoo.ca>
Linux solution for data management and processing. 
diff -ru bash-2.05b/Makefile.in bash/Makefile.in
--- bash-2.05b/Makefile.in	2002-05-31 13:44:23.000000000 -0400
+++ bash/Makefile.in	2004-01-05 16:16:19.000000000 -0500
@@ -400,7 +400,7 @@
 	       $(DEFSRC)/builtin.def $(DEFSRC)/cd.def $(DEFSRC)/colon.def \
 	       $(DEFSRC)/command.def ${DEFSRC}/complete.def \
 	       $(DEFSRC)/declare.def \
-	       $(DEFSRC)/echo.def $(DEFSRC)/enable.def $(DEFSRC)/eval.def \
+	       $(DEFSRC)/echo.def $(DEFSRC)/enable.def $(DEFSRC)/eval.def $(DEFSRC)/array.def \
 	       $(DEFSRC)/exec.def $(DEFSRC)/exit.def $(DEFSRC)/fc.def \
 	       $(DEFSRC)/fg_bg.def $(DEFSRC)/hash.def $(DEFSRC)/help.def \
 	       $(DEFSRC)/history.def $(DEFSRC)/jobs.def $(DEFSRC)/kill.def \
@@ -419,7 +419,7 @@
 BUILTIN_OBJS = $(DEFDIR)/alias.o $(DEFDIR)/bind.o $(DEFDIR)/break.o \
 	       $(DEFDIR)/builtin.o $(DEFDIR)/cd.o $(DEFDIR)/colon.o \
 	       $(DEFDIR)/command.o $(DEFDIR)/declare.o \
-	       $(DEFDIR)/echo.o $(DEFDIR)/enable.o $(DEFDIR)/eval.o \
+	       $(DEFDIR)/echo.o $(DEFDIR)/enable.o $(DEFDIR)/eval.o $(DEFDIR)/array.o\
 	       $(DEFDIR)/exec.o $(DEFDIR)/exit.o $(DEFDIR)/fc.o \
 	       $(DEFDIR)/fg_bg.o $(DEFDIR)/hash.o $(DEFDIR)/help.o \
 	       $(DEFDIR)/history.o $(DEFDIR)/jobs.o $(DEFDIR)/kill.o \
@@ -1135,6 +1135,11 @@
 builtins/eval.o: command.h config.h ${BASHINCDIR}/memalloc.h error.h general.h xmalloc.h ${BASHINCDIR}/maxpath.h
 builtins/eval.o: shell.h syntax.h bashjmp.h ${BASHINCDIR}/posixjmp.h sig.h unwind_prot.h variables.h arrayfunc.h conftypes.h quit.h 
 builtins/eval.o: dispose_cmd.h make_cmd.h subst.h externs.h ${BASHINCDIR}/stdc.h
+
+builtins/array.o: command.h config.h ${BASHINCDIR}/memalloc.h error.h general.h xmalloc.h ${BASHINCDIR}/maxpath.h
+builtins/array.o: shell.h syntax.h bashjmp.h ${BASHINCDIR}/posixjmp.h sig.h unwind_prot.h variables.h arrayfunc.h conftypes.h quit.h 
+builtins/array.o: dispose_cmd.h make_cmd.h subst.h externs.h ${BASHINCDIR}/stdc.h
+
 builtins/exec.o: bashtypes.h
 builtins/exec.o: command.h config.h ${BASHINCDIR}/memalloc.h error.h general.h xmalloc.h ${BASHINCDIR}/maxpath.h
 builtins/exec.o: shell.h syntax.h bashjmp.h ${BASHINCDIR}/posixjmp.h sig.h unwind_prot.h variables.h arrayfunc.h conftypes.h
@@ -1278,6 +1283,9 @@
 builtins/echo.o: $(DEFSRC)/echo.def
 builtins/enable.o: $(DEFSRC)/enable.def
 builtins/eval.o: $(DEFSRC)/eval.def
+
+builtins/array.o: $(DEFSRC)/array.def
+
 builtins/exec.o: $(DEFSRC)/exec.def
 builtins/exit.o: $(DEFSRC)/exit.def
 builtins/fc.o: $(DEFSRC)/fc.def
diff -ru bash-2.05b/braces.c bash/braces.c
--- bash-2.05b/braces.c	2002-05-06 13:50:40.000000000 -0400
+++ bash/braces.c	2004-01-03 16:40:26.000000000 -0500
@@ -64,6 +64,8 @@
 static char **array_concat ();
 #endif
 
+#include "chartypes.h"		/* needed for ISLOWER() and ISUPPER() */
+
 /* Return an array of strings; the brace expansion of TEXT. */
 char **
 brace_expand (text)
@@ -161,22 +163,195 @@
       ADVANCE_CHAR (amble, alen, j);
     }
 
-  if (!amble[j])
-    {
-      free (amble);
-      free (preamble);
-      result[0] = savestring (text);
-      return (result);
-    }
+  if (!amble[j]) {
+      /*************************************************************************
+       * Okay, found a standalone brace expression without ','.  If the amble
+       * contains 'a..b' expression, where 'a' and 'b' are positive integers,
+       * then replace it with 'a,a+1,...,b' (if a < b) or 'a,a-1,...,b' (if a >
+       * b), and give it back to shell for a normal expansion.  If 'a' or 'b'
+       * has leading '0', then zero pad the numbers.  The format size is the
+       * maximum size of 'a' or 'b'.  This is brace version of 'seq a b'.
+       *
+       * If 'a' or 'b' is a regular shell variable (not positional parameter or
+       * array element), then replace it with its value $a or $b.  If 'a' or 'b'
+       * starts with '!', then indirect substitution will be tried, similar to
+       * ${!a} or ${!b}.  In any case, if the final 'a..b' is pure number, then
+       * generate the usual integer sequence.  This is brace version of 'seq $a
+       * $b' or 'seq ${!a} ${!b}'.
+       *
+       * If 'a' or 'b' is '#', then replace it with value $# and generate
+       * integer sequence as usual.  If 'a' or 'b' is '*', then replace it with
+       * value $#, and generate parameter sequence by putting '${}' around the
+       * integers to indicate positional parameter.  However, expansion is done
+       * only if there are parameters (ie. $# >= 1).  If there is no parameter,
+       * then don't replace it.  This is brace version of 'seq a $#', 'seq $#
+       * b', and $*.
+       *
+       * If the expression is 'a--b', where 'a' and 'b' are strings of same
+       * size, then generate string sequence.  Characters must be both lowercase
+       * or both uppercase.  So, {a--c} is same as {a,b,c} and {A--C} is same as
+       * {A,B,C}, and {Aa--Bb} is same as {Aa,Ab,...,Az,Ba,Bb}.
+       * 
+       * Otherwise, return the original string back to shell as is, like before.
+       *
+       * --William Park <opengeometry@yahoo.ca>
+       */
+      char *a, *b, *t;
+      int dollarflag, zeropad, compareflag;
+      size_t i, end, n, size;
+      intmax_t x, y;
+
+      if (t = strstr (amble, "--")) {
+	  a = substring (amble, 0, t - amble);
+	  b = substring (amble, t - amble + 2, alen);
+	  if (strlen (a) == 0 || strlen (a) != strlen (b)) {
+	      free (a);
+	      free (b);
+	      goto Original_Code;
+	  }
+	  size = strlen (a);
+	  n = 1;
+	  for (i = 0; i < size; i++) {
+	      if (! (ISLOWER (a[i]) && ISLOWER (b[i]) || ISUPPER (a[i]) && ISUPPER (b[i]))) {
+		  free (a);
+		  free (b);
+		  goto Original_Code;
+	      }
+	      if (a[i] != b[i] || n > 1) 
+		  if (n == 1)
+		      n = abs (b[i] - a[i]) + 1;	/* first position */
+		  else
+		      n *= 26;		/* max number: 26^{size} */
+	  }
+	  
+	  /* By this point, 'a' and 'b' are strings of equal size.
+	   */
+	  tack = strvec_create (n + 1);
+	  n = 0;
+	  do {
+	      tack[n++] = savestring (a);
+	      if ((compareflag = strcmp (a, b)) == 0) {
+		  tack[n] = (char *)NULL;
+		  break;
+	      }
+	      else if (compareflag < 0) {
+		  for (i = size - 1; i >= 1 && (a[i] == 'Z' || a[i] == 'z'); i--)
+		      a[i] -= 25;	/* back to 'A' or 'a' */
+		  ++a[i];
+	      }
+	      else if (compareflag > 0) {
+		  for (i = size - 1; i >= 1 && (a[i] == 'A' || a[i] == 'a'); i--)
+		      a[i] += 25;	/* back to 'Z' or 'z' */
+		  --a[i];
+	      }
+	  } while (1);
+      }
+      else if (t = strstr (amble, "..")) {
+	  a = substring (amble, 0, t - amble);
+	  b = substring (amble, t - amble + 2, alen);
+	  dollarflag = zeropad = 0;
+
+	  if (legal_identifier (a) && (t = get_string_value (a))) {
+	      free (a);
+	      a = savestring (t);
+	  }
+	  else if (*a == '!') {
+	      if (legal_identifier (a + 1) && (t = get_string_value (a + 1))) 
+		  if (legal_identifier (t) && (t = get_string_value (t))) {
+		      free (a);
+		      a = savestring (t);
+		  }
+	  }
+	  else if ((*a == '#' || *a == '*') && a[1] == '\0') {
+	      if (*a == '*')
+		  dollarflag = 1;
+	      if (n = number_of_args ()) {
+		  free (a);
+		  a = itos (n);
+	      }
+	  }
+
+	  if (legal_identifier (b) && (t = get_string_value (b))) {
+	      free (b);
+	      b = savestring (t);
+	  }
+	  else if (*b == '!') {
+	      if (legal_identifier (b + 1) && (t = get_string_value (b + 1))) 
+		  if (legal_identifier (t) && (t = get_string_value (t))) {
+		      free (b);
+		      b = savestring (t);
+		  }
+	  }
+	  else if ((*b == '#' || *b == '*') && b[1] == '\0') {
+	      if (*b == '*')
+		  dollarflag = 1;
+	      if (n = number_of_args ()) {
+		  free (b);
+		  b = itos (n);
+	      }
+	  }
+
+	  /* By this point, 'a' and 'b' must be all numbers.  If not, then exit
+	   * per original code.  Check for empty string explicitly, because
+	   * all_digits() returns 1 if string is empty (crazy!).
+	   */
+	  if (!(*a && all_digits (a) && legal_number (a, &x) && x >= 0
+		      && *b && all_digits (b) && legal_number (b, &y) && y >= 0)) {
+	      free (a);
+	      free (b);
+	      goto Original_Code;
+	  }
+
+	  i = x;
+	  end = y;
+	  n = abs (end - i) + 1;
+	  size = (strlen (a) > strlen (b)) ? strlen (a) : strlen (b);
+	  if (strlen (a) > 1 && *a == '0' || strlen (b) > 1 && *b == '0')
+	      zeropad = 1;
+
+	  tack = strvec_create (n + 1);
+	  n = 0;
+	  do {
+	      t = (char *)xmalloc (size + 3 + 1);	/* ${number} or number */
+	      if (dollarflag)
+		  sprintf (t, "${%d}", i);
+	      else if (zeropad)
+		  sprintf (t, "%0*d", size, i);
+	      else
+		  sprintf (t, "%d", i);
+	      tack[n++] = t;
+	      if (i == end) {
+		  tack[n] = (char *)NULL;
+		  break;
+	      }
+	      else if (i < end)
+		  ++i;
+	      else if (i > end)
+		  --i;
+	  } while (1);
+      }
+      else {
+Original_Code:
+	  free (amble);		/* original code */
+	  free (preamble);
+	  result[0] = savestring (text);
+	  return (result);
+      }
+
+      free (a);
+      free (b);
+      goto New_Tack;
+  }
 #endif /* SHELL */
 
-  postamble = &text[i + 1];
-
   tack = expand_amble (amble, alen);
+New_Tack:
   result = array_concat (result, tack);
   free (amble);
   strvec_dispose (tack);
 
+  postamble = &text[i + 1];
+
   tack = brace_expand (postamble);
   result = array_concat (result, tack);
   strvec_dispose (tack);
diff -ru bash-2.05b/builtins/Makefile.in bash/builtins/Makefile.in
--- bash-2.05b/builtins/Makefile.in	2002-04-23 09:24:23.000000000 -0400
+++ bash/builtins/Makefile.in	2004-01-05 16:16:34.000000000 -0500
@@ -108,7 +108,7 @@
 DEFSRC =  $(srcdir)/alias.def $(srcdir)/bind.def $(srcdir)/break.def \
 	  $(srcdir)/builtin.def $(srcdir)/cd.def $(srcdir)/colon.def \
 	  $(srcdir)/command.def $(srcdir)/declare.def $(srcdir)/echo.def \
-	  $(srcdir)/enable.def $(srcdir)/eval.def $(srcdir)/getopts.def \
+	  $(srcdir)/enable.def $(srcdir)/eval.def $(srcdir)/array.def $(srcdir)/getopts.def \
 	  $(srcdir)/exec.def $(srcdir)/exit.def $(srcdir)/fc.def \
 	  $(srcdir)/fg_bg.def $(srcdir)/hash.def $(srcdir)/help.def \
 	  $(srcdir)/history.def $(srcdir)/jobs.def $(srcdir)/kill.def \
@@ -125,7 +125,7 @@
 
 OFILES = builtins.o \
 	alias.o bind.o break.o builtin.o cd.o colon.o command.o \
-	common.o declare.o echo.o enable.o eval.o evalfile.o \
+	common.o declare.o echo.o enable.o eval.o array.o evalfile.o \
 	evalstring.o exec.o \
 	exit.o fc.o fg_bg.o hash.o help.o history.o jobs.o kill.o let.o \
 	pushd.o read.o return.o set.o setattr.o shift.o source.o \
@@ -225,6 +225,9 @@
 echo.o: echo.def
 enable.o: enable.def
 eval.o: eval.def
+
+array.o: array.def
+
 exec.o: exec.def
 exit.o: exit.def
 fc.o: fc.def
@@ -365,6 +368,14 @@
 eval.o: $(topdir)/subst.h $(topdir)/externs.h
 eval.o: $(topdir)/shell.h $(topdir)/syntax.h $(topdir)/unwind_prot.h $(topdir)/variables.h $(topdir)/conftypes.h
 eval.o: $(BASHINCDIR)/maxpath.h
+
+array.o: $(topdir)/command.h ../config.h $(BASHINCDIR)/memalloc.h
+array.o: $(topdir)/error.h $(topdir)/general.h $(topdir)/xmalloc.h
+array.o: $(topdir)/quit.h $(topdir)/dispose_cmd.h $(topdir)/make_cmd.h
+array.o: $(topdir)/subst.h $(topdir)/externs.h
+array.o: $(topdir)/shell.h $(topdir)/syntax.h $(topdir)/unwind_prot.h $(topdir)/variables.h $(topdir)/conftypes.h
+array.o: $(BASHINCDIR)/maxpath.h
+
 exec.o: $(topdir)/bashtypes.h
 exec.o: $(topdir)/command.h ../config.h $(BASHINCDIR)/memalloc.h
 exec.o: $(topdir)/error.h $(topdir)/general.h $(topdir)/xmalloc.h
diff -ru bash-2.05b/builtins/array.def bash/builtins/array.def
--- bash-2.05b/builtins/array.def	2004-01-06 14:13:49.000000000 -0500
+++ bash/builtins/array.def	2004-01-07 19:16:18.000000000 -0500
@@ -0,0 +1,803 @@
+This file is array.def, from which is created array.c.
+It implements the builtin "array" in Bash.
+
+$PRODUCES array.c
+
+
+/* Copied from ./eval.def */
+
+#include <config.h>
+#if defined (HAVE_UNISTD_H)
+#  ifdef _MINIX
+#    include <sys/types.h>
+#  endif
+#  include <unistd.h>
+#endif
+
+#include "../shell.h"
+#include "bashgetopt.h"
+#include "common.h"
+
+
+/* My code begins... */
+
+#include <sys/types.h>		/* for regex */
+#include <regex.h>		/* for regex */
+
+int regex_ignore_case = 0;
+int regex_match_newline = 0;
+
+
+/*******************************************************************************
+ * Command-line version of regex conditional test
+ *	string =~ regex
+ *	string !~ regex
+ * It's equivalent to Awk match() function,
+ *	match (string, regex, SUBMATCH)	
+ *
+ * --William Park <opengeometry@yahoo.ca>
+ */
+
+$BUILTIN match
+$FUNCTION match_builtin
+$SHORT_DOC match string regex
+Command-line version of regex conditional test,
+    string =~ regex     --> match string regex
+    string !~ regex     --> ! match string regex
+Return success if 'string' contains 'regex' pattern.  Also, return array
+variable SUBMATCH containing substrings which match parenthesized groups
+in 'regex'.  It's equivalent to Awk match() function,
+    match (string, regex, SUBMATCH)
+$END
+
+
+/* Backend engine for match_builtin() and other functions wanting to use regex
+ * matching.
+ */
+int
+match_regex (string, regex)
+    char *string, *regex;
+{
+    regex_t preg;		/* size_t preg.re_nsub; */
+    regmatch_t *pmatch;
+    int cflag, eflag;
+
+    SHELL_VAR *var;
+    char *t;
+    regoff_t a, b;
+    size_t i, n;
+    int retval;
+
+    cflag = REG_EXTENDED;
+    if (regex_ignore_case) cflag |= REG_ICASE;
+    if (regex_match_newline) cflag |= REG_NEWLINE;
+    if (regcomp (&preg, regex, cflag) != 0) {
+	builtin_error ("`%s': illegal regex in regcomp()", regex);
+	regfree (&preg);
+	return (EXECUTION_FAILURE);
+    }
+
+    n = preg.re_nsub;
+    pmatch = (regmatch_t *) xmalloc ((n+1) * sizeof (regmatch_t));
+    eflag = 0;
+    if (regexec (&preg, string, n+1, pmatch, eflag) != 0) {
+	retval = EXECUTION_FAILURE;
+
+    } else if ((var = find_or_make_array_variable ("SUBMATCH", 1)) == 0) {
+	retval = EXECUTION_FAILURE;	/* readonly or noassign */
+
+    } else {
+	retval = EXECUTION_SUCCESS;
+	array_flush (array_cell (var));
+	for (i = 0; i <= n; i++) {
+	    a = pmatch[i].rm_so;
+	    b = pmatch[i].rm_eo;
+	    if (a >= 0 && b >= 0) {
+		t = substring (string, a, b);
+		array_insert (array_cell (var), i, t);
+		free (t);
+	    }
+	}
+    }
+
+    free (pmatch);
+    regfree (&preg);
+    return (retval);
+}
+
+
+int
+match_builtin (list)
+     WORD_LIST *list;
+{
+    char *string, *regex;
+
+    if (no_options (list))
+	return (EX_USAGE);
+    list = loptend;		/* skip over possible `--' */
+
+    if (list == 0)		/* 0 argument */
+	return (EXECUTION_FAILURE);
+
+    string = list->word->word;
+    list = list->next;
+    if (list == 0)		/* 1 argument: match string */
+	return (EXECUTION_FAILURE);
+
+    regex = list->word->word;
+    list = list->next;
+    if (list == 0)		/* 2 argument: match string regex */
+	return (match_regex (string, regex));
+    
+    builtin_error ("expected only 2 arguments");
+    return (EXECUTION_FAILURE);
+}
+
+
+/*******************************************************************************
+ * Emulate Python's map() function.
+ *
+ * --William Park <opengeometry@yahoo.ca>
+ */
+
+$BUILTIN arraymap
+$FUNCTION arraymap_builtin
+$SHORT_DOC arraymap command name [name ...]
+Mimicking Python's map() function, it runs 'command' for each element of
+arrays 'name', ... in parallel.  'command' should take as many positional
+parameters as there are arrays.  This is modified version of 'eval'
+builtins, and is equivalent to
+    command "${name[0]}" "${name[0]}" ...
+    command "${name[1]}" "${name[1]}" ...
+    ...
+    command "${name[N]}" "${name[N]}" ...
+where 'N' is the maximum of all indexes.  Array elements are referenced by
+index key, starting from 0 to N, not the order of storage.  So, there can
+be empty parameters.
+$END
+
+
+int
+arraymap_builtin (list)
+     WORD_LIST *list;
+{
+#if defined (ARRAY_VARS)
+    char *name, *command, *eval_string;
+    arrayind_t i, n;
+    size_t size, eval_len;
+    SHELL_VAR *var;
+    WORD_LIST *t;
+
+    if (no_options (list))
+	return (EX_USAGE);
+    list = loptend;		/* skip over possible `--' */
+
+    if (list == 0)		/* 0 argument */
+	return (EXECUTION_SUCCESS);
+
+    command = list->word->word;	/* no checking */
+
+    list = list->next;
+    if (list == 0)		/* 1 argument: arraymap command */
+	return (EXECUTION_SUCCESS);
+
+    /* 2 or more arguments: arraymap command a ... */
+
+    n = 0;
+    size = strlen (command);
+    for (t = list; t != 0; t = t->next) {
+	name = t->word->word;
+	if (legal_identifier (name) == 0) {
+	    sh_invalidid (name);
+	    return (EXECUTION_FAILURE);
+	}
+
+	var = find_variable (name);
+	if (var == 0 || array_p (var) == 0) {
+	    sh_notfound (name);
+	    return (EXECUTION_FAILURE);
+	}
+
+	i = array_max_index (array_cell (var));
+	n = (n > i) ? n : i;	/* max of all index */
+
+	/* ' "${name[index]}"'  -->  name + index + 8 */
+	size += strlen (name) + INT_STRLEN_BOUND (intmax_t) + 8;
+    }
+
+    /* command "${name[0]}" "${name[0]}" ...
+     * ...
+     * command "${name[n]}" "${name[n]}" ...
+     */
+    for (i = 0; i <= n; i++) {
+	eval_string = (char *) xmalloc (size + 1);
+
+	strcpy (eval_string, command);
+	eval_len = strlen (eval_string);
+
+	for (t = list; t != 0; t = t->next) {
+	    name = t->word->word;
+	    sprintf (eval_string + eval_len, " \"${%s[%d]}\"", name, i);
+	    eval_len = strlen (eval_string);
+	}
+
+	/* Note that parse_and_execute () frees the string it is passed. */
+	if (parse_and_execute (eval_string, "arraymap", SEVAL_NOHIST) != EXECUTION_SUCCESS)
+	    return (EXECUTION_FAILURE);
+    }
+
+    return (EXECUTION_SUCCESS);
+#endif /* ARRAY_VARS */
+}
+
+
+/*******************************************************************************
+ * Emulate Python's filter() function.
+ *
+ * --William Park <opengeometry@yahoo.ca>
+ */
+
+$BUILTIN arrayfilter
+$FUNCTION arrayfilter_builtin
+$SHORT_DOC arrayfilter filter name
+Mimicking Python's filter() function, it runs 'filter' for each element of
+array 'name'.  It returns the array elements, for which 'filter' returns
+success (0).
+$END
+
+
+int
+arrayfilter_builtin (list)
+    WORD_LIST *list;
+{
+#if defined (ARRAY_VARS)
+    char *name, *filter, *eval_string;
+    size_t size;
+    SHELL_VAR *var;
+    ARRAY *a;
+    ARRAY_ELEMENT *ae;
+
+    if (no_options (list))
+	return (EX_USAGE);
+    list = loptend;	/* skip over possible `--' */
+
+    if (list == 0) 		/* 0 argument */
+	return (EXECUTION_SUCCESS);
+
+    filter = list->word->word;	/* no checking */
+
+    list = list->next;
+    if (list == 0) 		/* 1 argument: arrayfilter filter */
+	return (EXECUTION_SUCCESS);
+
+    name = list->word->word;	/* 2 arguments: arrayfilter filter name */
+    if (legal_identifier(name) == 0) {
+	sh_invalidid (name);
+	return (EXECUTION_FAILURE);
+    }
+    var = find_variable (name);
+    if (var == 0 || array_p (var) == 0) {
+	sh_notfound (name);
+	return (EXECUTION_FAILURE);
+    }
+
+    /* filter "value"
+     * ...
+     * filter "value"
+     */
+    a = array_cell (var);
+    if (a == 0 || array_empty (a)) return (EXECUTION_SUCCESS);	/* do nothing */
+
+    for (ae = element_forw (a->head); ae != a->head; ae = element_forw (ae)) {
+	size = strlen (filter) + strlen(element_value (ae)) + 3;
+
+	eval_string = (char *)xmalloc (size + 1);
+	sprintf (eval_string, "%s \"%s\"", filter, element_value (ae));
+
+	/* Note that parse_and_execute () frees the string it is passed. */
+	if (parse_and_execute (eval_string, "arrayfilter", SEVAL_NOHIST) == EXECUTION_SUCCESS)
+	    puts (element_value (ae));
+    }
+
+    return (EXECUTION_SUCCESS);
+#endif	/* ARRAY_VARS */
+}
+
+
+/*******************************************************************************
+ * Add some of Python's list/dict functionalities.
+ *
+ * --William Park <opengeometry@yahoo.ca>
+ */
+
+
+$BUILTIN array
+$FUNCTION array_builtin
+$SHORT_DOC array [-src] [-i value] [-j sep] [-evEV regex] name [arg...]
+By default, array values are printed, one per line.  Only one option is
+allowed, so the last one takes effect.
+    -i value    Print all indexes with string 'value'  --> list.index(value), ...
+    -j sep      Join all element strings with 'sep' separator  --> sep.join(list)
+
+The following operation changes the array in-place.
+    -s          Sort on array element's value  --> list.sort()
+    -r          Reverse the array  --> list.reverse()
+    -c          Collapse the array, so that there is no missing index
+
+If one or more arguments are present, then the default is to append them
+sequentially to the end of array, mimicking list.append(arg) in Python.  If
+-e or -v option is given, then POSIX 'regex' pattern is applied on 'arg',
+using regcomp(3) and regexec(3), and the resulting substrings are added to
+the end of array.  If shell option 'nocaseregex' is set, then match is
+case-insensitive.  If shell option 'multilineregex' is set, then '.', '^',
+and '$' do not span across \n (newline), so the match is line by line.
+    -e regex    Extract 'regex' patterns from 'arg', and append each
+                matching substring.  (think egrep -e)
+                --> re.findall(regex,arg), minus null string
+    -v regex    Remove regex(7) patterns from 'arg' strings, and append
+                each non-matching substring.  (think egrep -v)
+                --> re.split(regex,arg), minus null string 
+
+Array variable 'name' is not created to allow for repeated calls.  So,
+create it manually.
+$END
+
+
+/* Wrapper around inttostr() in ../lib/sh/itos.c, to convert array index
+ * (arrayind_t) to string.  One can use itos(), but it copies string which
+ * requires an extra step of freeing it.
+ */
+static char *
+element_index_to_string (ae)
+    ARRAY_ELEMENT *ae;
+{
+    /* 'static' to survive outside the function, but is not intended for
+     * long term storage.
+     */
+    static char indstr[INT_STRLEN_BOUND(intmax_t) + 1];
+    
+    return inttostr (element_index (ae), indstr, sizeof (indstr));
+}
+
+
+/* Copied from array_walk() in ../array.c.  For each array element, print its
+ * index key, value, or both index and value, separated by '\t'.  Similar to
+ * dict.keys(), dict.values(), and dict.items() in Python.
+ */
+static void
+print_elements (var)
+    SHELL_VAR *var;
+{
+    ARRAY *a;
+    ARRAY_ELEMENT *ae;
+
+    a = array_cell (var);
+    if (a == 0 || array_empty (a)) return;	/* do nothing */
+
+    for (ae = element_forw (a->head); ae != a->head; ae = element_forw (ae))
+	printf ("%s\t%s\n", element_index_to_string (ae), element_value (ae));
+}
+
+
+/* Copied from array_walk() in ../array.c.  Print index of all array elements
+ * with 'value'.
+ */
+static void
+print_all_indexes_with_value (var, value)
+    SHELL_VAR *var;
+    char *value;
+{
+    ARRAY *a;
+    ARRAY_ELEMENT *ae;
+
+    a = array_cell (var);
+    if (a == 0 || array_empty (a)) return;	/* do nothing */
+
+    for (ae = element_forw (a->head); ae != a->head; ae = element_forw (ae))
+	if (strcmp(element_value (ae), value) == 0)
+	    puts (element_index_to_string (ae));
+}
+
+
+/* Set array index so that they are from 0 to n-1, where n is the number of
+ * elements that the array has.
+ */
+static void
+array_collapse (var)
+    SHELL_VAR *var;
+{
+    ARRAY *a;
+    ARRAY_ELEMENT *ae;
+    arrayind_t i, n;
+
+    a = array_cell (var);
+    if (a == 0 || array_empty (a)) return;	/* do nothing */
+    
+    n = array_num_elements (a);
+    ae = a->head;
+    for (i = 0; i < n; i++) {
+	ae = element_forw (ae);
+	element_index (ae) = i;
+    }
+}
+
+
+/* Reverse the array order, by swapping the element values.  The index keys are
+ * unchanged.  Similar to list.reverse() in Python.
+ */
+static void
+array_reverse (var)
+    SHELL_VAR *var;
+{
+    ARRAY *a;
+    ARRAY_ELEMENT *ae, *be;
+    char *t;
+
+    a = array_cell (var);
+    if (a == 0 || array_empty (a)) return;	/* do nothing */
+
+    /* 'ae' goes forward, and 'be' goes backward */
+    for (ae = element_forw (a->head), be = element_back (a->head); 
+	 ae != a->head && be != a->head && element_index (ae) < element_index (be);
+	 ae = element_forw (ae), be = element_back(be))
+    {
+	t = element_value (ae);		/* swap the values */
+	element_value (ae) = element_value (be);
+	element_value (be) = t;
+    }
+}
+
+
+/* Sort the array by element value: numerically if the variable has the
+ * integer attribute, otherwise as strings.  Similar to list.sort() in Python.
+ */
+static void
+array_sort (var, flag)
+    SHELL_VAR *var;
+    int flag;
+{
+    ARRAY *a;
+    ARRAY_ELEMENT *ae;
+    char **base;	/* array holding pointers to element values */
+    arrayind_t n, i;
+
+    int my_strcmp (x, y) char **x, **y;
+    {
+	return strcmp (*x, *y);
+    }
+    int my_intcmp (x, y) char **x, **y;
+    {
+	int i = atoi (*x);
+	int j = atoi (*y);
+	
+	if (i < j) return -1;
+	if (i > j) return 1;
+	return 0;
+    }
+
+    a = array_cell (var);
+    if (a == 0 || array_empty (a)) return;	/* do nothing */
+
+    n = array_num_elements (a);
+    base = (char **) xmalloc (n * sizeof (char *));
+    ae = a->head;
+    for (i = 0; i < n; i++) {
+	ae = element_forw (ae);
+	base[i] = element_value (ae);
+    }
+
+    if (integer_p (var)) 
+	qsort (base, n, sizeof (char *), my_intcmp);
+    else
+	qsort (base, n, sizeof (char *), my_strcmp);
+
+    ae = a->head;
+    for (i = 0; i < n; i++) {
+	ae = element_forw (ae);
+	element_value (ae) = base[i];
+    }
+    free (base);
+}
+
+
+/* Copied from bind_array_variable() in ../arrayfunc.c.  Find the last index and
+ * append right after it.  Actually, array_insert() in ../array.c inserts it
+ * "before" the head, which is effectively appending it because ARRAY is
+ * a circular linked list.  Similar to list.append() in Python.
+ */
+static void
+array_append (var, arg)
+    SHELL_VAR *var;
+    char *arg;		/* raw string */
+{
+    char *value;
+    arrayind_t N;
+  
+    if (readonly_p (var) || noassign_p (var))
+	err_readonly (var->name);
+    else {
+	N = array_max_index (array_cell (var));		/* -1 if empty */
+	value = make_variable_value (var, arg);
+	if (var->assign_func)
+	    (*var->assign_func) (var, value, N+1);
+	else
+	    array_insert (array_cell (var), N+1, value);
+	FREE (value);
+    }
+}
+
+
+int
+array_builtin (list)
+    WORD_LIST *list;
+{
+#if defined (ARRAY_VARS)
+    char *name, *arg, *sep, *value, *regex;
+    SHELL_VAR *var;
+    int flag, opt;
+
+    regex_t preg;
+    regmatch_t pmatch[1];
+    char *head, *body, *tail, *t;
+    int cflag, eflag;
+
+    flag = 0;
+    sep = value = (char *)NULL;
+
+    reset_internal_getopt ();
+    while ((opt = internal_getopt (list, "srci:j:e:v:")) != -1) {
+	switch (opt) {
+	case 's':
+	case 'r':
+	case 'c':
+	    flag = opt;
+	    break;
+	case 'i':
+	    flag = opt;
+	    value = list_optarg;
+	    break;
+	case 'j':
+	    flag = opt;
+	    sep = list_optarg;
+	    break;
+	case 'e':
+	case 'v':
+	    flag = opt;
+	    regex = list_optarg;
+	    break;
+	default:
+	    builtin_usage ();
+	    return (EX_USAGE);
+	}
+    }
+    list = loptend;
+
+    if (list == 0) 		/* 0 argument */
+	return (EXECUTION_SUCCESS);
+
+    name = list->word->word;		/* first argument */
+    if (legal_identifier(name) == 0) {
+	sh_invalidid (name);
+	return (EXECUTION_FAILURE);
+    } 
+    var = find_variable (name);
+    if (var == 0 || array_p (var) == 0) {
+	sh_notfound (name);
+	return (EXECUTION_FAILURE);
+    }
+
+    list = list->next;
+
+    if (list == 0) {		/* 1 argument: array [...] name */
+	switch (flag) {
+	case 's':		/* array -s name */
+	    array_sort (var, flag);
+	    break;
+	case 'r':		/* array -r name */
+	    array_reverse (var);
+	    break;
+	case 'c':		/* array -c name */
+	    array_collapse (var);
+	    break;
+	case 'i':		/* array -i value name */
+	    print_all_indexes_with_value (var, value);
+	    break;
+	case 'j':		/* array -j sep name */
+	    arg = array_to_string (array_cell (var), sep, 0 /* no quoting */);
+	    puts (arg);
+	    break;
+	default:		/* array name */
+	    print_elements (var);
+	    break;
+	}
+    }
+    
+    /* 2 or more arguments.  So, we are appending.  If 'list == 0' already, then
+     * it falls through.
+     */
+    while (list) {
+	arg = list->word->word;		/* array -[ev] regex name arg... */
+	if (flag == 'e' || flag == 'v') {
+	    cflag = REG_EXTENDED;
+	    if (regex_ignore_case) cflag |= REG_ICASE;
+	    if (regex_match_newline) cflag |= REG_NEWLINE;
+	    if (regcomp (&preg, regex, cflag) != 0) {
+		builtin_error ("`%s': illegal regex in regcomp()", regex);
+		regfree (&preg);
+		return (EXECUTION_FAILURE);
+	    }
+
+	    head = body = tail = arg;
+	    eflag = 0;
+	    while (*body && regexec (&preg, body, 1, pmatch, eflag) == 0) {
+		body += pmatch[0].rm_so;
+		tail += pmatch[0].rm_eo;
+		if (body == tail) {
+		    body++;
+		    tail++;
+		} else {
+		    if (flag == 'e' && tail != body) {
+			t = substring (body, 0, tail - body);
+			array_append (var, t);
+			free (t);
+		    }
+		    if (flag == 'v' && body != head) {
+			t = substring (head, 0, body - head);
+			array_append (var, t);
+			free (t);
+		    }
+		    head = body = tail;
+		}
+		eflag = REG_NOTBOL;
+	    }
+	    if (flag == 'v' && *head) 
+		array_append (var, head);
+
+	    regfree (&preg);
+
+	} else
+	    array_append (var, arg);	/* append original 'arg' */
+
+	list = list->next;
+    }
+
+    stupidly_hack_special_variables (name);
+    /* fflush (stdout); */
+    return (EXECUTION_SUCCESS);
+#endif	/* ARRAY_VARS */
+}
+
+
+/*******************************************************************************
+ * Fully embedded Python.  With Python compiled and installed to /usr/local as
+ * usual, Bash can be compiled with
+ *
+ *	./configure 
+ *	make CFLAGS="-DEMBED_PYTHON -I/usr/local/include/python2.2"
+ *	     LDFLAGS="-L/usr/local/lib/python2.2 -L/usr/local/lib/python2.2/config
+ *	              -Xlinker -export-dynamic"
+ *	     LOCAL_LIBS="-lpython2.2 -lpthread -lutil -lm"
+ *
+ * where '-lpython2.2 -lpthread -lutil -lm' were determined from Python's
+ * Makefile, and '-Xlinker -export-dynamic' were determined from 
+ *	import distutils.sysconfig
+ *	distutils.sysconfig.get_config_var('LINKFORSHARED')
+ * as described in Python documentation for embedding.
+ *
+ * If you don't want Python, then simply do
+ *	./configure
+ *	make
+ *
+ * Embedding is no longer needed.  You can run a Python script which repeatedly
+ * runs 'exec' statement.  Script, called 'coprocess.py', would go something
+ * like
+ *
+ *	import sys
+ *	fifo_in = sys.argv[1]
+ *	fifo_out = sys.argv[2]
+ *	while 1:
+ *	    fin = open(fifo_in, "r")
+ *	    fout = open(fifo_out, "w")
+ *	    sys.stdout = fout
+ *	    exec fin
+ *	    sys.stdout.flush()
+ *	    fout.close()
+ *	    fin.close()
+ *
+ * Then, you can do
+ *
+ *	mkfifo in out 
+ *	python coprocess.py in out &
+ *
+ *	echo "print 1.0+2.0" > in
+ *	cat out
+ *	echo "import math" > in
+ *	echo "print math.pi" > in
+ *	cat out
+ *	
+ * --William Park <opengeometry@yahoo.ca>
+ */
+
+#if defined(EMBED_PYTHON)
+#include "Python.h"	/* includes <stdio.h> */
+
+$BUILTIN embeddedpython
+$FUNCTION embeddedpython_builtin
+$DEPENDS_ON EMBED_PYTHON
+$SHORT_DOC embeddedpython [-cq] arg...
+Send the command-line arguments to embedded Python.  Syntax follows the
+normal Python, ie.
+    python scriptfile
+    python -c "command"
+except that multiple files or strings can be used.  By default, the
+arguments are script files, so sequentially send the file contents to
+Python via PyRun_SimpleFile().  With '-c' option, the arguments are command
+strings, so sequentially send the string contents to Python via
+PyRun_SimpleString().  For readability, leading whitespace in the strings
+is removed.  '-q' stops the embedded Python via Py_Finalize().
+$END
+
+
+int
+embeddedpython_builtin (list)
+    WORD_LIST *list;
+{
+    char *arg;
+    int opt, cflag, out;
+
+    cflag = 0;
+
+    reset_internal_getopt ();
+    while ((opt = internal_getopt (list, "cq")) != -1) {
+	switch (opt) {
+	case 'c':
+	    cflag = 1;
+	    break;
+	case 'q':
+	    Py_Finalize();
+	    break;
+	default:
+	    builtin_usage ();
+	    return (EX_USAGE);
+	}
+    }
+    list = loptend;
+
+    if (list == 0) 		/* 0 argument */
+	return (EXECUTION_SUCCESS);
+
+    if (! Py_IsInitialized())
+	Py_Initialize();
+
+    if (cflag) {		/* send string */
+	char *t;
+
+	for ( ; list; list = list->next) {
+	    arg = list->word->word;
+	    while (*arg && spctabnl (*arg) && isifs (*arg))
+		arg++;
+	    out = PyRun_SimpleString (arg);
+	    if (out)
+		return (EXECUTION_FAILURE);
+	}
+    } else {			/* send file */
+	FILE *fd;
+
+	for ( ; list; list = list->next) {
+	    arg = list->word->word;
+	    fd = fopen (arg, "r");
+	    if (fd == NULL) {
+		builtin_error ("cannot open file `%s'", arg);
+		return (EXECUTION_FAILURE);
+	    }
+	    out = PyRun_SimpleFile (fd, arg);
+	    fclose (fd);
+	    if (out)
+		return (EXECUTION_FAILURE);
+	}
+    }
+
+    fflush (stdout);
+    return (EXECUTION_SUCCESS);
+}
+#endif	/* EMBED_PYTHON */
diff -ru bash-2.05b/builtins/echo.def bash/builtins/echo.def
--- bash-2.05b/builtins/echo.def	2002-03-19 10:45:28.000000000 -0500
+++ bash/builtins/echo.def	2004-01-03 16:30:01.000000000 -0500
@@ -31,10 +31,12 @@
 #include <stdio.h>
 #include "../shell.h"
 
+#include "chartypes.h"		/* needed for ISXDIGIT() and HEXVALUE() */
+
 $BUILTIN echo
 $FUNCTION echo_builtin
 $DEPENDS_ON V9_ECHO
-$SHORT_DOC echo [-neE] [arg ...]
+$SHORT_DOC echo [-neE] [-uU] [arg ...]
 Output the ARGs.  If -n is specified, the trailing newline is
 suppressed.  If the -e option is given, interpretation of the
 following backslash-escaped characters is turned on:
@@ -52,6 +54,13 @@
 
 You can explicitly turn off the interpretation of the above characters
 with the -E option.
+
+The following options are added:
+    -u      converts 2-digit '%NN' URL hexcode into 0xNN ASCII character.
+            To avoid confusion with '\xN' or '\xNN' hexcodes, this option is
+            ignored if -e option is on.
+    -U      encodes ASCII characters to '%NN' hexcode, which is inverse of
+            -u option.
 $END
 
 $BUILTIN echo
@@ -62,7 +71,7 @@
 $END
 
 #if defined (V9_ECHO)
-#  define VALID_ECHO_OPTIONS "neE"
+#  define VALID_ECHO_OPTIONS "neEuU"
 #else /* !V9_ECHO */
 #  define VALID_ECHO_OPTIONS "n"
 #endif /* !V9_ECHO */
@@ -88,6 +97,8 @@
   int display_return, do_v9, i, len;
   char *temp, *s;
 
+  int decode_URL = 0;		/* convert '%NN' to 0xNN ASCII character */
+
   do_v9 = xpg_echo;
   display_return = 1;
 
@@ -124,6 +135,12 @@
 	    case 'E':
 	      do_v9 = 0;
 	      break;
+	    case 'u':
+		decode_URL = 1;
+		break;
+	    case 'U':
+		decode_URL = 2;
+		break;
 #endif /* V9_ECHO */
 	    default:
 	      goto just_echo;	/* XXX */
@@ -145,6 +162,33 @@
 	      for (s = temp; len > 0; len--)
 		putchar (*s++);
 	    }
+
+	  /*********************************************************************
+	   * Conversion between 2-digit '%NN' URL hexcode and ASCII character,
+	   * but only if -e option is not enabled to avoid confusion.  Doing
+	   * this in C is much easier than shell function, because you need
+	   * access to internal binary number.  I wrote this because I could
+	   * only remember '%20' as URL code for space.
+	   *
+	   * --William Park <opengeometry@yahoo.ca>
+	   */
+	  else if (decode_URL == 1) {
+	      for (s = temp; *s; s++)
+		  if (*s == '%' && ISXDIGIT (s[1]) && ISXDIGIT (s[2])) {
+		      putchar (HEXVALUE (s[1]) * 16 + HEXVALUE (s[2]));
+		      s += 2;
+		  } else 
+		      putchar (*s);
+	  } else if (decode_URL == 2) {
+	      char hexchar[] = "0123456789abcdef";
+
+	      for (s = temp; *s; s++) {
+		  putchar ('%');
+		  putchar (hexchar[((unsigned char)*s >> 4) & 15]);	/* upper half */
+		  putchar (hexchar[*s & 15]);		/* lower half */
+	      }
+	  }
+
 	  else	    
 	    printf ("%s", temp);
 #if defined (SunOS5)
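
The '%NN' conversion in the hunk above can be tried outside of Bash.  What
follows is only a minimal standalone sketch, not part of the patch: the names
url_decode() and url_encode() are made up for illustration, output goes to a
caller-supplied buffer instead of stdout, and <ctype.h> stands in for the
ISXDIGIT() and HEXVALUE() macros from chartypes.h.  Bytes are cast to
unsigned char so that values above 0x7f encode correctly.

```c
#include <ctype.h>
#include <stddef.h>

/* Value of one hex digit; the caller guarantees isxdigit(c). */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    return tolower(c) - 'a' + 10;
}

/* Decode every valid '%NN' in src into dst, copying everything else as is.
   dst must hold at least strlen(src) + 1 bytes.  Returns the output length. */
size_t url_decode(const char *src, char *dst)
{
    size_t n = 0;

    for (; *src; src++) {
        if (*src == '%' && isxdigit((unsigned char)src[1])
                        && isxdigit((unsigned char)src[2])) {
            dst[n++] = (char)(hex_value((unsigned char)src[1]) * 16
                              + hex_value((unsigned char)src[2]));
            src += 2;           /* skip the two hex digits */
        } else
            dst[n++] = *src;
    }
    dst[n] = '\0';
    return n;
}

/* Encode every byte of src as '%NN' (lowercase hex), like -U above.
   dst must hold at least 3 * strlen(src) + 1 bytes.  Returns the output length. */
size_t url_encode(const char *src, char *dst)
{
    static const char hexchar[] = "0123456789abcdef";
    size_t n = 0;

    for (; *src; src++) {
        unsigned char c = (unsigned char)*src;

        dst[n++] = '%';
        dst[n++] = hexchar[c >> 4];     /* upper half */
        dst[n++] = hexchar[c & 15];     /* lower half */
    }
    dst[n] = '\0';
    return n;
}
```

For example, url_decode() turns "a%20b" into "a b", and url_encode() turns
"a b" into "%61%20%62", since like -U it encodes every byte rather than only
the reserved ones.
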
diff -ru bash-2.05b/builtins/read.def bash/builtins/read.def
--- bash-2.05b/builtins/read.def	2002-03-19 14:33:41.000000000 -0500
+++ bash/builtins/read.def	2004-01-05 17:55:33.000000000 -0500
@@ -23,7 +23,7 @@
 
 $BUILTIN read
 $FUNCTION read_builtin
-$SHORT_DOC read [-ers] [-u fd] [-t timeout] [-p prompt] [-a array] [-n nchars] [-d delim] [name ...]
+$SHORT_DOC read [-ers] [-u fd] [-t timeout] [-p prompt] [-a array] [-n nchars] [-d delim] [-DN] [name ...]
 One line is read from the standard input, or from file descriptor FD if the
 -u option is supplied, and the first word is assigned to the first NAME,
 the second word to the second NAME, and so on, with leftover words assigned
@@ -45,6 +45,10 @@
 its value is the default timeout.  The return code is zero, unless end-of-file
 is encountered, read times out, or an invalid file descriptor is supplied as
 the argument to -u.
+
+The following options are added:
+    -N      creates Awk-style NF and NR shell variables.
+    -D      reads DOS lines terminated by '\r\n'.
 $END
 
 #include <config.h>
@@ -140,6 +144,9 @@
   int rlind;
 #endif
 
+  int awk_NF_NR = 0;		/* Awk's NF and NR variables */
+  int dos_EOL = 0;		/* read DOS lines which end with '\r\n' */
+
   USE_VAR(size);
   USE_VAR(i);
   USE_VAR(pass_next);
@@ -175,7 +182,7 @@
   delim = '\n';		/* read until newline */
 
   reset_internal_getopt ();
-  while ((opt = internal_getopt (list, "ersa:d:n:p:t:u:")) != -1)
+  while ((opt = internal_getopt (list, "ersa:d:n:p:t:u:DN")) != -1)
     {
       switch (opt)
 	{
@@ -239,6 +246,14 @@
 	case 'd':
 	  delim = *list_optarg;
 	  break;
+
+	case 'N':
+	    awk_NF_NR = 1;
+	    break;
+	case 'D':
+	    dos_EOL = 1;
+	    break;
+
 	default:
 	  builtin_usage ();
 	  return (EX_USAGE);
@@ -454,6 +469,22 @@
 	break;
     }
   input_string[i] = '\0';
+ 
+  /*****************************************************************************
+   * Read DOS lines which end in '\r\n'.  If we are reading by lines (ie. delim
+   * == '\n' and nchars == 0), then remove the extra '\r' at the end of string.
+   * So, 
+   *	read -D a b c ...
+   * is equivalent to
+   *	read
+   *	REPLY="${REPLY%$'\r'}"
+   *	read a b c ... <<< "$REPLY"
+   * but less typing.
+   *
+   * --William Park <opengeometry@yahoo.ca>
+   */
+  if (dos_EOL && nchars == 0 && delim == '\n' && i > 0 && input_string[i-1] == '\r')
+      input_string[--i] = '\0';
 
 #if 1
   if (retval < 0)
@@ -492,6 +523,29 @@
 
   retval = eof ? EXECUTION_FAILURE : EXECUTION_SUCCESS;
 
+  /*****************************************************************************
+   * Emulation of Awk variables NF and NR.  The total number of IFS fields and
+   * number of lines read so far will be assigned to shell variables 'NF' and
+   * 'NR', respectively.
+   *
+   * --William Park <opengeometry@yahoo.ca>
+   */
+  if (awk_NF_NR) {
+      intmax_t n;
+      WORD_LIST *fwlist;
+
+      fwlist = list_string (input_string, ifs_chars, 0);
+      n = list_length ((GENERIC_LIST *)fwlist);
+      bind_var_to_int ("NF", n);
+      dispose_words (fwlist);
+
+      t = get_string_value ("NR");
+      if (t && *t && legal_number (t, &n) && n >= 0)
+	  bind_var_to_int ("NR", n + 1);
+      else 
+	  bind_var_to_int ("NR", 1);
+  }
+
 #if defined (ARRAY_VARS)
   /* If -a was given, take the string read, break it into a list of words,
      an assign them to `arrayname' in turn. */
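
The two additions to 'read' above boil down to trimming one trailing '\r' and
counting fields.  Here is a rough standalone sketch of both, assuming
hypothetical helpers strip_dos_cr() and count_fields() that do not exist in
Bash.  Note the simplification: every character in 'ifs' is treated like
whitespace, so runs of separators collapse, whereas Bash's own list_string()
handles non-whitespace IFS characters differently.

```c
#include <stddef.h>
#include <string.h>

/* Remove one trailing '\r' (left over from a DOS "\r\n" line ending) in
   place.  Returns the new length of the line. */
size_t strip_dos_cr(char *line)
{
    size_t len = strlen(line);

    if (len > 0 && line[len - 1] == '\r')
        line[--len] = '\0';
    return len;
}

/* Count the fields in line, where fields are separated by runs of any
   characters found in ifs.  This is the value that would go into NF. */
int count_fields(const char *line, const char *ifs)
{
    int nf = 0;

    while (*line) {
        line += strspn(line, ifs);      /* skip separators */
        if (*line == '\0')
            break;
        nf++;
        line += strcspn(line, ifs);     /* skip over the field itself */
    }
    return nf;
}
```

With the default IFS of space, tab and newline, the line "a b  c\r" comes out
as "a b  c" with three fields.
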
diff -ru bash-2.05b/builtins/reserved.def bash/builtins/reserved.def
--- bash-2.05b/builtins/reserved.def	2002-04-04 14:52:41.000000000 -0500
+++ bash/builtins/reserved.def	2004-01-07 23:14:20.000000000 -0500
@@ -26,6 +26,12 @@
 list of items.  If `in WORDS ...;' is not present, then `in "$@"' is
 assumed.  For each element in WORDS, NAME is set to that element, and
 the COMMANDS are executed.
+
+If NAME contains multiple variables separated by comma, ie.
+    for a,b,c [in WORDS ... ;] do COMMANDS; done
+then list items are sequentially assigned to the loop variables 'a', 'b',
+and 'c' in each loop.  If the list runs short of items, then the last loop
+will run with null assigned to the leftover variables.
 $END
 
 $BUILTIN for ((
@@ -69,6 +75,10 @@
 $SHORT_DOC case WORD in [PATTERN [| PATTERN]...) COMMANDS ;;]... esac
 Selectively execute COMMANDS based upon WORD matching PATTERN.  The
 `|' is used to separate multiple patterns.
+
+If any portion of PATTERN is quoted, then quotes are removed in the same
+manner as here document delimiter, and regular expression matching is done
+via builtin command 'match'.
 $END
 
 $BUILTIN if
diff -ru bash-2.05b/builtins/shopt.def bash/builtins/shopt.def
--- bash-2.05b/builtins/shopt.def	2002-04-04 14:21:32.000000000 -0500
+++ bash/builtins/shopt.def	2004-01-05 17:11:53.000000000 -0500
@@ -62,6 +62,10 @@
 extern int cdspelling, expand_aliases;
 extern int check_window_size;
 extern int glob_ignore_case;
+
+extern int regex_ignore_case;		/* for regex */
+extern int regex_match_newline;		/* for regex */
+
 extern int hup_on_exit;
 extern int xpg_echo;
 
@@ -139,6 +143,13 @@
   { "no_empty_cmd_completion", &no_empty_command_completion, (shopt_set_func_t *)NULL },
 #endif
   { "nocaseglob", &glob_ignore_case, (shopt_set_func_t *)NULL },
+
+  /*****************************************************************************
+   * For case-insensitive regex --William Park <opengeometry@yahoo.ca>
+   */
+  { "nocaseregex", &regex_ignore_case, (shopt_set_func_t *)NULL },
+  { "multilineregex", &regex_match_newline, (shopt_set_func_t *)NULL },
+
   { "nullglob",	&allow_null_glob_expansion, (shopt_set_func_t *)NULL },
 #if defined (PROGRAMMABLE_COMPLETION)
   { "progcomp", &prog_completion_enabled, (shopt_set_func_t *)NULL },
diff -ru bash-2.05b/execute_cmd.c bash/execute_cmd.c
--- bash-2.05b/execute_cmd.c	2002-03-18 13:24:22.000000000 -0500
+++ bash/execute_cmd.c	2004-01-07 22:43:40.000000000 -0500
@@ -1527,15 +1527,49 @@
   SHELL_VAR *old_value = (SHELL_VAR *)NULL; /* Remember the old value of x. */
 #endif
 
-  if (check_identifier (for_command->name, 1) == 0)
-    {
-      if (posixly_correct && interactive_shell == 0)
+  /*****************************************************************************
+   * Enable multiple loop variables in for-loop, with syntax
+   *	for  a,b,c,...  in list; do
+   *	    ...
+   *	done
+   * where no space is allowed around ',' (comma) because only one word is
+   * parsed.  List items are sequentially assigned to the loop variables 'a',
+   * 'b', 'c', etc.  If the list runs short of items, the last iteration will
+   * run with '' (null) assigned to the leftover variables.
+   *
+   * --William Park <opengeometry@yahoo.ca>
+   */
+  int multi_variables;
+  WORD_LIST *list_of_for_variables, *fv;
+
+  multi_variables = 0;
+
+  if (xstrchr (for_command->name->word, ',') != NULL) {		/* split 'a,b,c,...' */
+      char *t;
+
+      multi_variables = 1;
+      list_of_for_variables = word_split (for_command->name, ",");
+      identifier = list_of_for_variables->word->word;
+  }
+  /*
+   * Check if a, b, c, ... are legal shell variables.
+   */
+  if (multi_variables) {
+      for (fv = list_of_for_variables; fv; fv = fv->next)
+	  if (check_identifier (fv->word, 1) == 0)
+	      goto Exit_by_Original_Code;
+  } else {
+      if (check_identifier (for_command->name, 1) == 0)		/* original code */
 	{
-	  last_command_exit_value = EX_USAGE;
-	  jump_to_top_level (EXITPROG);
+Exit_by_Original_Code:
+	  if (posixly_correct && interactive_shell == 0)
+	    {
+	      last_command_exit_value = EX_USAGE;
+	      jump_to_top_level (EXITPROG);
+	    }
+	  return (EXECUTION_FAILURE);
 	}
-      return (EXECUTION_FAILURE);
-    }
+  }
 
   loop_level++;
   identifier = for_command->name->word;
@@ -1561,21 +1595,46 @@
     {
       QUIT;
       this_command_name = (char *)NULL;
-      v = bind_variable (identifier, list->word->word);
-      if (readonly_p (v) || noassign_p (v))
-	{
-	  if (readonly_p (v) && interactive_shell == 0 && posixly_correct)
-	    {
-	      last_command_exit_value = EXECUTION_FAILURE;
-	      jump_to_top_level (FORCE_EOF);
-	    }
-	  else
+      
+      /* Assign list items into a, b, c, ...
+       */
+      if (multi_variables) {
+	  for (fv = list_of_for_variables; fv; fv = fv->next) {
+	      identifier = fv->word->word;
+	      if (list) {
+		  /*
+		   * Go to the next item in the list, only if there are more
+		   * variables to assign.  If finished assigning, then leave the
+		   * incrementing for the next iteration.
+		   */
+		  v = bind_variable (identifier, list->word->word);
+		  if (fv->next)
+		      list = list->next;
+	      } else 			/* no more items */
+		  v = bind_variable (identifier, "");
+	      if (readonly_p (v) || noassign_p (v))
+		  goto Exit_by_Original_Code_2;
+	  }
+
+      } else {
+	  v = bind_variable (identifier, list->word->word);		/* original code */
+	  if (readonly_p (v) || noassign_p (v))
 	    {
-	      run_unwind_frame ("for");
-	      loop_level--;
-	      return (EXECUTION_FAILURE);
+Exit_by_Original_Code_2:
+	      if (readonly_p (v) && interactive_shell == 0 && posixly_correct)
+		{
+		  last_command_exit_value = EXECUTION_FAILURE;
+		  jump_to_top_level (FORCE_EOF);
+		}
+	      else
+		{
+		  run_unwind_frame ("for");
+		  loop_level--;
+		  return (EXECUTION_FAILURE);
+		}
 	    }
-	}
+      }
+
       retval = execute_command (for_command->action);
       REAP ();
       QUIT;
@@ -1592,6 +1651,8 @@
 	  if (continuing)
 	    break;
 	}
+
+      if (multi_variables && list == 0) break;
     }
 
   loop_level--;
@@ -1612,6 +1673,8 @@
     }
 #endif
 
+  if (multi_variables)
+      dispose_words (list_of_for_variables);
   dispose_words (releaser);
   discard_unwind_frame ("for");
   return (retval);
@@ -2073,10 +2136,23 @@
 	      pattern[0] = '\0';
 	    }
 
-	  /* Since the pattern does not undergo quote removal (as per
-	     Posix.2, section 3.9.4.3), the strmatch () call must be able
-	     to recognize backslashes as escape characters. */
-	  match = strmatch (pattern, word, FNMATCH_EXTFLAG) != FNM_NOMATCH;
+	  /*********************************************************************
+	   * If 'pattern' is quoted (W_QUOTED), then regular expression matching
+	   * should be done.  Quotes are removed in the same manner as here
+	   * document delimiter in ./make_cmd.c.
+	   *
+	   * --William Park <opengeometry@yahoo.ca>
+	   */
+	  if (list->word->flags & W_QUOTED) {
+	      free (pattern);
+	      pattern = string_quote_removal (list->word->word, 0);
+	      match = match_regex (word, pattern) == EXECUTION_SUCCESS;
+	  } else {
+	      /* Since the pattern does not undergo quote removal (as per
+		 Posix.2, section 3.9.4.3), the strmatch () call must be able
+		 to recognize backslashes as escape characters. */
+	      match = strmatch (pattern, word, FNMATCH_EXTFLAG) != FNM_NOMATCH;
+	  }
 	  free (pattern);
 
 	  dispose_words (es);
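
The item-to-variable assignment done by the multi-variable 'for' loop in
execute_cmd.c above can be modelled without any shell internals.  This is
just a sketch of the behaviour as I read it from the patch; next_group() is
an invented name, and plain string arrays stand in for WORD_LIST and shell
variables.

```c
#include <stddef.h>

/* Fill vars[0..nvars-1] with the next items, starting at items[*pos] and
   advancing *pos.  When the items run out partway through, the leftover
   variables get "" (null string).  Returns 1 if the loop body should run,
   or 0 once all items have been consumed. */
int next_group(const char **items, size_t nitems, size_t *pos,
               const char **vars, size_t nvars)
{
    size_t i;

    if (*pos >= nitems)
        return 0;               /* nothing left: the loop is finished */

    for (i = 0; i < nvars; i++)
        vars[i] = (*pos < nitems) ? items[(*pos)++] : "";
    return 1;
}
```

So five items spread over three variables give two iterations: the first sees
the first three items, the second sees the last two items plus one null
string.
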
diff -ru bash-2.05b/subst.c bash/subst.c
--- bash-2.05b/subst.c	2004-01-03 00:10:57.000000000 -0500
+++ bash/subst.c	2004-01-06 16:34:05.000000000 -0500
@@ -4115,6 +4115,11 @@
   t = temp ? savestring (temp) : savestring ("");
   t1 = dequote_string (t);
   free (t);
+#if defined (ARRAY_VARS)
+  if (valid_array_reference (name))
+      assign_array_element (name, t1);
+  else
+#endif
   bind_variable (name, t1);
   free (t1);
   return (temp);
@@ -4359,7 +4364,7 @@
 #if defined (ARRAY_VARS)
     case VT_ARRAYVAR:
       a = (ARRAY *)value;
-      len = array_num_elements (a) + 1;
+      len = array_num_elements (a);
       break;
 #endif
     }
@@ -4475,6 +4480,9 @@
   char *temp, *val, *tt;
   SHELL_VAR *v;
 
+  int skip_like_sed;
+  intmax_t x, y;
+
   if (value == 0)
     return ((char *)NULL);
 
@@ -4484,15 +4492,58 @@
   if (vtype == -1)
     return ((char *)NULL);
 
-  r = verify_substring_values (val, substr, vtype, &e1, &e2);
-  if (r <= 0)
-    return ((r == 0) ? &expand_param_error : (char *)NULL);
+  /*****************************************************************************
+   * Check for Sed-style 'x~y' skipping, where 'x' and 'y' are non-negative
+   * integers.  Eg.
+   *	${*:1~2}
+   *	${@:1~2}
+   *	${array[*]:1~2}
+   *	${array[@]:1~2}
+   *	${string:1~2}
+   * give every other positional parameter, array element, and string
+   * character, respectively, starting at 1.
+   *
+   * Whitespace is not allowed, in order to differentiate from a valid
+   * arithmetic bitwise negation (~).
+   *
+   * --William Park <opengeometry@yahoo.ca>
+   */
+  skip_like_sed = 0;
+  tt = xstrchr (substr, '~');
+  if (tt) {
+      *tt++ = '\0';
+      if (*substr && all_digits (substr) && legal_number (substr, &x) && x >= 0 &&
+	      *tt && all_digits (tt) && legal_number (tt, &y) && y >= 0)
+	  skip_like_sed = 1;
+      tt[-1] = '~';		/* restore the original string */
+  }
+  if (! skip_like_sed) {		/* original code */
+      r = verify_substring_values (val, substr, vtype, &e1, &e2);
+      if (r <= 0)
+	return ((r == 0) ? &expand_param_error : (char *)NULL);
+  }
 
   switch (vtype)
     {
     case VT_VARIABLE:
     case VT_ARRAYMEMBER:
-      tt = substring (val, e1, e2);
+      if (skip_like_sed) {
+	  size_t i, n;
+	  char *s;
+
+	  n = strlen (val);
+	  if (n == 0)
+	      return (char *)NULL;
+
+	  s = tt = (char *)xmalloc (n + 1);
+	  if (y <= 0) 
+	      y = 1;		/* don't want infinite loop */
+	  for (i = x; i < n; i += y)
+	      *s++ = val[i];
+	  *s = '\0';
+      } else
+	  tt = substring (val, e1, e2);               /* original code */
+
       if (vtype == VT_VARIABLE)
 	FREE (val);
       if (quoted & (Q_DOUBLE_QUOTES|Q_HERE_DOCUMENT))
@@ -4501,8 +4552,38 @@
 	temp = tt ? quote_escapes (tt) : (char *)NULL;
       FREE (tt);
       break;
+
     case VT_POSPARMS:
-      tt = pos_params (varname, e1, e2, quoted);
+      if (skip_like_sed) {
+	  WORD_LIST *out, *plist, *p;
+	  
+	  plist = list_rest_of_args ();
+	  if (plist == 0) 
+	      return (char *)NULL;
+
+	  out = (WORD_LIST *)NULL;
+	  for (p = plist; p; p = p->next) {
+	      while (p && --x > 0)
+		  p = p->next;
+	      if (p == 0) 
+		  break;
+	      out = make_word_list (make_bare_word (p->word->word), out);
+	      x = y;		/* for next time */
+	  }
+	  out = REVERSE_LIST (out, WORD_LIST *);
+
+	  if (varname[0] == '*')		/* copied from pos_params() */
+	      tt = (quoted & (Q_HERE_DOCUMENT|Q_DOUBLE_QUOTES)) ?
+		  string_list_dollar_star (quote_list (out)) : string_list (out);
+	  else
+	      tt = string_list ((quoted & (Q_HERE_DOCUMENT|Q_DOUBLE_QUOTES)) ?
+		      quote_list (out) : out);
+
+	  dispose_words (out);
+	  dispose_words (plist);
+      } else 
+	  tt = pos_params (varname, e1, e2, quoted);          /* original code */
+
       if ((quoted & (Q_DOUBLE_QUOTES|Q_HERE_DOCUMENT)) == 0)
 	{
 	  temp = tt ? quote_escapes (tt) : (char *)NULL;
@@ -4513,7 +4594,29 @@
       break;
 #if defined (ARRAY_VARS)
     case VT_ARRAYVAR:
-      tt = array_subrange (array_cell (v), e1, e2, quoted);
+      if (skip_like_sed) {
+	  ARRAY *out, *a;
+	  ARRAY_ELEMENT	*ae;
+
+	  a = array_cell (v);
+	  if (a == 0 || array_empty (a) || x > array_num_elements (a))
+	      return (char *)NULL;
+
+	  out = array_create ();
+	  x++;		/* only for first time, since array starts at 0 */
+	  for (ae = element_forw (a->head); ae != a->head; ae = element_forw (ae)) {
+	      while (ae != a->head && --x > 0)
+		  ae = element_forw (ae);
+	      if (ae == a->head)
+		  break;
+	      array_insert (out, element_index (ae), element_value (ae));
+	      x = y;	/* for next time */
+	  }
+	  tt = array_to_string (out, " ", quoted);
+	  array_dispose (out);
+      } else 
+	  tt = array_subrange (array_cell (v), e1, e2, quoted);		/* original code */
+
       if ((quoted & (Q_DOUBLE_QUOTES|Q_HERE_DOCUMENT)) == 0)
 	{
 	  temp = tt ? quote_escapes (tt) : (char *)NULL;
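
The string case of 'x~y' skipping in subst.c above is easy to try in
isolation.  Below is a minimal sketch, assuming a made-up function
skip_chars() and plain malloc() in place of Bash's xmalloc().  As in the
patch, the first index is used as a zero-based offset into the string, and a
step of 0 is bumped to 1 to avoid an infinite loop.

```c
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated string holding val[first], val[first + step],
   val[first + 2*step], and so on.  Returns NULL only if malloc() fails;
   the caller frees the result. */
char *skip_chars(const char *val, size_t first, size_t step)
{
    size_t n = strlen(val), i;
    char *out = malloc(n + 1), *s = out;

    if (out == NULL)
        return NULL;
    if (step == 0)
        step = 1;               /* don't want an infinite loop */

    for (i = first; i < n; i += step)
        *s++ = val[i];
    *s = '\0';
    return out;
}
```

For instance, skip_chars("abcdefg", 1, 2) picks offsets 1, 3 and 5 and
returns "bdf".
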
diff -ru bash-2.05b/test.c bash/test.c
--- bash-2.05b/test.c	2002-02-28 10:54:47.000000000 -0500
+++ bash/test.c	2004-01-07 19:22:18.000000000 -0500
@@ -507,6 +507,25 @@
 	}
     }
 
+  /*****************************************************************************
+   * To enable '=~' or '!~' as binary operator,
+   *	string =~ regex
+   *	string !~ regex
+   * compile with PATTERN_MATCHING.  This is just a frontend to builtin command
+   *	match string regex
+   * in builtins/array.def.
+   *
+   * --William Park <opengeometry@yahoo.ca>
+   */ 
+#if defined (PATTERN_MATCHING)
+  else if (op[0] == '=' && op[1] == '~' && op[2] == '\0') {	/* =~ */
+      return (match_regex (arg1, arg2) == EXECUTION_SUCCESS);
+  }
+  else if (op[0] == '!' && op[1] == '~' && op[2] == '\0') {	/* !~ */
+      return (match_regex (arg1, arg2) == EXECUTION_FAILURE);
+  }
+#endif
+
   return (FALSE);	/* should never get here */
 }
 
@@ -530,7 +549,13 @@
 #if defined (PATTERN_MATCHING)
   if ((w[0] == '=' || w[0] == '!') && w[1] == '~' && w[2] == '\0')
     {
-      value = patcomp (argv[pos], argv[pos + 2], w[0] == '=' ? EQ : NE);
+      /*************************************************************************
+       * I want regex matching, not Csh version of '==' or '!='.
+       * --William Park <opengeometry@yahoo.ca>
+       *
+       * value = patcomp (argv[pos], argv[pos + 2], w[0] == '=' ? EQ : NE);
+       */
+      value = binary_test (w, argv[pos], argv[pos + 2], 0);
       pos += 3;
       return (value);
     }
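
Both the quoted 'case' patterns and the '=~' and '!~' operators above go
through match_regex(), which is defined in builtins/array.def and not shown
in this part of the patch.  For readers who want to experiment with the idea,
here is a small sketch built directly on POSIX regcomp() and regexec().  It
is not the patch's implementation: regex_matches() is an invented name, the
use of REG_EXTENDED is my assumption, and the two int arguments merely stand
in for the 'nocaseregex' and 'multilineregex' shopt settings.

```c
#include <regex.h>
#include <stddef.h>

/* Test string against a POSIX extended regular expression.
   Returns 1 on match, 0 on no match, -1 if the pattern does not compile. */
int regex_matches(const char *string, const char *pattern,
                  int ignore_case, int match_newline)
{
    regex_t preg;
    int cflags = REG_EXTENDED | REG_NOSUB;
    int rc;

    if (ignore_case)
        cflags |= REG_ICASE;
    if (match_newline)
        cflags |= REG_NEWLINE;

    if (regcomp(&preg, pattern, cflags) != 0)
        return -1;

    rc = regexec(&preg, string, 0, NULL, 0);
    regfree(&preg);
    return rc == 0;
}
```

Because the match is unanchored, "abc123" matches both 'abc' and
'[a-z].*[0-9]', the same two regex patterns used in the 'case' example at the
top of this post.
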

