Re: ksh silently ignores function if mistakenly not autoloaded

From: dmercer@mn.rr.com
Date: 04/03/03


From: dmercer@mn.rr.com ()
Date: Thu, 03 Apr 2003 20:54:02 GMT

In article <b6h10t$dc6$1$8302bc10@news.demon.co.uk>,
        "Clyde Ingram" <cingram@pjocsNOSPAMORHAM.demon.co.uk> writes:
> Suppose I have a ksh function "myfunc" in a file "myfunc".
> In a ksh script which invokes the function "myfunc", I have set the FPATH to
> include the directory (on Solaris 2.6) where the file "myfunc" lives.
>
> But I carelessly forgot to autoload the function "myfunc".
>
> My ksh script motors past the call to the function:
>
> stuff=$( myfunc )
>
> without a single word of complaint. Why is this? Not even a bad status in
> $? to help me.
> Of course, I should carefully check whether function "myfunc" returned
> anything, on its STDOUT, to my variable $stuff . . .
>
> Is there any "set" option I can use to have ksh warn me about unset
> functions, or is there just no way for ksh to guess that "myfunc" is a
> function call in the first place because I did not autoload it?
>
> I already start the script with
> set -o nounset
> but this did not catch my error.
>
> Thank-you,
> Clyde
>
>

You only need to autoload functions when the function name
collides with the name of an executable file located in the PATH.

Here's the order of execution.

   Aliases are expanded. This really takes place prior to command
   execution. By changing the text of the command, it affects in
   what order the command will be executed (more later).
   
   Explicitly pathed executables (/bin/ksh, ./bin/myscript)
   Builtin commands (cd, test, etc.)
   Loaded and defined functions
   Executables found in the PATH
   Undefined functions found in the FPATH
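
As a rough illustration of that lookup order, a script can ask the shell
how it would resolve a name before relying on it. This is only a sketch:
require_cmd is a made-up helper, not something ksh provides, and it
assumes that "type" (a preset alias for "whence -v" in ksh, a builtin in
other shells) returns non-zero for a name it cannot resolve.

```shell
# Hypothetical guard: fail loudly if the shell cannot resolve a name.
require_cmd() {
    if type "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "$1: cannot be resolved by this shell" >&2
    return 1
}

# Illustrative function standing in for the poster's "myfunc".
myfunc() { echo "hello from myfunc"; }

# Only capture output once the name is known to resolve.
require_cmd myfunc && stuff=$(myfunc)

# A name that should not exist anywhere; the guard reports it.
require_cmd no_such_command_xyz 2>/dev/null || missing=yes
```

A check like this at the top of a script turns a silent empty result
into an explicit error.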

If, for instance, you want a private ls function you would need
to autoload it to change its priority. For more:
  

Dan Mercer
dmercer@mn.rr.com

                     FUN WITH FUNCTIONS (and aliases)
                     ================================

Shell scripts, with all their power, have one major drawback - they
do not modify the current shell's environment. To do that, one
must resort to aliases and functions.

ALIASES
=======

An alias just performs a textual replacement of one string for another,
for instance:

      $ alias foo=bar
      $ foo
      ksh: bar: not found

When a command is parsed the first word will be checked against the
defined aliases. If an exact match is found, the word is replaced
by the defined text. If the word does not match exactly, it is not
replaced. Look what happens if I continue the example above using the
strings \foo, 'foo', "foo".

      $ \foo
      ksh: foo: not found
      $ 'foo'
      ksh: foo: not found
      $ "foo"
      ksh: foo: not found

Note that no aliasing occurred.

Since the text substitution occurs before the line is parsed, you
cannot pass parameters to ksh aliases as you can in csh. Csh aliases
are a weaker version of ksh functions. If you want to pass parameters
you must use functions.

Aliases can be used to track frequently called programs, so that
you don't have to traverse the PATH variable every time you call the
command. Ksh comes with builtin tracked aliases that can be listed
with the "alias" command after they have been accessed:

      $ alias -t -
      cat=/usr/bin/cat

The name "hash" is itself an alias for "alias -t -". If you say:

      $ alias -t xterm

then the shell will examine the current value of the PATH and if
it finds an executable named xterm it will alias the full path
to the word "xterm":

      $ hash
      cat=/usr/bin/cat
      xterm=/usr/bin/X11/xterm

If the PATH is reset, the next time an aliased command is run the
alias will be recomputed. If you run "whence" on the command (and
remember, "type" is an alias for "whence -v"), the alias will be
recomputed. For instance, continuing the example:

      $ for i in /usr/bin/*;do type ${i##*/} >/dev/null 2>&1;done

That runs "whence -v" on the basename of every executable in /usr/bin.
Now when I run hash:

      $ hash
      cat=/usr/bin/cat
      cc=/usr/bin/cc
      chmod=/usr/bin/chmod
      cp=/usr/bin/cp
      date=/usr/bin/date
      ed=/usr/bin/ed
      grep=/usr/bin/grep
      ls=/usr/bin/ls
      mail=/usr/bin/mail
      mv=/usr/bin/mv
      pr=/usr/bin/pr
      sed=/usr/bin/sed
      sh=/usr/bin/sh
      vi=/usr/bin/vi
      who=/usr/bin/who
      xterm=/usr/bin/X11/xterm

You can set all commands to be tracked by turning on the
"trackall" option using either:

      set -o trackall

or

      set -h

Another interesting feature of aliases is that if an alias
contains a trailing space, the subsequent word will be examined
for ordinary (not tracked) alias expansion:

      $ alias ll='ls -l '
      $ alias inc=/usr/include
      $ ll inc
      total 1598
      drwxr-xr-x 2 root sys 5120 Jul 7 1999 FL
      dr-xr-xr-x 5 bin bin 1024 Oct 19 11:51 Motif1.2
      lr-xr-xr-t 1 root sys 18 Apr 1 1997 SC -> /opt/CC/include/SC
      drwxr-xr-x 3 bin bin 2048 Oct 19 12:09 X11
      dr-xr-xr-x 3 bin bin 1024 Mar 17 1998 X11R6
      -r--r--r-- 1 bin bin 605 May 30 1996 a.exec.h
      ...

This can be very useful in changing directories (see the cd example
below).

In command execution, builtin commands (like cd) have a higher
precedence than functions, which have a higher precedence than
external commands, which have a higher precedence than undefined
functions (I'm getting to those).

Aliasing takes place before parsing, so if you want to redefine a
builtin, you must use an alias. In the example below, "pd" is the
name of a function that changes directories, keeps a stack of the
last traversed directories, and changes the terminal title bar to
reflect the current directory. Since the alias ends in a trailing space,
it also allows directory paths to be set up as aliases:

      alias cd='pd '
      function pd
      {
      RT=${PWD:-$(pwd)}
      dir_history $RT
      \cd "$@"

      typeset t
      t="${HOST}:${PWD:=$(pwd)}"
      case $TERM in
         hp*) echo "\033&f0k${#t}D${t}\033&f-1k${#t}D${t}\c";;
         +(d|x|v)t*) echo "\033]2;${t}\007\c";;
      esac
      echo $PWD
      }

Note that when cd is finally called, it is escaped with a backslash
as discussed above. If it were not, aliasing would replace the
characters "cd" with "pd ", and the function would recurse until
the stack limit was hit. A fuller explanation of the above example
follows the discussion on Functions.

FUNCTIONS
=========

Of Shells and Subshells:
-----------------------

When the shell goes to execute an external command it first forks
the current shell process giving you an entirely new process that
inherits all the information from the old. It has a new process
id, but the "$$" variable is still set to the process id of the
parent process. A form of the "exec(2)" function is then called.
The "exec(2)" function checks the permission of the file to make
sure it is executable by the current user. It then opens the
file and reads the first 32 bytes. If the magic number for a
binary executable is found, then the binary executable is loaded
and it replaces the current process state. If the magic number
"#!" (called a "shebang") is encountered, the rest of the line
is parsed for the explicit path to a file. If that file is not
executable, you get an error message (at least on HP-UX 10.20).
If it is a binary executable, that binary is called with the
script file name as either the first or second parameter; any
text on the "#!" line following the path is passed as the
first parameter. If it is not a binary, exec returns an error to
the shell. What happens after that is up to the shell - ksh88 on
HP-UX 10.20 treats the "#!" line as a comment and reads the file
and executes its commands in the current subshell. "Csh" will
check the first character of the file - if it is a "#" it will
attempt to read and execute its commands in the current subshell
- if it is not a "#", the subshell will exec "/bin/sh" and pass
it the file to execute.
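
The point about "$$" can be seen with a small experiment. This is a
sketch under the assumption that "sh" is on the PATH; the variable names
are only for illustration.

```shell
parent=$$

# A subshell (here, a command substitution) still reports the parent's $$.
in_subshell=$(echo $$)

# A separately exec'd shell is a new program with its own $$.
in_child=$(sh -c 'echo $$')

echo "parent=$parent subshell=$in_subshell child=$in_child"
```

The first two values should match, while the exec'd shell reports a
different process id.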

When a shell script is called by ksh, what is inherited by the
new process differs based on whether there is a shebang. If there
is not, the new subshell inherits all exported variables, exported
aliases (using "alias -x") and exported functions
(using "typeset -fx"). NOTE - ksh93 does not support exported aliases
or functions. If there is a shebang, the subshell execs the new
interpreter and the new interpreter only inherits exported scalar
variables.

Regardless of whether they have a shebang or not, shell scripts
cannot change the current environment, neither the current
working directory nor the environment variables. You can, however,
script changes to the current environment in one of two ways:

   o - use the special "." to "source" the script - i.e. execute
       its commands in the current shell.
   o - use a special kind of script called a "function".

In most ways the above methods will produce identical results,
but I am only going to discuss Korn Shell functions here.
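
To make the difference concrete, here is a small sketch that runs the
same one-line script first as a child process and then with ".". The
file name and the variable DEMO_VAR are invented for the example.

```shell
# Hypothetical throwaway script that changes one variable.
script=${TMPDIR:-/tmp}/setvar.$$
echo 'DEMO_VAR=changed' > "$script"

DEMO_VAR=original

sh "$script"              # runs in a child process
after_child=$DEMO_VAR     # the current shell is untouched

. "$script"               # runs in the current shell
after_dot=$DEMO_VAR       # the assignment is now visible

rm -f "$script"
echo "after child: $after_child, after dot: $after_dot"
```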

A function is a collection of commands that run in the current
shell. Thus, it has access to the current shell's environment
and can change its variables and directory. A function can
either be defined or undefined. A defined function has had its
commands parsed and stored by the shell. An undefined function
exists in an external file whose path is known to the shell.

In the Korn Shell, there are two separate syntaxes for defining
a function - the POSIX (implicit) syntax and a Korn syntax
using the explicit keyword "function".

POSIX:

      tt() { echo "temporary test function"; }

Korn:

      function tt { echo "temporary test function"; }

In ksh88, the two syntaxes behave identically. In ksh93,
the POSIX syntax adopts POSIX behavior, while the Korn syntax
keeps the behavior that all functions had in ksh88.

In Korn behavior, $0 of the function is the name of the function.
Traps are reset inside the function and you can set traps particular
to the function. A trap on "exit" trips when the function is
exited. Variables within a Korn function can be made local to the
function by using the "typeset" command - this is very useful when
making multiple passes on an options list using "getopts" or when
doing data splitting by modifying the IFS variable.

In POSIX behavior, $0 is the $0 of the calling process. Traps and
variables are global. On the whole, far more powerful processing
is possible with Korn functions than POSIX.
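
A minimal sketch of "typeset" scoping in a Korn-style function follows.
The function and variable names are made up, and only the variable
scoping is shown, since the $0 and trap behavior differs between shells.

```shell
x=global
y=untouched

function demo_local
{
    typeset x=local      # local to the function
    y=set_in_function    # not typeset, so the caller's y changes
    echo "$x"
}

inside=$(demo_local)     # what the function itself sees for x
demo_local >/dev/null    # call again in the current shell

echo "inside=$inside x=$x y=$y"
```

After the direct call, x should still hold "global" while y has been
changed.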

The Korn shell adds another wrinkle to function processing -
the FPATH variable. FPATH is analogous to PATH, but instead of
a list of paths where executable files may be found, FPATH is a
list of paths where readable files containing function definitions
may be found. The names of the readable files in the FPATH
hierarchy are stored by the ksh as "undefined functions" (whether
they contain function definitions or not).

When a command is searched for, pathed names (those with a "/"
in them) are found first. If the name is not pathed, the list
of builtins is searched, then the list of known functions, then
the PATH is searched for a file with a matching name and finally
the FPATH is searched for a file with a matching name. All
defined functions are "known". You can make an undefined
function "known" by autoloading it with the "autoload" alias.
This will give the undefined function higher precedence than
PATH'd commands.

When an undefined function is used (not when it's autoloaded) the file
will be sourced, then the function called. The function file
therefore must contain a function definition, for instance:

      $ cat /usr/common/fun/pd
      function pd
      {
      RT=${PWD:-$(pwd)}
      dir_history $RT
      \cd "$@"

      typeset t
      t="${HOST}:${PWD:=$(pwd)}"
      case $TERM in
         hp*) echo "\033&f0k${#t}D${t}\033&f-1k${#t}D${t}\c";;
         +(d|x|v)t*) echo "\033]2;${t}\007\c";;
      esac
      echo $PWD
      }

The file can contain additional commands including multiple function
definitions that will all be defined simultaneously.
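
The "source the file, then call the function" step can be imitated by
hand, which may help in seeing what the shell does for you. The sketch
below is an emulation, not ksh's own mechanism: it uses a stub function
instead of FPATH, and the directory and function names are invented.
Under ksh itself you would simply put the directory in FPATH (and
"autoload hello" if the name clashes with a command in the PATH).

```shell
# Hypothetical directory standing in for an FPATH entry.
fundir=${TMPDIR:-/tmp}/fun.$$
mkdir -p "$fundir"

# The function file: named after the function, containing its definition.
cat > "$fundir/hello" <<'EOF'
function hello
{
    echo "hello, $1"
}
EOF

# Stub that plays the role of the undefined function: on first use it
# sources the file (which redefines "hello") and then calls the result.
hello() { . "$fundir/hello"; hello "$@"; }

first=$(hello world)        # stub runs in a subshell here
hello there >/dev/null      # current shell: the stub replaces itself
second=$(hello again)       # now the real definition runs directly

rm -rf "$fundir"
echo "$first / $second"
```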

=======================================================================
   $ cat ~/fun/set_cdpath
   ((DEBUG)) && print "sourcing set_cdpath..."
   function set_cdpath
   {
   typeset _path
   if (($#))
      then
      [[ $1 = . ]] && set -- $PWD
      set -A _path "$@" "$CDPATH"
   else
      set -A _path . \
                   ~ \
                   ~/devel/src/applix \
                   ~/devel/src \
                   ~/devel \
                   ~/axhome \
                   ~asterx \
                   ~/.dt \
                   /usr/local
   fi
   typeset IFS=:
   CDPATH=${_path[*]}
   unset -f set_cdpath
   }

If $DEBUG is non-zero, when the file is initially sourced you will
see a message.

You can also include multiple function definitions in the same file.
All will be defined when the function matching the file name is sourced.
For instance, you could have a dtksh function to use the DtComboBox called
"DtComboBoxInitialize" which would install all the functions required
to use the combo box.

Once a file has been sourced, the function is now defined, and moves
ahead of the PATH search in the execution hierarchy. You may undefine
it using the "unset -f funcname" command as I have done in the example
above.

Functions operate in the current environment, so an exit will exit
the shell. You need to use "return" to exit a function.
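
A short sketch of "return" in place of "exit"; check_dir is a
hypothetical helper used only for illustration.

```shell
check_dir() {
    if [ -d "$1" ]; then
        return 0
    fi
    return 1     # "exit 1" here would terminate the calling shell
}

check_dir / && root_ok=yes
check_dir /no/such/dir/anywhere || not_found=yes

echo "root_ok=$root_ok not_found=$not_found"
```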

Functions can be used in subshells or in command substitutions, but
in those cases they will not modify the current environment.
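
The same function called both ways shows the difference. This is a
sketch with invented names:

```shell
counter=0
bump() { counter=$((counter + 1)); echo "$counter"; }

out=$(bump)              # command substitution: runs in a subshell
after_subst=$counter     # the caller's counter was not modified

bump >/dev/null          # direct call: runs in the current shell
after_direct=$counter    # now it has been incremented

echo "out=$out after_subst=$after_subst after_direct=$after_direct"
```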

EXAMPLES
========

Setting the path:

function set_path
{
typeset IFS=":$IFS" _p
if (($#))
   then
   set -A _p ${PATH%:.}
   for dir
      do
      [[ $dir = . ]] && dir=$(/usr/bin/pwd)
      if [[ -d $dir ]]
         then
         [[ $PATH = ?(*:)$dir?(:*) ]] || {
            _p[${#_p[*]}]="$dir"
            }
      else
         echo "Directory $dir not found - not added to PATH"
      fi
      done
   PATH="${_p[*]}:."
else
   unset PATH
   set_path ~/bin*(2) \
            /usr/*(s)bin \
            /usr/dt/bin \
            /usr/bin/X11 \
            /usr/+(contrib|dt|local)/bin \
            /usr/+(contrib|local)/bin/X11 \
            /usr/X11R6.3/bin \
            /opt/+(CC|allbase|ansic|dtscript3.0|hpnp|image|langtools|gv|rcs)/bin
fi
export PATH

unset -f set_path
}
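
The heart of set_path is "append a directory only if it exists and is
not already listed". A stripped-down sketch of just that test, using a
plain case pattern instead of the ksh extended patterns above, might
look like this. The name add_to_path and the demo values are
assumptions for illustration, and /tmp is assumed to exist.

```shell
add_to_path() {
    if [ -d "$1" ]; then
        case ":$PATH:" in
            *":$1:"*) ;;               # already present, do nothing
            *) PATH="$PATH:$1" ;;      # append once
        esac
    else
        echo "Directory $1 not found - not added to PATH" >&2
    fi
}

# Demonstrate on a known starting value, then restore the real PATH.
saved_path=$PATH
PATH=/usr/bin:/bin
add_to_path /usr/bin     # duplicate: ignored
add_to_path /tmp         # new: appended
add_to_path /tmp         # duplicate again: ignored
demo_path=$PATH
PATH=$saved_path

echo "$demo_path"
```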

-- 
Dan Mercer
damercer@uswest.net

A few points to keep in mind:

In ksh88, functions behave identically whether defined using
POSIX style -- funcname() { list; } -- or using the "function"
keyword -- function funcname { list; }. In ksh93, POSIX
functions follow the dumbed down spec, while those using the
"function" keyword follow the original Korn spec.
      
   1. In Korn functions the getopts variables OPTIND and OPTARG are
      locally scoped so you can use getopts in a function without
      affecting its use in the main body of the script. POSIX
      does not allow locally scoped variables.
   2. In both types of functions, the arg list is local and set
      on call to the calling arguments. In Korn functions, $0
      is set to the name of the function; in POSIX it is set to
      the name of the main executable.
   3. In Korn functions LINENO is local. Therefore, setting
          PS4='${0##*/}: $LINENO: '
      while debug is turned on will output the name of the function
      and the offset within that function.
   4. Option settings are local in Korn functions, global in
      POSIX. Thus, using "set -x" in a function will turn
      on debug only for the function in Korn, but for everything
      in POSIX.
   5. In Korn functions, using the "typeset" keyword allows you
      to define variables scoped locally to the function. Unless
      typeset, variables changed in a function are changed
      globally.
   6. In ksh93 you have a name reference variable created using:
         typeset -n nameref=vname

      This is very useful when you want to return values from
      a function. For instance, the following function takes
      the name of the array to be returned, the name of the
      variable holding the string, and the parsing character,
      and parses the string into the array:

      
function array_from_string
   {
   ((($# < 2) || ($# > 3))) && {
      print -u2 " Format: array_from_string arrayname stringname [separator]"
      return 1
      }
   typeset -n array=$1 string=$2
   ((3==$#)) && [[ $3 = ? ]] && typeset IFS="${3}${IFS}"
   set -A array -- $string
   }
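
A smaller sketch of the same name-reference idea, returning a single
value rather than an array. It assumes a shell that implements
"typeset -n" (ksh93 does, ksh88 does not); set_result and answer are
invented names.

```shell
function set_result
{
    typeset -n ref=$1        # ref now refers to the caller's variable
    ref="value for $1"       # assigning to ref assigns to that variable
}

answer=
set_result answer
echo "$answer"
```

The caller passes the name of the variable, not its value, and reads
the result from that variable afterwards.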
   

