Re: Writing a "substring"/"replace substring" function in ksh88



Andrew Fabbro <andrew.fab...@xxxxxxxxx> wrote:
> I'm having the hardest time writing substring functions in ksh88
> without resorting to cut.
> What I'm trying to do is write functions to:
> + print characters from within a string given a range.
> + replace a character in a string given an index.
>
> I'm aware I could call out to the shell and use cut -c (or sed/awk/grep
> regex, etc.)...but I was trying to do it without an external
> process call because I imagine it'd be faster without it. It'll be a
> frequently-used library function. (Why not just write it in perl?
> Don't ask).

Good man. Every single subshell invocation has the overhead
of _thousands_ of typical function calls. But I tend to agree
with what I presume is your reluctance to spend the time writing
a satisfactory function library yourself just for the sake of
efficiency, especially on a deadline.
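
To see the difference on your own system, here is a rough benchmark sketch. The script, its name bench.sh and the loop count are illustrative assumptions, not anything from this thread. It strips the first character of a string repeatedly, either with the builtin ${s#?} expansion or by piping through cut(1):

```shell
#!/bin/sh
# Rough benchmark sketch (hypothetical script): strip the first character
# of a string N times, with a builtin expansion or with an external cut(1).
# Compare:  time sh bench.sh builtin   versus   time sh bench.sh cut

s="abcdefghij"
n=500

strip_builtin() { r=${s#?}; }                             # no new process
strip_cut()     { r=$(printf '%s\n' "$s" | cut -c2-); }   # forks on every call

case ${1:-builtin} in
    cut) f=strip_cut ;;
    *)   f=strip_builtin ;;
esac

i=0
while [ "$i" -lt "$n" ]; do
    "$f"
    i=$((i + 1))
done
printf '%s\n' "$r"    # bcdefghij either way
```

Both modes print the same result; only the elapsed time differs, and the absolute numbers depend on the shell and the machine.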

SC's advice to use [n]awk(1) is, IMNSHO, optimal when the
entire program can be implemented in that language, as its
invocation overhead is especially egregious. Additionally, the
pattern matching of ksh88 is equal to, or even a bit superior to,
that of [n]awk(1), and with the later revisions of ksh93 (q.v.
next paragraph) it is superior even to sed(1). I know, having
written a parser in awk(1), and regular expression debuggers
for both ksh(1) and sed(1).
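
As a minimal sketch of "the entire program in awk": one awk process handles every line of input, rather than one process per string. The helper name awk_batch and its argument order are hypothetical; substr() is the standard awk substring function, with 1-based positions.

```shell
#!/bin/sh
# awk_batch FROM TO IDX CH  (hypothetical helper)
# For each input line, print characters FROM..TO, a "|", and then the
# line with character IDX replaced by CH. One awk process for all lines.
awk_batch() {
    awk -v from="$1" -v to="$2" -v idx="$3" -v ch="$4" '{
        range = substr($0, from, to - from + 1)                    # range extract
        replaced = substr($0, 1, idx - 1) ch substr($0, idx + 1)   # char replace
        print range "|" replaced
    }'
}

printf '%s\n' "hello world" "abcdefg" | awk_batch 2 4 1 X
# ell|Xello world
# bcd|Xbcdefg
```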

> In ksh88, there's no substring operator I can find. And regular
> expressions in ksh88 don't support beginning- or end-of-line (^ and
> $), or repeat factors like {N} and {M}, which would be pretty
> essential to addressing the string by a position index.

All ksh-type [extended] pattern matching is de facto anchored
to BOS and EOS (beginning and end of string), so ^ and $ are
effectively always implied.
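
A quick way to see the implicit anchoring is a case statement; plain POSIX sh behaves the same way here. The matches helper is hypothetical. A pattern only matches when it covers the whole string, so a leading or trailing * is what stands in for a missing ^ or $:

```shell
#!/bin/sh
# matches STRING PATTERN  (hypothetical helper): "yes" if PATTERN covers
# the entire STRING, "no" otherwise. Patterns are anchored at both ends.
matches() {
    case $1 in
        $2) echo yes ;;
        *)  echo no ;;
    esac
}

matches "foobar" "foo"      # no:  "foo" must match the entire string
matches "foobar" "foo*"     # yes: like the regex ^foo
matches "foobar" "*bar"     # yes: like the regex bar$
matches "foobar" "*oba*"    # yes: like an unanchored regex oba
matches "foobar" "??????"   # yes: exactly six characters, like ^.{6}$
```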

The later revisions of ksh(1) version 1993 (ksh93) have greatly
expanded parameter substitution, including a builtin global
substitution operator like sed(1)'s "s" command. Very impressive.
Most online ksh93 manpages don't even mention the best and latest
functionality:

?(pattern-list) - Optionally matches any one of the given patterns.
*(pattern-list) - Matches zero or more occurrences of the given
patterns.
+(pattern-list) - Matches one or more occurrences of the given
patterns.
@(pattern-list) - Matches exactly one of the given patterns.
!(pattern-list) - Matches anything except one of the given patterns.

New:
{n}(pattern-list) - Matches n occurrences of the given patterns.
{m,n}(pattern-list) - Matches from m to n occurrences of the given
patterns.

Note: pattern-lists are delimited by either "&" (all patterns must
match) or "|" (any pattern must match).
Note: Use "-(" instead of "(" for the shortest (nongreedy) match.
Note: \d, \D, \s, \S, \w, \W are recognized as shorthand for digit,
non-digit, space, non-space, word and non-word characters.
Note: "~(options:pattern-list)" applies options to the pattern-list
(either the options or the :pattern-list part can be omitted). The
options can consist of one or more of the following characters:
+ Enable the following options. (default)
- Disable the following options.
i Treat the match as case insensitive.
g Find the longest match (greedy). (default)

I don't suppose you can use ksh93, can you? It's freely
available through kornshell.com.
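
If ksh93 is an option, both of the original functions shrink to single expansions. A short sketch, assuming ksh93 (bash accepts the same syntax; ksh88 and plain POSIX sh do not); the variables s and i are illustrative. Note that the offset in ${var:offset:length} is 0-based, while the index i here is 1-based:

```shell
# Requires ksh93 or bash; ${var:off:len} and ${var//pat/rep} are not in ksh88.
s="hello world"
i=2    # 1-based index of the character to replace (illustrative)

printf '%s\n' "${s:1:3}"            # 3 characters from offset 1: ell
printf '%s\n' "${s:0:i-1}X${s:i}"   # character i replaced by X: hXllo world
printf '%s\n' "${s//o/0}"           # every "o" replaced, like sed s/o/0/g: hell0 w0rld
```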

> Is there a way to have IFS set to null?....

Nope, not for the behavior that you specify.
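
Assuming the point of a null IFS was to split a string into single characters (the rest of the question is trimmed here), the same effect is available without touching IFS by peeling one character at a time off the front with the # and % operators. The each_char name is hypothetical, and the sketch uses only POSIX expansions:

```shell
#!/bin/sh
# each_char STRING  (hypothetical helper): print each character of STRING
# on its own line, with no IFS tricks and no external process.
each_char() {
    _s=$1
    while [ -n "$_s" ]; do
        _rest=${_s#?}                     # all but the first character
        printf '%s\n' "${_s%"$_rest"}"    # the first character itself
        _s=$_rest
    done
}

each_char "a b"    # prints "a", " " and "b" on separate lines
```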

> My last avenue of attack was to use typeset -L and typeset -R to chop
> strings up. I thought if I wanted to replace character 2 of a
> 20-character string, I could typeset -L1 to get character 1, typeset
> -R18 to get characters 3-20, and then echo left, new character, right.

Ah, so you have discovered the typical idiom of simulating
substring extraction. Add to that the use of the #, ##, %,
and %% parameter substitution operators, and you've very
appropriately reinvented the wheel. Congratulations! ;)
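
Here is one way that wheel might look for both of the OP's functions: a minimal sketch in POSIX sh that uses only #, % and case patterns, so no external process is started. The names mkpat, substr and setchar, and returning results in REPLY instead of printing them (a $(...) capture would fork a subshell in most shells), are illustrative choices. It also assumes $((...)) arithmetic; if your ksh88 lacks it, let does the same job.

```shell
#!/bin/sh
# Hypothetical helpers (illustrative names). Results are returned in REPLY
# so callers avoid $(...), which forks a subshell in most shells.

# mkpat N: set _pat to N "?" characters; each "?" matches exactly one char.
mkpat() {
    _pat= _n=$1
    while [ "$_n" -gt 0 ]; do
        _pat="$_pat?"
        _n=$((_n - 1))
    done
}

# substr STRING FROM TO: REPLY = characters FROM..TO (1-based, inclusive).
substr() {
    REPLY=
    [ "$2" -ge 1 ] && [ "$3" -ge "$2" ] || return 1
    mkpat $(($2 - 1))
    case $1 in
        $_pat*) _rest=${1#$_pat} ;;   # drop the first FROM-1 characters
        *)      return 0 ;;           # FROM is past the end: empty result
    esac
    mkpat $(($3 - $2 + 1))
    case $_rest in
        $_pat*) REPLY=${_rest%"${_rest#$_pat}"} ;;  # keep the first TO-FROM+1
        *)      REPLY=$_rest ;;                     # fewer left: keep them all
    esac
}

# setchar STRING INDEX CHAR: REPLY = STRING with character INDEX replaced.
setchar() {
    REPLY=$1
    [ "$2" -ge 1 ] || return 1
    mkpat "$2"
    case $1 in
        $_pat*) ;;                    # at least INDEX characters: go ahead
        *)      return 1 ;;           # INDEX out of range: REPLY unchanged
    esac
    _right=${1#$_pat}                 # everything after character INDEX
    mkpat $(($2 - 1))
    _left=${1%"${1#$_pat}"}           # everything before character INDEX
    REPLY=$_left$3$_right
}

substr "hello world" 2 4; printf '%s\n' "$REPLY"   # ell
setchar "a b c" 2 _;      printf '%s\n' "$REPLY"   # a_b c
```

Because nothing passes through typeset -L/-R, embedded and trailing blanks survive; the setchar "a b c" call above keeps both spaces.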

> typeset -R apparently ignores spaces when right-justifying. Gack.
> Just thought I'd ask the brain trust here if that is really the only way.

I ran into this [undocumented!] gotcha many years ago.
I haven't completely parsed your example, but IIRC I used
the workaround of prepending a known character (so the string
would never begin with whitespace) and, after processing,
making sure to remove it.
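
The same sentinel trick works anywhere a shell step strips leading blanks. This is an analogous sketch, not the code from that workaround, and it uses read rather than typeset because plain POSIX sh has no typeset -L/-R: read with the default IFS drops leading blanks, and prepending an x before the read and removing it afterwards keeps them. The helper name is hypothetical:

```shell
#!/bin/sh
# preserve_leading_blanks STRING  (hypothetical helper)
# read with the default IFS strips leading blanks; a sentinel "x" in front
# means the line never starts with whitespace, and ${line#x} removes it.
# (Trailing blanks are still stripped; they would need a sentinel at the end.)
preserve_leading_blanks() {
    read -r line <<EOF
x$1
EOF
    line=${line#x}
    printf '%s\n' "$line"
}

plain=$(printf '%s\n' "   abc" | { read -r l; printf '%s' "$l"; })
guarded=$(preserve_leading_blanks "   abc")
printf '[%s] [%s]\n' "$plain" "$guarded"    # [abc] [   abc]
```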

And speaking of reinventing the wheel...

I'm surprised that JD didn't mention his own "strings"
function library from his impressive script archive:

"strings.ksh":
ftp://ftp.armory.com/pub/lib/ksh/strings

.... and CFAJ didn't explicitly point the OQ to his very
workable function:

"_gsub.ksh":
http://cfaj.freeshell.org/src/scripts/gsub-sh

For completeness, WP also has a strings(3)-like function
library [in bash(1)] at:

"string.bash":
http://home.eol.ca/~parkw/#string
http://home.eol.ca/~parkw/string.sh
http://linuxgazette.net/108/park.html

I myself wrote a strings(3) clone of the usual C strings
functions, including the *r* variants, in greedy _and_ non-
greedy forms(!) ... but they are not of distributable quality,
being left unfinished for a variety of technical reasons and
a change of specifications.

=Brian
