Re: ascii nulls, awk, and tr

From: Richard L. Hamilton (Richard.L.Hamilton_at_mindwarp.smart.net)
Date: 08/03/04


Date: Tue, 03 Aug 2004 21:22:36 -0000

In article <jppvg0lugiqg7csdiqvtfcqm6fe605j5un@4ax.com>,
        Gerard C Blais <gerard.blais@mci.com> writes:
>
>
> I have text files that may contain ascii nulls.
>
> I'd like to delete the null characters.
>
> I also have a lot of other things to do to the rows, and I'm doing
> them in awk already, so adding something to an awk script would be a
> nice way to go.
>
> None of the awks on my Solaris (/usr/bin/awk, /usr/xpg4/bin/awk,
> /usr/bin/nawk) recognize ascii nulls. Someone suggested mawk, and I'm
> trying to get that installed. If anyone knows if mawk will or won't
> work, I'd be interested.

GNU awk (gawk) seems to handle null bytes. If you have the Sun freeware
installed for Solaris 9, it's in /opt/sfw/bin/gawk. If not, depending
on the version of Solaris you have, you might find pre-built binaries at
www.blastwave.org or www.sunfreeware.com. I haven't tried mawk, so I
don't know how well it compares or whether it can handle nulls. But what
I particularly like about gawk is that it seems to handle just about
anything: arbitrarily long lines that neither the supplied awk nor nawk
can handle, nulls just like other characters, etc.
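
A minimal sketch of doing the deletion inside the awk script itself,
assuming gawk is somewhere on your PATH (with the Sun freeware installed
that would be /opt/sfw/bin/gawk). The file names sample.txt and
sample.nonull and the printf test data are made up for illustration; the
gsub() call is the only part you would add to an existing script.

```shell
# Hypothetical test data: 3 lines, two NUL bytes in the middle of line 2
printf 'line one\nline\000\000two\nline three\n' > sample.txt

if command -v gawk >/dev/null 2>&1; then
    # \000 is the octal escape for the NUL byte; gsub() removes every
    # occurrence from the current record before it is printed
    gawk '{ gsub(/\000/, ""); print }' sample.txt > sample.nonull
else
    echo "gawk not found on PATH" >&2
fi
```

Any other per-row processing can follow the gsub() in the same action.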

> Meanwhile, it seemed like a perfect fit for tr.
>
> BUT, if I try tr -d \000 null.txt > null.nonull,
>
> tr just hangs, and there is no output. null.txt is a 3 line file
> with nulls in the middle of line 2.
>
> Any ideas?

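tr reads only standard input; it does not take filenames on the command
line (so it was waiting for input from the terminal after you typed that
command). Note too that an unquoted \000 reaches tr as plain 000, because
the shell strips the backslash, so quote that operand. Finally, the older
versions (/usr/bin/tr, /usr/ucb/tr) always strip null bytes, while
/usr/xpg4/bin/tr handles them just like any other character.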

So, either

# changing a space to a space looks like it does nothing, but older
# versions of tr always strip null bytes, so they do the job as a
# side-effect
/usr/bin/tr ' ' ' ' <null.txt >null.nonull
/usr/ucb/tr ' ' ' ' <null.txt >null.nonull

or

# /usr/xpg4/bin/tr handles null bytes like anything else
/usr/xpg4/bin/tr -d '\0' <null.txt >null.nonull

or if you just use "tr", be sure you know which version will be
found first on your PATH and use it accordingly.
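
For whichever tr comes first on your PATH, a quick check along these
lines shows which one that is and confirms the null bytes are gone. This
is only a sketch: the sample data is generated with printf purely for
illustration, and it assumes a tr that honors -d '\000' (such as the
XPG4 version, or GNU tr on other systems).

```shell
# Which tr will the shell actually run?
command -v tr

# Hypothetical sample: 3 lines, two NUL bytes in the middle of line 2
printf 'line one\nline\000\000two\nline three\n' > null.txt

# Quote the operand so the backslash reaches tr intact, and redirect
# the file, since tr reads only standard input
tr -d '\000' < null.txt > null.nonull

# The byte counts should differ by the number of NULs removed
wc -c < null.txt
wc -c < null.nonull

# od -c displays any remaining NUL byte as \0
od -c null.nonull
```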

-- 
mailto:rlhamil@smart.net  http://www.smart.net/~rlhamil
Lasik/PRK theme music:
    "In the Hall of the Mountain King", from "Peer Gynt"

