Re: Removing non-text/whitespace chars from a text file: How ?

From: Ed Morton (morton_at_lsupcaemnt.com)
Date: 03/19/05


Date: Sat, 19 Mar 2005 07:45:07 -0600


Al Dykes wrote:
> When I highlight and copy text from a web browser and paste it into a
> text file I frequently get extended ASCII non-text bytes that I'd like
> to strip out. I'd like to remove anything above octal 126. What's the
> right tool for this?

I don't know if there's a better tool for it, and I don't know if this
exactly matches your request to "remove anything above octal 126", but a
POSIX sed will let you strip out any non-printable characters (control
characters included):

        sed 's/[^[:print:]]//g' file > tmp
        mv tmp file

or with GNU sed:

        sed -i 's/[^[:print:]]//g' file
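
One thing to watch with the sed approach: [:print:] includes the space
character but not tab, so tabs get stripped too. If you want to keep
whitespace and drop only the bytes outside printable ASCII, a tr-based
filter may be closer to what was asked. The following is only a sketch:
the function name strip_to_ascii is made up for illustration, and it
assumes the intended cutoff is the tilde (decimal 126, octal 176) rather
than a literal octal 126.

```shell
#!/bin/sh
# Hypothetical helper (the name is illustrative, not from the original post).
# Keeps tab (\11), newline (\12), carriage return (\15) and printable ASCII
# (octal 40 through 176); deletes every other byte read from stdin.
# LC_ALL=C makes tr work on single bytes regardless of the user's locale.
strip_to_ascii() {
    LC_ALL=C tr -cd '\11\12\15\40-\176'
}

# Sample input: a Latin-1 e-acute byte (\351) and a control byte (\001)
# are removed, while the tab between the words is kept.
printf 'caf\351\tok\001\n' | strip_to_ascii
```

To clean a file the same way as above:

        strip_to_ascii < file > tmp && mv tmp file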

You can learn more about POSIX character classes like [:print:] at
http://www.gnu.org/software/gawk/manual/html_node/Character-Lists.html
or just google for it.

Regards,

        Ed.