Re: Grep and mv

From: Robert Bonomi (bonomi_at_host122.r-bonomi.com)
Date: 07/16/05


Date: Sat, 16 Jul 2005 00:11:14 -0000

In article <11dg4kaphtsk42c@corp.supernews.com>,
WCB <wbarwell@munnged.mylinuxisp.com> wrote:
>Chris F.A. Johnson wrote:
>
>
>>>> If it does contain spaces my script will work. A possible problem
>>>> is that leading spaces are converted to hyphens, which mv then
>>>> interprets as options.
>>>>
>>>> for i in xx*
>>>> do
>>>> f=$(grep -m 1 "HCO*" "$i")
>>>> while :
>>>> do
>>>> case $f in
>>>> \ *) f=${f# } ;; ## remove leading space
>>>> *\ ) f=${f% } ;; ## remove trailing space
>>>> *\ *) f=${f// /-} ;; ## convert spaces to hyphens
>>>> *--*) f=${f//--/-} ;; ## convert multiple hyphens to a single hyphen
>>>> *) break ;; ## nothing needs doing; exit loop
>>>> esac
>>>> done
>>>> mv "$i" "$f"
>>>> done
>>>>
>>>> You repeatedly leave out the quotes in the snippets you have
>>>> posted. Use the above script EXACTLY as it is; copy it (either cut
>>>> and paste it, or save the message and edit it), do not retype it.
>>>>
>>>> If it doesn't work, tell us EXACTLY what happens. Use set -x and
>>>> redirect stderr to a file.
>>>
>>> Cut and pasted. changed mod and owner.
>>> Loaded fresh test files in directory
>>>
>>> I ran this script.
>>> still it has trailing spaces.
>>
>> What is "it"?
>
>File names.
>
>>
>>> In the terminal, they show up as ?
>>
>> In what context do they (whatever "they" are) "show up"?
>
>
>If I type ls in the terminal, I see a ? at the end of each file name.
>This is how a terminal shows a blank at the end of a file name.
>If I go to KDE and peek at it with Kwrite, it shows the space
>as a square. If I open a file's properties, I can see
>the cursor is not at the end of the name, but one space beyond.
>If I erase that space, I no longer see the box in Kwrite
>nor the "?" when I type ls in the terminal.
>
>The file is a real file; I can open and read it despite the trailing
>space. So it is a space, not some control character or other,
>nor a quirk in how it's displayed.
>
>It's a very real space.

I say again, "you don't know what you don't know, and what you _think_
you know is wrong."

No, it is *NOT* a space.

ls uses '?' to mark *any* non-printing character.

From other things you have written, that 'non-printing' character is a
CARRIAGE RETURN, aka [CR], aka "control-M".
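To see what the '?' really stands for, look at the bytes of the name instead of trusting any viewer. A minimal sketch, assuming bash (for the $'\r' and printf %q syntax); the file name is one from the listing in this thread, and the temporary directory exists only for the demonstration.

```shell
#!/usr/bin/env bash
# Sketch (assumes bash): a file name ending in a carriage return, which ls shows as '?'.
dir=$(mktemp -d) && cd "$dir" || exit 1
touch "HCO-BULLETIN-OF-4-APRIL-AD15"$'\r'      # $'\r' is a literal CR (control-M)

for name in HCO*; do
  printf '%q\n' "$name"                       # bash's %q writes the CR as \r inside $'...' quoting
  # Look at only the final byte: od -c prints a CR as \r, while a space would show as blank.
  last_byte=$(printf '%s' "$name" | tail -c 1 | od -An -c | tr -d ' ')
  echo "last byte: $last_byte"
done
```

On GNU systems, ls -b prints the same name with a C-style \r escape instead of the '?'.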

>Each and every file name has a jolly little space appended
>at the end this way.

I wouldn't be surprised.

>>> listing in the terminal looks like
>>>
>>> HCO-BULLETIN-OF-10-MARCH-1965?
>>> HCO-BULLETIN-OF-29-MARCH-1965?
>>> HCO-BULLETIN-OF-2-APRIL-AD15?
>>> HCO-BULLETIN-OF-4-APRIL-AD15?
>>> HCO-BULLETIN-OF-5-APRIL-1965?
>>> HCO-BULLETIN-OF-5-MARCH-1965?
>>> HCO-BULLETIN-OF-7-APRIL-AD15?
>>>
>>> Each file is a real file, so it's renaming a real file.
>>
>> What files did you have to start with?
>
>xx03, xx04, xx05 and so on.
>
>These files have been checked and run through dos2unix
>just in case.
>No control characters, not DOS not Mac, not Word,
>simple ascii. At the end of the name to be extracted,
>no spaces. cat -v shows a ^M at end of the string extracted.
>"^M", not " ^M". So our mystery space is not coming from
>there.

You demonstrate again, "you don't know what you don't know."

UNIX files do _not_ have ^M characters in them.
The UNIX 'end of line' character is ^J, called 'newline'.
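A quick way to check whether a text file has DOS-style CR-LF line endings is to search for the CR byte itself. This sketch assumes bash (for $'\r') and a grep that accepts a literal CR as the pattern, as GNU grep does; the two sample files are invented for illustration.

```shell
#!/usr/bin/env bash
# Sketch (assumes bash): find files that contain carriage returns.
dir=$(mktemp -d) && cd "$dir" || exit 1
printf 'HCO-BULLETIN-OF-4-APRIL-AD15\r\nbody text\r\n' > xx03   # CR before each newline
printf 'HCO-BULLETIN-OF-5-APRIL-1965\nbody text\n' > xx04       # plain UNIX newlines only

crfiles=$(grep -l $'\r' xx*)   # -l: list each file containing at least one CR
echo "files with CR: $crfiles"
cat -v xx03                    # each CR is shown as ^M just before the line ends
```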

>
>cat -v xx03 shows no characters other than plain ASCII and ^M.

not surprising.

>All other test files are likewise clean.

FALSE TO FACT.

Files containing ^M are "unclean", _by_definition_.
>
>My original file was extracted from a CD. It was one
>big text file I broke down into sections and then into
>individual files using csplit which had no
>problems with doing so. The Nano editor displays them
>with no artifacts or problems, nor does Kate nor Kwrite
>have problems. Cat -v shows no problems.
>
>So I am very definitely sure it's not a buggered file
>nor word processor control characters that are in any
>way an issue.

What you are "sure of", and what is _reality_ are very different things.

>Now all I have to do is find how to make 4000 xxNN
>files have somewhat meaningful names.
>
>The files I am using here are clean and readable.
>My silly little grep script extracts the names as
>expected. No spaces or artifacts are showing
>up there, so it's, again, not a problem with the files.

*YES*, it is a problem with the files.
The d*mn ^M characters *IN*THE*FILE* are a problem.
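One portable way to remove the ^M characters from the split files is tr -d '\r', writing a cleaned temporary copy and moving it back over the original. A sketch, assuming the files are plain text matching xx*; the .tmp suffix and the sample contents are arbitrary choices for the demonstration.

```shell
#!/usr/bin/env bash
# Sketch: delete every CR from the xx* files with tr (a portable alternative to dos2unix).
dir=$(mktemp -d) && cd "$dir" || exit 1
printf 'HCO-BULLETIN-OF-4-APRIL-AD15\r\nbody text\r\n' > xx03   # made-up sample

for i in xx*; do
  # Write a cleaned copy, and replace the original only if tr succeeded.
  tr -d '\r' < "$i" > "$i.tmp" && mv -- "$i.tmp" "$i"
done
cat -v xx03                    # no ^M remains
```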

>> What files did you end up with?
>
>
>HCO-BULLETIN-OF-10-MARCH-1965?
>HCO-BULLETIN-OF-29-MARCH-1965?
>HCO-BULLETIN-OF-2-APRIL-AD15?
>HCO-BULLETIN-OF-4-APRIL-AD15?
>HCO-BULLETIN-OF-5-APRIL-1965?
>HCO-BULLETIN-OF-5-MARCH-1965?
>HCO-BULLETIN-OF-7-APRIL-AD15?
>
>They are files, I can cat them and read them.
>
>> Is that what you wanted? If not, what is different from what you
>> wanted? Please post the script you used, directly from the file.
>
>This script almost does the trick except for the spaces.
>Since cat -v shows that the extracted string ends like this, for example:
>
> HCO-BULLETIN-OF-4-APRIL-AD15^M
>
>The space is not coming from within the file with this string.
>So there should not be a trailing space to cut.

*SEE* I _told_ you there was a ^M on the end of the string.
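That is also why the "remove trailing space" line in the script has no effect: the last character is a CR, not a space, so the pattern never matches. A minimal bash sketch using the string from the cat -v output quoted above:

```shell
#!/usr/bin/env bash
# Sketch (assumes bash): why "remove trailing space" leaves the CR alone.
f="HCO-BULLETIN-OF-4-APRIL-AD15"$'\r'    # the string grep extracts from a CR-LF line

no_space=${f% }        # strip one trailing space: there is none, so nothing changes
no_cr=${f%$'\r'}       # strip one trailing CR

printf '%q\n' "$no_space"   # still ends with \r
printf '%q\n' "$no_cr"      # HCO-BULLETIN-OF-4-APRIL-AD15
```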

>So something seems to be ADDING a space.
>Since line 8 in your script cuts trailing spaces,
>it seems a logical deduction that the space is added
>somewhere after line 8, though I have no idea where that artifact
>is coming from. I tried moving the trailing-space cut
>to the end, after dealing with hyphens.
>The resulting files still have a trailing space. So it's obviously an
>artifact coming from the break statement on.

"Sorry Charlie!" applies again.

Some of the _viewing_ tools you use are displaying a '[CR]' character as
what you perceive to be a space. *YOUR* error.

>The other thing that weirds me out big time is, your
>mv "$i" "$f" seems to work.
>When I extract $f and $i, my echo "$f" >> F
>tests show repeatedly that I am extracting real
>data that mv should then use.
>
>Neither mv $i $f nor mv "$i" "$f" work, I just get
>different error messages.
>
>My script does one file and then gives a bunch of error
>messages.

Yup. It *should* do that.

Because you have an error *elsewhere* in your script.

Do you know what the shell does when it sees an UNQUOTED '*' character as
part of any word on a line? (*regardless* of the context in which it
appears)

Do you realize that that happens even when you are using that '*' as part
of a pattern-match argument to grep?
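The expansion is easy to reproduce. This sketch assumes bash with its default settings (nullglob and failglob unset, so an unmatched pattern is left as-is); set -- is used only to capture the word the shell actually produces.

```shell
#!/usr/bin/env bash
# Sketch (assumes bash, default glob options): an unquoted HCO* is a filename
# pattern to the shell before grep ever sees it.
dir=$(mktemp -d) && cd "$dir" || exit 1

set -- HCO*              # no matching file yet: the word stays as the literal "HCO*"
before=$1

touch HCO-BULLETIN-OF-10-MARCH-1965
set -- HCO*              # now the shell replaces HCO* with the matching file name
after=$1

set -- "HCO*"            # quoted: always passed through unchanged
quoted=$1

printf 'before: %s\nafter: %s\nquoted: %s\n' "$before" "$after" "$quoted"
```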

The first time through the loop, there are no file names that match
the globbing pattern HCO*, so that literal string is passed to grep,
and it finds that line in the file, and the 'mv' command renames the xx01
file appropriately.

The *SECOND* time through the loop, the shell sees the unquoted string: HCO*
and expands it to the "matching" file name(s). Which is the name of the
file that 'xx01' was renamed to. *NOW* grep looks for _THAT_NAME_ in the
2nd file -- *not* the 'original' pattern. *GUESS*WHAT*??? It _doesn't_
find any occurrence of that string from the first file, in the second file!

So, you get a 'null' string as output, and as your 'destination' file
name. Then when you try to rename the file to *that* name, it blows up
on you.
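Putting the pieces together, here is one possible version of the rename loop: it quotes the grep pattern so the shell never expands it, deletes the CR from the extracted title, and refuses to call mv with an empty or already-existing target. It is a sketch assuming bash and a grep that supports -m, as in the scripts above; the sample file contents are made up. Note that "HCO*" as a grep regular expression means "HC followed by zero or more O's", so a plain 'HCO' is used here instead.

```shell
#!/usr/bin/env bash
# Sketch (assumes bash): rename each xxNN file after its first line containing "HCO".
dir=$(mktemp -d) && cd "$dir" || exit 1
# Made-up sample files with DOS (CR-LF) line endings, plus one with no title line.
printf 'HCO BULLETIN OF 10 MARCH 1965\r\ntext\r\n' > xx01
printf 'HCO BULLETIN OF 29 MARCH 1965\r\ntext\r\n' > xx02
printf 'no title line here\r\n' > xx03

for i in xx*; do
  f=$(grep -m 1 'HCO' "$i" | tr -d '\r')   # quoted pattern is never globbed; CRs removed
  f=$(printf '%s' "$f" | tr -s ' ' '-')    # spaces become hyphens, runs squeezed to one
  f=${f#-}                                 # drop one leading hyphen (from a leading space)
  f=${f%-}                                 # drop one trailing hyphen (from a trailing space)
  if [ -z "$f" ] || [ -e "$f" ]; then      # never mv to an empty or existing name
    printf 'skipping %s (title was %q)\n' "$i" "$f" >&2
    continue
  fi
  mv -- "$i" "$f"                          # -- keeps a name like "-x" from being read as an option
done
ls
```

The mv -- also covers the problem raised at the top of the thread, where a name starting with a hyphen would otherwise be taken as an option.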

Everything is working *exactly* the way the programs are designed to.
There are no bugs in any of the programs.
There are no 'incompatibilities' with being called from inside a 'for do done'
loop.
There are no bugs in the shell.

*YOU* are screwing up.

Many of the things you "_think_ you know" are contrary to reality.

You insist on reporting your "interpretations" of what is going on,
instead of the actual _real_ details. When your "interpretations"
are contrary to fact, you make it *impossible* for people to provide any
constructive help.
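Reporting the actual details is easiest with the trace Chris suggested earlier in the thread: set -x, with stderr sent to a file. A sketch assuming bash; demo.sh and trace.log are hypothetical names, and the one-line sample file exists only to give the loop something to do.

```shell
#!/usr/bin/env bash
# Sketch (assumes bash): capture a set -x trace of a small rename-style loop.
dir=$(mktemp -d) && cd "$dir" || exit 1
printf 'HCO-BULLETIN-TEST\n' > xx01        # one-line sample input
cat > demo.sh <<'EOF'
set -x                                     # from here on, echo each expanded command to stderr
for i in xx*; do
  f=$(grep -m 1 'HCO' "$i")
  printf 'would rename %s to %s\n' "$i" "$f"
done
EOF
bash demo.sh 2> trace.log                  # normal output on stdout, trace in trace.log
cat trace.log                              # traced lines start with + (++ inside $(...))
```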


