Re: Grep and mv

From: Chris F.A. Johnson (cfajohnson_at_gmail.com)
Date: 07/15/05


Date: Fri, 15 Jul 2005 16:35:20 -0400

On 2005-07-15, WCB wrote:
> Chris F.A. Johnson wrote:
[snip]
>>>> If it doesn't work, tell us EXACTLY what happens. Use set -x and
>>>> redirect stderr to a file.
>>>
>>> Cut and pasted. changed mod and owner.
>>> Loaded fresh test files in directory
>>>
>>> I ran this script.
>>> still it has trailing spaces.
>>
>> What is "it"?
>
> File names.
>
>>
>>> In the terminal, they show up as ?
>>
>> In what context do they (whatever "they" are) "show up"?
>
>
> If I type ls in the terminal, I see a ? at end of each file name.

   What shows up? A space or a question mark?

> This is how a terminal shows a blank at end of a file name.

   No, it isn't. It's something else, probably a carriage return (^M).

   Please post the output of:

grep HCO xx01 | hexdump -C
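
   For comparison, here is a minimal sketch of what a trailing carriage
   return looks like in a byte dump. The sample string is made up, and
   od -c stands in for hexdump -C on the assumption that hexdump may not
   be installed everywhere:

```shell
# Hypothetical sample line ending in a carriage return (CR, ^M, hex 0d).
name=$(printf 'HCO BULLETIN OF 4 APRIL AD15\r')

# od -c prints each byte; the CR shows up as \r, which is not a space (hex 20).
dump=$(printf '%s' "$name" | od -c)
printf '%s\n' "$dump"

# Count the CR bytes in the string.
cr_count=$(printf '%s' "$name" | tr -cd '\r' | wc -c | tr -d ' ')
printf 'carriage returns: %s\n' "$cr_count"
```

   If the last byte of the dump is \r (0d in hexdump -C), the "?" that ls
   shows is a carriage return.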

[snip]
> Each and every file name has a jolly little space appended
> at the end this way.

   Every file name? Or just the ones you have created with the script?

>> and are seen boxes in properties.

   Boxes are almost certainly not spaces.

[snip]
>>> listing in the terminal looks like
>>>
>>> HCO-BULLETIN-OF-10-MARCH-1965?

   What is the output of:

ls HCO-BULLETIN-OF-10-MARCH-1965* | hexdump -C

>>> HCO-BULLETIN-OF-29-MARCH-1965?
>>> HCO-BULLETIN-OF-2-APRIL-AD15?
>>> HCO-BULLETIN-OF-4-APRIL-AD15?
>>> HCO-BULLETIN-OF-5-APRIL-1965?
>>> HCO-BULLETIN-OF-5-MARCH-1965?
>>> HCO-BULLETIN-OF-7-APRIL-AD15?
>>>
>>> Each file is a file, so its renaming a real file.
>>
>> What files did you have to start with?
>
> xx03, xx04, xx05 and so on.
>
> These files have been checked and run through dos2unix
> just in case.
> No control characters, not DOS not Mac, not Word,
> simple ascii. At the end of the name to be extracted,
> no spaces. cat -v shows a ^M at end of the string extracted.
> "^M", not " ^M". So our mystery space is not coming from
> there.
>
> cat -v xx03 show no other characters beyond ascii than ^M.

    In other words, a DOS/Windows file, or a Mac file if there are no
    linefeeds. Unix text files do not contain ^M.
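
    A rough sketch of how to check for carriage returns and strip them,
    run in a scratch directory. The file contents and the xx03.unix name
    are invented for the example:

```shell
# Work in a scratch directory so nothing real is touched.
tmp=$(mktemp -d) && cd "$tmp" || exit 1

# Hypothetical sample file with DOS (CR LF) line endings.
printf 'HCO BULLETIN OF 5 MARCH 1965\r\nbody text\r\n' > xx03

CR=$(printf '\r')

# List the files that contain at least one carriage return.
dos_files=$(grep -l "$CR" xx*)
printf 'files with CR: %s\n' "$dos_files"

# Write a copy with every carriage return deleted, then count what is left.
tr -d '\r' < xx03 > xx03.unix
remaining=$(tr -cd '\r' < xx03.unix | wc -c | tr -d ' ')
printf 'CRs left: %s\n' "$remaining"
```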

> All other test files are likewise clean.

    That's not clean.

> My original file was extracted from a CD. It was one
> big text file I broke down into sections and then into
> individual files using csplit which had no
> problems with doing so. The Nano editor displays them
> with no artifacts or problems, nor does Kate nor Kwrite
> have problems. Cat -v shows no problems.
>
> So I am very definitely sure its not a buggered file
> nor word processor control characters that are at any
> way an issue.
>
> Now all I have to do is find how to make 4000 xxNN
> files have somewhat meaningful names.
>
> The files I am using here are clean and readable.
> My silly little grep script extracts the names as
> expected. No spaces or artifacts there showing
> up, so its again, not a problem with the files.
>
>
>> What files did you end up with?
>
>
> HCO-BULLETIN-OF-10-MARCH-1965?
> HCO-BULLETIN-OF-29-MARCH-1965?
> HCO-BULLETIN-OF-2-APRIL-AD15?
> HCO-BULLETIN-OF-4-APRIL-AD15?
> HCO-BULLETIN-OF-5-APRIL-1965?
> HCO-BULLETIN-OF-5-MARCH-1965?
> HCO-BULLETIN-OF-7-APRIL-AD15?
>
> They are files, I can cat them and read them.
>
>> Is that what you wanted? If not, what is different from what you
>> wanted? Please post the script you used, directly from the file.
>
> This script almost does the trick except for the spaces.
> Since cat -V shows the string extracted ends as an example,
>
> HCO-BULLETIN-OF-4-APRIL-AD15^M

   EXACTLY! That's a DOS/Windows line ending. IT IS NOT A SPACE.

> The space is not coming from within the file with this string.

   There is no space.

> So there should not be a trailing space to cut.

   There isn't.

> So something seems to be ADDING a space.

   There is no space.

> Since line 8 in your script cuts trailing spaces
> it seems a logical deduction that the space is added
> after line 8, somewhere.

   It would be, if there were a space. There isn't; it's a carriage
   return.
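
   A small illustration of the difference, using a made-up name: stripping
   a trailing space changes nothing, while stripping a trailing carriage
   return shortens the string by one character.

```shell
CR=$(printf '\r')
f="HCO-BULLETIN-OF-4-APRIL-AD15$CR"   # hypothetical name with a trailing CR

no_space=${f% }       # tries to remove a trailing space: there is none
no_cr=${f%"$CR"}      # removes the trailing carriage return

printf 'original:          %s characters\n' "${#f}"
printf 'after space strip: %s characters\n' "${#no_space}"
printf 'after CR strip:    %s characters\n' "${#no_cr}"
```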

> I have no idea where that artifact
> is coming from.

   From your DOS/Win file.

[snip]
> The other thing that weirds me out big time is, your
> mv "$i" "$f" seems to work.
> When I extract $f and $i, my echo "$f" >> F
> tests show repeatedly that I am extracting real
> data that mv should then use.
>
> Neither mv $i $f nor mv "$i" "$f" work, I just get
> different error messages.
>
> My script does one file and then gives a bunch of error
> messages.
>
>
> Just before that step, if I do echo $i >> I
> and echo $f >> F I show that I am getting
> the xxNN files and extracted name files OK
> to that point.
>
> It does not matter if I use mv "$i" "$f" or mv $i $f
>
> ****************
> #1/bin/bash
>
> # mover4
>
> for i in xx*
> do
>
> grep -m 1 HCO* $i > x

   Not again! !@#%$#@. How many times do you need to be told?

   QUOTE THE WILDCARD (*).
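
   A sketch of why it matters, with invented files in a scratch
   directory. Once a file whose name starts with HCO exists, the shell
   replaces an unquoted HCO* with that file name before grep ever sees
   it, which would likely explain a script that renames one file and then
   fails on the rest:

```shell
tmp=$(mktemp -d) && cd "$tmp" || exit 1

# Hypothetical files: one to search, one whose name happens to match HCO*.
printf '  HCO BULLETIN OF 5 MARCH 1965\n' > xx03
: > HCO-BULLETIN-OF-10-MARCH-1965

# Unquoted: the shell expands HCO* to the matching file name.
set -- HCO*
unquoted=$1

# Quoted: the pattern is passed along untouched.
set -- 'HCO*'
quoted=$1

printf 'unquoted: %s\nquoted:   %s\n' "$unquoted" "$quoted"

# As a regular expression, HCO* means "HC followed by zero or more O",
# so a plain HCO is the pattern that says what is meant.
match=$(grep -m 1 HCO xx03)
printf 'match: %s\n' "$match"
```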

> sed 's/^ *//' x > y
> sed 's/ /-/g' y > z
>
> # cat x >> X
> # cat y >> Y
> # cat z >> Z
> # OK to here so far
>
> f=$(cat z)
> # echo $i >I
> # echo $f >F
> # mv $i $f
> mv "$i" "$f"
> done
>
> *****************************
>
> files to start
>
> xx03 xx10 xx100 xx101 xx102 xx103 xx104
>
> If echo "$f" >> I
> if echo "$i" >> F
>
> I
>
> xx104
> xx03
> xx10
> xx100
> xx101
> xx102
> xx103
> xx104
>
> F
>
> HCO-BULLETIN-OF-27-DECEMBER-1967
> HCO-BULLETIN-OF-5-MARCH-1965
> HCO-BULLETIN-OF-18-APRIL-AD15
> HCO-BULLETIN-OF-11-OCTOBER-1967
> HCO-BULLETIN-OF-9-NOVEMBER-1967
> HCO-POLICY-LETTER-OF-22-NOVEMBER-1967
> HCO-BULLETIN-OF-28-NOVEMBER-1967
> HCO-BULLETIN-OF-27-DECEMBER-1967
>
> OK, this works!
>
> ************************************
>
> Now mv "$i" "$f"
> Exactly as used in your script.
>
> results?
>
> ls ..
>
> HCO-BULLETIN-OF-5-MARCH-1965?* xx10* xx101* xx103* y
> x xx100 xx102* xx104*
> z
>
> It does not work like yours. And adds a space
> as a final insult.
> Which space is NOT at end of echo "$f" >> F names

   No, but there's probably a ^M. Post the output of:

hexdump -C F

> This is bash 2.05 B patch level 0. As supplied by
> Mandrake 10.1.
>
> The error messages I get when running this script are:
> mv: cannot move `xx10' to 1`': No such file or directory

    NO, IT'S NOT!!! PLEASE DO NOT RETYPE ERROR MESSAGES!!!

> This for all remaining xxNN files.

   Then read the error message. Try to understand what it is telling
   you.

> Why your mv works, and not mine is a mystery to me.
> But this is where the mystery space is coming from also,
> obviously since the one file that did work with my script
> has one also.
>
> It sure look like a mv bug to me.
>
> Next step, googling for info on possible broken mv.

   Don't waste your time; mv is NOT broken.

   Try this:

CR=$'\r'
for i in xx*
do
  f=$(grep -m 1 HCO "$i")
  if [ -z "$f" ]
  then
    printf "File: %s, no HCO found; skipping\n" "$i"
    continue
  fi
  f=${f%$CR} ## remove trailing carriage return
  while :
  do
     case $f in
        \ *) f=${f# } ;; ## remove leading space
        *\ ) f=${f% } ;; ## remove trailing space
        *\ *) f=${f// /-} ;; ## convert spaces to hyphens
        *--*) f=${f//--/-} ;; ## convert multiple hyphens to a single hyphen
        *) break ;; ## nothing more needs doing; exit the loop
     esac
  done
  mv "$i" "$f"
done
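
   To try the name cleanup without renaming anything, here is a rough
   alternative sketch that does the same steps with tr and sed instead of
   the case loop. The clean_name function and the sample string are
   assumptions made for the example, not part of the script above:

```shell
# clean_name is a hypothetical helper: it deletes carriage returns, trims
# leading and trailing spaces, turns spaces into hyphens, and squeezes runs
# of hyphens into one.
clean_name() {
  printf '%s\n' "$1" |
    tr -d '\r' |
    sed -e 's/^ *//' -e 's/ *$//' -e 's/ /-/g' -e 's/--*/-/g'
}

# Made-up input with leading spaces, a double space, a trailing space and a CR.
name=$(clean_name "$(printf '  HCO BULLETIN OF  4 APRIL AD15 \r')")
printf '[%s]\n' "$name"
```

   The brackets in the output make any leftover leading or trailing
   character visible.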

-- 
    Chris F.A. Johnson                     <http://cfaj.freeshell.org>
    ==================================================================
    Shell Scripting Recipes: A Problem-Solution Approach, 2005, Apress
    <http://www.torfree.net/~chris/books/cfaj/ssr.html>

