Re: Finding duplicate files
- From: pk <pk@xxxxxxxxxx>
- Date: Thu, 03 Apr 2008 20:17:28 +0200
ChrisV wrote:
> I have a very large directory structure which I need to copy to a
> Windows server. Unfortunately there are several directories which have
> multiple files which have the same name but different case which is
> obviously not going to be tolerated by Windows. I need to make a list
> of all of these files so that the user can determine whether the
> duplicates need to be moved, renamed, or deleted. I've searched around
> but I can't find a script that will do this and I'm not very good with
> regular expressions or recursion =)
> The directory structure in question resides on a Fedora Core 4 server.
If renaming the files beforehand is acceptable, you could scan your tree and
rename the files that would clash on Windows, giving them a distinctive,
easy-to-spot suffix so that users immediately see where the problems are.
The following command produces a shell script which, when run on the Linux
server, renames the files as described above.
Here's an example. It assumes that no file name contains a single quote (')
or a newline, and it checks only for clashing files, not directories.
$ ls
AAA AAa Aaa CCC aAA aaA aaa bbb ccc ddd
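$ find . -type f | awk -F '/' -v OFS="/" -v sq="'" '{
    key = tolower($0)                 # compare paths case-insensitively
    if (key in a) {
        # clashes with a path seen earlier: append a numbered marker
        o = $0
        $NF = $NF sprintf("-CHECK_THIS_ONE-%03d", ++i[key])
        print "mv " sq o sq " " sq $0 sq
    } else {
        a[key]                        # first occurrence: just remember it
    }
}'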
mv './AAa' './AAa-CHECK_THIS_ONE-001'
mv './Aaa' './Aaa-CHECK_THIS_ONE-002'
mv './aAA' './aAA-CHECK_THIS_ONE-003'
mv './aaA' './aaA-CHECK_THIS_ONE-004'
mv './aaa' './aaa-CHECK_THIS_ONE-005'
mv './ccc' './ccc-CHECK_THIS_ONE-001'
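In practice you would probably want the mv commands in a file that can be
reviewed before anything is renamed. A minimal sketch of that, under the same
assumptions as above (no single quotes or newlines in file names): the
function name make_rename_script and its two arguments are made up for this
example, and the extra "LC_ALL=C sort" is an addition so that the numbering
is reproducible, since find does not guarantee any particular order.

```shell
# Hypothetical wrapper around the pipeline shown above: write the mv
# commands for the tree rooted at $1 into the file $2.
make_rename_script() {
    # Run find from inside the tree so the paths start with "./",
    # and sort them bytewise so the output is the same on every run.
    ( cd "$1" && find . -type f | LC_ALL=C sort ) |
    awk -F '/' -v OFS='/' -v sq="'" '{
        key = tolower($0)
        if (key in seen) {
            old = $0
            # Changing $NF rebuilds $0 with the new last path component.
            $NF = $NF sprintf("-CHECK_THIS_ONE-%03d", ++n[key])
            print "mv " sq old sq " " sq $0 sq
        } else {
            seen[key]
        }
    }' > "$2"
}
```

Call it with the tree and an output file that lives outside the tree, read
through the result, and then run the file with sh from the top of the tree,
because the generated paths are relative to it.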
After you run the generated script, you can safely copy everything to
Windows and instruct users to look for files with "CHECK_THIS_ONE" (or any
other string you choose, for that matter) in the name.
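If you would rather have just the list you originally asked for, without
renaming anything, something along these lines might do. It is only a sketch:
the function name list_case_clashes is invented for the example, it assumes
no file name contains a newline, and the case folding is only reliable for
plain ASCII names.

```shell
# Hypothetical helper (not part of the command above): print every regular
# file under $1 whose path matches another path when case is ignored.
list_case_clashes() {
    # sort -f ignores case while ordering, so clashing paths end up adjacent.
    find "${1:-.}" -type f | LC_ALL=C sort -f | awk '
        { path[NR] = $0; count[tolower($0)]++ }
        END {
            # Second pass over the saved paths, in sorted order, keeping
            # only those whose lowercased form occurred more than once.
            for (n = 1; n <= NR; n++)
                if (count[tolower(path[n])] > 1)
                    print path[n]
        }'
}
```

Run as "list_case_clashes ." from the top of the tree; files with unique
names (bbb and ddd in the example above) are not printed.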
--
All the commands are tested with bash and GNU tools, so they may use
nonstandard features. I try to mention when something is nonstandard (if
I'm aware of that), but I may miss something. Corrections are welcome.