Re: Finding duplicate files



ChrisV wrote:

> I have a very large directory structure which I need to copy to a
> Windows server. Unfortunately there are several directories which have
> multiple files which have the same name but different case which is
> obviously not going to be tolerated by Windows. I need to make a list
> of all of these files so that the user can determine whether the
> duplicates need to be moved, renamed, or deleted. I've searched around
> but I can't find a script that will do this and I'm not very good with
> regular expressions or recursion =)
>
> The directory structure in question resides on a Fedora Core 4 server.

If renaming the files beforehand is acceptable, you could scan your tree and
rename the files that would clash on Windows by appending a distinctive,
easy-to-spot suffix, so that users immediately see where the problems are.
The following script produces a shell script which, when run on the Linux
server, renames the files as described above.

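Here's an example. It assumes that no file name contains a single quote ('),
and it checks only files, not directories. In each group of names that differ
only in case, the first file that find reports keeps its name; the others get
a numbered suffix.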

$ ls
AAA AAa Aaa CCC aAA aaA aaa bbb ccc ddd
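$ find . -type f | awk -F '/' -v OFS="/" -v sq="'" '{
    key = tolower($0)        # the path as a case-insensitive file system sees it
    if (key in a) {          # a case-insensitive twin was already seen
      o = $0
      # append a visible suffix, numbered separately for each clash group
      $NF = $NF sprintf("-CHECK_THIS_ONE-%03d", ++i[key])
      print "mv " sq o sq " " sq $0 sq
    } else {
      a[key]                 # first occurrence: remember it, keep its name
    }
  }'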
mv './AAa' './AAa-CHECK_THIS_ONE-001'
mv './Aaa' './Aaa-CHECK_THIS_ONE-002'
mv './aAA' './aAA-CHECK_THIS_ONE-003'
mv './aaA' './aaA-CHECK_THIS_ONE-004'
mv './aaa' './aaa-CHECK_THIS_ONE-005'
mv './ccc' './ccc-CHECK_THIS_ONE-001'
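The one-liner above assumes that no file name contains a single quote. The following is a minimal sketch that drops that assumption. It still assumes that no name contains a newline, because find output is read line by line. The sketch wraps the same awk logic in a shell function and quotes each name so that every ' becomes '"'"'. The names make_rename_script and q() are hypothetical and do not come from the original post.

```shell
# make_rename_script: hypothetical helper name (not from the original post).
# Prints mv commands that add a -CHECK_THIS_ONE-NNN suffix to every file
# whose path clashes case-insensitively with one seen earlier.
# Assumes no file name contains a newline; single quotes are handled by
# turning each ' into '"'"' inside a single-quoted sh word.
make_rename_script() {
  find "${1:-.}" -type f | awk -F '/' -v OFS='/' -v sq="'" '
    # q(): wrap s in single quotes for sh, escaping embedded quotes
    function q(s) {
      gsub(sq, sq "\"" sq "\"" sq, s)
      return sq s sq
    }
    {
      key = tolower($0)            # path as a case-insensitive file system sees it
      if (key in seen) {
        old = $0
        # rewriting $NF rebuilds $0 with OFS, so only the last component changes
        $NF = $NF sprintf("-CHECK_THIS_ONE-%03d", ++n[key])
        print "mv " q(old) " " q($0)
      } else {
        seen[key]                  # first file in each group keeps its name
      }
    }'
}
```

To use it, save the output, read it over, and only then run it: `make_rename_script /path/to/tree > rename.sh`, followed by `sh rename.sh`.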

After you run the generated script, you can safely copy everything to
Windows, and instruct users to look for files with "CHECK_THIS_ONE" (or any
other string you choose, for that matter) in the name.
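The original request was for a list of the clashing files rather than for renames. The following is a minimal report-only sketch that uses the same case-insensitive path comparison and renames nothing. The function name list_case_clashes is hypothetical. As above, the sketch assumes that no file name contains a newline.

```shell
# list_case_clashes: hypothetical helper name (not from the original post).
# Prints every file whose path matches another file's path when case is
# ignored. Files in the same clash group are printed together, and groups
# are separated by a blank line. Nothing is renamed.
list_case_clashes() {
  find "${1:-.}" -type f | awk '
    {
      key = tolower($0)
      # count[key]++ yields the old count: 0 for the first file of a group
      if (count[key]++)
        paths[key] = paths[key] "\n" $0
      else
        paths[key] = $0
    }
    END {
      for (key in count)           # group order is unspecified in awk
        if (count[key] > 1)
          print paths[key] "\n"
    }'
}
```

The user can review this list before deciding what to move, rename, or delete. If the renaming script is used instead, `find . -name '*-CHECK_THIS_ONE-*'` lists the files it marked afterwards.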

--
All the commands are tested with bash and GNU tools, so they may use
nonstandard features. I try to mention when something is nonstandard (if
I'm aware of that), but I may miss something. Corrections are welcome.