Re: Mount problems



_firstname_@lr_dot_los-gatos_dot_ca_dot_us wrote:
In article <GUTui.47278$rX4.23852@pd7urf2no>,
agnelo <agnelo.delacrotche@xxxxxxxxx> wrote:
Although Windows can sometimes preserve case, FAT filesystems are case-insensitive.

That's an oversimplification. Windows correctly stores and retrieves
case on file names, even on FAT. On some file systems, it can do so
directly in the directory; on others, it needs tricks such as the ones
that are used on FAT.

The only part where Windows is case-insensitive is when looking up
files by name: looking for "bill" will match files with names such as
"bill", "Bill", "BILL", and "bIlL".

As a corollary, Windows in most cases prohibits creating a file if the
new filename is only distinguished by case from an existing file. But
then, Windows also disallows many other types of file names, for
example "NUL" and "lpt1", in addition to any file name that contains
shell metacharacters such as ">". Don't expect file creation to be as
liberal as on Unix (where any byte is valid in a file name, except "/"
and NUL).
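
To make the rules above concrete, here is a minimal Python sketch of a directory that preserves case on storage, ignores it on lookup, and refuses some names. The class name ToyWindowsDirectory is made up for illustration, casefold() is only a stand-in for whatever upcase table Windows really uses, and the reserved-name and forbidden-character lists are the commonly documented ones, not an exhaustive rule set.

```python
# Commonly documented reserved device names and forbidden characters.
RESERVED = ({"CON", "PRN", "AUX", "NUL"}
            | {f"COM{i}" for i in range(1, 10)}
            | {f"LPT{i}" for i in range(1, 10)})
FORBIDDEN = set('<>:"/\\|?*')


class ToyWindowsDirectory:
    """Hypothetical model: stores names as given, matches them ignoring case."""

    def __init__(self):
        self._entries = {}  # key: case-folded name, value: name as stored

    def lookup(self, name):
        # Case-insensitive match; returns the spelling that was stored, or None.
        return self._entries.get(name.casefold())

    def create(self, name):
        if any(ch in FORBIDDEN or ord(ch) < 32 for ch in name):
            raise ValueError("forbidden character in " + repr(name))
        if name.split(".")[0].upper() in RESERVED:
            raise ValueError(repr(name) + " is a reserved device name")
        existing = self.lookup(name)
        if existing is not None:
            raise FileExistsError(repr(name) + " matches existing " + repr(existing))
        self._entries[name.casefold()] = name

    def listing(self):
        # Names come back exactly as they were stored.
        return list(self._entries.values())


d = ToyWindowsDirectory()
d.create("bIlL")
print(d.lookup("BILL"))  # the stored spelling, not the one searched for
print(d.listing())
```

In this model, creating "BiLl" after "bIlL" fails, and so does creating "lpt1" or "a>b", while the listing still shows "bIlL" exactly as it was written.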

Face it: Different operating systems have different rules for what
constitutes a valid file name, and when file names match.

It wouldn't make sense to display the filenames as they appear in Windows, because they are not real.

Incorrect. They are absolutely real. Windows stores "bIlL" in the
directory (or in an appendix to the directory which is functionally
equivalent to the directory, because without studying the on-disk
format you can't distinguish it from the directory). It chooses to
display that file name occasionally as "BILL" in some tools, and it
chooses to allow a search for a file named "BiLl" to match "bIlL".

An attempt to write two files with the same name and different cases would more likely end up in a filesystem error under Windows.

True, but it does not mean that the mixed-case names are unreal.
Windows will also give many other errors, as explained above.


OK. It was indeed an oversimplification. So correct me if I'm wrong, because I haven't touched Windows for years (and I'm perfectly happy with that situation :-) ).

The long file names (LFN) are "real" and are written in the FAT on top of the directory. But what matters is still the 8.3 names. Maybe because FAT32 is just a FAT16 with long file name 'capabilities' and a bigger cluster size. If you delete the LFN entry at a low level with a disk editor, you haven't in fact deleted the file (i.e. you haven't replaced the first letter of the 8.3 file name, which on FAT would declare the file as 'deleted'). If, on the contrary, you delete the short name entry, the file is gone (marked as deleted) and you end up with an orphan LFN, which would be considered a file system error.
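
A rough Python sketch of that low-level picture, in case it helps. It assumes the usual 32-byte FAT directory entry layout: attribute byte at offset 11, attribute value 0x0F marking a long-name entry, 0xE5 in the first byte marking a deleted entry, and 0x00 in the first byte ending the directory. The helper names (classify_entry, orphan_lfn_runs, short_entry, lfn_entry) are made up for illustration, and a real checker would also verify the LFN checksum and sequence numbers.

```python
ATTR_LONG_NAME = 0x0F  # attribute combination that marks a long-name entry
DELETED_MARK = 0xE5    # first byte of a deleted entry
END_MARK = 0x00        # first byte of the entry that ends the directory


def classify_entry(entry: bytes) -> str:
    """Classify one 32-byte directory entry as end, deleted, lfn or short."""
    if len(entry) != 32:
        raise ValueError("a FAT directory entry is 32 bytes")
    if entry[0] == END_MARK:
        return "end"
    if entry[0] == DELETED_MARK:
        return "deleted"
    if entry[11] & 0x3F == ATTR_LONG_NAME:
        return "lfn"
    return "short"


def orphan_lfn_runs(entries) -> int:
    """Count runs of live LFN entries not followed by a live short entry."""
    orphans = 0
    pending = False  # True while we are inside a run of LFN entries
    for entry in entries:
        kind = classify_entry(entry)
        if kind == "lfn":
            pending = True
            continue
        if pending and kind != "short":
            orphans += 1
        pending = False
        if kind == "end":
            break
    if pending:
        orphans += 1
    return orphans


def short_entry(name11: bytes) -> bytes:
    # 11-byte 8.3 name, archive attribute, remaining fields zeroed.
    return name11.ljust(11) + bytes([0x20]) + bytes(20)


def lfn_entry(seq: int) -> bytes:
    # Sequence byte, empty name fields, long-name attribute, rest zeroed.
    return bytes([seq]) + bytes(10) + bytes([ATTR_LONG_NAME]) + bytes(20)


lfn = lfn_entry(0x41)
short = short_entry(b"BILL    TXT")
deleted_short = bytes([DELETED_MARK]) + short[1:]
deleted_lfn = bytes([DELETED_MARK]) + lfn[1:]

print(orphan_lfn_runs([lfn, short]))          # intact pair
print(orphan_lfn_runs([lfn, deleted_short]))  # short entry deleted
print(orphan_lfn_runs([deleted_lfn, short]))  # only the LFN deleted
```

Under these assumptions, deleting only the LFN entry leaves a live short entry and no orphan, while deleting only the short entry leaves one orphaned LFN run.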

So calling the long file names 'not real' or 'faked' was an oversimplification. But is it not FAT32 which overcomplicates things? As you mentioned above, you cannot have bIll and Bill. And, because of this limitation, I find it reasonable to display filenames on FAT partitions either all in uppercase or all in lowercase, rather than as they would appear in Windows, precisely because you cannot rename bIll to Bill or create BiLl in the same directory.


So why should OpenBSD display faked filenames?

OpenBSD should do what is convenient and useful to its users. In my
opinion, if a file name shows up as "bIlL" when running Windows,
OpenBSD should do the best it can to match that file name. That also
implies that the FAT (and NTFS) file system code on OpenBSD should
disallow creating a file named "BiLl" if "bIlL" is already present,
because allowing it would likely cause problems for the user later on,
when accessing the same file system from Windows.

It is perfectly right to display them either in uppercase or in lowercase (depending on mount options). Using mixed lowercase and uppercase for filenames on a FAT partition is not necessary and rather a bad idea.

Wrong - see above.

There is a whole other set of problems you haven't even brought up.
What character encoding are filenames in? If you do a "ls" (or
equivalently a series of readdir() system calls), you get a whole
bunch of strings back, which you interpret as a series of bytes.
Imagine you are running in a locale and with a display device that can
only display iso-8859-1, and imagine that the files were created in a
locale that uses utf-8. You'll get a bunch of binary gobbledygook back
in the file name, and you will display it wrong. This is bad. Now
the problem is that Windows stores file names in the Windows 16-bit
character set (which is extremely close to some older version of
Unicode, encoded in UCS-2), but Unix in general is locale-agnostic in
the kernel, and treats file names as opaque binary strings.
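
A small Python illustration of that mismatch, as a sketch only. The file name is a made-up example, written with an escape so the snippet itself is plain ASCII, and utf-16-le is used here as a stand-in for the 16-bit encoding Windows uses on disk.

```python
# Hypothetical file name "b<a umlaut>r.txt".
name = "b\u00e4r.txt"

# What a process in a utf-8 locale hands to the kernel: an opaque byte string.
raw = name.encode("utf-8")
print(raw)

# What a process in an iso-8859-1 locale makes of those same bytes.
shown = raw.decode("iso-8859-1")
print(shown)

# The same name in a 16-bit encoding: two bytes per character here.
wide = name.encode("utf-16-le")
print(len(name), len(raw), len(wide))
```

The kernel never sees anything wrong: the bytes are stored and returned faithfully, and only the interpretation on the reading side differs.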

This is a fundamentally unsolvable problem, and will lead to files
whose names are represented wrong. There are some fixes (the common
one is for the file system to store everything in Unicode-4 encoded in
UTF-8, but convert to the locale of the calling user process on the
way in and out of the kernel), but they tend to have unintended
consequences. Here is my favorite horror example: Say you have a
directory that contains exactly two files, one named "a umlaut" (the
German a with two dots over it), the second one named "a grave" (the
French a with a little accent over it). Both files have single-glyph
file names. In iso-8859-1, the string representations of both file
names are one byte long (in UTF-8 encoded Unicode they are longer).

This is a recursive nightmare. Keep trying to convince Windows users not to use umlauts or accents in filenames! But I'm afraid they won't listen.


Now say that you are logging into your Unix machine using an
old-fashioned terminal, such as a VT-100, which can only display 7-bit
ASCII. You set your locale accordingly. You do an ls command. You
will see two files, both of which are named "a" (which is the least
incorrect way to transcribe those two letters into 7-bit ASCII). This
is astonishing: how come you can have two files with the same name in
the same directory? Now you try to create a file that shall be named
"a", and your creat() call will succeed, because there is no file
named "a" yet, even though an ls (or a readdir!) will clearly show
that there are already two of those files. This may sound totally
illogical, but is (unfortunately) the most correct behavior a file
system can have where the locale conversion is not information
preserving.
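
The horror example can be reproduced in a few lines of Python. This is only a sketch: the set named stored stands in for the directory, and to_ascii is a hypothetical transliteration (decompose, then drop anything that is not 7-bit ASCII), which is one plausible way a lossy locale conversion might behave, not what any particular kernel does.

```python
import unicodedata


def to_ascii(name: str) -> str:
    """Lossy conversion: split off accents, then drop every non-ASCII code point."""
    decomposed = unicodedata.normalize("NFKD", name)
    return decomposed.encode("ascii", "ignore").decode("ascii")


# The directory really contains two distinct names: a umlaut and a grave.
stored = {"\u00e4", "\u00e0"}

# Each name is one byte in iso-8859-1 and two bytes in UTF-8.
for n in sorted(stored):
    print(len(n.encode("iso-8859-1")), len(n.encode("utf-8")))

# What ls shows on the 7-bit terminal: two entries that look identical.
listing = sorted(to_ascii(n) for n in stored)
print(listing)

# Yet creating "a" is legitimate, because no stored name is actually "a".
can_create_a = "a" not in stored
print(can_create_a)
```

The listing shows two entries that both read "a", while the existence check for "a" still comes back negative, which is exactly the astonishing but correct behavior described above.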

--
Ralph Becker-Szendy _firstname_@lr_dot_los-gatos_dot_ca_dot_us
