Re: NFS/SMB storage solution



Logan Shaw wrote:
Mogens V. wrote:

I'm considering/planning a new storage system.
We're currently ~40 people in a development business, expecting continued growth. All users' homes are NFS and/or SMB mounted.

At the moment we're moving ~10GB/day on a 500GB FreeBSD-based system serving NFS/SMB; this will increase a good deal within the next six to twelve months.

Our experience with Linux and FreeBSD (some of it from before my time as sysadmin, so I cannot fully evaluate those problems) has been NFS timeouts, usually some 2-3 minutes at a time, with mounts magically coming back, no log entries, etc...


No log entries on the server, you mean?

On both sides...

That's possibly fairly easy to diagnose (which isn't the same thing
as fixing). Run tcpdump on the client and server, filter to see
only NFS-related stuff between those two hosts, and see who is
not responding to whom. One very common cause of NFS timeouts used
to be a lack of threads on the NFS server: each thread answers a
request (basically a packet in the old days of UDP-based NFS), and
there is a fixed number of threads, so if they are all busy, the
packet is simply dropped, and the client is expected to retransmit
it. The client can't tell the difference between this and a
crashed/missing/unreachable NFS server, so you see timeouts.
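
To make the "who is not responding to whom" check concrete, here is a rough Python sketch. It assumes the capture has already been decoded into one record per RPC message; the Packet record, its field names, and both function names are made up for illustration and are not the output format of tcpdump or any other tool. It also assumes nfsd listens on the usual port 2049. Calls are matched to replies by RPC transaction ID (XID), and whatever is left over is a request the server never answered.

```python
from collections import namedtuple

# Hypothetical record: one decoded RPC message from a packet capture.
Packet = namedtuple("Packet", "time src dst xid is_reply")


def capture_filter(client, server, port=2049):
    """Build a pcap filter expression limiting a capture to NFS traffic
    between two hosts (assumes nfsd is on the standard port)."""
    return f"host {client} and host {server} and port {port}"


def unanswered_calls(packets):
    """Return {(client, xid): [send times]} for calls that never got a reply.

    More than one send time for the same key means the client retransmitted.
    """
    pending = {}
    for p in sorted(packets, key=lambda p: p.time):
        if p.is_reply:
            # A reply travels server -> client, so the client is p.dst.
            pending.pop((p.dst, p.xid), None)
        else:
            pending.setdefault((p.src, p.xid), []).append(p.time)
    return pending
```

If the same XID shows up several times from the client with nothing coming back, the server is dropping or sitting on requests; if the replies do show up in the server-side capture but not on the client, the problem is somewhere in the network between them.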

As I recall, there are two solutions to this: the first is to
increase the number of threads. This ensures that there will
always be one available when a request comes in. However, it's
not a perfect solution, because for a client to believe the NFS
server is OK, the server has to not only assign the request to
a thread but also do the physical I/O (read or write from disk)
and return a packet. That means that the more I/O load you have
on your server, the longer clients are going to have to wait, and
the more danger of timeouts. As far as I know, the only solution
to this (other than increasing disk I/O capacity of the server)
is to increase the timeouts on the clients and increase the time
before they retransmit requests (retransmissions only bog the
server down further).
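
To get a feel for how those client-side knobs interact, here is a small Python sketch. It assumes a plain doubling backoff with a ceiling; that is a simplification chosen for illustration, real clients differ in the details, and the function and parameter names are hypothetical (timeo is in seconds here). It returns the times at which the client would retransmit and the point at which it would finally report a timeout.

```python
def retransmit_schedule(timeo, retrans, cap=60.0):
    """Model a client that doubles its wait after every retransmission.

    timeo   -- initial wait in seconds before the first retransmission
    retrans -- number of retransmissions before the client gives up
    cap     -- upper bound on any single wait, in seconds (assumed value)

    Returns (retransmit_times, give_up_time), both measured in seconds
    from the moment the original request was sent.
    """
    sends = []
    now, wait = 0.0, float(timeo)
    for _ in range(retrans):
        now += wait          # wait expires with no reply
        sends.append(now)    # client retransmits the request
        wait = min(wait * 2, cap)
    return sends, now + wait
```

With timeo=1.0 and retrans=3 this model retransmits at 1, 3 and 7 seconds and gives up at 15 seconds. Raising timeo spaces the retransmissions out, which puts fewer duplicate requests on an already busy server at the cost of slower detection of a server that really is down.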

Anyway, the point is that timeouts aren't necessarily an indication
of a bug or anything like that. Sometimes they can just be an
indicator that performance tuning is needed.

Your comments above made me think a bit. This FreeBSD server uses a Promise TX4 4-channel SATA card. I know the TX has had driver issues for a long time (and maybe still has) on Linux. It could be the same on FreeBSD.
Disk I/O problems due to driver/firmware issues, combined with no tuning so far, might cause those timeout problems.
I only got the job 2½ months ago, so I haven't gotten into tuning yet.

At this stage, I'm unsure of:
Choice of technology:
. Go with the Sun-setup
. Continue with either Linux or FreeBSD
. Expect to be able to solve Linux/FreeBSD NFS timeouts/stability
. Continue with NFS, or look at AFS, CODA et al...


I suspect that as far as choosing Linux, FreeBSD, or Sun goes, you
are going to have advantages and disadvantages with any of those
in the area of stability and performance, because NFS compatibility
between client and server just varies with the different combinations.

As far as I know, Coda isn't really ready for prime time, so I would
personally avoid that. AFS isn't a bad option, but I'm not really
sure if it offers any huge advantages in your situation (which,
as far as you've described, is entirely confined to a small LAN).

On the other hand, if the NFS interoperability problems turn out to
be unsolvable, then any other filesystem might be a better choice. :-)
AFS probably has an advantage here since as far as I know there aren't
multiple independent implementations of AFS, and interoperability is
usually better between ports of the same implementation of something
than it is between independent implementations.

Agreed. I have been thinking about how best to make different *nixes interoperate over NFS.
We're using RHEL/CentOS/FC4, SLES, Debian, Gentoo and OS X as both workstations and build/test servers, with FreeBSD only as server. Plus a lonesome Irix and Solaris for occasional builds. Our Tru64 may never run again ;)
Getting so many different NFS implementations to interoperate may prove difficult. It's currently NFSv3, and maybe it's a wonder the problems aren't bigger than the 2-3 minute outages, plus some annoying SSH timeouts.
Otherwise it's working. I'll start by giving the tuning a go, while trying to work out whether the Promise driver/firmware could be another source.

Sun solution:


. Which filesystem I should choose, Sun's (zfs?) or Veritas


I don't think ZFS is ready for prime time either, yet, although it
promises to be ready pretty soon (maybe 6 months). Because it uses
copy on write for all writes, thus turning essentially all writes
into sequential I/O, it should be a pretty good combination with NFS,
since NFS really benefits from a filesystem that can complete write
requests quickly.

AFAIK, ZFS should be deemed ready with the upcoming June/July Solaris release. And yes, I was thinking about those features, combined with what others have described as quite good disk administration features.

Personally, I wouldn't go with Veritas, because I just don't see the
benefit over plain UFS, now that UFS has support for large filesystems,
journaling, etc. Especially since UFS is built in, which means
administration is simpler.

Thanks for all pointers, very helpful, I'm sure.

--
Kind regards,
Mogens V.



"During ongoing internal quality testing of the HP ProLiant DL585
server, a potential vulnerability has been identified, where a
remote unauthorized user may gain access to the server controls,
but only when the server is powered down."
