RE: Yet another RAID Question (YARQ)

From: Sandy Rutherford (sandy_at_krvarr.bc.ca)
Date: 06/23/05

    Date: Thu, 23 Jun 2005 01:14:56 -0700
    To: "Ted Mittelstaedt" <tedm@toybox.placo.com>
    
    

    >>>>> On Wed, 22 Jun 2005 23:37:20 -0700,
    >>>>> "Ted Mittelstaedt" <tedm@toybox.placo.com> said:

    > Seagate wrote a paper on this titled:

    > "Seagate Technology Paper 338.1 Estimating Drive Reliability in
    > Desktop Computers and Consumer Electronic Systems"

    > that explains how they define MTBF. Basically, they define MTBF as
    > what percentage of disks will fail in the FIRST year.

    Is this in the public domain? I wouldn't mind having a look at it.

    > What they are saying is if you purchase 160 Cheetahs and run them at
    > 100% duty cycle for 1 year then there is 100% chance that 1 out of the
    > 160 will fail.

    > Thus, if you only purchase 80 disks and run them at 100% duty cycle for 1
    > year, then you only have a 50% chance that 1 will fail. And so on.

    > Ain't statistics grand? You can make them say anything! For an encore
    > Seagate went on to prove that their CEO would live 3 centuries
    > by statistical grouping. :-)

    Now don't knock statistics. The problem does not lie with statistics,
    but with its misuse by people who do not understand what they are
    doing. No, I am not a statistician; however, I am a mathematician.
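
    For what it's worth, here is a rough sketch of the arithmetic. It
    assumes (my assumption, not anything taken from the Seagate paper)
    that each drive fails independently within the year with probability
    1/160, the figure quoted above. Under that assumption the chance that
    at least one of n drives fails is 1 - (1 - 1/160)^n, which comes to
    roughly 63% for 160 drives and roughly 39% for 80, rather than 100%
    and 50%. The function name is made up for illustration.

```python
def prob_at_least_one_failure(n_drives, p_fail=1 / 160):
    """Chance that at least one of n_drives fails within the year.

    Assumes independent failures with the same per-drive probability
    p_fail; 1/160 is the figure quoted in the thread, not measured data.
    """
    # P(no drive fails) = (1 - p)^n, so P(at least one fails) is its complement.
    return 1.0 - (1.0 - p_fail) ** n_drives


for n in (80, 160):
    print(f"{n} drives: {prob_at_least_one_failure(n):.1%}")
```

    The expected number of failures among 160 drives is 1 under this
    model, but that is not the same as a 100% chance of seeing one.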

    > So, in getting back to the gist of what I was saying, the issue is
    > as you mentioned standard deviation. I think we all understand that
    > in a disk drive assembly line that it's all robotic, and that there
    > is an extremely high chance that disk drives that are within a few
    > serial numbers of each other are going to have virtually identical
    > characteristics. In fact I would say using the Seagate MTBF definition,
    > that 1 in every 160 drives manufactured in a particular run is going
    > to have a significant enough deviation to fail at a significantly
    > different period of time, given identical workload.

    I am not so sure. If we were talking about can openers, I would
    agree. However, a disk drive is basically a mechanical object which
    performs huge numbers of mechanical actions over the course of a
    number of years. Even extremely minute variations in the
    physical characteristics of the materials could lead to substantive
    variations over time. However, the operative word here is "could".
    Real data is required. I tried to google for a relevant study, but
    came up empty. This surprised me as it seems like the sort of thing
    that masses of data should have been collected for.

    > In short you have better than 99% chance that if you install 2 brand
    > new Cheetahs that are from the same production run, they will have
    > virtually identical characteristics. And, failure due to wear is going
    > to be very similar - there's only so many times the disk head can seek
    > before its bearings are worn out - and you're proposing to give them
    > the exact same usage.

    > I think the reason you're seeing alternation is that the disks are
    > so damn fast that they complete their reads well before their internal
    > buffers have finished emptying themselves over the SCSI bus to the
    > array card. In other words, you wasted your money on your fast disks,

    Not much money. After having been burned by failures of lower-end
    drives, I bought high-end stuff on eBay. Made me nervous at the
    beginning, because who knows how many flights of stairs the drive
    bounced down before it was popped into the mail, and for that matter,
    who knows how many flights of stairs it bounced down while it was in
    the mail. However, so far it has worked out quite well.

    > if you had used slower disks you would see identical read performance
    > but you would see less alternative flickering
    > and more simultaneous and continuous activity.

    > If you got a faster array card you wouldn't see the alternative
    > flickering.

    > Or, it could be the PCI bus not being fast enough for the array card.

    It's almost certainly the PCI bus. The DAC1100, although not
    state-of-the-art, is still reasonably fast. It has 3 U2W channels and
    it could certainly max out my PCI bus.
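
    A back-of-the-envelope check, assuming a conventional 32-bit, 33 MHz
    PCI slot (this message does not say which kind of PCI bus is
    involved) and the nominal 80 MB/s peak of an Ultra2 Wide SCSI
    channel. These are theoretical peaks, not measured throughput, and
    the variable names are just for illustration.

```python
# Nominal peak rates; real-world throughput will be lower on both sides.
U2W_PEAK_MB_S = 80        # Ultra2 Wide SCSI, per channel
CHANNELS = 3              # the DAC1100 has 3 U2W channels

PCI_CLOCK_MHZ = 33.33     # assumed: conventional PCI clock
PCI_WIDTH_BYTES = 4       # assumed: 32-bit slot

scsi_peak = U2W_PEAK_MB_S * CHANNELS          # aggregate across channels
pci_peak = PCI_CLOCK_MHZ * PCI_WIDTH_BYTES    # bytes per clock * clock rate

print(f"SCSI side: {scsi_peak} MB/s peak")
print(f"PCI side:  {pci_peak:.0f} MB/s peak")
print("PCI is the bottleneck" if scsi_peak > pci_peak else "SCSI is the bottleneck")
```

    On those assumptions the three channels together could offer well
    over what the bus can carry, so the bus saturates first.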

    > Ah well, a computer just wouldn't be a computer without blinking
    > lights on it!!! ;-)

    Gotta agree there ;-) Once upon a time I had the dip switch settings
    required to boot a PDP-11 from the front panel memorized, because I
    had to do it so often. Our data runs extended far beyond the typical
    uptime, so we did checkpoints by dumping the relevant bits of core to
    a teletype and I used to have to re-type in the data from the teletype
    when we brought it back up after a crash. Even on an old PDP-11, this
    took a while. We needed 3 months+ of uptime and we did well if we
    could keep that thing up for longer than a week. I became
    well-acquainted with those dip switches.

    Sandy
    _______________________________________________
    freebsd-questions@freebsd.org mailing list
    http://lists.freebsd.org/mailman/listinfo/freebsd-questions
    To unsubscribe, send any mail to "freebsd-questions-unsubscribe@freebsd.org"

