Re: make -j as a stress test (was: Re: Quality of FreeBSD) [WARNING - 6.0-BETA1 still hosed!]

From: Karl Denninger (karl_at_denninger.net)
Date: 07/24/05

    Date: Sat, 23 Jul 2005 20:35:10 -0500
    To: freebsd-stable@freebsd.org
    
    

    On Sat, Jul 23, 2005 at 08:53:02PM +0100, Robert Watson wrote:
    >
    > On Fri, 22 Jul 2005, Danny Howard wrote:
    >
    > >While I agree with Karl that introducing instability is a very bad
    > >thing, I guess we now have an answer to Karl's vexation yesterday: [
    > >http://lists.freebsd.org/pipermail/freebsd-stable/2005-July/017210.html
    > >]
    > >
    > > "What I don't understand Robert is why Soren's code is "too
    > > sensitive" to commit, but the explosive reduction in stability
    > > that the changes made between 4.x and 5.3 caused weren't
    > > enough to back THAT out until it could be fixed."
    > >
    > >The answer would seem to be that when someone actually does test the
    > >untested code, it is even worse than the code we are already upset with.
    > >:)
    >
    > I think part of the confusion here has to do with the nature of "tested
    > code". Hardware device drivers can be highly tested, but fail to work on
    > specific hardware that isn't in the hands of the people developing or
    > testing with. The 6.x code is presumably running on tens of thousands if
    > not hundreds of thousands of machines quite successfully under very high
    > load, running this same code. So far, since people having problems with
    > the 5.x code were reminded of the patches, we've seen one post of "this
    > works much better!" and one post of "this is even worse!", which should
    > make clear the challenges involved, and how important it is that as many
    > people as possible help in the testing process. And that it doesn't take
    > a whole lot of work to provide at least a little help in the testing --
    > Karl was able to uncover a problem by simply booting the installed
    > system, which was presumably an investment of less than twenty minutes of
    > his time. I'm sure Soren would love a donation of some nice new server
    > hardware if you happen to have it to spare, but getting involved in
    > testing code is the next best thing :-).
    >
    > Robert N M Watson

    No, Karl's investment involved a complete rebuild of his sandbox machine,
    a pullover of the hardware, and about eight hours of clock time for the
    resync to take place.

    My actual KEYBOARD time may have been 20 minutes or so, but the elapsed
    development time lost was approximately one full day.

    Also, the point here is that this isn't exactly esoteric hardware. It is
    bland stock-grade equipment from DIFFERENT manufacturers - it's not even
    confined to one particular brand!

    Specifically, show me a PCI SATA card that DOES NOT use the SII chipset.
    Good luck, unless you're talking about $500+ embedded-RAID devices which
    decouple the disk chipset with some kind of internal on-card processor.

    Adaptec and Bustek, the two "brands" that everyone knows, both use the
    same chipset (which exhibits the problem) as do a whole crapload of
    no-name Asian clone cards (which I presume ALSO show the same issue, but
    I'm not about to buy six different Asian clones of the Adaptec and Bustek
    cards to try to find out - I already bought TWO different vendors'
    products, and both fail identically.)

    The disks which cause this problem are also from two DIFFERENT
    manufacturers - Maxtor and Hitachi. I bought the Hitachi drives after
    the original problem was reported and the developers said, basically,
    "Maxtor Sucks, but Hitachi is far better."

    Ok, if Maxtor sucks and Hitachi is far better, how come they BOTH have the
    same problem? Hmmmm... maybe neither Maxtor nor Hitachi sucks, and the CODE
    is bad, eh?

    Most telling, the SAME DISK attached to the motherboard Intel ICH adapter
    DOES NOT have problems. My production machine happily grinds along with
    two SATA drives on the ICH motherboard SATA adapter - but as soon as I
    stick a THIRD disk on via the PCI bus on that card and do a "make
    buildworld" (on 5.4), BOOM!

    Pretty clearly, this is not a disk problem - it is some kind of (timing?)
    issue with the SII chipset on the PCI bus and FreeBSD's support for it.

    PR being filed now.

    -- 
    Karl Denninger (karl@denninger.net) Internet Consultant & Kids Rights Activist
    http://www.denninger.net	My home on the net - links to everything I do!
    http://scubaforum.org		Your UNCENSORED place to talk about DIVING!
    http://homecuda.com		Emerald Coast: Buy / sell homes, cars, boats!
    http://genesis3.blogspot.com	Musings Of A Sentient Mind
    _______________________________________________
    freebsd-stable@freebsd.org mailing list
    http://lists.freebsd.org/mailman/listinfo/freebsd-stable
    To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"
    
