Re: 5.0.6 grinds to a complete halt
From: Barry Swane (bswane_at_rogers.com)
Date: 05/25/04
- In reply to: Bela Lubkin: "Re: 5.0.6 grinds to a complete halt"
Date: 25 May 2004 08:15:43 -0700
I think we are getting darned close to the root of my problem now.
I found the following on the Seagate Web site (these drives are ST336607):
-------------------------------------------------------------
Ultra 320 Time-Out Firmware Upgrade
The Ultra 320 firmware update applies to the following Seagate hard
drives:
Cheetah 10K.6: ST3146807; ST373307; ST336607
Cheetah 15K.3: ST373453; ST336753; ST318453
Problem description
Some Seagate Cheetah 10K.6 hard drives with OEM firmware up to 0006
and Cheetah 15K.3 hard drives with OEM firmware up to 0005 on Ultra
320 SCSI host adapters are experiencing time out issues when running
RAID 0, 1 and 5 with some host adapters. This issue has been observed
using U320 Adaptec and LSI SCSI controllers, but may not be limited to
these host adapter manufacturers.
Root cause
The Cheetah 10K.6 and 15K.3 drives (models listed above) will
sometimes hang due to an issue in the firmware when reading and
writing at U320 packetized mode.
Corrective Action
Seagate has modified the firmware and added an additional register
check while in U320 packetized mode thus preventing the time out issue
due to system hang.
Contact your host adapter manufacturer for the latest BIOS revision
for your U320 SCSI controller.
To obtain the unique firmware download certificate number for your
hard drive, contact Seagate Technical Support via phone or by email at
discsupport@seagate.com.
Please have your drive part number (Example: 9U8006-001) and the
current firmware level available when contacting Technical Support.
For the 10K.6 models the new code will be OEM 0007
For the 15K.3 models the new code will be OEM 0006
Please back up any important files before upgrading the firmware.
Seagate is not responsible for any data loss.
Copyright ©2004, Seagate Technology LLC
--------------------------------------------------
That last comment about data loss is a little chilling-- the
implication being that the upgrade might wipe out the entire RAID 5
array?
Has anybody else had to deal with this?
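For anyone checking their own drives against the advisory above, here
is a minimal sketch of the model/firmware test it describes. The model
list and the "up to 0006" / "up to 0005" thresholds are copied from the
Seagate text; the function name and the assumption that the firmware
level is a plain numeric string such as "0006" are mine, so treat it as
an illustration rather than anything Seagate supplies.

```python
# Last affected OEM firmware level per model, taken from the advisory:
# Cheetah 10K.6 is affected up to 0006 (fixed in 0007),
# Cheetah 15K.3 is affected up to 0005 (fixed in 0006).
LAST_AFFECTED = {
    "ST3146807": 6, "ST373307": 6, "ST336607": 6,   # Cheetah 10K.6
    "ST373453": 5, "ST336753": 5, "ST318453": 5,    # Cheetah 15K.3
}


def needs_firmware_update(model: str, firmware: str) -> bool:
    """Return True if the advisory applies to this model/firmware pair.

    Assumption: `firmware` is a purely numeric level such as "0006".
    A non-numeric string raises ValueError instead of being guessed at.
    """
    last_bad = LAST_AFFECTED.get(model.strip().upper())
    if last_bad is None:
        return False          # model is not listed in the advisory
    return int(firmware) <= last_bad


if __name__ == "__main__":
    print(needs_firmware_update("ST336607", "0006"))
```

This only says whether the advisory covers a drive; it says nothing
about whether the upgrade is safe on a live RAID 5 set.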
Barry
Bela Lubkin <belal@sco.com> wrote in message news:<20040520080340.GS10272@sco.com>...
> Barry Swane wrote:
>
> > It appears I declared victory a little too early.
>
> > Killing the amirdmon process did indeed have salutary effects on the
> > performance. Customer stopped reporting noticeable slowness in system
> > performance.
> >
> > I'm now inclined to theorize that Bela's suggestion is correct-- that
> > the disk (RAID 5) has stopped responding completely. Would that be
> > consistent with the reported behavior? i.e., if you are in a shell,
> > you can type characters, and they echo, and you can do a carriage
> > return-- but nothing is ever executed?
>
> Perfectly consistent. OpenServer is very conservative about swapping;
> it never pushes process pages out to swap unless it's out of memory. On
> modern systems this generally means that swap is never touched. Thus,
> any active process resides entirely in memory. Also, the kernel itself
> is all hard-loaded in RAM -- none of it is pagable. If the disk
> subsystem hangs, the kernel continues to function. Each individual
> process continues to function until the first time it tries to access
> the disk.
>
> For instance, the program that provides the login prompt (`getty`, for
> console ttys) will continue to accept and echo characters. If you hit
> return on a name, it goes to exec `login`, which involves disk access,
> so you never get to the password prompt.
>
> If you're sitting at a shell prompt, you can type; you can run internal
> commands like "echo foo"; but any attempt to run a binary will hang.
> (Even if the binary is fully cached, its access time needs to be updated
> on disk.)
>
> > Also, this seems major-league weird-- that the system can perform
> > absolutely normally, all the time-- except once in a while it loses
> > contact with the disk?
>
> It isn't particularly weird. What you're describing is a fairly
> standard set of symptoms for a variety of conditions including SCSI bus
> timing, parity or signal integrity problems; internal errors in a disk
> drive; and so on. You might rightly expect a RAID controller to be a
> bit more thorough about error recovery, but apparently this particular
> one -- in this particular failure case, whatever it is -- isn't.
>
> You also mischaracterize the situation here. It _isn't_ performing
> absolutely normally. It's running 6 times slower than older and
> presumably much slower machines.
>
> But I bet the two symptoms are actually unrelated, and you have two
> separate problems to solve. (1) complex application jobs run much more
> slowly than expected; (2) the disk subsystem occasionally hangs.
>
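Bela's description of the hang (a process keeps running until its first
disk access) suggests a simple way to catch problem (2) in the act: a
probe that is already resident in memory and times out when a small
write never completes. The sketch below is only an illustration in
Python; the function name, the 5-second timeout, and the choice of a
write plus fsync as the test are my assumptions, not anything from the
thread, and nothing here is specific to OpenServer.

```python
import os
import tempfile
import threading


def disk_responds(directory: str, timeout: float = 5.0) -> bool:
    """Return True if a small write+fsync in `directory` finishes in time.

    The write runs in a daemon thread so that the caller can give up
    after `timeout` seconds even if the thread is stuck in disk I/O.
    A probe that fails with an error also ends up reported as False.
    """
    done = threading.Event()

    def probe() -> None:
        fd, path = tempfile.mkstemp(dir=directory)
        try:
            os.write(fd, b"x")
            os.fsync(fd)      # ask the OS to push the data to the device
        finally:
            os.close(fd)
            os.unlink(path)
        done.set()            # reached only if every step completed

    threading.Thread(target=probe, daemon=True).start()
    return done.wait(timeout)


if __name__ == "__main__":
    print(disk_responds(tempfile.gettempdir()))
```

The probe has to be started before the hang occurs, since by Bela's
account any attempt to run a new binary would itself block on the disk.
For the same reason its result would need to go somewhere that does not
touch the hung disk.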