Re: Need feedback on the A5200 storage array....

From: Dennis Clarke (dclarke_at_blastwave.org)
Date: 08/17/04


Date: Tue, 17 Aug 2004 13:06:53 -0400


> >> Well first of all, it's an antique. Sun EOL'd it early this year, and if
> >
> > No. It is not an antique. It is EOL. My Sparc20 may qualify as antique
> > as I can not run Solaris 10 on it but it runs Solaris 9 just fine.
>
> OK, I'll grant that 'antique' is an overstatement, but I will claim that
> most companies I know with them have retired them in the last year.

For someone who is looking at the A5200, I have to assume that they
don't have the money for a big Hitachi array or a fast FC array with
hardware RAID support. Most likely they are doing a bargain-basement
config for some prototype. In that case an A5200 is a good fit.

> > Who needs support for an A5200? That would serve no purpose. Simply
> > ensure that the FC-AL disks are under warranty and then replace the
> > entire array if there is a problem. Better yet, since they are cheap, buy
> > two of them and then mirror across controllers.
>
> Well there's the thing. Big companies who want guaranteed vendor support

I was thinking along the lines of "bargain basement" as the OP was looking
at the A5200 and not a Hitachi array.

> > RAID5 is a concept whose time has come and gone. It no longer serves any
> > real purpose. The very name, which includes inexpensive disks in it, is
> > out of fashion. Five or eight years ago people had just cause to put
> > multiple 9 GB disks in a RAID5 configuration due to costs. These days we
> > simply buy two 73 GB FC disks and mirror them. Performance is great and so
> > is redundancy. RAID5, in my opinion, should be avoided unless you have
> > fast hardware support for it. Even then it seems senseless to me.
>
> Dennis, this may be the most bizarre thing I've heard in um...ages.

Thanks. I am glad that I can entertain. :-)

> We just went through a massive string of meetings arguing in favour of
> RAID5 over mirroring in our Hitachi 9960 array.

I see a "Dilbert" cartoon in my head at the moment!

The OP was looking at an A5200 and I am guessing that he will need some
disk space but nothing massive.

> Obviously in this case we have the fast hardware support, but I don't
> see how you can call this senseless. Similarly with the NetApp gear, and
> RAID4.

Different planet entirely. Use RAID5 on that kind of hardware. In the
small world of a blastwave server prototype I would go with a stripe and
then a mirror. On something a little larger I would still go with stripes
and mirrors, for another reason that I'll get to later.
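To make "a stripe and then a mirror" concrete, here is a minimal Python sketch of where one logical block lands. It assumes a plain round-robin stripe with a fixed interlace (stripe depth, in blocks) and submirrors that are identical stripes; the function names are hypothetical and the code is not taken from DiskSuite/SVM.

```python
from typing import List, Tuple


def stripe_location(lblock: int, ndisks: int, interlace: int) -> Tuple[int, int]:
    """Map a logical block to (disk index, block offset on that disk).

    Chunks of `interlace` blocks are dealt out round-robin across the disks.
    """
    chunk, within = divmod(lblock, interlace)  # which chunk, and where inside it
    row, disk = divmod(chunk, ndisks)          # full rows of chunks, then the disk
    return disk, row * interlace + within


def mirrored_locations(lblock: int, ndisks: int, interlace: int,
                       nsubmirrors: int) -> List[Tuple[int, int, int]]:
    """Each submirror is an identical stripe, so a write hits the same
    (disk, offset) in every submirror. Returns (submirror, disk, offset)."""
    disk, offset = stripe_location(lblock, ndisks, interlace)
    return [(sub, disk, offset) for sub in range(nsubmirrors)]


if __name__ == "__main__":
    # Ten disks per stripe, 128-block interlace, two-way mirror.
    for lb in (0, 127, 128, 1280, 1281):
        print(lb, mirrored_locations(lb, ndisks=10, interlace=128, nsubmirrors=2))
```

Block 128 lands on disk 1 and block 1281 wraps back to disk 0 at offset 129; `interlace` is the knob that stripe-depth experiments would turn.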

For a big array there is really no cost saving or admin advantage in
anything other than RAID5. Really, it's a big box with storage in it, and
at that level I really don't care what the guts are doing so long as it
performs its function. The CEO of a company does not look at such things,
and if he needs to cut staff he will not say "whose idea was it to go
RAID5 on the Hitachi?" At that level such things are most likely just
functional blocks, and the RAID5 versus stripe-and-mirror side of life is
just a nuance.
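A rough sketch of the capacity arithmetic behind the cost argument, using the disk sizes mentioned earlier in the thread (9 GB and 73 GB). It assumes one disk's worth of parity per RAID5 set and n-way mirrors built from whole disks; the function is hypothetical and ignores hot spares and metadata.

```python
def usable_gb(n_disks: int, disk_gb: float, layout: str, ways: int = 2) -> float:
    """Usable capacity of a simple RAID5 set or an n-way mirror (striped or not)."""
    if layout == "raid5":
        if n_disks < 3:
            raise ValueError("RAID5 needs at least three disks")
        return (n_disks - 1) * disk_gb          # one disk's worth goes to parity
    if layout == "mirror":
        if ways < 2 or n_disks % ways:
            raise ValueError("disks must divide evenly into the mirror ways")
        return (n_disks // ways) * disk_gb      # every block is stored `ways` times
    raise ValueError(f"unknown layout: {layout}")


if __name__ == "__main__":
    print(usable_gb(5, 9, "raid5"))     # five 9 GB disks in RAID5
    print(usable_gb(2, 73, "mirror"))   # two 73 GB disks mirrored
    print(usable_gb(20, 73, "raid5"), usable_gb(20, 73, "mirror"))
```

Two mirrored 73 GB disks already give 73 GB against 36 GB from five 9 GB disks in RAID5, but on a 20-disk array the gap becomes 1387 GB versus 730 GB, which is where RAID5 still pays for itself on big hardware.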

> > There are four ports on the back and you can split the backplane from
> > front to rear. You can get very nice performance out of these cheap
> > arrays by simply having enough GBICs and the time to tune the filesystem.
>
> > I agree, but also realize its strengths. This is a clear case of budget
> > meets requirements. Also be aware that you can connect the A5200 to
> > two servers at the same time and really cut costs.
>
> Aside from the RAID5 comments, there's nothing here I disagree with. I
> just wanted to make sure that the OP wasn't going to buy a cheap array
> off of eBay with the expectation of getting something comparable to the
> current state of the art HW RAID, for a fraction of the cost.

That would be nice! No chance though :-(

> I'll admit that my attitude is biased from working in the oil industry
> (especially these days), but once I start looking at spending tens of
> thousands on hardware, it had bloody well better be backed and
> guaranteed!

Ah yes, the oil industry. I guess you can burn money. My money. Other
people's money. Everyone's money. :-P

> As an aside, I would say that blastwave is a PERFECT example of where I'd
> heartily recommend one of these beasts--you're an admin for a nonprofit
> organisation, who clearly understands these things very well. My basement
> is another--my old multipack and SS20 are getting to be rather...antique. :-)

Great, now go to everyone in your company and get them to donate to
blastwave. I think that the maintainers would love a Hitachi array!

About the whole RAID5 comment. Let me see if I can expound on that in
some intelligent fashion.

I am most likely talking about my experiences with servers (over the last
ten years or so) that had internal RAID5 controllers. Now I am thinking
about Compaq ProLiant servers and things like that, not big servers with
big arrays. I was a sysadmin for an IBM 3090 mainframe for a while, and
that was an interesting experience with a really, really big server. We
did not use RAID5 there, by the way, but I don't think that is related.

From time to time a server would crash and there would be some nasty
problem with Windows NT or 2000 on the server, and I would be stuck with a
server which was using RAID5. I can't simply yank a disk and read its
contents. With a mirror I can.
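Why a single RAID5 member is not a readable copy: each disk holds only its own chunk of every stripe (or the parity), and recovering a missing chunk needs all the survivors. A simplified Python sketch with XOR parity on one stripe (parity rotation across disks is left out); in a two-way mirror, by contrast, either disk alone holds the whole volume.

```python
from functools import reduce
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Bytewise XOR of equal-length blocks (the RAID5 parity function)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe across four disks: three data chunks plus one parity chunk.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Pulling disk 1 on its own yields only b"BBBB", a fragment of the data.
# Rebuilding a *lost* disk 1 requires every other member of the stripe:
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt)
```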

Later in life I was using striped mirrors on all my smaller servers, and
this meant that I could pull a set of disks in the event of a problem and
still read the data. With Solstice DiskSuite on Solaris 2.5.1, for either
x86 or Sparc, I was able to create large stripe sets in external array
units.

The old SparcCenter 2000E was great in that I could have three-way
mirrors in the arrays connected to it and really know that I was safe from
a failure. I would take two arrays (the old ones with three disk trays)
and load up ten disks per tray. These ten disks would be striped together
with a stripe depth that I would determine from experimentation. There
seems to be no really nice way to know what the best stripe depth will be
other than honest-to-goodness lab time and stringent tests. The ten disks
would be a metadevice that would then be mirrored with a tray on another
array on another controller. In this way I was able to ensure that the
metadevice mirror was on two controllers and two arrays. Sometimes I
would use three-way mirrors across three arrays, three trays and three
controllers. As you can imagine, the redundancy in this config was really
nice and the performance was, well, for a SparcCenter 2000E, about as good
as it gets. There were always questions about whether we should be using
a round-robin schedule or a parallel schedule for the write policy, and I
found the data to be less than conclusive at this level of hardware.
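The placement rule above (no two submirrors sharing a controller or an array) is easy to check mechanically. A hypothetical Python sketch: the metadevice, controller and array names are made up, and on a real system the data would have to be gathered from the volume manager's own status output rather than typed in by hand.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List


@dataclass(frozen=True)
class Submirror:
    name: str        # hypothetical metadevice name, e.g. "d11"
    controller: str  # hypothetical controller name, e.g. "c1"
    array: str       # hypothetical enclosure label
    tray: int


def placement_problems(subs: List[Submirror]) -> List[str]:
    """Return a message for every pair of submirrors that shares a
    controller or an array (either would be a single point of failure)."""
    problems = []
    for a, b in combinations(subs, 2):
        if a.controller == b.controller:
            problems.append(f"{a.name}/{b.name} share controller {a.controller}")
        if a.array == b.array:
            problems.append(f"{a.name}/{b.name} share array {a.array}")
    return problems


if __name__ == "__main__":
    three_way = [
        Submirror("d11", "c1", "array1", 0),
        Submirror("d12", "c2", "array2", 1),
        Submirror("d13", "c3", "array3", 2),
    ]
    print(placement_problems(three_way) or "placement OK")
```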

A little later in life I began to see that artifacts in the disk geometry
were related to disk performance. That would seem obvious. Seem. The
problem was that disk geometry was no longer something that one could
easily determine. As disk technology improved, the geometry exposed to
tools like format was no longer representative of the zone bit recording
(ZBR) techniques actually being used. I began to experiment with bigger
disks (180 GB) in arrays while also tweaking kernel parameters and on-disk
SCSI configuration. The results were interesting:

      http://www.blastwave.org/dclarke/performance/
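A back-of-the-envelope sketch of why zone bit recording makes the reported geometry misleading: the media transfer rate of a track is roughly sectors per track times sector size times revolutions per second, and outer zones carry more sectors per track than inner ones. The zone figures below are hypothetical, not measurements of any real drive, and the calculation ignores head switches, seeks and cache effects.

```python
def media_rate_mb_s(sectors_per_track: int, rpm: int, sector_bytes: int = 512) -> float:
    """Approximate sustained transfer rate of one track, in MB/s (10**6 bytes/s)."""
    revs_per_second = rpm / 60
    return sectors_per_track * sector_bytes * revs_per_second / 1e6


if __name__ == "__main__":
    # Hypothetical 10,000 rpm drive with a dense outer zone and a sparse inner zone.
    zones = {"outer zone": 700, "inner zone": 400}
    for zone, spt in zones.items():
        print(f"{zone}: {media_rate_mb_s(spt, 10_000):.1f} MB/s")
```

On most drives the low-numbered cylinders sit in the outer zones, which is one plausible reason a stripe built on low cylinder groups can outrun one built on high cylinder groups.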

It seemed that I could get different results depending on whether I used
a low cylinder group or a high cylinder group for the stripe set. The
on-board disk cache also made obvious differences. In general I was able
to get between 30 and 54 MB of data per second to the disks. That is a
really wide margin. The test itself also seemed very unrealistic: I was
writing data to the disks in one long continuous write. What I needed was
lots of small files being created, as well as file append operations with
writes that were both multiples and fractions of the UFS frag size.
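The workload described above (many small file creates, then appends sized at multiples and fractions of the UFS fragment size) can be sketched in a few lines of Python. The 1024-byte fragment size is an assumption (check the real value for the filesystem under test), the size mix is arbitrary, and the timing includes Python overhead, so treat the numbers as relative rather than absolute.

```python
import os
import tempfile
import time
from typing import Tuple

FRAG = 1024  # assumed UFS fragment size in bytes; not read from the filesystem


def small_file_workload(root: str, nfiles: int, frag: int = FRAG) -> Tuple[int, float]:
    """Create `nfiles` small files under `root`, then append to each one.

    Write sizes cycle through multiples and fractions of the fragment size.
    Returns (bytes written, elapsed seconds).
    """
    sizes = [3 * frag, frag, frag // 2, frag // 4]
    written = 0
    start = time.perf_counter()
    for i in range(nfiles):
        path = os.path.join(root, f"f{i:06d}")
        create_size = sizes[i % len(sizes)]
        with open(path, "wb") as f:          # new small file
            f.write(b"x" * create_size)
        append_size = sizes[(i + 1) % len(sizes)]
        with open(path, "ab") as f:          # append to the existing file
            f.write(b"y" * append_size)
        written += create_size + append_size
    return written, time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:  # point this at the filesystem under test
        nbytes, secs = small_file_workload(d, 1000)
        print(f"{nbytes} bytes in {secs:.3f} s")
```

Without `os.fsync` calls the timings mostly measure the page cache; adding an fsync before each file is closed would push the work down to the disks.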

I did testing for weeks with various disks in various arrays and all of
this was done without any sort of hardware controller other than SCSI or
some fibre controller. The array types were everything from unipacks on
separate controllers to the A5200s on multiple and single fibre port
controllers.

At no point was a RAID5 config ever worth looking at, because there was
no hardware to perform the parity calculations for me.
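What that parity work costs without a hardware controller: a small write into a RAID5 stripe is usually done as a read-modify-write, where the new parity is old parity XOR old data XOR new data, so one logical write becomes four disk I/Os plus the XOR work on the host CPU. A minimal Python sketch of that update (the names are hypothetical), next to the plain mirror write.

```python
from functools import reduce
from typing import Tuple


def xor(*blocks: bytes) -> bytes:
    """Bytewise XOR of any number of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def raid5_small_write(old_data: bytes, old_parity: bytes,
                      new_data: bytes) -> Tuple[bytes, int]:
    """Read-modify-write update of one chunk in a RAID5 stripe.

    Returns the new parity and the disk I/O count: read old data,
    read old parity, write new data, write new parity.
    """
    new_parity = xor(old_parity, old_data, new_data)
    return new_parity, 4


def mirror_write_ios(ways: int) -> int:
    """A mirrored write is just the same block sent to every submirror."""
    return ways


if __name__ == "__main__":
    stripe = [b"\x01\x02", b"\x10\x20", b"\x0f\x0f"]
    parity = xor(*stripe)
    new_parity, ios = raid5_small_write(stripe[1], parity, b"\xff\x00")
    print(new_parity.hex(), ios, mirror_write_ios(2))
```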

However, I have also worked with RAID5 configs with both hardware
controllers and software configs. Some of the configs were using the HP
implementation of the Reed-Solomon algorithm and some were simpler. A
hardware RAID5 controller seems to do well at protecting against a single
or even a double disk failure, but as the number of disks in the array
increases, so too does the probability of a multiple-disk failure. Most
disk arrays are built at one time, and the disks begin their life at that
time. A large number of disks in one place with the same age and workload
will also tend to fail at about the same time. With a large collection of
disks in one place, I have seen one disk failure per week and, in some
weeks, multiple disk failures. The big, expensive disk array technology
most likely concatenates multiple RAID5 arrays together so that a
multi-disk failure does not bring the entire array down.
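The point about large disk counts can be put into numbers with a simple binomial model: if each disk independently fails in a given week with probability p, the chance of two or more failures in that week grows quickly with the number of disks. A Python sketch; p is a made-up value, and the independence assumption is optimistic, since disks of the same age and workload tend to fail in clusters as described above.

```python
from math import comb


def p_at_least(k: int, n: int, p: float) -> float:
    """P(at least k of n disks fail in one window), assuming independent failures."""
    return 1.0 - sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))


if __name__ == "__main__":
    p_week = 0.002  # hypothetical per-disk failure probability per week
    for n in (4, 14, 56, 112):
        print(f"{n:4d} disks: P(>=2 failures in a week) = {p_at_least(2, n, p_week):.5f}")
```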

I personally prefer a striped mirror set for smaller arrays in the absence
of a hardware controller, but larger arrays need a different approach. I
just finished building three disk arrays over the weekend, and I tuned them
for performance with millions of small files. My primary concern was
access time for any single file when multiple processes were competing for
access to other files on a single large file system. I/O contention can
occur beyond the controller and disk-seek layer simply by having many
processes demanding files from all over the array simultaneously. This
leads to high wait times and low throughput, obviously.
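A textbook way to see why wait times climb under contention is the M/M/1 queue, where the mean time a request spends at a device is 1/(mu - lambda) for service rate mu and arrival rate lambda. This is a deliberate simplification (Poisson arrivals, exponential service times, one disk, no caching or request reordering), and the rates below are hypothetical.

```python
def mm1_time_in_system(arrival_rate: float, service_rate: float) -> float:
    """Mean time (queueing + service) per request in an M/M/1 queue, in seconds."""
    if arrival_rate >= service_rate:
        raise ValueError("unstable: requests arrive faster than they are served")
    return 1.0 / (service_rate - arrival_rate)


if __name__ == "__main__":
    service = 100.0  # hypothetical disk doing 100 random I/Os per second
    for load in (10.0, 50.0, 80.0, 90.0, 99.0):
        ms = 1000 * mm1_time_in_system(load, service)
        print(f"{load:5.0f} IO/s offered -> {ms:7.1f} ms per request")
```

In this model, going from 50% to 99% utilization multiplies the response time by fifty (20 ms to 1 s), which is the "high wait times" effect in miniature.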

My experience with larger arrays is limited to bio/life sciences compute
clusters, which require a SAN/NAS config. In these situations I have
employed a combination of Veritas Cluster technology and an SVM config.
The end result was a teraflop cluster that had a reasonable ability to slog
around gigabyte-sized files with redundancy. Then again, that was last
year, and perhaps I could have used something other than a stack of A5200s
and fibre switches. I don't think that the client wanted to go beyond $250K
for the storage backend.

Let me just check in with one of those A5200 units ...

# luxadm display mirror0

                                   SENA
                                 DISK STATUS
SLOT   FRONT DISKS       (Node WWN)          REAR DISKS        (Node WWN)
0      On (O.K.)         2000002037979c97    On (O.K.)         2000002037979d77
1      On (O.K.)         2000002037979d07    Not Installed
2      Not Installed                         Not Installed
3      Not Installed                         On (O.K.)         2000002037979d84
4      On (O.K.)         20000020374f67e0    On (O.K.)         2000002037979d78
5      On (O.K.)         2000002037979d0c    Not Installed
6      On (O.K.)         2000002037979d71    On (O.K.)         2000002037979db9
7      Not Installed                         Not Installed
8      Not Installed                         Not Installed
9      Not Installed                         Not Installed
10     On (O.K.)         2000002037979c5c    On (O.K.)         2000002037979d83

                                SUBSYSTEM STATUS
FW Revision:1.09   Box ID:0   Node WWN:508002000006a2e8   Enclosure Name:mirror0
Power Supplies (0,2 in front, 1 in rear)
        0 O.K.(rev.-02) 1 O.K.(rev.-02) 2 O.K.(rev.-02)
Fans (0 in front, 1 in rear)
        0 O.K.(rev.-04) 1 O.K.(rev.-00)
ESI Interface board(IB) (A top, B bottom)
        A: O.K.(rev.-04)
                GBIC module (1 on left, 0 on right in IB)
                0 O.K.(mod.-05)
                1 Not Installed
        B: O.K.(rev.-04)
                GBIC module (1 on left, 0 on right in IB)
                0 O.K.(mod.-05)
                1 Not Installed

Disk backplane (0 in front, 1 in rear)
        Front Backplane: O.K.(rev.-03)
          Temperature sensors (on front backplane)
          0:36:C 1:34:C 2:34:C 3:34:C 4:34:C 5:36:C
          6:36:C 7:34:C 8:34:C 9:34:C 10:34:C (All temperatures are NORMAL.)
        Rear Backplane: O.K.(rev.-03)
          Temperature sensors (on rear backplane)
          0:36:C 1:34:C 2:34:C 3:37:C 4:36:C 5:36:C
          6:36:C 7:34:C 8:34:C 9:34:C 10:36:C (All temperatures are NORMAL.)
Interconnect assembly
        O.K.(rev.-02)
Loop configuration
        Loop A is configured as a single loop.
        Loop B is configured as a single loop.
Language USA English
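For keeping an eye on a shelf of these units, the slot table in the `luxadm display` output lends itself to a small parser. A Python sketch that assumes exactly the layout shown above (a slot number followed by a front cell and a rear cell, each either "On (status) WWN" or "Not Installed", possibly wrapped across lines); luxadm output may differ between releases and firmware, and status strings other than O.K. have not been checked against real output.

```python
import re
from typing import List, Optional, Tuple

# One disk cell: "On (<status>) <16-hex-digit WWN>" or "Not Installed".
CELL = r"(?:On \(([^)]*)\)\s+([0-9a-f]{16})|Not Installed)"
# A slot row: slot number, front cell, rear cell (whitespace may include newlines).
SLOT_RE = re.compile(r"^\s*(\d{1,2})\s+" + CELL + r"\s+" + CELL, re.MULTILINE)

Disk = Optional[Tuple[str, str]]  # (status, node WWN), or None if not installed


def parse_slots(text: str) -> List[Tuple[int, Disk, Disk]]:
    """Return (slot, front disk, rear disk) for every slot row found."""
    rows = []
    for m in SLOT_RE.finditer(text):
        front = (m.group(2), m.group(3)) if m.group(3) else None
        rear = (m.group(4), m.group(5)) if m.group(5) else None
        rows.append((int(m.group(1)), front, rear))
    return rows


def summarize(text: str) -> Tuple[int, List[Tuple[int, str, str]]]:
    """Count installed disks and list any whose status is not O.K."""
    installed, problems = 0, []
    for slot, front, rear in parse_slots(text):
        for side, disk in (("front", front), ("rear", rear)):
            if disk is None:
                continue
            installed += 1
            if disk[0] != "O.K.":
                problems.append((slot, side, disk[0]))
    return installed, problems


if __name__ == "__main__":
    # First few rows of the output above, in their original wrapped form.
    sample = (
        "0 On (O.K.) 2000002037979c97 On (O.K.)\n2000002037979d77\n"
        "1 On (O.K.) 2000002037979d07 Not Installed\n"
        "2 Not Installed Not Installed\n"
        "3 Not Installed On (O.K.)\n2000002037979d84\n"
    )
    print(summarize(sample))
```

On a live system the text would come from capturing `luxadm display` for the enclosure (for example with `subprocess.run`) instead of the embedded sample.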

I think that the A5200 is an excellent low-cost choice for an array and
that RAID5 would be a very bad idea with this array.

Perhaps the larger array technology uses other methods to protect itself
from a multiple disk failure in a RAID5 config, but I don't know how. Most
likely Hitachi is not saying either. My big concern is the mean time to
repair the array when a disk is replaced. If that time is long, the repair
cycle could trigger a multi-disk failure.
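The repair-time worry can be sketched with the usual exponential-lifetime assumption: if each surviving disk has a mean time between failures MTBF and failures are independent, the chance that at least one of the n-1 survivors fails during a rebuild of T hours is 1 - exp(-(n-1)T/MTBF). The MTBF and rebuild times below are hypothetical, and same-age disks make the real risk higher than this model suggests.

```python
from math import exp


def p_failure_during_rebuild(n_disks: int, rebuild_hours: float, mtbf_hours: float) -> float:
    """P(at least one surviving disk fails before the rebuild completes)."""
    survivors = n_disks - 1
    return 1.0 - exp(-survivors * rebuild_hours / mtbf_hours)


if __name__ == "__main__":
    mtbf = 500_000.0  # hypothetical per-disk MTBF in hours
    for hours in (2, 12, 48):
        p = p_failure_during_rebuild(22, hours, mtbf)  # 22 slots, as in the A5200 above
        print(f"rebuild of {hours:2d} h on a 22-disk set: {p:.5f}")
```

The probability scales almost linearly with rebuild time at these small values, so a long mean time to repair directly widens the window for a second failure.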

Dennis


