Re: ssa high disk % busy, low i/o rate



jstonacek wrote:

First off, this is AIX 5.1 with internal SCSI drives (no SAN). The volume groups contain raw logical volumes for an Oracle database; there are no filesystems on the LVs in question.

I'm seeing one particular hdisk go nearly 100% busy throughout the day, but topas and nmon show
very little I/O activity on this drive. Throughput for the drive is in the 10 KB/sec range while
it is being reported as 90-100% busy. User response times go up whenever this hdisk goes busy.

I don't see any drive errors in errpt.

Has anybody seen this type of behavior before? Is there any place besides errpt where I should be looking for drive errors?

Thanks


A high percent-busy with a low I/O rate could be several things:
- Lots of small random I/Os all over the disk
- Poor intra-disk allocation or high fragmentation, causing lots of seeks
- A shortage of I/O buffers, leading to long waits for reuse
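
As a rough sanity check on which of these is plausible, the utilization law (busy fraction = I/O
rate x average time per I/O) can be applied to the numbers topas reports. The sketch below is only
illustrative: the 4 KB transfer size and the helper name are assumptions, not figures from this
thread.

```python
def implied_service_time_ms(busy_pct, ios_per_sec):
    """Average time the disk is kept busy per I/O, from U = X * S."""
    if ios_per_sec <= 0:
        raise ValueError("ios_per_sec must be positive")
    utilization = busy_pct / 100.0
    return utilization / ios_per_sec * 1000.0


# Hypothetical figures: 10 KB/sec made up of 4 KB transfers is 2.5 I/Os per second.
kb_per_sec = 10.0
io_size_kb = 4.0  # assumed transfer size
tps = kb_per_sec / io_size_kb

print(round(implied_service_time_ms(95.0, tps)), "ms per I/O")
```

The result is how long each I/O keeps the disk busy on average, which can then be compared with
the seek times that filemon reports.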

Assuming you have root authority, you may want to use the filemon utility.

Start it while the I/O activity is high (for example, filemon -o fmon.out -O all), let it trace
for a short time, then stop it with trcstop so the report gets written. Then look at the reports
for the most active volumes (LV, FS & hdisks). You will be able to see the mix of reads and
writes, and also the seek times and distances. This should help you determine what the disk is
actually doing, and where to focus your efforts.

Also run vmstat -v and check for shortages of pbufs (for raw LV access), and maybe also fsbufs if
you have active filesystems on this volume. Note that these values are cumulative, so you need
to take two snapshots some time apart and compute the deltas. If you see those counters
increasing, use the ioo command to check/change the configuration.
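
The delta calculation can be sketched in a few lines of Python. This is a minimal sketch, assuming
the two vmstat -v snapshots have been saved as text; the sample lines and their counts are made up
for illustration and only mimic the shape of the "blocked with no ...buf" lines, and the function
names are hypothetical.

```python
import re

# Matches lines such as "     2228 pending disk I/Os blocked with no pbuf"
_COUNTER_LINE = re.compile(r"^\s*(\d+)\s+(.*\S)\s*$")


def parse_blocked_counters(vmstat_v_text):
    """Return {label: count} for the 'blocked with no ...' lines only."""
    counters = {}
    for line in vmstat_v_text.splitlines():
        match = _COUNTER_LINE.match(line)
        if match and "blocked with no" in match.group(2):
            counters[match.group(2)] = int(match.group(1))
    return counters


def counter_deltas(before_text, after_text):
    """Subtract the first snapshot from the second, label by label."""
    before = parse_blocked_counters(before_text)
    after = parse_blocked_counters(after_text)
    return {label: after[label] - before.get(label, 0) for label in after}


# Illustrative snapshots (invented numbers), taken some minutes apart.
before = """\
      2228 pending disk I/Os blocked with no pbuf
         0 paging space I/Os blocked with no psbuf
       120 filesystem I/Os blocked with no fsbuf
"""
after = """\
      2975 pending disk I/Os blocked with no pbuf
         0 paging space I/Os blocked with no psbuf
       120 filesystem I/Os blocked with no fsbuf
"""

for label, delta in counter_deltas(before, after).items():
    if delta > 0:
        print(f"{delta:>8} more {label}")
```

Only counters that grew between the two snapshots are printed; a counter that is large but static
reflects past activity rather than the current problem.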

Lastly, as this is a SCSI disk, you may want to adjust the command queue depth (check the current
queue_depth value with lsattr -El hdiskX). Many systems still have it set to the default (3),
which is too small to really allow the drive to optimize the various I/O requests. If you have
sar set up, sar -d will give some useful info about disk operations and outstanding queued requests.
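
If the lsattr output is being collected from several disks, the queue depth can be picked out with
a small parser. This is a sketch only: the sample text below is invented to resemble the usual
attribute / value / description / user-settable columns, and the helper name is hypothetical.

```python
def attribute_value(lsattr_text, name):
    """Return the value column for the named attribute, or None if absent."""
    for line in lsattr_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == name:
            return fields[1]
    return None


# Illustrative text shaped like "lsattr -El hdiskX" output (values are made up).
sample = """\
pvid          0001234a5b6c7d8e0000000000000000 Physical volume identifier False
queue_depth   3                                Queue DEPTH                True
size_in_mb    36400                            Size in Megabytes          False
"""

depth = int(attribute_value(sample, "queue_depth"))
if depth <= 3:
    print(f"queue_depth is {depth}: still at the small default")
```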

There is a good section in the Performance Management Guide on Monitoring Disk I/O that goes into
more detail:
http://publib.boulder.ibm.com/infocenter/pseries/v5r3/index.jsp?topic=/com.ibm.aix.doc/aixbman/prftungd/sysperfmon2.htm



Eric