SUMMARY: Disk contention

From: DAUBIGNE Sebastien - BOR (SDaubigne_at_bordeaux-bersol.sema.slb.com)
Date: 08/01/03

    Date: Fri, 01 Aug 2003 12:10:04 +0200
    To: sunmanagers@sunmanagers.org
    
    

    Kevin, Michael, Karl, Mike, Steve: thank you for your answers.

    Original question :
    What are good iostat thresholds for detecting disk bottlenecks?

    Answer :

    There was a general consensus on the 30 ms service time threshold, but
    I'm still convinced that it is not meaningful for big blocks (say 1 MB per
    I/O).  A simple sequential read test (dd with a 1 MB block size) shows that
    svc_t can exceed 30 ms with %b=99% even for a single reading process (you
    will agree that one single sequential-read process can hardly be suspected
    of causing a disk bottleneck).
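To make the block-size argument concrete, here is a rough sketch of the arithmetic. The 35 MB/s sustained transfer rate is purely an assumed figure for illustration (it was not measured on the box in question), and seek and rotational latency are ignored:

```python
# Rough estimate of per-I/O service time from transfer time alone.
# ASSUMED_RATE is a hypothetical sustained sequential rate, not a measurement.
KB = 1024
MB = 1024 * KB
ASSUMED_RATE = 35 * MB  # bytes per second (assumption)


def transfer_time_ms(io_bytes, bytes_per_second):
    """Milliseconds needed just to move io_bytes at the given rate."""
    return io_bytes / bytes_per_second * 1000.0


big = transfer_time_ms(1 * MB, ASSUMED_RATE)    # one 1 MB read
small = transfer_time_ms(8 * KB, ASSUMED_RATE)  # one 8 KB read
print("1 MB I/O: %.1f ms, 8 KB I/O: %.2f ms" % (big, small))
```

Under that assumption a single 1 MB request already takes close to 30 ms of pure transfer time, while an 8 KB request takes a fraction of a millisecond, so the same svc_t value means very different things depending on the I/O size.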

    As stated by Karl :

    * The significant bottleneck threshold is %b (percent time disk busy)
    > 20% AND (20 ms < svc_t (ServiceTime) < 30 ms)

    * The critical bottleneck threshold is %b (percent time disk busy)
    > 20% AND ( svc_t (ServiceTime) > 30 ms)
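The two rules above can be written down as a small function. This is only a sketch: the function name and the "ok" label are mine, and a svc_t of exactly 30 ms (which the rules leave undefined) is treated here as "significant":

```python
def classify_disk(busy_pct, svc_t_ms):
    """Apply the %b / svc_t rules of thumb quoted above to one disk."""
    if busy_pct > 20:
        if svc_t_ms > 30:
            return "critical"
        if svc_t_ms > 20:
            # 20 ms < svc_t <= 30 ms; exactly 30 ms lands here by choice.
            return "significant"
    return "ok"


print(classify_disk(99, 35))  # busy disk, slow service
print(classify_disk(11, 7.0)) # lightly loaded disk
```

Note that a disk with a high svc_t but %b at or below 20% is not flagged, since both conditions must hold.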

    Karl also gave a lot of other tuning advice, which I have reproduced at the
    end of this message.

    ---
    Sebastien DAUBIGNE 
    sdaubigne@bordeaux-bersol.sema.slb.com
    <mailto:sdaubigne@bordeaux-bersol.sema.slb.com>  - (+33)5.57.26.56.36
    SchlumbergerSema - SGS/DWH/Pessac
    	-----Original Message-----
    	From:	Karl Vogel 
    	Subject:	Re: Disk contention
    	>> On Tue, 22 Jul 2003 18:49:22 +0200, 
    	>> "DAUBIGNE Sebastien - BOR" said:
    	S> We have a Solaris 2.6/Oracle box which has poor throughput and a high
    	S> (from 50 to 100) number of IO busy processes (column "b" of vmstat).
    	S> CPU (50%)/memory (no paging) are OK, so I assume the poor throughput is
    	S> due to the disk part.
    	   Maybe.  I've included some other things to look at below.
    	   First, *strongly* consider upgrading to Solaris-8.  Lots of throughput
    	   improvements, different memory management scheme.
    	   We have an Enterprise E450 with 1 GB of memory as our main system.  Tuning
    	   took a while because the information is spread out all over the planet,
    	   but it runs pretty well now.  Our /etc/system is below.
    	   Your directory/inode cache (measured by the dnlc script below) should
    	   have a hit rate of at least 90-95%.
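As a cross-check that does not need adb, the hit rate can also be read from the "total name lookups" line of vmstat -s, which is quoted in the ncsize notes of the /etc/system file further down. A minimal sketch, assuming the line has the format shown there (the function name is hypothetical):

```python
import re


def dnlc_hit_rate(vmstat_s_output):
    """Return the DNLC hit rate (percent) from `vmstat -s` text, or None."""
    match = re.search(r"total name lookups \(cache hits (\d+)%\)",
                      vmstat_s_output)
    return int(match.group(1)) if match else None


# Sample line copied from the ncsize notes in this message.
sample = "1743348604 total name lookups (cache hits 95%) 32512 toolong"
rate = dnlc_hit_rate(sample)
if rate is not None and rate < 90:
    print("DNLC hit rate %d%% needs attention" % rate)
else:
    print("DNLC hit rate:", rate)
```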
    	   Add "noatime,logging" to the mount options field in /etc/vfstab to get
    	   the biggest performance and boot time improvement.  You might have to
    	   put in a patch to have logging capabilities under Solaris 2.6; this is
    	   probably the single biggest improvement you can make.
    	S> Also, what is a good interval for iostat samples: 30 sec?  5 min?
    	   I've read that 30 seconds is as low as you should go, because kernel
    	   counters aren't updated more often.
    	-- 
    	Karl Vogel                      I don't speak for the USAF or my company
    	vogelke at pobox dot com        http://www.pobox.com/~vogelke
    	If a nation expects to be ignorant and free in a state of civilization,
    	it expects what never was and never will be.            --Thomas Jefferson
    	
    ===========================================================================
    	#!/bin/sh
    	# dnlc: print directory/inode cache
    	PATH=/bin:/usr/bin
    	export PATH
    	cmd='
    	   BEGIN  { fmt = "%-13s %9d %s\n" }
    	   /:.../ { s = substr ($0, 30); printf fmt, $1, $2, s }
    	   /perc/ { s = substr ($0, 30); printf fmt, " ", $1, s }
    	'
    	echo 'Directory/inode cache statistics'
    	echo '(See /usr/include/sys/dnlc.h for more information)'
    	echo
    	adb -k /dev/ksyms /dev/mem <<END | expand | awk "$cmd"
    	maxphys/D"Max physical request"
    	ufs_ninode/D"Inode cache size"
    	sq_max_size/D"Streams queue"
    	ncsize/D"Directory name cache size"
    	ncstats/D"# of cache hits that we used"
    	+/D"# of misses"
    	+/D"# of enters done"
    	+/D"# of enters tried when already cached"
    	+/D"# of long names tried to enter"
    	+/D"# of long name tried to look up"
    	+/D"# of times LRU list was empty"
    	+/D"# of purges of cache"
    	*ncstats%1000>a
    	*(ncstats+4)%1000>b
    	*(ncstats+14)%1000>c
    	<a+<b+<c>n
    	<a*0t100%<n=D"Hit rate percentage"
    	END
    	exit 0
    	
    ===========================================================================
    	#!/bin/sh
    	# getkern: show predefined kernel tunables
    	kernelvars () {
    	adb -k /dev/ksyms /dev/mem << EFF | \
    	  awk '/^[a-zA-Z_-]+:/ { \
    	        if (!i) { i++; next } \
    	        if ($2 >= 0) { printf "%-20s %s\n",$1,$2; }
    	        next } \
    	       /^[a-z_-]+[ \t0-9a-f]+$/ { next } \
    	        { print }'
    	autoup/D
    	bufhwm/D
    	coredefault/D
    	desfree/E
    	fastscan/E
    	lotsfree/E
    	max_nprocs/D
    	maxpgio/E
    	maxphys/D
    	maxuprc/D
    	maxusers/D
    	minfree/E
    	nbuf/D
    	ncsize/D
    	nrnode/D
    	physmem/E
    	rlim_fd_cur/D
    	rlim_fd_max/D
    	slowscan/E
    	sq_max_size/D
    	swapfs_minfree/E
    	tune_t_fsflushr/D
    	tune_t_gpgslo/D
    	ufs_HW/D
    	ufs_LW/D
    	ufs_ninode/D
    	ufs_throttles/D
    	EFF
    	}
    	kernelvars
    	exit 0
    	
    ===========================================================================
    	* $Id: etc-system,v 1.3 2001/07/26 20:39:55 vogelke Exp $
    	* $Source: /space/sitelog/newmis/RCS/etc-system,v $
    	*
    	* NAME:
    	*    /etc/system
    	*
    	* SYNOPSIS:
    	*    Tailors kernel variables at boot time.
    	* 
    	* DESCRIPTION:
    	*    The most frequent changes are limited to the number of file
    	*    descriptors, because the socket API uses file descriptors for
    	*    handling internet connectivity.  You may want to look at the hard
    	*    limit of filehandles available to you.  Proxies like Squid have to
    	*    count twice to thrice for each request: open request descriptors
    	*    and an open file and/or (depending what squid you are using) an
    	*    open forwarding request descriptor.  Similar calculations are true
    	*    for other caches.
    	*
    	* WARNING:
    	*    SUN does not make any guarantees for the correct working
    	*    of your system if you use more file descriptors than 4096.
    	*    Programs like fvwm (window manager) may have to be recompiled.
    	*
    	*    If you experience SEGV core dumps from your select(3c) system call
    	*    after increasing your file descriptors above 4096, you have to
    	*    recompile the affected programs.  The select(3c) call is known to
    	*    Squid users for its bad temper concerning the maximum number of
    	*    file descriptors.
    	* -----------------------------------------------------------------------
    	* rlim_fd_cur 
    	* Since 8: default 256, no recommendations
    	* 
    	*   This parameter defines the soft limit of open files you can
    	*   have.  Use values above 256 at your own risk, especially if you
    	*   are running old binaries.  A value of 4096 may look harmless
    	*   enough, but may still break old binaries.
    	*
    	*   Another source mentions that using more than 8192 file
    	*   descriptors is discouraged.  It mentions that you ought to use
    	*   more processes, if you need more than 4096 file descriptors.
    	*   On the other hand, an ISP of my acquaintance is using 16384
    	*   descriptors to his satisfaction.
    	*
    	*   The predicate rlim_fd_cur <= rlim_fd_max must be fulfilled.
    	*
    	*   Please note that Squid only cares about the hard limit (next
    	*   item).  With respect to the standard IO library, you should not
    	*   raise the soft limit above 256.  Stdio can only use <= 256 FDs.
    	*   You can either use AT&T's sfio library, or use Solaris 64-bit mode
    	*   applications which fix the stdio weakness.
    	*
    	*   Also note that RPC prior to Solaris 2.6 may break, if more than
    	*   1024 FDs are available to it.  Also, setting the soft limit to or
    	*   above 1024 implies that your license server queries break (first
    	*   hand experience).  Using 256 is really a strong recommendation.
    	set rlim_fd_cur = 256
    	* -----------------------------------------------------------------------
    	* rlim_fd_max 
    	* default 1024, recommended >=4096
    	* 
    	*   This parameter defines the hard limit of open files you can have.
    	*   For a Squid and most other servers, regardless of TCP or UDP, the
    	*   number of open file descriptors per user process is among the
    	*   most important parameters.  The number of file descriptors is one
    	*   limit on the number of connections you can have in parallel.
    	*
    	*   You should consider a value of at least 2 * tcp_conn_req_max
    	*   and you should provide at least 2 * rlim_fd_cur.  The predicate
    	*   rlim_fd_cur <= rlim_fd_max must be fulfilled.
    	*
    	*   Use at your own risk values above 1024.  SUN does not make any
    	*   warranty for the workability of your system, if you increase this
    	*   above 1024.
    	set rlim_fd_max = 1024
    	* -----------------------------------------------------------------------
    	* ufs_ninode
    	* default 4323 = 17*maxusers+90 (with maxusers 249)
    	* 
    	*   Specifies the size of an inode table.  The actual value will be
    	*   determined by the value of maxusers.  A memory-resident inode is
    	*   used whenever an operation is performed on an entity in the file
    	*   system (e.g.  files, directories, FIFOs, devices, Unix sockets,
    	*   etc.).  The inode read from disk is cached in case it is needed
    	*   again.  ufs_ninode is the size at which the Unix file system attempts
    	*   to keep the list of idle inodes.  As active inodes become idle, if
    	*   the number of idle inodes increases above the limit of the cache,
    	*   the memory is reclaimed by tossing out idle inodes.
    	*
    	*   Must be equal to ncsize.
    	set maxusers = 2048
    	set ufs_ninode = 512000
    	* -----------------------------------------------------------------------
    	* ncsize 
    	* default 4323 = 17*maxusers+90 (with maxusers 249)
    	* 
    	*   Specifies the size of the directory name lookup cache (DNLC).
    	*   The DNLC caches recently accessed directory names and their
    	*   associated vnodes.  Since UFS directory entries are stored in
    	*   a linear fashion on the disk, locating a file name requires
    	*   searching the complete directory for each entry.  Also, adding
    	*   or creating a file needs to ensure the uniqueness of a name for
    	*   the directory, also needing to search the complete directory.
    	*   Therefore, entire directories are cached in memory.  For instance,
    	*   a large directory name lookup cache size significantly helps NFS
    	*   servers that have a lot of clients.  On other systems the default
    	*   is adequate.  The default value is determined by maxusers.
    	* 
    	*   Every entry in the directory name lookup cache (DNLC) points
    	*   to an entry in the inode cache, so both caches should be sized
    	*   together.  The inode cache should be at least as big as the DNLC
    	*   cache.  For best performance, it should be the same size in the
    	*   Solaris 2.4 through Solaris 8 operating environments.
    	*
    	*   Warning: Do not set ufs_ninode less than ncsize.  The ufs_ninode
    	*   parameter limits the number of inactive inodes, rather than the
    	*   total number of active and inactive inodes.  With the Solaris
    	*   2.5.1 to Solaris 8 software environments, ufs_ninode is
    	*   automatically adjusted to be at least ncsize.  Tune ncsize to get
    	*   the hit rate up and let the system pick the default ufs_ninode.
    	*
    	*   I have heard from a few people who increase ncsize to 30000 when
    	*   using the Squid webcache.  Imagine, a Squid uses 16 toplevel
    	*   directories and 256 second level directories.  Thus you'd need
    	*   over 4096 entries just for the directories.  It looks as if
    	*   webcaches and news servers which store data in files generated from
    	*   a hash need to increase this value for efficient access.
    	*
    	*   You can check the performance of your DNLC - its hit rate - with
    	*   the help of the vmstat -s command.  Please note that Solaris 7
    	*   re-implemented the algorithm, and thus doesn't have the toolong
    	*   entry any more:
    	* 
    	*     $ vmstat -s ...
    	*     1743348604 total name lookups (cache hits 95%) 32512 toolong
    	* 
    	*   Up to Solaris 7, only names less than 30 characters are cached.
    	*   Also, names too long to be cached are reported.  A cache miss
    	*   means that a disk I/O may be needed to read the directory (though
    	*   it might still be in the kernel buffer cache) when traversing the
    	*   path name components to get to a file.  A hit rate of less than 90
    	*   percent requires attention.
    	*
    	*   For an E450 with maxusers = 2048, ~800,000 files:
    	*     default ncsize = 128512 which gives about 90% hit rate.
    	*     setting ncsize = 262144 gives about 94% hit rate.
    	set ncsize = 512000
    	* -----------------------------------------------------------------------
    	* tcp_conn_hash_size
    	* default 512
    	*
    	*   This can be set to help address connection backlog.  During high
    	*   connection rates, TCP data structure kernel lookups can be expensive
    	*   and can slow down the server.  Increasing the size of the hash
    	*   table improves lookup efficiency.  This is the kernel hash table
    	*   size for managing active TCP connections.  A larger value makes
    	*   searches far more efficient when there are many open connections.
    	*   On Solaris, this value is a power of two and can be set as small
    	*   as 256 (default) or as large as 262144 as is typically used in
    	*   benchmarks.  A larger tcp_conn_hash_size requires more memory,
    	*   but it is clearly worth the extra investment if many concurrent
    	*   connections are expected.  This parameter must be a power of 2,
    	*   and can be set in the /etc/system kernel configuration file.  The
    	*   current size is shown at the start of the read-only tcp_conn_hash
    	*   display using ndd.
    	set tcp:tcp_conn_hash_size = 32768
    	* -----------------------------------------------------------------------
    	* noexec_user_stack 
    	*   Since 2.6: default 0, recommended: see CERT CA-98.06, or DE-CERT.
    	*   Limited to sun4[mud] platforms! Warning: This option might crash
    	*   some of your application software, and endanger your system's
    	*   stability!
    	*
    	*   By default, the Solaris 32 bit application stack memory areas are
    	*   set with permissions to read, write and execute, as specified in
    	*   the SPARC and Intel ABI.  Though many hacks prefer to modify the
    	*   program counter saved during a subroutine call, a program snippet
    	*   in the stack area can be used to gain root access to a system.
    	*
    	*   If the variable is set to a non-zero value, the stack defaults to
    	*   read and write, but not executable permissions.  Most programs,
    	*   but not all, will function correctly, if the default stack
    	*   permissions exclude executable rights.  Attempts to execute code
    	*   on the stack will kill the process with a SIGSEGV signal and log
    	*   a message in kern:notice.  Programs which rely on an executable
    	*   stack must use the mprotect(2) function to explicitly mark
    	*   executable memory areas.
    	*
    	*   Refer to the System Administration Guide for more information on
    	*   this topic.  Admins who don't want the report about executable
    	*   stack can set the noexec_user_stack_log variable explicitly to
    	*   0.  Also note that the 64 bit V9 ABI defaults to stacks without
    	*   execute permissions.
    	* set noexec_user_stack = 1
    	*   Log attempted stack exploits.
    	* set noexec_user_stack_log = 1
    	* -----------------------------------------------------------------------
    	* Swap
    	*   System keeps 128 Mbytes (1/8th of memory) for swap.
    	*   Reduce that to 32 Mbytes (4096 8K pages).
    	set swapfs_minfree=4096
    	* -----------------------------------------------------------------------
    	* Network
    	*   Set to 100 Mbps.
    	set hme:hme_adv_autoneg_cap = 0
    	set hme:hme_adv_100T4_cap   = 0
    	set hme:hme_adv_100fdx_cap  = 1
    	set hme:hme_adv_100hdx_cap  = 1
    	set hme:hme_adv_10fdx_cap   = 0
    	set hme:hme_adv_10hdx_cap   = 0
    	* -----------------------------------------------------------------------
    	* Memory management
    	*
    	*   http://www.carumba.com/talk/random/tuning-solaris-checkpoint.txt
    	*   Tuning Solaris for FireWall-1
    	*   Rob Thomas robt@cymru.com
    	*   14 Aug 2000
    	*
    	*   On firewalls, it is not at all uncommon to have quite a bit of
    	*   physical memory.  However, as the amount of physical memory is
    	*   increased, the amount of time the kernel spends managing that
    	*   memory also increases.  During periods of high load, this may
    	*   decrease throughput.
    	*
    	*   To decrease the amount of memory fsflush scans during any scan
    	*   interval, we must modify the kernel variable autoup.  The default
    	*   is 30.  For firewalls with 128MB of RAM or more, increase this
    	*   value.  The end result is less time spent managing buffers,
    	*   and more time spent servicing packets.
    	set autoup = 120
    	* -----------------------------------------------------------------------
    	*   http://www.sunperf.com/perfmontools.html
    	*   
    	*   NETSTAT
    	*     One key indicator is nocanput being non-zero.
    	*   
    	*       root# netstat -k hme0
    	*       hme0:
    	*       ipackets 228637416 ierrors 0 opackets 269844650 oerrors 0
    	*       collisions 0 defer 0 framing 0 crc 0 sqe 0 code_violations 0
    	*       len_errors 0 ifspeed 100000000 buff 0 oflo 0 uflo 0 missed 0
    	*       tx_late_collisions 0 retry_error 0 first_collisions 0
    	*       nocarrier 0 nocanput 62 allocbfail 0 runt 0 jabber 0 babble 0
    	*       tmd_error 0 tx_late_error 0 
    	*       ...
    	*   
    	*     If this is the case, your streams queue is too small.  It should
    	*     be set to 400 per GB of memory.  Put a similar line in your
    	*     /etc/system file.  This assumes you have 4GB RAM.
    	*     
    	*       set sq_max_size=1600
    	set sq_max_size = 400
    	* -----------------------------------------------------------------------
    	*   http://www.london-below.net/~adrianc/2002/cookbook.html
    	*   Recipe bufhwm: Large Active Filesystem (>>TB)
    	*   Tell tale sign: small hit rate in the buffer cache
    	*   Fix: increase bufhwm
    	*   Drawback: may consume memory for little benefit
    	*   Created: July 19 2001
    	*
    	*     Tune the default bufhwm value if you have a small hit ratio on
    	*     the buffer cache during periods of high activity:
    	*
    	*       "sar -b 1 10" shows %rcache or %wcache < 90%
    	*
    	*     At most bufhwm KB of kernel memory is used to cache metadata
    	*     information (e.g.  block indirection data).  bufhwm defaults
    	*     to 2% of system memory; it cannot be more than 20%.  The bufhwm
    	*     configured on your system can be obtained with
    	*       /usr/sbin/sysdef | grep bufhwm
    	*
    	*     The requirements for bufhwm should be:
    	*       'Sum Total of Active Filesystem Size' / 2M.
    	*
    	*     For a 100GB filesystem then configure 50MB of "bufhwm" kernel
    	*     memory and set bufhwm = 50000 (in units of K).  Our current
    	*     setting is about 20 MB:
    	*
    	*       me% /usr/sbin/sysdef | grep bufhwm
    	*       20725760   maximum memory allowed in buffer cache (bufhwm)
    	*
    	*     We're using 86 GB out of about 203 total, so use 50 MB.  Overall
    	*     hits/lookups are around 98% according to netstat -k:
    	*       biostats:
    	*       buffer_cache_lookups 127637848 buffer_cache_hits 125365885
    	*       new_buffer_requests 0 waits_for_buffer_allocs 0
    	*       buffers_locked_by_someone 6131 duplicate_buffers_found 53 
    	set bufhwm = 50000
    	* -----------------------------------------------------------------------
    	*   http://www.london-below.net/~adrianc/2002/cookbook.html
    	*   Recipe segmap_percent: Dedicated I/O server on large Dataset
    	*   Tell tale sign: small segmap cache hit rate
    	*   Fix: increase segmap_percent
    	*
    	*     Only a portion of memory is readily mapped in the kernel in
    	*     "segmap" to be the target of an actual I/O.  For a read or
    	*     write call, being or not in segmap can cause a performance
    	*     difference of approximately 20%.  Solaris 8 introduced a new
    	*     kernel parameter called segmap_percent that controls the size of
    	*     segmap.  The segmap is sized to be a portion of free memory after
    	*     boot; C17 uses default value of 12%.
    	*
    	*     On a dedicated I/O server it may be beneficial to increase this
    	*     value.  This actually consumes little additional memory for
    	*     segmap structures (< 1%) but it should be noted that the segmap
    	*     portion of the filesystem cache is not considered free memory.
    	*
    	*   WARNING: setting too high can result in paging storm.
    	* set segmap_percent = 20
    	* -----------------------------------------------------------------------
    	*   http://www.london-below.net/~adrianc/2002/cookbook.html
    	*   Recipe ufs_HW: GBs of data written to a file
    	*   Tell tale Sign: ufs_throttles keeps increasing
    	*   Fix: increase ufs_HW
    	*   Created: July 19 2001
    	*
    	*     UFS keeps track for each file of the number of bytes of data being
    	*     written to disk.  Those are bytes in transit between the page
    	*     cache and the disks.  When this amount exceeds the threshold
    	*     ufs_HW then subsequent write(2) calls will be blocked until enough
    	*     of the I/O operations complete.
    	*
    	*     We can set ufs_HW/ufs_LW parameters to values that should limit
    	*     the adverse condition:
    	*
    	*         ufs_HW should be set to many times maxphys
    	*         ufs_LW should be 2/3 of ufs_HW
    	*
    	*     When throttling happens, a process is blocked for a time of the
    	*     order of a physical write, say 0.01s.  This means that a process
    	*     can achieve on the order of ufs_HW/0.01s or 100*ufs_HW Bytes/s.
    	*     The default of 384K throttles a process around 38MB/sec.
    	*
    	*     Our ufs_HW is the default (384K); doubling it slowed down
    	*     throttling but didn't eliminate it.
    	set ufs:ufs_HW = 4194304
    	set ufs:ufs_LW = 2796202
    	* -----------------------------------------------------------------------
    	*   http://www.samag.com/documents/sam0213b/
    	*   Solaris 8 Performance Tuning
    	*   maxphys
    	*
    	*   The maxphys setting, often seen in conjunction with JNI and Emulex
    	*   HBAs, is the upper limit on the largest chunk of data that can be sent
    	*   down the SCSI path for any single request.  There are no real issues
    	*   with increasing the value of this variable to 8 MB (in /etc/system,
    	*   set maxphys=8388608), as long as your IO subsystem can handle it.
    	*   All current Fibre Channel adapters are capable of supporting this,
    	*   as are most ultra/wide SCSI HBAs, such as those from Sun, Adaptec,
    	*   QLogic, and Tekram.
    	*
    	*   Try 1 MB for now.
    	set maxphys = 1048576
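Several of the values in the /etc/system file above follow from simple arithmetic given in its comments. The sketch below just reproduces that arithmetic; the function names are mine, the 8K page size is the one stated in the Swap comment, and the 100 writes per second comes from the 0.01s physical write time assumed in the ufs_HW recipe:

```python
PAGE_SIZE = 8 * 1024  # bytes per page, as stated in the Swap comment


def ufs_lw(ufs_hw):
    """ufs_LW should be 2/3 of ufs_HW (rounded down)."""
    return ufs_hw * 2 // 3


def throttle_bytes_per_s(ufs_hw, writes_per_s=100):
    """Per-process write ceiling: ufs_HW per ~0.01s physical write."""
    return ufs_hw * writes_per_s


def sq_max_size(ram_gb):
    """Streams queue size: 400 per GB of memory."""
    return 400 * ram_gb


def swapfs_minfree_pages(reserve_mb):
    """Convert a swap reserve in MB into a page count."""
    return reserve_mb * 1024 * 1024 // PAGE_SIZE


print("ufs_LW for ufs_HW=4194304:", ufs_lw(4194304))
print("throttle at default 384K :", throttle_bytes_per_s(384 * 1024), "bytes/s")
print("sq_max_size for 1 GB     :", sq_max_size(1))
print("swapfs_minfree for 32 MB :", swapfs_minfree_pages(32), "pages")
```

These reproduce the settings used in the file (ufs_LW = 2796202, sq_max_size = 400 for the 1 GB E450, swapfs_minfree = 4096) and the "around 38MB/sec" figure for the default ufs_HW.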
    	
    ===========================================================================
    	Notes from a Lotus Domino site running on Solaris
    	Disk bottlenecks are the most likely bottlenecks.  Here are the
    	thresholds you should look for using the different monitoring tools.
    	VMSTAT
    	  vmstat is one of the simplest and most useful tools because it
    	  reports important data in the categories of CPU, memory utilization,
    	  and disk-I/O.  To see the system activity for 3 seconds with a 1 second
    	  reporting interval use:
    	  
    	    vmstat 1 3
    	  In the process (procs) group of statistics, there are two important
    	  stats, r and b:
    	    r is the number of processes in the CPU run queue. 
    	    b is the number of processes blocked for resources (I/O, paging,
    	      and so forth).
    	  In the page group of statistics, the important stat is sr:
    	    sr is the number of pages scanned and can be an indicator of a RAM
    	       shortage.
    	  The cpu group of statistics gives a breakdown of the percentage usage
    	  of CPU time.  On MP systems, this is an average across all processors.
    	    us is the percentage of user CPU time. 
    	    sy is the percentage of system CPU time.
    	  
    	  The following is an example of the results of doing a vmstat 1 3.
    	  The r, b, sr, us, and sy columns are most important.
    	    procs     memory            page            cpu
    	    r b w   swap  free  re  mf pi po fr de sr  us sy id
    	    0 0 0 354696 10616   0   7  3  0  0  0  0  65 13 22
    	    0 0 0 368976  8104   0   9  0  0  0  0  0   0  1 99
    	    0 0 0 368976  8104   0   0  0  0  0  0  0   0  0 100
    	  * A significant bottleneck threshold occurs if b (processes blocked
    	    for resources) approaches r (# in run queue)
    	  * A critical bottleneck threshold occurs if b (processes blocked for
    	    resources) = or > r (# in run queue)
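The vmstat rules above could be checked automatically along these lines. This is a sketch under stated assumptions: the column list matches the sample output printed above (real vmstat output also has disk and fault columns), "approaches" is arbitrarily taken to mean b is at least 80% of r, and an idle system with b = 0 is never flagged:

```python
# Column layout of the sample vmstat output shown above (assumption:
# other vmstat versions print additional columns).
VMSTAT_COLUMNS = "r b w swap free re mf pi po fr de sr us sy id".split()


def parse_vmstat_row(line):
    """Turn one vmstat data line into a dict of integer columns."""
    return dict(zip(VMSTAT_COLUMNS, map(int, line.split())))


def blocked_level(r, b, near=0.8):
    """Compare blocked processes (b) with the run queue (r)."""
    if b > 0 and b >= r:
        return "critical"
    if b > 0 and b >= near * r:  # "approaches r": 80% is an assumed cutoff
        return "significant"
    return "ok"


row = parse_vmstat_row("0 0 0 354696 10616   0   7  3  0  0  0  0  65 13 22")
print(row["us"], row["sy"], row["sr"], blocked_level(row["r"], row["b"]))
```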
    	IOSTAT
    	  You can add the switch -x to provide extended statistics, which makes
    	  the output more readable because each disk has its own line.  You can
    	  also add the -c switch to report the percentage of time the system
    	  has spent in user mode, in system mode, waiting for I/O, and idling.
    	  
    	  The following is an example of the results of doing iostat -nxtc 30 3
    	  The svc_t, %b, us, sy, and wt columns are most important.
    	                    extended device statistics
    	      r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    	      0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 fd0
    	      3.2    1.0   11.4    3.0  0.0  0.0    0.0    4.6   0   2 c0t0d0
    	      0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c0t1d0
    	      0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c0t2d0
    	      0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c1t6d0
    	      0.1    1.5    1.1    5.6  0.0  0.0    0.0    6.9   0   1 c2t0d0
    	     22.7    0.2 2045.5    0.7  0.0  0.2    0.0    7.0   0  11 c2t1d0
    	      0.0    1.3    0.0    5.4  0.0  0.0    0.0    6.6   0   1 c2t2d0
    	      0.0    0.1    0.0    0.4  0.0  0.0    0.0    2.9   0   0 c3t0d0
    	      0.0    1.5    0.0    5.6  0.0  0.0    0.0    4.4   0   1 c3t1d0
    	      0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c3t2d0
    	      0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c3t3d0
    	  %b is the percent of time the disk is busy (transactions in
    	  progress).
    	  The column that I pay most attention to is the wsvc_t column.  This is
    	  the average service time in milliseconds.  A high number is a sign
    	  that the disk is becoming a bottleneck.  A rule of thumb is >35 is
    	  cause for investigation.  Large numbers in the r/s and w/s columns are
    	  an indication of a too small block size.  This could also be a poorly
    	  tuned application that is making many small reads/writes instead of
    	  a few large reads/writes.
    	  The kr/s and kw/s give you a good indication of how much bandwidth
    	  you are using.  For a single Ultra Wide Differential SCSI disk I
    	  would expect to get 10MB/s in throughput.  For a correctly configured
    	  stripe, I would expect to get 10MB/s x number of disks in the stripe.
    	  On a read from RAID 5 you should get similar performance.  On write
    	  the cache will help and you should get close to the same performance
    	  while the cache is not being overrun.
    	  * The significant bottleneck threshold is %b (percent time disk busy)
    	    > 20% AND (20 ms < svc_t (ServiceTime) < 30 ms)
    	  * The critical bottleneck threshold is %b (percent time disk busy)
    	    > 20% AND ( svc_t (ServiceTime) > 30 ms)
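To judge whether high r/s and w/s figures really come from small requests, the average request size can be derived from the iostat columns (kr/s divided by r/s). A minimal sketch, assuming the iostat -xn column layout shown in the sample output above; the function names are hypothetical:

```python
# Numeric columns of `iostat -xn` in the order shown in the sample above.
IOSTAT_COLUMNS = ["r/s", "w/s", "kr/s", "kw/s", "wait", "actv",
                  "wsvc_t", "asvc_t", "%w", "%b"]


def parse_iostat_row(line):
    """Parse one per-device line of `iostat -xn` output into a dict."""
    fields = line.split()
    row = dict(zip(IOSTAT_COLUMNS, map(float, fields[:10])))
    row["device"] = fields[10]
    return row


def avg_read_kb(row):
    """Average size of a read request in KB (0 if the disk is idle)."""
    return row["kr/s"] / row["r/s"] if row["r/s"] else 0.0


# The busiest disk from the sample output above.
row = parse_iostat_row(
    "22.7    0.2 2045.5    0.7  0.0  0.2    0.0    7.0   0  11 c2t1d0")
print("%s: %.1f KB per read, %%b=%d" % (row["device"], avg_read_kb(row), row["%b"]))
```

For c2t1d0 this works out to roughly 90 KB per read, so that disk is doing fairly large reads rather than many small ones.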
    _______________________________________________
    sunmanagers mailing list
    sunmanagers@sunmanagers.org
    http://www.sunmanagers.org/mailman/listinfo/sunmanagers
    
