Memory / CPU problems

bryan.mills_at_lynx.co.uk
Date: 08/27/03

    Date: Wed, 27 Aug 2003 20:12:50 +0100
    To: tru64-unix-managers@ornl.gov
    
    

    I'm trying to fathom out why our GS60 is grinding to a halt. I believe
    it is the application, but I need to prove it. I'm also not sure whether
    it's a CPU or memory problem. I have just spent 2 hours trawling the
    archives and am now more confused than ever!

    It has 4 processors and 8GB of memory, running Tru64 5.1A. The application
    uses the 'Universe' database and has 650 users; a typical user needs about
    6-10MB, and a few users go a little higher.
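
    As a rough sanity check on the memory side, the per-user figures above
    can be added up. This is only a sketch: the helper name is made up, it
    assumes all 650 users are active at once, and it ignores the kernel,
    the buffer cache and the database's own shared memory.

```python
# Rough estimate of aggregate user memory demand.
# Assumption: all 650 users are active at once and each needs 6-10 MB
# (figures quoted in the post); "MB" is taken as 2**20 bytes.

def aggregate_demand_gb(users: int, per_user_mb: float) -> float:
    """Return total demand in GB for `users` sessions of `per_user_mb` each."""
    return users * per_user_mb / 1024


low = aggregate_demand_gb(650, 6)
high = aggregate_demand_gb(650, 10)
print(f"Estimated user demand: {low:.1f} to {high:.1f} GB of 8 GB installed")
```

    On these assumptions the user processes alone could account for roughly
    half to three quarters of physical memory.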

    What we are seeing is, after a time (2-3 days) the performance grinds to
    a halt. We do have some jobs that periodically use 60-70% CPU, but they
    have been like that for years. My two main concerns are:

    In 'top', I never see any CPU idle time, and I feel that 'system' is
    excessively high. At the start of the week, after a reboot, idle time was
    typically 3-5% (and users were happy).

    load averages: 13.28, 13.37, 13.24                                19:42:38
    838 processes: 16 running, 1 waiting, 196 sleeping, 622 idle, 3 zombie
    CPU states: 44.0% user, 0.0% nice, 55.9% system, 0.0% idle
    Memory: Real: 6741M/8030M act/tot Virtual: 17359M use/tot Free: 32M
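
    To put the 'top' header above into proportion, a few ratios can be
    derived from it. This is a sketch only: the summarise() helper is
    hypothetical, the header text is simply pasted from the output above,
    and the processor count of 4 is the one quoted for this machine.

```python
import re

# Header lines copied from the 'top' output above.
TOP_HEADER = """\
load averages: 13.28, 13.37, 13.24
CPU states: 44.0% user, 0.0% nice, 55.9% system, 0.0% idle
Memory: Real: 6741M/8030M act/tot Virtual: 17359M use/tot Free: 32M
"""

NCPU = 4  # from the post: the GS60 has 4 processors


def summarise(header: str, ncpu: int) -> dict:
    """Derive a few ratios from a 'top' header (hypothetical helper)."""
    loads = [float(x) for x in
             re.search(r"load averages: ([\d., ]+)", header).group(1).split(",")]
    # Maps state name -> percentage, e.g. {"user": 44.0, "system": 55.9, ...}
    cpu = {name: float(pct)
           for pct, name in re.findall(r"([\d.]+)% (\w+)", header)}
    act, tot = map(int, re.search(r"Real: (\d+)M/(\d+)M", header).groups())
    free = int(re.search(r"Free: (\d+)M", header).group(1))
    busy = cpu["user"] + cpu["nice"] + cpu["system"]
    return {
        "load_per_cpu": loads[0] / ncpu,
        "system_share_of_busy": cpu["system"] / busy,
        "active_fraction": act / tot,
        "free_fraction": free / tot,
        "cpu": cpu,
    }


s = summarise(TOP_HEADER, NCPU)
print(f"1-min load per CPU:        {s['load_per_cpu']:.2f}")
print(f"System share of busy time: {s['system_share_of_busy']:.0%}")
print(f"Active / total real memory: {s['active_fraction']:.0%}")
print(f"Free / total real memory:   {s['free_fraction']:.1%}")
```

    A run queue of more than three per processor, with over half of the busy
    time spent in system mode and well under 1% of memory free, is what the
    numbers above amount to; what causes it is a separate question.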

    Frequently, top crashes with a memory fault.

    'ps aux' shows an oddity on kernel idle, but the archives seem to suggest
    that this isn't a problem.

    root 0 4.2 3.5 10.2G 290M ?? R < Aug 23 03:29:37 [kernel idle]

    The load average from 'uptime' seems a little higher than most people
    experience. It was at around 20 for each figure earlier, but most people
    have gone home now.

    19:46 up 4 days, 7:31, 306 users, load average: 15.52, 14.93, 14.81

    Swap seems fine, in that we are not swapping; 'swapon -s' shows:

    Swap partition /dev/disk/dsk11c (default swap):
        Allocated space: 2221961 pages (16.95GB)
        In-use space: 1 pages ( 0%)
        Free space: 2221960 pages ( 99%)

    Total swap allocation:
        Allocated space: 2221961 pages (16.95GB)
        Reserved space: 291467 pages ( 13%)
        In-use space: 1 pages ( 0%)
        Available space: 1930494 pages ( 86%)
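
    The page counts above can be cross-checked against the sizes and
    percentages printed next to them. This is a sketch that assumes an
    8 KB page size (not stated in the output, but consistent with the
    16.95GB figure); the helper names are illustrative.

```python
PAGE_BYTES = 8192  # assumption: 8 KB pages, consistent with 16.95GB above

# Page counts copied from the 'swapon -s' totals above.
allocated = 2221961
reserved = 291467
in_use = 1
available = 1930494


def pages_to_gb(pages: int) -> float:
    """Convert a page count to GB (2**30 bytes)."""
    return pages * PAGE_BYTES / 2**30


def pct(part: int, whole: int) -> int:
    """Whole-number percentage, truncated rather than rounded."""
    return 100 * part // whole


print(f"Allocated: {pages_to_gb(allocated):.2f} GB")
print(f"Reserved:  {pages_to_gb(reserved):.2f} GB ({pct(reserved, allocated)}%)")
print(f"Available: {pages_to_gb(available):.2f} GB ({pct(available, allocated)}%)")
print(f"Reserved + available == allocated: {reserved + available == allocated}")
```

    Truncating the percentages reproduces the 13% and 86% shown, so roughly
    2.2 GB of swap is reserved even though only a single page is in use.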

    sysconfigdb has

    vm_swap_eager = 1

    But I guess that, as we are not swapping, that's not really an issue
    anyway?

    One other symptom is that after a reboot the backup to an MDR fibre
    channel DLT takes about 4 hours; when the machine gets into this state
    it's more like 10+ hours. I don't believe it's disk I/O, as it's a fairly
    new HSG80 Fibre SAN. The backup is done by creating and mounting a clone
    fileset.

    I'm at a loss to know where to look next and would appreciate some input
    to try and help me identify this one.

    Regards,

    Bryan Mills.


