Solaris 2.8 & 2.9 kernel eating all my memory?
From: William Hathaway (wdh_at_perfectorder.com)
Date: 03/19/04
Date: 18 Mar 2004 15:54:19 -0800
I'm working with a set of 280Rs (2x750MHz or 2x900MHz, 2GB) that are used
as load generators (LG) for a TCP-based application written in C.
Previously the machines had been tweaked to allow up to approx 60k
simultaneous outbound connections each (lowering
tcp_smallest_anon_port, raising tcp_conn_hash_size and fd limits). The
load generation test involves connecting to the remote servers on a
socket, sending a few hundred bytes back and forth, and then opening
the next socket (leaving the current one open).
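For readers who want the connection pattern spelled out, here is a minimal C sketch of it. This is not the pasvlogin code; the function names and the "ping" payload are hypothetical, and the read of the server's reply is left out to keep it short.

```c
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Open one TCP connection, send a short message, and return the fd
 * still open (or -1 on failure). The caller keeps the fd, so every
 * successful call adds one more simultaneous connection. */
int open_and_hold(const struct sockaddr_in *server, const char *msg)
{
    size_t len = strlen(msg);
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    if (connect(fd, (const struct sockaddr *)server, sizeof *server) < 0 ||
        write(fd, msg, len) != (ssize_t)len) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Open up to max connections one after another, never closing any.
 * Stops at the first failure; returns how many are now open. */
int ramp_connections(const struct sockaddr_in *server, int *fds, int max)
{
    int n = 0;

    assert(fds != NULL && max >= 0);
    while (n < max) {
        int fd = open_and_hold(server, "ping");  /* hypothetical payload */
        if (fd < 0)
            break;
        fds[n++] = fd;
    }
    return n;
}
```

The point of the sketch is only that descriptors accumulate: nothing is closed until the test ends, so kernel-side per-connection state grows with the connection count.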
In the past, I had been able to run many tests with 50k or so
simultaneous connections without any problems. We've had some code
updates (including our TCP application-level protocol now including
NULL characters), and I had not run a big load test in a while. When
trying to run the same load tests that had previously run without a
hitch, all the LG boxes hung.
Further investigation showed that the machines were running out of
memory once 10-14k connections were established. The application used
for the testing (a customized version of the open source program
"pasvlogin") was only using approx 100M of memory (via ps, prstat, top).
Besides that application, only the basic OS programs are on the
machine. I started watching the system memory use via mdb's ::memstat
command and saw that the vast majority of the memory allocation was
going to kernel space.
Here is a sample of the end state, taken from a panic dump I forced
by dropping the machine to the ok prompt and running 'sync' after it
had hung:
>mdb -k *3
Loading modules: [ unix krtld genunix ip ipc ufs_log usba nfs ptm ]
> ::memstat
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                     246987              1929  100%
Anon                           62                 0    0%
Exec and libs                   0                 0    0%
Page cache                     10                 0    0%
Free (cachelist)              331                 2    0%
Free (freelist)               156                 1    0%
Total                      247546              1933
During the test, I could clearly see the kernel memory usage rising
sharply as the number of TCP connections increased. There was no
other activity on the machine besides the load test and a few
monitoring commands such as vmstat and netstat.
A sample of a ::memstat before launching the test:
mdb -k
Loading modules: [ unix krtld genunix ip ipc ufs_log usba nfs random ptm ]
> ::memstat
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                      16458               128    7%
Anon                         1732                13    1%
Exec and libs                 756                 5    0%
Page cache                    110                 0    0%
Free (cachelist)           227632              1778   92%
Free (freelist)               858                 6    0%
Total                      247546              1933
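To put the two ::memstat samples in per-connection terms, here is a rough back-of-the-envelope calculation as a small C sketch. It assumes an 8 KB page size (which matches the Pages and MB columns above) and that the whole rise in kernel pages is due to the connections; the function name is hypothetical.

```c
#include <assert.h>

/* Estimated kernel memory growth per connection, in KB.
 * pages_before / pages_after: the "Kernel" row of ::memstat.
 * page_kb: page size in KB (assumed to be 8 on these machines).
 * connections: number of established connections at the later sample. */
double kernel_kb_per_conn(long pages_before, long pages_after,
                          long page_kb, long connections)
{
    assert(connections > 0);
    assert(pages_after >= pages_before);
    return (double)(pages_after - pages_before) * (double)page_kb
           / (double)connections;
}
```

Calling kernel_kb_per_conn(16458, 246987, 8, 10000) and the same with 14000 connections gives roughly 184 KB and 132 KB respectively, so under these assumptions each connection is costing well over 100 KB of kernel memory.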
My questions are:
* Should an application that just opens and does a few reads/writes
from sockets (no other IPC performed) be able to cause the kernel to
use so much memory?
* Are there any other tactics or techniques I can use to track down
what is causing this?
I'm working on a very small version of a TCP client/server so I can
run these tests with the original data protocol used (and variations)
to see if somehow the app data going across the socket is triggering
the extreme kernel memory usage. Am I off-base in thinking that this
shouldn't be happening?
The machines were originally running Sol 8 KP 22. Once the problem was
noticed, KP 27 was applied, and since that didn't help, the machines
were re-jumped to Sol 9 KP 11, which still didn't help (but at least
now I have ::memstat :-) )
Any comments or suggestions or RTFM (but say which one) are most
welcome!
Thanks,
-William Hathaway
wdh@perfectorder.com