AIX 5.2 maxperm & Oracle
- From: paul_holman_uk@xxxxxxxxxxx
- Date: 23 May 2006 09:53:27 -0700
My colleagues suggested I post my experiences with a performance
problem on AIX 5.2, as it may help as a reference to others hitting a
similar problem. This applies to versions 4.3 upwards, I think.
Environment: IBM P650 AIX 5.2 server, 8 CPUs, 32GB memory, EMC DMX disk
over SAN, running seven Oracle 9.2 databases of various sizes, plus
misc backend processes (not many or large). Filesystems all JFS (not
JFS2).
Symptoms: First we noticed poor performance, in the form of overnight
batch jobs finishing late. Investigation with vmstat and nmon showed
huge levels of page scanning. maxperm was set to 63%, but numperm was
increasing to nearly 80% overnight. Actual values for pagescan were
exceeding 1,000,000 at times. The ratio of pages scanned to pages freed
exceeded 10:1 most of the time. Further investigation indicated that
the problem was mainly triggered by backup and export activity (i.e.
large volumes of data IO going through the AIX filesystem cache).
Conclusion: batch jobs are being delayed whilst they effectively fight
for memory with filesystem cache (perm memory pages). AIX seems to be
too generous in allowing filesystem cache to grab perm pages, even when
it already has more than maxperm!
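As a rough illustration of the two symptoms above (scan-to-free ratio and numperm overshooting maxperm), here is a minimal Python sketch. The function names and the 10:1 threshold default are my own choices for illustration, and the pages-freed figure in the example is hypothetical; only the 1,000,000 scans, 63% maxperm and 80% numperm come from the observations above.

```python
def scan_to_free_ratio(pages_scanned, pages_freed):
    """Pages scanned per page freed; None if nothing was freed."""
    if pages_freed <= 0:
        return None
    return pages_scanned / pages_freed


def looks_like_cache_pressure(pages_scanned, pages_freed,
                              numperm_pct, maxperm_pct,
                              ratio_threshold=10.0):
    """True when scanning is expensive AND numperm is above maxperm."""
    ratio = scan_to_free_ratio(pages_scanned, pages_freed)
    if ratio is None:
        return False
    return ratio > ratio_threshold and numperm_pct > maxperm_pct


if __name__ == "__main__":
    # pages_freed=80_000 is a made-up sample value, not a measurement.
    print(scan_to_free_ratio(1_000_000, 80_000))
    print(looks_like_cache_pressure(1_000_000, 80_000, 80, 63))
```

The scanned and freed counts would come from the paging columns of whatever monitoring tool you use; the sketch deliberately takes plain numbers so it does not depend on any particular output format.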
Resolution: The first course of action, recommended by IBM support, was to
switch off JFS caching for all filesystems containing Oracle datafiles.
This is done by mounting the filesystems with the "rbrw" option. The
idea is that this would free up memory that was being used to cache
Oracle data IO, which can be used to increase Oracle SGA and PGA
values. The performance gain from the Oracle increases should be
greater than the performance losses from turning off JFS caching.
What we found, to our cost, is that although this sounds fine in
theory, in practice it is difficult to make work successfully. After
the changes were made performance was no better or, in the case of some
small databases, worse. As we only suffered the problem on production,
and could not replicate it in test, it was important that we tackled it
in a safe but effective way, and this was not it.
The first and most immediate problem with trying the filesystem cache
(rbrw option) solution is that the filesystems have to be remounted to
make the change. This means shutting down the database/application,
which, for production, means an out of hours change, so it is not quick
to implement - we had to wait for a free slot on a weekend.
Furthermore, should the change prove problematic (as it did), it is
similarly not easy to back out, as the database/app must again be
shut down.
The second problem with the filesystem caching change is that it is a
binary change - it is either on or off. There is no scope for tuning
or tweaking it to modify the degree of effect; it is either all or
nothing. So the change is crude and drastic, and so is more likely to
be problematic.
After the change was made we did not see any increase in free memory,
although AIX is notoriously tricky at showing this. I believe the
problem is that there were still many filesystems that did not contain
Oracle files and so did not have caching turned off. I think these
filesystems were just having an easier time of grabbing lots of memory
for caching, leaving us with just as much memory lost to cache, just
with different data in it. It seems that AIX is just too generous when
dishing out memory for filesystem caching, when you are trying to run
it as an Oracle database server, and turning off caching for some
filesystems appeared to be too indirect an approach in our case.
Observed levels of numperm, using nmon, showed no significant decrease.
Based on technical papers and documentation I found on the internet, and
success with a similar problem on another AIX 4.3.3 server, I tackled the
memory problem more directly. Memory used for filesystem caching is called
perm memory. It can be monitored by a system metric called numperm,
which shows the amount of memory used for perm pages. There is a
kernel parameter called maxperm that controls the maximum that numperm
can grow to. maxperm is a "soft" limit by default. Due to the soft
limit, AIX seems to allow numperm to exceed maxperm by large amounts
when lots of IO is happening (backups, exports, etc), and programs are
left competing for memory to run in, causing AIX to frantically scan
memory to find free pages for everyone.
The fix we ended up with was to use a parameter called strict_maxperm,
which changes maxperm to a "hard" limit. However, if you try this
beware - if you turn on strict_maxperm when numperm is much larger than
maxperm then the system will hang for quite some time whilst AIX
reorganises memory. First, make the change at a quiet time, like 6pm
Friday. Increase maxperm to the same as numperm, turn on
strict_maxperm, then reduce maxperm by 1% at a time to the target
value. In between each reduction, wait for the system to finish sorting
out its virtual memory - you will see some page IO activity in vmstat.
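The stepwise procedure above can be sketched as a dry-run plan generator in Python. This only builds a list of command strings; it does not run anything. It assumes the AIX 5.2 "vmo" tuning command with tunables named "maxperm%" and "strict_maxperm" (older releases such as 4.3.3 use a different tool), so check the exact syntax on your own system before using any of it. The function name and the whole-number percentages are my own simplifications.

```python
def maxperm_plan(numperm_pct, target_pct, step=1):
    """Return the ordered commands for a gradual strict maxperm change.

    numperm_pct: current numperm as a whole-number percentage
    target_pct:  the strict maxperm value you want to end up with
    step:        percentage points to drop per iteration (1 in the post)
    """
    if step < 1:
        raise ValueError("step must be at least 1")
    if target_pct > numperm_pct:
        raise ValueError("target is above numperm; no reduction needed")

    # 1. Raise maxperm to match numperm so the hard limit bites gently.
    plan = [f"vmo -o maxperm%={numperm_pct}"]
    # 2. Turn the soft limit into a hard one (assumed tunable name).
    plan.append("vmo -o strict_maxperm=1")
    # 3. Walk maxperm down one step at a time to the target.
    current = numperm_pct
    while current > target_pct:
        current = max(target_pct, current - step)
        plan.append(f"vmo -o maxperm%={current}")
    return plan


if __name__ == "__main__":
    for command in maxperm_plan(80, 66):
        # Between reductions, wait for paging activity to settle.
        print(command)
```

Printing the plan rather than executing it is deliberate: each reduction should only be applied by hand once vmstat shows the previous one has settled.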
What is a good value for a strictly limited maxperm? Well, we are
using 66% and getting good performance, but values as low as 30% or
less are recommended by some and may gain further, if the memory freed
up is allocated to Oracle SGA and PGA effectively; we have not had the
chance to explore these possibilities. Note: these are 32GB machines,
so reducing maxperm from 66% to 30% would free up 11.52 GB to give to
Oracle! I would recommend observing the highest levels that numperm
and pagescans reach and, if pagescans are present but not
large, using a strict maxperm setting a couple of % less than the
maximum that numperm reached. Continue to monitor after the change and
adjust further if necessary.
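The arithmetic behind those figures, as a small Python sketch. The function names are made up for illustration, and the 2-point margin is just one reading of "a couple of %"; the 32GB, 66% and 30% inputs are the ones quoted above.

```python
def memory_freed_gb(total_gb, old_maxperm_pct, new_maxperm_pct):
    """Memory released from filesystem cache by lowering a strict maxperm."""
    return total_gb * (old_maxperm_pct - new_maxperm_pct) / 100


def suggested_strict_maxperm(peak_numperm_pct, margin_pct=2):
    """Starting point: a couple of % below the observed numperm peak."""
    return peak_numperm_pct - margin_pct


if __name__ == "__main__":
    # 32 GB * (66 - 30) / 100 = 11.52 GB
    print(memory_freed_gb(32, 66, 30))
    # Observed peak of about 80% gives a starting value of 78%.
    print(suggested_strict_maxperm(80))
```

This assumes the limit is actually being reached; memory is only freed for Oracle to the extent that numperm was sitting at the old limit in the first place.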
Two nice things about using maxperm tuning, compared to rbrw option on
filesystem: firstly, it can be changed without interrupting processing
(if done carefully); there is no need to stop apps/databases.
Secondly, you can monitor the effects (by vmstat, nmon, etc) and tweak
it accordingly to get the best from it, again without interrupting any
application processing.
So maxperm provided a much less intrusive and more flexible approach to
this problem, and in practical terms seems a better approach,
especially if, like me, you have to sort out a production system ASAP.