SUMMARY: Solaris 10 zone - user processes crashing randomly
- From: Pascal Grostabussiat <pascal@xxxxxxxxxx>
- Date: Mon, 08 Oct 2007 17:32:08 +0200
Hi,
Many thanks to those who replied and tried to help.
We have been investigating this problem for a while, and so far no real
explanation has been found. Our own analysis pointed to a potential
issue/bug in Oracle's libclntsh.so library (versions 10.2.0.2 and
10.2.0.3 in our case), which our software uses. It involves the
sslsshandler in that library. When we contacted Oracle, the only feedback
we got concerned a bug in Oracle 8 (see bug 2012268 on Oracle's side for
more details). However, as mentioned earlier, we are running Oracle 10.
We went back to Oracle 9 and the issue disappeared. When we returned to
Oracle 10, the issue came back. So we have now implemented the workaround
Oracle suggested (for Oracle 8), adapted it for Oracle 10, and got much
better stability ... !?
For the record:
Before starting the process that performs dlopen/dlclose of a module linked
with Oracle, set the environment variable LD_PRELOAD to point to the
libclntsh.so file that is being used. For example:
setenv LD_PRELOAD $ORACLE_HOME/rdbms/lib/libclntsh.so.8.0
This maps libclntsh permanently and avoids the core dump. This variable
must only be set for programs that encounter the core dump.
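The `setenv` line above is csh syntax. A minimal Bourne-shell sketch of the same workaround follows, written as a wrapper so that LD_PRELOAD is set only for the one program that crashes, as the note requires. The function name `run_with_clntsh_preload`, the `CLNTSH_LIB` override variable, and the default path `$ORACLE_HOME/lib/libclntsh.so` are hypothetical; point it at the exact libclntsh file the failing process loads (on Solaris, `pldd <pid>` lists a running process's libraries without needing root). Solaris's runtime linker also honors LD_PRELOAD_32 and LD_PRELOAD_64 if only 32-bit or only 64-bit processes should be affected.

```sh
#!/bin/sh
# Minimal sketch (hypothetical helper, not Oracle's official workaround):
# start ONE program with Oracle's client library preloaded so that it stays
# mapped even after the program dlclose()s the module that pulled it in.
# Assumption: the library is $ORACLE_HOME/lib/libclntsh.so unless CLNTSH_LIB
# names another file; use the exact file the failing process really loads.

run_with_clntsh_preload() {
    lib="${CLNTSH_LIB:-$ORACLE_HOME/lib/libclntsh.so}"
    if [ ! -r "$lib" ]; then
        echo "run_with_clntsh_preload: cannot read $lib" >&2
        return 1
    fi
    # env puts LD_PRELOAD only into the child's environment, so the calling
    # shell and all other programs keep their normal environment.
    env LD_PRELOAD="$lib" "$@"
}
```

For example, `run_with_clntsh_preload /opt/app/bin/server` (a hypothetical path) starts just that server with the library preloaded; the function returns the program's own exit status.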
<file:///metalink/plsql/showdoc?db=Bug&id=2012268>
Regards,
/Pascal
Pascal Grostabussiat wrote:
Hi guys,
I am puzzled by this issue, and I have never seen anything like it
before. I hope you can point me in some new directions or to any
information sources on the net that might be relevant.
I am in a Solaris 10 environment. Our applications have been installed
in a dedicated zone. The applications are nothing new: we have been
running them in many different kinds of environments, including similar
ones (Solaris 10 zones), and no such issue has been seen before.
User processes had been running for a month or two, and one day some of
them started crashing for no apparent reason. After a few repeated
crashes they were stable again. Then a few hours later, sometimes the day
after, other or similar user processes crashed again. This has been
going on for about two to three weeks now. The user processes are both
C/C++ and Java processes, and the ones crashing are of both kinds.
Sometimes one specific user process crashes, sometimes 2, 3 or 4 of
them, not simultaneously but going down and up within the same chaotic
period (from 1 hour to 2-3 hours), before things get stable again for
several hours.
We have inspected the logs of our applications and, of course, the
core files, but could not find any clue !? According to some core files,
some processes appear to receive a SIGABRT signal (a regular kill
(SIGTERM) signal is handled by the applications as a normal shutdown),
while others seemed to be waiting in their normal course of action just
before they crashed (again according to some core files). Our
developers checked the core files in detail but could not find any clue.
I have checked the resource limits on the platform and they are no
different from other environments where the applications are stable. We
have been investigating the core files using pflags but could not get
more clues on that side. The remote DB and the network have been
investigated too, but nothing has been found there either. I have asked
people on the project to report the activities they were performing at
crash time but could not find any pattern. I have discussed with the
local sysadmins how to track any kind of external activity (with respect
to our zone) that might be triggered now and then, but found nothing.
So my question is: has anyone experienced such REALLY weird events in
their own environment?
Feel free to send ANY idea, or point to any tools or commands (I cannot
really be root) that might help, because I am stuck and running short of
ideas !? I have been working with Sun environments since SunOS 4, from
the SPARCclassic ;-) to the SF15K, and I have never seen this before !?!?
MANY thanks in advance!
Regards,
/Pascal
_______________________________________________
sunmanagers mailing list
sunmanagers@xxxxxxxxxxxxxxx
http://www.sunmanagers.org/mailman/listinfo/sunmanagers