Re: Interpreting program core dump in mdb



On Thu, 27 Mar 2008 09:22:22 -0400, "Mr. Uh Clem" <uhclem@xxxxxxxxxxxxxxxxxx> wrote:
> At $DAY_JOB, we've got a customer who has installed our product on a
> Solaris 10 Sparc system and is getting a mysterious segment violation in
> one of our background processes. Of course, this problem does not occur
> on any of our inhouse systems.
>
> We did get the customer to send us a core file, but aren't very handy
> with the debug tools on Solaris.
>
> # mdb prog core
> Loading modules: [ libc.so.1 ld.so.1 ]
> ::stack
> strncpy+0x5d0(20, 7182f4, 1b, 726f6f74, 0, 20)
> secure+0x1b8(2e4088, b1978, c6068, 1f, 717298, 0)
> process_request+0x41c(2e7d8, 1, c60e4, 1, 5750bc, 0)
> open_socket+0x310(0, c8bf0, 5, 7efefeff, 81010100, ffbff9bc)
> main+0x664(1, ffbffc1c, ffbffc24, c6000, c80fc, 3)
> _start+0x108(0, 0, 0, 0, 0, 0)
>
> I've googled up countless articles telling me that ::stack gets a
> stack dump, but have yet to find one which tells me what the
> values in the display **ARE**.

It looks like the daemon is overrunning a buffer inside strncpy().
Tracking down this sort of memory corruption can be tricky if it happens
in a child process (forking daemon), but you can use the libumem library
and mdb to debug this.

> Early on, it calls secure() which is linked from a different .o file:
>
> char user_name[USER_LENGTH + 1]; /* global in .c containing secure */
>
> secure(host)
> char *host;
> {
>     ...
>     struct passwd *pw;
>     ...
>
>     pw = getpwuid(getuid());
>     if (pw != NULL)
>         strncpy(user_name, pw->pw_name, sizeof(user_name)-1);
>
> We seem to blow up on trying to move the user name from pw->pw_name,
> which is very strange given that pw is supposed to point to static
> space allocated by getpwuid().

Is it possible that you have corrupted the stack elsewhere?
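
While you look for that, it may be worth making the copy itself
defensive, so that a NULL pw_name or an unterminated user_name can be
ruled out. The following is only a sketch: copy_user_name() is a
hypothetical helper, and the value of USER_LENGTH is assumed because
your post does not show it.

```c
#include <assert.h>
#include <pwd.h>
#include <stddef.h>
#include <string.h>

#define USER_LENGTH 32  /* assumed value; the real definition is not shown */

static char user_name[USER_LENGTH + 1];

/*
 * Hypothetical helper: copy the login name from pw into dst, which is
 * dstlen bytes long. The result is always NUL-terminated.
 * Returns 0 on success, -1 if any of the inputs is unusable.
 */
static int
copy_user_name(char *dst, size_t dstlen, const struct passwd *pw)
{
    if (dst == NULL || dstlen == 0)
        return -1;

    dst[0] = '\0';                    /* leave dst well defined on failure */

    if (pw == NULL || pw->pw_name == NULL)
        return -1;

    strncpy(dst, pw->pw_name, dstlen - 1);
    dst[dstlen - 1] = '\0';           /* strncpy does not terminate on truncation */
    return 0;
}
```

Inside secure() the call would then look like
copy_user_name(user_name, sizeof(user_name), getpwuid(getuid())).
This will not cure corruption that happens somewhere else, but a -1
return would at least tell you that the passwd entry itself is bad.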

You can try enabling the debugging and auditing features of libumem.so
by running your program inside an mdb session, after setting up the
environment like this:

$ UMEM_DEBUG=default ; export UMEM_DEBUG
$ UMEM_LOGGING=transaction ; export UMEM_LOGGING
$ LD_PRELOAD=libumem.so.1 ; export LD_PRELOAD
$ mdb a.out

Then when inside mdb, set up a breakpoint at _exit and run the program:

> ::sysbp _exit
> ::run

After it crashes, load the libumem debugger module into mdb with
"::load libumem.so.1" and try the memory allocation tricks (dcmds such
as ::umem_verify and ::findleaks) described at:

http://developers.sun.com/solaris/articles/libumem_library.html
