Interpreting program core dump in mdb



At $DAY_JOB, we've got a customer who has installed our product on a
Solaris 10 SPARC system and is getting a mysterious segmentation violation
in one of our background processes. Of course, this problem does not occur
on any of our in-house systems.

We did get the customer to send us a core file, but we aren't very handy
with the debug tools on Solaris.


# mdb prog core
Loading modules: [ libc.so.1 ld.so.1 ]
> ::stack
strncpy+0x5d0(20, 7182f4, 1b, 726f6f74, 0, 20)
secure+0x1b8(2e4088, b1978, c6068, 1f, 717298, 0)
process_request+0x41c(2e7d8, 1, c60e4, 1, 5750bc, 0)
open_socket+0x310(0, c8bf0, 5, 7efefeff, 81010100, ffbff9bc)
main+0x664(1, ffbffc1c, ffbffc24, c6000, c80fc, 3)
_start+0x108(0, 0, 0, 0, 0, 0)


I've googled up countless articles telling me that ::stack gets a
stack dump, but have yet to find one which tells me what the
values in the display **ARE**.


Some specifics on this one: It's a daemon process which accepts
a connection and forks off a worker process to handle the connection.
Early on, it calls secure() which is linked from a different .o file:


char user_name[USER_LENGTH + 1]; /* global in .c containing secure */


secure(host)
char *host;
{
    ....
    struct passwd *pw;
    ....

    pw = getpwuid(getuid());
    if (pw != NULL)
        strncpy(user_name, pw->pw_name, sizeof(user_name)-1);


We seem to blow up while copying the user name out of pw->pw_name,
which is very strange given that pw is supposed to point to static
space allocated by getpwuid().

struct passwd {
    char *pw_name;
    char *pw_passwd;
    uid_t pw_uid;
    gid_t pw_gid;
    char *pw_age;
    char *pw_comment;
    char *pw_gecos;
    char *pw_dir;
    char *pw_shell;
};
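
For comparison, here is a minimal sketch of a more defensive form of the copy in secure(). It is not our actual code: copy_user_name() is a hypothetical helper, and the value of USER_LENGTH is an assumption, since the real definition is not shown above. It only adds a check that pw_name itself is non-NULL and an explicit terminator, so it would not necessarily cure the crash.

```c
#include <pwd.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define USER_LENGTH 32  /* assumed value; the real definition is not shown */

char user_name[USER_LENGTH + 1];

/* Hypothetical helper: copy the login name for uid into user_name.
 * Returns 0 on success, -1 if no usable passwd entry was found. */
int copy_user_name(uid_t uid)
{
    struct passwd *pw = getpwuid(uid);

    /* Check the entry and the name pointer before dereferencing. */
    if (pw == NULL || pw->pw_name == NULL)
        return -1;

    /* strncpy does not terminate on truncation, so do it explicitly. */
    strncpy(user_name, pw->pw_name, sizeof(user_name) - 1);
    user_name[sizeof(user_name) - 1] = '\0';
    return 0;
}
```

It would be called as copy_user_name(getuid()) at the point where secure() does the copy now.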


Understanding the context around the stack frame seems really
crucial. One thing that is really strange is that the fourth value in
strncpy+0x5d0(20, 7182f4, 1b, 726f6f74, 0, 20)
is 726f6f74, which is ASCII for r o o t, the very characters that should
be in memory at the address pointed to by pw_name...
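
To double-check the reading of 726f6f74 as text, here is a small sketch that unpacks a 32-bit word into its four bytes, most significant first, which is the order they occupy in memory on big-endian SPARC. The helper name word_to_ascii() is made up for illustration.

```c
#include <stdint.h>

/* Unpack a 32-bit word into four characters, most significant byte
 * first, and NUL-terminate. out must have room for 5 chars. */
void word_to_ascii(uint32_t word, char out[5])
{
    int i;

    for (i = 0; i < 4; i++)
        out[i] = (char)((word >> (24 - 8 * i)) & 0xff);
    out[4] = '\0';
}
```

Passing 0x726f6f74 fills the buffer with the bytes 0x72, 0x6f, 0x6f, 0x74, i.e. the string "root".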


We're pretty sure we're doing Something Stupid(tm), but don't see
how we could muck up the static space returned by getpwuid between
program start and this point. This is code that has been running
for quite a while on various Unix flavors, including Solaris 7 and
later. We now see that we have two Solaris 10 customers with this
problem. The code was compiled on a Solaris 8 system.

So anyway, some pointers to interpreting the context around a crash
using mdb would be appreciated.

TIA

--
Clem
"If you push something hard enough, it will fall over."
- Fudd's first law of opposition