Mysterious crash on Sol 10 x86
- From: "Henrik Goldman" <henrik_goldman@xxxxxxxxxxxx>
- Date: Tue, 26 Dec 2006 22:08:55 +0100
Hello,
I have a Solaris-specific problem where my code crashes. Not being a Solaris
expert, I turn to you in the hope that you can give me useful suggestions on
what is going wrong.
Essentially I created a stress test application which spawns a lot of
threads. My initial assumption is that my problem is a stack overrun
problem. I read that 32-bit Solaris on x86 creates threads with less stack
space than 64-bit does.
My observation has been that the problem occurs only when compiling in
release mode for 32-bit. It seems to work OK in 64-bit mode and in 32-bit
debug mode too.
It takes a while to reproduce the problem since several threads are
executing at the same time.
The problematic code is below (it gathers information from Ethernet
devices):
SOCKET s;
char **pp;
struct arpreq arpreq;
struct sockaddr_in *pSin;
struct hostent *pHost;
char szHostname[MAXHOSTNAMELEN+1];
MY_STRUCT f00;

printf("1\n");
if (gethostname(szHostname, sizeof(szHostname)-1) != 0)
    return false;
printf("2\n");
if ((pHost = gethostbyname(szHostname)) == NULL)
    return false;
printf("3\n");
pp = pHost->h_addr_list;
memset(&f00, 0, sizeof(f00));
f00.nType = ETHERNET;
printf("4\n");
for ( ; *pp != NULL; pp++)
{
    if (pHost->h_addrtype != AF_INET)
        return false;
    printf("5\n");
    if ((s = socket(AF_INET, SOCK_DGRAM, 0)) == SOCKET_ERROR)
        return false;
    printf("6\n");
    pSin = (struct sockaddr_in *) &arpreq.arp_pa;
    printf("7\n");
    memset(pSin, 0, sizeof(struct sockaddr_in));
    printf("8\n");
    pSin->sin_family = AF_INET;
    printf("8 1/2\n");
    memcpy(&pSin->sin_addr, *pp, sizeof(struct in_addr));
    printf("9\n");
    if (ioctl(s, SIOCGARP, &arpreq) >= 0)
        printf("10\n");
    memcpy(f00.sValue, arpreq.arp_ha.sa_data, 6);
    f00.nLength = 6;
    m_F00List.push_back(f00);
    printf("11\n");
    //unsigned char *p = (unsigned char *) &arpreq.arp_ha.sa_data[0];
    //printf("%x:%x:%x:%x:%x:%x\n",p[0], p[1], p[2], p[3], p[4], p[5]);
}
....
Essentially this algorithm should be ok in itself. The crash point is always
right after 8½ but before 9:
memcpy(&pSin->sin_addr, *pp, sizeof(struct in_addr));
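One thing worth ruling out at this exact statement is the lifetime of the
data behind *pp. POSIX lists gethostbyname() among the functions that need
not be thread-safe, so in a heavily threaded stress test the hostent it
returns may be overwritten by another thread's lookup. The sketch below is
only an illustration under that assumption, not a confirmed diagnosis: it
uses getaddrinfo(), which POSIX requires to be thread-safe, and copies the
addresses into caller-owned storage. The helper name ipv4_addresses is
hypothetical.

```cpp
#include <cstring>
#include <vector>
#include <netdb.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>

// Hypothetical helper (not part of the original code): returns the IPv4
// addresses of `host` in storage owned by the caller, so no pointer into
// resolver-owned memory outlives the lookup.
std::vector<in_addr> ipv4_addresses(const char *host)
{
    std::vector<in_addr> out;

    addrinfo hints;
    std::memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;       // IPv4 only, as in the original loop
    hints.ai_socktype = SOCK_DGRAM;  // avoid one entry per socket type

    addrinfo *res = NULL;
    if (getaddrinfo(host, NULL, &hints, &res) != 0)
        return out;                  // an empty vector signals failure

    for (addrinfo *p = res; p != NULL; p = p->ai_next) {
        // Copy each address out before the result list is released.
        const sockaddr_in *sin =
            reinterpret_cast<const sockaddr_in *>(p->ai_addr);
        out.push_back(sin->sin_addr);
    }
    freeaddrinfo(res);
    return out;
}
```

The loop over h_addr_list could then iterate over the returned vector
instead, and each element can be passed to memcpy() without depending on
memory that another thread's lookup might reuse.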
According to gdb I get the following backtrace:
(gdb) bt
#0 0x08074aa3 in JABXFEHPNCXEFKMBWNUZPNWVEQTJGN::JALTBMTYGIKZZILFNDUIZIPMKBOFWE ()
#1 0x08074b84 in JABXFEHPNCXEFKMBWNUZPNWVEQTJGN::HUTMVNMSSYNGPJDFNWBAIVEWFBJWGT ()
#2 0x08075b81 in JABXFEHPNCXEFKMBWNUZPNWVEQTJGN::URGNDQBDDEIWVFKGPNFAQKEZRWHYUQ ()
#3 0x0807663d in RWLGMIZVLFZGYYTIXSXTOYJBIIAWAB::GWHXEUYELORKSITJBOWTPEWOKCKFTK ()
#4 0x08076915 in HNEHFNWGWZZIDHCTCCKXWPZGPGIAEV::YUUUPPNONXLUBFLGUNSJHWVRWPBXJJ ()
#5 0x080e3587 in JKJDLNZZKFFGJRLMWOBEGZRGWMRTUF<SCTCMNNKNTYNFPBWMLECRHMZIDWABO>::PUPDTLZIQWEMFQNIDWSNHXJHKGKDJT () at basic_string.h:218
#6 0x08076a2b in CFITZTUOZIMEQYEKBIHZINEZFPOUCL::XXUKLBMDKUGZCHZXUJNAWIZHFMEHUG ()
#7 0x080772db in CFITZTUOZIMEQYEKBIHZINEZFPOUCL::YPQBEVQXBKWEMSGNMPQAIHGXGHJPGB ()
#8 0x0807c522 in GDNWATQQXIACXTAFOMEQQWTDFDFDWX::CWCMISPDOKHHVCQSDMKQLQCIZVLNRM ()
#9 0x08081bc2 in DUFMTVETHMQMJVB ()
#10 0x080cba17 in ABBAAANB::ABBAFF()
#11 0x0806a416 in SOBYLYUKNJYBNRSVCNIYBIDQQWFISA ()
#12 0xfee1f92e in _thr_setup () from /lib/libc.so.1
#13 0xfee1fc10 in L3_doit () from /lib/libc.so.1
---Type <return> to continue, or q <return> to quit---
#14 0xfacb0400 in ?? ()
#15 0x00000000 in ?? ()
#16 0x00000000 in ?? ()
Cannot access memory at address 0xfacae000
Notice that the address in the last line is aligned. This makes me believe
that it's a stack-related problem.
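The alignment observation can be checked with a little arithmetic. The
sketch below assumes 4 KiB pages (the base page size on 32-bit x86) and
uses two addresses taken from the backtrace; the helper names are made up
for illustration. Page alignment is consistent with gdb running into an
unmapped or guard page, but it does not prove a stack overrun, and the
address gdb prints is where it failed to read while unwinding, which may
differ from the address that actually faulted.

```cpp
#include <cstdint>

// Assumption: 4 KiB pages, the base page size on 32-bit x86.
const std::uint32_t kPageSize = 4096;

// True when addr sits exactly on a page boundary.
bool is_page_aligned(std::uint32_t addr)
{
    return (addr & (kPageSize - 1)) == 0;
}

// Distance in bytes from the lower address up to the higher one.
std::uint32_t bytes_between(std::uint32_t low, std::uint32_t high)
{
    return high - low;
}
```

With these helpers, 0xfacae000 is page-aligned, and the value shown for
frame #14 (0xfacb0400) lies 0x2400 bytes (9216) above it.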
Don't worry about the weird looking function names. It's intentional.
Can any of you tell me a bit about my observations? The next obvious
question is how one can do anything about this using pthreads. Maybe there
are some Solaris specific limitations I'm not aware of?
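As for what pthreads allows here: the standard way to control a thread's
stack is to set the size on a pthread_attr_t before calling
pthread_create(). The following is a minimal sketch under stated
assumptions. The worker function, the run_with_stack name and the 64 KiB
scratch buffer are hypothetical stand-ins for the real thread entry point,
and any size passed in is a guess to be tuned; it must be at least
PTHREAD_STACK_MIN or pthread_attr_setstacksize() fails.

```cpp
#include <pthread.h>
#include <cstddef>

// Hypothetical worker standing in for the real thread entry point.
static void *worker(void *arg)
{
    // Touch both ends of a local buffer so the thread really uses its stack.
    volatile char scratch[64 * 1024];
    scratch[0] = 1;
    scratch[sizeof(scratch) - 1] = 1;
    *static_cast<int *>(arg) = scratch[0] + scratch[sizeof(scratch) - 1];
    return NULL;
}

// Creates one joinable thread with an explicit stack size and waits for it.
// Returns 0 on success, otherwise the pthread error code.
int run_with_stack(std::size_t stack_bytes, int *result)
{
    pthread_attr_t attr;
    int rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;

    rc = pthread_attr_setstacksize(&attr, stack_bytes);
    if (rc == 0) {
        pthread_t tid;
        rc = pthread_create(&tid, &attr, worker, result);
        if (rc == 0)
            rc = pthread_join(tid, NULL);
    }
    pthread_attr_destroy(&attr);
    return rc;
}
```

If the crash disappears once the 32-bit release build runs its threads
with a larger stack, for example run_with_stack(2 * 1024 * 1024, &r), that
would support the stack overrun theory; if it persists, the cause is
probably elsewhere.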
-- Henrik