Re: Strange problem with semaphores and localtime_r function.



Nate Eldredge <nate@xxxxxxxxxx> writes:
David Schwartz <davids@xxxxxxxxxxxxx> writes:
On Mar 30, 4:57 am, "Dmitry V. Krivenok" <krivenok.dmi...@xxxxxxxxx>
wrote:

struct tm tm_, *ptm;
ptm = ::localtime_r(&current_unix_time, &tm_);

[snip]
ACE_Thread_Semaphore sem(0);

Here's what's happening: The localtime_r function thinks a 'struct tm'
is larger than your header files say it is. So 'localtime_r' overflows
'tm_' and corrupts 'sem'.

You can test this by passing localtime_r a 512-byte buffer filled with
0xea bytes. Then see how far into the buffer it's modified. My bet is
it will exceed 'sizeof(struct tm)' as reported by your header files.

Diagnosis: Probable glibc/header file mismatch.
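
The buffer test suggested above could be sketched roughly as follows.
This is only an illustration: the function name is made up here, and the
check cannot see bytes that localtime_r happens to set to 0xea or
padding it never writes, so the result is a lower bound.

```cpp
#include <cstddef>
#include <cstring>
#include <time.h>

// Hypothetical helper (name invented for this sketch).
// Returns the offset one past the last byte localtime_r changed in a
// 512-byte buffer pre-filled with 0xea, or 0 if nothing changed.
std::size_t bytes_touched_by_localtime_r()
{
    alignas(std::max_align_t) unsigned char buf[512];
    std::memset(buf, 0xea, sizeof buf);

    time_t now = time(nullptr);
    // Deliberately hand localtime_r the raw buffer instead of a struct tm,
    // so that any write past sizeof(struct tm) lands in memory we own.
    ::localtime_r(&now, reinterpret_cast<struct tm *>(buf));

    std::size_t end = 0;
    for (std::size_t i = 0; i < sizeof buf; ++i)
        if (buf[i] != 0xea)
            end = i + 1;   // remember the highest modified position
    return end;
}
```

Printing the result next to sizeof(struct tm) is enough: a result larger
than sizeof(struct tm) would support the header mismatch theory, a
result no larger than it would speak against it.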

No, that's not it.

I'm able to reproduce this on an i386 debian unstable system, using
packages

libace-5.6.3 5.6.3-5
libace-dev 5.6.3-5
libc6 2.9-3
g++ 4:4.3.2-3

The source file below can be used to reproduce this phenomenon with
Debian stable:

--------------
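#include <ace/Thread_Semaphore.h>

int main(void)
{
    unsigned *p;

    // Allocate four words, dirty the second one and free the block, so
    // that the next allocation can hand out the same memory again.
    p = new unsigned[4];
    p[1] = -1;
    delete[] p;

    // The constructor allocates the semaphore storage and calls sem_init.
    ACE_Thread_Semaphore sem(0);

    sem.acquire();
    return 0;
}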
-------------

and the solution to the problem is to use -lpthread instead of
-pthread when linking libACE. An explanation of what's going on follows.

	NB: I have not determined exactly why the change above works
	(at least for me), only what the problem is.

Before the semaphore constructor is invoked, the code allocates four
words, sets the second one to -1 and deallocates the memory area again
(this is roughly what happened during localtime_r, insofar as it is
relevant). The next memory allocation happens inside the semaphore
constructor and reuses the block of memory the main routine allocated
(and freed) before. On the first allocation, the contents of the area
are all zeroes[*], while the second allocation returns an area still
holding whatever was written to it after it was allocated for the
first time. Ultimately, the semaphore constructor calls sem_init on
this area. glibc contains two versions of 'sem_init', an older one
where a semaphore structure is defined as

struct old_sem
{
    unsigned int value;
};

and a newer one, whose semaphore representation looks like below:

struct new_sem
{
    unsigned int value;
    int private;
    unsigned long int nwaiters;
};
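
A consequence of these two layouts is that code which only knows the
old structure touches nothing beyond the first word. The sketch below
models that with a plain byte buffer. It does not call glibc at all;
init_old_layout and private_seen_after_old_init are names invented for
this illustration, and 'private' is spelled 'private_' because it is a
keyword in C++.

```cpp
#include <cstddef>
#include <cstring>

struct old_sem { unsigned int value; };

struct new_sem {
    unsigned int value;
    int private_;                 // 'private' in the C definition above
    unsigned long int nwaiters;
};

// Hypothetical initializer that only knows the old layout:
// it writes sizeof(old_sem) bytes and nothing more.
void init_old_layout(void *storage, unsigned int value)
{
    old_sem s = { value };
    std::memcpy(storage, &s, sizeof s);
}

// Returns what a reader using the new layout finds in 'private_' after
// the storage was dirtied (second word set to -1, like p[1] = -1 in the
// test program) and then initialized through the old layout only.
int private_seen_after_old_init()
{
    alignas(new_sem) unsigned char storage[sizeof(new_sem)] = {};
    unsigned int dirt = -1;
    std::memcpy(storage + sizeof(unsigned int), &dirt, sizeof dirt);

    init_old_layout(storage, 0);

    new_sem view;
    std::memcpy(&view, storage, sizeof view);
    return view.private_;         // still the stale word
}
```

In this model the second word survives the initialization unchanged, so
a reader that expects the new layout sees the stale -1 in the
'private' position.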

The 'new' sem_wait xors the value of the 'private' member with 'some
constant' to calculate the actual futex operation to be performed
(via lll_futex_wait, IIRC). When not explicitly linked against
libpthread, the (internal) ACE code pulls in the older sem_init, which
only initializes the value member and leaves the number written to the
second word of the storage area originally used by main as is. The
sem.acquire call, by contrast, is compiled as part of the program
being built (via inclusion of /usr/include/ace/Semaphore.inl) and so
pulls in and uses the 'newer' sem_wait, which clobbers the futex
operation with the value of the private member left uninitialized by
the older sem_init (sem_init-2.0), ie the value originally written to
this location by main.
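
To see why a stale 'private' word is fatal, here is a toy model of the
xor step. It is not the glibc code: kWaitOp and kSomeConstant are
stand-in values assumed for this sketch, and futex_op_model and
looks_like_valid_op are invented names. Only the idea that the
operation is derived by xoring 'private' with a constant comes from the
description above.

```cpp
// Hypothetical stand-ins, not taken from the glibc sources.
constexpr int kWaitOp = 0;          // assumed base operation code
constexpr int kSomeConstant = 128;  // assumed value of the 'some constant'

// Toy version of the computation: the operation passed to the kernel
// is derived from the 'private' member by an xor.
constexpr int futex_op_model(int private_member)
{
    return kWaitOp | (private_member ^ kSomeConstant);
}

// True if the result contains no bits other than the ones the
// constant itself could contribute.
constexpr bool looks_like_valid_op(int op)
{
    return (op & ~kSomeConstant) == kWaitOp;
}
```

With a 'private' value of 0 or 128 the model yields a small, sane
operation code. With the stale -1 left behind by main, almost every bit
of the result is set, which is the kind of garbage operation the
analysis above points to.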

[*] Moving the ctor invocation before the other code causes
the program to work, too.

NB: This information has been determined by using gdb to single-step
through the executed machine code (stepi, x/i $pc), inspecting the
sources and 'some luck' (I basically guessed that the problem was
caused by reuse of previously allocated memory whose contents had been
modified 'by someone else').

To close the circle, some German university type would now have to
turn up and, without having looked into the matter any further,
declare me completely idiotic, of unsound mind, incompetent and
moronic because he didn't understand a word (and I am, after all,
only a dumb little unskilled labourer without a paper certificate of
competence ...)


