Re: dereferencing type-punned pointer will break strict-aliasing rules

From: Bruce Evans (bde_at_zeta.org.au)
Date: 07/29/03

  • Next message: Terry Lambert: "Re: LOR with filedesc structure and Giant"
    Date: Tue, 29 Jul 2003 16:37:08 +1000 (EST)
    To: Thomas Moestl <t.moestl@tu-bs.de>
    
    

    On Mon, 28 Jul 2003, Thomas Moestl wrote:

    > On Mon, 2003/07/28 at 09:30:08 +0900, Jun Kuriyama wrote:
    > >
    > > Is this caused by -oS option?
    > >
    > > ----- in making BOOTMFS in make release
    > > cc -c -Os -pipe -Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual -fformat-extensions -std=c99 -nostdinc -I- -I. -I/usr/src/sys -I/usr/src/sys/dev -I/usr/src/sys/contrib/dev/acpica -I/usr/src/sys/contrib/ipfilter -I/usr/src/sys/contrib/dev/ath -I/usr/src/sys/contrib/dev/ath/freebsd -D_KERNEL -include opt_global.h -fno-common -finline-limit=15000 -mno-align-long-strings -mpreferred-stack-boundary=2 -ffreestanding -Werror /usr/src/sys/geom/geom_dev.c
    > > /usr/src/sys/geom/geom_dev.c: In function `g_dev_open':
    > > /usr/src/sys/geom/geom_dev.c:198: warning: dereferencing type-punned pointer will break strict-aliasing rules
    > > [...]
    >
    > Yes, by implying -fstrict-aliasing, so using -fno-strict-aliasing is a
    > workaround.

    gcc.info claims that -fstrict-aliasing is implied by -Wall, but this is
    apparently wrong -- adding -fstrict-aliasing to our standard warning flags
    gives the same warnings. -Os gives it since -Os is more like -O2 than -O,
    and gcc.info's claim that -fstrict-aliasing is implied by -O2 is apparently
    not wrong. Related bug: our warning flags are missing other new gcc warnings.

    > The problem is caused by the i386 PCPU_GET/PCPU_SET
    > implementation:
    >
    > #define __PCPU_GET(name) ({ \
    > __pcpu_type(name) __result; \
    > \
    > [...]
    > } else if (sizeof(__result) == 4) { \
    > u_int __i; \
    > __asm __volatile("movl %%fs:%1,%0" \
    > : "=r" (__i) \
    > : "m" (*(u_int *)(__pcpu_offset(name)))); \
    > __result = *(__pcpu_type(name) *)&__i; \
    > [...]
    >
    > In this case, the PCPU_GET is used to retrieve curthread, causing
    > sizeof(__result) to be 4, so the cast at the end of the code snippet
    > is from a u_int * to struct thread *, and __i is accessed through the
    > casted pointer, which violates the C99 aliasing rules.
    > An alternative is to type-pun via a union, which is also a bit ugly,
    > but explicitly allowed by C99. Patch attached (but only superficially
    > tested).

    Uh, this macro is hideous enough without making it larger. I found that
    using __builtin_memcpy() avoids the warning at little or no cost in
    object code size (the memcpy gets optimized away in almost the same way
    as the assignment). Then I tried to improve the macro. I didn't quite
    succeed -- the smller versions gave slightly larger object code for
    5-20 out of 300 object files that depend on pcpu.h.

    Annotated version:

    % Index: pcpu.h
    % ===================================================================
    % RCS file: /home/ncvs/src/sys/i386/include/pcpu.h,v
    % retrieving revision 1.35
    % diff -u -2 -r1.35 pcpu.h
    % --- pcpu.h 1 Oct 2002 14:01:58 -0000 1.35
    % +++ pcpu.h 28 Jul 2003 14:02:30 -0000
    % @@ -78,5 +80,5 @@
    % * Evaluates to the address of the per-cpu variable name.
    % */
    % -#define __PCPU_PTR(name) ({ \
    % +#define __PCPU_PTR(name) __extension__ ({ \

    Unrelated change. This prevents warnings when compiled with certain
    strict warning flags (-pedantic?).

    % __pcpu_type(name) *__p; \
    % \
    % @@ -96,21 +98,21 @@
    % \
    % if (sizeof(__result) == 1) { \
    % - u_char __b; \
    % - __asm __volatile("movb %%fs:%1,%0" \
    % - : "=r" (__b) \
    % - : "m" (*(u_char *)(__pcpu_offset(name)))); \
    % - __result = *(__pcpu_type(name) *)&__b; \
    % + __pcpu_type(name) __res; \

    The temporary variable with a basic type is to avoid gcc getting confused
    doing reloads in dead code. E.g., if __pcpu_type(name) is "struct timeval",
    then the type can't be loaded into a byte register, and gcc would die
    trying since it apparently tries to load things in dead code, especially
    in asms. However, this problem doesn't seem to affect __PCPU_GET().
    There is no problem loading the source operand since only its address
    can really be loaded, and no problem loading the target operand since the
    asm does it. There might be a problem unloading the target operand, but
    there apparently isn't one for dead code. This should be tested with -O0.

    So we don't need the temporary variable hack and can use the natural
    type for __PCPU_GET(). This goes most of the way towards making the
    code the same for all type sizes 1/2/4. I still use a temporary variable
    and (now literally) duplicated code since it somehow gives slightly
    better object code.

    % + __asm __volatile("mov%L0 %%fs:%1,%0" \

    "%L0" is to avoid spelling out the operand size. This makes this line
    literally the same for all cases.

    % + : "=r" (__res) \

    The temporary variable is now spelled `__res' for all cases.

    % + : "m" (*(__pcpu_type(name) *)(__pcpu_offset(name)))); \

    The type is now spelled `__pcpu_type(name)' for all cases.

    % + __result = *(__pcpu_type(name) *)&__res; \

    `__result' and `__res' now have the same type (they could even be the
    same variable), so this could be a simple assignment now. However, type
    aliasing as above or using __builtin_memcpy() somehow gives better code.
    And (IIRC) at least the above gives binary identical code.

    % } else if (sizeof(__result) == 2) { \
    % - u_short __w; \
    % - __asm __volatile("movw %%fs:%1,%0" \
    % - : "=r" (__w) \
    % - : "m" (*(u_short *)(__pcpu_offset(name)))); \
    % - __result = *(__pcpu_type(name) *)&__w; \
    % + __pcpu_type(name) __res; \
    % + __asm __volatile("mov%L0 %%fs:%1,%0" \
    % + : "=r" (__res) \
    % + : "m" (*(__pcpu_type(name) *)(__pcpu_offset(name)))); \
    % + __result = *(__pcpu_type(name) *)&__res; \
    % } else if (sizeof(__result) == 4) { \
    % - u_int __i; \
    % - __asm __volatile("movl %%fs:%1,%0" \
    % - : "=r" (__i) \
    % - : "m" (*(u_int *)(__pcpu_offset(name)))); \
    % - __result = *(__pcpu_type(name) *)&__i; \
    % + __pcpu_type(name) __res; \
    % + __asm __volatile("mov%L0 %%fs:%1,%0" \
    % + : "=r" (__res) \
    % + : "m" (*(__pcpu_type(name) *)(__pcpu_offset(name)))); \
    % + __result = *(__pcpu_type(name) *)&__res; \

    Similar changes for sizes 2 and 4. Collapsing the now-identical code for
    the size 1/2/4 cases somehow gives worse object code.

    % } else { \
    % __result = *__PCPU_PTR(name); \
    % @@ -123,29 +125,29 @@
    % * Sets the value of the per-cpu variable name to value val.
    % */
    % -#define __PCPU_SET(name, val) ({ \
    % +#define __PCPU_SET(name, val) { \

    Warnings could be prevented by declaring the expression-statment as an
    __extension, but since __PCPU_SET() doesn't return anything, using an
    expression statement for it is just bogus; don't use one.

    % __pcpu_type(name) __val = (val); \
    % \
    % if (sizeof(__val) == 1) { \
    % u_char __b; \

    Basic types are still needed for __PCPU_SET().

    % - __b = *(u_char *)&__val; \
    % - __asm __volatile("movb %1,%%fs:%0" \
    % - : "=m" (*(u_char *)(__pcpu_offset(name))) \
    % + __builtin_memcpy(&__b, &__val, sizeof(__val)); \

    This avoids the aliasing problem.

    % + __asm __volatile("mov%L1 %1,%%fs:%0" \

    "%L1" is to avoid spelling out the operand size. This makes this line
    literally the same for all cases.

    % + : "=m" (*(__pcpu_type(name) *)(__pcpu_offset(name))) \

    A basic type is not needed for this load, so we can use the natural type.
    This makes this line literally the same for all cases.

    % : "r" (__b)); \
    % } else if (sizeof(__val) == 2) { \
    % u_short __w; \
    % - __w = *(u_short *)&__val; \
    % - __asm __volatile("movw %1,%%fs:%0" \
    % - : "=m" (*(u_short *)(__pcpu_offset(name))) \
    % + __builtin_memcpy(&__w, &__val, sizeof(__val)); \
    % + __asm __volatile("mov%L1 %1,%%fs:%0" \
    % + : "=m" (*(__pcpu_type(name) *)(__pcpu_offset(name))) \
    % : "r" (__w)); \
    % } else if (sizeof(__val) == 4) { \
    % u_int __i; \
    % - __i = *(u_int *)&__val; \
    % - __asm __volatile("movl %1,%%fs:%0" \
    % - : "=m" (*(u_int *)(__pcpu_offset(name))) \
    % + __builtin_memcpy(&__i, &__val, sizeof(__val)); \
    % + __asm __volatile("mov%L1 %1,%%fs:%0" \
    % + : "=m" (*(__pcpu_type(name) *)(__pcpu_offset(name))) \
    % : "r" (__i)); \

    Similarly. The code is now only mostly the same for the size 1/2/4 cases.

    % } else { \
    % *__PCPU_PTR(name) = __val; \

    __PCPU_SET() is not used very much, so maybe it should be optimized for
    code simplicity by always going through the pointer.

    BTW, how does locking for accesses through the pointer work? I think it
    can only work due to accidental CPU affinity if Giant is not held.

    % } \
    % -})
    % +}

    Part of not using an expression-statement for __PCPU_SET().

    %
    % #define PCPU_GET(member) __PCPU_GET(pc_ ## member)

    Unannotated version:

    %%%
    Index: pcpu.h
    ===================================================================
    RCS file: /home/ncvs/src/sys/i386/include/pcpu.h,v
    retrieving revision 1.35
    diff -u -2 -r1.35 pcpu.h
    --- pcpu.h 1 Oct 2002 14:01:58 -0000 1.35
    +++ pcpu.h 28 Jul 2003 14:02:30 -0000
    @@ -78,5 +80,5 @@
      * Evaluates to the address of the per-cpu variable name.
      */
    -#define __PCPU_PTR(name) ({ \
    +#define __PCPU_PTR(name) __extension__ ({ \
             __pcpu_type(name) *__p; \
                                                                             \
    @@ -96,21 +98,21 @@
                                                                             \
             if (sizeof(__result) == 1) { \
    - u_char __b; \
    - __asm __volatile("movb %%fs:%1,%0" \
    - : "=r" (__b) \
    - : "m" (*(u_char *)(__pcpu_offset(name)))); \
    - __result = *(__pcpu_type(name) *)&__b; \
    + __pcpu_type(name) __res; \
    + __asm __volatile("mov%L0 %%fs:%1,%0" \
    + : "=r" (__res) \
    + : "m" (*(__pcpu_type(name) *)(__pcpu_offset(name)))); \
    + __result = *(__pcpu_type(name) *)&__res; \
             } else if (sizeof(__result) == 2) { \
    - u_short __w; \
    - __asm __volatile("movw %%fs:%1,%0" \
    - : "=r" (__w) \
    - : "m" (*(u_short *)(__pcpu_offset(name)))); \
    - __result = *(__pcpu_type(name) *)&__w; \
    + __pcpu_type(name) __res; \
    + __asm __volatile("mov%L0 %%fs:%1,%0" \
    + : "=r" (__res) \
    + : "m" (*(__pcpu_type(name) *)(__pcpu_offset(name)))); \
    + __result = *(__pcpu_type(name) *)&__res; \
             } else if (sizeof(__result) == 4) { \
    - u_int __i; \
    - __asm __volatile("movl %%fs:%1,%0" \
    - : "=r" (__i) \
    - : "m" (*(u_int *)(__pcpu_offset(name)))); \
    - __result = *(__pcpu_type(name) *)&__i; \
    + __pcpu_type(name) __res; \
    + __asm __volatile("mov%L0 %%fs:%1,%0" \
    + : "=r" (__res) \
    + : "m" (*(__pcpu_type(name) *)(__pcpu_offset(name)))); \
    + __result = *(__pcpu_type(name) *)&__res; \
             } else { \
                     __result = *__PCPU_PTR(name); \
    @@ -123,29 +125,29 @@
      * Sets the value of the per-cpu variable name to value val.
      */
    -#define __PCPU_SET(name, val) ({ \
    +#define __PCPU_SET(name, val) { \
             __pcpu_type(name) __val = (val); \
                                                                             \
             if (sizeof(__val) == 1) { \
                     u_char __b; \
    - __b = *(u_char *)&__val; \
    - __asm __volatile("movb %1,%%fs:%0" \
    - : "=m" (*(u_char *)(__pcpu_offset(name))) \
    + __builtin_memcpy(&__b, &__val, sizeof(__val)); \
    + __asm __volatile("mov%L1 %1,%%fs:%0" \
    + : "=m" (*(__pcpu_type(name) *)(__pcpu_offset(name))) \
                         : "r" (__b)); \
             } else if (sizeof(__val) == 2) { \
                     u_short __w; \
    - __w = *(u_short *)&__val; \
    - __asm __volatile("movw %1,%%fs:%0" \
    - : "=m" (*(u_short *)(__pcpu_offset(name))) \
    + __builtin_memcpy(&__w, &__val, sizeof(__val)); \
    + __asm __volatile("mov%L1 %1,%%fs:%0" \
    + : "=m" (*(__pcpu_type(name) *)(__pcpu_offset(name))) \
                         : "r" (__w)); \
             } else if (sizeof(__val) == 4) { \
                     u_int __i; \
    - __i = *(u_int *)&__val; \
    - __asm __volatile("movl %1,%%fs:%0" \
    - : "=m" (*(u_int *)(__pcpu_offset(name))) \
    + __builtin_memcpy(&__i, &__val, sizeof(__val)); \
    + __asm __volatile("mov%L1 %1,%%fs:%0" \
    + : "=m" (*(__pcpu_type(name) *)(__pcpu_offset(name))) \
                         : "r" (__i)); \
             } else { \
                     *__PCPU_PTR(name) = __val; \
             } \
    -})
    +}

     #define PCPU_GET(member) __PCPU_GET(pc_ ## member)
    %%%

    Bruce
    _______________________________________________
    freebsd-current@freebsd.org mailing list
    http://lists.freebsd.org/mailman/listinfo/freebsd-current
    To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"


  • Next message: Terry Lambert: "Re: LOR with filedesc structure and Giant"

    Relevant Pages

    • Re: give me some tips
      ... Rod Pemberton schrieb: ... avoid the use casts, except to ensure the type and precision of numerical ... If you decide that the warning is not critical and your compiler ... offers a #pragma to switch off a warning for a code section, ...
      (comp.lang.c)
    • Re: need some help with threading module...
      ... But how will i learn using threads if i avoid using them ... That is an excellent reason to mess with them, it was just a warning ... controller class, please look again at what I gave you this as an ... You also mention the MVC pattern. ...
      (comp.lang.python)
    • Re: Confirm each email when running mail merge
      ... The warning comes up (for security purposes) when "another program" is ... To avoid having to click Yes on each occasion, ... services on a paid consulting basis. ...
      (microsoft.public.word.mailmerge.fields)
    • Re: OOP leads to lousy websites?
      ... Chung Leong wrote: ... Plus the error you avoid is essentially using ... An empty string is not 'no data', not more than 0 is 'no data'. ... > probably generates an warning ...
      (comp.lang.php)
    • [SLE] Compiling Perl Modules (Frequencies.pm) and IVTV
      ... V4l.xs:652: warning: `mindata' might be used uninitialized in this function ... Running Mkbootstrap for Video::Capture::V4l ... VBI.xs:296: warning: suggest parentheses around arithmetic in operand of | ...
      (SuSE)