Re: LOR route vr0

From: John Baldwin (jhb_at_FreeBSD.org)
Date: 09/02/05

  • Next message: John Baldwin: "[PATCH] Cleanup asm constraints in atomic operations"
    To: freebsd-current@freebsd.org
    Date: Fri, 2 Sep 2005 14:37:31 -0400
    
    

    On Thursday 01 September 2005 08:39 pm, Don Lewis wrote:
    > On 1 Sep, John Baldwin wrote:
    > > On Thursday 01 September 2005 01:22 pm, Don Lewis wrote:
    > >> On 1 Sep, Fredrik Lindberg wrote:
    > >> > I'm seeing both the rtentry and the tcpinp LORs on my fxp interface
    > >> > on a machine running a few days old -current (Aug 25).
    > >> >
    > >> > lock order reversal
    > >> > 1st 0xc1e30d38 inp (tcpinp) @ /usr/src/sys/netinet/tcp_input.c:742
    > >> > 2nd 0xc1b74018 fxp0 (network driver) @ /usr/src/sys/dev/fxp/if_fxp.c:1172
    > >> >
    > >> > lock order reversal
    > >> > 1st 0xc1e06bb8 rtentry (rtentry) @ /usr/src/sys/net/route.c:1269
    > >> > 2nd 0xc1b74018 fxp0 (network driver) @ /usr/src/sys/dev/fxp/if_fxp.c:1172
    > >> >
    > >> > As for their backtraces, they are almost identical to the
    > >> > ones already posted.
    > >>
    > >> Are you using any applications that use multicast? Can you break into
    > >> DDB and capture the output of "show witness"?
    > >
    > > Also, are you using DEVICE_POLLING?
    >
    > I can reproduce this if I add DEVICE_POLLING to my kernel. And I see
    > Giant under "network driver" in the output of "show witness".
    >
    > If I apply your witness patch:
    > http://www.FreeBSD.org/~jhb/patches/witness.patch
    > then I get the following LOR:
    >
    > lock order reversal
    > 1st 0xc23e2018 fxp0 (network driver) @ /usr/src/sys/dev/fxp/if_fxp.c:1907
    > 2nd 0xc09387e0 Giant (Giant) @ /usr/src/sys/kern/kern_poll.c:460
    > KDB: stack backtrace:
    > kdb_backtrace(0,ffffffff,c0946470,c0947f28,c08d3a84) at kdb_backtrace+0x29
    > witness_checkorder(c09387e0,9,c086d0d3,1cc) at witness_checkorder+0x53c
    > _mtx_lock_flags(c09387e0,0,c086d0d3,1cc) at _mtx_lock_flags+0x5b
    > ether_poll_deregister(c23de000,c23e2000,c23e2018,0,e9295b60) at ether_poll_deregister+0x1d
    > fxp_stop(c23e2000,c23e2018,1,c084c9ff,787) at fxp_stop+0x21
    > fxp_init_body(c23e2000,c23e2018,0,c084c9ff,773) at fxp_init_body+0x31
    > fxp_init(c23e2000,8020690c,c23e2000,c264bb00,e9295bc0) at fxp_init+0x23
    > ether_ioctl(c23de000,8020690c,c264bb00,0,c264bb00) at ether_ioctl+0x50
    > fxp_ioctl(c23de000,8020690c,c264bb00,1,c0a86503) at fxp_ioctl+0x232
    > in_ifinit(c23de000,c264bb00,c24b3490,0,e9295c38) at in_ifinit+0x206
    > in_control(c270fde8,8040691a,c24b3480,c23de000,c248e900) at in_control+0x882
    > ifioctl(c270fde8,8040691a,c24b3480,c248e900,0) at ifioctl+0x198
    > soo_ioctl(c2647dc8,8040691a,c24b3480,c2271d00,c248e900) at soo_ioctl+0x2db
    > ioctl(c248e900,e9295d04,3,1,286) at ioctl+0x370
    > syscall(3b,3b,3b,8056e40,8059140) at syscall+0x22f
    > Xint0x80_syscall() at Xint0x80_syscall+0x1f
    > --- syscall (54, FreeBSD ELF32, ioctl), eip = 0x48136e4b, esp = 0xbfbfe5ec, ebp = 0xbfbfee38 ---
    > fxp0: link state changed to UP

    Yeah, because of this bug, DEVICE_POLLING really needs debug.mpsafenet=0.
    Perhaps someone should add NET_NEEDS_GIANT(polling) to sys/kern/kern_poll.c
    for now? The real fix is for the polling code to protect the internal data
    it touches in ether_poll_deregister() with a lock other than Giant, since
    every driver I've seen calls ether_poll_deregister() with the driver lock
    held, which sets up the driver lock -> Giant order that witness complains
    about above.
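    A minimal sketch of that kind of fix, assuming a userspace model in which
    pthread mutexes stand in for kernel mutexes. The names poll_mtx, pollrec,
    poll_register(), poll_deregister() and driver_stop() are hypothetical and
    are not the actual kern_poll.c interfaces. Deregistration takes only a
    dedicated lock, so calling it with the driver lock held never orders the
    driver lock before Giant. This stays deadlock-free only if the polling
    loop never calls a driver's handler while holding poll_mtx; that is an
    assumption of the sketch, not something the code enforces.

```c
/*
 * Toy userspace model (hypothetical names, not the kern_poll.c code):
 * the polling handler table is protected by its own mutex, poll_mtx,
 * instead of Giant, so a driver may deregister with its lock held.
 */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define POLL_LIST_LEN 8			/* hypothetical table size */

struct pollrec {
	void	*ifp;			/* stand-in for struct ifnet * */
	void	(*handler)(void *ifp, int count);
};

static pthread_mutex_t poll_mtx = PTHREAD_MUTEX_INITIALIZER;
static struct pollrec pr[POLL_LIST_LEN];
static int poll_handlers;		/* protected by poll_mtx */

/* Returns 0 on success, -1 if ifp is already registered or the table is full. */
int
poll_register(void *ifp, void (*handler)(void *, int))
{
	int error = 0, i;

	pthread_mutex_lock(&poll_mtx);
	for (i = 0; i < poll_handlers; i++)
		if (pr[i].ifp == ifp)
			error = -1;
	if (error == 0 && poll_handlers == POLL_LIST_LEN)
		error = -1;
	if (error == 0) {
		pr[poll_handlers].ifp = ifp;
		pr[poll_handlers].handler = handler;
		poll_handlers++;
	}
	pthread_mutex_unlock(&poll_mtx);
	return (error);
}

/*
 * May be called with a driver lock held.  poll_mtx is taken as a leaf
 * lock here, so the only order established is "driver lock -> poll_mtx";
 * Giant is never acquired.  Returns 0 if ifp was removed, -1 otherwise.
 */
int
poll_deregister(void *ifp)
{
	int error = -1, i;

	pthread_mutex_lock(&poll_mtx);
	for (i = 0; i < poll_handlers; i++) {
		if (pr[i].ifp == ifp) {
			/* Move the last entry into the freed slot. */
			pr[i] = pr[--poll_handlers];
			error = 0;
			break;
		}
	}
	pthread_mutex_unlock(&poll_mtx);
	return (error);
}

/* Model of a stop routine like fxp_stop(): runs with the driver lock held. */
int
driver_stop(pthread_mutex_t *driver_mtx, void *ifp)
{
	int error;

	pthread_mutex_lock(driver_mtx);
	error = poll_deregister(ifp);
	pthread_mutex_unlock(driver_mtx);
	return (error);
}
```

    Moving the last entry into the freed slot keeps the table dense, so
    registration and the (not shown) polling loop only ever scan the first
    poll_handlers entries.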

    > I also get another LOR:
    >
    > cd0: Attempt to query device size failed: NOT READY, Medium not present
    > lock order reversal
    > 1st 0xe35e0cc4 g_xdown (g_xdown) @ /usr/src/sys/geom/geom_io.c:465
    > 2nd 0xc09387e0 Giant (Giant) @ /usr/src/sys/geom/geom_disk.c:99
    > KDB: stack backtrace:
    > kdb_backtrace(0,ffffffff,c0945e30,c0947f28,c08d3a84) at kdb_backtrace+0x29
    > witness_checkorder(c09387e0,9,c0866bc0,63) at witness_checkorder+0x53c
    > _mtx_lock_flags(c09387e0,0,c0866bc0,63) at _mtx_lock_flags+0x5b
    > g_disk_start(c2632a50,e35e0cc4,0,c086722e,1d1) at g_disk_start+0x152
    > g_io_schedule_down(c2275480) at g_io_schedule_down+0x160
    > g_down_procbody(0,e35e0d38,0,c0606960,0) at g_down_procbody+0x5a
    > fork_exit(c0606960,0,e35e0d38) at fork_exit+0xa0
    > fork_trampoline() at fork_trampoline+0x8
    > --- trap 0x1, eip = 0, esp = 0xe35e0d6c, ebp = 0 ---
    > Trying to mount root from ufs:/dev/da0s1a

    Hummmm. That means that if anyone does an msleep() with g_xdown as the
    interlock while holding Giant, it could deadlock on resume, since msleep()
    always reacquires Giant before the interlock. Perhaps g_xdown should be an
    sx lock or some such.
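    To make the cycle concrete, here is a toy model of the two acquisition
    orders involved, assuming a tiny pairwise checker written only for
    illustration. check_order() and the LK_* lock ids are hypothetical and far
    simpler than the real WITNESS code, which builds a full lock-order graph.
    The g_disk_start() path establishes g_xdown -> Giant, while a thread waking
    from msleep() with Giant held wants Giant -> g_xdown. If one thread holds
    g_xdown and waits for Giant while another has just picked up Giant and
    waits for g_xdown, neither can proceed.

```c
/*
 * Toy lock-order checker (hypothetical; not witness(4)).  It records
 * only direct "held -> acquired" pairs and flags an acquisition whose
 * reverse pair has already been seen.
 */
#include <assert.h>
#include <stdbool.h>

enum lock_id { LK_GIANT, LK_G_XDOWN, LK_COUNT };

static bool seen_before[LK_COUNT][LK_COUNT];	/* [held][acquired] */

/* Returns false if acquiring 'acquired' while holding 'held' is a reversal. */
bool
check_order(enum lock_id held, enum lock_id acquired)
{
	if (seen_before[acquired][held])
		return (false);
	seen_before[held][acquired] = true;
	return (true);
}

/* g_disk_start() in the trace: g_xdown is held when Giant is taken. */
bool
g_down_path(void)
{
	return (check_order(LK_G_XDOWN, LK_GIANT));
}

/*
 * A thread holding Giant calls msleep() with g_xdown as the interlock.
 * On wakeup Giant is picked up again before the interlock, giving the
 * order Giant -> g_xdown.
 */
bool
msleep_wakeup_path(void)
{
	return (check_order(LK_GIANT, LK_G_XDOWN));
}
```

    Running g_down_path() first records g_xdown -> Giant, after which
    msleep_wakeup_path() is reported as a reversal, the same pair of orders
    that turns into a deadlock when the two paths run concurrently.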

    -- 
    John Baldwin <jhb@FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
    "Power Users Use the Power to Serve"  =  http://www.FreeBSD.org
    _______________________________________________
    freebsd-current@freebsd.org mailing list
    http://lists.freebsd.org/mailman/listinfo/freebsd-current
    To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org"
    
