Re: Panic in 6.2-PRERELEASE with bge on amd64



On Tue, 2007-01-09 at 12:50 +1100, Bruce Evans wrote:
On Mon, 8 Jan 2007, Sven Willenberger wrote:

On Mon, 2007-01-08 at 16:06 +1100, Bruce Evans wrote:
On Sun, 7 Jan 2007, Sven Willenberger wrote:

The short and dirty of the dump:
...
--- trap 0xc, rip = 0xffffffff801d5f17, rsp = 0xffffffffb371ab50, rbp = 0xffffffffb371aba0 ---
bge_rxeof() at bge_rxeof+0x3b7

What is the instruction here?

I will do my best to ferret out the information you need. For the
bge_rxeof() at bge_rxeof+0x3b7 line, the instruction is:

0xffffffff801d5f17 <bge_rxeof+951>: mov %r15,0x28(%r14)
...
Looks like a null pointer panic anyway. I guess the instruction is
movl to/from 0x28(%reg) where %reg is a null pointer.


from the above lines, apparently %r14 is null then.

Yes. It's a bit surprising that the access is a write.
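
To illustrate why a store to 0x28(%r14) with %r14 == 0 faults at address 0x28, here is a minimal sketch. The struct below is hypothetical and is not the real struct mbuf layout; it only places one member at offset 0x28 so the arithmetic is visible. The names hypothetical_buf, owner and member_address are assumptions made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout, NOT the real struct mbuf: five 8-byte fields
 * followed by the member of interest, which therefore sits at 0x28.
 */
struct hypothetical_buf {
	uint64_t header[5];	/* occupies offsets 0x00..0x27 */
	uint64_t owner;		/* lands at offset 0x28 */
};

/*
 * Address the CPU would touch for "base->owner" given a base address.
 * With base == 0 (a null pointer) the result is just the member offset.
 */
uintptr_t
member_address(uintptr_t base)
{
	/* Sanity check on the assumed layout. */
	assert(offsetof(struct hypothetical_buf, owner) == 0x28);
	return (base + offsetof(struct hypothetical_buf, owner));
}
```

Calling member_address(0) gives the faulting address for a null base, which is how a small constant displacement in the trapping instruction points at a null structure pointer rather than a wild one.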

...
#8 0xffffffff801db818 in bge_intr (xsc=0x0) at /usr/src/sys/dev/bge/if_bge.c:2707

What is the statement here? It presumably follows a null pointer and only
the expression for the pointer is interesting. xsc is already null but
that is probably a bug in gdb, or the result of excessive optimization.
Compiling kernels with -O2 has little effect except to break debugging.

the block of code from if_bge.c:

2705 if (ifp->if_drv_flags & IFF_DRV_RUNNING) {
2706 /* Check RX return ring producer/consumer. */
2707 bge_rxeof(sc);
2708
2709 /* Check TX ring producer/consumer. */
2710 bge_txeof(sc);
2711 }

Oops. I should have asked for the statement in bge_rxeof().

#7 0xffffffff801d5f17 in bge_rxeof (sc=0xffffffff8836b000) at /usr/src/sys/dev/bge/if_bge.c:2528
2528 m->m_pkthdr.len = m->m_len = cur_rx->bge_len - ETHER_CRC_LEN;

(where m is defined as:
2449 struct mbuf *m = NULL;
)



By default -O2 is passed to CC (I don't use any custom make flags; I
only define CPUTYPE in my /etc/make.conf).

-O2 is unfortunately the default for COPTFLAGS for most arches in
sys/conf/kern.pre.mk. All of my machines and most FreeBSD cluster
machines override this default in /etc/make.conf.
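
As a sketch of what such an override might look like (the exact flags used on the machines mentioned above are not stated in this thread, so the value here is an assumption), a single COPTFLAGS line in /etc/make.conf lowers the kernel optimization level:

```
# /etc/make.conf
# Assumed example value: build kernels with -O instead of the -O2 default,
# so that gdb line numbers and call frames are easier to trust.
COPTFLAGS= -O -pipe
```

The kernel has to be rebuilt and reinstalled for the change to take effect.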

With the override overridden for RELENG_6 amd64, gcc inlines bge_rxeof(),
so your environment must be a little different to get even the above
info. I think gdb can show the correct line numbers but not the call
frames (since there is no call). ddb and the kernel stack trace can
only show the call frames for actual calls.

With -O1, I couldn't find any instruction similar to the mov to the
null pointer + 0x28. 0x28 is a popular offset in mbufs.

If you have a suggestion for an /etc/make.conf line, I can recompile the
kernel accordingly assuming it still panics or locks up after the change
of interface noted below.


The short of it is that this interface sees pretty much non-stop traffic
as this is a mailserver (final destination) and is constantly being
delivered to (direct disk access) and mail being retrieved (remote
machine(s) with nfs mounted mail spools). If a momentary down of the
interface is enough to completely panic the driver and then the kernel,
this hardly seems "robust" if, in fact, this is what is happening. So
the question arises as to what would be causing the down/up of the
interface; I could start looking at the cable, the switch it's connected
to and ... any other ideas? (I don't have watchdog enabled or anything
like that, for example).

I don't think down/up can occur in normal operation, since it takes ioctls
or a watchdog timeout to do it. Maybe some ioctls other than a full
down/up can cause problems... bge_init() is called for the following
ioctls:
- mtu changes
- some near down/up (possibly only these)
Suspend/resume and of course detach/attach do much the same things as
down/up.
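
The dispatch described above can be modelled roughly as follows. This is a simplified sketch, not code from if_bge.c: the request codes, struct model_softc and model_init() are hypothetical stand-ins, and the model reinitializes on every MTU change and on a down-to-up transition only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request codes standing in for the real ioctl numbers. */
enum model_req { REQ_SET_MTU, REQ_SET_FLAGS, REQ_OTHER };

struct model_softc {
	int  mtu;
	bool up;		/* administrative state */
	int  init_count;	/* number of full reinitializations */
};

/* Stand-in for a full reinitialization; here it only counts calls. */
static void
model_init(struct model_softc *sc)
{
	sc->init_count++;
}

void
model_ioctl(struct model_softc *sc, enum model_req req, int arg)
{
	assert(sc != NULL);
	switch (req) {
	case REQ_SET_MTU:
		/* An MTU change triggers a full reinitialization. */
		sc->mtu = arg;
		model_init(sc);
		break;
	case REQ_SET_FLAGS:
		/* arg != 0 requests "up"; reinitialize on down -> up. */
		if (arg != 0 && !sc->up)
			model_init(sc);
		sc->up = (arg != 0);
		break;
	case REQ_OTHER:
		/* Other requests leave the hardware state alone. */
		break;
	}
}
```

In this model only the two listed request types ever reach model_init(), which matches the observation that a reinitialization needs an explicit ioctl (or a watchdog timeout, not modelled here).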

BTW, I added some sysctls and found it annoying to have to do down/up
to make the sysctls take effect. Sysctls in several other NIC drivers
require the same, since doing a full reinitialization is easiest.
Since I am tuning using sysctls, I got used to doing down/up too much.

Similarly for the mtu ioctl. I think a full reinitialization is used
for mtu changes mainly in cases where the change switches on/off support
for jumbo buffers. Then there is a lot of buffer reallocation to be
done, and interfaces have to be stopped to ensure that the buffers
being deallocated are not in use, etc.
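
The condition guessed at above (a reallocation is only strictly needed when an MTU change crosses the jumbo threshold) could be sketched like this. The helper names and the MODEL_ETHERMTU constant are assumptions for illustration; the real driver may simply reinitialize on every MTU change.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_ETHERMTU 1500	/* standard Ethernet payload limit */

/* True when an MTU of this size would need jumbo receive buffers. */
static bool
uses_jumbo(int mtu)
{
	return (mtu > MODEL_ETHERMTU);
}

/*
 * True when switching from old_mtu to new_mtu turns jumbo support
 * on or off, i.e. when the receive buffers must be reallocated.
 */
bool
mtu_change_needs_realloc(int old_mtu, int new_mtu)
{
	assert(old_mtu > 0 && new_mtu > 0);
	return (uses_jumbo(old_mtu) != uses_jumbo(new_mtu));
}
```

A change from 1500 to 9000 crosses the threshold and would need the stop, reallocate, restart sequence; a change from 9000 to 8000 would not.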

Bruce

As this was connected to a gigE switch with mtu left at 1500 I suppose
it is possible that perhaps some mtu discovery/change may have been
happening on the switch but that seems a bit out in left field. For now
I am using the fxp interface connected to the same switch to see if the
issue continues (the change of interface was driven by a hard lockup
yesterday where I could not even type anything on the term).

Sven

_______________________________________________
freebsd-stable@xxxxxxxxxxx mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscribe@xxxxxxxxxxx"
