Re: After install - Fatal trap 18 ATA problem?

I'm trying to get FreeBSD installed on one of my systems
and I'm getting the error stated below. I did have
FreeBSD 6-STABLE installed a few months ago on this very
system. The only change is that FreeBSD is now installed
on the second hard drive instead of the first. This is
using the -CURRENT snapshot for this month. The install
goes just fine. I get a very similar error when I
install 6.1, too.

This seems to be the same problem as:
06-03/msg00539.html

But I don't have a built-in compact flash reader attached
via ATA.

Full verbose boot+backtrace:

rr232x: no controller detected.
ata0-slave: pio=PIO4 wdma=WDMA2 udma=UDMA100 cable=80 wire
ata0-master: pio=PIO4 wdma=WDMA2 udma=UDMA66 cable=80 wire
ad0: setting PIO4 on nForce2 Pro chip
ad0: setting UDMA66 on nForce2 Pro chip
ad0: 17206MB <IBM DJNA-371800 J78OA30K> at ata0-master

Fatal trap 18: integer divide fault while in kernel mode
cpuid = 0; apic id = 00
instruction pointer = 0x20:0xc089b49f
stack pointer = 0x28:0xc0c20b64
frame pointer = 0x28:0xc0c20bec
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 0 (swapper)
[thread pid 0 tid 0 ]
Stopped at __qdivrem+0x3b: divl %ecx,%eax
db> bt
Tracing pid 0 tid 0 td 0xc0a02fb8
__qdivrem(219b700,0,0,0,0) at __qdivrem+0x3b
__udivdi3(219b700,0,0,0) at __udivdi3+0x16

Looks like an attempt to divide something (0x219b700) by
zero using quad_t arithmetic.

at ad_describe+0x1b3
ad_attach(c26e8580) at ad_attach+0x1e7
0) at device_attach+0x58
device_probe_and_attach(c26e8580) at
bus_generic_attach(c25d2a80,c25d2a80,1,0,c26e8580) at
ata_identify(c25d2a80) at ata_identify+0x1c8
ata_boot_attach(0) at ata_boot_attach+0x3e
af5) at run_interrupt_driven_config_hooks+0x18
mi_startup() at mi_startup+0x96
begin() at begin+0x2c
db> ps
Anish Mistry

FWIW, I saw an integer divide fault apparently related to
the ata driver when I tried to test a low-end VIA-based
mobo with FreeBSD. I gave it away soon afterwards and had no
time to debug it, though.

Could you use gdb to see which C code is at ad_describe+0x1b3
in your kernel?

How do I do this without creating a kernel dump? Do I need
to set up remote GDB over a serial console?

No, you don't. It's much easier than that. You were
installing FreeBSD from a CURRENT snapshot when the panic
happened, weren't you? If so, first get a working machine with
a not-too-old GDB. FreeBSD 5.x or 6.x will do. Then locate
kernel.debug or kernel.symbols in the boot/kernel subdir on the
installation CD. It's the kernel that panicked. Well,
kernel.symbols isn't the kernel itself, only its symbols.
OTOH, we need nothing but the symbols.

Unpack the snapshot's kernel source to somewhere. This is as
easy as typing:

cd /cdrom/7.0-CURRENT/src

For the archives...
You need to create the usr/src directory or tar will fail:
mkdir -p /usr/home/me/somewhere/usr/src

Yes, you're quite right here!

env DESTDIR=/usr/home/me/somewhere sh install.sh sys

And now load the kernel binary in GDB (not kgdb):

gdb /cdrom/boot/kernel/kernel.symbols
(gdb) dir /usr/home/me/somewhere

Perhaps GDB will find the source files more readily if you put
them directly into /usr/src (after renaming the original /usr/src
to, e.g., /usr/src.orig). That way you'll also prevent GDB from
picking the wrong source tree.

mv /usr/src /usr/src.orig
mkdir /usr/src
cd /cdrom/7.0-CURRENT/src
sh install.sh sys
gdb /cdrom/boot/kernel/kernel.symbols

Now you should be able to examine the source code using binary
code offsets:

(gdb) list *(ad_describe+0x1b3)

The "list" command will show you which line in which source
file is responsible for the division by zero, plus 9 more lines
around it for context. The output can be posted here as is;
it's quite informative.

(gdb) list *(ad_describe+0x1b3)
0xc04e224b is in ad_describe

I suppose you put the CURRENT sources under /usr/src at last,
didn't you?

378 device_get_unit(ch->dev),
379 (atadev->unit == ATA_MASTER) ? "master" : "slave",
380 (adp->flags & AD_F_TAG_ENABLED) ? "tagged " : "",
381 ata_mode2str(atadev->mode));
382 if (bootverbose) {
383 device_printf(dev, "%ju sectors [%juC/%dH/%dS] "
384 "%d sectors/interrupt %d depth queue\n", adp->total_secs,
385 adp->total_secs / (adp->heads *
386 adp->heads, adp->sectors, atadev->max_iosize / DEV_BSIZE,
387 adp->num_tags + 1);

Consequently, adp->heads or adp->sectors was 0 for ad0. It means
that the ata(4) driver had some kind of trouble when reading the
disk's parameters from the ATA controller. Now you may want to
contact the author of ata(4), Soren Schmidt <sos@xxxxxxxxxxx>, for
further instructions on how to debug this problem. I hope he'll
find all this info useful. Thanks!
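
For the archives, here is a minimal user-space sketch of the failure
mode, not a proposed patch and not ata(4) code. The struct disk_geom
type and the geom_* helpers are hypothetical names made up for
illustration, and a 512-byte sector is assumed. It shows two things:
0x219b700 sectors at 512 bytes each come to 17206 MB, which matches
the "ad0: 17206MB" line in the boot log and suggests the dividend in
the backtrace was the sector count; and checking the heads * sectors
product before dividing avoids the divide by zero.

```c
#include <stdint.h>
#include <stdio.h>

#define SECTOR_BYTES 512u	/* assumed sector size */

/* Hypothetical stand-in for the geometry fields named in the listing. */
struct disk_geom {
	uint64_t total_secs;	/* total addressable sectors */
	unsigned heads;		/* heads per cylinder */
	unsigned sectors;	/* sectors per track */
};

/* Capacity in MB (2^20 bytes), rounded down. */
static uint64_t
geom_megabytes(const struct disk_geom *g)
{
	return (g->total_secs * SECTOR_BYTES / (1024u * 1024u));
}

/*
 * Cylinder count, or 0 when heads or sectors is zero, so the
 * 64-bit division is never attempted with a zero divisor.
 */
static uint64_t
geom_cylinders(const struct disk_geom *g)
{
	uint64_t per_cyl = (uint64_t)g->heads * g->sectors;

	return (per_cyl == 0 ? 0 : g->total_secs / per_cyl);
}

/* Print a line in the spirit of the bootverbose message. */
static void
geom_describe(const struct disk_geom *g)
{
	printf("%ju sectors [%juC/%uH/%uS]\n", (uintmax_t)g->total_secs,
	    (uintmax_t)geom_cylinders(g), g->heads, g->sectors);
}
```

With heads and sectors both 0, as apparently happened for ad0,
geom_cylinders() returns 0 instead of trapping. With an assumed
geometry of 16 heads and 63 sectors (not taken from this disk), the
same sector count gives 34960 cylinders.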

Do you have any insight on what I can do further to debug this


Anish Mistry

