Re: Panic on 7.4-RELEASE-p5



On Thu, Jan 26, 2012 at 07:09:06PM -0800, Jeremy Chadwick wrote:
On Thu, Jan 26, 2012 at 09:48:15PM -0500, Gary Palmer wrote:
Hi,

My Soekris firewall just panic'd

Fatal trap 12: page fault while in kernel mode
fault virtual address = 0x14001d
fault code = supervisor write, page not present
instruction pointer = 0x20:0xc06fd2d3
stack pointer = 0x28:0xd63557bc
frame pointer = 0x28:0xd63558a4
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 1869 (tcpdump)
trap number = 12
panic: page fault
KDB: stack backtrace:
db_trace_self_wrapper(c079bbfc,c07fb800,c078d680,d6355660,d6355660,...) at db_tr
ace_self_wrapper+0x26
panic(c078d680,c07b829c,c36d6d24,1,1,...) at panic+0xed
trap_fatal(c2c352d0,140000,2,8,d63556c8,...) at trap_fatal+0x234
trap_pfault(c2b13620,0,c368f460,4,c36d6b00,...) at trap_pfault+0x27a
trap(d635577c) at trap+0x34e
calltrap() at calltrap+0x6
--- trap 0xc, eip = 0xc06fd2d3, esp = 0xd63557bc, ebp = 0xd63558a4 ---
softdep_disk_io_initiation(c2ade474,1000,c3003738,1,c2aab9b8,...) at softdep_dis
k_io_initiation+0xb3
ffs_geom_strategy(c3003738,c2aab9b8,1000,edb000,0,...) at ffs_geom_strategy+0x10
c
ufs_strategy(d6355910,d6355910,c07f3980,c378bbdc,c2aab9b8,...) at ufs_strategy+0
x64
bufstrategy(c378bc9c,c2aab9b8,c368f460,c2aab9b8,c2bd46b4,...) at bufstrategy+0x2
e
bufwrite(c2aab9b8,c2aabb04,20,c368f460,0,...) at bufwrite+0xf4
cluster_wbuild(c378bbdc,4000,3b7,0,8,...) at cluster_wbuild+0x6c9
cluster_write(c378bbdc,c2bd46b4,edc000,0,7f,...) at cluster_write+0x715
ffs_write(d6355bc0,c07091e4,c378bc34,0,c378bc64,...) at ffs_write+0x837
VOP_WRITE_APV(c07e9500,d6355bc0,c368f460,c07a2aec,252,...) at VOP_WRITE_APV+0xa0
vn_write(c2e2adf4,d6355c54,c36d7500,0,c368f460,...) at vn_write+0x26f
dofilewrite(d6355c54,ffffffff,ffffffff,0,c2e2adf4,...) at dofilewrite+0x84
kern_writev(c368f460,4,d6355c54,d6355c74,1,...) at kern_writev+0x58
write(c368f460,d6355cf8,c,c0798e1b,39bfbd8f,...) at write+0x50
syscall(d6355d38) at syscall+0x1b9
Xint0x80_syscall() at Xint0x80_syscall+0x20
--- syscall (4, FreeBSD ELF32, write), eip = 0x28354663, esp = 0xbfbfe85c, ebp =
0xbfbfe878 ---
Uptime: 11d5h5m49s
Physical memory: 503 MB
Dumping 112 MB: 97 81 65 49 33 17 1
Dump complete
Automatic reboot in 1 seconds - press a key on the console to abort

fsck prevented the box from coming up automatically

/dev/ufs/varlog: PARTIALLY TRUNCATED INODE I=1496123
/dev/ufs/varlog: UNEXPECTED SOFT UPDATE INCONSISTENCY; RUN fsck MANUALLY.
THE FOLLOWING FILE SYSTEM HAD AN UNEXPECTED INCONSISTENCY:
ufs: /dev/ufs/varlog (/var/log)

Anyone seen this before? Any clues?

Please shed some light as to your filesystem and disk setup.
Partitioning details, any GEOM layers you use, you get the idea.
The more verbose the better. Please be sure to state which filesystems
use softupdates and which do not (this matters a lot in this situation).

kryten# mount
/dev/ad0s1a on / (ufs, local, read-only)
devfs on /dev (devfs, local)
/dev/md0 on /etc (ufs, local)
/dev/md1 on /etc (ufs, local)
/dev/ufs/home on /home (ufs, local, soft-updates)
/dev/ufs/var on /var (ufs, local, soft-updates)
/dev/ufs/varlog on /var/log (ufs, local, soft-updates)
devfs on /var/db/dhcpd/dev (devfs, local)

(note the second md on /etc is an anomaly - there is normally only one and
the daily security report already flagged that up as being different.
Haven't looked into where / why that happened yet. This is a nanobsd
build)

kryten# glabel list
Geom name: ad0s2a
Providers:
1. Name: ufsid/4ef571bfcc16bfa6
Mediasize: 1015636480 (969M)
Sectorsize: 512
Mode: r0w0e0
secoffset: 0
offset: 0
seclength: 1983665
length: 1015636480
index: 0
Consumers:
1. Name: ad0s2a
Mediasize: 1015636480 (969M)
Sectorsize: 512
Mode: r0w0e0

Geom name: ad1s1d
Providers:
1. Name: ufs/varlog
Mediasize: 107374182912 (100G)
Sectorsize: 512
Mode: r1w1e1
secoffset: 0
offset: 0
seclength: 209715201
length: 107374182912
index: 0
Consumers:
1. Name: ad1s1d
Mediasize: 107374182912 (100G)
Sectorsize: 512
Mode: r1w1e2

Geom name: ad1s1e
Providers:
1. Name: ufs/home
Mediasize: 8589934592 (8.0G)
Sectorsize: 512
Mode: r1w1e1
secoffset: 0
offset: 0
seclength: 16777216
length: 8589934592
index: 0
Consumers:
1. Name: ad1s1e
Mediasize: 8589934592 (8.0G)
Sectorsize: 512
Mode: r1w1e2

Geom name: ad1s1f
Providers:
1. Name: ufs/var
Mediasize: 3438215680 (3.2G)
Sectorsize: 512
Mode: r1w1e1
secoffset: 0
offset: 0
seclength: 6715265
length: 3438215680
index: 0
Consumers:
1. Name: ad1s1f
Mediasize: 3438215680 (3.2G)
Sectorsize: 512
Mode: r1w1e2

Geom name: ad0s1a
Providers:
1. Name: ufsid/4f125433ead0d4df
Mediasize: 1015636480 (969M)
Sectorsize: 512
Mode: r0w0e0
secoffset: 0
offset: 0
seclength: 1983665
length: 1015636480
index: 0
Consumers:
1. Name: ad0s1a
Mediasize: 1015636480 (969M)
Sectorsize: 512
Mode: r0w0e0

Geom name: ad0s3
Providers:
1. Name: ufsid/47d97f881feaef9a
Mediasize: 17031168 (16M)
Sectorsize: 512
Mode: r0w0e0
secoffset: 0
offset: 0
seclength: 33264
length: 17031168
index: 0
Consumers:
1. Name: ad0s3
Mediasize: 17031168 (16M)
Sectorsize: 512
Mode: r0w0e0


# grep 'ad[01]' /var/run/dmesg.boot
ad0: 1953MB <SanDisk SDCFH-2048 HDX 4.04> at ata0-master WDMA2
ad1: DMA limited to UDMA33, device found non-ATA66 cable
ad1: 114473MB <Hitachi HTS542512K9SA00 BB2OC31P> at ata0-slave UDMA33
Trying to mount root from ufs:/dev/ad0s1a

As for the actual crash -- I assume the filesystem in question which
induced this has softupdates.

I have no idea from the backtrace which filesystem was being written
to at the time. The filesystem that it complained was unable to
be checked on boot does have softupdates on it (although thinking about
it I probably don't actually need SU)

Have you booted this box into single-user and issued a manual fsck on
all the filesystems? If not, please do so ASAP and provide details of
the results.

I don't use background fsck and when the box tried rebooting it couldn't
handle an unexpected inconsistency in one filesystem and dropped to
single user mode. I did 2 fsck passes from single user mode before
exiting out of the shell and letting the box finish booting.

( from a login shell after the boot has finished )

# grep fsck /etc/rc.conf
background_fsck="NO"

( from my console log )

Starting file system checks:
/dev/ad0s1a: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/ad0s1a: clean, 985492 free (33988 frags, 118938 blocks, 1.7% fragmentation)
/dev/ad0s3: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/ad0s3: clean, 30883 free (139 frags, 3843 blocks, 0.4% fragmentation)
/dev/ufs/home: 142 files, 1974 used, 4158281 free (81 frags, 519775 blocks, 0.0%
fragmentation)
/dev/ufs/var: UNREF FILE I=75348 OWNER=someuser MODE=100600
/dev/ufs/var: SIZE=2048 MTIME=Jan 25 14:47 2012 (CLEARED)
/dev/ufs/var: UNREF FILE I=75349 OWNER=someuser MODE=100644
/dev/ufs/var: SIZE=39402 MTIME=Jan 25 14:43 2012 (CLEARED)
/dev/ufs/var: UNREF FILE I=75350 OWNER=someuser MODE=100600
/dev/ufs/var: SIZE=2048 MTIME=Jan 25 14:47 2012 (CLEARED)
/dev/ufs/var: FREE BLK COUNT(S) WRONG IN SUPERBLK (SALVAGED)
/dev/ufs/var: SUMMARY INFORMATION BAD (SALVAGED)
/dev/ufs/var: BLK(S) MISSING IN BIT MAPS (SALVAGED)
/dev/ufs/var: 499 files, 2861 used, 1661962 free (234 frags, 207716 blocks, 0.0%
fragmentation)
/dev/ufs/varlog: PARTIALLY TRUNCATED INODE I=1496123
/dev/ufs/varlog: UNEXPECTED SOFT UPDATE INCONSISTENCY; RUN fsck MANUALLY.
THE FOLLOWING FILE SYSTEM HAD AN UNEXPECTED INCONSISTENCY:
ufs: /dev/ufs/varlog (/var/log)
Automatic file system check failed; help!
ERROR: ABORTING BOOT (sending SIGTERM to parent)!

# fsck -p
/dev/ad0s1a: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/ad0s1a: clean, 985492 free (33988 frags, 118938 blocks, 1.7% fragmentation)
/dev/ad0s3: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/ad0s3: clean, 30883 free (139 frags, 3843 blocks, 0.4% fragmentation)
/dev/ufs/home: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/ufs/home: clean, 4158281 free (81 frags, 519775 blocks, 0.0% fragmentation)
/dev/ufs/var: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/ufs/var: clean, 1661962 free (234 frags, 207716 blocks, 0.0% fragmentation)
/dev/ufs/varlog: PARTIALLY TRUNCATED INODE I=1496123
/dev/ufs/varlog: UNEXPECTED SOFT UPDATE INCONSISTENCY; RUN fsck MANUALLY.
THE FOLLOWING FILE SYSTEM HAD AN UNEXPECTED INCONSISTENCY:
ufs: /dev/ufs/varlog (/var/log)
# fsck -y /dev/ufs/varlog
** /dev/ufs/varlog
** Last Mounted on /var/log
** Phase 1 - Check Blocks and Sizes
PARTIALLY TRUNCATED INODE I=1496123
SALVAGE? yes

** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
FREE BLK COUNT(S) WRONG IN SUPERBLK
SALVAGE? yes

SUMMARY INFORMATION BAD
SALVAGE? yes

BLK(S) MISSING IN BIT MAPS
SALVAGE? yes

907 files, 15527664 used, 36687228 free (860 frags, 4585796 blocks, 0.0% fragmen
tation)

***** FILE SYSTEM MARKED CLEAN *****

***** FILE SYSTEM WAS MODIFIED *****
# fsck -y /dev/ufs/varlog
** /dev/ufs/varlog
** Last Mounted on /var/log
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
907 files, 15527664 used, 36687228 free (860 frags, 4585796 blocks, 0.0% fragmen
tation)
# fsck -p
/dev/ad0s1a: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/ad0s1a: clean, 985492 free (33988 frags, 118938 blocks, 1.7% fragmentation)
/dev/ad0s3: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/ad0s3: clean, 30883 free (139 frags, 3843 blocks, 0.4% fragmentation)
/dev/ufs/var: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/ufs/home: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/ufs/home: clean, 4158281 free (81 frags, 519775 blocks, 0.0% fragmentation)
/dev/ufs/varlog: FILE SYSTEM CLEAN; SKIPPING CHECKS
/dev/ufs/varlog: clean, 36687228 free (860 frags, 4585796 blocks, 0.0% fragmenta
tion)
/dev/ufs/var: clean, 1661962 free (234 frags, 207716 blocks, 0.0% fragmentation)

These are confirmed cases of background_fsck not properly
catching/dealing with all filesystem errors, thus letting some
transparently pass through. You are running 7.4-p2, which makes this
likelihood even more probable. This is why I advocate (and have for
years) background_fsck="no" in rc.conf. I can dig up the (very long and
semi-heated) thread discussing this if you want, but I probably won't
get to it until next week. I have too much going on with job-related
things outside of FreeBSD to spend a lot of time with it now.

Way ahead of you Jeremy :) None of my boxes use background FSCK for
that very reason. I prefer to wait a bit longer for the boot than
suffer ... oddities.

Thanks,

Gary
_______________________________________________
freebsd-stable@xxxxxxxxxxx mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscribe@xxxxxxxxxxx"



Relevant Pages