Re: 6.1 kernel unable to find /dev ?



On Sat, 3 Jun 2006, Brian Tao wrote:

I had a very stable 6.1-R amd64 server (once I swapped out some
bad RAM, that is) that needed a couple more hard drives installed.
There were some problems with the upgrade (device renumbering woes,
basically... topic of another thread), and it had to be rolled back.

Upon rolling back, the previously-good kernel would no longer
complete the boot after the device probe. I saw two types of panics
on the serial console:

| Trying to mount root from ufs:/dev/ad4s1a
| Lookup of /dev for devfs, error: 20

Error 20 is ENOTDIR, which means something along the requested path exists but is not a directory. From this output it looks like the root directory entry is somehow corrupted or being misinterpreted.
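
To illustrate what ENOTDIR means, here is a minimal Python sketch that reproduces the error in userland on a POSIX-like system. It is not what the kernel does during the root mount; the helper name and the temporary file standing in for a non-directory "dev" are assumptions made for the example.

```python
import errno
import os
import tempfile


def lookup_through_file():
    """Return the errno raised when a path component is a regular file.

    Hypothetical helper: creates a plain file named "dev" and then tries
    to look up a child beneath it, as if it were a directory.
    """
    with tempfile.TemporaryDirectory() as tmp:
        not_a_dir = os.path.join(tmp, "dev")
        with open(not_a_dir, "w"):
            pass  # an empty regular file, not a directory
        try:
            os.stat(os.path.join(not_a_dir, "ad4s1a"))
        except OSError as exc:
            return exc.errno
    return None


code = lookup_through_file()
# Print the numeric value, its symbolic name, and the system's message.
print(code, errno.errorcode.get(code), os.strerror(code))
```

The numeric value depends on the platform; on FreeBSD, ENOTDIR is 20, matching the console output.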

| exec /sbin/init: error 20
| exec /sbin/oinit: error 20
| exec /sbin/init.bak: error 20
| exec /rescue/init: error 20
| exec /stand/sysinstall: error 20
| init: not found in path
| /sbin/init:/sbin/oinit:/sbin/init.bak:/rescue/init:/stand/sysinstall
| panic: no init
| Uptime: 8s
| Cannot dump. No dump device defined.
| Automatic reboot in 15 seconds - press a key on the console to abort
| --> Press a key on the console to reboot,
| --> or switch off the system now.

... and:

| Trying to mount root from ufs:/dev/ad4s1a
| pid 47 (sh), uid 0: exited on signal 11
| TPTE at 0xffff8000040028e0 IS ZERO @ VA 80051c000
| panic: bad pte
| Uptime: 8s

This is usually indicative of bad RAM or a faulty processor. Since you seem to be having disk problems, it may just be due to the disk returning faulty data. Or there is a bad kernel module in the mix that is randomly corrupting data.

The first one is suggesting that /dev does not exist (or is not a
directory)... I'm thinking this means that devfs is somehow
unavailable, but I did not think it is even possible to disable devfs
via the kernel config file these days.

The second one leaves me clueless... I have not been able to find
any useful information on that panic during boot. Granted, I've only
seen the "bad pte" panic twice... all other reboot attempts result in
the first type of problem.

Fortunately, I did happen to keep an old 6.0-RELEASE-p6 kernel
around (Apr 15 2006 build). That kernel boots fine, using the same
filesystem as newer kernels on that drive. I am up-to-date with the
RELENG_6_1 tag. Should I perhaps do a make installkernel installworld
before rebooting? The installed binaries on the server are from an
early 6.1-RELEASE (which *was* successfully booted by this server). I
am running into a few minor but surmountable problems because of the
older kernel version, but I obviously would like to get my world and
kernel back in sync ASAP.

My gut feeling is that there is still a disconnect over which device is the root filesystem. Either that, or there is hidden corruption that 6.0 isn't noticing but 6.1 is. Here's what I'd do next:

1. Capture the boot output from both the working 6.0 kernel and your broken 6.1 kernel and compare the two. If there are differences or errors being returned from the ATA controller or disks then those will need to be addressed.

2. Try a splat-over reinstall of 6.1-R from CD to force everything to match up. Mount the filesystems but don't mark them to be newfs'd. Install the GENERIC kernel only.
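
For step 1, a minimal Python sketch of the comparison, assuming both boot logs have already been captured from the serial console as lists of lines. The function name and the log file labels are hypothetical; the sample lines are taken from the console output quoted above.

```python
import difflib


def diff_boot_logs(good_lines, bad_lines):
    """Return unified-diff lines between two captured boot logs."""
    return list(difflib.unified_diff(
        good_lines,
        bad_lines,
        fromfile="boot-6.0.log",  # hypothetical label for the working kernel
        tofile="boot-6.1.log",    # hypothetical label for the failing kernel
        lineterm="",
    ))


# Sample input only; real captures would hold the full device probe output.
good = ["Trying to mount root from ufs:/dev/ad4s1a"]
bad = [
    "Trying to mount root from ufs:/dev/ad4s1a",
    "Lookup of /dev for devfs, error: 20",
]

for line in diff_boot_logs(good, bad):
    print(line)
```

Lines prefixed with "+" appear only in the failing boot, so any ATA controller or disk errors unique to the 6.1 kernel would stand out there.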

If you are going to be tracking a branch, please read the instructions at the end of src/UPDATING on how to perform the build. There is a specific procedure, and not following it can cause significant issues. While unlikely, it is possible to irreparably damage the system by not following the instructions to the letter.

--
Doug White | FreeBSD: The Power to Serve
dwhite@xxxxxxxxxxxxx | www.FreeBSD.org