SUMMARY: domain panic

From: Cohen, Andy (Andy.Cohen_at_cognex.com)
Date: 07/16/04

    Date: Fri, 16 Jul 2004 15:12:54 -0400
    To: "Tru64-Unix-Managers (E-mail)" <tru64-unix-managers@ornl.gov>
    
    

    Whew! More outstanding help from the list!

    Basically, the suggestions were to run /sbin/advfs/verify and/or /sbin/advfs/fixfdmn. I did both, but it was fixfdmn that did the trick for me.

    ... and to those that kindly suggested I RTFM -- I would've if I had one :-)

    The most information came from Derek Haining:

    +_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_
    Domain panics render the filesystem unavailable until the next mount.
    Sometimes you must reboot the system before you can attempt to mount
    the filesystem again.

    At this point I would recommend commenting this filesystem out of
    /etc/fstab, rebooting the system, and then checking the filesystem
    using /sbin/advfs/fixfdmn.

    Allow me to give you a few tips on fixfdmn if you haven't used it
    before. First, I always recommend running it in "no fix" mode first.
    This is done using the "-n" flag.

    Once fixfdmn is done examining the domain (which must not be in use at
    the time), you should read the output log file. This is named
    something like:

            fixfdmn.kingdom_domain.log

    :)

    I expect that you will find that it attempts to clear 512 pages of the
    transaction log file. This is normal and, in fact, is always done
    (unless you have a really old copy of fixfdmn). Look for problems that
    fixfdmn finds and attempts to correct. It may simply be that the problem
    that made it difficult to remove the volume also prevented the other
    data structures that keep information about the domain, such as the
    number of volumes in use, from being updated. If that is all it is,
    this should be a simple fix.

    Anyway, if fixfdmn finds problems, you should probably repair the domain
    by running fixfdmn without the "-n" flag.

    You could try running verify, but verify mounts the domain on a hidden
    mount point and uses the kernel to check and/or fix the domain. If the
    meta-data on the disk is corrupted, the kernel will simply force another
    domain panic. That is why we wrote fixfdmn: to get away from the
    restrictions imposed by the kernel. We *know* the meta-data could be
    corrupt. We're trying to fix it, for goodness' sake! :)
    +_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_

    but thanks also to Graham Allen, Jenny Butler, George Banane, and Kevin Raubenolt.

    Andy
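
    Derek's recovery sequence can be sketched in POSIX shell as below. This is an editor's sketch, not his script: the helper names (comment_out_domain, fixfdmn_check_then_fix), the FIXFDMN override (there so the sequence can be rehearsed against a stand-in), the "domain#fileset" fstab layout and the location of the log in the current directory are all assumptions, and the log name pattern is only inferred from his example. Only the /sbin/advfs/fixfdmn path and the "-n" (no fix) flag come from the advice itself.

```sh
#!/bin/sh
# Editor's sketch: hypothetical helpers, not part of AdvFS.
# fixfdmn path; overridable so the sequence can be rehearsed elsewhere.
FIXFDMN=${FIXFDMN:-/sbin/advfs/fixfdmn}

# Step 1: comment out every fstab entry for the domain before rebooting.
# Assumes AdvFS entries start with "domain#fileset", e.g.
#   home_domain#home  /home  advfs  rw 0 2
comment_out_domain() {
    domain=$1
    fstab=${2:-/etc/fstab}
    cp -p "$fstab" "$fstab.bak" || return 1    # keep the original
    # Prefix matching lines with "#"; already-commented lines don't match.
    sed "s|^\(${domain}#\)|#\1|" "$fstab.bak" > "$fstab"
}

# Steps 2-4 (after the reboot): check with -n, read the log, then repair
# only if the operator agrees. The domain must not be mounted or in use.
# The -n exit status is deliberately not used to decide anything.
fixfdmn_check_then_fix() {
    domain=$1
    "$FIXFDMN" -n "$domain"
    log="fixfdmn.$domain.log"    # assumed name and location
    [ -f "$log" ] && more "$log"
    printf 'Repair %s by running fixfdmn without -n? [y/N] ' "$domain"
    read answer
    case $answer in
        [yY]*) "$FIXFDMN" "$domain" ;;
        *)     echo "Repair skipped." ;;
    esac
}
```

    Typical use would be "comment_out_domain home_domain", a reboot, and then "fixfdmn_check_then_fix home_domain" with the domain unmounted. The repair pass is gated on a prompt rather than on fixfdmn's exit status, because the advice is to read the log before deciding.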

    ORIGINAL QUESTION
    =================
    I removed a volume from a domain and now the domain has panicked:

    The domain was:

    root@thor==> showfdmn -k home_domain

                   Id              Date Created  LogPgs  Version  Domain Name
    3d0e2d5a.0007c3dd  Mon Jun 17 14:41:30 2002     512        4  home_domain

      Vol    1K-Blks       Free  % Used  Cmode  Rblks  Wblks  Vol Name
       1L     835760     710560     15%     on    256    256  /dev/disk/dsk12a
        2     839472     735432     12%     on    256    256  /dev/disk/dsk12b
        3     839472     722672     14%     on    256    256  /dev/disk/dsk12d
        4     839472     724952     14%     on    256    256  /dev/disk/dsk12e
        5     835840     721952     14%     on    256    256  /dev/disk/dsk12f
        6    3548920    3142696     11%     on    256    256  /dev/disk/dsk13b
        7    7125776    7109096      0%     on    256    256  /dev/disk/dsk15b  <===
        8    3567800    3239152      9%     on    256    256  /dev/disk/dsk13a
        9    3548920    3216272      9%     on    256    256  /dev/disk/dsk13d
       10    3548920    3202840     10%     on    256    256  /dev/disk/dsk13e
       11    3567968    3146144     12%     on    256    256  /dev/disk/dsk13f
          ---------- ----------  ------
            29098320   26671768      8%

    I issued:

    rmvol /dev/disk/dsk15b

    and removed the volume.
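
    rmvol has to migrate everything stored on a volume to the remaining volumes in the domain before it can drop that volume. As a rough sanity check, the awk sketch below (rmvol_headroom is a hypothetical helper, and it assumes the "showfdmn -k" column layout shown above, with the volume name in the eighth field) totals how much data had to move and how much free space the other volumes offered.

```sh
#!/bin/sh
# Hypothetical helper: reads "showfdmn -k" output on stdin.
# Assumed fields: Vol 1K-Blks Free %Used Cmode Rblks Wblks VolName [...]
rmvol_headroom() {
    target=$1    # device being removed, e.g. /dev/disk/dsk15b
    awk -v target="$target" '
        # Volume rows start with the volume number ("1L" marks the log volume).
        # Header, dash and total rows fail this test and are skipped.
        $1 ~ /^[0-9]+L?$/ && NF >= 8 {
            if ($8 == target) moving = $2 - $3    # KB in use on the target
            else              room  += $3         # KB free on every other volume
        }
        END { printf "to migrate: %d KB, free elsewhere: %d KB\n", moving, room }
    '
}
```

    Run as "showfdmn -k home_domain | rmvol_headroom /dev/disk/dsk15b". For the table above it prints "to migrate: 16680 KB, free elsewhere: 19562672 KB".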

    Now I can't do anything with /home. This message is in /var/adm/messages:

    Jul 15 15:07:57 thor vmunix:
    Jul 15 15:07:57 thor vmunix: bs_inherit - bmtr_get_rec failed, return code = -1043
    Jul 15 15:07:57 thor vmunix: AdvFS Domain Panic; Domain home_domain Id 0x3d0e2d5a.0007c3dd
    Jul 15 15:07:57 thor vmunix: An AdvFS domain panic has occurred due to either a metadata write error or an internal inconsistency. This domain is being rendered inaccessible.
    Jul 15 15:07:57 thor vmunix: Please refer to guidelines in AdvFS Guide to File System Administration regarding what steps to take to recover this domain.

    In /etc/fdmns/home_domain:

    lrwxr-xr-x 1 root system 16 Nov 12 2002 dsk12a -> /dev/disk/dsk12a
    lrwxr-xr-x 1 root system 16 Nov 12 2002 dsk12b -> /dev/disk/dsk12b
    lrwxr-xr-x 1 root system 16 Nov 12 2002 dsk12d -> /dev/disk/dsk12d
    lrwxr-xr-x 1 root system 16 Nov 12 2002 dsk12e -> /dev/disk/dsk12e
    lrwxr-xr-x 1 root system 16 Nov 12 2002 dsk12f -> /dev/disk/dsk12f
    lrwxr-xr-x 1 root system 16 Jul 14 15:20 dsk13a -> /dev/disk/dsk13a
    lrwxr-xr-x 1 root system 16 Jul 14 15:31 dsk13b -> /dev/disk/dsk13b
    lrwxr-xr-x 1 root system 16 Jul 14 15:31 dsk13d -> /dev/disk/dsk13d
    lrwxr-xr-x 1 root system 16 Jul 14 15:31 dsk13e -> /dev/disk/dsk13e
    lrwxr-xr-x 1 root system 16 Jul 14 15:31 dsk13f -> /dev/disk/dsk13f
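
    The directory above holds one symbolic link per volume: ten links for the ten volumes left after rmvol, with dsk15b gone as expected. A small sketch (check_fdmns_links is a hypothetical helper in plain POSIX shell) that counts the links and flags any whose target device is missing; it is only a rough manual cross-check and does not replace advscan.

```sh
#!/bin/sh
# Hypothetical helper: count the device links in /etc/fdmns/<domain>
# and report any link whose target does not exist.
check_fdmns_links() {
    dir=$1        # e.g. /etc/fdmns/home_domain
    count=0
    bad=0
    for link in "$dir"/*; do
        [ -L "$link" ] || continue            # only symbolic links count
        count=$((count + 1))
        if [ ! -e "$link" ]; then             # -e follows the link
            echo "dangling: $link -> $(ls -l "$link" | sed 's/.*-> //')"
            bad=$((bad + 1))
        fi
    done
    echo "$count links, $bad dangling"
    [ "$bad" -eq 0 ]                          # non-zero status if any dangle
}
```

    For this domain, "check_fdmns_links /etc/fdmns/home_domain" should report ten links; compare that count with the number of volumes the domain is supposed to have.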

    I tried advscan:

    root@thor==> /sbin/advfs/advscan -a -f home_domain

    Scanning devices /dev/rdisk/dsk0 /dev/rdisk/dsk4 /dev/rdisk/dsk12 /dev/rdisk/dsk13 /dev/rdisk/dsk5
                   /dev/rdisk/dsk6 /dev/rdisk/dsk7 /dev/rdisk/dsk1 /dev/rdisk/dsk8

    Attempting to fix link/dev_count for domain

           home_domain

    Nothing to fix

    but it still doesn't work.

    How can I fix this? Salvage seems like overkill. Can I remove the links in /etc/fdmns/home_domain and recreate home_domain from scratch using the same configuration?

    Thank you!!

    Andy

    Andy Cohen
    Database Systems Administrator
    Cognex Corporation
    1 Vision Drive
    Natick, MA 01760

