DiskSuite seems to have severely broken a Solaris 8 host

From: Daniel Baldoni (sunman_at_lcds.com.au)
Date: 08/27/05

    Date: Sun, 28 Aug 2005 02:33:16 +0800
    To: sunmanagers@sunmanagers.org
    
    

    G'day folks,

    A client (and, as you might guess because of the local time I'm posting this,
    a mate) is having severe problems with a Solaris 8 box which is refusing to boot.
    He's getting a whole series of (for example) "/kernel/misc/sparcv9/md_raid:
    undefined symbol md_unit_incopen" errors (the errors are reported for each
    of the forceload'd "md_*" modules, with many symbols listed for each). The
    machine doesn't even successfully reach single-user mode (the password prompt
    is displayed but the machine locks at this point).

    The story is he had a broken root mirror and tried metadetach'ing,
    metaclear'ing and metaroot'ing to get access to the "raw" partition - all
    with no success. As a last resort, he (this is how it was described to me):
            1. Took a copy of the root partition (and a few others) and stored
                    it away safely.
            2. Booted a Solaris 9 CD
            3. Mounted the partition containing the backup
            4. newfs'd the underlying raw partition (thought to be okay, even
                    though DiskSuite insisted it "needs maintenance")
            5. Copied the backed-up root filesystem over the newfs (after
                    mounting the partition).
            6. Installed the appropriate boot-block on that partition (from
                    a successfully mounted Solaris 8 /usr filesystem)
            7. Hand-edited /etc/system to delete the rootdev= entry
            8. Hand-edited /etc/vfstab to update the / entry
            9. Hand-edited /etc/lvm/md.cf to delete the problematic mirror and
                    submirrors
            10. Rebooted

    And, that's when all h*ll broke loose. :-/

    I don't have access to his boxes and I doubt I could solve his problem (I've
    never seen anything like this before) even if I did. From what I have been told,
    his /etc/system file contains forceloads only for (forgive the "shell short
    cuts"):
            misc/md_{hotspares,mirror,raid,sp,stripe,trans}
            drv/{dad,isp,pci_pci,pcipsy,sd,simba,uata}
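For anyone who wants that shorthand spelled out, here is a minimal sketch that expands it into one line per module in the usual /etc/system "forceload:" form. The function name print_forceloads is made up for illustration, and the output is only my reading of what his file is reported to contain, not a copy of it.

```shell
#!/bin/sh
# Sketch only: expand the shorthand module list into /etc/system-style lines.
# print_forceloads is a hypothetical helper name, not a Solaris tool.
print_forceloads() {
    # DiskSuite "misc" modules named in the post
    for m in hotspares mirror raid sp stripe trans; do
        echo "forceload: misc/md_$m"
    done
    # Device drivers named in the post
    for d in dad isp pci_pci pcipsy sd simba uata; do
        echo "forceload: drv/$d"
    done
}

print_forceloads
```

Note that drv/md itself does not appear anywhere in that list.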

    The machine in question is an Ultra 10, with two internal IDE drives (one
    of which appears to be severely dying, which is what led to all these issues),
    an internal CD-ROM, and 5 (or 6 - he couldn't tell me which) SCSI drives (in
    one of Sun's external enclosures).

    A bit of exploration (using nm on my own Solaris 8 box) shows that the
    symbols being complained about (at least those he was able to quote to me
    before his screen filled up) can all be found in /kernel/drv/md (and
    /kernel/drv/sparcv9/md). A last-ditch suggestion to him was to boot off his
    CD and add a forceload of drv/md to his /etc/system file. Alas, this didn't
    seem to help (his words were "Nope - no change" ... but I don't know if there
    was any change in the errors he was seeing).
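The nm check described above can be repeated for any of the reported symbols. The following is a rough sketch: it assumes "nm -p" style output (value, a type letter, then the name, with "U" marking an undefined symbol), and the helper name defines_symbol is hypothetical. On Solaris 8 it would probably need nawk or /usr/xpg4/bin/awk, since I believe the old /usr/bin/awk does not accept -v.

```shell
#!/bin/sh
# Sketch only: read "nm -p"-style lines on stdin and succeed (exit 0) if the
# given symbol is *defined* there, i.e. listed with a type other than "U".
# defines_symbol is a hypothetical helper name.
defines_symbol() {
    awk -v sym="$1" '
        # Name is the last field, type letter the one before it; the value
        # column may be blank for undefined symbols, so count from the end.
        NF >= 2 && $NF == sym && $(NF-1) != "U" { found = 1 }
        END { exit(found ? 0 : 1) }
    '
}

# Intended use (paths as mentioned in the post):
#   nm -p /kernel/drv/sparcv9/md | defines_symbol md_unit_incopen && echo defined
```

Running the same filter over /kernel/misc/sparcv9/md_raid should fail for that symbol if it is only referenced there rather than defined.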

    Has anybody got any idea on how to get this system back in working order? He
    can't just install a new OS as he has a ~50-70GB RAID5 (including a single
    disk acting as a hot-spare) spread over the external enclosure (but, luckily,
    his database replicas appear to be okay).

    Any help would be much appreciated. Ciao.

    -------------------------------------------------------+---------------------
    Daniel Baldoni BAppSc, PGradDipCompSci | Technical Director
    require 'std/disclaimer.pl' | LcdS Pty. Ltd.
    -------------------------------------------------------+ 856B Canning Hwy
    Phone/FAX: +61-8-9364-8171 | Applecross
    Mobile: 041-888-9794 | WA 6153
    URL: http://www.lcds.com.au/ | Australia
    -------------------------------------------------------+---------------------
    "Any time there's something so ridiculous that no rational systems programmer
      would even consider trying it, they send for me."; paraphrased from "King Of
      The Murgos" by David Eddings. (I'm not good, just crazy)
    _______________________________________________
    sunmanagers mailing list
    sunmanagers@sunmanagers.org
    http://www.sunmanagers.org/mailman/listinfo/sunmanagers

