ASE hiccup leads to domain panic leads to strange ASE/AdvFS state

From: Speakman, John H./Epidemiology-Biostatistics (speakmaj_at_MSKCC.ORG)
Date: 09/19/03

    Date: Fri, 19 Sep 2003 15:37:48 -0400
    To: tru64-unix-managers@ornl.gov
    
    

    Hi all
     
    We have a little cluster of two very old Alphas running 4.0E; they are clustered using ASE over a private network (i.e. a crossover Cat 5 cable). We haven't changed the configuration in years, and two nights ago it had a hiccup. What we see in syslog is:
     
    Sep 18 03:36:28 biosta vmunix: arp: local IP address 192.168.32.228 in use by hardware address 00-00-...
    Sep 18 03:36:29 biosta vmunix: arp: local IP address 192.168.32.227 in use by hardware address 00-00-...
    Sep 18 03:36:29 biosta vmunix: arp: local IP address 192.168.32.226 in use by hardware address 00-00-...
     
    These IP addresses are the internal (non-public) IP addresses of three of the NFS volumes shared by ASE. The hardware address is the address of the NIC on the other server in the cluster that's connected to the crossover cat 5 cable. Both servers got this message at the same time, each complaining that the other guy was holding the IP address.
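    Because both servers log these warnings at the same moment, it can help to pull them out of each host's syslog and group them by address. The following is a minimal sketch, not taken from the post: it assumes the line format is exactly the one in the three sample lines above, and the function name find_arp_conflicts is hypothetical. It lists, per service IP, which host complained and which hardware address it blamed.

```python
import re
from collections import defaultdict

# Pattern assumed from the three sample syslog lines in the post:
# "<Mon> <day> <hh:mm:ss> <host> vmunix: arp: local IP address <ip>
#  in use by hardware address <hw>"
ARP_CONFLICT = re.compile(
    r"^(\w{3}\s+\d+\s+[\d:]+)\s+(\S+)\s+vmunix: arp: local IP address "
    r"(\S+) in use by hardware address (\S+)"
)


def find_arp_conflicts(lines):
    """Group duplicate-IP warnings by IP address.

    Returns {ip: [(timestamp, reporting_host, hardware_address), ...]}.
    Lines that do not match the assumed format are ignored.
    """
    conflicts = defaultdict(list)
    for line in lines:
        match = ARP_CONFLICT.match(line.strip())
        if match:
            stamp, host, ip, hw_addr = match.groups()
            conflicts[ip].append((stamp, host, hw_addr))
    return dict(conflicts)


if __name__ == "__main__":
    # The three lines from the post; the hardware address is elided there too.
    sample = [
        "Sep 18 03:36:28 biosta vmunix: arp: local IP address 192.168.32.228 in use by hardware address 00-00-...",
        "Sep 18 03:36:29 biosta vmunix: arp: local IP address 192.168.32.227 in use by hardware address 00-00-...",
        "Sep 18 03:36:29 biosta vmunix: arp: local IP address 192.168.32.226 in use by hardware address 00-00-...",
    ]
    for ip, hits in sorted(find_arp_conflicts(sample).items()):
        print(ip, hits)
```

    Running the same function over the other server's log should show the mirror image: the same service addresses, blamed on the first server's interface.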
     
    The next set of messages on server A (the server that was hosting the services at the time) are a nasty series of domain I/O errors and domain panics on these three domains. Server B reported no further problems (in syslog anyway).
     
    Two of the three domains relocated (via ASE) to the other server and also magically seemed to reconstitute themselves (not via ASE) on the same server, i.e. these domains now appear in the 'df' output of both servers, something we have not seen before. (The third domain, which is configured not to fail over automatically, reconstituted itself on the same server only, just fine.)
     
    So basically we have these two "fake" AdvFS domains which ASE doesn't know about, on server A, as well as the two "real" domains which are on server B (our ASE is configured to automatically relocate services back to the preferred server when it becomes available again after failover). Furthermore, 'df' on server A reveals that the "fake" AdvFS domains are not consistent with the real ones in terms of space occupied; they are a little out, like they are no longer in sync.
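    One way to make the "present on both servers, slightly out of sync" comparison repeatable is to save 'df' output from each server as text and diff the two listings. This is a minimal sketch under stated assumptions, not a tool from the post: it assumes one header line followed by one row per filesystem, with the filesystem name in the first column and the "Used" figure in the third. The function names and the sample domain names, mount points, and sizes are all hypothetical.

```python
def parse_df(df_output):
    """Return {filesystem: used_blocks} from df-style text.

    Assumes one header line, then one row per filesystem with the
    filesystem name (e.g. an AdvFS domain#fileset) in column 0 and the
    'Used' figure in column 2. Rows that df wraps onto two lines are
    not handled by this sketch.
    """
    usage = {}
    for row in df_output.strip().splitlines()[1:]:
        fields = row.split()
        if len(fields) >= 3 and fields[2].isdigit():
            usage[fields[0]] = int(fields[2])
    return usage


def mounted_on_both(df_a, df_b):
    """List (filesystem, used_on_a, used_on_b) for every filesystem that
    appears in both listings, sorted by name."""
    a, b = parse_df(df_a), parse_df(df_b)
    return [(fs, a[fs], b[fs]) for fs in sorted(a.keys() & b.keys())]


# Hypothetical listings: domain names, mount points and sizes are invented.
DF_SERVER_A = """\
Filesystem      512-blocks    Used Available Capacity Mounted on
nfs_dom1#nfs1      2000000 1200000    800000      60% /nfs1
nfs_dom2#nfs2      2000000  500000   1500000      25% /nfs2
"""

DF_SERVER_B = """\
Filesystem      512-blocks    Used Available Capacity Mounted on
nfs_dom1#nfs1      2000000 1210000    790000      61% /nfs1
nfs_dom2#nfs2      2000000  500400   1499600      26% /nfs2
users_dom#home     4000000 3000000   1000000      75% /home
"""

if __name__ == "__main__":
    for fs, used_a, used_b in mounted_on_both(DF_SERVER_A, DF_SERVER_B):
        print(f"{fs}: A={used_a} B={used_b} diff={used_b - used_a}")
```

    Note that any local filesystem that happens to have the same name on both hosts will also be reported, so in practice the result would be filtered down to the ASE service domains.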
     
    Everything is working fine as far as the users are concerned; nobody has complained. The only reason we found out was that a backup job running at the time suddenly disappeared (its log file is on one of the domains in question, which may be why). But now we have this strangeness, and I'm guessing that if I reboot the cluster something bad might happen, such as a domain not coming back.
     
    So I was going to try and use asemgr to fail the services back over to server A and hope that everything will magically sync itself. Anyone think that would be a mistake?
     
    Thanks
    John Speakman
    Memorial Sloan-Kettering Cancer Center, NYC
     

     

