ADVFS disk full, but not really full!

From: Andrew Raine (Andrew.Raine_at_mrc-dunn.cam.ac.uk)
Date: 10/03/03

    Date: Fri, 03 Oct 2003 13:33:07 +0100
    To: tru64-unix-managers@ornl.gov
    
    

    Dear Tru64 Managers,

    I wonder if any of you can shed any light on my current problem?

    I have a 2-node cluster (DS20 + ES40 + HSG80, 5.1, PK3) which has, I
    think, got itself confused about an ADVFS domain/fileset:

    The volume, /scratch, appears to be full, and is causing problems when
    processes try to write to it:

    alpha # df -k /scratch
    Filesystem              1024-blocks      Used  Available  Capacity  Mounted on
    scratch_domain#scratch     20000000  20000000          0      100%  /scratch

    However, when I look at the space actually used on it I get:

    alpha # du -sk /scratch/* | sort -n
    0 /scratch/vh
    1 /scratch/NEO.log
    8 /scratch/admin
    8 /scratch/el
    8 /scratch/root
    8 /scratch/tm2
    8 /scratch/tsh
    16 /scratch/jrg
    20 /scratch/tmp
    33 /scratch/atpase
    80 /scratch/quota.group
    152 /scratch/quota.user
    290 /scratch/ar
    392210 /scratch/rk
    6188864 /scratch/lf
    8161222 /scratch/smb
    8764514 /scratch/backup
    11386965 /scratch/kunji

    which adds up to 34894407*1024 bytes (~30 GB). That is more than the
    20000000*1024 bytes (~20 GB) that df reports as the total size, isn't it?
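As a quick check of the arithmetic, the du figures can be summed and compared with the df total. This is only a minimal Python sketch: the numbers are copied from the du and df output above, and the variable names are illustrative.

```python
# Sizes in 1024-byte blocks, copied from the `du -sk /scratch/*` listing.
du_kb = [0, 1, 8, 8, 8, 8, 8, 16, 20, 33, 80, 152, 290,
         392210, 6188864, 8161222, 8764514, 11386965]

# Total size reported by `df -k /scratch`, also in 1024-byte blocks.
df_total_kb = 20000000

du_total_kb = sum(du_kb)

# Convert 1024-byte blocks to GiB (2**30 bytes).
du_total_gib = du_total_kb * 1024 / 2**30
df_total_gib = df_total_kb * 1024 / 2**30

print(f"du total: {du_total_kb} KB = {du_total_gib:.1f} GiB")
print(f"df total: {df_total_kb} KB = {df_total_gib:.1f} GiB")
print(f"du exceeds df by {du_total_kb - df_total_kb} KB")
```

The sum comes to 34894407 blocks, roughly 33.3 GiB, against roughly 19.1 GiB reported by df, so the files on /scratch occupy more space than df gives as the size of the whole fileset.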

    But my recollection is that the /scratch volume is much bigger than
    either 20 or 30 GB:

    alpha # showfdmn scratch_domain

                   Id              Date Created  LogPgs  Version  Domain Name
    3b3c8f9e.010893e9  Fri Jun 29 15:24:30 2001     512        4  scratch_domain

      Vol   512-Blks       Free  % Used  Cmode  Rblks  Wblks  Vol Name
       1L  213291744  143058608     33%     on    256    256  /dev/disk/dsk12c

    which suggests that the volume is ~100 GB and only 33% used (which
    fits with the ~30 GB used figure above).
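The showfdmn figures are in 512-byte blocks, so they need converting before they can be compared with the 1024-byte figures from df and du. A minimal Python sketch of that conversion, using the two numbers from the volume line above (the variable names are illustrative):

```python
BLOCK = 512  # showfdmn reports sizes in 512-byte blocks

total_blks = 213291744  # "512-Blks" column for /dev/disk/dsk12c
free_blks = 143058608   # "Free" column

used_blks = total_blks - free_blks

# Convert 512-byte blocks to GiB (2**30 bytes).
total_gib = total_blks * BLOCK / 2**30
used_gib = used_blks * BLOCK / 2**30
pct_used = 100 * used_blks / total_blks

print(f"volume size: {total_gib:.1f} GiB")
print(f"used:        {used_gib:.1f} GiB ({pct_used:.0f}%)")
```

This gives a volume of about 101.7 GiB with about 33.5 GiB (33%) in use, which agrees with the du total rather than with the 20000000 blocks shown by df.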

    Any idea what has happened, and how to fix it? I've rebooted each of
    the cluster members in turn, but nothing changed. I'm reluctant to
    take both nodes down simultaneously, as this is an NFS server with
    several active connections. However, I'd guess that a full reboot
    might be needed?

    Regards,

    Andrew

    --
    Dr. Andrew Raine, Head of IT, MRC Dunn Human Nutrition Unit, 
    Wellcome Trust/MRC Building, Hills Road, Cambridge, CB2 2XY, UK
    phone: +44 (0)1223 252830   fax: +44 (0)1223 252835
    web: www.mrc-dunn.cam.ac.uk email: Andrew.Raine@mrc-dunn.cam.ac.uk
    
