4.9p1 deadlock on "inode"

From: Peter Jeremy (peter.jeremy_at_alcatel.com.au)
Date: 12/23/03

    Date: Tue, 23 Dec 2003 13:42:05 +1100
    To: freebsd-stable@freebsd.org

    This morning I found that one of my systems would not let me log in or
    issue commands but still seemed to be running. ddb showed that lots of
    processes were waiting on "inode". I forced a crash dump and found
    166 processes in total, 95 waiting on "inode", 94 of those on the same
    wchan:

    (kgdb) p *(struct lock *)0xc133eb00
    $9 = {lk_interlock = {lock_data = 0}, lk_flags = 0x200440, lk_sharecount = 0,
      lk_waitcount = 94, lk_exclusivecount = 1, lk_prio = 8,
      lk_wmesg = 0xc02b0a8a "inode", lk_timo = 101, lk_lockholder = 304}
    (kgdb)

    The lockholder (pid 304) is cron, the one remaining process waiting on
    "inode", but on a different lock:
    (kgdb) p *(struct lock *)0xc1901a00
    $10 = {lk_interlock = {lock_data = 0}, lk_flags = 0x200440, lk_sharecount = 0,
      lk_waitcount = 1, lk_exclusivecount = 1, lk_prio = 8,
      lk_wmesg = 0xc02b0a8a "inode", lk_timo = 101, lk_lockholder = 15123}
    (kgdb)

    Pid 15123 is another cron process, which is waiting on "vlruwk" because
    there are too many vnodes in use:
    (kgdb) p numvnodes
    $12 = 8904
    (kgdb) p freevnodes
    $13 = 24
    (kgdb) p desiredvnodes
    $14 = 8879
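
    As a rough check of these three values, the sketch below computes the
    number of vnodes in use and compares it with the limit. The blocking
    condition (in-use vnodes greater than desiredvnodes) is an assumption
    based on a recollection of the 4.x getnewvnode() loop and has not been
    checked against the 4.9 source; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Vnodes that are allocated and not sitting on the free list. */
static long vnodes_in_use(long numvnodes, long freevnodes)
{
    assert(freevnodes <= numvnodes);
    return numvnodes - freevnodes;
}

/*
 * ASSUMPTION: a process asking for a new vnode sleeps on "vlruwk"
 * while the in-use count exceeds desiredvnodes.
 */
static bool allocation_blocks(long numvnodes, long freevnodes,
                              long desiredvnodes)
{
    return vnodes_in_use(numvnodes, freevnodes) > desiredvnodes;
}
```

    With the values from the dump, 8904 - 24 = 8880 vnodes are in use,
    one more than desiredvnodes, so under that assumption a single vnode
    over the limit is enough to keep the allocator asleep.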

    Process vnlru is waiting on "vlrup" with vnlru_nowhere = 18209.
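
    The chain of waiters described so far can be modelled as a small
    table and followed to its end. This is only an illustrative sketch:
    the struct, the NO_HOLDER marker and the function names are
    hypothetical, and the table holds just the two pids quoted above.

```c
#include <assert.h>
#include <stddef.h>

#define NO_HOLDER (-1)   /* sleeping on a channel, not on a held lock */

struct waiter {
    int pid;
    const char *wmesg;   /* wait message shown by ddb/ps */
    int holder;          /* lk_lockholder of the lock slept on */
};

/* Entries taken from the kgdb output in this message. */
static const struct waiter dump_waiters[] = {
    { 304,   "inode",  15123 },
    { 15123, "vlruwk", NO_HOLDER },
};

static const struct waiter *find_waiter(const struct waiter *tab,
                                        size_t n, int pid)
{
    for (size_t i = 0; i < n; i++)
        if (tab[i].pid == pid)
            return &tab[i];
    return NULL;
}

/*
 * Follow holder links starting at pid. Returns the pid at the end of
 * the chain, or NO_HOLDER if more hops than table entries are needed
 * (which means the links form a cycle).
 */
static int chain_root(const struct waiter *tab, size_t n, int pid)
{
    assert(tab != NULL);
    for (size_t steps = 0; steps <= n; steps++) {
        const struct waiter *w = find_waiter(tab, n, pid);
        if (w == NULL || w->holder == NO_HOLDER)
            return pid;
        pid = w->holder;
    }
    return NO_HOLDER;
}
```

    Starting from pid 304, the chain ends at pid 15123, whose sleep on
    "vlruwk" has no lock holder; it can only be ended by vnlru.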

    Looking through the mountlist, mnt_nvnodelistsize was sane on all
    filesystems except one (/mnt), where it was 8613 (97% of all vnodes).
    Only one process was actively using files in /mnt, though some other
    processes may have been using it for $PWD or similar. This process
    was scanning most of the files in /mnt (about 750,000), checking for
    files with identical content: basically, all files that could
    potentially be the same (e.g. same length) are mmap'd and compared.
    This process had 2816 entries in its vm_map. (It has just occurred to
    me that there would be one set of data that appears in a large
    number of files (~30000), but I would have expected this to result in
    an error during an mmap(), not a deadlock.)
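
    For illustration, a pairwise mmap comparison of the kind described
    might look like the sketch below. It is not the program that was
    running; map_file() and same_content() are hypothetical names. Each
    mapping is released as soon as the pair has been compared, so only
    two files are mapped at any time.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Map a whole file read-only. Returns MAP_FAILED on any error or if
 * the file is empty (a zero-length mmap is not valid).
 */
static void *map_file(const char *path, size_t *len)
{
    struct stat st;
    void *p = MAP_FAILED;
    int fd;

    assert(path != NULL && len != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return MAP_FAILED;
    if (fstat(fd, &st) == 0 && st.st_size > 0) {
        *len = (size_t)st.st_size;
        p = mmap(NULL, *len, PROT_READ, MAP_PRIVATE, fd, 0);
    }
    close(fd);   /* the mapping remains valid after close() */
    return p;
}

/* Returns 1 if contents match, 0 if they differ, -1 on error. */
static int same_content(const char *a, const char *b)
{
    size_t la = 0, lb = 0;
    int result = -1;
    void *pa = map_file(a, &la);
    void *pb = map_file(b, &lb);

    if (pa != MAP_FAILED && pb != MAP_FAILED)
        result = (la == lb && memcmp(pa, pb, la) == 0);
    /* Unmap at once so the files are not kept referenced. */
    if (pa != MAP_FAILED)
        munmap(pa, la);
    if (pb != MAP_FAILED)
        munmap(pb, lb);
    return result;
}
```

    A scanner that instead keeps every candidate of a same-length group
    mapped until the group is finished would hold one mapping per file,
    which is consistent with the 2816 vm_map entries seen here.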

    Scanning through the mnt_nvnodelist on /mnt:
    5797 entries were for directories with entries in v_cache_src
    2804 entries were for files with a usecount > 0
      11 entries were for directories with VFREE|VDOOMED|VXLOCK
       1 VNON entry

    This means that none of the vnodes in /mnt were available for
    recycling (and the total vnodes on the other filesystems would not be
    enough to reach the hysteresis point to unlock the vnode allocation).
    I can understand that an mmap'd file holds a usecount on the file's
    vnode but my understanding is that vnode entries with v_cache_src
    entries should be able to be recycled (though this will slow down
    namei()). If so, should vnlru grow a "try harder" loop that will
    recycle these vnodes if it winds up stuck in entries?
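
    The arithmetic behind this can be sketched as follows. The 9/10
    wake-up threshold is an assumption based on a recollection of the
    4.x vnlru_proc() and should be checked against the 4.9 source; the
    function names are hypothetical, and the counts are the ones quoted
    in this message.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * ASSUMPTION: vnlru releases the sleeping allocators once the in-use
 * count has dropped to 9/10 of desiredvnodes (integer arithmetic).
 */
static long wakeup_threshold(long desiredvnodes)
{
    assert(desiredvnodes >= 0);
    return desiredvnodes * 9 / 10;
}

/* Number of in-use vnodes that must be recycled to reach it. */
static long must_recycle(long numvnodes, long freevnodes,
                         long desiredvnodes)
{
    long excess = (numvnodes - freevnodes)
                  - wakeup_threshold(desiredvnodes);
    return excess > 0 ? excess : 0;
}

/* Sum of the four categories found on /mnt's mnt_nvnodelist. */
static long mnt_scan_total(void)
{
    return 5797 + 2804 + 11 + 1;
}

/* True if recycling `candidates` vnodes would reach the threshold. */
static bool can_unblock(long candidates, long numvnodes,
                        long freevnodes, long desiredvnodes)
{
    return candidates >= must_recycle(numvnodes, freevnodes,
                                      desiredvnodes);
}
```

    Under that assumption the threshold is 7991, so 889 vnodes would
    have to be recycled. The four categories add up to all 8613 vnodes
    on /mnt, which leaves at most 8904 - 8613 = 291 vnodes on the other
    filesystems, well short of 889 even if every one were recyclable.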

    I notice vlrureclaim() contains the comment "don't set kern.maxvnodes
    too low". In this case, it is auto-tuned based on 128MB RAM and
    "maxusers=0". Maybe this is too low for my purposes but it would be
    much nicer if the system managed to handle this situation gracefully
    rather than by deadlocking.

    And finally, a question on vlrureclaim(): Why does this process scan
    through mnt_nvnodelist and perform a TAILQ_REMOVE(), TAILQ_INSERT_TAIL()
    on each node? Wouldn't it be cheaper to just scan the list, rather than
    moving every node to the end of the list?
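
    To make the pattern in question concrete, here is a minimal
    user-space sketch of a "rotate while scanning" loop using the
    <sys/queue.h> TAILQ macros. The node structure and rotate_scan()
    are hypothetical stand-ins, not the kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

struct node {
    int id;
    int busy;                  /* stand-in for "usecount > 0" */
    TAILQ_ENTRY(node) link;
};
TAILQ_HEAD(nodelist, node);

/*
 * Visit at most `count` nodes, always taking the current head and
 * moving it to the tail before inspecting it. Returns the number of
 * non-busy nodes seen.
 */
static int rotate_scan(struct nodelist *list, int count)
{
    struct node *n;
    int candidates = 0;

    assert(list != NULL);
    while (count-- > 0 && (n = TAILQ_FIRST(list)) != NULL) {
        TAILQ_REMOVE(list, n, link);
        TAILQ_INSERT_TAIL(list, n, link);
        if (!n->busy)
            candidates++;
    }
    return candidates;
}
```

    One property of this form is that the loop never keeps a pointer
    into the middle of the list between iterations: it always restarts
    from the head, and a later scan resumes where the previous one
    stopped. Whether that is the reason vlrureclaim() is written this
    way is not established here; a plain traversal would need some other
    way to stay valid if the list changed during the scan.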

    Peter
