SUMMARY: Out of Memory Message in a Directory with a Huge Number of Files

From: Raul Sossa S. (RSossa_at_datadec.co.cr)
Date: 12/23/03

  • Next message: Dave Sill: "Administrivia: Tru64-UNIX-Managers information and policy statement"
    Date: Tue, 23 Dec 2003 11:46:15 -0600
    To: tru64-unix-managers@ornl.gov
    
    

    On 09-Dec-03, Raul Sossa S. wrote:
    > Hello Guys!
    > We have about 3 million files in one Tru64 UNIX directory.
    > The files are binary pictures of people, about 80 KB each on average.
    > When we do an "ls -al" we get an "Out of memory" error message.
    > Does anyone know of any kernel parameter, swap setting, or other tuning
    > that might help avoid this message and get the real output from the
    > shell? What is the maximum number of files we can have in a Tru64 UNIX
    > directory?

    Answers:
    I applied all of these suggestions. Thank you very much:
    __________________________________________________________________________
    From: Alan Rollow - Dr. File System's Home for Wayward Inodes.
    [mailto:alan@desdra.cxo.cpqcorp.net]

    ls(1) is constrained by the same per-process virtual memory
            limits as any other program. The long ls(1) listing needs
            enough memory for the stat(2) data for each file, so that
            it can get the data and then sort it. For 3 million files,
            this is going to be a lot of data and could easily run the
            process into the datasize virtual memory limit.

            Depending on your shell, there may be a built-in command
            that will let you raise the per-process limits from the
            default to the maximum or something in between. For most
            shells this is either "limit" or "ulimit". Your shell's
            manual page may document the command.

            The per-process limits can be configured with sysconfig(8).
            The sys_attrs_proc(5) manual page documents the parameters.
            The limits come in two values: default and maximum. An
            unmodified system typically has limits of:

            Process Data Space:

                    Maximum: 1 GB
                    Default: 128 MB

            Process Stack Space:

                    Maximum: 32 MB
                    Default: 2 MB

            There are also limits for total address space and on some
            versions total system virtual memory. The amount of page
            and swap space you have can also limit virtual memory use,
            so check that as well.
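On systems with the POSIX getrlimit(2) interface, the "default" and "maximum" values above roughly correspond to a process's soft and hard resource limits. A minimal Python sketch, assuming a Unix host and Python's standard resource module (the LIMITS table and helper names are illustrative, not part of any Tru64 tool), prints the current values for the data, stack, and address-space limits:

```python
import resource

# Readable names for the rlimit constants in Python's resource module.
LIMITS = {
    "data": resource.RLIMIT_DATA,         # process data space (heap)
    "stack": resource.RLIMIT_STACK,       # process stack space
    "address space": resource.RLIMIT_AS,  # total virtual address space
}

def describe(value: int) -> str:
    """Render a limit in megabytes, or 'unlimited'."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value / 2**20:.0f} MB"

def current_limits() -> dict[str, tuple[int, int]]:
    """Return {name: (soft, hard)}: soft acts as the default, hard as the maximum."""
    return {name: resource.getrlimit(res) for name, res in LIMITS.items()}

if __name__ == "__main__":
    for name, (soft, hard) in current_limits().items():
        print(f"{name:>13}: default {describe(soft)}, maximum {describe(hard)}")
```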

            On my V4.0G system, the structure used by stat(2) is 80
            bytes in length. 3,000,000 of them is going to take at
            least 228 MB of virtual memory. Clearly, that's more than
            the typical default process data size. If reorganizing the
            data isn't a good option, then you might want to consider
            raising the default process data space size to be large
            enough to run this particular command. See the sysconfig(8)
            manual page for information about changing the parameters.
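The arithmetic behind that estimate, as a small Python sketch. It counts only the 80-byte stat records quoted for V4.0G, so it is a lower bound; ls also needs memory for the names and its own bookkeeping. STAT_BYTES and DEFAULT_DATA_MB are simply the figures quoted above, not values read from a running system:

```python
STAT_BYTES = 80        # size of the stat(2) structure quoted for V4.0G
DEFAULT_DATA_MB = 128  # typical default process data space quoted above

def stat_records_mib(n_files: int, stat_bytes: int = STAT_BYTES) -> float:
    """Lower bound, in MiB, for holding one stat record per file."""
    return n_files * stat_bytes / 2**20

needed = stat_records_mib(3_000_000)
print(f"{needed:.1f} MiB needed vs {DEFAULT_DATA_MB} MB default data limit")
# prints: 228.9 MiB needed vs 128 MB default data limit
```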

            The system tuning guide may also have some advice. If you
            don't have a paper copy, PDF and HTML versions are on the
            documentation CDROM.

            As for limits on the number of files in a directory, I
            don't recall if there are any enforced limits. There
            are limits which come just from the amount of space to
            store all the metadata for a file; the directory itself
            can't be larger than the file system can hold as a
            file. If there is a limit, it is on the order of 2^31-1
            or 2^32. Since Tru64 UNIX uses 64-bit integers for many
            things, if the counter that matters for the number of
            files in a directory is 64 bits rather than 32, the
            limit is too high to worry about.

            Depending on the file system and version, there may be
            practical limits on the number of files in a directory.
            UFS is well known for not handling large numbers of files
            gracefully. Your 3,000,000 is well beyond the point
            where the UFS limitations become an issue. If you're
            using UFS for this file system, fixing the memory problem
            is only the start.

            Older versions of AdvFS didn't handle large numbers of
            files well, but they did better than UFS. I believe
            there were metadata changes in V5 that significantly
            help the performance of processing directories with lots
            of files. So an AdvFS file system created under V5 may
            not have much trouble with so many files.
    __________________________________________________________________________
    From: Tim Cutts [mailto:tjrc@sanger.ac.uk]

    UNIX directory performance is dreadful if there are many entries in a
    directory. In my experience the practical limit, for performance, is about
    10,000 files, so you have already massively exceeded it! You'll find that
    lots of other things may break with directories like this;
    I wouldn't guarantee that dumps will work properly, for example.

    You should reorganise the data into subdirectories, each of which has around
    1000 files. I have some perl code to create such a directory structure, if
    you are interested.
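Tim's Perl code isn't reproduced here. As a hypothetical illustration of the same idea, the Python sketch below moves files into numbered subdirectories of about 1000 files each. It picks each file's bucket from a CRC-32 hash of its name (an assumption; splitting a sorted listing into chunks would also work), so a file's location can be recomputed from its name alone. The function names and the NNNN directory layout are made up for this example, the destination is assumed to start empty, and os.rename requires the source and destination to be on the same file system:

```python
import math
import os
import tempfile
import zlib

def bucket_for(name: str, n_buckets: int) -> str:
    """Choose a subdirectory name from a stable CRC-32 hash of the file name."""
    return f"{zlib.crc32(os.fsencode(name)) % n_buckets:04d}"

def reorganize(src: str, dst: str, per_dir: int = 1000) -> int:
    """Move regular files in src into dst/NNNN/ buckets of about per_dir files.

    Entries are streamed with os.scandir, so the whole listing is never held
    in memory. Returns the number of buckets. src and dst must be on the same
    file system, because os.rename does not copy data.
    """
    with os.scandir(src) as it:
        n_files = sum(1 for e in it if e.is_file(follow_symlinks=False))
    n_buckets = max(1, math.ceil(n_files / per_dir))
    moved = True
    while moved:
        # Renaming entries while a directory is being read may make readdir
        # skip some names, so keep making passes until one moves nothing.
        moved = False
        with os.scandir(src) as it:
            for entry in it:
                if not entry.is_file(follow_symlinks=False):
                    continue
                bucket = os.path.join(dst, bucket_for(entry.name, n_buckets))
                os.makedirs(bucket, exist_ok=True)
                os.rename(entry.path, os.path.join(bucket, entry.name))
                moved = True
    return n_buckets

if __name__ == "__main__":
    # Small demonstration on throwaway directories.
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        for i in range(2500):
            open(os.path.join(src, f"img{i:05d}.jpg"), "w").close()
        print(reorganize(src, dst), "buckets")
        for b in sorted(os.listdir(dst)):
            print(b, len(os.listdir(os.path.join(dst, b))))
```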

    Tim

    __________________________________________________________________________
    From: Dr Thomas.Blinn@HP.com [mailto:tpb@doctor.zk3.dec.com]
    There is no practical upper limit to the number of files you can put in a
    single directory, other than the number of files you can put in a single
    file system.

    The message you are seeing is coming directly from the "ls" utility. As "ls"
    reads in the directory information, it has to sort it, since it's not in any
    particular sorted order in the directory itself. So "ls" has to allocate
    memory to hold the name strings (as well as the inode numbers and some other
    data) and then once it's got all of the names in memory, it sorts the
    listing in memory and then displays it.

    On most systems, the default memory size limits are low enough that in a
    directory with HUGE numbers of files, "ls" simply runs out of memory for
    building the listing.

    There are a couple of ways to work around this. One is to use the "find"
    utility with the "-ls" option, pipe its output through the "sort" utility
    (an external sort), and then, if you want something more sophisticated
    than what "find -ls" gives you, use "awk" to format the sorted data.
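sort(1) is typically able to handle input much larger than memory because it sorts the input in pieces, writes each sorted run to a temporary file, and then merges the runs. A minimal Python sketch of that external merge sort, assuming newline-terminated text such as "find -ls" output; chunk_size and the optional key function are illustrative parameters, not options of any real sort utility:

```python
import heapq
import itertools
import os
import sys
import tempfile
from typing import Any, Callable, Iterable, Iterator, Optional

def _spill(run: list[str]) -> str:
    """Write one sorted run to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(prefix="run-", suffix=".txt")
    with os.fdopen(fd, "w") as f:
        f.writelines(run)
    return path

def external_sort(lines: Iterable[str], chunk_size: int = 100_000,
                  key: Optional[Callable[[str], Any]] = None) -> Iterator[str]:
    """Yield lines in sorted order, holding at most chunk_size lines in memory
    while building runs; the runs are then merged lazily with heapq.merge."""
    it = iter(lines)
    paths: list[str] = []
    try:
        while True:
            # Normalize so every line, including the last, ends in a newline.
            chunk = [line if line.endswith("\n") else line + "\n"
                     for line in itertools.islice(it, chunk_size)]
            if not chunk:
                break
            chunk.sort(key=key)
            paths.append(_spill(chunk))
        files = [open(p) for p in paths]
        try:
            yield from heapq.merge(*files, key=key)
        finally:
            for f in files:
                f.close()
    finally:
        for p in paths:
            os.remove(p)

if __name__ == "__main__":
    # Sorts whole lines from standard input, e.g. piped "find -ls" output.
    sys.stdout.writelines(external_sort(sys.stdin))
```

Sorting whole lines of "find -ls" output orders them by the leading inode number; to order by a particular field, pass a key function that extracts it, much as you would select a sort key with sort(1).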

    Another is to relax the memory limits. You can do this for every process on
    the system by default (which is risky if you don't have the memory and swap
    resources to support memory hogs), or you can do it for the users who need
    to use "ls". In any case, what you need to look at are things like the
    maximum address space and the maximum data size (that is, heap size)
    parameters. You can learn more from the system tuning guide or the
    reference pages for the various system attributes. You do the tuning in
    /etc/sysconfigtab.

    If the maximum values are already large enough, you can use the "limit"
    command (which varies by shell) to adjust the limits for the particular
    users. On my V4.0G system, I routinely bump up the limits for address space
    and data space in shell scripts where I need to use "ls" in directories with
    large numbers of files, and since it's wrapped in a shell script, I don't
    otherwise change the limits for my normal processing.
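The same "raise the limits only where needed" approach can be sketched in Python, assuming a Unix host: the helper below raises the soft (default) data and address-space limits to their hard (maximum) values only in the child process it launches, via subprocess's preexec_fn hook, and leaves the calling process's own limits alone. The function names are illustrative; raising a soft limit up to the hard limit does not need special privileges, but going past the hard limit would, which is why the system-wide maximums still have to be large enough:

```python
import resource
import subprocess
import sys

def raise_to_hard_limit(res: int) -> tuple[int, int]:
    """Raise the soft (default) limit for res to its hard (maximum) value."""
    soft, hard = resource.getrlimit(res)
    if soft != hard:
        # An unprivileged process may raise its soft limit up to the hard limit.
        resource.setrlimit(res, (hard, hard))
    return resource.getrlimit(res)

def _raise_limits_in_child() -> None:
    # Runs in the child between fork() and exec(), so only the launched
    # command sees the larger limits; the parent keeps its own.
    for res in (resource.RLIMIT_DATA, resource.RLIMIT_AS):
        raise_to_hard_limit(res)

def run_with_raised_limits(cmd: list[str], **kwargs) -> subprocess.CompletedProcess:
    """Run cmd with raised limits; extra keyword arguments go to subprocess.run."""
    return subprocess.run(cmd, preexec_fn=_raise_limits_in_child, **kwargs)

if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    # Output goes straight to the terminal rather than being buffered here.
    sys.exit(run_with_raised_limits(["ls", "-al", target]).returncode)
```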

    Hope this helps you understand your options.

    Tom

    __________________________________________________________________________
    From: tru64-unix-managers-owner@ornl.gov
    [mailto:tru64-unix-managers-owner@ornl.gov]

    You can change the proc subsystem attribute max_per_proc_data_size.

    Nasır YILMAZ

