Re: Need to process 3 billion (yes billion!) rows

From: Andre Majorel
Date: 04/17/04

    Date: Sat, 17 Apr 2004 14:12:45 +0000 (UTC)

    On 2004-04-17, C Stabri <> wrote:

    > I have a flat text file which is 3 billion rows deep (220GB in size).
    > I need to process this file in the following way:
    > 1. Sort it
    > 2. Perform some calculations on it by taking every row and calculating
    > a value based on values of the row above and the row below the current
    > row.

    Unless you have a machine with 220GB of memory, you'll need to
    split the file into manageable chunks, sort each chunk, and then
    merge the sorted chunks back together. I don't think sort(1) can
    do that on its own.
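
    Something along these lines should do it; the file names and the
    chunk size below are made up, adjust to taste:

        split -l 50000000 big.txt chunk.         # ~60 chunks of 50 million lines
        for f in chunk.*
        do
            sort "$f" > "$f.sorted" && rm "$f"   # sort each chunk on its own
        done
        sort -m chunk.*.sorted > big.sorted      # merge the already-sorted chunks

    sort -m only merges files that are already sorted, so the final
    pass is cheap compared to sorting the whole thing in one go.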

    > Please can you advise me what the best unix tools are for this job. A
    > korn script calling the read function or use awk or a combination.

    For such a large data set, the performance hit of using a shell
    script will be huge. On my machine, piping 3e9 lines into (while
    read n; do :; done) would take about 14 days. Awk might be
    workable, though: 3e9 lines into awk '{x = y; y = $0}' would
    take about 5 hours. But, depending on the sort of calculations
    you do, it may be faster overall to write a C program.
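
    For example, assuming the file is a single numeric column and the
    calculation is just an average of the two neighbouring rows (put
    your real formula there), the three-row window in awk would look
    something like:

        awk 'NR >= 3 { printf "%s\t%g\n", cur, (prev + $1) / 2 }
                     { prev = cur; cur = $1 }' big.sorted

    The first and last rows never get a value, so they need special
    handling. The same sliding-window idea carries straight over to a
    C program if awk turns out to be too slow.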

    André Majorel <URL:>
    "Finally I am becoming stupider no more." -- Paul Erdös' epitaph
