Re: sort(1) memory usage
- From: Dag-Erling Smørgrav <des@xxxxxx>
- Date: Mon, 04 Feb 2008 14:52:35 +0100
Jeremy Chadwick <koitsu@xxxxxxxxxxx> writes:
As you said: the code shows that when no files are specified (e.g. read
off a pipe), sort will make some assumptions regarding the initial
buffer size to read data into. The buffer size allocated in that case
is fairly large, rather than basing it off of the first line off stdin;
it looks like this is done to save CPU time in the long run (otherwise
you'd have to rellocate more later and take a hit; initbuf() is
responsible for that).
Oh give me a break, the self-starting exponential algorithm for growth
of dynamically allocated buffers has been known for decades. In case
any GNU sort developers are reading this, here it comes, free of charge:
static char *buf = NULL;
static size_t bufsz = 0;
static size_t buflen = 0;
int
buf_append(const char *str)
{
size_t len;
len = strlen(str);
if (buflen + len + 1 > bufsz) {
size_t nbufsz = bufsz;
char *nbuf;
while (buflen + len + 1 > nbufsz)
nbufsz = nbufsz * 2 + 1;
nbuf = realloc(buf, nbufsz);
if (nbuf == NULL)
return (-1);
buf = nbuf;
bufsz = nbufsz;
}
memcpy(buf + buflen, str, len);
buf[buflen + len] = '\0';
buflen += len;
return (0);
}
With a good allocator - and depending to some extent on the memory usage
pattern of the rest of your program - if you jump-start it by initally
allocating 16 kB or so (and setting bufsz accordingly), realloc() will
never need to copy anything - but even in the worst case, the amortized
cost will be O(2n), IIRC. This is practically unnoticeable next to the
cost of the sorting algorithm itself, which will be O(n log n) at best
and O(n*n) at worst.
Looking at the code, it seems to go to extreme lengths to get itCan you expand on this? Looking at the code, it doesn't appear that's
absolutely wrong. For instance, if hw.physmem / 8 > hw.usermem, it will
pick the former, which means it's pretty much guaranteed to either fail
or hose your system (or both).
possible. The code in question is default_sort_size(), which is used
when no -S or --buffer-size argument is specified.
I looked at how it computes the cap, which is MAX(total / 8, avail) - in
other words, never mind what's actually feasible, I want more! More!
More, I say!
Count this as a vote for ditching GNU sort in favor of a BSD-licensedIn this specific case, I think you're bashing GNU just because you feel
implementation (from {Net,Open}BSD for instance).
like it. Come on man... =/
No, I'm bashing GNU because it's bloated crap, as this example clearly
shows. It wouldn't be the first time a BSD rewrite of a GNU tool ended
up performing better; see for instance bsdtar. Besides, the FreeBSD
project has a tradition of replacing GNU tools with BSD-licensed
equivalents as long as no functionality is lost.
DES
--
Dag-Erling Smørgrav - des@xxxxxx
_______________________________________________
freebsd-hackers@xxxxxxxxxxx mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscribe@xxxxxxxxxxx"
- References:
- sort(1) memory usage
- From: Dag-Erling Smørgrav
- Re: sort(1) memory usage
- From: Ed Schouten
- Re: sort(1) memory usage
- From: Erik Trulsson
- Re: sort(1) memory usage
- From: Dag-Erling Smørgrav
- Re: sort(1) memory usage
- From: Dag-Erling Smørgrav
- Re: sort(1) memory usage
- From: Jeremy Chadwick
- sort(1) memory usage
- Prev by Date: Re: sort(1) memory usage
- Next by Date: Re: gettimeofday() in hping
- Previous by thread: Re: sort(1) memory usage
- Next by thread: Re: sort(1) memory usage
- Index(es):
Relevant Pages
|
Loading