That seems to work here:
gwern@localhost:2249~/foo>time (head -c 10179869184 /dev/zero > bigtempfile) &&
time (darcs init --darcs-2) && time (darcs add bigtempfile) && time (darcs whats -s)
(; head -c 10179869184 /dev/zero > bigtempfile; ) 1.52s user 37.22s system 15%
cpu 4:07.02 total
(; darcs init --darcs-2; ) 0.00s user 0.01s system 0% cpu 2.590 total
(; darcs add bigtempfile; ) 0.00s user 0.00s system 0% cpu 1.997 total
A ./bigtempfile
(; darcs whats -s; ) 2.03s user 4.43s system 7% cpu 1:27.42 total
(I reduced the file size because of ulimit considerations here.)
---
I find myself wondering - if the slowdown is due to the double memchr loop over
the entire file concerned (in the worst case, it's a huge text file and memchr
doesn't bail out early, so the entire file gets scanned front to back, front to
back), then what can we do to optimize it? The obvious solution is to just scan
a fixed number of bytes (or a fixed percentage of the file), i.e.:
return !!(memchr(s, 0, 1024) || memchr(s, 26, 1024));
or maybe
return !!(memchr(s, 0, len / 10) || memchr(s, 26, len / 10));
This would probably cause us to miss some binary files and interpret them as
text files, but:
A) Is that really so bad? (I'm serious here, I don't know how bad it is to
mistakenly interpret a binary file as being text)
B) How many binary files don't have the magical characters anywhere near the
beginning? Is there any smarter definition than just 'neither \0 nor ^Z appears
anywhere in the file'?
The timing for (len/10) is:
gwern@localhost:2232~/foo>time darcs whatsnew -s
[12:55PM]
A ./bigtempfile
darcs whatsnew -s 1.79s user 3.67s system 8% cpu 1:01.55 total
Unfortunately, this change seems to fail the merging_newlines test, an add test,
the either-dependency.sh test, and apparently whatsnew.pl (though I'm not sure
whether they're supposed to pass, or whether the failures are bad).
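One caveat on the fixed-size variant: as written, memchr(s, 0, 1024) would read past the end of any buffer shorter than 1024 bytes. A minimal sketch of a clamped prefix scan follows; the function name and the `limit` parameter are made up for illustration and are not darcs's actual code:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (name and signature are assumptions): report whether
 * \0 or ^Z (byte 26) occurs within the first `limit` bytes of s. */
int has_funky_char_prefix(const char *s, size_t len, size_t limit)
{
    /* Clamp the scan length so short buffers are never over-read. */
    size_t n = len < limit ? len : limit;
    return !!(memchr(s, 0, n) || memchr(s, 26, n));
}
```

Passing len / 10 as `limit` gives the percentage variant. Either way, a binary file whose first \0 or ^Z lies beyond the limit is still classified as text.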
Looking at the memchr man page, I see that there is a reverse memchr, 'memrchr'.
I don't know much about low-level stuff, but it strikes me as possible that if
we changed the definition to
return !!(memchr(s, 0, len) || memrchr(s, 26, len));
we'd get better cache locality - we'd scan forward through the long ByteString,
and then we'd reverse course and scan backwards, which allows the cached end of
the string to be reused, instead of having to restart at the beginning and
re-read the entire darn thing! Using reverse memchr on the dummy repo I gave
timings for above gives the result:
gwern@localhost:2220~/foo>time darcs whatsnew -s
[12:46PM]
A ./bigtempfile
darcs whatsnew -s 1.63s user 3.55s system 10% cpu 50.062 total
Which is a noticeable improvement, but I fear it caused some Perl tests to fail
:( Combining memrchr with the (len/10) change brings it down to 47s, which
doesn't surprise me - looking at top shows that darcs is not using much memory
or CPU, so that leaves IO-bound stuff.
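Since memrchr is a GNU extension rather than standard C, here is a rough sketch of the forward-then-backward scan with a hand-written backwards search in its place. Both function names are invented for this example:

```c
#include <stddef.h>
#include <string.h>

/* Portable stand-in for memrchr (an assumption, not a libc function):
 * search the n bytes at s from the end towards the start for byte c. */
const void *scan_backwards(const void *s, int c, size_t n)
{
    const unsigned char *p = (const unsigned char *)s + n;
    while (n--) {
        if (*--p == (unsigned char)c)
            return p;   /* last occurrence of c in the buffer */
    }
    return NULL;
}

/* Forward scan for \0, then backward scan for ^Z (byte 26). */
int has_funky_char_fwd_rev(const char *s, size_t len)
{
    return !!(memchr(s, 0, len) || scan_backwards(s, 26, len));
}
```

The yes/no answer is the same as with two forward scans; only the order in which bytes are visited changes. Whether that helps on a given machine is something only timings can settle.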
A final suggestion is more difficult, since I'm not familiar with C: instead of
looping over the string and matching against a single character, surely there
must be some way to loop over the string once, matching against either \0 or ^Z;
this would seem to me to be more efficient than looping twice (since we're
interested in performance on strings that obviously blow all the caches).
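For what it's worth, the single-pass idea can be sketched as a plain loop that tests each byte against both characters. The function name is invented here, and this is only an illustration, not a tested replacement:

```c
#include <stddef.h>

/* Hypothetical single-pass check: return 1 as soon as either \0 or
 * ^Z (byte 26) is seen, 0 if neither occurs in the first len bytes. */
int has_funky_char_single_pass(const char *s, size_t len)
{
    size_t i;
    for (i = 0; i < len; i++) {
        if (s[i] == 0 || s[i] == 26)
            return 1;
    }
    return 0;
}
```

This reads the data only once, but libc's memchr is usually heavily optimized, so a naive byte-at-a-time loop is not guaranteed to beat two memchr calls. It would need to be timed on the same test repo.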