I've come up with a hideous new optimization:
gwern@localhost:2234~/darcs.net_2>new
[ 2:59PM]
hunk ./src/fpstring.c 60
- return !!(memchr(s, 0, len) || memchr(s, 26, len));
+#ifdef _WIN32
+ return !!(memchr(s, 26, (len/10)));
+#else
+ return !!(memchr(s, 0, (len/10)));
+#endif
which gives a timing of
gwern@localhost:2231~/foo>time darcs whatsnew -s
[ 2:55PM]
A ./bigtempfile
darcs whatsnew -s 1.64s user 3.92s system 10% cpu 55.468 total
Of course, on the one hand, this is quite a nice speed boost for a nigh-trivial
change; on the other hand, since each platform now looks for only one marker
byte, a Linux darcs could fail to recognize as binary a file that a Windows
darcs would, and conversely. And on the gripping
hand, I don't see any new test suite failures for this change in particular, so
the 'specification', as it were, allows for this change.
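
To make the trade-off concrete, here is a minimal standalone sketch of the two
checks side by side. The function names are made up for illustration (they are
not the ones in fpstring.c), and size_t is assumed for the length:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical name. Original check: scan the whole buffer for
 * NUL (0) or ^Z (26). */
static int looks_binary_full(const char *s, size_t len)
{
    return !!(memchr(s, 0, len) || memchr(s, 26, len));
}

/* Hypothetical name. Patched check: scan only the first tenth of the
 * buffer, and only for one marker byte, chosen per platform. */
static int looks_binary_prefix(const char *s, size_t len)
{
#ifdef _WIN32
    return !!memchr(s, 26, len / 10);
#else
    return !!memchr(s, 0, len / 10);
#endif
}
```

A marker byte anywhere past the first tenth is caught by the full scan but
missed by the prefix scan, and for len < 10 the prefix is empty, so the patched
check never flags such a buffer at all.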
No doubt there are still better ways to do it; I was discussing this on #c,
and I heard:
14:45 <@twkm> gwern: sounds horrible. you could dump the memchr's and use your
own loop, though that might miss optimizations available. or
go assembly -- to gain access to that optimal instruction sequence
-- with the loop as a fall-back, for systems without
assembly yet.
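
For reference, the "own loop" idea might look something like the sketch below.
It checks for both marker bytes in a single pass instead of two memchr calls.
The function name is hypothetical and this is untimed; as twkm says, it may well
lose to whatever tricks the libc memchr uses:

```c
#include <stddef.h>

/* Hypothetical name. One pass over the buffer, testing each byte
 * against both markers: NUL (0) and ^Z (26). */
static int has_nul_or_ctrl_z(const char *s, size_t len)
{
    const unsigned char *p = (const unsigned char *)s;
    const unsigned char *end = p + len;

    for (; p < end; p++) {
        if (*p == 0 || *p == 26)
            return 1;   /* found a marker byte: treat as binary */
    }
    return 0;           /* no marker byte anywhere in the buffer */
}
```

Unlike the patch above, this keeps the original semantics (whole buffer, both
bytes, same answer on every platform), so any gain would have to come purely
from touching each byte once rather than twice.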