darcs

Issue 814 darcs whatsnew -s does too much work

Title darcs whatsnew -s does too much work
Priority bug Status given-up
Milestone Resolved in
Superseder Nosy List darcs-devel, dmitry.kurochkin, gwern, kowey, thorkilnaur, tommy
Assigned To
Topics Performance, ThePendingPatch

Created on 2008-04-25.04:55:21 by dagit, last changed 2017-07-30.23:22:13 by gh.

Files
File name Uploaded Type Edit Remove
make-whatsnew-lazier.dpatch dagit, 2008-04-25.18:14:18 text/x-darcs-patch
Messages
msg4328 (view) Author: dagit Date: 2008-04-25.04:55:20
The following commands work on OS X, and cause darcs to read bigtempfile before
telling you that it is to be added.

For other unices, use --bytes= instead of -c, the --darcs-2 to init doesn't
matter, but I wanted to demonstrate that darcs-2 repos do not fix this issue.

time head -c 17179869184 /dev/zero > bigtempfile
darcs init --darcs-2
darcs add bigtempfile
darcs whats -s

You'll notice this takes much longer than it should.  I'm pretty sure this is
because adding the file puts it pending and the summary needs to know if this is
a binary or non-binary patch.  Determining that a file is not binary requires
reading the whole file in smart_diff and diff_files because has_bin calls
is_funky to look at every char in the file checking for characters that are
typical in binary files.
msg4338 (view) Author: gwern Date: 2008-04-25.16:59:22
That seems to work here:

gwern@localhost:2249~/foo>time (head -c 10179869184 /dev/zero > bigtempfile) &&
time (darcs init --darcs-2) && time (darcs add bigtempfile) && time (darcs whats -s)
(; head -c 10179869184 /dev/zero > bigtempfile; )  1.52s user 37.22s system 15%
cpu 4:07.02 total
(; darcs init --darcs-2; )  0.00s user 0.01s system 0% cpu 2.590 total
(; darcs add bigtempfile; )  0.00s user 0.00s system 0% cpu 1.997 total
A ./bigtempfile
(; darcs whats -s; )  2.03s user 4.43s system 7% cpu 1:27.42 total

(I reduced the file size because of ulimit considerations here.)

---

I find myself wondering - if the slowdown is due to the double memchr loop over
the entire file concerned (in the worst case, it's  a huge text file and memchr
doesn't bail out early, so the entire file gets scanned front to back, front to
back), then what can we do to optimize it? The obvious solution is to just scan
some fixed bytes (or percentage), ie:

 return !!(memchr(s, 0, 1024) || memchr(s, 26, 1024));

or maybe

 return !!(memchr(s, 0, (len / 10) || memchr(s, 26, (len / 10)));

This would probably cause us to miss some binary files and interpret them as
text files, but:

A) Is that really so bad? (I'm serious here, I don't know how bad it is to
mistakenly interpret a binary file as being text)
B) How many binary files don't have the magical characters anywhere near the
beginning? Is there any smarter definition than just 'neither \0 or ^Z appear
anywhere in the file'?

The timing for (len/10) is: 

gwern@localhost:2232~/foo>time darcs whatsnew -s                               
                                                    [12:55PM]
A ./bigtempfile
darcs whatsnew -s  1.79s user 3.67s system 8% cpu 1:01.55 total

Unfortunately, this change seems to fail the merging_newlines, an add test,
either-dependency.sh test, and it seems whatsnew.pl (but I'm not sure they're
supposed to pass or if failures are bad).

Looking at the memchr man page, I see that there is a reverse memchr, 'memrchr'.
I don't know much about low-level stuff, but it strikes me as possible that if
we changed the definition to

    return !!(memchr(s, 0, len) || memrchr(s, 26, len));

we'd get better cache locality - we'd scan forward through the long ByteString,
and then we'd reverse course and scan backwards, which allows the cached end of
the string to be reused, instead of having to restart at the beginning and
re-reading the entire darn thing! Using reverse memchr on the dummy repo I gave
timings for above gives the result:

gwern@localhost:2220~/foo>time darcs whatsnew -s                               
                                                    [12:46PM]
A ./bigtempfile                      
darcs whatsnew -s  1.63s user 3.55s system 10% cpu 50.062 total

Which is a noticeable improvement, but I fear it caused some Perl tests to fail
:( Combining memrchr with the (len/10) change brings it down to 47s, which
doesn't surprise me - looking at top shows that darcs is not using much memory
or CPU, so that leaves IO-bound stuff.

A final suggestion is more difficult, since I'm not familiar with C: instead of
looping over the string and matching against a single character, surely there
must be some way to loop over the string once, matching against either \0 or ^Z;
this would seem to me to be more efficient than looping twice (since we're
interested in performance on strings that obviously blow all the caches).
msg4351 (view) Author: dagit Date: 2008-04-25.18:14:18
I disabled the test in is_funky and found that it was still taking ages to
compute 'whatsnew -s'.  After a fair bit of debugging trying to figure out where
the culprit is, I found that it may be a problem with reading the pending or
what we do with the pending after we read it.

The attached patch passes all the tests on my system, but may not be the correct
way to solve the problem.  Also, I have tried to send this patch twice once to
darcs-devel and once to darcs-users, but both times it was silently ignored. 
I'm not sure why.
Attachments
msg4356 (view) Author: gwern Date: 2008-04-25.19:12:07
I've come up with a hideous new optimization:

gwern@localhost:2234~/darcs.net_2>new                                          
                                                    [ 2:59PM]
hunk ./src/fpstring.c 60
-    return !!(memchr(s, 0, len) || memchr(s, 26, len));
+#ifdef _WIN32
+  return !!(memchr(s, 26, (len/10)));
+#else
+  return !!(memchr(s, 0, (len/10)));
+#endif

which gives a timing of

gwern@localhost:2231~/foo>time darcs whatsnew -s                               
                                                    [ 2:55PM]
A ./bigtempfile            
darcs whatsnew -s  1.64s user 3.92s system 10% cpu 55.468 total


Of course, on the one hand, this is quite a nice speed boost for a nigh-trivial
change; on the other hand this change would seem to imply that a Linux darcs
could fail to understand a Windows binary and conversely. And on the gripping
hand, I don't see any new test suite failures for this change in particular, so
the 'specification', as it were, allows for this change.

No doubt there are further better ways to do it; I was discussing this on #c,
and I heard:

14:45 <@twkm> gwern: sounds horrible.  you could dump the memchr's and use your
own loop, though that might miss optimizations available.  or 
              go assembly -- to gain access to that optimal instruction sequence
-- with the loop as a fall-back, for systems without 
              assembly yet.
msg4649 (view) Author: kowey Date: 2008-05-14.08:35:22
Folks, what's the relationship between this bug and issue711?  Duplicate?
msg4703 (view) Author: gwern Date: 2008-05-14.16:54:43
kowey, I don't think it's a dupe. The bug there is about 'whatsnew -sl'
performing badly in directories with lots of uncontrolled files, right? This bug
is about 'whatsnew -s' (no -l) performing badly in directories with controlled,
but big, files.

"`darcs2 whatsnew -s' only takes a small fraction of a second.
As does `darcs2 whatsnew' without any options."
msg4704 (view) Author: kowey Date: 2008-05-14.16:56:42
Thanks for clarifying that!
msg7054 (view) Author: thorkilnaur Date: 2009-01-12.10:45:01
It appears that further work is needed on this issue: There is a patch attached 
that has not been applied, as far as I can see. So I need a volunteer to look into 
this, verifying whether the patch (or a modified version of it: Things have changed 
in the meantime) solves the reported performance problem. Alternatively, assuming 
the problem can be reproduced, to investigate further how to solve it.

Thanks and regards
Thorkil
msg7057 (view) Author: kowey Date: 2009-01-12.10:59:56
Marking target-2.3 (which is perhaps an implicit way of saying "we should work
on this for the next darcs hacking sprint")
msg8629 (view) Author: dagit Date: 2009-09-02.06:07:03
This is very much related to the bad performance of "initial imports".

What is really happening here is that when we read the pending we try to eagerly
construct the pending patches.  This is why the attached patch works.  It adds
an explicit short circuit so that the pending patches are not constructed. 
While this may very well solve the problem in this case, we should look at a
more fundamental solution.

See this thread for discussion:
http://lists.osuosl.org/pipermail/darcs-users/2009-August/021076.html

In summary, we could apply the patch here (as a bandaid, assuming the summary is
still correct), but even if we do that we need to redesign the reading of
pending because it has wider impact than whatsnew.

Also, to answer Eric's question:  Yes, I suspect issue711 is related.  I haven't
spent time debugging that issue, but if it is also constructing pending patches
then the problem is related.
msg8631 (view) Author: kowey Date: 2009-09-02.07:18:46
Jason: It may be a good idea to send the patch to darcs-users so that it gets
noticed
msg8635 (view) Author: kowey Date: 2009-09-02.14:21:09
Jason: please disregard my last suggestion to send the patch to the mailing list
(I wasn't paying enough attention).  

I tend to think we should just accept the cost of a "no
bandaids/shortcuts/hacks" attitude.  Not perfectionism but trying to get things
right more or less right the first time.  We should fix performance by making
things cleaner and better conceived, not by twiddling knobs or by adding complexity.

Perhaps you could create a new ticket explaining the fundamental problem with
our use of the pending patch problem and your proposed solution?  This ticket
can then point to that.

Thanks!
History
Date User Action Args
2008-04-25 04:55:21dagitcreate
2008-04-25 16:59:24gwernsetstatus: unread -> unknown
nosy: + gwern
messages: + msg4338
2008-04-25 18:14:20dagitsetfiles: + make-whatsnew-lazier.dpatch
nosy: tommy, beschmi, dagit, gwern
messages: + msg4351
2008-04-25 19:12:08gwernsetnosy: tommy, beschmi, dagit, gwern
messages: + msg4356
2008-05-14 08:35:24koweysettopic: + Performance
nosy: + kowey
messages: + msg4649
2008-05-14 16:54:45gwernsetnosy: tommy, beschmi, kowey, dagit, gwern
messages: + msg4703
2008-05-14 16:56:44koweysetnosy: tommy, beschmi, kowey, dagit, gwern
messages: + msg4704
2009-01-12 10:45:04thorkilnaursetstatus: unknown -> needs-reproduction
nosy: + dmitry.kurochkin, simon, thorkilnaur
messages: + msg7054
2009-01-12 11:00:08koweysettopic: + Target-2.3
nosy: tommy, beschmi, kowey, dagit, simon, thorkilnaur, gwern, dmitry.kurochkin
messages: + msg7057
2009-08-06 16:04:38koweysettopic: + Target-2.4, - Target-2.3
nosy: tommy, beschmi, kowey, dagit, simon, thorkilnaur, gwern, dmitry.kurochkin
2009-08-06 21:03:11adminsetnosy: - beschmi
2009-08-11 00:11:07adminsetnosy: - dagit
2009-08-25 17:38:51adminsetnosy: + darcs-devel, - simon
2009-08-27 14:21:21adminsetnosy: tommy, kowey, darcs-devel, thorkilnaur, gwern, dmitry.kurochkin
2009-09-02 06:07:07dagitsetnosy: + dagit
messages: + msg8629
2009-09-02 07:18:48koweysetnosy: tommy, kowey, darcs-devel, dagit, thorkilnaur, gwern, dmitry.kurochkin
messages: + msg8631
2009-09-02 14:21:15koweysettopic: + ThePendingPatch, - Target-2.4
nosy: tommy, kowey, darcs-devel, dagit, thorkilnaur, gwern, dmitry.kurochkin
messages: + msg8635
2009-09-29 06:48:29dagitsetnosy: - dagit
2017-07-30 23:22:13ghsetstatus: needs-reproduction -> given-up