darcs

Issue 2541 whatsnew -l much slower in darcs 2.12.5

Title: whatsnew -l much slower in darcs 2.12.5
Priority: bug
Status: needs-reproduction
Milestone:
Resolved in:
Superseder:
Nosy List: gh, qdunkan
Assigned To:
Topics: Performance

Created on 2017-07-31.17:43:46 by gh, last changed 2017-08-10.16:15:12 by gh.

Messages
msg19504 Author: gh Date: 2017-07-31.17:43:42
I've been using the darcs 2.10.1 binary available for OS X, but I
recently tried 2.12.5 and whatsnew -l is much slower.  E.g.:

% time ~src/hs/darcs-2.12.5/dist/build/Darcs/darcs  w -l
...
~src/hs/darcs-2.12.5/dist/build/Darcs/darcs w -l  2.48s user 0.47s
system 101% cpu 2.909 total

% time darcs w -l
...
darcs w -l  0.13s user 0.04s system 96% cpu 0.181 total

Restricting to a single file, e.g. 'darcs w -l X' is still very slow.
Without -l, both old and new are fast.  This is on a medium sized
repo:

% darcs show repo
        Format: hashed, darcs-2
          Root: /Users/elaforge/src/seq/main
      Pristine: HashedPristine
         Cache: thisrepo:/Users/elaforge/src/seq/main,
cache:/Users/elaforge/Library/Caches/darcs, repo:hub.darcs.net:karya
boringfile Pref: boring
Default Remote: hub.darcs.net:karya
   Num Patches: 5692

The slowdown is also visible in the linux version.  On linux, I tried
running with strace, and it seems to be constantly running
mkdir("..cache/patches"), stat(".../.cache/patches"), then
link("_darcs/patches/...", ".cache/patches") -> EEXIST, repeating for
many different patches.  It doesn't seem to get stuck at any point; it
just seems to be processing lots of patches.  wc on the strace output
shows 130k lines.  So maybe that's related?
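
For readers unfamiliar with that syscall pattern, here is a rough sketch of
what "mkdir, then link, then EEXIST" looks like from a program's point of
view. This is not darcs code; the function name link_into_cache and the
paths are made up for illustration. It only shows why a repeated link()
into an already-populated cache directory fails harmlessly with EEXIST.

```python
import os


def link_into_cache(src, cache_dir):
    """Hard-link src into cache_dir (hypothetical helper, not a darcs API).

    Returns True if a new link was created, False if the entry was
    already present in the cache.
    """
    # mkdir(); an already existing directory is tolerated.
    os.makedirs(cache_dir, exist_ok=True)
    dest = os.path.join(cache_dir, os.path.basename(src))
    try:
        # link(); fails with EEXIST when dest is already there.
        os.link(src, dest)
        return True
    except FileExistsError:
        # Nothing to do: the cache already holds this entry.
        return False
```

Each call is cheap on its own, so a trace full of these calls points at
how many patches are being visited rather than at any single slow step.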

Anyone else see this?
msg19505 Author: gh Date: 2017-08-01.21:40:21
I realized that this is not related to the cache system but to the patch
 "resolve issue2138: report conflicting files in whatsnew -s".

Since that patch, running whatsnew with flag -s (or -l, since -l implies
-s) makes Darcs check whether the repository has conflicts. This makes
Darcs read the history of the repository (same as `darcs
mark-conflicts`), because there is currently no cache that stores which
files contain a conflict.

I agree the current situation (having a slow whatsnew -l because we
check for conflicts) is not cool and should be fixed.

Possible fixes I'm thinking about:

1. report conflicting files only when --machine-readable is passed
2. same but with a new flag --show-conflicts
3. implement aforementioned cache (as a file in _darcs). Note that with
such a cache, running whatsnew -s/-l for the first time in a given state
of a repository would be slow anyway; unless we maintain such cache each
time the set of patches of a repository changes (like when we maintain
the patch-index)
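
To make option 3 a bit more concrete, here is a minimal sketch of a cache
keyed on the state of the repository. Everything in it is an assumption
made for illustration: the names state_key and conflicting_files, the JSON
file layout, and the idea of fingerprinting the ordered list of patch
hashes. compute_conflicts stands in for the expensive history scan. It is
not how darcs stores anything today.

```python
import hashlib
import json
import os


def state_key(patch_hashes):
    """Fingerprint the ordered patch sequence (stand-in for repo state)."""
    h = hashlib.sha256()
    for p in patch_hashes:
        h.update(p.encode("utf-8"))
        h.update(b"\n")
    return h.hexdigest()


def conflicting_files(patch_hashes, cache_path, compute_conflicts):
    """Return the sorted conflicting files, reusing the cache when valid."""
    key = state_key(patch_hashes)
    try:
        with open(cache_path, encoding="utf-8") as f:
            cached = json.load(f)
        if (isinstance(cached, dict) and cached.get("state") == key
                and "files" in cached):
            return cached["files"]  # fast path: same state as last time
    except (OSError, ValueError):
        pass  # missing or unreadable cache: fall through and recompute

    # Slow path: first run in this state, so scan the history once.
    files = sorted(compute_conflicts(patch_hashes))
    tmp = cache_path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump({"state": key, "files": files}, f)
    os.replace(tmp, cache_path)  # swap the new cache in atomically
    return files
```

As noted above, the first call after any change to the patch set still
pays the full cost; only repeated calls in the same state are fast.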
msg19506 Author: qdunkan Date: 2017-08-02.20:25:37
Any of 1, 2, or 3 is fine with me.  The simpler #3 might be a bit confusing when
whatsnew seems to hang but only sometimes, but if --verbose causes it to print 
"recreating conflicts cache" or something then I would have figured out pretty 
quickly what's going on.  And it would be nice to have a --show-conflicts in any 
case (default true is fine), so I can turn it off to avoid even that occasional 
hang.

In the bigger picture, it kind of undermines the usefulness of a summary command 
if it omits things for speed, or if it doesn't but has to be slow, so assuming 
conflicts are valuable summary information, #3 is the only real option.
msg19507 Author: gh Date: 2017-08-03.00:44:08
Well there is another solution:

3bis. Maintain _darcs/conflicts each time a change occurs in the history
of the repository. This file would be maintained as would be the patch
index (when enabled). Indeed, each time a merge occurs in a repository
we do check for conflicts, so it's "just" a matter of storing and
updating this piece of data accordingly.
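
A minimal sketch of that write-through idea, under stated assumptions: the
class name ConflictFile and the JSON encoding are invented for this
example, and the only thing taken from the proposal is that the result of
the conflict check done at merge time gets written to a file such as
_darcs/conflicts, so that whatsnew only has to read it back.

```python
import json
import os


class ConflictFile:
    """Hypothetical on-disk record of the currently conflicting files."""

    def __init__(self, path):
        self.path = path

    def save(self, conflicting_files):
        """Call after every history-changing operation (e.g. a merge)."""
        tmp = self.path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(sorted(set(conflicting_files)), f)
        os.replace(tmp, self.path)  # replace the old record atomically

    def load(self):
        """Return the recorded list, or None if nothing was recorded yet."""
        try:
            with open(self.path, encoding="utf-8") as f:
                return json.load(f)
        except FileNotFoundError:
            # No record: the caller has to fall back to a full scan.
            return None
```

The design choice here is that an empty list and a missing file mean
different things: the former says "checked, no conflicts", the latter says
"never checked", which matters for repositories created before such a file
existed.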
msg19508 Author: qdunkan Date: 2017-08-03.01:07:23
Right, that would be ideal, unless it's complicated or error-prone.
msg19546 Author: bf Date: 2017-08-10.15:57:19
One obvious idea how to implement this is to extend the patch-index so
that it covers information about conflicts, too. However, I am uncertain
if this is possible: whether a patch touches a certain file is a
property of the patch itself; OTOH, whether it conflicts with another
patch (and which files exactly are conflicting) depends on the context,
too. Commuting a patch A past one with a conflict B might transfer the
conflict from B to A. It would be necessary to update the index in that
case, whereas with what the index currently tracks this would not be
necessary, right?
msg19547 Author: gh Date: 2017-08-10.16:15:11
> One obvious idea how to implement this is to extend the patch-index so

One main reason why this conflict cache should not be part of the
patch-index is that the patch-index is not always enabled. (And it
should stay so for performance's sake.) On the other hand, the conflict
cache is a piece of data that is always computed after merging patches,
we "just" need to save it on the disk.
History
Date                 User     Action  Args
2017-07-31 17:43:46  gh       create
2017-08-01 21:40:23  gh       set     messages: + msg19505
2017-08-02 00:20:58  gh       set     topic: - Hashed
2017-08-02 20:25:39  qdunkan  set     messages: + msg19506
2017-08-03 00:44:10  gh       set     messages: + msg19507
2017-08-03 01:07:24  qdunkan  set     messages: + msg19508
2017-08-10 15:57:20  bf       set     messages: + msg19546
2017-08-10 16:15:12  gh       set     messages: + msg19547