Issue 2574 unrecord -a --last=1 can be extremely inefficient

Title unrecord -a --last=1 can be extremely inefficient
Priority Status unknown
Milestone Resolved in
Superseder Nosy List bf
Assigned To

Created on 2018-03-06.12:01:40 by bf, last changed 2018-03-06.13:15:41 by bf.

msg19943 (view) Author: bf Date: 2018-03-06.12:01:38
I tried 'darcs unrecord -a --last=1' on a clone of the darcs repo and it
seems to hang forever. I expected it to unrecord the last patch without
asking and be done with it. So I added --debug to see what's going on.
It is not hanging, just copying patches for no apparent reason, a few at
a time, then furiously calcuates for a few seconds (it uses nearly 100%
cpu all the time), then repeat. It looks as if this starts behind the
last tag.

Here is a slightly annotated and filtered output:

>darcs unrecord -a --last=1 -v --debug 2>&1 | grep copyFileUsingCache
I'm doing copyFileUsingCache on
I'm doing copyFileUsingCache on
I'm doing copyFileUsingCache on

[...5 to 10 second pause...]

I'm doing copyFileUsingCache on
I'm doing copyFileUsingCache on
I'm doing copyFileUsingCache on
I'm doing copyFileUsingCache on
I'm doing copyFileUsingCache on

[...5 to 10 second pause...]

I'm doing copyFileUsingCache on
I'm doing copyFileUsingCache on
I'm doing copyFileUsingCache on
I'm doing copyFileUsingCache on
I'm doing copyFileUsingCache on
I'm doing copyFileUsingCache on
I'm doing copyFileUsingCache on

[...and so on and so on...]

I cannot reproduce this behavior with small test repos. I have removed
_darcs/prefs/{sources,defaultrepo,repos} but that doesn't make the
slightest difference.

I tested with the head and with darcs-2.10.2, same behavior.

This is quite bizarre.
msg19944 (view) Author: bf Date: 2018-03-06.12:30:07
Correction: it does not start behind the last tag. The first patch
mentioned (after the last one that is to be unrecorded) is

| head -3
[resolve issue2410: detect replaces even with tokens of different lengths
Guillaume Hoffmann <guillaumh@gmail.com>**20150217223248
 Ignore-this: 21859ec1e6a7652a68b4409c71dd08cd

I did a 'darcs log' and searched for the title of this patch and found
that tag '2.10.0' depends on it. So I made an experiment. I cloned the
repo up to the tag 2.10.0, then pulled the next tag which is 2.12.0,
then the rest. Then I did the same unrecord -a --last=1 again and this
time it terminated reasonably fast (considering that there are 500
patches since the last, now clean, tag and the inventory gets re-written
completely). When I use the --debug and the copyFileXxx filter it still
shows me 150 patches being copied.
msg19945 (view) Author: bf Date: 2018-03-06.13:15:40
Some additional remarks.

(1) It is obvious that there is absolutely *no* need to touch *any*
patch other than the last one for this operation. The last patch is
needed because its effect must be added to pending.

(2) Even if we unrecord patches that are not at the top of the repo, we
only ever have to read patches from the first match up to the top, so we
can do the necessary commutation (or fail).

(3) Even in interactive mode we don't normally need the patch data, just
the meta data, which are stored in the inventory. The only case where we
have to inspect a patch file is when we pass -v or if the user hits 'v',
or if we need to commute patches.

I don't understand this. We do have the tools in place to read the patch
data on demand. PatchInfoAnd goes to great pains to allow the actual
patch data not be be present. But it seems this feature is not used.

(4) When I look at the output of 'darcs unrecord -a --last=1 --debug' I
see a strange artefact, even in the optimized repo with clean tags:

About to gzFetchFilePS from
Caught exception, now attempt creating cache.
About to gzFetchFilePS from
Caught exception, now attempt creating cache.
In fetchFileUsingCachePrivate I'm directly grabbing file contents from
In fetchFileUsingCachePrivate I'm directly grabbing file contents from
No sources from which to fetch file

in spite of the fact that

>find -name

Is this to be expected?

(5) The debug output tells me

  Beginning writing inventory...

before all the copyFileUsingCache messages. And then, later on, I get
*all of this again*. So we are not only accessing (or at least
copying/hardlinking) patch files unnecessarily, we do it twice.
Date User Action Args
2018-03-06 12:01:40bfcreate
2018-03-06 12:30:09bfsetmessages: + msg19944
2018-03-06 13:15:41bfsetmessages: + msg19945