darcs

Issue 2574 unrecord -a --last=1 can be extremely inefficient

Title unrecord -a --last=1 can be extremely inefficient
Priority Status given-up
Milestone Resolved in
Superseder Nosy List bf
Assigned To
Topics

Created on 2018-03-06.12:01:40 by bf, last changed 2018-04-19.06:44:16 by bf.

Messages
msg19943 (view) Author: bf Date: 2018-03-06.12:01:38
I tried 'darcs unrecord -a --last=1' on a clone of the darcs repo and it
seems to hang forever. I expected it to unrecord the last patch without
asking and be done with it. So I added --debug to see what's going on.
It is not hanging, just copying patches for no apparent reason, a few at
a time, then furiously calcuates for a few seconds (it uses nearly 100%
cpu all the time), then repeat. It looks as if this starts behind the
last tag.

Here is a slightly annotated and filtered output:

>darcs unrecord -a --last=1 -v --debug 2>&1 | grep copyFileUsingCache
I'm doing copyFileUsingCache on
patches/0000002215-acc0868acc13d27ef4db7e0fd6d6f45efa09f7dd389874f7c1243638cf511389
I'm doing copyFileUsingCache on
inventories/0000220070-6ef010a955c38fc4301787092979994bafd366eb50152b66e089deff649d35da
I'm doing copyFileUsingCache on
patches/0000002054-b5af519485c5541ea7bb13a8811b79adc31132ccbda59e42c3d8ab6d2f4ae3e7

[...5 to 10 second pause...]

I'm doing copyFileUsingCache on
patches/0000000312-4e990bdb6425573aa79a73e9a57b968d4bf29a2fc272cc4b242e246539690369
I'm doing copyFileUsingCache on
patches/0000000220-8819affe17963ea8ec7556a59970747985fef8db59f00c9ec3cb31f146ebf182
I'm doing copyFileUsingCache on
patches/0000000608-dc8c6eff5a6f95f2c4f11c82af670d1cdb3353cd225e4e57ae7622219faa4081
I'm doing copyFileUsingCache on
patches/0000002770-71b6e420f61b98ce3910a450aa690fc821e6ea858195d6e2574561ee338ca3f6
I'm doing copyFileUsingCache on
patches/0000000702-53c8065c7bb035387c99bbd875ca65f59f6e1ffdc55124fd2f341842aa026b5e

[...5 to 10 second pause...]

I'm doing copyFileUsingCache on
patches/0000000701-89a0a3c9142a1575eb30385612e410caa7d19aaa2322f7471d416366dc583f3f
I'm doing copyFileUsingCache on
patches/0000004438-e5a94deba308abe8340e8642f984aec13685067e1ac12a53d7af4bd4638d4344
I'm doing copyFileUsingCache on
patches/0000000753-dee09dca901c3b22f5f276c7c50349f07061054f1a1fb4f937e68ebc759827aa
I'm doing copyFileUsingCache on
patches/0000001144-25a2776459af7ef92b7f094ee5bbb312fef7848fcbcafeb2282880ad3db6ac10
I'm doing copyFileUsingCache on
patches/0000000451-79da4982a56ea41b1094d52b31667c191444d9ac280e711c44187fc838025b91
I'm doing copyFileUsingCache on
patches/0000000514-b3f9ac168214ec2856a3e5f202402488b5752fb9a7d56f7b5b5dddd0b77cd419
I'm doing copyFileUsingCache on
patches/0000001535-1bd00ef1c4315630721ec9f60d2091e073caf51ea5d6aa1d5e8a10802456c9f1

[...and so on and so on...]

I cannot reproduce this behavior with small test repos. I have removed
_darcs/prefs/{sources,defaultrepo,repos} but that doesn't make the
slightest difference.

I tested with the head and with darcs-2.10.2, same behavior.

This is quite bizarre.
msg19944 (view) Author: bf Date: 2018-03-06.12:30:07
Correction: it does not start behind the last tag. The first patch
mentioned (after the last one that is to be unrecorded) is

zcat
_darcs/patches/0000002054-b5af519485c5541ea7bb13a8811b79adc31132ccbda59e42c3d8ab6d2f4ae3e7
| head -3
[resolve issue2410: detect replaces even with tokens of different lengths
Guillaume Hoffmann <guillaumh@gmail.com>**20150217223248
 Ignore-this: 21859ec1e6a7652a68b4409c71dd08cd

I did a 'darcs log' and searched for the title of this patch and found
that tag '2.10.0' depends on it. So I made an experiment. I cloned the
repo up to the tag 2.10.0, then pulled the next tag which is 2.12.0,
then the rest. Then I did the same unrecord -a --last=1 again and this
time it terminated reasonably fast (considering that there are 500
patches since the last, now clean, tag and the inventory gets re-written
completely). When I use the --debug and the copyFileXxx filter it still
shows me 150 patches being copied.
msg19945 (view) Author: bf Date: 2018-03-06.13:15:40
Some additional remarks.

(1) It is obvious that there is absolutely *no* need to touch *any*
patch other than the last one for this operation. The last patch is
needed because its effect must be added to pending.

(2) Even if we unrecord patches that are not at the top of the repo, we
only ever have to read patches from the first match up to the top, so we
can do the necessary commutation (or fail).

(3) Even in interactive mode we don't normally need the patch data, just
the meta data, which are stored in the inventory. The only case where we
have to inspect a patch file is when we pass -v or if the user hits 'v',
or if we need to commute patches.

I don't understand this. We do have the tools in place to read the patch
data on demand. PatchInfoAnd goes to great pains to allow the actual
patch data not be be present. But it seems this feature is not used.

(4) When I look at the output of 'darcs unrecord -a --last=1 --debug' I
see a strange artefact, even in the optimized repo with clean tags:

About to gzFetchFilePS from
"/home/ben/src/darcs/hm2/_darcs/inventories/0000153295-4ebfa2bc72876474b0f5a6693570fa6b0a8020022f7d48d752d09f63912911f0"
Caught exception, now attempt creating cache.
About to gzFetchFilePS from
"/home/ben/.cache/darcs/inventories/4e/0000153295-4ebfa2bc72876474b0f5a6693570fa6b0a8020022f7d48d752d09f63912911f0"
Caught exception, now attempt creating cache.
In fetchFileUsingCachePrivate I'm directly grabbing file contents from
/home/ben/src/darcs/hunkmove/_darcs/inventories/0000153295-4ebfa2bc72876474b0f5a6693570fa6b0a8020022f7d48d752d09f63912911f0
In fetchFileUsingCachePrivate I'm directly grabbing file contents from
./_darcs/inventories/0000153295-4ebfa2bc72876474b0f5a6693570fa6b0a8020022f7d48d752d09f63912911f0
No sources from which to fetch file
`0000153295-4ebfa2bc72876474b0f5a6693570fa6b0a8020022f7d48d752d09f63912911f0'

in spite of the fact that

>find -name
0000153295-4ebfa2bc72876474b0f5a6693570fa6b0a8020022f7d48d752d09f63912911f0
./_darcs/inventories/0000153295-4ebfa2bc72876474b0f5a6693570fa6b0a8020022f7d48d752d09f63912911f0

Is this to be expected?

(5) The debug output tells me

  Beginning writing inventory...

before all the copyFileUsingCache messages. And then, later on, I get
*all of this again*. So we are not only accessing (or at least
copying/hardlinking) patch files unnecessarily, we do it twice.
msg20120 (view) Author: bf Date: 2018-04-19.06:44:14
I am not able to reproduce the original problem. I guess I made the
clone with a broken test version of darcs or something like that.
History
Date User Action Args
2018-03-06 12:01:40bfcreate
2018-03-06 12:30:09bfsetmessages: + msg19944
2018-03-06 13:15:41bfsetmessages: + msg19945
2018-04-19 06:44:16bfsetstatus: unknown -> given-up
messages: + msg20120