darcs

Issue 2273 unrevert context with mergers can cause major slowdowns

Title unrevert context with mergers can cause major slowdowns
Priority Status needs-diagnosis/design
Milestone Resolved in
Superseder Nosy List ganesh, markstos
Assigned To
Topics Performance

Created on 2012-12-10.17:48:26 by markstos, last changed 2017-10-01.12:14:42 by bfrk.

Messages
msg16387 (view) Author: markstos Date: 2012-12-10.17:48:24
1. Summarise the issue (what were doing, what went wrong?)

I used "darcs amend" to add a one line change to a patch I just created. 
Perhaps was very good while selecting the changes to include and the 
patch to amend. However, performance became extemely slow after 
selecting the final "Yes".

2. What behaviour were you expecting instead?

Using "darcs amend" to add a single line to a patch just created should 
by very fast. 

3. What darcs version are you using? (Try: darcs --exact-version)

darcs 2.9.6, although darcs 2.8.0 displays the same behavior. 

4. What operating system are you running?

Ubuntu Linux 12.04 on EC2.

By using "darcs amend --verbose", I can get a little more insight into 
the slowness. 

###
Disabling progress reports...
Shall I add this change? (1/1)  [ynW...], or ? for more options: y
Reenabling progress reports.
Adding changes to pending...
...
Adjusting the context of the unrevert changes...
Removing 1 patches in removeFromUnrevertContext!

Then it rather slowly gets into a sequence of calls to 
copyFileUsingCache (why are these needed for amend-record):

I'm doing copyFileUsingCache on patches/0000000631-
b3a3c437fb66d89e171a436150fb962677a450be1e5db11b4d351072dfc065ab

After about 6 patches, it hangs for about 10 minutes before I give up 
and kill it. 

Unfortunately, the repo in question can't be shared.
msg16389 (view) Author: markstos Date: 2012-12-10.20:32:41
I found that I could "fix" this issue by doing a "darcs get" on the repo, 
and starting to use the new repo instead. 

After getting the same hang with "darcs unrecord" in the old repo, I notice 
that in the new repo, the same command is not even attempting to call 
copyFileUsingCache on patches, just on inventories. That's more what I 
would expect. 

I wonder where darcs was going off the rails previously?

Here's an example output from new, working run:

###

Reenabling progress reports.
About to write out (potentially) modified patches...
Beginning defining set of chosen patches
Done defining set of chosen patches
Adding changes to pending...
Removing changes from tentative inventory...
in writeTentativeInventory...
Beginning writing inventory
Writing hash file to inventories
Done writing inventory
still in writeTentativeInventory...
I'm doing copyFileUsingCache on inventories/0000000283-
3fa98a41579624db2d2312cf468d68f83a86d90faa2695b29
Finalizing changes...
Optimizing the inventory...
in writeTentativeInventory...
Beginning writing inventory
Beginning optimizing inventory
Done optimizing inventory
Writing hash file to inventories
Done writing inventory
still in writeTentativeInventory...
I'm doing copyFileUsingCache on inventories/0000311776-
1b9295168ede9bc659e7d6ba829a0fc9df311aad3cca3953d
Writing new pending:  _darcs/patches/pending.new
Finished writing new pending:  _darcs/patches/pending.new
Done finalizing changes...
I'm doing copyFileUsingCache on inventories/0000648963-
361ce256bf05ef858bb15926148522fb8e8c4401a200c237c
I'm doing copyFileUsingCache on inventories/0003150447-
7ef4ce9037772a43602cda39a0db0e83b04add4bca7b5d264
I'm doing copyFileUsingCache on inventories/0000648963-
361ce256bf05ef858bb15926148522fb8e8c4401a200c237c
I'm doing copyFileUsingCache on inventories/0003150447-
7ef4ce9037772a43602cda39a0db0e83b04add4bca7b5d264
About to create patch index...
Patch index created
Finished unrecording.
Beginning identifying repository .
Done identifying repository .

#####

And here's how the derailed version when, before I cancel'ed it:

Reenabling progress reports.
About to write out (potentially) modified patches...
Beginning defining set of chosen patches
Done defining set of chosen patches
Adding changes to pending...
Adjusting the context of the unrevert changes...
Removing 1 patches in removeFromUnrevertContext!
Reading patch file: Fri Dec  7 15:23:07 EST 2012  Mark Stosberg 
<mark@summersault.com>
  * obfu'ed 1
I'm doing copyFileUsingCache on patches/0000001792-
78da6ee81faff21818ec2899e180d6ab966caa23945bf305050f2
Reading patch file: Fri Dec  7 15:36:22 EST 2012  Mark Stosberg 
<mark@summersault.com>
  * obfu'ed 2
I'm doing copyFileUsingCache on patches/0000002276-
0678cd86d24102d9095e170c8e54f6251afd14a0d14ede0f00a6e
Reading patch file: Fri Dec  7 15:46:22 EST 2012  Mark Stosberg 
<mark@summersault.com>
  * obfu'ed 3
I'm doing copyFileUsingCache on patches/0000000251-
b2427b2831533552575d446ca5d71cce52fd1697a8fbead42c13c
Reading patch file: Fri Dec  7 15:55:17 EST 2012  Mark Stosberg 
<mark@summersault.com>
  * obfu'ed 4
I'm doing copyFileUsingCache on patches/0000001626-
c9c1325a78f8145f084d7daa0106b9d0e0a456014891039271387
^CwithSignalsHandled: Interrupted!
msg16390 (view) Author: markstos Date: 2012-12-10.20:42:26
Looks like the culprit was 104 mergers in _darcs/patches/unrevert.

The workaround is:

 mv _darcs/patches/unrevert _darcs/patches/unrevert.bak

Now, could darcs do better at detecting or avoiding this situation? 

I can't imagine when I would ever want to "unrevert" complex merger. 

I rarely use the "unrevert" feature, and would have happy with an option 
to express that I don't care if that data is preserved.
msg19678 (view) Author: bfrk Date: 2017-10-01.12:14:39
Unrevert has these issues because it is implemented as a patch bundle.
This was probably the easiest way to do it but I think we can do better.
Suppose we store it as a normal patch (but still separate from the
normal patch inventories). I like to picture it as hanging off the
working state:

  origin --r--> recorded --p--> pending --w--> working --u--> unrevert

where r are our recorded patches, p is the pending patch and u is the
unrevert patch. In an ideal world we could adapt u whenever we change w,
but w is not under our control, so we can't do that. So we have to
remember -- along with u itself -- as much of w as is needed to apply u.
This is what is called a "contexted patch" in the camp paper and I
believe is the same (modulo named prim patches which we don't have yet)
as what we have as Darcs.Patch.V2.Non. This extra context must be
adapted whenever we change r or p. Writing c:u for the contexted
unrevert, we arrive at

  origin --r--> recorded --p--> pending --w--> working
                                        \
                                         --c:u--> unrevert

and the unrevert operation is then a merge of c:u with w, resulting in
some c':u' (we throw away the other branch) which we can apply on top of
w. Of course the merged c':u' can have conflicts, and I think unrevert
should support the usual conflict options (allow, no-allow, mark) with
--no-allow-conflicts as a sensible default.

How does this solve the issue at hand? Well, we never store conflicts in
the context of u (remember it is a list of prim patches, like the
pending patch p). When we calculate a new context (for u, after
modifying r or p, or for a reduced u', after merging with w) we only
need to consider the effects of any patch involved, which we add or
subtract and then simplify (re-group and coalesce) like we do with the
pending patch. At least that is what I think at the moment...

A nice side-effect is that  we get rid the "this will make unrevert
impossible, proceed?" warning/prompt.
History
Date User Action Args
2012-12-10 17:48:26markstoscreate
2012-12-10 17:49:03markstossettopic: + Performance
2012-12-10 20:32:42markstossetmessages: + msg16389
2012-12-10 20:42:27markstossetmessages: + msg16390
2012-12-10 20:47:38ganeshsetstatus: unknown -> needs-diagnosis/design
nosy: + ganesh
title: darcs amend: sometimes very slow due to calls to copyFileUsingCache -> unrevert context with mergers can cause major slowdowns
2017-10-01 12:14:42bfrksetmessages: + msg19678