darcs

Issue 2097 unpull hangs forever

Title: unpull hangs forever
Priority: bug
Status: needs-testcase
Milestone: 2.8.1
Resolved in:
Superseder:
Nosy List: wehr
Assigned To:
Topics: Regression

Created on 2011-08-23.14:33:50 by wehr, last changed 2012-01-06.19:12:05 by kowey.

Files
File name Uploaded Type
hl7fr-broken.tar.gz wehr, 2011-08-23.14:33:49 application/octet-stream
strace.txt simon, 2011-10-31.20:32:42 text/plain
Messages
msg14686 (view) Author: wehr Date: 2011-08-23.14:33:49
Dear darcs developer,

I've just detected a bug with darcs where unpull hangs indefinitely. Just
untar the attached tarball and then do

---
$ cd hl7fr-broken
$ darcs unpull
Tue Aug 23 11:07:03 CEST 2011  Stefan Wehr <wehr@factisresearch.com>
  * corrected scalatest dependency
Shall I unpull this patch? (1/53)  [ynW...], or ? for more options: y
Tue Aug 23 09:43:44 CEST 2011  Stefan Wehr <wehr@factisresearch.com>
  * AMEND: cleanup package layout
Shall I unpull this patch? (2/53)  [ynW...], or ? for more options: d
# hangs forever
---

Hint: Executing the command

$ rm project/build/Project.scala

leads to the following behavior:

---
$ darcs unpull
Tue Aug 23 11:07:03 CEST 2011  Stefan Wehr <wehr@factisresearch.com>
  * corrected scalatest dependency
Shall I unpull this patch? (1/53)  [ynW...], or ? for more options: y
Tue Aug 23 09:43:44 CEST 2011  Stefan Wehr <wehr@factisresearch.com>
  * AMEND: cleanup package layout
Shall I unpull this patch? (2/53)  [ynW...], or ? for more options: d

darcs failed:  Can't unpull patch without reverting some unrecorded
change.
---
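
When scripting this reproduction, it helps to tell a real hang apart from the "darcs failed" exit shown in the second transcript. The following is a minimal sketch, not part of the original report: the helper name `classify_run` and the 60-second limit are assumptions, and a "hang" verdict only means the process outlived the timeout.

```python
import subprocess


def classify_run(cmd, stdin_text="", timeout_s=60.0, cwd=None):
    """Run cmd, feed it stdin_text, and report 'ok', 'failed', or 'hang'.

    'hang' only means the process was still running after timeout_s
    seconds; a slow but terminating run looks the same if the limit
    is too small.
    """
    try:
        proc = subprocess.run(
            cmd,
            input=stdin_text,
            capture_output=True,
            text=True,
            timeout=timeout_s,
            cwd=cwd,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired.
        return "hang"
    return "ok" if proc.returncode == 0 else "failed"
```

Assuming darcs is on the PATH, the tarball is unpacked into `hl7fr-broken`, and darcs accepts the prompt answers on piped stdin, a call such as `classify_run(["darcs", "unpull"], stdin_text="yd", cwd="hl7fr-broken")` would be expected to return "hang" for the first transcript and "failed" for the second.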

Stefan Wehr

-- 
Dr. Stefan Wehr
Forschung & Entwicklung
factis research GmbH
Merzhauserstraße 177 / 79100 Freiburg i.Br.
Tel.: +49 (0)761 8 96 45 - 75
Fax:  +49 (0)761 8 96 45 - 69
Geschäftsführung: Dr. Harald Fischer, David Leuschner
Handelsregister: Amtsgericht Freiburg HRB 704694
PGP: Schlüssel auf Server wwwkeys.de.pgp.net, ID 0x0B9F5CE4
msg14687 (view) Author: kowey Date: 2011-08-23.14:56:45
Ugh.

Potentially good news: I was able to unpull the patch with darcs 2.3.1 and 
darcs 2.4.4.  I don't have a 2.5.x series darcs handy, unfortunately.
msg14744 (view) Author: markstos Date: 2011-10-11.16:24:37
I just stumbled upon this ticket. We ran into this case with darcs 2.5.1 
this week as well, although I don't know if it's exactly the same. In 
our case, we can't share the repo with you.

As a workaround, we ended up making a copy of the repo with "darcs get
--to-patch", so that the problem patches weren't included, rewriting the 
problem patches, and then pulling in the rest of the patches. Luckily, 
we are in an environment where we can destroy and recreate patches in 
all related repos if we need to. 

We are still using the hashed repo format.
msg14772 (view) Author: markstos Date: 2011-10-13.13:21:48
wehr, I assume you were using a production release of darcs at the time, 
maybe darcs 2.5?

Since this doesn't appear to be a regression since darcs 2.5, I'm moving 
the target milestone to 2.10 (although it still would be great to fix 
soon-- I've been struck by it, too!)
msg14788 (view) Author: markstos Date: 2011-10-27.17:12:23
I've reproduced this issue now with darcs 2.5.2, as packaged for Ubuntu 
Linux 11.10. I also have darcs 2.5 and darcs 2.5.1 handy, so I'll test 
with those. 

By turning on "--debug-verbose", I can see that it hangs here:


I'm doing copyFileUsingCache on patches/0000016056-d1d75df236d5369245b603f7ad913c563702a6500c68ac8b65a6971589e6eecb
msg14789 (view) Author: markstos Date: 2011-10-27.17:22:14
This issue has now been reproduced with darcs 2.5.1 on FreeBSD. The 
provided repo is in the "darcs-2" format. I'm experiencing the issue in a 
separate project using the "hashed" format. Next I'll test a darcs 2.5 
binary.
msg14790 (view) Author: markstos Date: 2011-10-27.17:31:02
I've now reproduced the issue with darcs 2.5 on FreeBSD, waiting a minute or
so before giving up.

I also confirmed that the bug is not present with a darcs 2.4 binary I had
around: the patch is oblit'ed immediately. Eric Kow said the 2.4.4 binary
was fine.

So, do we now have enough of a test case and a window for a developer to use
"darcs trackdown" to track down the problem patch that introduced the bug,
which would be somewhere between 2.4.4 and 2.5.0?

I'm upgrading the priority to "critical". For workflows like mine, "oblit" is
how we "unlaunch" a bad patch to production. Not having this ability to undo
something in a repo is very bad. There are workarounds, but people may
discover this bug when timing is critical.
msg14791 (view) Author: markstos Date: 2011-10-27.17:58:16
I've also reproduced the bug now with the latest darcs from the screened 
repo:  2.7.3 (+ 300 patches)
msg14792 (view) Author: ganesh Date: 2011-10-27.18:01:11
I think this is critical enough to at least provisionally go on the 2.8 
fix list
msg14803 (view) Author: simon Date: 2011-10-31.20:16:34
With darcs 2.5.2 and 2.7.98.1, I can reproduce the hang with this tarball. 
But if I first revert -a, obliterate (modern name for unpull) completes 
successfully.
msg14804 (view) Author: simon Date: 2011-10-31.20:32:42
Here's strace output when it hangs. I'm not sure what it's telling us.
msg14805 (view) Author: markstos Date: 2011-11-01.18:27:20
> With darcs 2.5.2 and 2.7.98.1, I can reproduce the hang with this
> tarball.  But if I first revert -a, obliterate (modern name for unpull)
> completes successfully.

In my repo, I was able to reproduce the same workaround: If I first
reverted the changes, oblit finishes immediately. The target patch to
oblit and the unrecorded changes were in the same subdirectory, but they
shared no files in common.

To follow-up on this angle, I created a test script to check to see if
oblit *always* hangs in the face of unrecorded changes. That script
is/was here, and shows that this is not the case... there must be an
additional factor that our repos have in common that we are unaware of:

 http://hpaste.org/53403

In related news, bsrk used trackdown to find the patch that introduced
the issue. He believes it is the patch named:
"A new implementation of PatchSet and its operations."
The timestamp and author are:

 May  8 14:49:15 IST 2010  Petr Rockai <me@mornfall.net>

However, this patch is now so far back in history that it can't be
easily unrecorded or rolled back to address the regression. Plus, I
heard on #darcs that the older PatchSet code may have its own bugs.

So, it seems like the way forward is to continue to drill into this
patch and the clues we have, and see about devising a fix.

One possibility: It may be easier to try fixing this first in the 2.5
branch, which also has the bug, but has a lot less history between HEAD
and the problem patch than the 2.8 branch does.
msg14971 (view) Author: owst Date: 2012-01-06.12:56:24
I did some preliminary investigation of this last night. From what I can
tell so far, you originally did something like:

* removed a bunch of files
* reverted the removes
* instead, darcs mv'd them, and made a fair few other changes, recording
a patch.
* made another patch fiddling with some unrelated files.
* added a simple oneline change patch - which you're trying to obliterate.

What appears to be happening is Darcs is hanging in a function named
mergeThem (a function to merge two Patchsets - the tentative repo and
the unrevert patch bundle (~ line 777 in Darcs/Repository/Internal.hs)),
when trying to remove the to-be-obliterated patch out of the unrevert
context (I'm not at home at the moment - this detail may not be quite
correct :-)).

Darcs seemingly hangs (although, it's not really hanging) when trying to
merge the unrevert patch (the original rm files) with the
mv-and-other-stuff patch, in the mergeFL function (called at ~line 319
in Darcs/Patch/Depends.hs).
With the limited investigation I did, I noticed that Darcs is doing a
lot of commutation, and getting slower as it does each. Since the
unrevert patch and the mv-and-other-stuff patch will conflict (the rm
and the mv of the same file conflict) I imagine there's some Conflictor
fun going on.

That's as far as I've got so far, but wanted to let you know my
progress, and that someone is finally looking at this!
msg14972 (view) Author: owst Date: 2012-01-06.12:58:14
Oops, I hit submit too soon!

All that said, I attempted to create a script to generate a test repo,
using steps similar to those you (probably) took, and I couldn't get the
hang behaviour to re-occur. So, there must be something particular about
the other changes you made in addition to the file moves, which I'll
need to take a closer look at.
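
For reference, the sequence of steps described in msg14971 could be sketched roughly as below. This is an illustrative reconstruction, not owst's actual script, and as noted above a script of this shape did not trigger the hang. The file names, patch names and author string are hypothetical, and the darcs options used (`record -a -m -A`, `revert -a`, `obliterate -a --last=1`) are assumed to behave non-interactively.

```python
import subprocess
from pathlib import Path

AUTHOR = "tester@example.invalid"  # hypothetical author string


def record(name):
    # -a records all changes without prompting; -m sets the patch name.
    return ["darcs", "record", "-a", "-m", name, "-A", AUTHOR]


def plan():
    """Return (kind, payload) steps: 'write'/'delete' touch files, 'run' calls darcs."""
    return [
        ("run", ["darcs", "init"]),
        ("write", ("a.txt", "one\n")),
        ("write", ("other.txt", "x\n")),
        ("run", ["darcs", "add", "a.txt", "other.txt"]),
        ("run", record("initial")),
        ("delete", "a.txt"),                          # remove a file...
        ("run", ["darcs", "revert", "-a"]),           # ...then revert the removal
        ("run", ["darcs", "mv", "a.txt", "b.txt"]),   # instead, move it
        ("write", ("b.txt", "one\ntwo\n")),           # plus other changes
        ("run", record("mv and other stuff")),
        ("write", ("other.txt", "x\ny\n")),
        ("run", record("unrelated change")),
        ("write", ("other.txt", "x\ny\nz\n")),
        ("run", record("one-line change")),
        ("run", ["darcs", "obliterate", "-a", "--last=1"]),
    ]


def execute(steps, repo_dir, timeout_s=60.0):
    """Apply the steps inside repo_dir; raises on a failed or timed-out command."""
    root = Path(repo_dir)
    root.mkdir(parents=True, exist_ok=True)
    for kind, payload in steps:
        if kind == "write":
            name, text = payload
            (root / name).write_text(text)
        elif kind == "delete":
            (root / payload).unlink()
        else:
            subprocess.run(payload, cwd=root, check=True, timeout=timeout_s)
```

Keeping the plan as data makes it easy to vary one step at a time (for example, adding more moved files) while searching for the missing factor.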
History
Date User Action Args
2011-08-23 14:33:50 wehr create
2011-08-23 14:56:46 kowey set priority: bug; topic: + Regression; messages: + msg14687
2011-10-11 16:24:38 markstos set messages: + msg14744
2011-10-13 12:26:46 markstos set milestone: 2.8.0
2011-10-13 13:21:49 markstos set messages: + msg14772; milestone: 2.8.0 -> 2.10.0
2011-10-27 17:12:24 markstos set messages: + msg14788
2011-10-27 17:22:15 markstos set messages: + msg14789
2011-10-27 17:31:02 markstos set messages: + msg14790
2011-10-27 17:58:17 markstos set messages: + msg14791
2011-10-27 18:01:11 ganesh set messages: + msg14792; milestone: 2.10.0 -> 2.8.0
2011-10-31 20:16:34 simon set messages: + msg14803
2011-10-31 20:32:44 simon set files: + strace.txt; messages: + msg14804
2011-11-01 18:27:22 markstos set messages: + msg14805
2012-01-01 23:18:30 ganesh set milestone: 2.8.0 -> 2.8.1
2012-01-06 12:56:27 owst set status: unknown -> needs-reproduction; messages: + msg14971
2012-01-06 12:58:15 owst set messages: + msg14972
2012-01-06 19:12:05 kowey set status: needs-reproduction -> needs-testcase