darcs

Issue 2093: darcs record freezes and cannot be killed

Title: darcs record freezes and cannot be killed
Priority: bug
Status: given-up
Milestone:
Resolved in:
Superseder:
Nosy List: mulander, nccb
Assigned To:
Topics: Hashed

Created on 2011-07-31.14:26:32 by nccb, last changed 2017-07-31.00:36:27 by gh.

Messages
msg14619 Author: nccb Date: 2011-07-31.14:26:31
I have a small darcs repository (hashed, darcs-2) of only a megabyte
or so, with a mix of text (short source code files) and some small
images.  Quite frequently (maybe 1 time in 2 or 3), darcs record on this
repository hangs.  I run "darcs rec", press a, enter a patch name,
decline to add a long comment, and then it hangs.  top shows darcs doing
nothing.  But then darcs will not respond to Ctrl-C or Ctrl-Z, cannot be
killed using kill, and even repeated kill -9 will not kill the process!
This means I have to reopen that terminal window, and the process stays
on my system until I reboot.  I get this behaviour with darcs 2.4.4 (my
OS-installed darcs) and darcs 2.5.2 (my cabal-installed darcs).  My OS
is Ubuntu 10.10 running inside VirtualBox (but I haven't noticed any
darcs problems before).  I have recorded to other repositories chained
from this one on other machines without any issues.  (In case it affects
things: I darcs get-ted this repository over a mounted sshfs connection
from a Solaris machine; it worked fine on the machine from which I
pushed to the Solaris machine.)  If it would help, I am happy to
privately mail a gzip of the repository to a darcs developer.
msg14620 Author: dixiecko Date: 2011-07-31.17:48:46
Hi! Can you try to capture "strace" output the next time you see darcs
freeze?

Run strace from another terminal window:
strace -p <PID_OF_DARCS> -o darcs.strace

It writes the system calls made by the darcs process into the text file
darcs.strace. After a minute, press Ctrl+C; by then the file should
contain enough. Then upload that file here.

You can install "strace" with "apt-get install strace".

Also include the output of "ps aux | grep darcs" or something like that.
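The strace and ps steps above could be wrapped in two small shell functions. This is a sketch, assuming Linux with /proc, coreutils `timeout` and strace installed; `proc_state` and `trace_for` are hypothetical helper names, not darcs or strace commands. `proc_state` prints the same state letter that ps shows in its STAT column. A `D` there (uninterruptible sleep, usually blocked on I/O inside the kernel) would fit a process that ignores even kill -9.

```sh
# Hypothetical helpers (not part of darcs); assumes Linux /proc,
# coreutils `timeout` and strace.

# Print the one-letter state of process $1 from /proc/<pid>/stat.
# "S" = interruptible sleep, "R" = running, "D" = uninterruptible sleep
# (blocked inside the kernel, typically on I/O). A "D" process generally
# does not respond to signals, kill -9 included, until the call returns.
proc_state() {
    local line rest
    read -r line < "/proc/$1/stat" || return 1
    # The line looks like "1234 (darcs) D ..."; strip everything up to
    # the last ") " so a command name containing spaces cannot confuse it.
    rest=${line##*") "}
    printf '%s\n' "${rest%% *}"
}

# Attach strace to process $1 for $2 seconds, writing to file $3, instead
# of stopping it with Ctrl+C by hand. May need sudo on systems that
# restrict ptrace between ordinary processes.
trace_for() {
    timeout "$2" strace -p "$1" -o "$3"
}
```

For example, with pgrep from procps: `proc_state "$(pgrep -n -x darcs)"`, then `trace_for "$(pgrep -n -x darcs)" 60 darcs.strace`.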

Of course, also try "darcs check" to see whether everything is fine in your repo.

Thanks,
Rado
msg14621 Author: nccb Date: 2011-07-31.19:24:56
Aha, strace showed me the problem.  Darcs was trying to access a patch
from my local copy of the repository during the record, and found it
missing.  Then it went to get the patch from the sshfs-mounted
repository that I had pulled it from.  It turns out that the hang itself
had nothing to do with darcs.  All processes would hang (once the sshfs
connection had been up for a few hours) when they tried to go near that
directory.  A bit of googling suggests I am falling foul of this
Debian/Ubuntu sshfs bug:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=571005 (Debian bug
#571005).  So it's nothing to do with darcs (and I don't think darcs can
work around it, since the problem hangs all processes that try to touch
the mount).  Sorry for the bug report, but since darcs was the only
thing needing to touch that mountpoint, I hadn't realised the whole
mount was broken; it just manifested in darcs.  I've closed the ticket
(I hope).  Thanks for your strace suggestion.
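Because any process that touches a wedged mount can get stuck, one way to confirm this kind of diagnosis without freezing the terminal is to probe the path from a background job and wait only a bounded time for it. A minimal POSIX sh sketch; `probe_path` is a hypothetical helper, not part of darcs or sshfs, and the mount path in the usage line is a placeholder.

```sh
# Hypothetical helper: report whether "stat $1" returns within $2 seconds.
# The stat runs in a background subshell, so if the mount is wedged only
# that subshell gets stuck (possibly unkillably), not the calling shell.
probe_path() {
    local path="$1" secs="$2" dir waited=0
    dir=$(mktemp -d) || return 2
    # The marker file appears once stat has returned, whether it succeeded
    # or failed. Output is discarded so a stuck job holds no caller's pipe.
    ( stat "$path" || true; : > "$dir/done" ) >/dev/null 2>&1 &
    while :; do
        if [ -e "$dir/done" ]; then
            rm -rf "$dir"
            echo "responsive"
            return 0
        fi
        [ "$waited" -ge "$secs" ] && break
        sleep 1
        waited=$((waited + 1))
    done
    rm -rf "$dir"
    echo "no response after ${secs}s"
    return 1
}
```

Usage: `probe_path /path/to/sshfs/mount 10`. A stuck probe stays behind as a hung process, but the shell that ran the check remains usable.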
msg14622 Author: ganesh Date: 2011-07-31.19:29:18
Arguably darcs should help in diagnosing this, though. Would using
--verbose have helped?
msg14623 Author: nccb Date: 2011-08-01.07:58:14
Ganesh: I tried with --verbose, but there's no verbose output between
the line asking me for a long comment and darcs hanging while trying to
access that patch.  However, --debug-verbose does say that it is copying
a patch (from my sshfs mountpoint), so that would have been almost as
useful as the strace in pointing to the parent repository as the problem.
msg14625 Author: dixiecko Date: 2011-08-01.21:27:17
Great :-) I didn't suggest --verbose because it is not very useful when
the problem happens sporadically. With strace it is possible to extract
the problem from a process that is already "frozen".

Some "time-based watchdog" functionality could be useful here: any
operation that is normally silent would print a warning to the terminal,
naming the operation currently running, once it has been running too
long.
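At the shell level, the suggested watchdog could look roughly like the wrapper below. This is only a sketch of the idea, assuming POSIX sh and whole-second intervals; `with_watchdog` is a hypothetical name. darcs itself would have to implement this internally to name the specific operation (such as fetching a patch) that is taking too long. The command runs in the foreground so interactive prompts, like those of darcs record, keep working, while a background loop prints to stderr.

```sh
# Hypothetical wrapper: run a command in the foreground and, every $1
# seconds (whole seconds) while it is still running, warn on stderr.
with_watchdog() {
    local secs="$1" dog_pid rc=0
    shift
    # Background loop: sleep, then warn; it is killed when the command ends.
    (
        elapsed=0
        while sleep "$secs"; do
            elapsed=$((elapsed + secs))
            echo "warning: '$*' still running after ${elapsed}s" >&2
        done
    ) >/dev/null &
    dog_pid=$!
    # Foreground command keeps the terminal for interactive prompts.
    "$@" || rc=$?
    kill "$dog_pid" 2>/dev/null || true
    return "$rc"
}
```

Usage: `with_watchdog 30 darcs record`. The wrapper returns the command's own exit status, so it can be dropped into existing scripts.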

Anyway, it is great that you discovered and resolved the problem with sshfs!
msg15121 Author: kowey Date: 2012-02-18.22:02:08
Thanks again for the report, Neil. I realise this is probably a closed 
matter as far as you are concerned, but could I ask about those missing 
files?

By any chance, was the local repository lazily fetched?  If not, do you 
have any hints about why the patches in question might have gone missing 
(e.g. were they deleted, or is it some permissions thing)?

I think we've picked up some interesting hints from this:
- that telling users to use --debug-verbose in these situations is a 
good thing
- and that we need to improve our feedback when fetching files you are 
not expecting to fetch (issue2123)

Also, darcs-developers: we need to think a bit here and figure out 
whether this symptom is itself a bug: darcs record wants to fetch 
patches. See issue2143 for my attempt at puzzling through that.

Thanks, Neil.  I'm marking this as waiting-for, pending your details.  
Once we are clearer on that, darcs-devs, I *think* it'd be safe to move 
this back to resolved, with the extra clue saved for posterity.
msg15123 Author: nccb Date: 2012-02-18.22:28:02
I still have the repository in question (I've recorded a few more 
patches since I filed the bug, but it has only had darcs record, darcs 
push and darcs pull run on it; no re-getting or anything).  I'm happy to 
try to help: how do I tell if the repository was fetched lazily?  I 
suspect it was just done using "darcs get" with whatever the default 
behaviour is on 2.5.2, but I'm guessing/hoping there's a way to tell for 
sure by running some command ("darcs show repo" doesn't seem to help).
msg15180 Author: kowey Date: 2012-02-26.09:57:43
Thanks, Neil.  I'm afraid there's no way to tell if the repository was 
fetched with --lazy or not, sorry!  I've suggested a _darcs/log in 
issue2148 for the future so that maybe future Neils would be able to 
tell in similar situations.

OK, so I guess we give up on finding out whether lazy fetching has 
anything to do with this.  Where to next?  Is it safe for us to move back 
to Neil's original resolved state, or is there more information we can 
try to tease out?
History
Date                 User      Action  Args
2011-07-31 14:26:32  nccb      create
2011-07-31 17:48:47  dixiecko  set     messages: + msg14620
2011-07-31 19:24:57  nccb      set     priority: bug -> invalid
                                       status: unknown -> resolved
                                       messages: + msg14621
2011-07-31 19:29:19  ganesh    set     status: resolved -> unknown
                                       messages: + msg14622
2011-08-01 07:58:16  nccb      set     messages: + msg14623
2011-08-01 21:27:19  dixiecko  set     messages: + msg14625
2012-02-18 22:02:09  kowey     set     priority: invalid -> bug
                                       status: unknown -> waiting-for
                                       messages: + msg15121
2012-02-18 22:28:03  nccb      set     messages: + msg15123
2012-02-22 09:41:09  mulander  set     nosy: + mulander
2012-02-26 09:41:24  kowey     link    issue2143 superseder
2012-02-26 09:57:45  kowey     set     status: waiting-for -> unknown
                                       topic: + Hashed
                                       messages: + msg15180
2017-07-31 00:36:27  gh        set     status: unknown -> given-up