darcs

Issue 1278 darcs hangs when invoked with no arguments and pager quits

Title darcs hangs when invoked with no arguments and pager quits
Priority urgent Status wont-fix
Milestone Resolved in
Superseder Nosy List darcs-devel, dleverton, dmitry.kurochkin, dsf, fx, gour, kowey, mornfall, salty-horse, thorkilnaur, tux_rocker
Assigned To
Topics UI

Created on 2008-12-22.21:43:44 by tux_rocker, last changed 2009-08-27.14:20:34 by admin.

Files
File name Uploaded Type Edit Remove
darcs-2.2.0pre1-issue1278-strace.txt tux_rocker, 2009-01-20.21:39:07 text/plain
Messages
msg6868 (view) Author: tux_rocker Date: 2008-12-22.21:43:39
When I simply type "darcs<ENTER>" and then, in the pager, type "q" to quit,
darcs will hang eating 100% CPU. Ctrl-C and SIGTERM don't stop it, only SIGKILL
works.

This was tested on Ubuntu Intrepid on an ASUS M51Tr laptop. Some more information:

$ uname -a
Linux adim 2.6.27-9-generic #1 SMP Thu Nov 20 22:15:32 UTC 2008 x86_64 GNU/Linux

$ darcs --version
2.2.0pre1 (prerelease 1)

$ dpkg -l less
ii  less       418-1      Pager program similar to more
msg6879 (view) Author: mornfall Date: 2008-12-24.23:24:29
Cannot reproduce : - (. I have same version of less and same version of darcs, 
but on Debian (almost) Sid. Running in urxvt, both with and without screen, 
both with small and big window size works fine. Could you maybe produce some 
more data on when and how it hangs? A strace (strace -olog darcs) might be 
useful -- if it has anything interesting, please attach the log to the report. 
Maybe try ltrace if strace doesn't help.

Thanks!
msg6880 (view) Author: mornfall Date: 2008-12-24.23:28:29
Just for the record it works in xterm as well... Can you check if you have 
haskeline enabled, btw? (I wouldn't expect that to make a difference, but you 
never know...)
msg6963 (view) Author: tux_rocker Date: 2009-01-04.15:56:40
Interestingly, I can't reproduce the bug anymore, even though I changed nothing
about my darcs.

I originally tried it in konsole (the KDE terminal emulator), but I thought it
couldn't make a difference because a pipe between darcs and less is a matter
that the terminal does not have anything to do with.
msg7013 (view) Author: kowey Date: 2009-01-08.11:43:07
I wonder if there is a way we can trigger this more predictably with some sort
of fake pager, for example a shell script that exits immediately.
msg7068 (view) Author: fx Date: 2009-01-12.14:10:23
I'm seeing the same thing with the current sources.

Attaching gdb, I see it stopped like this:

#0  0x00007faca0dc2907 in sched_yield () from /lib/libc.so.6
#1  0x000000000096403c in shutdownCapability ()
#2  0x000000000094db47 in exitScheduler ()
#3  0x000000000094cb37 in hs_exit_ ()
#4  0x000000000094cc3a in shutdownHaskellAndExit ()
#5  0x000000000093db3f in base_GHCziTopHandler_zdLr1ldzdwccall_info ()
#6  0x0000000000000000 in ?? ()

and it doesn't return from shutdownCapability.

This is on Ubuntu Hardy(-ish) with less as pager.  Sorry I don't have time
to try to debug it now.
msg7135 (view) Author: mornfall Date: 2009-01-18.14:23:52
If someone still can reproduce the problem, please provide output of darcs +RTS 
--info. We just had a case of this with x86_64 GHC RTS. I have never seen this 
happen on i386 myself. I'll try to build a 64-bit darcs binary.
msg7137 (view) Author: dleverton Date: 2009-01-18.14:45:42
I'm seeing this.  Requested information:

shiny-one ~ # darcs +RTS --info
 [("GHC RTS", "Yes")
 ,("GHC version", "6.8.3")
 ,("RTS way", "rts_thr")
 ,("Host platform", "x86_64-unknown-linux")
 ,("Build platform", "x86_64-unknown-linux")
 ,("Target platform", "x86_64-unknown-linux")
 ,("Compiler unregisterised", "NO")
 ,("Tables next to code", "YES")
 ]

Anything else you need to know, feel free to ask.
msg7138 (view) Author: mornfall Date: 2009-01-18.15:35:24
(Looking at the gdb output, this is obviously 64 bits... My own 64b binary 
doesn't hang though, sadly... Consulting #ghc...)

It would be helpful if you could try with this binary: http://
repos.mornfall.net/darcs/darcs-2.2.0-ghc6.10-libwww-static.bin -- it is 32bit 
so I don't expect to see any hangs, but if you reproduce them, that will at 
least bring some new data.

Thanks!
msg7139 (view) Author: fx Date: 2009-01-18.18:44:47
Petr Ročkai <bugs@darcs.net> writes:

> Petr Ročkai <me@mornfall.net> added the comment:
>
> If someone still can reproduce the problem, please provide output of darcs +RTS 
> --info. We just had a case of this with x86_64 GHC RTS. 

Yes, I see it on x86_64:

$ darcs +RTS --info
 [("GHC RTS", "Yes")
 ,("GHC version", "6.8.2")
 ,("RTS way", "rts_thr")
 ,("Host platform", "x86_64-unknown-linux")
 ,("Build platform", "x86_64-unknown-linux")
 ,("Target platform", "x86_64-unknown-linux")
 ,("Compiler unregisterised", "NO")
 ,("Tables next to code", "YES")
 ]
msg7145 (view) Author: tux_rocker Date: 2009-01-19.22:37:42
It hangs on my laptop again. Requested info:

reinier@adim:~$ darcs +RTS --info
 [("GHC RTS", "Yes")
 ,("GHC version", "6.8.2")
 ,("RTS way", "rts_thr")
 ,("Host platform", "x86_64-unknown-linux")
 ,("Build platform", "x86_64-unknown-linux")
 ,("Target platform", "x86_64-unknown-linux")
 ,("Compiler unregisterised", "NO")
 ,("Tables next to code", "YES")
 ]

I may try to debug a little bit more if I find the time for it, and the problem
keeps appearing predictably enough.
msg7147 (view) Author: dmitry.kurochkin Date: 2009-01-20.10:32:03
I can reproduce this on 64 bits with darcs version 2.2.0 (+ 48 patches). It
happens in both konsole and xterm.

$ ./darcs +RTS --info
 [("GHC RTS", "Yes")
 ,("GHC version", "6.8.2")
 ,("RTS way", "rts_thr")
 ,("Host platform", "x86_64-unknown-linux")
 ,("Build platform", "x86_64-unknown-linux")
 ,("Target platform", "x86_64-unknown-linux")
 ,("Compiler unregisterised", "NO")
 ,("Tables next to code", "YES")
 ]

darcs-2.2.0-ghc6.10-libwww-static.bin does not hang.

Regards,
  Dmitry
msg7148 (view) Author: tux_rocker Date: 2009-01-20.21:39:06
I'm attaching an strace darcs 2.2.0pre1 hanging this way. The last line of the
strace (sched_yield() call returning 0) is repeated ad infinitum.
Attachments
msg7149 (view) Author: tux_rocker Date: 2009-01-20.21:47:00
mornfall's 32-bit binary works on my laptop where my 64-bit binary fails. Here
is the final bit of its strace:

waitpid(9988, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0) = 9988
--- SIGCHLD (Child exited) @ 0 (0) ---
unlink("/tmp/darcs-pager9985")          = 0
timer_settime(0, 0, {it_interval={0, 20000000}, it_value={0, 20000000}}, NULL) = 0
getrusage(RUSAGE_SELF, {ru_utime={0, 17179869184000}, ru_stime={0, 0}, ...}) = 0
gettimeofday({1232487841, 931231}, NULL) = 0
write(4, "\376", 1)                     = 1
clock_gettime(0xfffffffe /* CLOCK_??? */, {0, 4887508}) = 0
gettimeofday({1232487841, 931430}, NULL) = 0
rt_sigprocmask(SIG_BLOCK, [HUP INT ABRT PIPE TERM], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
futex(0x9b30140, 0x85 /* FUTEX_??? */, 1) = 1
futex(0x9b3016c, 0x81 /* FUTEX_??? */, 1) = 1
sched_yield()                           = 0
sched_yield()                           = 0
timer_settime(0, 0, {it_interval={0, 0}, it_value={0, 0}}, NULL) = 0
timer_delete(0)                         = 0
fcntl64(0, F_GETFL)                     = 0x8002 (flags O_RDWR|O_LARGEFILE)
fcntl64(1, F_GETFL)                     = 0x8002 (flags O_RDWR|O_LARGEFILE)
fcntl64(2, F_GETFL)                     = 0x8002 (flags O_RDWR|O_LARGEFILE)
rt_sigprocmask(SIG_BLOCK, [TTOU], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigaction(SIGINT, {SIG_DFL}, NULL, 8) = 0
rt_sigaction(SIGPIPE, {SIG_DFL}, NULL, 8) = 0
getrusage(RUSAGE_SELF, {ru_utime={0, 0}, ru_stime={0, 0}, ...}) = 0
gettimeofday({1232487841, 932737}, NULL) = 0
exit_group(0)                           = ?
msg7151 (view) Author: mornfall Date: 2009-01-21.06:46:57
Ok, I have done some investigation, and it seems that things are hanging in GHC 
RTS. My current best estimate is this sequence:
- cap->spare_workers = NULL but it holds that !emptyRunQueue(cap)
- the system is about to quit, and shutdownCapability(cap) is called

This is in shutdownCapability(cap), inside a for(...;;...):
   714         if (!emptyRunQueue(cap) || cap->spare_workers) {
   715             debugTrace(DEBUG_sched, 
   716                        "runnable threads or workers still alive, 
yielding");
   717             releaseCapability_(cap); // this will wake up a worker
   718             RELEASE_LOCK(&cap->lock);
   719             yieldThread();
   720             continue;
   721         }

Now looking at releaseCapability_:
   321     if (!cap->spare_workers) {
   322         // Create a worker thread if we don't have one.  If the system
   323         // is interrupted, we only create a worker task if there
   324         // are threads that need to be completed.  If the system is
   325         // shutting down, we never create a new worker.
   326         if (sched_state < SCHED_SHUTTING_DOWN || !emptyRunQueue(cap)) {
   327             debugTrace(DEBUG_sched,
   328                        "starting new worker on capability %d", cap->no);
   329             startWorkerTask(cap, workerStart);
   330             return;
   331         }
   332     }
Note that system is shutting down at this point, so the work is never done. The 
system spins, just yielding the CPU in the shutdownCapability loop. There is no 
cap->running_task either, since that way, shutdownCapability would never fall 
through to the above code. We know that releaseCapability_ is being called, 
since darcs +RTS -vs gives following trace infinitely many times:
   346     trace(TRACE_sched | DEBUG_sched, "freeing capability %d", cap->no);

To me, this seems to be at least two RTS bugs:
- It should never happen that a shutdown happens while !emptyRunQueue(cap) && 
cap->spare_workers == NULL && cap->running_task == NULL, but this is violated 
on x86_64, sometimes
- The system will go into an infinite loop in this case.

I don't fully understand the system of course. It seems to me, that fixing the 
latter bug would make darcs magically not hang, although the first bug is the 
"real" problem, I suppose. Seems much harder to debug, too.
msg7152 (view) Author: mornfall Date: 2009-01-21.06:48:09
PS: If someone who can reproduce would be able to build with a debug-enabled 
GHC RTS, that would be great, although I suppose this is a lot of trouble and 
since the probem seems to be somewhat racy, it might just go away...
msg7153 (view) Author: fx Date: 2009-01-21.10:55:05
Reinier Lamers <bugs@darcs.net> writes:

> Reinier Lamers <tux_rocker@reinier.de> added the comment:
>
> I'm attaching an strace darcs 2.2.0pre1 hanging this way. The last line of the
> strace (sched_yield() call returning 0) is repeated ad infinitum.

Sorry, I thought I'd shown that with gdb.

Some extra info is that although I normally see the hang, it's not
consistent.  I saw it in one window, while the same thing in another
window in the same session was OK, at least on a coupe of attempts.  I
wonder whether it's really a 64-bit issue (at the FFI level?) or
actually a race due to speed or multi-processing (dual core 64-bit
systems), but I guess wondering isn't very useful.
msg7154 (view) Author: mornfall Date: 2009-01-21.11:21:29
I don't think it is multi-processing related, but you can try this:
darcs +RTS -N1
darcs +RTS -N2
to force use of 1 or 2 OS threads. If both of these invocations exhibit the 
same behaviour of hangs (which I would expect), then it's quite likely 
independent of real parallelism. Darcs does not use forkOS nowadays, so there 
should be no bound threads whatsoever in the equation.
msg7155 (view) Author: tux_rocker Date: 2009-01-21.19:06:22
I can see the hanging sometimes with "darcs +RTS -N1".
msg7156 (view) Author: tux_rocker Date: 2009-01-21.19:07:55
BTW, if the bug is in the GHC RTS, the interesting question is whether this can
be reproduced on the latest GHC. fx and I are both on 6.8 afaik.
msg7157 (view) Author: mornfall Date: 2009-01-21.19:31:24
I believe there was at least one case of 6.10.1 exhibiting the hangs (cropped 
up in an IRC discussion).
msg7260 (view) Author: dsf Date: 2009-02-02.19:44:44
I get hangs consistently when darcs is built with ghc 6.10, none when it is
built with ghc 6.8.  I suspect it is related to this: 
http://hackage.haskell.org/trac/ghc/ticket/2971
msg7261 (view) Author: mornfall Date: 2009-02-03.00:11:22
For the record, I don't think this is the /proc/whatever read hangs issue. We 
have plenty of reports of darcs built with 6.8 hanging as well.
msg7656 (view) Author: kowey Date: 2009-04-09.17:16:54
dsf: would you be willing to try with GHC 6.10.2?  I have no idea if it would
help, but I figure that it couldn't hurt

(this suggestion comes from gour on #darcs)
msg7725 (view) Author: gour Date: 2009-04-20.16:07:46
Hi!

Let me just inform you that I've just built ghc-6.10.2 and cannot reproduce the
problem any longer :-)


Sincerely,
Gour
msg7865 (view) Author: tux_rocker Date: 2009-06-02.09:06:13
It appears to be a GHC problem in GHC 6.10.1. Marking wont-fix as the GHC guys
fixed it for us.
History
Date User Action Args
2008-12-22 21:43:44tux_rockercreate
2008-12-24 23:24:33mornfallsetstatus: unread -> unknown
nosy: + mornfall
messages: + msg6879
2008-12-24 23:28:32mornfallsetnosy: kowey, simon, thorkilnaur, tux_rocker, dmitry.kurochkin, mornfall
messages: + msg6880
2009-01-04 15:56:43tux_rockersetnosy: kowey, simon, thorkilnaur, tux_rocker, dmitry.kurochkin, mornfall
messages: + msg6963
2009-01-08 11:43:10koweysetnosy: kowey, simon, thorkilnaur, tux_rocker, dmitry.kurochkin, mornfall
messages: + msg7013
2009-01-12 14:10:27fxsetnosy: + fx
messages: + msg7068
2009-01-18 14:23:55mornfallsetnosy: kowey, fx, simon, thorkilnaur, tux_rocker, dmitry.kurochkin, mornfall
messages: + msg7135
2009-01-18 14:27:31mornfalllinkissue879 superseder
2009-01-18 14:45:44dlevertonsetnosy: + dleverton
messages: + msg7137
2009-01-18 15:35:27mornfallsetnosy: kowey, fx, simon, thorkilnaur, tux_rocker, dmitry.kurochkin, mornfall, dleverton
messages: + msg7138
2009-01-18 18:44:50fxsetnosy: kowey, fx, simon, thorkilnaur, tux_rocker, dmitry.kurochkin, mornfall, dleverton
messages: + msg7139
2009-01-19 22:37:45tux_rockersetnosy: kowey, fx, simon, thorkilnaur, tux_rocker, dmitry.kurochkin, mornfall, dleverton
messages: + msg7145
2009-01-20 10:32:06dmitry.kurochkinsetnosy: kowey, fx, simon, thorkilnaur, tux_rocker, dmitry.kurochkin, mornfall, dleverton
messages: + msg7147
2009-01-20 21:39:12tux_rockersetfiles: + darcs-2.2.0pre1-issue1278-strace.txt
nosy: kowey, fx, simon, thorkilnaur, tux_rocker, dmitry.kurochkin, mornfall, dleverton
messages: + msg7148
2009-01-20 21:47:02tux_rockersetnosy: kowey, fx, simon, thorkilnaur, tux_rocker, dmitry.kurochkin, mornfall, dleverton
messages: + msg7149
2009-01-21 06:47:00mornfallsetnosy: kowey, fx, simon, thorkilnaur, tux_rocker, dmitry.kurochkin, mornfall, dleverton
messages: + msg7151
2009-01-21 06:48:11mornfallsetnosy: kowey, fx, simon, thorkilnaur, tux_rocker, dmitry.kurochkin, mornfall, dleverton
messages: + msg7152
2009-01-21 10:55:09fxsetnosy: kowey, fx, simon, thorkilnaur, tux_rocker, dmitry.kurochkin, mornfall, dleverton
messages: + msg7153
2009-01-21 11:21:32mornfallsetnosy: kowey, fx, simon, thorkilnaur, tux_rocker, dmitry.kurochkin, mornfall, dleverton
messages: + msg7154
2009-01-21 19:06:25tux_rockersetnosy: kowey, fx, simon, thorkilnaur, tux_rocker, dmitry.kurochkin, mornfall, dleverton
messages: + msg7155
2009-01-21 19:07:57tux_rockersetnosy: kowey, fx, simon, thorkilnaur, tux_rocker, dmitry.kurochkin, mornfall, dleverton
messages: + msg7156
2009-01-21 19:31:26mornfallsetnosy: kowey, fx, simon, thorkilnaur, tux_rocker, dmitry.kurochkin, mornfall, dleverton
messages: + msg7157
2009-02-02 19:44:46dsfsetnosy: + dsf
messages: + msg7260
2009-02-03 00:11:27mornfallsetnosy: kowey, fx, simon, thorkilnaur, tux_rocker, dmitry.kurochkin, mornfall, dleverton, dsf
messages: + msg7261
2009-02-05 15:54:46mornfalllinkissue1341 superseder
2009-04-06 22:55:51salty-horsesetnosy: + salty-horse
2009-04-09 17:16:56koweysetnosy: + gour
messages: + msg7656
2009-04-20 16:07:49goursetnosy: kowey, fx, salty-horse, simon, thorkilnaur, tux_rocker, gour, dmitry.kurochkin, mornfall, dleverton, dsf
messages: + msg7725
2009-06-02 09:06:17tux_rockersetstatus: unknown -> wont-fix
nosy: kowey, fx, salty-horse, simon, thorkilnaur, tux_rocker, gour, dmitry.kurochkin, mornfall, dleverton, dsf
messages: + msg7865
2009-06-02 09:09:32tux_rockerlinkissue1475 superseder
2009-08-25 17:36:41adminsetnosy: + darcs-devel, - simon
2009-08-27 14:20:34adminsetnosy: kowey, fx, darcs-devel, salty-horse, thorkilnaur, tux_rocker, gour, dmitry.kurochkin, mornfall, dleverton, dsf