Created on 2008-12-22.21:43:44 by tux_rocker, last changed 2009-08-27.14:20:34 by admin.
msg6868 (view) |
Author: tux_rocker |
Date: 2008-12-22.21:43:39 |
|
When I simply type "darcs<ENTER>" and then, in the pager, type "q" to quit,
darcs will hang eating 100% CPU. Ctrl-C and SIGTERM don't stop it, only SIGKILL
works.
This was tested on Ubuntu Intrepid on an ASUS M51Tr laptop. Some more information:
$ uname -a
Linux adim 2.6.27-9-generic #1 SMP Thu Nov 20 22:15:32 UTC 2008 x86_64 GNU/Linux
$ darcs --version
2.2.0pre1 (prerelease 1)
$ dpkg -l less
ii less 418-1 Pager program similar to more
|
msg6879 (view) |
Author: mornfall |
Date: 2008-12-24.23:24:29 |
|
Cannot reproduce : - (. I have same version of less and same version of darcs,
but on Debian (almost) Sid. Running in urxvt, both with and without screen,
both with small and big window size works fine. Could you maybe produce some
more data on when and how it hangs? A strace (strace -olog darcs) might be
useful -- if it has anything interesting, please attach the log to the report.
Maybe try ltrace if strace doesn't help.
Thanks!
|
msg6880 (view) |
Author: mornfall |
Date: 2008-12-24.23:28:29 |
|
Just for the record it works in xterm as well... Can you check if you have
haskeline enabled, btw? (I wouldn't expect that to make a difference, but you
never know...)
|
msg6963 (view) |
Author: tux_rocker |
Date: 2009-01-04.15:56:40 |
|
Interestingly, I can't reproduce the bug anymore, even though I changed nothing
about my darcs.
I originally tried it in konsole (the KDE terminal emulator), but I thought it
couldn't make a difference because a pipe between darcs and less is a matter
that the terminal does not have anything to do with.
|
msg7013 (view) |
Author: kowey |
Date: 2009-01-08.11:43:07 |
|
I wonder if there is a way we can trigger this more predictably with some sort
of fake pager, for example a shell script that exits immediately.
|
msg7068 (view) |
Author: fx |
Date: 2009-01-12.14:10:23 |
|
I'm seeing the same thing with the current sources.
Attaching gdb, I see it stopped like this:
#0 0x00007faca0dc2907 in sched_yield () from /lib/libc.so.6
#1 0x000000000096403c in shutdownCapability ()
#2 0x000000000094db47 in exitScheduler ()
#3 0x000000000094cb37 in hs_exit_ ()
#4 0x000000000094cc3a in shutdownHaskellAndExit ()
#5 0x000000000093db3f in base_GHCziTopHandler_zdLr1ldzdwccall_info ()
#6 0x0000000000000000 in ?? ()
and it doesn't return from shutdownCapability.
This is on Ubuntu Hardy(-ish) with less as pager. Sorry I don't have time
to try to debug it now.
|
msg7135 (view) |
Author: mornfall |
Date: 2009-01-18.14:23:52 |
|
If someone still can reproduce the problem, please provide output of darcs +RTS
--info. We just had a case of this with x86_64 GHC RTS. I have never seen this
happen on i386 myself. I'll try to build a 64-bit darcs binary.
|
msg7137 (view) |
Author: dleverton |
Date: 2009-01-18.14:45:42 |
|
I'm seeing this. Requested information:
shiny-one ~ # darcs +RTS --info
[("GHC RTS", "Yes")
,("GHC version", "6.8.3")
,("RTS way", "rts_thr")
,("Host platform", "x86_64-unknown-linux")
,("Build platform", "x86_64-unknown-linux")
,("Target platform", "x86_64-unknown-linux")
,("Compiler unregisterised", "NO")
,("Tables next to code", "YES")
]
Anything else you need to know, feel free to ask.
|
msg7138 (view) |
Author: mornfall |
Date: 2009-01-18.15:35:24 |
|
(Looking at the gdb output, this is obviously 64 bits... My own 64b binary
doesn't hang though, sadly... Consulting #ghc...)
It would be helpful if you could try with this binary: http://
repos.mornfall.net/darcs/darcs-2.2.0-ghc6.10-libwww-static.bin -- it is 32bit
so I don't expect to see any hangs, but if you reproduce them, that will at
least bring some new data.
Thanks!
|
msg7139 (view) |
Author: fx |
Date: 2009-01-18.18:44:47 |
|
Petr Ročkai <bugs@darcs.net> writes:
> Petr Ročkai <me@mornfall.net> added the comment:
>
> If someone still can reproduce the problem, please provide output of darcs +RTS
> --info. We just had a case of this with x86_64 GHC RTS.
Yes, I see it on x86_64:
$ darcs +RTS --info
[("GHC RTS", "Yes")
,("GHC version", "6.8.2")
,("RTS way", "rts_thr")
,("Host platform", "x86_64-unknown-linux")
,("Build platform", "x86_64-unknown-linux")
,("Target platform", "x86_64-unknown-linux")
,("Compiler unregisterised", "NO")
,("Tables next to code", "YES")
]
|
msg7145 (view) |
Author: tux_rocker |
Date: 2009-01-19.22:37:42 |
|
It hangs on my laptop again. Requested info:
reinier@adim:~$ darcs +RTS --info
[("GHC RTS", "Yes")
,("GHC version", "6.8.2")
,("RTS way", "rts_thr")
,("Host platform", "x86_64-unknown-linux")
,("Build platform", "x86_64-unknown-linux")
,("Target platform", "x86_64-unknown-linux")
,("Compiler unregisterised", "NO")
,("Tables next to code", "YES")
]
I may try to debug a little bit more if I find the time for it, and the problem
keeps appearing predictably enough.
|
msg7147 (view) |
Author: dmitry.kurochkin |
Date: 2009-01-20.10:32:03 |
|
I can reproduce this on 64 bits with darcs version 2.2.0 (+ 48 patches). It
happens in both konsole and xterm.
$ ./darcs +RTS --info
[("GHC RTS", "Yes")
,("GHC version", "6.8.2")
,("RTS way", "rts_thr")
,("Host platform", "x86_64-unknown-linux")
,("Build platform", "x86_64-unknown-linux")
,("Target platform", "x86_64-unknown-linux")
,("Compiler unregisterised", "NO")
,("Tables next to code", "YES")
]
darcs-2.2.0-ghc6.10-libwww-static.bin does not hang.
Regards,
Dmitry
|
msg7148 (view) |
Author: tux_rocker |
Date: 2009-01-20.21:39:06 |
|
I'm attaching an strace darcs 2.2.0pre1 hanging this way. The last line of the
strace (sched_yield() call returning 0) is repeated ad infinitum.
Attachments
|
msg7149 (view) |
Author: tux_rocker |
Date: 2009-01-20.21:47:00 |
|
mornfall's 32-bit binary works on my laptop where my 64-bit binary fails. Here
is the final bit of its strace:
waitpid(9988, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0) = 9988
--- SIGCHLD (Child exited) @ 0 (0) ---
unlink("/tmp/darcs-pager9985") = 0
timer_settime(0, 0, {it_interval={0, 20000000}, it_value={0, 20000000}}, NULL) = 0
getrusage(RUSAGE_SELF, {ru_utime={0, 17179869184000}, ru_stime={0, 0}, ...}) = 0
gettimeofday({1232487841, 931231}, NULL) = 0
write(4, "\376", 1) = 1
clock_gettime(0xfffffffe /* CLOCK_??? */, {0, 4887508}) = 0
gettimeofday({1232487841, 931430}, NULL) = 0
rt_sigprocmask(SIG_BLOCK, [HUP INT ABRT PIPE TERM], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
futex(0x9b30140, 0x85 /* FUTEX_??? */, 1) = 1
futex(0x9b3016c, 0x81 /* FUTEX_??? */, 1) = 1
sched_yield() = 0
sched_yield() = 0
timer_settime(0, 0, {it_interval={0, 0}, it_value={0, 0}}, NULL) = 0
timer_delete(0) = 0
fcntl64(0, F_GETFL) = 0x8002 (flags O_RDWR|O_LARGEFILE)
fcntl64(1, F_GETFL) = 0x8002 (flags O_RDWR|O_LARGEFILE)
fcntl64(2, F_GETFL) = 0x8002 (flags O_RDWR|O_LARGEFILE)
rt_sigprocmask(SIG_BLOCK, [TTOU], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigaction(SIGINT, {SIG_DFL}, NULL, 8) = 0
rt_sigaction(SIGPIPE, {SIG_DFL}, NULL, 8) = 0
getrusage(RUSAGE_SELF, {ru_utime={0, 0}, ru_stime={0, 0}, ...}) = 0
gettimeofday({1232487841, 932737}, NULL) = 0
exit_group(0) = ?
|
msg7151 (view) |
Author: mornfall |
Date: 2009-01-21.06:46:57 |
|
Ok, I have done some investigation, and it seems that things are hanging in GHC
RTS. My current best estimate is this sequence:
- cap->spare_workers = NULL but it holds that !emptyRunQueue(cap)
- the system is about to quit, and shutdownCapability(cap) is called
This is in shutdownCapability(cap), inside a for(...;;...):
714 if (!emptyRunQueue(cap) || cap->spare_workers) {
715 debugTrace(DEBUG_sched,
716 "runnable threads or workers still alive,
yielding");
717 releaseCapability_(cap); // this will wake up a worker
718 RELEASE_LOCK(&cap->lock);
719 yieldThread();
720 continue;
721 }
Now looking at releaseCapability_:
321 if (!cap->spare_workers) {
322 // Create a worker thread if we don't have one. If the system
323 // is interrupted, we only create a worker task if there
324 // are threads that need to be completed. If the system is
325 // shutting down, we never create a new worker.
326 if (sched_state < SCHED_SHUTTING_DOWN || !emptyRunQueue(cap)) {
327 debugTrace(DEBUG_sched,
328 "starting new worker on capability %d", cap->no);
329 startWorkerTask(cap, workerStart);
330 return;
331 }
332 }
Note that system is shutting down at this point, so the work is never done. The
system spins, just yielding the CPU in the shutdownCapability loop. There is no
cap->running_task either, since that way, shutdownCapability would never fall
through to the above code. We know that releaseCapability_ is being called,
since darcs +RTS -vs gives following trace infinitely many times:
346 trace(TRACE_sched | DEBUG_sched, "freeing capability %d", cap->no);
To me, this seems to be at least two RTS bugs:
- It should never happen that a shutdown happens while !emptyRunQueue(cap) &&
cap->spare_workers == NULL && cap->running_task == NULL, but this is violated
on x86_64, sometimes
- The system will go into an infinite loop in this case.
I don't fully understand the system of course. It seems to me, that fixing the
latter bug would make darcs magically not hang, although the first bug is the
"real" problem, I suppose. Seems much harder to debug, too.
|
msg7152 (view) |
Author: mornfall |
Date: 2009-01-21.06:48:09 |
|
PS: If someone who can reproduce would be able to build with a debug-enabled
GHC RTS, that would be great, although I suppose this is a lot of trouble and
since the probem seems to be somewhat racy, it might just go away...
|
msg7153 (view) |
Author: fx |
Date: 2009-01-21.10:55:05 |
|
Reinier Lamers <bugs@darcs.net> writes:
> Reinier Lamers <tux_rocker@reinier.de> added the comment:
>
> I'm attaching an strace darcs 2.2.0pre1 hanging this way. The last line of the
> strace (sched_yield() call returning 0) is repeated ad infinitum.
Sorry, I thought I'd shown that with gdb.
Some extra info is that although I normally see the hang, it's not
consistent. I saw it in one window, while the same thing in another
window in the same session was OK, at least on a coupe of attempts. I
wonder whether it's really a 64-bit issue (at the FFI level?) or
actually a race due to speed or multi-processing (dual core 64-bit
systems), but I guess wondering isn't very useful.
|
msg7154 (view) |
Author: mornfall |
Date: 2009-01-21.11:21:29 |
|
I don't think it is multi-processing related, but you can try this:
darcs +RTS -N1
darcs +RTS -N2
to force use of 1 or 2 OS threads. If both of these invocations exhibit the
same behaviour of hangs (which I would expect), then it's quite likely
independent of real parallelism. Darcs does not use forkOS nowadays, so there
should be no bound threads whatsoever in the equation.
|
msg7155 (view) |
Author: tux_rocker |
Date: 2009-01-21.19:06:22 |
|
I can see the hanging sometimes with "darcs +RTS -N1".
|
msg7156 (view) |
Author: tux_rocker |
Date: 2009-01-21.19:07:55 |
|
BTW, if the bug is in the GHC RTS, the interesting question is whether this can
be reproduced on the latest GHC. fx and I are both on 6.8 afaik.
|
msg7157 (view) |
Author: mornfall |
Date: 2009-01-21.19:31:24 |
|
I believe there was at least one case of 6.10.1 exhibiting the hangs (cropped
up in an IRC discussion).
|
msg7260 (view) |
Author: dsf |
Date: 2009-02-02.19:44:44 |
|
I get hangs consistently when darcs is built with ghc 6.10, none when it is
built with ghc 6.8. I suspect it is related to this:
http://hackage.haskell.org/trac/ghc/ticket/2971
|
msg7261 (view) |
Author: mornfall |
Date: 2009-02-03.00:11:22 |
|
For the record, I don't think this is the /proc/whatever read hangs issue. We
have plenty of reports of darcs built with 6.8 hanging as well.
|
msg7656 (view) |
Author: kowey |
Date: 2009-04-09.17:16:54 |
|
dsf: would you be willing to try with GHC 6.10.2? I have no idea if it would
help, but I figure that it couldn't hurt
(this suggestion comes from gour on #darcs)
|
msg7725 (view) |
Author: gour |
Date: 2009-04-20.16:07:46 |
|
Hi!
Let me just inform you that I've just built ghc-6.10.2 and cannot reproduce the
problem any longer :-)
Sincerely,
Gour
|
msg7865 (view) |
Author: tux_rocker |
Date: 2009-06-02.09:06:13 |
|
It appears to be a GHC problem in GHC 6.10.1. Marking wont-fix as the GHC guys
fixed it for us.
|
|
Date |
User |
Action |
Args |
2008-12-22 21:43:44 | tux_rocker | create | |
2008-12-24 23:24:33 | mornfall | set | status: unread -> unknown nosy:
+ mornfall messages:
+ msg6879 |
2008-12-24 23:28:32 | mornfall | set | nosy:
kowey, simon, thorkilnaur, tux_rocker, dmitry.kurochkin, mornfall messages:
+ msg6880 |
2009-01-04 15:56:43 | tux_rocker | set | nosy:
kowey, simon, thorkilnaur, tux_rocker, dmitry.kurochkin, mornfall messages:
+ msg6963 |
2009-01-08 11:43:10 | kowey | set | nosy:
kowey, simon, thorkilnaur, tux_rocker, dmitry.kurochkin, mornfall messages:
+ msg7013 |
2009-01-12 14:10:27 | fx | set | nosy:
+ fx messages:
+ msg7068 |
2009-01-18 14:23:55 | mornfall | set | nosy:
kowey, fx, simon, thorkilnaur, tux_rocker, dmitry.kurochkin, mornfall messages:
+ msg7135 |
2009-01-18 14:27:31 | mornfall | link | issue879 superseder |
2009-01-18 14:45:44 | dleverton | set | nosy:
+ dleverton messages:
+ msg7137 |
2009-01-18 15:35:27 | mornfall | set | nosy:
kowey, fx, simon, thorkilnaur, tux_rocker, dmitry.kurochkin, mornfall, dleverton messages:
+ msg7138 |
2009-01-18 18:44:50 | fx | set | nosy:
kowey, fx, simon, thorkilnaur, tux_rocker, dmitry.kurochkin, mornfall, dleverton messages:
+ msg7139 |
2009-01-19 22:37:45 | tux_rocker | set | nosy:
kowey, fx, simon, thorkilnaur, tux_rocker, dmitry.kurochkin, mornfall, dleverton messages:
+ msg7145 |
2009-01-20 10:32:06 | dmitry.kurochkin | set | nosy:
kowey, fx, simon, thorkilnaur, tux_rocker, dmitry.kurochkin, mornfall, dleverton messages:
+ msg7147 |
2009-01-20 21:39:12 | tux_rocker | set | files:
+ darcs-2.2.0pre1-issue1278-strace.txt nosy:
kowey, fx, simon, thorkilnaur, tux_rocker, dmitry.kurochkin, mornfall, dleverton messages:
+ msg7148 |
2009-01-20 21:47:02 | tux_rocker | set | nosy:
kowey, fx, simon, thorkilnaur, tux_rocker, dmitry.kurochkin, mornfall, dleverton messages:
+ msg7149 |
2009-01-21 06:47:00 | mornfall | set | nosy:
kowey, fx, simon, thorkilnaur, tux_rocker, dmitry.kurochkin, mornfall, dleverton messages:
+ msg7151 |
2009-01-21 06:48:11 | mornfall | set | nosy:
kowey, fx, simon, thorkilnaur, tux_rocker, dmitry.kurochkin, mornfall, dleverton messages:
+ msg7152 |
2009-01-21 10:55:09 | fx | set | nosy:
kowey, fx, simon, thorkilnaur, tux_rocker, dmitry.kurochkin, mornfall, dleverton messages:
+ msg7153 |
2009-01-21 11:21:32 | mornfall | set | nosy:
kowey, fx, simon, thorkilnaur, tux_rocker, dmitry.kurochkin, mornfall, dleverton messages:
+ msg7154 |
2009-01-21 19:06:25 | tux_rocker | set | nosy:
kowey, fx, simon, thorkilnaur, tux_rocker, dmitry.kurochkin, mornfall, dleverton messages:
+ msg7155 |
2009-01-21 19:07:57 | tux_rocker | set | nosy:
kowey, fx, simon, thorkilnaur, tux_rocker, dmitry.kurochkin, mornfall, dleverton messages:
+ msg7156 |
2009-01-21 19:31:26 | mornfall | set | nosy:
kowey, fx, simon, thorkilnaur, tux_rocker, dmitry.kurochkin, mornfall, dleverton messages:
+ msg7157 |
2009-02-02 19:44:46 | dsf | set | nosy:
+ dsf messages:
+ msg7260 |
2009-02-03 00:11:27 | mornfall | set | nosy:
kowey, fx, simon, thorkilnaur, tux_rocker, dmitry.kurochkin, mornfall, dleverton, dsf messages:
+ msg7261 |
2009-02-05 15:54:46 | mornfall | link | issue1341 superseder |
2009-04-06 22:55:51 | salty-horse | set | nosy:
+ salty-horse |
2009-04-09 17:16:56 | kowey | set | nosy:
+ gour messages:
+ msg7656 |
2009-04-20 16:07:49 | gour | set | nosy:
kowey, fx, salty-horse, simon, thorkilnaur, tux_rocker, gour, dmitry.kurochkin, mornfall, dleverton, dsf messages:
+ msg7725 |
2009-06-02 09:06:17 | tux_rocker | set | status: unknown -> wont-fix nosy:
kowey, fx, salty-horse, simon, thorkilnaur, tux_rocker, gour, dmitry.kurochkin, mornfall, dleverton, dsf messages:
+ msg7865 |
2009-06-02 09:09:32 | tux_rocker | link | issue1475 superseder |
2009-08-25 17:36:41 | admin | set | nosy:
+ darcs-devel, - simon |
2009-08-27 14:20:34 | admin | set | nosy:
kowey, fx, darcs-devel, salty-horse, thorkilnaur, tux_rocker, gour, dmitry.kurochkin, mornfall, dleverton, dsf |
|