darcs

Issue 876 darcs2 dead-locks on GHC subrepo (darcs get --partial http://darcs.haskell.org/ghc-stable/packages/pretty)

Title darcs2 dead-locks on GHC subrepo (darcs get --partial http://darcs.haskell.org/ghc-stable/packages/pretty)
Priority bug Status resolved
Milestone 2.0.x Resolved in
Superseder get => Possible bug in URL.waitNextUrl (2.0.2 + 20)
View: 977
Nosy List Serware, darcs-devel, dmitry.kurochkin, karel.gardas, kowey, mornfall, thorkilnaur, tommy, tux_rocker
Assigned To dmitry.kurochkin
Topics HTTP, SunOS

Created on 2008-05-23.12:33:54 by kgardas, last changed 2010-06-15.21:20:32 by admin.

Files
File name Uploaded Type Edit Remove
darcs_live_lock_stacks.txt.gz kgardas, 2008-05-23.12:33:52 application/x-gzip
Messages
msg4848 (view) Author: karel.gardas Date: 2008-05-23.12:33:52
While trying to get http://darcs.haskell.org/ghc-stable/packages/pretty
by invoking `darcs get --partial
http://darcs.haskell.org/ghc-stable/packages/pretty' command as part of
GHC buildprocess on GHC buildbot I've notices that this darcs never
finishes its `Identifying repository
http://darcs.haskell.org/ghc-stable/packages/pretty
' step. It seems there is some live-lock involved while calling licurl
functions. This is on Solaris 11, ghc 6.8.2.20080430, darcs from May 22,
linked against curl 7.18.0 with curl pipelining enabled.

As this is on Solaris I've saved few process's stacktraces by using
pstack(1) command. It seems darcs is again and again invoking following
trace:

-----------------  lwp# 1 / thread# 1  --------------------
 fea545b7 gettimeofday (80459c0, 8561e60, 80459d8, feed829d, 80459c0,
8561e64) + 7
 feed82da curl_multi_perform (8561e60, 8045a2c, 0, 0, 0, 0) + 8e
 0841e5db curl_wait_next_url (0, 0, 0, 0, 0, 0) + 37
 08189e3d ???????? (fe3ec8c1, fe3ec8ce, fe3ec8e7, fe3ec8f0, 818b29c,
fe3ec8fa)
 00000001 ???????? (c5830818, a79ae9fc, f689002e, 384ab0, 4, ffff0022)
 b11c0045 ???????? ()


Anyway whole log saved by few seconds running `while (true) do pstack
<darcs pid>; done is attached.

Could anybody be so kind and try to duplicate this issue on Linux? Also
if anybody is able to tell me how to get rid of question marks in the
trace and replace them with appropriate GHC generated functions name,
then I would be glad to provide "fixed" stacktrace to show more
information about the issue. Thanks!

PS: Please note this is in Solaris running on real-metal, not in broken
VirtualBox!
Attachments
msg4849 (view) Author: kowey Date: 2008-05-23.14:24:27
For what it's worth, I was able to get that repository under Linux (darcs 2.0.0
+ 264)
msg4850 (view) Author: karel.gardas Date: 2008-05-23.14:44:02
Actually I think Zooko on darcs-user might right claiming that this
might be issue in libcurl badly handling of time changes. i.e. I'm
running ntpd and perhaps at the right time it moved clock backward which
might confuse curl enough to break whole darcs. But this would require
some testing for a proof.
msg4855 (view) Author: tux_rocker Date: 2008-05-24.10:02:30
I don't think this is caused by ntpd. So many boxes are running that, that if it
ever caused trouble, it would have been fixed already. Can you reproduce it
under Solaris?

Also, removing the question marks from ptrace output is impossible AFAIK. A tool
like ptrace assumes that the processor registers are used according to some C
ABI, but GHC's stack and register layout is totally different from C's.

Creating a tracing tool for GHC-compiled executables seems like a fun project
though :)
msg4925 (view) Author: karel.gardas Date: 2008-06-02.19:24:42
It seems that this issue also might cause strange dead-locks in darcs testsuite
as shown on the darcs buildbots. Sometimes either make test or make test_network
hangs forever. Following little scripts duplicates this issue on Solaris:

[code]
while (true); do gmake test; gmake test_network;  done
[/code]
msg5003 (view) Author: kowey Date: 2008-06-11.10:55:34
At the risk of throwing in some noise, I notice that under Windows (darcs
2.0.1rc1 + 15) with curl pipelining enabled, darcs will sit there forever trying
to identify the repository format.

The debugging info says that it has successfully retrieved the format file, but
if you ls it, the tmpfile it's using is 0 byte.

Attached
Attachments
msg5004 (view) Author: kowey Date: 2008-06-11.12:23:02
Argh.  Nevemind.  That was noise.  I was (a) using a silly URL and (b) had not
set my proxy server
msg5050 (view) Author: kowey Date: 2008-06-16.10:02:08
Just to check, is it just darcs get that's causing problems?

If we echo changes.sh > tests/network/tests_to_run (disabling the get test), do
we still have a hang?
msg5444 (view) Author: mornfall Date: 2008-08-12.22:17:39
I suspect this might be related to curl versus threaded runtime. See also
issue977 -- Dmitry is looking into that, I believe.
msg5655 (view) Author: kowey Date: 2008-08-22.17:39:22
Dmitry says that issue977 has been resolved.  Any improvements with the latest
darcs, Karel?
msg5657 (view) Author: karel.gardas Date: 2008-08-22.18:23:38
The only improvement is that instead of live-lock I get dead-lock. So darcs does
not consume 100% CPU, but just waits silent forever. I've reported this to
darcs-users@ and Dmitry has kindly asked me to duplicate with --debug
--debug-http, but I've not had any free time to do this yet. Sorry, for that.
For more information see Solaris' buildbot and its failing test_network test. It
locks all the time now.
Well, but if you care I may open different bug for this if you like to have this
one closed. Thanks!
msg5934 (view) Author: dmitry.kurochkin Date: 2008-09-08.08:22:42
Karel, can you test with the latest darcs? There were few HTTP related fixes,
one Solaris specific.

If you are behind HTTP proxy - take a look at issue1035, there is a bug in curl.

Regards,
  Dmitry
msg5969 (view) Author: karel.gardas Date: 2008-09-09.17:12:27
Hello, as per Dmitry request I've tested and I'm not able to duplicate
the dead-lock while running make test_network several times (11x). So
perhaps this is really fixed, since previously it happened every time.
Thanks! Karel
msg5971 (view) Author: dmitry.kurochkin Date: 2008-09-09.20:37:38
Thanks Karel! Closing this one.

Regards,
  Dmitry
History
Date User Action Args
2008-05-23 12:33:54kgardascreate
2008-05-23 14:24:29koweysetstatus: unread -> unknown
nosy: + kowey
messages: + msg4849
2008-05-23 14:24:50koweysettopic: + SunOS
nosy: tommy, beschmi, kowey, dagit, kgardas
2008-05-23 14:44:04karel.gardassetnosy: + karel.gardas
messages: + msg4850
2008-05-24 09:48:27koweylinkissue877 superseder
2008-05-24 09:48:42koweylinkissue878 superseder
2008-05-24 10:02:32tux_rockersetnosy: + tux_rocker, karel.gardas, - karel.gardas
messages: + msg4855
2008-06-02 19:24:44kgardassetnosy: tommy, beschmi, kowey, dagit, tux_rocker, karel.gardas, kgardas
messages: + msg4925
2008-06-11 10:55:35koweysetfiles: + hang-windows
nosy: tommy, beschmi, kowey, dagit, tux_rocker, karel.gardas, kgardas
messages: + msg5003
2008-06-11 12:23:04koweysetnosy: tommy, beschmi, kowey, dagit, tux_rocker, karel.gardas, kgardas
messages: + msg5004
2008-06-11 12:23:12koweysetfiles: - hang-windows
nosy: tommy, beschmi, kowey, dagit, tux_rocker, karel.gardas, kgardas
2008-06-16 10:02:09koweysetnosy: tommy, beschmi, kowey, dagit, tux_rocker, karel.gardas, kgardas
messages: + msg5050
2008-08-12 22:17:42mornfallsetnosy: + mornfall, karel.gardas, - karel.gardas
messages: + msg5444
2008-08-22 17:39:29koweysetstatus: unknown -> testing
nosy: tommy, beschmi, kowey, dagit, tux_rocker, karel.gardas, kgardas, mornfall
messages: + msg5655
topic: + Target-2.1
assignedto: karel.gardas
superseder: + get => Possible bug in URL.waitNextUrl (2.0.2 + 20)
2008-08-22 18:23:42kgardassetnosy: tommy, beschmi, kowey, dagit, tux_rocker, karel.gardas, kgardas, mornfall
messages: + msg5657
2008-08-22 21:47:54dmitry.kurochkinsetnosy: + dmitry.kurochkin
2008-08-23 07:57:53koweysetstatus: testing -> waiting-for
nosy: + droundy, Serware
topic: + Target-2.0, - Target-2.1
title: darcs2 live-locks on GHC subrepo (darcs get --partial http://darcs.haskell.org/ghc-stable/packages/pretty) -> darcs2 dead-locks on GHC subrepo (darcs get --partial http://darcs.haskell.org/ghc-stable/packages/pretty)
2008-08-25 09:31:42koweysettopic: + HTTP
nosy: droundy, tommy, beschmi, kowey, dagit, tux_rocker, dmitry.kurochkin, Serware, karel.gardas, kgardas, mornfall
2008-09-08 08:22:44dmitry.kurochkinsetnosy: droundy, tommy, beschmi, kowey, dagit, tux_rocker, dmitry.kurochkin, Serware, karel.gardas, kgardas, mornfall
messages: + msg5934
2008-09-08 08:24:22dmitry.kurochkinsetstatus: waiting-for -> unknown
nosy: droundy, tommy, beschmi, kowey, dagit, tux_rocker, dmitry.kurochkin, Serware, karel.gardas, kgardas, mornfall
assignedto: karel.gardas -> dmitry.kurochkin
2008-09-09 17:12:29karel.gardassetnosy: droundy, tommy, beschmi, kowey, dagit, tux_rocker, dmitry.kurochkin, Serware, karel.gardas, kgardas, mornfall
messages: + msg5969
2008-09-09 20:37:39dmitry.kurochkinsetstatus: unknown -> resolved-in-unstable
nosy: droundy, tommy, beschmi, kowey, dagit, tux_rocker, dmitry.kurochkin, Serware, karel.gardas, kgardas, mornfall
messages: + msg5971
2009-04-22 03:29:57twbsetstatus: resolved-in-unstable -> resolved
nosy: + simon, thorkilnaur
2009-08-06 17:59:54adminsetnosy: + markstos, jast, darcs-devel, zooko, - droundy, tux_rocker, karel.gardas, kgardas
2009-08-06 21:05:50adminsetnosy: - beschmi
2009-08-10 22:20:58adminsetnosy: + tux_rocker, karel.gardas, kgardas, - markstos, darcs-devel, zooko, jast
2009-08-11 00:15:12adminsetnosy: - dagit
2009-08-25 17:44:09adminsetnosy: + darcs-devel, - simon
2009-08-27 14:18:18adminsetnosy: tommy, kowey, darcs-devel, thorkilnaur, tux_rocker, dmitry.kurochkin, Serware, karel.gardas, kgardas, mornfall
2009-10-23 22:36:21adminsetnosy: - karel.gardas
2009-10-23 22:44:24adminsetnosy: + serware, - Serware
2009-10-23 23:30:14adminsetnosy: + Serware, - serware
2009-10-23 23:59:25adminsetnosy: + karel.gardas, - kgardas
2010-06-15 21:20:30adminsetmilestone: 2.0.x
2010-06-15 21:20:32adminsettopic: - Target-2.0