Issue879

Title show authors => hangs (2.0.0 + 268)
Priority bug Status chatting
Superseder Nosy List Serware, beschmi, dagit, droundy, kowey, lele, tommy
Assigned To lele Topics Regression, ReleaseCritical_2.1

Created on 2008-05-26.14:27:54 by lele, last changed 2008-06-03.14:54:25 by lele.

Messages
msg4932 (view) Author: lele Date: 2008-06-03.14:54:23
On Mon, 2 Jun 2008 14:18:34 +0100
"Eric Kow" <eric.kow@gmail.com> wrote:

> Maybe we can reproduce this more easily by setting PAGER to varying
> values of sleep...

I tried that, with this simple script:

    #!/bin/bash

    i=0
    while true
    do
      i=$[i+3]
      cat >/tmp/mypager <<EOF
    echo Waiting for $i seconds...
    sleep $i
    exit 0
    EOF
      chmod 755 /tmp/mypager
      DARCS_PAGER=/tmp/mypager darcs query a
      echo ok
    done

but for some reason, that does not seem to trigger the problem, at
least not in the first minute or so...

Instead I gave strace enough time: when darcs behave normally, I see
this trace:

    ... 
    wait4(24975, 0x7fff1c389554, 0, NULL)   = ? ERESTARTSYS (To be restarted)
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    rt_sigreturn(0x1a)                      = 61
    wait4(24975, 0x7fff1c389554, 0, NULL)   = ? ERESTARTSYS (To be restarted)
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    rt_sigreturn(0x1a)                      = 61
    wait4(24975, 0x7fff1c389554, 0, NULL)   = ? ERESTARTSYS (To be restarted)
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    write(4, "\377", 1)                     = 1
    rt_sigreturn(0x2)                       = 61
    wait4(24975, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 24975
    --- SIGCHLD (Child exited) @ 0 (0) ---
    unlink("/tmp/darcs-pager24972")         = 0
    getcwd("/home/lele/wip/darcs-2.0", 4096) = 25
    chdir("/home/lele/wip/darcs-2.0")       = 0
    chdir("/home/lele/wip/darcs-2.0")       = 0
    getrusage(RUSAGE_SELF, {ru_utime={0, 192012}, ru_stime={0, 24001}, ...}) = 0
    write(4, "\376", 1)                     = 1
    clock_gettime(0xfffffffe /* CLOCK_??? */, {0, 214695426}) = 0
    rt_sigprocmask(SIG_BLOCK, [HUP INT ABRT PIPE TERM], [], 8) = 0
    rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
    futex(0xbe0eac, 0x85 /* FUTEX_??? */, 1) = 1
    futex(0xbe0ed8, 0x81 /* FUTEX_??? */, 1) = 1
    futex(0xbcad80, 0x81 /* FUTEX_??? */, 1) = 1
    futex(0xbe0eac, 0x85 /* FUTEX_??? */, 1) = 1
    futex(0xbe0ed8, 0x81 /* FUTEX_??? */, 1) = 1
    futex(0xbcad80, 0x81 /* FUTEX_??? */, 1) = 1
    sched_yield()                           = 0
    timer_settime(0, 0, {it_interval={0, 0}, it_value={0, 0}}, NULL) = 0
    timer_delete(0)                         = 0
    fcntl(0, F_GETFL)                       = 0x8002 (flags O_RDWR|O_LARGEFILE)
    fcntl(1, F_GETFL)                       = 0x8002 (flags O_RDWR|O_LARGEFILE)
    fcntl(2, F_GETFL)                       = 0x8002 (flags O_RDWR|O_LARGEFILE)
    rt_sigprocmask(SIG_BLOCK, [TTOU], [], 8) = 0
    rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
    getrusage(RUSAGE_SELF, {ru_utime={0, 192012}, ru_stime={0, 24001}, ...}) = 0
    exit_group(0)                           = ?

But after "threshold" seconds I get this instead:

    ...
    wait4(26621, 0x7fff8c6c6884, 0, NULL)   = ? ERESTARTSYS (To be restarted)
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    rt_sigreturn(0x1a)                      = 61
    wait4(26621, 0x7fff8c6c6884, 0, NULL)   = ? ERESTARTSYS (To be restarted)
    --- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
    write(4, "\377", 1)                     = 1
    rt_sigreturn(0x2)                       = 61
    wait4(26621, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 26621
    --- SIGCHLD (Child exited) @ 0 (0) ---
    unlink("/tmp/darcs-pager26618")         = 0
    getcwd("/home/lele/wip/darcs-2.0", 4096) = 25
    chdir("/home/lele/wip/darcs-2.0")       = 0
    chdir("/home/lele/wip/darcs-2.0")       = 0
    getrusage(RUSAGE_SELF, {ru_utime={0, 208013}, ru_stime={0, 28001}, ...}) = 0
    write(4, "\376", 1)                     = 1
    clock_gettime(0xfffffffe /* CLOCK_??? */, {0, 231146281}) = 0
    rt_sigprocmask(SIG_BLOCK, [HUP INT ABRT PIPE TERM], [], 8) = 0
    rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
    futex(0xbe10dc, 0x85 /* FUTEX_??? */, 1) = 1
    sched_yield()                           = 0
    sched_yield()                           = 0
    sched_yield()                           = 0
    sched_yield()                           = 0

and that final "sched_yield()" goes on forever... That's the loop that
requires a "kill -9" to be stopped.

ciao, lele.
-- 
nickname: Lele Gaifax    | Quando vivrò di quello che ho pensato ieri
real: Emanuele Gaifas    | comincerò ad aver paura di chi mi copia.
lele@nautilus.homeip.net |                 -- Fortunato Depero, 1929.
msg4919 (view) Author: kowey Date: 2008-06-02.13:09:42
> How long is "enough" seems related to the number of patches: the
> problem appears when my mental counter exceeds 8-10 on the tailor
> repository (1600 patches), while on the darcs one (5782 patches) it
> happens only after at least 25 mental-ticks...

[oops, resending to list]

Maybe we can reproduce this more easily by setting PAGER to varying
values of sleep...
msg4917 (view) Author: lele Date: 2008-06-02.12:47:00
On Fri, 30 May 2008 11:58:15 -0000
Eric Kow <bugs@darcs.net> wrote:

> 
> Eric Kow <eric.kow@gmail.com> added the comment:
> 
> Ok, well, I'm going to mark this as resolved in the meantime.  Please
> re-open if it strikes back.

I dunno if, and how, I should reopen this, but I have one more
observation as it just happened again.

I ran "darcs query authors" once, immediately found what I was looking
for and exited shortly after to the shell prompt, without problems.

A minute later I needed another info, and repeated the query: this time
it took a little longer to find it out... and the issue reappeared!

I repeated the two steps several times: if I exit immediately from the
pager, darcs regains control and correctly quits, while if I wait
"enough" before quitting the pager (less, in my case), darcs hungs
when it reenters.

How long is "enough" seems related to the number of patches: the
problem appears when my mental counter exceeds 8-10 on the tailor
repository (1600 patches), while on the darcs one (5782 patches) it
happens only after at least 25 mental-ticks...

This "have to wait in the pager" to see the problem may have fooled me
when I was trying to figure out if the -O2 patch was the culprit.

Hope this helps,
ciao, lele.
-- 
nickname: Lele Gaifax    | Quando vivrò di quello che ho pensato ieri
real: Emanuele Gaifas    | comincerò ad aver paura di chi mi copia.
lele@nautilus.homeip.net |                 -- Fortunato Depero, 1929.
msg4893 (view) Author: kowey Date: 2008-05-30.11:58:14
Ok, well, I'm going to mark this as resolved in the meantime.  Please re-open if
it strikes back.
msg4891 (view) Author: lele Date: 2008-05-30.11:49:01
On Fri, 30 May 2008 10:16:48 -0000
Eric Kow <bugs@darcs.net> wrote:

> Ok, so if you pull the optimisation patch, and make clean (maybe even
> make maintainer-clean) and make again, does it still hang?

Oh boy, I'm really sorry for that, but this time (going thru
maintainer-clean) recompiling *with* that patch I got a non-hanging
darcs...

So, I'll keep this monitored, but seems a rather strange and
intermittent compilation bug that affects only me.

ciao, lele.
-- 
nickname: Lele Gaifax    | Quando vivrò di quello che ho pensato ieri
real: Emanuele Gaifas    | comincerò ad aver paura di chi mi copia.
lele@nautilus.homeip.net |                 -- Fortunato Depero, 1929.
msg4889 (view) Author: kowey Date: 2008-05-30.10:16:46
Ok, so if you pull the optimisation patch, and make clean (maybe even make
maintainer-clean) and make again, does it still hang?
msg4888 (view) Author: lele Date: 2008-05-30.06:38:40
On Thu, 29 May 2008 17:04:35 -0000
Eric Kow <bugs@darcs.net> wrote:

> Lele, please submit a rollback patch for this, making sure it really
> does solve the problem. Hopefully David can have a look (or just
> apply it...) and see what the root cause is.

My problem is that "making sure it really solves the problem"
part. For sure *all* the times I recompiled darcs with that patch *on
this machine*, an ubuntu 8.04 on amd64, with ghc 6.8.2, it hangs.

But yesterday I pulled a few recent patches from the darcs.net and
recompiled: even if I skipped the mentioned patch, I got the same
behaviour... :-|

This was the first and only time I got the hang /without/ the -O2
patch, but nevertheless I checked twice before recompilation, the
patch wasn't there but the newly built darcs exhibited the hang.

A "make clean; make" cycle solved the problem.

I did some other investigation, that didn't enlight me. With an
"hanging darcs":

* "darcs query authors | less" does not hang
* "strace query authors" obviously shows a lot of "wait4()" calls but
  as soon as I hit 'q' and the pager exits, darcs regains control and
  exits without problems

So, while I'm sure it makes a difference, it isn't clear to me if
there is effectively some problem in darcs or not, possibly exposed by
different optimization levels...

Is anybody else experiencing the same behaviour?

ciao, lele.
-- 
nickname: Lele Gaifax    | Quando vivrò di quello che ho pensato ieri
real: Emanuele Gaifas    | comincerò ad aver paura di chi mi copia.
lele@nautilus.homeip.net |                 -- Fortunato Depero, 1929.
msg4880 (view) Author: kowey Date: 2008-05-29.17:04:31
Lele, please submit a rollback patch for this, making sure it really does solve
the problem. Hopefully David can have a look (or just apply it...) and see what
the root cause is.
msg4867 (view) Author: lele Date: 2008-05-26.18:32:52
On Mon, 26 May 2008 14:27:55 -0000
Lele Gaifax <bugs@darcs.net> wrote:

> Doing a "darcs query authors" when darcs decides it needs to spawn a
> pager and the reenters, you get it hung forever, eating all CPU
> time. A "kill -9" is needed.
> 
> This is with 2.0.0 (2.0.0 (+ 268 patches)), with ghc 6.8.3: I'm quite
> sure it worked ok till, say, +230.

Ok, I can reliably track the culprit to the patch

Thu May 22 18:06:17 CEST 2008  David Roundy <droundy@darcs.net>
  * bump optimization level up to -O2.
  I'm not quite sure why we ever dropped it down... presumably someone was
  impatient with compile times.

A double "make clean; make", with and without that specific patch,
shows that using -O2 causes (at least) that strange problem. So maybe
it was not just impatience...

hth,
ciao, lele.
-- 
nickname: Lele Gaifax    | Quando vivrò di quello che ho pensato ieri
real: Emanuele Gaifas    | comincerò ad aver paura di chi mi copia.
lele@nautilus.homeip.net |                 -- Fortunato Depero, 1929.
msg4863 (view) Author: lele Date: 2008-05-26.14:27:52
Doing a "darcs query authors" when darcs decides it needs to spawn a
pager and the reenters, you get it hung forever, eating all CPU
time. A "kill -9" is needed.

This is with 2.0.0 (2.0.0 (+ 268 patches)), with ghc 6.8.3: I'm quite
sure it worked ok till, say, +230.

ciao, lele.
-- 
nickname: Lele Gaifax    | Quando vivrò di quello che ho pensato ieri
real: Emanuele Gaifas    | comincerò ad aver paura di chi mi copia.
lele@nautilus.homeip.net |                 -- Fortunato Depero, 1929.

Browse related patches: unstable | stable

History
Date User Action Args
2008-06-03 14:54:25lelesetnosy: droundy, tommy, beschmi, kowey, dagit, lele, Serware
messages: + msg4932
2008-06-02 13:09:45koweysetnosy: droundy, tommy, beschmi, kowey, dagit, lele, Serware
messages: + msg4919
2008-06-02 12:47:05lelesetstatus: resolved -> chatting
nosy: droundy, tommy, beschmi, kowey, dagit, lele, Serware
messages: + msg4917
2008-05-30 11:58:15koweysetpriority: urgent -> bug
status: in-progress -> resolved
messages: + msg4893
nosy: droundy, tommy, beschmi, kowey, dagit, lele, Serware
2008-05-30 11:49:04lelesetnosy: droundy, tommy, beschmi, kowey, dagit, lele, Serware
messages: + msg4891
2008-05-30 10:16:48koweysetnosy: droundy, tommy, beschmi, kowey, dagit, lele, Serware
messages: + msg4889
2008-05-30 06:38:42lelesetnosy: droundy, tommy, beschmi, kowey, dagit, lele, Serware
messages: + msg4888
2008-05-29 17:04:35koweysetstatus: chatting -> in-progress
topic: + Regression, ReleaseCritical_2.1
title: darcs query authors does not terminate correctly -> show authors => hangs (2.0.0 + 268)
nosy: + Serware, kowey
messages: + msg4880
priority: urgent
assignedto: lele
2008-05-29 16:53:35koweylinkissue880 superseder
2008-05-26 18:32:55lelesetstatus: unread -> chatting
nosy: + droundy
messages: + msg4867
2008-05-26 14:27:54lelecreate