darcs

Issue 2400 clone from/to remote: couldn't fetch XXX from sources

Title: clone from/to remote: couldn't fetch XXX from sources
Priority: bug
Status: resolved
Milestone:
Resolved in: 2.12.0
Superseder:
Nosy List: bf
Assigned To:
Topics:

Created on 2014-06-15.19:57:08 by bf, last changed 2015-10-16.19:53:54 by noreply.

Messages
msg17546 Author: bf Date: 2014-06-15.19:57:06
1. Summarise the issue (what were you doing, what went wrong?)

I wanted to test the new clone-to-url feature (that replaces darcs put).
I did:

ben@sarun[1]: .../darcs/screened > pwd          
/home/ben/src/darcs/screened
ben@sarun[1]: .../darcs/screened > darcs clone . ben@localhost:/tmp/xxxx
        
Creating local clone...

darcs failed:  Couldn't fetch
`82e1a1f7eb13d7b6f0d97c11efea66af98a060d63346dd35732b8b617d5c124d'
in subdir pristine.hashed from sources:

thisrepo:/tmp/clone/local
cache:/home/ben/.cache/darcs
readonly:/home/ben/.darcs/cache

The places it looks in for the patch file suggest that this happens because
the repo to be cloned is incomplete, i.e. was itself cloned with the --lazy flag?

Do we have a regression test for this feature?

3. What darcs version are you using? (Try: darcs --exact-version)

Latest from the screened repository.

4. What operating system are you running?

Linux
msg17547 Author: bf Date: 2014-06-16.00:31:40
Unfortunately I was not able to reproduce this with a toy example.
msg17584 Author: bf Date: 2014-06-30.00:33:22
I think this was a case of me being confused about which version of darcs
was active, and of my local copy of darcs screened being borked because I
had fetched it with a non-functional darcs. It works now, after clearing
things out and getting a fresh copy.

I will shortly send a patch that adds a simple test for clone over ssh.
msg17586 Author: gh Date: 2014-07-02.08:31:47
OK, your comment made me look into the network tests, which I never run.

Tests for SSH are in ./tests/network/ and can be run by passing the
flags --unit=no --shell=no --network=yes to darcs-test (or cabal
test).
msg17979 Author: bf Date: 2015-02-02.21:19:07
Re-opening this bug. I ran the network tests and indeed
tests/network/get.sh fails with the same message, even though it does
something different:

[...]
| darcs get --lazy --tag . http://darcs.net temp3
| 
| darcs failed:  Couldn't fetch
`0000015910-a6ec1bb4577adea6a40e0d9f4c5f31c61c6b8ffcf24a0318dd5e129ed5e3c80e'
| in subdir inventories from sources:
| 
| thisrepo:/tmp/tmpThreadId1912851/temp3
| cache:/tmp/tmpThreadId1912851/.cache/darcs
[...]
msg17980 Author: bf Date: 2015-02-02.22:18:36
Changing the subject line to better reflect the nature of this problem.
msg18572 Author: bf Date: 2015-06-21.01:25:46
OK, sorry for re-opening the wrong report.

I just noticed that the failure disappears when I add --debug. This
immediately suggested a race condition to me. So I looked at the code in
Darcs.Repository and what do I see? fetchFilesUsingCache is executed
with forkIO...
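To make the suspected race concrete, here is a minimal sketch using only base. It is not darcs code: prefetch, leakyJob and observe are hypothetical stand-ins, and the only assumption carried over from the report is that a background fetch is started with forkIO and nothing waits for it or stops it. The job hands back its result while the forked thread is still working, so whatever that thread does afterwards overlaps with the caller's next steps.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Monad (replicateM_)
import Data.IORef

-- Hypothetical stand-in for a background fetch: ten "downloads" of
-- roughly 10 ms each, counted in a shared IORef.
prefetch :: IORef Int -> IO ()
prefetch counter = replicateM_ 10 $ do
  threadDelay 10000                              -- simulated latency
  atomicModifyIORef' counter (\n -> (n + 1, ()))

-- The problematic shape: fork the fetcher, do the main work, return.
-- Nothing waits for the forked thread or stops it.
leakyJob :: IO (IORef Int)
leakyJob = do
  counter <- newIORef 0
  _ <- forkIO (prefetch counter)                 -- fire and forget
  -- ... the main work would go here; it finishes almost at once ...
  return counter

-- Compare the fetch count when the job returns with the count a little
-- later; a larger second value means the orphaned thread kept running.
observe :: IO (Int, Int)
observe = do
  counter <- leakyJob
  atReturn <- readIORef counter
  threadDelay 300000                             -- the caller moves on
  later <- readIORef counter
  return (atReturn, later)

main :: IO ()
main = do
  (atReturn, later) <- observe
  putStrLn ("fetches done when the job returned: " ++ show atReturn)
  putStrLn ("fetches done 300 ms later:          " ++ show later)
```

The second count should come out larger than the first: the job has "finished" from the caller's point of view, yet its helper thread is still mutating shared state.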
msg18574 Author: bf Date: 2015-06-21.12:12:30
Indeed, removing the forkIO makes the problem disappear. I will
investigate further, and perhaps refactor the code to use the async
package, which was designed to make this kind of code less error-prone.
msg18787 Author: noreply Date: 2015-10-16.19:17:40
The following patch sent by Ben Franksen <benjamin.franksen@helmholtz-berlin.de> updated issue issue2400 with
status=resolved;resolvedin=2.12.0 HEAD

* resolve issue2400: use async package to keep track of unpack threads 
Ignore-this: 19824275268ecdf0fb78ebc720827c17

The main difference is that we now cancel all threads when the job is done.
The previous implementation left one of the threads running, and I suspect
(but haven't strictly verified) that this caused the error message. There is
rather strong evidence, though: turning on debug messages makes the problem
disappear, as did turning off the concurrency (by commenting out the
forkIO), both of which suggest a race condition. Then there is the fact
that the clone actually succeeded despite the error message. Last but not
least, with this patch in effect I can no longer reproduce the problem.
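The patch switches to the async package, whose withAsync cancels the child thread when the enclosing action exits. The same "cancel all threads when the job is done" idea can be sketched with only base, using bracket and killThread. This is an illustration, not the actual patch: withWorkers, prefetch, scopedJob and observe are hypothetical names, and the workers are simple counters rather than darcs' unpack threads.

```haskell
import Control.Concurrent (forkIOWithUnmask, killThread, threadDelay)
import Control.Exception (bracket)
import Control.Monad (replicateM_)
import Data.IORef

-- Hypothetical stand-in for a background fetch: ten ~10 ms "downloads".
prefetch :: IORef Int -> IO ()
prefetch counter = replicateM_ 10 $ do
  threadDelay 10000
  atomicModifyIORef' counter (\n -> (n + 1, ()))

-- Run 'body' with 'workers' in the background. When 'body' finishes,
-- normally or via an exception, every worker is killed before returning.
-- killThread returns only after the exception has been raised in the
-- worker, and prefetch installs no handlers, so no update can land after
-- withWorkers returns.
withWorkers :: [IO ()] -> IO a -> IO a
withWorkers workers body =
  bracket (mapM spawn workers) (mapM_ killThread) (const body)
  where
    -- bracket masks asynchronous exceptions while acquiring, and forked
    -- threads inherit that state, so unmask each worker explicitly so
    -- that killThread can interrupt it at any point.
    spawn w = forkIOWithUnmask (\unmask -> unmask w)

scopedJob :: IO (IORef Int)
scopedJob = do
  counter <- newIORef 0
  withWorkers [prefetch counter, prefetch counter] $
    threadDelay 25000      -- the main work; the workers are still mid-fetch
  return counter

-- Read the count when the job returns and again a little later.
observe :: IO (Int, Int)
observe = do
  counter <- scopedJob
  atReturn <- readIORef counter
  threadDelay 300000
  later <- readIORef counter
  return (atReturn, later)

main :: IO ()
main = do
  (atReturn, later) <- observe
  putStrLn ("fetches done when the job returned: " ++ show atReturn)
  putStrLn ("fetches done 300 ms later:          " ++ show later)
```

Here the two counts should be equal: tying each helper thread's lifetime to the scope of the job means a finished job cannot leave workers behind, which is the guarantee withAsync provides in the async package.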
msg18788 Author: gh Date: 2015-10-16.19:31:04
Sorry for the mix-up (I wrongly pushed patches to reviewed); this issue
remains open.
msg18794 Author: noreply Date: 2015-10-16.19:53:53
The following patch sent by Ben Franksen <benjamin.franksen@helmholtz-berlin.de> updated issue issue2400 with
status=resolved;resolvedin=2.12.0 HEAD

* resolve issue2400: use async package to keep track of unpack threads 
Ignore-this: 19824275268ecdf0fb78ebc720827c17

History
Date                 User     Action  Args
2014-06-15 19:57:08  bf       create
2014-06-16 00:31:41  bf       set     messages: + msg17547
2014-06-30 00:33:24  bf       set     priority: invalid
                                      messages: + msg17584
2014-07-02 08:31:48  gh       set     messages: + msg17586
2015-02-02 21:19:09  bf       set     priority: invalid -> bug
                                      status: unknown -> needs-diagnosis/design
                                      messages: + msg17979
2015-02-02 22:18:37  bf       set     messages: + msg17980
                                      title: clone over ssh fails -> clone from/to remote: couldn't fetch XXX from sources
2015-06-21 01:25:47  bf       set     messages: + msg18572
2015-06-21 12:12:32  bf       set     messages: + msg18574
2015-10-16 19:17:42  noreply  set     status: needs-diagnosis/design -> resolved
                                      messages: + msg18787
                                      resolvedin: 2.12.0
2015-10-16 19:31:05  gh       set     status: resolved -> needs-diagnosis/design
                                      messages: + msg18788
2015-10-16 19:53:54  noreply  set     status: needs-diagnosis/design -> resolved
                                      messages: + msg18794