Issue 2400: clone from/to remote: couldn't fetch XXX from sources

Title	clone from/to remote: couldn't fetch XXX from sources
Priority	bug	Status	resolved
Milestone		Resolved in	2.12.0
Superseder		Nosy List	bfrk
Assigned To		Topics

Created on 2014-06-15.19:57:08 by bfrk, last changed 2015-10-16.19:53:54 by noreply.

Messages
msg17546 (view)	Author: bfrk	Date: 2014-06-15.19:57:06
1. Summarise the issue (what were you doing, what went wrong?)

I wanted to test the new clone-to-url feature (that replaces darcs put). I did:

ben@sarun[1]: .../darcs/screened > pwd
/home/ben/src/darcs/screened
ben@sarun[1]: .../darcs/screened > darcs clone . ben@localhost:/tmp/xxxx
Creating local clone...

darcs failed: Couldn't fetch `82e1a1f7eb13d7b6f0d97c11efea66af98a060d63346dd35732b8b617d5c124d'
in subdir pristine.hashed from sources:

thisrepo:/tmp/clone/local
cache:/home/ben/.cache/darcs
readonly:/home/ben/.darcs/cache

The places it looks for the patch file suggest that this is because the repo to be cloned is not complete, i.e. itself cloned with the --lazy flag?

Do we have a regression test for this feature?

3. What darcs version are you using? (Try: darcs --exact-version)

Latest from screened.

4. What operating system are you running?

Linux
msg17547 (view)	Author: bfrk	Date: 2014-06-16.00:31:40
Unfortunately I was not able to reproduce this with a toy example.
msg17584 (view)	Author: bfrk	Date: 2014-06-30.00:33:22
I think this was a case of me being confused about which version of darcs was active, and of my local copy of darcs screened being borked because I had fetched it with a non-functional darcs. It works now after clearing things out and getting a fresh copy. I will shortly send a patch that adds a simple test for clone over ssh.
msg17586 (view)	Author: gh	Date: 2014-07-02.08:31:47
OK, your comment made me look into the network tests, which I never run. Tests for SSH are in ./tests/network/ and can be run by passing the flags --unit=no --shell=no --network=yes to darcs-test (or cabal test).
msg17979 (view)	Author: bfrk	Date: 2015-02-02.21:19:07
Re-opening this bug. I ran the network tests and indeed tests/network/get.sh fails with the same message, even though it does something different:

[...]
| darcs get --lazy --tag . http://darcs.net temp3
|
| darcs failed: Couldn't fetch `0000015910-a6ec1bb4577adea6a40e0d9f4c5f31c61c6b8ffcf24a0318dd5e129ed5e3c80e'
| in subdir inventories from sources:
|
| thisrepo:/tmp/tmpThreadId1912851/temp3
| cache:/tmp/tmpThreadId1912851/.cache/darcs
[...]
msg17980 (view)	Author: bfrk	Date: 2015-02-02.22:18:36
Changing the subject line to better reflect the nature of this problem.
msg18572 (view)	Author: bfrk	Date: 2015-06-21.01:25:46
Ok, sorry for re-opening the wrong report. I just noticed the failure disappears when I add --debug. This immediately suggested a race condition to me. So I looked at the code in Darcs.Repository and what do I see? fetchFilesUsingCache is executed with forkIO...
msg18574 (view)	Author: bfrk	Date: 2015-06-21.12:12:30
Indeed, removing the forkIO makes the problem disappear. Will investigate further. Perhaps refactor the code to use the async package which has been designed to make this kind of code less error prone.
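For readers following along, here is a minimal sketch of the kind of race being described. It is not the darcs code: fetchWorker, racyClone and safeClone are hypothetical names, and the "files" are just strings. It contrasts a fire-and-forget forkIO, where the caller has no handle on the worker, with withAsync and wait from the async package, where the caller blocks until the worker has finished.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.Async (withAsync, wait)
import Data.IORef (IORef, newIORef, readIORef, atomicModifyIORef')

-- Hypothetical stand-in for a background worker that fetches files.
fetchWorker :: IORef [String] -> IO ()
fetchWorker done = mapM_ fetchOne ["inventory", "pristine", "patch"]
  where
    fetchOne name = do
      threadDelay 10000  -- simulate network latency
      atomicModifyIORef' done (\xs -> (name : xs, ()))

-- Fire-and-forget: nothing ties the caller to the worker, so the caller
-- may inspect the results before the worker has produced them.
racyClone :: IO Int
racyClone = do
  done <- newIORef []
  _ <- forkIO (fetchWorker done)
  length <$> readIORef done

-- With async: the caller waits for the worker before reading the results,
-- and withAsync cancels the worker if the body exits early.
safeClone :: IO Int
safeClone = do
  done <- newIORef []
  withAsync (fetchWorker done) $ \worker -> do
    wait worker
    length <$> readIORef done

main :: IO ()
main = do
  racy <- racyClone
  safe <- safeClone
  putStrLn ("racy saw " ++ show racy ++ " of 3 files")
  putStrLn ("safe saw " ++ show safe ++ " of 3 files")
```

How many files the racy variant sees depends on scheduling, which is consistent with the failure disappearing once --debug changes the timing. The safe variant always sees all three.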
msg18787 (view)	Author: noreply	Date: 2015-10-16.19:17:40
The following patch sent by Ben Franksen <benjamin.franksen@helmholtz-berlin.de> updated issue issue2400 with status=resolved;resolvedin=2.12.0 HEAD

* resolve issue2400: use async package to keep track of unpack threads
Ignore-this: 19824275268ecdf0fb78ebc720827c17

The main difference is that we now cancel all threads when the job is done. The previous implementation left one of the threads running and I suspect (but haven't strictly verified) that this caused the error message. There is rather strong evidence though: turning on debug messages makes the problem disappear, as did turning off the concurrency (by commenting out the forkIO), both of which suggest a race condition. Then there is the fact that the clone actually succeeded despite the error message. Last but not least, with this patch in effect I can no longer reproduce the problem.
msg18788 (view)	Author: gh	Date: 2015-10-16.19:31:04
Sorry for the mixup (I wrongly pushed patches to reviewed), this issue remains open.
msg18794 (view)	Author: noreply	Date: 2015-10-16.19:53:53
The following patch sent by Ben Franksen <benjamin.franksen@helmholtz-berlin.de> updated issue issue2400 with status=resolved;resolvedin=2.12.0 HEAD

* resolve issue2400: use async package to keep track of unpack threads
Ignore-this: 19824275268ecdf0fb78ebc720827c17

The main difference is that we now cancel all threads when the job is done. The previous implementation left one of the threads running and I suspect (but haven't strictly verified) that this caused the error message. There is rather strong evidence though: turning on debug messages makes the problem disappear, as did turning off the concurrency (by commenting out the forkIO), both of which suggest a race condition. Then there is the fact that the clone actually succeeded despite the error message. Last but not least, with this patch in effect I can no longer reproduce the problem.
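The "cancel all threads when the job is done" idea from the patch description can be sketched roughly as follows. This is an illustration under assumptions, not the actual patch: prefetcher and runJob are hypothetical names, and the workers here simply loop forever. Only async, cancel and waitCatch from the async package and finally from Control.Exception are real library calls.

```haskell
import Control.Concurrent (threadDelay)
import Control.Concurrent.Async (Async, async, cancel, waitCatch)
import Control.Exception (finally)
import Control.Monad (forever)

-- Hypothetical background worker that would never stop on its own.
prefetcher :: Int -> IO ()
prefetcher _ = forever (threadDelay 1000)

-- Start the workers, run the main job, and cancel every worker once the
-- job is done, whether it finished normally or threw an exception.
runJob :: IO a -> IO (a, [Async ()])
runJob job = do
  workers <- mapM (async . prefetcher) [1 .. 3]
  result <- job `finally` mapM_ cancel workers
  pure (result, workers)

main :: IO ()
main = do
  (result, workers) <- runJob (threadDelay 5000 >> pure "cloned")
  -- waitCatch returns Left for a worker that ended with an exception,
  -- which includes being cancelled.
  outcomes <- mapM waitCatch workers
  putStrLn result
  putStrLn ("workers stopped: " ++ show (length [() | Left _ <- outcomes]))
```

Keeping an Async handle for each worker is what makes the cleanup possible; with a bare forkIO the caller would have to track thread ids and completion by hand.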

History
Date	User	Action	Args
2014-06-15 19:57:08	bfrk	create
2014-06-16 00:31:41	bfrk	set	messages: + msg17547
2014-06-30 00:33:24	bfrk	set	priority: invalid messages: + msg17584
2014-07-02 08:31:48	gh	set	messages: + msg17586
2015-02-02 21:19:09	bfrk	set	priority: invalid -> bug status: unknown -> needs-diagnosis/design messages: + msg17979
2015-02-02 22:18:37	bfrk	set	messages: + msg17980 title: clone over ssh fails -> clone from/to remote: couldn't fetch XXX from sources
2015-06-21 01:25:47	bfrk	set	messages: + msg18572
2015-06-21 12:12:32	bfrk	set	messages: + msg18574
2015-10-16 19:17:42	noreply	set	status: needs-diagnosis/design -> resolved messages: + msg18787 resolvedin: 2.12.0
2015-10-16 19:31:05	gh	set	status: resolved -> needs-diagnosis/design messages: + msg18788
2015-10-16 19:53:54	noreply	set	status: needs-diagnosis/design -> resolved messages: + msg18794
