Issue 2062 Darcs does not retry on failed http get/pull, resulting in useless repo

Title Darcs does not retry on failed http get/pull, resulting in useless repo
Priority wishlist Status duplicate
Milestone Resolved in
Superseder resume/restart HTTP connections (issue1639)
Nosy List ajsk
Assigned To
Topics HTTP

Created on 2011-04-07.02:22:34 by ajsk, last changed 2011-08-28.19:16:10 by ajsk.

msg13910 (view) Author: ajsk Date: 2011-04-07.02:22:33
When 'darcs get' or 'darcs pull' has a failure during http download,
darcs does not try to re-get or resume the download. A resume would be
the proper thing to do, but a full retry would also be acceptable.

Without recovery, the repository copy created is corrupted and unusable. 

If automatic retry fails after 3 attempts, darcs should be smart enough
to know where it left off, so that a manual resume may be performed once
the network problem is resolved.
msg14673 (view) Author: ajsk Date: 2011-08-17.02:27:22
msg14682 (view) Author: kowey Date: 2011-08-20.20:59:32
Thanks for the report.  

Sorry Andrew, I'm having a hard time keeping up with bug-tracker triage.

This appears to be a duplicate of issue1639, which we've had pegged as a 
wishlist item for a while.

Is it a serious blocker for you?

[Do let me know if I have misunderstood something in marking this a 
duplicate.]

Is this blocking you in any way?  Or did you mean this was urgent in the 
sense that this is something that the darcs crew should be taking more 
seriously than other issues?

msg14683 (view) Author: ajsk Date: 2011-08-21.21:57:48
Yes, it is a blocker. Here is an overview of the problem.

Pulls go over an unreliable HTTP proxy, and there is nothing that can
be done except to retry. When the proxy fails under high demand, the
resulting repo is incomplete, and there is no option to resume or
recover from such an error. Incomplete repos also occur during large
updates, resulting in missing patch files. The only recourse is to
delete the entire repo and try again, waiting and hoping for the
entire download to complete without error. SSH is not available at all,
and neither is any other protocol; the network is basically locked down
to an overloaded proxy, which is why it fails. HTTP over this unreliable
proxy is the only option.

Here is what I propose.

Three things IMHO should be present.
Firstly, a retry option that takes a non-negative integer; zero should
mean retry forever. This would make life a lot easier to start off with,
and is the easiest to implement, I would think. 99% of the problems
would be solved with just this option.

Secondly, it is a good idea to have an adjustable delay between retries
and between gets, as this too would be very helpful, especially when
dealing with an overloaded proxy.
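The first two suggestions can be sketched together as a small wrapper: a retry
count where zero means "try forever", plus an adjustable delay between
attempts. This is only a minimal Python sketch of the proposed behavior; the
`fetch` callable is a hypothetical stand-in for darcs' HTTP download, and darcs
has no such `--retry` option at this point.

```python
import time

def fetch_with_retry(fetch, url, retries=3, delay=1.0):
    """Call fetch(url), retrying up to `retries` times on IOError.

    retries=0 means try forever, per the proposal above; `delay` is the
    adjustable pause between attempts. `fetch` is hypothetical.
    """
    attempt = 0
    while True:
        attempt += 1
        try:
            return fetch(url)
        except IOError:
            # Give up only when a finite retry budget is exhausted.
            if retries and attempt >= retries:
                raise
            time.sleep(delay)
```

With a budget of five retries and an overloaded proxy that recovers on the
third attempt, the wrapper would return the data instead of leaving a
half-fetched repo behind.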

Finally, there appears to be no sort of transaction journal, at least
from what I have seen. A journal to allow darcs to resume from some
point when something goes very wrong would be an excellent idea,
providing an ACID-like robustness (Please see
http://en.wikipedia.org/wiki/ACID for a terse description if you are not
familiar with the term). While this would be a little more difficult to
do, it could solve piles of problems, especially when testing a full
repo consistency.
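The journal idea can be as simple as recording every file a get/pull intends
to fetch before fetching any of them, then pruning the record as each file
lands safely on disk. A minimal sketch, using a hypothetical journal file name
and JSON layout (not how darcs actually stores repository state):

```python
import json
import os

JOURNAL = "_darcs_fetch_journal.json"  # hypothetical file name

def start_transfer(pending):
    """Write the full list of files to fetch before fetching any of them."""
    with open(JOURNAL, "w") as f:
        json.dump({"pending": pending}, f)

def mark_done(name):
    """Drop a file from the journal once it is completely on disk."""
    with open(JOURNAL) as f:
        state = json.load(f)
    state["pending"].remove(name)
    with open(JOURNAL, "w") as f:
        json.dump(state, f)

def resume():
    """Return what is still left to fetch after an interrupted transfer."""
    if not os.path.exists(JOURNAL):
        return []
    with open(JOURNAL) as f:
        return json.load(f)["pending"]
```

After a crash mid-transfer, `resume()` names exactly the files that never
completed, so a later run can pick up where it left off instead of starting
over.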
msg14684 (view) Author: ajsk Date: 2011-08-21.22:14:21
Oh, I forgot to mention... yes, this is definitely something that the
darcs crew should be taking more seriously. 

This is a huge issue for cases when there is an unreliable connection.

Some examples of these are:
WIFI, dialup, unreliable proxies, broken routers on the net, some
bonehead digging up a fiber optic line, breaking things for everyone.

I could go on, but I don't think I need to. 

Retry, recovery, and applying updates only if they were all successful
are very important to everybody.
msg14690 (view) Author: kowey Date: 2011-08-23.19:43:21
Thanks for the suggestions!  We should think about ways to get you 
unstuck.  Are you fetching 3rd-party repositories here, or is it 
primarily repositories you (ultimately) have control over?  Also, do you 
have any idea if the repositories you are fetching are "hashed" 
repositories?  There may be some way to exploit the file contents hashes 
and caching that darcs already provides ("hashed" files are copied to a 
global cache, so subsequent fetches can be faster; perhaps it would 
suffice to detect the misfetched files and delete them).
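Because hashed files are named after their contents, a corrupt or truncated
download can be detected by rehashing: any cached file whose digest no longer
matches its name is a candidate for deletion and re-fetch. A minimal sketch,
assuming each file is named by the plain SHA-256 hex digest of its bytes (the
real darcs hashed format also encodes sizes and compresses files, so this only
illustrates the idea):

```python
import hashlib
import os

def misfetched(cache_dir):
    """Yield cache entries whose contents do not match the hash in
    their file name -- i.e. files that were misfetched or corrupted.

    Assumes names are SHA-256 hex digests of the file contents; the
    actual darcs cache layout differs.
    """
    for name in os.listdir(cache_dir):
        path = os.path.join(cache_dir, name)
        with open(path, "rb") as f:
            digest = hashlib.sha256(f.read()).hexdigest()
        if digest != name:
            yield name
```

Deleting whatever `misfetched` reports and re-running the get would then pull
down only the damaged files rather than the whole repository.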

Could I ask you to redirect the transaction journal remark to darcs-
users? I don't think I'm a good person to comment on this.
msg14693 (view) Author: ajsk Date: 2011-08-28.19:16:09
All repositories are in the latest format. Yes, delete and retry or
resume. That would work. Will continue discussion on issue 1639. No
sense in doing this in two threads :-)
Date                 User   Action  Args
2011-04-07 02:22:34  ajsk   create
2011-08-17 02:27:23  ajsk   set     messages: + msg14673
2011-08-20 20:59:34  kowey  set     priority: urgent -> wishlist
                                    status: unknown -> duplicate
                                    topic: + HTTP
                                    superseder: + resume/restart HTTP connections
                                    messages: + msg14682
2011-08-21 21:57:49  ajsk   set     messages: + msg14683
2011-08-21 22:14:22  ajsk   set     messages: + msg14684
2011-08-23 19:43:22  kowey  set     messages: + msg14690
2011-08-28 19:16:10  ajsk   set     messages: + msg14693