darcs

Issue 1639 resume/restart HTTP connections

Title: resume/restart HTTP connections
Priority: feature
Status: needs-implementation
Milestone:
Resolved in:
Superseder:
Nosy List: WorldMaker, ajsk, darcs-devel, dmitry.kurochkin, kowey
Assigned To:
Topics: HTTP, Hashed

Created on 2009-10-06.16:02:36 by kowey, last changed 2012-04-12.07:20:49 by ajsk.

Messages
msg8914 (view) Author: kowey Date: 2009-10-06.16:02:29
I'm importing this request from the wiki, the soon to be defunct
http://wiki.darcs.net/WishList

-----------------------
Resume/restart possibility for HTTP connections. If I get/pull a remote
repository over HTTP darcs will often abort due to network problems. My only
option is to retry the full get/pull again, possibly to have darcs abort again.
[firefly@diku.dk]

Reply on some wiki page:
a workaround is to use a caching proxy in the way. Darcs' HTTP traffic is tuned
to cooperate with a (HTTP/1.1 compliant) proxy.
-----------------------

If I understand correctly, hashed repositories make this a practical reality
once we've retrieved the pristine cache (lazy repositories).

Questions we need somebody to find answers to:
- is this bug still relevant?
- what kind of UI could this possibly have?
msg8915 (view) Author: WorldMaker Date: 2009-10-06.18:07:18
Eric Kow wrote:
> If I understand correctly, hashed repositories make this a practical reality
> once we've retrieved the pristine cache (lazy repositories).
> 
> Questions we need somebody to find answers to:
> - is this bug still relevant?

Maybe not with current hashed repositories, but when hashed packs are
adopted this may become a relevant issue again. The benefit of packs is
that it is often faster to download one 5-megabyte file than fifty
100-kilobyte files. However, the possibility that a 5 MB (or larger)
download might fail three-quarters of the way through would make it nice
to have some sort of recovery to keep from wasting too much of a user's
bandwidth/time.

> - what kind of UI could this possibly have?

Presumably the best kind would be to support it transparently and
unobtrusively, probably following a similar pattern to that of most
modern browsers: when downloading a large file (above some threshold),
every so often save everything that has been downloaded to a .part file.
You could save the partial files to the cache(s), and the next time an
operation needs that same file, it can try to use the existing partial
to resume the operation where it left off, provided that the remote
server supports resuming a partial download. Once a download has
been completed successfully, the .part file can then be removed.
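
The .part scheme described above can be sketched roughly as follows. This is a Python illustration only, not darcs code: the names `range_request` and `download_with_resume` and the `opener` parameter are assumptions made for the example. It relies on the standard HTTP `Range` request header and the `206 Partial Content` status to tell whether the server agreed to resume.

```python
import os
import urllib.request


def range_request(url, offset):
    """Build a GET request that asks the server to skip the first `offset` bytes."""
    req = urllib.request.Request(url)
    if offset > 0:
        req.add_header("Range", f"bytes={offset}-")
    return req


def download_with_resume(url, dest, opener=urllib.request.urlopen,
                         chunk_size=64 * 1024):
    """Download `url` to `dest`, continuing from `dest`.part if one exists.

    Returns True if the transfer was resumed, False if it started from zero.
    `opener` is a hypothetical hook so the HTTP layer can be swapped in tests.
    """
    part = dest + ".part"
    offset = os.path.getsize(part) if os.path.exists(part) else 0
    with opener(range_request(url, offset)) as resp:
        # 206 means the server honoured the Range header; a plain 200 means
        # it is sending the whole file again, so the partial is overwritten.
        resumed = offset > 0 and resp.status == 206
        with open(part, "ab" if resumed else "wb") as out:
            while True:
                chunk = resp.read(chunk_size)
                if not chunk:
                    break
                out.write(chunk)
    # Only a fully received file is moved into place.
    os.replace(part, dest)
    return resumed
```

If the connection drops mid-transfer, the exception propagates and the .part file stays on disk, so the next call continues from its current size. A real implementation would also need to cope with a server that rejects the range (for example because the partial file is already complete) and would probably want to verify the finished file before trusting it.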
msg8916 (view) Author: kowey Date: 2009-10-06.19:06:47
Sounds good to me, thanks!
msg14685 (view) Author: ajsk Date: 2011-08-21.23:28:09
See also suggestions in issue2062
msg14688 (view) Author: kowey Date: 2011-08-23.17:23:25
Bumping up in priority as per issue2062
msg14689 (view) Author: kowey Date: 2011-08-23.17:42:04
msg14683 (by ajsk) suggests a retry count/delay feature (which sounds
like it'd be best expressed as an environment variable).

There's also the notion of a transaction log, which may require further
discussion.

I have the impression that this is a contribution that a non-Haskeller
could make (if the thing holding them back from darcs hacking is the
Haskell, that is) -- have a look at src/hscurl.c (somebody else on the
darcs crew can help with the UI bits, i.e. the environment variables).
msg14692 (view) Author: ajsk Date: 2011-08-28.18:17:25
I may be able to contribute that then, since I know C :-)
msg14694 (view) Author: ajsk Date: 2011-08-28.19:31:20
A quick look at the C code, and one thing I notice immediately: no retry
for 504 Gateway Timeout. 504 errors are transient. :-) So, yes, it does
look like I can do some parts of a fix in the C file. I'm not certain
that all the fixes will go there, but I need to look harder. Just thought
I would comment and say that I'm on it. :-)
msg14695 (view) Author: ajsk Date: 2011-08-28.21:44:26
Found where the error control should really be for transient http
errors. It should not be handled in curl.c at all.

★ URL.hs is the correct place to be doing the retries.

★ Retry should default to 3 attempts.

★ Default delay should be 1 second.

★ Retry delay should be delay * fail count.


Retry should happen when one of the following errors is returned:

errorNum transients:
7   : CURLE_COULDNT_CONNECT
28  : CURLE_OPERATION_TIMEDOUT
52  : CURLE_GOT_NOTHING
55  : CURLE_SEND_ERROR
56  : CURLE_RECV_ERROR
88  : CURLE_CHUNK_FAILED

httpErrorCode transients:
500: /* Internal Server Error */
502: /* Bad Gateway */
503: /* Service Unavailable */
504: /* Gateway Timeout */
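
A minimal sketch of the retry policy proposed above, written in Python for illustration rather than in the Haskell of URL.hs. `TransferError`, `is_transient` and `with_retries` are hypothetical names, and "3 attempts" is read here as three tries in total (an assumption; the message could also mean three retries after the first failure).

```python
import time

# Transient libcurl error numbers and HTTP status codes, as listed above.
TRANSIENT_CURL_ERRORS = {7, 28, 52, 55, 56, 88}
TRANSIENT_HTTP_CODES = {500, 502, 503, 504}


class TransferError(Exception):
    """Hypothetical error carrying a curl error number and an HTTP status."""

    def __init__(self, curl_code=0, http_code=0):
        super().__init__(f"curl={curl_code} http={http_code}")
        self.curl_code = curl_code
        self.http_code = http_code


def is_transient(err):
    """True if the failure is in one of the two transient lists."""
    return (err.curl_code in TRANSIENT_CURL_ERRORS
            or err.http_code in TRANSIENT_HTTP_CODES)


def with_retries(action, attempts=3, delay=1.0, sleep=time.sleep):
    """Run action(); on a transient failure wait delay * fail_count, then retry.

    Non-transient errors, and the last failed attempt, are re-raised as-is.
    """
    fail_count = 0
    while True:
        try:
            return action()
        except TransferError as err:
            fail_count += 1
            if not is_transient(err) or fail_count >= attempts:
                raise
            sleep(delay * fail_count)
```

With the defaults, a transfer that keeps failing waits 1 second after the first failure and 2 seconds after the second, then gives up on the third. The `sleep` parameter exists only so that tests can substitute a function that records the delays instead of waiting.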
msg14707 (view) Author: kowey Date: 2011-08-31.20:37:59
Thanks for looking into this.  I guess I could have a go at looking into
this (warning, I'm very slow).  I'm a bit unsure about how I'd go about
testing this... perhaps I'd have to throw together some sort of HTTP
server that deliberately spits out HTTP error codes randomly/frequently?
Is there some more convenient way?
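
One possible shape for such a test server, sketched with Python's standard `http.server` module. `make_flaky_server` is a hypothetical helper, not part of darcs or its test suite. It fails a fixed number of requests rather than failing at random, on the assumption that deterministic failures are easier to assert against.

```python
import http.server
import threading


def make_flaky_server(failures, status=503, body=b"patch data"):
    """Start a localhost HTTP server whose first `failures` GETs return `status`.

    Later requests succeed with `body`. The server listens on an ephemeral
    port; read it from server.server_address[1].
    """
    remaining = [failures]  # mutable cell shared with the handler

    class Handler(http.server.BaseHTTPRequestHandler):
        def do_GET(self):
            if remaining[0] > 0:
                remaining[0] -= 1
                self.send_error(status)
                return
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            pass  # keep test output quiet

    server = http.server.HTTPServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

A test would start the server, point the client under test at `http://127.0.0.1:<port>/`, check that it eventually gets the body, and then call `server.shutdown()`.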
msg14708 (view) Author: kowey Date: 2011-08-31.20:40:09
Note, while I'm not ready to mark this as "ProbablyEasy", it's the sort of 
task that doesn't require going too deep into Darcs internals.  So maybe 
worthwhile if somebody else wants to run a step further after the 
ProbablyEasy treadmill
msg14709 (view) Author: ganesh Date: 2011-08-31.21:00:54
We've been adding a test harness to the Haskell HTTP package that has a
web server built in; we could probably adapt it for darcs too.
msg14712 (view) Author: ajsk Date: 2011-09-05.06:10:25
Some input for you as far as testing robustness.

The easiest way to simulate a network failure: Yank the Ethernet cable.

Not joking :-)

You could also set up a squid proxy, and do some stop/start cycles on
that to also simulate a proxy failure.

Another idea would be to set up a web server and allow only a small number
of connections. That would simulate a congested/overloaded server situation.

For bonus points, do combinations of all the above, and you should not
be left with many corner cases to resolve.

Hope that helps.
msg14731 (view) Author: ganesh Date: 2011-09-16.21:06:41
As an aside, I find it slightly reassuring that git (at least on Windows)
is similarly unrobust. Of course, darcs should definitely do better!

$ git clone http://darcs.haskell.org/ghc.git/
Cloning into ghc...
error: Recv failure: Connection was reset (curl_result = 56, http_code = 0, sha1 = 8b41c97711ad3f2ea147af18984e3d65679b0ae2)
error: Recv failure: Connection was reset (curl_result = 56, http_code = 0, sha1 = 09af3a824f0f161dd9bcb25e6b0cf6b40ac54b8a)
error: Unable to find eecd53bcdc307726fb4bb058f3da013b72386137 under http://darcs.haskell.org/ghc.git
Cannot obtain needed commit eecd53bcdc307726fb4bb058f3da013b72386137
while processing commit 09495daf5b71fb9faea19ebd893a26b85911eaa3.
error: Fetch failed.
msg14823 (view) Author: kowey Date: 2011-11-23.17:31:47
Sorry, I haven't time to try and implement this.
Just unassigning myself so somebody else doesn't steer themselves away.
msg15558 (view) Author: ajsk Date: 2012-04-12.07:20:48
I'm going to try to wrap my brain around Haskell to fix this. 

It should not be that difficult to do. 

A counter and a retry can't be that difficult. :-)

I got some time today to dedicate some effort, so, I guess I'll dig into
some Haskell self-learning materials on the web.
History
Date                 User        Action  Args
2009-10-06 16:02:36  kowey       create
2009-10-06 18:07:26  WorldMaker  set     nosy: + WorldMaker
                                         messages: + msg8915
2009-10-06 19:06:55  kowey       set     status: needs-reproduction -> needs-implementation
                                         nosy: kowey, darcs-devel, WorldMaker, dmitry.kurochkin
                                         topic: + Hashed
                                         messages: + msg8916
2009-10-23 22:39:18  admin       set     nosy: + maxbattcher, - WorldMaker
2009-10-24 00:00:37  admin       set     nosy: + WorldMaker, - maxbattcher
2011-08-20 20:59:34  kowey       link    issue2062 superseder
2011-08-20 20:59:47  kowey       set     nosy: + ajsk
2011-08-21 23:28:10  ajsk        set     messages: + msg14685
2011-08-23 17:23:26  kowey       set     priority: wishlist -> feature
                                         messages: + msg14688
2011-08-23 17:42:05  kowey       set     messages: + msg14689
2011-08-28 18:17:26  ajsk        set     messages: + msg14692
2011-08-28 19:31:21  ajsk        set     messages: + msg14694
2011-08-28 21:44:27  ajsk        set     messages: + msg14695
2011-08-31 20:38:00  kowey       set     assignedto: kowey
                                         messages: + msg14707
2011-08-31 20:40:10  kowey       set     messages: + msg14708
2011-08-31 21:00:55  ganesh      set     messages: + msg14709
2011-09-05 06:10:27  ajsk        set     messages: + msg14712
2011-09-16 21:06:43  ganesh      set     messages: + msg14731
2011-11-23 17:31:48  kowey       set     assignedto: kowey ->
                                         messages: + msg14823
2012-04-12 07:20:49  ajsk        set     messages: + msg15558