darcs

Issue 377 getting many patches over http is slow (pipelining)

Title getting many patches over http is slow (pipelining)
Priority bug Status resolved
Milestone Resolved in
Superseder case-insensitive filesystems confuse darcs (issue196)
Nosy List darcs-devel, dmitry.kurochkin, jch, kowey, markstos, simonmar, simonpj, thorkilnaur, tim, tommy
Assigned To tim
Topics Darcs2, Performance

Created on 2007-01-02.22:11:06 by simonpj, last changed 2009-10-24.00:42:50 by admin.

Messages
msg1377 (view) Author: simonpj Date: 2007-01-02.22:10:59
Sigh.  Two new darcs bugs.

Darcs has just gone into an infinite loop, when doing a 'pull' of a few dozen patches from GHC's HEAD repository into a partial repository on my laptop.  (Well, after 15 mins of 100% CPU I killed it.)  I have the repository if that helps.

OK so then I tried a darcs get (*not* --partial, since that seems to give rise to crashes and other strangeness) for the full repository

sh-2.04$ darcs get http://darcs.haskell.org/ghc --repo-name HEAD

        -- Wait 1.5 hrs for darcs to download 15,000 patches

Then:....

Applying patch 1 of 15164...
Applying patch 2 of 15164...
Applying patch 3 of 15164...
Applying patch 4 of 15164...
Applying patch 5 of 15164...
Applying patch 6 of 15164...
Applying patch 7 of 15164...
Applying patch 8 of 15164...
Applying patch 9 of 15164...
Applying patch 10 of 15164...
Applying patch 11 of 15164...
Applying patch 12 of 15164...
darcs failed:  Error applying hunk to file ./ghc/includes/rtsTypes.lh
Unapplicable patch:
Thu Jan 11 14:26:13 GMT Standard Time 1996  partain
  * [project @ 1996-01-11 14:06:51 by partain]

This is on Windows

I guess I don't have the most up-to-date version:
sh-2.04$ darcs --version
1.0.6pre1 (unknown)
sh-2.04$

Next thing is to upgrade... the latest version seems to be 1.0.7.

Simon
msg1380 (view) Author: kowey Date: 2007-01-03.01:29:05
> For what it's worth, I get the same problem with darcs 1.0.9rc2 on my Mac.
> The problem seems to be that the patch attempts to remove a file
>     ./ghc/includes/rtsTypes.lh
> 
> This is normally fine; however, previous patches had created the files
>     ./ghc/includes/RtsTypes.lh
>     ./ghc/includes/rtsTypes.lh
> 
> I'm betting that if you did the same thing on a case-sensitive file
> system, everything is ok.
> 
> On machines with case-insensitive file systems, e.g. my Mac, darcs does
> the wrong thing.

Oh! I just realised what a partial solution to this problem might look
like: a halfs pristine cache, exactly what was proposed for
http://bugs.darcs.net/issue230

At least we can ensure that pristine is correct and that we can apply
the patches.  Still not sure what we would do with the working
directory, though.
msg1381 (view) Author: kowey Date: 2007-01-03.01:32:03
Resending due to malformed tracker request.

Problem 1
---------
> Darcs has just gone into an infinite loop, when doing a 'pull' of a
> few dozen patches from GHC's HEAD repository into a partial repository
> on my laptop.

I'm going to shift discussion of that patch to
http://bugs.darcs.net/issue378

Side note
---------
> sh-2.04$ darcs get http://darcs.haskell.org/ghc --repo-name HEAD
> 
>         -- Wait 1.5 hrs for darcs to download 15,000 patches

One not-very-nice way to avoid this is to make a tarball, copy that
tarball by hand, and darcs get from the untarred directory.  This also
side-steps the unapplicable patch problem below, because darcs does not
attempt to do its patch work.  This is not very useful for day-to-day
operations like darcs pull, but for an initial 'get', it might be
worthwhile.

Problem 2
---------
> darcs failed:  Error applying hunk to file ./ghc/includes/rtsTypes.lh
> Unapplicable patch:
> Thu Jan 11 14:26:13 GMT Standard Time 1996  partain
>   * [project @ 1996-01-11 14:06:51 by partain]

For what it's worth, I get the same problem with darcs 1.0.9rc2 on my Mac.
The problem seems to be that the patch attempts to remove a file
    ./ghc/includes/rtsTypes.lh

This is normally fine; however, previous patches had created the files
    ./ghc/includes/RtsTypes.lh
    ./ghc/includes/rtsTypes.lh

I'm betting that if you did the same thing on a case-sensitive file
system, everything is ok.

On machines with case-insensitive file systems, e.g. my Mac, darcs does
the wrong thing.  Darcs tries to avoid this possibility by warning you
when you add a file that has the same name, modulo case (either this is
relatively new in darcs, or the --case-ok override was used).  It would
be good if darcs had a more robust treatment of such cases, because
shit does happen (and people do use --case-ok).  What that treatment
would be, I do not know.  See http://bugs.darcs.net/issue53 for some
prior discussion on the matter.
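
To make the "same name, modulo case" condition concrete, here is a minimal sketch of such a check in Python. It is not darcs code: the function name `case_collisions` and the third path are hypothetical, and it only illustrates grouping paths by their case-folded form.

```python
from collections import defaultdict


def case_collisions(paths):
    """Return groups of paths that differ only by letter case."""
    groups = defaultdict(list)
    for path in paths:
        # casefold() maps names to the form a case-insensitive lookup would compare
        groups[path.casefold()].append(path)
    return [sorted(group) for group in groups.values() if len(group) > 1]


paths = [
    "./ghc/includes/RtsTypes.lh",
    "./ghc/includes/rtsTypes.lh",
    "./ghc/includes/example.h",  # hypothetical file with no clash
]
print(case_collisions(paths))
```

A check like this could be run over a repository's file list before attempting a get on Windows or a Mac, to see whether any pair of names would land on the same directory entry.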

The workaround used by the HaXmL developers is to use --partial, but
sadly, this has other problems!

It seems also that what the darcs universe really needs is a
darcs-firstaid kit which will let you swap out a bad patch for an almost
equivalent "good" one.  For example, the bad patch in question might be
the patch that adds RtsTypes.lh, and its swapee might be the same, sans
that file.  This would normally be a dangerous thing (I presume), and
people might need to re-do their darcs get because the patch in question
and the subsequent patches are no longer the same (???), but it would
save some users a lot of grief in the here and now.  Such a tool could
also help with things like what Kirsten was suggesting on the mailing list
(i.e.  patch swapping might just be one of the elements of the kit,
hunting down conflict-prone patches might be another).
msg1382 (view) Author: simonpj Date: 2007-01-03.08:03:42
Thanks for your prompt replies, Eric.

I think I have learned the following about this "unapplicable patch" problem: you're working on it, but there is nothing I can do about it:

        - I cannot do a non-partial get of GHC's repository on a Windows file system
        - There is no way to fix the repository so that I can

In short, non-partial gets simply cannot be done on Windows (for GHC's repository, that is).  This sounds like a pretty serious problem to me.

I guess I can use --partial for now, and hope that doesn't run into problems.

Simon

msg1383 (view) Author: kowey Date: 2007-01-03.20:11:49
On Wed, Jan 03, 2007 at 08:02:37 +0000, Simon Peyton-Jones wrote:
> In short, non-partial gets simply cannot be done on Windows (for GHC's
> repository, that is).  This sounds like a pretty serious problem to
> me.

Right.  This only applies to remote gets, though.  Getting locally works
because darcs does not attempt any patch application.  Hence the tarball
trick.

> I guess I can use --partial for now, and hope that doesn't run into problems.
msg1384 (view) Author: simonmar Date: 2007-01-04.14:09:32
Simon Peyton-Jones wrote:
> Thanks for your prompt replies, Eric.
>
> I think I have learned the following about this "unapplicable
> patch" problem: you're working on it, but there is nothing I can do
> about it:
>
>       - I cannot do a non-partial get of GHC's repository on
> a Windows file system

The workaround is to download the entire repo instead of using 'darcs get'.  I put up a tarball for GHC, see this msg:

http://www.haskell.org/pipermail/cvs-ghc/2006-December/032971.html

>       - There is no way to fix the repository so that I can
>
> In short, non-partial gets simply cannot be done on Windows
> (for GHC's repository, that is).  This sounds like a pretty serious
> problem to me.

This is the case-insensitive filesystem issue, right?

Would it be possible for darcs to use a checkpoint to create the working copy during 'darcs get', even without --partial?  That might even speed up darcs get, because it doesn't have to apply all the patches up to the checkpoint.  Also it would work around this case-insensitivity problem. Perhaps it could be optional, if downloading the checkpoint takes too long.

Cheers,
        Simon

msg1387 (view) Author: tim Date: 2007-01-08.15:56:23
On 1/2/07, Simon Peyton-Jones <bugs@darcs.net> wrote:
>
> Next thing is to upgrade... the latest version seems to be 1.0.7.

I don't think upgrading will help.  I saw similar problems on my
Windows machine when using darcs built from the darcs repository for
darcs (as of Dec. 20, 2006 or thereabouts).  In particular, the first
problem always happens when there are conflicts, doesn't it? At least,
I know I've seen it before and the "solution" was to manually
binary-search for the conflicting patch and avoid pulling it.

Cheers,
Kirsten
msg1397 (view) Author: droundy Date: 2007-01-09.23:17:36
On Wed, Jan 03, 2007 at 01:32:11AM +0000, Eric Kow wrote:
> On machines with case-insensitive file systems, e.g. my Mac, darcs does
> the wrong thing.  Darcs tries to avoid this possibility by warning you
> when you add a file that has the same name, modulo case (either this is
> relatively new in darcs, or the --case-ok override was used).  It would
> be good if darcs had a more robust treatment of such cases, because
> shit does happen (and people do use --case-ok).  What that treatment
> would be, I do not know.  See http://bugs.darcs.net/issue53 for some
> prior discussion on the matter.

There is a fix to this in the wishlist stage, which is the idea for a halfs
(or other database-like) pristine cache.  This would allow us to apply all
the patches to the pristine cache first (as we currently do), and then copy
the cache to the working directory (as we currently do, but requiring new
code).  The catch is that halfs isn't yet good enough--it doesn't allow for
resizing of the file holding the filesystem, and I believe doesn't hold
file modification times, which darcs needs for efficiency reasons.
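
As a rough illustration of why a case-preserving store sidesteps the problem, the toy model below replays an add/add/remove sequence against two in-memory stores, one that folds case and one that does not. Everything here is an assumption made for the sketch: the `ToyStore` class, the order of the two adds, and the placeholder contents are hypothetical, and this is not how darcs or halfs represent files.

```python
class ToyStore:
    """Toy file store; fold_case=True mimics a case-insensitive filesystem."""

    def __init__(self, fold_case):
        self.fold_case = fold_case
        self.files = {}

    def _key(self, path):
        return path.lower() if self.fold_case else path

    def add(self, path, contents):
        # On a case-folding store, a second spelling lands on the same entry.
        self.files[self._key(path)] = contents

    def remove(self, path, expected):
        key = self._key(path)
        # A removal only applies if the file still holds what the patch expects.
        if self.files.get(key) != expected:
            raise ValueError(f"unapplicable patch: {path}")
        del self.files[key]


def replay(store):
    """Apply an assumed add/add/remove sequence and return the surviving files."""
    store.add("./ghc/includes/rtsTypes.lh", "old contents")
    store.add("./ghc/includes/RtsTypes.lh", "new contents")
    store.remove("./ghc/includes/rtsTypes.lh", "old contents")
    return store.files


print(replay(ToyStore(fold_case=False)))
try:
    replay(ToyStore(fold_case=True))
except ValueError as err:
    print(err)
```

In the case-preserving store the removal applies cleanly and only RtsTypes.lh survives; in the case-folding store the second add has overwritten the first, so the removal no longer matches.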

On the other hand, it probably wouldn't be much work (maybe a week for
someone of the caliber of dons or John Goerzen) to write a binding (or use
an existing one) to a database library like sqlite or berkeleydb and
implement an expandable filesystem in that and allow darcs to use it for
the backend cache.

This isn't that hard to fix, it's just that no one has had time.  :(

Note that this would also remove a primary cause of repository corruption,
which is programs that recursively descend into the pristine cache,
corrupting the repository as they go (automake used to do this, and perhaps
dreamweaver still does).  It also ought to improve performance, at least on
some filesystems and under some workloads.

David
msg1398 (view) Author: droundy Date: 2007-01-09.23:19:59
On Thu, Jan 04, 2007 at 02:08:38PM +0000, Simon Marlow wrote:
> Would it be possible for darcs to use a checkpoint to create the working
> copy during 'darcs get', even without --partial?  That might even speed
> up darcs get, because it doesn't have to apply all the patches up to the
> checkpoint.  Also it would work around this case-insensitivity
> problem. Perhaps it could be optional, if downloading the checkpoint
> takes too long.

That would be possible, and you're right, it certainly could speed up full
gets (at the cost of bandwidth and disk space), and perhaps isn't a bad
idea for a new feature.

David
msg1400 (view) Author: jch Date: 2007-01-11.16:36:02
[Resending -- sorry if you receive this twice]

>>         -- Wait 1.5 hrs for darcs to download 15,000 patches

> One not-very-nice way to avoid this is to make a tarball, copy that
> tarball by hand, and darcs get from untarred directory.

The problem, in case someone's interested, is that Darcs doesn't
pipeline outgoing HTTP requests, so you pay a one RTT penalty for
every patch downloaded.  This will not be fixed until libcurl
implements pipelining, which, as far as I can tell, is not likely to
happen anytime soon.

A workaround could be to open multiple concurrent connections and
fetch the patches in parallel.
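
A back-of-the-envelope model of the numbers reported in this issue, in Python. The patch count and the 1.5 hours come from the original report; the rest is an idealised assumption (every request costs the same, and neither the server nor the bandwidth becomes the bottleneck), so the per-patch figure is an upper bound on the round-trip cost rather than a measurement of it.

```python
import math

PATCHES = 15164              # from the "Applying patch N of 15164" output
TOTAL_SECONDS = 1.5 * 3600   # the reported 1.5 hours of downloading


def per_patch_seconds(total_seconds, patches):
    """Average wall-clock cost of fetching one patch sequentially."""
    return total_seconds / patches


def parallel_estimate(patches, seconds_each, connections):
    """Idealised time if `connections` requests are always in flight."""
    return math.ceil(patches / connections) * seconds_each


each = per_patch_seconds(TOTAL_SECONDS, PATCHES)
print(f"{each:.3f} s per patch")
for k in (1, 4, 8):
    print(k, "connections:", f"{parallel_estimate(PATCHES, each, k) / 60:.1f} min")
```

Under these assumptions the total time falls roughly in proportion to the number of requests in flight, which is the same effect pipelining would give over a single connection.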

I can recommend the following paper on the subject:

  http://www.w3.org/Protocols/HTTP/Performance/Pipeline.html

                                        Juliusz
msg1401 (view) Author: droundy Date: 2007-01-11.19:52:33
On Thu, Jan 11, 2007 at 04:36:09PM +0000, Juliusz Chroboczek wrote:
> > >        -- Wait 1.5 hrs for darcs to download 15,000 patches
> 
> > One not-very-nice way to avoid this is to make a tarball, copy that
> > tarball by hand, and darcs get from untarred directory.
> 
> The problem, in case someone's interested, is that Darcs doesn't
> pipeline outgoing HTTP requests, so you pay a one RTT penalty for
> every patch downloaded.  This will not be fixed until libcurl
> implements pipelining, which, as far as I can tell, is not likely to
> happen anytime soon.
> 
> A workaround could be to open multiple concurrent connections and
> fetch the patches in parallel.
> 
> I can recommend the following paper on the subject:
> 
>   http://www.w3.org/Protocols/HTTP/Performance/Pipeline.html

I imagine that another possibility would be to switch to using Haskell code
for fetching files over http.  I know there's a module or more available,
and they might be easier to use and modify than libcurl.  It'd actually be
quite nice to remove the libcurl dependency for http access.  I'd want to
keep the code, so we could still handle ftp, gopher, etc, but it'd probably
be much easier to use Net.Http or whatever, and probably easier to give
good error messages, and might simplify the install and compile process for
many of our users.
-- 
David Roundy
Department of Physics
Oregon State University
msg1405 (view) Author: simonmar Date: 2007-01-12.15:54:14
Juliusz Chroboczek wrote:
> Juliusz Chroboczek <jch@pps.jussieu.fr> added the comment:
>
> [Resending -- sorry if you receive this twice]
>
>>>         -- Wait 1.5 hrs for darcs to download 15,000 patches
>
>> One not-very-nice way to avoid this is to make a tarball, copy that
>> tarball by hand, and darcs get from untarred directory.
>
> The problem, in case someone's interested, is that Darcs doesn't
> pipeline outgoing HTTP requests, so you pay a one RTT penalty for
> every patch downloaded.  This will not be fixed until libcurl
> implements pipelining, which, as far as I can tell, is not likely to
> happen anytime soon.
>
> A workaround could be to open multiple concurrent connections and
> fetch the patches in parallel.
>
> I can recommend the following paper on the subject:
>
>   http://www.w3.org/Protocols/HTTP/Performance/Pipeline.html

Or use the Haskell HTTP library with a few forkIOs to download patches?

Cheers,
        Simon
msg1503 (view) Author: kowey Date: 2007-03-06.07:52:26
Note that the issue196 link is for the case-insensitive stuff.  (sorry, just
doing some gardening on the tracker)
msg1507 (view) Author: tim Date: 2007-03-06.11:01:14
I'm working on replacing libcurl with Network.HTTP in the manner suggested by
Simon M. 

-Kirsten
msg1508 (view) Author: droundy Date: 2007-03-06.16:38:31
On Tue, Mar 06, 2007 at 11:01:21AM +0000, Kirsten Chevalier wrote:
> I'm working on replacing libcurl with Network.HTTP in the manner suggested by
> Simon M. 

That would be great.
-- 
David Roundy
Department of Physics
Oregon State University
msg1551 (view) Author: tim Date: 2007-03-23.20:14:40
In case anyone was wondering, I'm still working on this.

Just for the sake of getting something working, I modified Curl.hs so that it
now fetches URLs in the following way:
- for each HTTP request, fork off a thread (using forkIO) that uses the
simpleHTTP function in Network.HTTP to make the request
- after all the requests have been initiated, wait for all of the child threads
to finish
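
For readers who want to see the shape of this approach, here is a minimal Python analogue of "one thread per request, then wait for all of them". It is not the modified Curl.hs: `fetch_all` and `fake_fetch` are hypothetical names, the URLs are placeholders, and the fetch function is passed in so the sketch runs without a network.

```python
import threading


def fetch_all(urls, fetch):
    """Start one thread per URL, then wait for every thread to finish."""
    results = {}
    lock = threading.Lock()

    def worker(url):
        body = fetch(url)
        with lock:
            results[url] = body

    threads = [threading.Thread(target=worker, args=(url,)) for url in urls]
    for t in threads:
        t.start()
    for t in threads:
        t.join()   # block until this request has completed
    return results


def fake_fetch(url):
    # Stand-in for a real HTTP GET.
    return f"contents of {url}"


urls = [f"http://example.invalid/patch-{i}" for i in range(3)]
print(fetch_all(urls, fake_fetch))
```

Note that this sketch does not handle exceptions raised inside a worker, and with a real fetch function it would open one connection per URL.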

This works, but doesn't perform much better than the original version (that is,
darcs HEAD with libcurl.) Of course, it's very inefficient to create a new
connection for each request, and I'm suspecting that's why. (However, I'm having
a hard time figuring out precisely what's taking so much time; I've been using
strace to figure out how much time gets spent in each system call, but since
most of the time in either case is spent waiting, I'm not sure how to measure
which system calls in particular are the culprit for the waiting -- for example,
in order to figure out whether it's the plethora of connections that's the
problem or whether I've implemented the waiting for the child threads in an
inefficient way. Any suggestions for that?)

So either I could:
1. open a small fixed number of connections and re-use them for subsequent
requests (I guess this would still be an improvement over using libcurl, since
if I understand correctly, libcurl does reuse connections but doesn't make
requests in parallel, or at least darcs isn't using it to make requests in
parallel), similarly to what Juliusz suggests below
2. open a single connection using Network.HTTP's openTCP function, and then fork
off a thread for each request (using forkIO) that uses sendHTTP to handle the
request.
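
Option 1 could look roughly like the following Python sketch: a fixed number of workers, each opening one connection and reusing it for every URL it takes from a shared queue. The names `fetch_with_pool`, `FakeConnection` and `open_fake` are hypothetical, and the connection object is a stand-in so the example runs offline.

```python
import queue
import threading


def fetch_with_pool(urls, open_connection, workers=4):
    """Fetch urls with a fixed number of workers, each reusing one connection."""
    pending = queue.Queue()
    for url in urls:
        pending.put(url)
    results = {}
    lock = threading.Lock()

    def worker():
        conn = open_connection()   # opened once per worker, then reused
        while True:
            try:
                url = pending.get_nowait()
            except queue.Empty:
                return             # no work left for this worker
            body = conn.get(url)
            with lock:
                results[url] = body

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


class FakeConnection:
    """Hypothetical stand-in for a persistent HTTP connection."""

    def get(self, url):
        return f"contents of {url}"


opened = []


def open_fake():
    conn = FakeConnection()
    opened.append(conn)   # record how many connections were opened
    return conn


urls = [f"http://example.invalid/patch-{i}" for i in range(10)]
print(len(fetch_with_pool(urls, open_fake, workers=3)), "fetched over", len(opened), "connections")
```

The design choice here is that the number of connections is bounded by `workers` regardless of how many patches are requested, unlike the thread-per-request version.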

However, I'm not even sure if approach (2) would work at all, since Network.HTTP
doesn't implement HTTP pipelining either. 

So, any suggestions here would be welcome, because I'm sure I'm probably
overlooking one or more obvious things.
msg1747 (view) Author: jch Date: 2007-06-22.23:44:10
> - for each HTTP request, fork off a thread (using forkIO) that uses
> the simpleHTTP function in Network.HTTP to make the request - after
> all the requests have been initiated, wait for all of the child
> threads to finish

That's a good start.

> This works, but doesn't perform much better than the original
> version (that is, darcs HEAD with libcurl.) Of course, it's very
> inefficient to create a new connection for each request, and I'm
> suspecting that's why. (However, I'm having a hard time figuring out
> precisely what's taking so much time;

Could you please install Polipo, then set

  export http_proxy=http://localhost:8123

and try your test again?  If your requests are indeed going out in
parallel, Polipo will batch them together and pipeline them, and we'll
have a good idea whether pipelining is a win.

You can check on

  http://localhost:8123/polipo/servers?

whether Polipo is pipelining (you might need to set disableServersList=false
for it to work).

If you want to repeat the test, you'll need to do

  killall -USR1 polipo
  rm -rf /var/cache/polipo/*
  killall -USR2 polipo

in order to flush Polipo's cache.

You'll find Polipo on http://www.pps.jussieu.fr/~jch/software/polipo/ .
If you're running a recent Debian, Ubuntu, Gentoo or BSD, it should be
available in the package or port collection.

                                        Juliusz
msg2348 (view) Author: markstos Date: 2008-01-07.04:46:46
In the unstable repo, there is now this patch:

"Initial implementation of HTTP pipelining using libwww."

From testing it now, it seems it is working. The output keeps repeating the
statement: 

"Reusing existing connection to darcs.haskell.org:80"

As an eyeball test between 1.0.9 and 2.0.0pre2, there appears to be an improvement.
msg2410 (view) Author: droundy Date: 2008-01-10.17:24:59
This libwww patch only affects darcs get of old-style repositories, though (so
far as I know), so there's still work to be done.  And I refrained from applying
the ChangeLog patch you sent for that reason.
msg2431 (view) Author: jch Date: 2008-01-11.15:51:04
> "Initial implementation of HTTP pipelining using libwww."

> "Reusing existing connection to darcs.haskell.org:80"

Reusing a persistent connection is not pipelining.  Curl will use
persistent connections with no trouble, but it won't pipeline.
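
To spell out the difference: with a persistent connection alone, the client writes one request, waits for its response, and only then writes the next, so it still pays one round trip per patch. With pipelining, the client writes several requests back to back before reading anything, and HTTP/1.1 returns the responses in request order. The Python sketch below only builds the bytes such a client would write; `pipelined_requests` is a hypothetical helper and the paths are placeholders.

```python
def pipelined_requests(host, paths):
    """Build the bytes a pipelining client would write before reading any reply."""
    chunks = []
    for path in paths:
        chunks.append(
            f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            "\r\n"
        )
    return "".join(chunks).encode("ascii")


# Three requests sent in one batch on one connection; the paths are placeholders.
batch = pipelined_requests("darcs.haskell.org", ["/a", "/b", "/c"])
print(batch.count(b"GET "))
```

A log line such as "Reusing existing connection" is consistent with either behaviour, which is why it does not by itself show that requests are being pipelined.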

  http://en.wikipedia.org/wiki/HTTP_pipelining

                                        Juliusz
msg3132 (view) Author: droundy Date: 2008-02-05.19:02:25
I believe pipelining is working now, so I'll mark this as resolved.  As always,
testing would be appreciated.
msg3133 (view) Author: droundy Date: 2008-02-05.19:03:08
I should add, this is only working when compiling either with libwww or a very
recent libcurl, but I think that's a reasonable constraint.
msg3134 (view) Author: markstos Date: 2008-02-05.19:15:48
David Roundy wrote:
> 
> I should add, this is only working when compiling either with libwww or a very
> recent libcurl, but I think that's a reasonable constraint.

Which http option would get compiled by default?

   Mark
msg3149 (view) Author: droundy Date: 2008-02-06.15:44:10
On Tue, Feb 05, 2008 at 07:15:49PM -0000, Mark Stosberg wrote:
> David Roundy wrote:
> > I should add, this is only working when compiling either with libwww or a very
> > recent libcurl, but I think that's a reasonable constraint.
> 
> Which http option would get compiled by default?

Currently, neither is chosen by default.  But since the feature is
implemented, I'd either resolve this bug, or downgrade it to a wishlist bug
for one of these to be chosen by default.
-- 
David Roundy
Department of Physics
Oregon State University
History
Date  User  Action  Args
2007-01-02 22:11:06  simonpj  create
2007-01-03 01:29:12  kowey  set  status: unread -> unknown
    nosy: droundy, tommy, beschmi, kowey, simonpj
    messages: + msg1380
2007-01-03 01:32:11  kowey  set  nosy: droundy, tommy, beschmi, kowey, simonpj
    messages: + msg1381
    title: Darcs loop and darcs failure -> Unapplicable patch on case-insensitive file systems
2007-01-03 08:04:15  simonpj  set  nosy: droundy, tommy, beschmi, kowey, simonpj
    messages: + msg1382
2007-01-03 20:11:55  kowey  set  nosy: droundy, tommy, beschmi, kowey, simonpj
    messages: + msg1383
2007-01-04 14:10:00  simonmar  set  nosy: + simonmar
    messages: + msg1384
2007-01-08 15:56:33  catamorphism  set  nosy: + catamorphism
    messages: + msg1387
    title: Unapplicable patch on case-insensitive file systems -> Darcs loop and darcs failure
2007-01-09 23:17:45  droundy  set  nosy: droundy, tommy, beschmi, kowey, simonmar, simonpj, catamorphism
    messages: + msg1397
    title: Darcs loop and darcs failure -> Unapplicable patch on case-insensitive file systems
2007-01-09 23:20:08  droundy  set  nosy: droundy, tommy, beschmi, kowey, simonmar, simonpj, catamorphism
    messages: + msg1398
2007-01-11 16:36:09  jch  set  nosy: + jch
    messages: + msg1400
    title: Unapplicable patch on case-insensitive file systems -> Darcs loop and darcs failure
2007-01-11 19:52:41  droundy  set  nosy: droundy, jch, tommy, beschmi, kowey, simonmar, simonpj, catamorphism
    messages: + msg1401
2007-01-12 15:54:20  simonmar  set  nosy: droundy, jch, tommy, beschmi, kowey, simonmar, simonpj, catamorphism
    messages: + msg1405
2007-03-06 07:52:39  kowey  set  nosy: droundy, jch, tommy, beschmi, kowey, simonmar, simonpj, catamorphism
    superseder: + case-insensitive filesystems confuse darcs
    messages: + msg1503
    title: Darcs loop and darcs failure -> getting many patches over http is slow
2007-03-06 11:01:21  tim  set  status: unknown -> has-patch
    nosy: + tim
    messages: + msg1507
2007-03-06 11:01:50  tim  set  nosy: droundy, jch, tommy, beschmi, kowey, simonmar, simonpj, catamorphism, tim
    assignedto: catamorphism -> tim
2007-03-06 16:38:48  droundy  set  nosy: droundy, jch, tommy, beschmi, kowey, simonmar, simonpj, catamorphism, tim
    messages: + msg1508
2007-03-23 20:14:48  tim  set  nosy: droundy, jch, tommy, beschmi, kowey, simonmar, simonpj, catamorphism, tim
    messages: + msg1551
2007-06-22 23:44:11  jch  set  nosy: droundy, jch, tommy, beschmi, kowey, simonmar, simonpj, catamorphism, tim
    messages: + msg1747
2007-07-16 21:24:16  kowey  set  topic: + Performance
2007-08-04 05:41:33  kowey  set  status: has-patch -> unknown
2008-01-07 04:46:48  markstos  set  status: unknown -> resolved-in-unstable
    nosy: + markstos
    messages: + msg2348
    title: getting many patches over http is slow -> getting many patches over http is slow (pipelining)
2008-01-10 17:32:36  droundy  set  messages: + msg2410
2008-01-10 18:24:24  markstos  set  status: resolved-in-unstable -> has-patch
    topic: + Darcs2
2008-01-11 15:51:05  jch  set  nosy: markstos, catamorphism, tim, droundy, simonmar, jch, simonpj, tommy, kowey, beschmi
    messages: + msg2431
2008-02-05 19:02:26  droundy  set  status: has-patch -> resolved-in-unstable
    nosy: droundy, jch, tommy, beschmi, kowey, markstos, simonmar, simonpj, catamorphism, tim
    messages: + msg3132
2008-02-05 19:03:09  droundy  set  nosy: droundy, jch, tommy, beschmi, kowey, markstos, simonmar, simonpj, catamorphism, tim
    messages: + msg3133
2008-02-05 19:15:49  markstos  set  nosy: droundy, jch, tommy, beschmi, kowey, markstos, simonmar, simonpj, catamorphism, tim
    messages: + msg3134
2008-02-06 15:44:11  droundy  set  nosy: droundy, jch, tommy, beschmi, kowey, markstos, simonmar, simonpj, catamorphism, tim
    messages: + msg3149
2008-09-04 21:30:06  admin  set  status: resolved-in-unstable -> resolved
    nosy: + dagit
2009-08-06 17:48:20  admin  set  nosy: + jast, Serware, dmitry.kurochkin, darcs-devel, zooko, mornfall, simon, thorkilnaur, - droundy, jch, simonmar, simonpj, catamorphism, tim
2009-08-06 20:44:05  admin  set  nosy: - beschmi
2009-08-10 22:00:11  admin  set  nosy: + catamorphism, tim, simonmar, jch, simonpj, - darcs-devel, zooko, jast, Serware, mornfall
2009-08-10 23:58:45  admin  set  nosy: - dagit
2009-08-25 18:00:09  admin  set  nosy: + darcs-devel, - simon
2009-08-27 14:05:55  admin  set  nosy: jch, tommy, kowey, markstos, darcs-devel, simonmar, simonpj, catamorphism, tim, thorkilnaur, dmitry.kurochkin
2009-10-23 22:37:59  admin  set  nosy: + marlowsd, - simonmar
2009-10-23 23:36:29  admin  set  nosy: + simonmar, - marlowsd
2009-10-24 00:42:50  admin  set  nosy: - catamorphism