darcs

Issue 987 fetching patches is way too slow

Title: fetching patches is way too slow
Priority: bug
Status: duplicate
Milestone: 2.10.0
Resolved in:
Superseder: packed storage combining many small files into fewer larger ones (issue1535)
Nosy List: Serware, darcs-devel, dmitry.kurochkin, ertai, galbolle, gwern, kowey, mornfall, noaddress, thorkilnaur, tommy, tux_rocker
Assigned To:
Topics: HTTP, Performance

Created on 2008-08-11.20:05:58 by kowey, last changed 2014-11-11.17:44:18 by gh.

Messages
msg5381 (view) Author: kowey Date: 2008-08-11.20:05:52
We still have a lot of anecdotal evidence that darcs fetching patches is slow,
slow, painfully slow.  For example, on my machine with darcs 2.0.2+
(--with-curl-pipelining)

darcs get http://allmydata.org/source/tahoe/trunk

I'm creating this bug mostly as a placeholder.

Things we need
 * more systematic/reproducible measurements (maybe we need something like a
patch-per-second figure)
 * a clearer idea of which conditions lead to slowness (we have
reports that .haskell.org is inherently slow; why exactly? And what about
non-haskell.org hosts?).  What kind of advice can we give to users to avoid slowness?

I'm not very clear on how to do this properly.  Thoughts, Dmitry?
msg5383 (view) Author: kowey Date: 2008-08-11.22:12:04
for info:

darcs get http://allmydata.org/source/tahoe/trunk --timings --debug  29.88s user
19.83s system 2% cpu 36:26.97 total

MacOS 10.5, --with-curl-pipelining
msg5384 (view) Author: gwern Date: 2008-08-11.22:17:07
I did a darcs get of tahoe as well. My time was much slower; I enabled curl
pipelining and reinstalled, but I find myself wondering whether I actually did -
97 minutes is a lot longer than kowey's 36.

$ http_proxy="" HTTP_PROXY="" =darcs get +RTS -p -RTS   24.71s user 2.29s system
0% cpu 1:37:25.73 total


Attached is profiling output, although I suspect it does not show us time wasted
blocking on network activity.

re: haskell.org:
18:08:33 < dons> btw, darcs.haskell.org isn't throttled
18:08:38 < dons> it was
msg5386 (view) Author: dmitry.kurochkin Date: 2008-08-11.22:54:30
There are several possible issues with pipelining:
- Is the server HTTP 1.1 or 1.0? Pipelining will work for 1.1 only.
- If there are proxies on the way, the situation gets more complicated.
- What is the repository format? I believe pipelining gives good results only
for not hashed repos...

To see if pipelining is actually working I recommend wireshark.

In general, when darcs downloads thousands of small files, there is a big
overhead for transferring HTTP headers. I think it should be much faster to
download a single file. But I do not think we can actually do anything about
this without changing repo format or installing something on server side.
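
To put rough numbers on the per-file overhead described above, here is a back-of-envelope sketch. It is a deliberately simplified model (no pipelining, no TCP slow start, no server processing time), and every figure in it (round-trip time, header size, patch size, bandwidth) is an assumption chosen for illustration, not a measurement from this thread.

```python
# Simplified cost model; all numbers used below are illustrative assumptions.

def sequential_seconds(n_files, rtt_s, header_bytes, body_bytes, bandwidth_bps):
    """One request at a time: every file pays a round trip plus its own headers."""
    per_file_bytes = header_bytes + body_bytes
    return n_files * (rtt_s + per_file_bytes * 8 / bandwidth_bps)


def single_file_seconds(n_files, rtt_s, header_bytes, body_bytes, bandwidth_bps):
    """One big file: a single round trip and a single set of headers."""
    total_bytes = header_bytes + n_files * body_bytes
    return rtt_s + total_bytes * 8 / bandwidth_bps


if __name__ == "__main__":
    # Assumed: 2000 small files, 50 ms RTT, 500 B of headers, 300 B per file, 6 Mbit/s.
    args = (2000, 0.05, 500, 300, 6e6)
    print(f"sequential:  {sequential_seconds(*args):.1f} s")
    print(f"single file: {single_file_seconds(*args):.2f} s")
```

In this model the round trip dominates as soon as files are small, which is why fetching one file should be much faster than fetching thousands.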

Regards,
  Dmitry
msg5393 (view) Author: kowey Date: 2008-08-12.09:22:13
Dmitry, that gives a bit of extra insight, thanks!

Could you post a simple recipe for people to check if the server supports HTTP
1.1?  I tried to telnet darcs.haskell.org 80 and GET /ghc/_darcs/inventory
but that did not produce the intended effect.

As for proxies, I'll note randomly that when I try to git clone from university
(behind a proxy), it is also very slow (never finishes, actually)

Finally, surely there must be a way we could bundle patches together into a
giant tarball (based on the contents of _darcs/inventories).  It could be a
third party tool that does it, and a future darcs could just check to see if
those tarballs exist, preferring them to downloading individual patches.  Seems
pretty non-invasive?
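
A minimal sketch of what such a third-party bundling tool could look like, assuming the patches are plain files in a single directory. The function name `bundle_patches` and the paths in the usage line are hypothetical; this is not how darcs itself does (or later did) packing.

```python
import tarfile
from pathlib import Path


def bundle_patches(patch_dir, out_path):
    """Pack every regular file in patch_dir into one gzip-compressed tarball.

    out_path should live outside patch_dir, otherwise the tarball would
    try to include itself. Returns the list of member names written.
    """
    patch_dir = Path(patch_dir)
    names = []
    with tarfile.open(str(out_path), "w:gz") as tar:
        for path in sorted(patch_dir.iterdir()):
            if path.is_file():
                # Store files under their bare name so the bundle is relocatable.
                tar.add(str(path), arcname=path.name)
                names.append(path.name)
    return names


# Hypothetical usage (paths are assumptions):
#   bundle_patches("_darcs/patches", "/tmp/patches.tar.gz")
```

A client could then fetch the one tarball and fall back to per-patch downloads if it is missing.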
msg5397 (view) Author: mornfall Date: 2008-08-12.10:16:06
12:00:35 | xroc@ann:~/dev/public/_test -> ../testget.sh
grabbing tarball... done: real 0m0.036s
grabbing darcs... done: real 0m17.451s
grabbing darcs (cached)... done: real 0m2.685s
grabbing old-style darcs... done: real 0m16.072s

To try for yourself: wget http://repos.mornfall.net/testget.sh
The repository has 2000 files and 2000 patches, all very tiny. The above is
running from localhost, so very high bandwidth and low latency. Over a consumer-

12:08:47 | morn@eri:~/tmp -> bash ./testget.sh                                     
grabbing tarball... done: real 0m1.393s
grabbing darcs... done: real 4m7.057s
grabbing darcs (cached)... done: real 0m2.914s
grabbing old-style darcs... done: real 1m47.846s

The server is running apache 2.2 so I believe it is serving HTTP 1.1. The darcs
used is the one built for Debian (2.0.2). Note that the tarball above is given
the benefit of not unpacking -- but then, git doesn't unpack either (although the
always-packed format for git is not sustainable for http either, since pulls get
pretty expensive that way).
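
The patch-per-second figure suggested earlier in this thread can be derived from timings like the ones above. The sketch below is only illustrative: it treats the whole wall-clock time of a get as patch-fetching time, which is an assumption, and the helper names are made up.

```python
import re


def parse_real(text):
    """Convert a shell `time` value such as '4m7.057s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text)
    if match is None:
        raise ValueError(f"unrecognised time format: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def patches_per_second(n_patches, real):
    """Patches fetched per second of wall-clock time."""
    return n_patches / parse_real(real)


if __name__ == "__main__":
    # 2000 patches, using the "grabbing darcs" timings reported above.
    for label, real in [("localhost", "0m17.451s"), ("cable", "4m7.057s")]:
        print(f"{label}: {patches_per_second(2000, real):.1f} patches/s")
```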
msg5398 (view) Author: mornfall Date: 2008-08-12.10:18:00
(Is it just me, or does roundup truncate lines semi-randomly? "Over a consumer-grade cable
connection, 6Mb I think" was the truncated sentence.)
msg5425 (view) Author: gwern Date: 2008-08-12.17:57:38
> Finally, surely there must be a way we could bundle patches together into a
giant tarball (based on the contents of _darcs/inventories).  It could be a
third party tool that does it, and a future darcs could just check to see if
those tarballs exist, preferring them to downloading individual patches.  Seems
pretty non-invasive?

kowey, would it be possible to repurpose the checkpointing functionality? That
seems to be close to what we want here.
msg5431 (view) Author: kowey Date: 2008-08-12.18:15:52
Just jotting down some thoughts.  Currently, a checkpoint is (if I understand
correctly) basically the composition of several patches.  You lose everything
that happens in the middle (yay, compact).  Gwern, if I understand correctly,
you are saying that in addition to creating these, the checkpoint command could
also create the glorified patch bundles.

Maybe.  A good experiment actually would be to use the darcs send command
creating a gigantic bundle over an empty repository, gzip the results, wget
them, and darcs apply.  If wget+darcs apply is fast, we may have a cheap and
dirty solution to the problem of making darcs get/push faster over networks.
msg5432 (view) Author: kowey Date: 2008-08-12.18:17:29
... and yes, I meant darcs get/put not get/push
msg5510 (view) Author: dmitry.kurochkin Date: 2008-08-14.17:24:01
How to see if pipelining is actually working:

1. Run wireshark, set capture filter to something like 'host darcs.net'.
2. Run darcs get or another command that uses curl/libwww.

You should see many HTTP packets in wireshark. Select one of them, right click,
follow tcp stream. A new window opens with the HTTP traffic. Now there are a few
possibilities:
- there is a single HTTP transaction - like HTTP GET/200 OK. Then tcp connection
is closed. This means that HTTP is not persistent and for each file a new tcp
connection is opened. This option is the slowest.
- there are many HTTP transactions, but they go one after another. Like:
  * > GET
  * < 200 OK
  * > GET
  * < 200 OK
  This means that HTTP connection is persistent but pipelining is not used.
Faster than option 1, should work in most cases.
- if there are many transactions and requests go one after another before
responses arrive, like:
  * > GET 1
  * > GET 2
  * < 200 OK 1
  * < 200 OK 2
  You are lucky :) Pipelining works. The fastest possible option.
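
The three cases above can be written down as a small classifier. This is only a sketch of the reasoning, assuming you have already reduced one TCP stream to an ordered list of ">" (request sent) and "<" (response received) markers by hand; it does not read capture files, and the function name is made up.

```python
def classify_stream(events):
    """Classify one TCP stream from its HTTP messages in capture order.

    events is a list of ">" (request sent) and "<" (response received).
    """
    if events.count(">") <= 1:
        # A single transaction, after which the connection is closed.
        return "non-persistent"
    outstanding = 0
    for event in events:
        if event == ">":
            outstanding += 1
            if outstanding > 1:
                # A second request left before the first response arrived.
                return "pipelined"
        elif event == "<":
            outstanding -= 1
    # Several transactions, but strictly one after another.
    return "persistent"


if __name__ == "__main__":
    print(classify_stream([">", "<"]))
    print(classify_stream([">", "<", ">", "<"]))
    print(classify_stream([">", ">", "<", "<"]))
```

Note that a stream with a single transaction may also just mean that only one file was requested, so look at a stream from the patch-download phase.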

Not sure if the above is a simple recipe but that is what I do. If you just need
to know if server supports HTTP 1.1 just look at HTTP request and response -
version is in the first line. You can copy request lines and use telnet to issue
request by hand.
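
As a rough illustration of issuing the request by hand, the sketch below builds the text of a minimal HTTP/1.1 request and reads the version off the first response line. The host and path are the ones tried earlier in this thread; the helper names are made up. One possible reason a bare "GET /path" typed into telnet shows nothing useful is that it lacks the HTTP version and the Host header, but that is a guess.

```python
def build_request(host, path):
    """Return the bytes of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",            # mandatory in HTTP/1.1
        "Connection: keep-alive",
        "",                         # blank line ends the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def http_version(status_line):
    """Extract the version from a status line such as 'HTTP/1.1 200 OK'."""
    proto = status_line.strip().partition(" ")[0]
    if not proto.startswith("HTTP/"):
        raise ValueError(f"not an HTTP status line: {status_line!r}")
    return proto[len("HTTP/"):]


if __name__ == "__main__":
    print(build_request("darcs.haskell.org", "/ghc/_darcs/inventory").decode("ascii"))
    print(http_version("HTTP/1.1 200 OK"))
```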

Note that whether pipelining is used depends on how darcs requests files. If
darcs waits for the first file before requesting another, there would be no
pipelining obviously. So when looking at the HTTP stream scroll down to patch
downloads - you will not see pipelining near the beginning.

I guess this is not the best description. But it should become clear when you
see it yourself :)

Regards,
  Dmitry
msg5548 (view) Author: gwern Date: 2008-08-15.19:59:53
Dmitry: according to the HTTP RFC http://www.faqs.org/rfcs/rfc2616.html we are
allowed to have up to two simultaneous connections to a server. How hard would
it be to do? It seems to me that this could be another potential speed boost:
even if it has no effect in the case where pipelining works (would it?), it
would still help when pipelining isn't working, I think.
msg5611 (view) Author: dmitry.kurochkin Date: 2008-08-19.20:49:01
I do not think using two connections is worth it. AFAIK most HTTP servers are 1.1
and support pipelining nowadays. And using multiple connections is discouraged;
two simultaneous connections are intended for situations where we have a big
file to download and need to do small requests at the same time.
msg6916 (view) Author: mornfall Date: 2008-12-28.11:36:18
It might make sense to add some sort of network functionality to darcs-
benchmark. I have a shellscript to do some network benchmarking for now. Will 
publish later.
msg7700 (view) Author: kowey Date: 2009-04-14.20:50:37
Hi Dmitry,

Is there a way to tell if pipelining is working, using only command-line tools?
 My ideal scenario is that we be able to tell people "copy and paste this to
your terminal and then run darcs get here"... is such a thing possible?

Thanks!
msg7737 (view) Author: dmitry.kurochkin Date: 2009-04-22.10:28:01
Hi Eric.

I do not know of any such tool. The best I can think of is using netcat to make
an HTTP request and check if the server is HTTP/1.1 and uses a persistent connection.
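
Once the response headers are in hand (from netcat or anything else), the persistence check itself is mechanical. A minimal sketch, assuming the headers have already been parsed into a dict with lower-case keys; the function name is hypothetical. It encodes the usual HTTP rules: a 1.1 connection is persistent unless the server says "Connection: close", while a 1.0 connection is persistent only with an explicit "Connection: keep-alive".

```python
def is_persistent(version, headers):
    """Decide whether a response leaves the connection open.

    version is a string such as "1.1"; headers maps lower-case
    header names to their values.
    """
    raw = headers.get("connection", "")
    tokens = {token.strip().lower() for token in raw.split(",")}
    if "close" in tokens:
        return False
    if version == "1.1":
        return True                  # persistent by default in HTTP/1.1
    return "keep-alive" in tokens    # HTTP/1.0 needs an explicit opt-in


if __name__ == "__main__":
    print(is_persistent("1.1", {}))
    print(is_persistent("1.1", {"connection": "close"}))
    print(is_persistent("1.0", {"connection": "Keep-Alive"}))
```

A persistent connection is necessary for pipelining but does not prove that pipelining is actually being used.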

Regards,
  Dmitry
msg7738 (view) Author: kowey Date: 2009-04-22.13:03:21
On Wed, Apr 22, 2009 at 10:28:04 -0000, Dmitry Kurochkin wrote:
> I do not know any such tool. The best I can think of is using netcat to make
> HTTP request and check is server is HTTP/1.1 and uses persistent connection.

I've noticed people using this thing called tcpdump.
Is there anything we could do with that?
msg7739 (view) Author: dmitry.kurochkin Date: 2009-04-22.13:58:01
On Wed, Apr 22, 2009 at 5:03 PM, Eric Kow <bugs@darcs.net> wrote:
>
> Eric Kow <kowey@darcs.net> added the comment:
>
> On Wed, Apr 22, 2009 at 10:28:04 -0000, Dmitry Kurochkin wrote:
>> I do not know any such tool. The best I can think of is using netcat to make
>> HTTP request and check is server is HTTP/1.1 and uses persistent connection.
>
> I've noticed people using this thing called tcpdump.
> Is there anything we could do with that?

Yes, you can use tcpdump, or better tshark/wireshark to capture and
analyze traffic.

But these tools will not just tell you if pipelining is enabled. You
have to look at the packets and analyze them. I have described what this
should look like here: http://bugs.darcs.net/msg5510.

Regards,
  Dmitry

msg7740 (view) Author: dmitry.kurochkin Date: 2009-04-22.14:00:42
I believe the way to improve our get-over-HTTP performance is patch bundles.

IIRC someone was working on this and there was a darcs branch for it.
What is the status? I am interested in looking (and working) on it.

Regards,
  Dmitry
msg7741 (view) Author: kowey Date: 2009-04-22.14:28:50
On Wed, Apr 22, 2009 at 17:59:47 +0400, Dmitry Kurochkin wrote:
> I believe the way to improve our get over http performance are patch bundles.
> 
> IIRC someone was working on this and there was a darcs branch for it.
> What is the status? I am interested in looking (and working) on it.

Nicolas and Florent were last looking at this.
  http://wiki.darcs.net/index.html/PacksSpecification

Now that we have this hashed-storage work, there is also a question of
how we can make the two fit together.

Also, if I understand correctly, Nicolas has some newer, better ideas
on how to go about this.

Nicolas, could I ask you to comment?
msg7967 (view) Author: mornfall Date: 2009-07-15.13:38:18
Bumping to 2.4.
msg8137 (view) Author: kowey Date: 2009-08-14.15:07:09
Bumping to 2.5 and now thanks to Petr we have a clearer idea how to do it, so
I'm marking this need-implementation.

I'm tentatively assigning this to Petr, who is interested in pursuing this.

Nailing this one day would be good.  Fix that first impression of Darcs :-)
msg8505 (view) Author: kowey Date: 2009-08-26.13:10:58
OK, moving the packed stuff to its own ticket (issue1535)
msg11396 (view) Author: tux_rocker Date: 2010-06-13.19:44:11
Bumping to 2.6 as code freeze for 2.5 is approaching and I don't see any
activity here.
msg14754 (view) Author: markstos Date: 2011-10-13.12:58:55
How is this different from --packs? In any case, bumping to 2.10.
msg17770 (view) Author: gh Date: 2014-11-11.17:44:16
Closing it as "duplicate" of packs.
History
Date User Action Args
2008-08-11 20:05:58 kowey create
2008-08-11 22:12:08 kowey set status: unread -> unknown
nosy: tommy, beschmi, kowey, dagit, gwern, dmitry.kurochkin
messages: + msg5383
2008-08-11 22:17:10 gwern set nosy: tommy, beschmi, kowey, dagit, gwern, dmitry.kurochkin
messages: + msg5384
2008-08-11 22:54:33 dmitry.kurochkin set nosy: tommy, beschmi, kowey, dagit, gwern, dmitry.kurochkin
messages: + msg5386
2008-08-12 09:22:16 kowey set nosy: tommy, beschmi, kowey, dagit, gwern, dmitry.kurochkin
messages: + msg5393
2008-08-12 09:24:26 kowey link issue986 superseder
2008-08-12 10:16:10 mornfall set nosy: + mornfall
messages: + msg5397
2008-08-12 10:18:03 mornfall set nosy: tommy, beschmi, kowey, dagit, gwern, dmitry.kurochkin, mornfall
messages: + msg5398
2008-08-12 17:57:41 gwern set nosy: tommy, beschmi, kowey, dagit, gwern, dmitry.kurochkin, mornfall
messages: + msg5425
2008-08-12 18:15:55 kowey set nosy: tommy, beschmi, kowey, dagit, gwern, dmitry.kurochkin, mornfall
messages: + msg5431
2008-08-12 18:17:32 kowey set nosy: tommy, beschmi, kowey, dagit, gwern, dmitry.kurochkin, mornfall
messages: + msg5432
2008-08-14 17:24:04 dmitry.kurochkin set nosy: + darcs-devel
messages: + msg5510
2008-08-15 19:59:56 gwern set nosy: tommy, beschmi, kowey, darcs-devel, dagit, gwern, dmitry.kurochkin, mornfall
messages: + msg5548
2008-08-19 20:49:03 dmitry.kurochkin set nosy: tommy, beschmi, kowey, darcs-devel, dagit, gwern, dmitry.kurochkin, mornfall
messages: + msg5611
2008-08-28 12:18:26 kowey set topic: + HTTP
nosy: tommy, beschmi, kowey, darcs-devel, dagit, gwern, dmitry.kurochkin, mornfall
2008-08-28 12:18:49 kowey set topic: + Target-2.0
nosy: + Serware, droundy
2008-12-28 11:36:31 mornfall set topic: + Target-2.3, - Target-2.0
nosy: + simon, thorkilnaur
messages: + msg6916
2009-04-14 20:50:39 kowey set nosy: droundy, tommy, beschmi, kowey, darcs-devel, dagit, simon, thorkilnaur, gwern, dmitry.kurochkin, Serware, mornfall
messages: + msg7700
2009-04-15 16:47:13 droundy set nosy: - droundy
2009-04-22 10:28:04 dmitry.kurochkin set nosy: tommy, beschmi, kowey, darcs-devel, dagit, simon, thorkilnaur, gwern, dmitry.kurochkin, Serware, mornfall
messages: + msg7737
2009-04-22 13:03:26 kowey set nosy: tommy, beschmi, kowey, darcs-devel, dagit, simon, thorkilnaur, gwern, dmitry.kurochkin, Serware, mornfall
messages: + msg7738
2009-04-22 13:58:04 dmitry.kurochkin set nosy: + serware, noaddress
messages: + msg7739
2009-04-22 14:00:45 dmitry.kurochkin set nosy: tommy, beschmi, kowey, darcs-devel, dagit, simon, thorkilnaur, gwern, dmitry.kurochkin, serware, Serware, mornfall, noaddress
messages: + msg7740
2009-04-22 14:28:52 kowey set nosy: + galbolle, ertai
messages: + msg7741
2009-07-15 13:38:25 mornfall set topic: + Target-2.4, - Target-2.3
nosy: tommy, beschmi, kowey, darcs-devel, dagit, simon, thorkilnaur, gwern, ertai, dmitry.kurochkin, serware, Serware, mornfall, galbolle, noaddress
messages: + msg7967
2009-08-06 21:10:34 admin set nosy: - beschmi
2009-08-11 00:20:02 admin set nosy: - dagit
2009-08-14 15:07:18 kowey set status: unknown -> needs-implementation
nosy: tommy, kowey, darcs-devel, simon, thorkilnaur, gwern, ertai, dmitry.kurochkin, serware, Serware, mornfall, galbolle, noaddress
topic: + Target-2.5, - Target-2.4
messages: + msg8137
2009-08-25 17:37:16 admin set nosy: - simon
2009-08-26 13:11:00 kowey set priority: urgent -> bug
status: needs-implementation -> deferred
superseder: + packed storage combining many small files into fewer larger ones
messages: + msg8505
nosy: tommy, kowey, darcs-devel, thorkilnaur, gwern, ertai, dmitry.kurochkin, serware, Serware, mornfall, galbolle, noaddress
2009-08-27 14:32:53 admin set nosy: tommy, kowey, darcs-devel, thorkilnaur, gwern, ertai, dmitry.kurochkin, serware, Serware, mornfall, galbolle, noaddress
2009-10-23 22:40:16 admin set nosy: + nicolas.pouillard, - ertai
2009-10-23 22:44:30 admin set nosy: - Serware
2009-10-23 23:28:07 admin set nosy: + Serware, - serware
2009-10-24 00:05:10 admin set nosy: + ertai, - nicolas.pouillard
2010-06-13 19:44:13 tux_rocker set topic: + Target-2.6, - Target-2.5
nosy: + tux_rocker
messages: + msg11396
2010-06-15 21:07:55 admin set topic: - Target-2.6
2010-06-15 21:07:55 admin set milestone: 2.8.0
2011-10-13 12:58:56 markstos set messages: + msg14754
milestone: 2.8.0 -> 2.10.0
2014-11-11 17:44:18 gh set status: deferred -> duplicate
messages: + msg17770