Created on 2008-08-11.20:05:58 by kowey, last changed 2014-11-11.17:44:18 by gh.
msg5381 (view) |
Author: kowey |
Date: 2008-08-11.20:05:52 |
|
We still have a lot of anecdotal evidence that darcs fetching patches is slow,
slow, painfully slow. For example, on my machine with darcs 2.0.2+
(--with-curl-pipelining)
darcs get http://allmydata.org/source/tahoe/trunk
I'm creating this bug mostly as a placeholder.
Things we need:
* more systematic/reproducible measurements (maybe we need something like a
patch-per-second figure)
* a clearer idea of the conditions that lead to slowness (we have
reports that .haskell.org is inherently slow; why exactly? And what about
non-haskell.org servers?). What kind of advice can we give to users to avoid slowness?
I'm not very clear on how to do this properly. Thoughts, Dmitry?
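One possible way to get a patch-per-second figure is to time a command and divide a known patch count by the elapsed wall-clock time. The Python sketch below only illustrates that idea and is not part of darcs: the helper names `time_command` and `patches_per_second` are made up for this example, and the patch count has to be supplied by whoever runs the measurement.

```python
import subprocess
import time


def time_command(argv):
    """Run argv to completion and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(argv, check=True)
    return time.perf_counter() - start


def patches_per_second(n_patches, elapsed_seconds):
    """Throughput figure: patches fetched per second of wall-clock time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return n_patches / elapsed_seconds
```

For example, `patches_per_second(n, time_command(["darcs", "get", url]))` would give one data point for a repository known to hold `n` patches. Wall-clock time is used on purpose, since the reports here suggest most of the time is spent waiting on the network rather than on the CPU.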
|
msg5383 (view) |
Author: kowey |
Date: 2008-08-11.22:12:04 |
|
for info:
darcs get http://allmydata.org/source/tahoe/trunk --timings --debug
29.88s user 19.83s system 2% cpu 36:26.97 total
MacOS 10.5, --with-curl-pipelining
|
msg5384 (view) |
Author: gwern |
Date: 2008-08-11.22:17:07 |
|
I did a darcs get of tahoe as well. My time was much slower; I enabled curl
pipelining and reinstalled, but I find myself wondering whether I actually did -
97 minutes is a lot longer than kowey's 36.
$ http_proxy="" HTTP_PROXY="" =darcs get +RTS -p -RTS
24.71s user 2.29s system 0% cpu 1:37:25.73 total
Attached is profiling output, although I suspect it does not show us time wasted
blocking on network activity.
re: haskell.org:
:08:33 < dons> btw, darcs.haskell.org isn't throttled
18:08:38 < dons> it was
|
msg5386 (view) |
Author: dmitry.kurochkin |
Date: 2008-08-11.22:54:30 |
|
There are several possible issues with pipelining:
- Is the server HTTP 1.1 or 1.0? Pipelining works only with 1.1.
- If there are proxies on the way, the situation gets more complicated.
- What is the repository format? I believe pipelining gives good results only
for non-hashed repos...
To see if pipelining is actually working, I recommend wireshark.
In general, when darcs downloads thousands of small files, there is a big
overhead for transferring HTTP headers. I think it should be much faster to
download a single file. But I do not think we can actually do anything about
this without changing the repo format or installing something on the server side.
Regards,
Dmitry
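To make the per-file overhead argument concrete, here is a very rough back-of-the-envelope model in Python. It is a sketch under loud assumptions, not a measurement: it assumes every request costs one round trip, a fresh TCP connection costs one extra round trip, pipelining hides all round trips but the first, and the default sizes and bandwidth are invented values. The function name `estimate_seconds` is hypothetical.

```python
def estimate_seconds(n_files, rtt, mode, bytes_per_file=500,
                     header_bytes=400, bandwidth=750_000):
    """Very rough wall-clock estimate for fetching n_files over HTTP.

    mode is one of:
      "new-connection" - a fresh TCP connection per file (handshake + request)
      "persistent"     - one connection, one request at a time
      "pipelined"      - one connection, requests sent without waiting
    The default sizes (bytes) and bandwidth (bytes/s) are made-up values.
    """
    # Time spent actually moving bytes, headers included.
    transfer = n_files * (bytes_per_file + header_bytes) / bandwidth
    # Time spent waiting for round trips.
    if mode == "new-connection":
        waiting = n_files * 2 * rtt
    elif mode == "persistent":
        waiting = n_files * rtt
    elif mode == "pipelined":
        waiting = rtt
    else:
        raise ValueError(f"unknown mode: {mode}")
    return waiting + transfer
```

Under this model, round-trip waiting dominates for thousands of tiny files unless pipelining works, and the header bytes are paid once per file regardless of mode, which is the part a single large download would avoid.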
|
msg5393 (view) |
Author: kowey |
Date: 2008-08-12.09:22:13 |
|
Dmitry, that gives a bit of extra insight, thanks!
Could you post a simple recipe for people to check if the server supports HTTP
1.1? I tried to telnet darcs.haskell.org 80 and GET /ghc/_darcs/inventory
but that did not produce the intended effect.
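A possible reason the telnet attempt misbehaved is that a bare `GET /path` with no version is an old-style request; an HTTP/1.1 request needs the version on the request line and a `Host` header. The Python sketch below sends such a request and reports the version from the server's status line. It is only an illustration: the function names are made up, and the version a server answers with is a hint about what it supports rather than proof that pipelining works.

```python
import socket


def build_request(host, path):
    """Build a minimal HTTP/1.1 HEAD request; HTTP/1.1 requires a Host header."""
    return (f"HEAD {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            "Connection: close\r\n"
            "\r\n").encode("ascii")


def parse_status_line(line):
    """Split a status line such as 'HTTP/1.1 200 OK' into (version, status code)."""
    version, code = line.split(" ", 2)[:2]
    if not version.startswith("HTTP/"):
        raise ValueError(f"not an HTTP status line: {line!r}")
    return version[len("HTTP/"):], int(code)


def server_http_version(host, path="/", port=80, timeout=10):
    """Send one request and return the version string the server answers with."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(host, path))
        reply = sock.recv(4096).decode("latin-1")
    return parse_status_line(reply.split("\r\n", 1)[0])[0]
```

With the host and path from the attempt above, the call would be `server_http_version("darcs.haskell.org", "/ghc/_darcs/inventory")`.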
As for proxies, I'll note randomly that when I try to git clone from university
(behind a proxy), it is also very slow (never finishes, actually)
Finally, surely there must be a way we could bundle patches together into a
giant tarball (based on the contents of _darcs/inventories). It could be a
third party tool that does it, and a future darcs could just check to see if
those tarballs exist, preferring them to downloading individual patches. Seems
pretty non-invasive?
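A minimal sketch of what such a third-party bundling tool could look like, in Python. It is deliberately naive and rests on assumptions: it packs every regular file found directly in a directory (for instance `_darcs/patches`) instead of walking `_darcs/inventories` as suggested above, and the function name `bundle_patches` is invented for this example.

```python
import os
import tarfile


def bundle_patches(patch_dir, tarball_path):
    """Pack every regular file directly inside patch_dir into one gzipped tarball.

    Returns the sorted list of file names that were added.
    """
    names = sorted(
        name for name in os.listdir(patch_dir)
        if os.path.isfile(os.path.join(patch_dir, name))
    )
    with tarfile.open(tarball_path, "w:gz") as tar:
        for name in names:
            # Store files under their bare names so the tarball is relocatable.
            tar.add(os.path.join(patch_dir, name), arcname=name)
    return names
```

A client would then need a single HTTP request for the tarball, falling back to per-patch downloads when no tarball is present on the server.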
|
msg5397 (view) |
Author: mornfall |
Date: 2008-08-12.10:16:06 |
|
12:00:35 | xroc@ann:~/dev/public/_test -> ../testget.sh
grabbing tarball... done: real 0m0.036s
grabbing darcs... done: real 0m17.451s
grabbing darcs (cached)... done: real 0m2.685s
grabbing old-style darcs... done: real 0m16.072s
To try for yourself: wget http://repos.mornfall.net/testget.sh
The repository has 2000 files and 2000 patches, all very tiny. The above is
running from localhost, so very high bandwidth and low latency. Over a
consumer-grade cable connection, 6Mb I think:
12:08:47 | morn@eri:~/tmp -> bash ./testget.sh
grabbing tarball... done: real 0m1.393s
grabbing darcs... done: real 4m7.057s
grabbing darcs (cached)... done: real 0m2.914s
grabbing old-style darcs... done: real 1m47.846s
The server is running apache 2.2 so I believe it is serving HTTP 1.1. The darcs
used is the one built for Debian (2.0.2). Note that the tarball above is given
the benefit of not unpacking -- but then, git doesn't unpack either (although the
always-packed format for git is not sustainable for http either, since pulls get
pretty expensive that way).
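To put the remote numbers on a common scale, the sketch below converts the `real` timings quoted above into seconds and derives a per-second rate and a slowdown relative to the tarball. Only the four timings and the 2000-patch count come from this report; the helper `parse_real` and the layout of the output are made up for illustration.

```python
import re


def parse_real(text):
    """Convert a `time`-style duration such as '4m7.057s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text)
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# Remote run from the report above; the repository holds 2000 patches.
N_PATCHES = 2000
remote = {
    "tarball": parse_real("0m1.393s"),
    "darcs": parse_real("4m7.057s"),
    "darcs (cached)": parse_real("0m2.914s"),
    "old-style darcs": parse_real("1m47.846s"),
}

for label, seconds in remote.items():
    rate = N_PATCHES / seconds              # patches (or equivalent) per second
    slowdown = seconds / remote["tarball"]  # how many times slower than the tarball
    print(f"{label:16s} {seconds:8.3f}s  {rate:8.1f} patches/s  {slowdown:6.1f}x tarball")
```

The plain "grabbing darcs" run works out to roughly 8 patches per second over this connection.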
|
msg5398 (view) |
Author: mornfall |
Date: 2008-08-12.10:18:00 |
|
(Is it me or roundup truncates lines semi-randomly? "Over a consumer-grade cable
connection, 6Mb I think" was the truncated sentence.)
|
msg5425 (view) |
Author: gwern |
Date: 2008-08-12.17:57:38 |
|
> Finally, surely there must be a way we could bundle patches together into a
> giant tarball (based on the contents of _darcs/inventories). It could be a
> third party tool that does it, and a future darcs could just check to see if
> those tarballs exist, preferring them to downloading individual patches. Seems
> pretty non-invasive?
kowey, would it be possible to repurpose the checkpointing functionality? That
seems to be close to what we want here.
|
msg5431 (view) |
Author: kowey |
Date: 2008-08-12.18:15:52 |
|
Just jotting down some thoughts. Currently, a checkpoint is (if I understand
correctly) basically the composition of several patches. You lose everything
that happens in the middle (yay, compact). Gwern, if I understand correctly,
you are saying that in addition to creating these, the checkpoint command could
also create the glorified patch bundles.
Maybe. A good experiment actually would be to use the darcs send command
creating a gigantic bundle over an empty repository, gzip the results, wget
them, and darcs apply. If wget+darcs apply is fast, we may have a cheap and
dirty solution to the problem of making darcs get/push faster over networks.
|
msg5432 (view) |
Author: kowey |
Date: 2008-08-12.18:17:29 |
|
... and yes, I meant darcs get/put not get/push
|
msg5510 (view) |
Author: dmitry.kurochkin |
Date: 2008-08-14.17:24:01 |
|
How to see if pipelining is actually working:
1. Run wireshark and set the capture filter to something like 'host darcs.net'.
2. Run darcs get or another command that uses curl/libwww.
You should see many HTTP packets in wireshark. Select one of them, right-click,
and follow the TCP stream. A new window opens with the HTTP traffic. Now there are a few
possibilities:
- there is a single HTTP transaction, like HTTP GET/200 OK, and then the TCP connection
is closed. This means that HTTP is not persistent and a new TCP
connection is opened for each file. This option is the slowest.
- there are many HTTP transactions, but they go one after another. Like:
* > GET
* < 200 OK
* > GET
* < 200 OK
This means that HTTP connection is persistent but pipelining is not used.
Faster than option 1, should work in most cases.
- if there are many transactions and requests go one after another before
responses arrive, like:
* > GET 1
* > GET 2
* < 200 OK 1
* < 200 OK 2
You are lucky :) Pipelining works. The fastest possible option.
Not sure if the above is a simple recipe, but that is what I do. If you just need
to know if the server supports HTTP 1.1, just look at the HTTP request and response:
the version is in the first line. You can copy the request lines and use telnet to issue
the request by hand.
Note that whether pipelining is used depends on how darcs requests files. If
darcs waits for the first file before requesting another, there would be no
pipelining obviously. So when looking at the HTTP stream scroll down to patch
downloads - you will not see pipelining near the beginning.
I guess this is not the best description. But it should become clear when you
see it yourself :)
Regards,
Dmitry
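The three cases described in this recipe can be told apart mechanically from the order of requests and responses on one TCP stream. The Python sketch below assumes the stream has already been reduced by hand to a list of "req" and "resp" markers in the order they appear; the function name `classify_stream` and that input format are inventions for this example, not something wireshark produces directly.

```python
def classify_stream(events):
    """Classify one TCP stream from the order of its HTTP messages.

    events is a list of "req" / "resp" strings in the order they appear
    in the followed stream.
    """
    requests = events.count("req")
    if requests <= 1:
        return "non-persistent"      # one transaction, then the connection closes
    outstanding = 0                  # requests sent but not yet answered
    for event in events:
        if event == "req":
            outstanding += 1
            if outstanding > 1:
                return "pipelined"   # a second request left before the first reply
        elif event == "resp":
            outstanding -= 1
        else:
            raise ValueError(f"unknown event: {event!r}")
    return "persistent"              # many transactions, strictly one at a time
```

As noted in the recipe, the stream should be taken from the patch-download phase, since the beginning of a darcs get is not expected to show pipelining.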
|
msg5548 (view) |
Author: gwern |
Date: 2008-08-15.19:59:53 |
|
Dmitry: according to the HTTP RFC http://www.faqs.org/rfcs/rfc2616.html we are
allowed to have up to two simultaneous connections to a server. How hard would
it be to do? It seems to me that this could be another potential speed boost:
even if it has no effect in the case where pipelining works (would it?), it
would still help when pipelining isn't working, I think.
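For illustration, bounding a downloader to two requests in flight could look roughly like the Python sketch below. This is not how darcs fetches files; the names `fetch` and `fetch_all` are made up, and `urlopen` opens a new connection per request, so the sketch limits concurrency to two but does not reuse two persistent connections.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch(url):
    """Download one URL and return its body as bytes."""
    with urlopen(url, timeout=30) as response:
        return response.read()


def fetch_all(urls, fetch_one=fetch, max_connections=2):
    """Fetch urls with at most max_connections requests in flight.

    Results come back in the same order as urls.
    """
    with ThreadPoolExecutor(max_workers=max_connections) as pool:
        return list(pool.map(fetch_one, urls))
```

The `fetch_one` parameter exists so the scheduling can be tried with a stand-in function before pointing it at a real server.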
|
msg5611 (view) |
Author: dmitry.kurochkin |
Date: 2008-08-19.20:49:01 |
|
I do not think using two connections is worth it. AFAIK most HTTP servers are 1.1
and support pipelining nowadays. And using multiple connections is discouraged;
two simultaneous connections are intended for situations where we have a big
file to download and need to do small requests at the same time.
|
msg6916 (view) |
Author: mornfall |
Date: 2008-12-28.11:36:18 |
|
It might make sense to add some sort of network functionality to
darcs-benchmark. I have a shell script to do some network benchmarking for now. Will
publish later.
|
msg7700 (view) |
Author: kowey |
Date: 2009-04-14.20:50:37 |
|
Hi Dmitry,
Is there a way to tell if pipelining is working, using only command-line tools?
My ideal scenario is that we be able to tell people "copy and paste this to
your terminal and then run darcs get here"... is such a thing possible?
Thanks!
|
msg7737 (view) |
Author: dmitry.kurochkin |
Date: 2009-04-22.10:28:01 |
|
Hi Eric.
I do not know any such tool. The best I can think of is using netcat to make
an HTTP request and check if the server is HTTP/1.1 and uses persistent connections.
Regards,
Dmitry
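Once a raw response has been captured (for example by pasting netcat's output), the persistence check can be done on the response head alone. The Python sketch below applies the usual rule that HTTP/1.1 connections are persistent unless the server sends `Connection: close`, while HTTP/1.0 ones are persistent only with `Connection: keep-alive`. The function name `is_persistent` is made up, and the answer is only meaningful if the request itself did not ask for the connection to be closed.

```python
def is_persistent(raw_response_head):
    """Guess from a response head whether the server keeps the connection open.

    HTTP/1.1 connections are persistent unless the server sends
    'Connection: close'; HTTP/1.0 ones only if it sends 'Connection: keep-alive'.
    """
    lines = raw_response_head.split("\r\n")
    version = lines[0].split(" ", 1)[0]          # e.g. "HTTP/1.1"
    connection = ""
    for line in lines[1:]:
        if not line:
            break                                 # blank line ends the headers
        name, _, value = line.partition(":")
        if name.strip().lower() == "connection":
            connection = value.strip().lower()
    if version == "HTTP/1.1":
        return "close" not in connection
    return "keep-alive" in connection
```

This says nothing about pipelining itself, which still has to be observed in the traffic.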
|
msg7738 (view) |
Author: kowey |
Date: 2009-04-22.13:03:21 |
|
On Wed, Apr 22, 2009 at 10:28:04 -0000, Dmitry Kurochkin wrote:
> I do not know any such tool. The best I can think of is using netcat to make
> HTTP request and check is server is HTTP/1.1 and uses persistent connection.
I've noticed people using this thing called tcpdump.
Is there anything we could do with that?
|
msg7739 (view) |
Author: dmitry.kurochkin |
Date: 2009-04-22.13:58:01 |
|
On Wed, Apr 22, 2009 at 5:03 PM, Eric Kow <bugs@darcs.net> wrote:
>
> Eric Kow <kowey@darcs.net> added the comment:
>
> On Wed, Apr 22, 2009 at 10:28:04 -0000, Dmitry Kurochkin wrote:
>> I do not know any such tool. The best I can think of is using netcat to make
>> HTTP request and check is server is HTTP/1.1 and uses persistent connection.
>
> I've noticed people using this thing called tcpdump.
> Is there anything we could do with that?
Yes, you can use tcpdump, or better tshark/wireshark, to capture and
analyze traffic.
But these tools will not just tell you if pipelining is enabled. You
have to look at the packets and analyze them. I have described what this
should look like here: http://bugs.darcs.net/msg5510.
Regards,
Dmitry
|
msg7740 (view) |
Author: dmitry.kurochkin |
Date: 2009-04-22.14:00:42 |
|
I believe the way to improve our get-over-HTTP performance is patch bundles.
IIRC someone was working on this and there was a darcs branch for it.
What is the status? I am interested in looking at (and working on) it.
Regards,
Dmitry
|
msg7741 (view) |
Author: kowey |
Date: 2009-04-22.14:28:50 |
|
On Wed, Apr 22, 2009 at 17:59:47 +0400, Dmitry Kurochkin wrote:
> I believe the way to improve our get over http performance are patch bundles.
>
> IIRC someone was working on this and there was a darcs branch for it.
> What is the status? I am interested in looking (and working) on it.
Nicolas and Florent were last looking at this.
http://wiki.darcs.net/index.html/PacksSpecification
Now that we have this hashed-storage work, there is also a question of
how we can make the two fit together.
Also, if I understand correctly, Nicolas has some newer, better ideas
on how to go about this.
Nicolas, could I ask you to comment?
|
msg7967 (view) |
Author: mornfall |
Date: 2009-07-15.13:38:18 |
|
Bumping to 2.4.
|
msg8137 (view) |
Author: kowey |
Date: 2009-08-14.15:07:09 |
|
Bumping to 2.5 and now thanks to Petr we have a clearer idea how to do it, so
I'm marking this need-implementation.
I'm tentatively assigning this to Petr, who is interested in pursuing this.
Nailing this one day would be good. Fix that first impression of Darcs :-)
|
msg8505 (view) |
Author: kowey |
Date: 2009-08-26.13:10:58 |
|
OK, moving the packed stuff to its own ticket (issue1535)
|
msg11396 (view) |
Author: tux_rocker |
Date: 2010-06-13.19:44:11 |
|
Bumping to 2.6 as code freeze for 2.5 is approaching and I don't see any
activity here.
|
msg14754 (view) |
Author: markstos |
Date: 2011-10-13.12:58:55 |
|
How is this different from --packs? In any case, bumping to 2.10.
|
msg17770 (view) |
Author: gh |
Date: 2014-11-11.17:44:16 |
|
Closing it as "duplicate" of packs.
|
|
Date | User | Action | Args |
2008-08-11 20:05:58 | kowey | create | |
2008-08-11 22:12:08 | kowey | set | status: unread -> unknown; nosy: tommy, beschmi, kowey, dagit, gwern, dmitry.kurochkin; messages: + msg5383 |
2008-08-11 22:17:10 | gwern | set | nosy: tommy, beschmi, kowey, dagit, gwern, dmitry.kurochkin; messages: + msg5384 |
2008-08-11 22:54:33 | dmitry.kurochkin | set | nosy: tommy, beschmi, kowey, dagit, gwern, dmitry.kurochkin; messages: + msg5386 |
2008-08-12 09:22:16 | kowey | set | nosy: tommy, beschmi, kowey, dagit, gwern, dmitry.kurochkin; messages: + msg5393 |
2008-08-12 09:24:26 | kowey | link | issue986 superseder |
2008-08-12 10:16:10 | mornfall | set | nosy: + mornfall; messages: + msg5397 |
2008-08-12 10:18:03 | mornfall | set | nosy: tommy, beschmi, kowey, dagit, gwern, dmitry.kurochkin, mornfall; messages: + msg5398 |
2008-08-12 17:57:41 | gwern | set | nosy: tommy, beschmi, kowey, dagit, gwern, dmitry.kurochkin, mornfall; messages: + msg5425 |
2008-08-12 18:15:55 | kowey | set | nosy: tommy, beschmi, kowey, dagit, gwern, dmitry.kurochkin, mornfall; messages: + msg5431 |
2008-08-12 18:17:32 | kowey | set | nosy: tommy, beschmi, kowey, dagit, gwern, dmitry.kurochkin, mornfall; messages: + msg5432 |
2008-08-14 17:24:04 | dmitry.kurochkin | set | nosy: + darcs-devel; messages: + msg5510 |
2008-08-15 19:59:56 | gwern | set | nosy: tommy, beschmi, kowey, darcs-devel, dagit, gwern, dmitry.kurochkin, mornfall; messages: + msg5548 |
2008-08-19 20:49:03 | dmitry.kurochkin | set | nosy: tommy, beschmi, kowey, darcs-devel, dagit, gwern, dmitry.kurochkin, mornfall; messages: + msg5611 |
2008-08-28 12:18:26 | kowey | set | topic: + HTTP; nosy: tommy, beschmi, kowey, darcs-devel, dagit, gwern, dmitry.kurochkin, mornfall |
2008-08-28 12:18:49 | kowey | set | topic: + Target-2.0; nosy: + Serware, droundy |
2008-12-28 11:36:31 | mornfall | set | topic: + Target-2.3, - Target-2.0; nosy: + simon, thorkilnaur; messages: + msg6916 |
2009-04-14 20:50:39 | kowey | set | nosy: droundy, tommy, beschmi, kowey, darcs-devel, dagit, simon, thorkilnaur, gwern, dmitry.kurochkin, Serware, mornfall; messages: + msg7700 |
2009-04-15 16:47:13 | droundy | set | nosy: - droundy |
2009-04-22 10:28:04 | dmitry.kurochkin | set | nosy: tommy, beschmi, kowey, darcs-devel, dagit, simon, thorkilnaur, gwern, dmitry.kurochkin, Serware, mornfall; messages: + msg7737 |
2009-04-22 13:03:26 | kowey | set | nosy: tommy, beschmi, kowey, darcs-devel, dagit, simon, thorkilnaur, gwern, dmitry.kurochkin, Serware, mornfall; messages: + msg7738 |
2009-04-22 13:58:04 | dmitry.kurochkin | set | nosy: + serware, noaddress; messages: + msg7739 |
2009-04-22 14:00:45 | dmitry.kurochkin | set | nosy: tommy, beschmi, kowey, darcs-devel, dagit, simon, thorkilnaur, gwern, dmitry.kurochkin, serware, Serware, mornfall, noaddress; messages: + msg7740 |
2009-04-22 14:28:52 | kowey | set | nosy: + galbolle, ertai; messages: + msg7741 |
2009-07-15 13:38:25 | mornfall | set | topic: + Target-2.4, - Target-2.3; nosy: tommy, beschmi, kowey, darcs-devel, dagit, simon, thorkilnaur, gwern, ertai, dmitry.kurochkin, serware, Serware, mornfall, galbolle, noaddress; messages: + msg7967 |
2009-08-06 21:10:34 | admin | set | nosy: - beschmi |
2009-08-11 00:20:02 | admin | set | nosy: - dagit |
2009-08-14 15:07:18 | kowey | set | status: unknown -> needs-implementation; nosy: tommy, kowey, darcs-devel, simon, thorkilnaur, gwern, ertai, dmitry.kurochkin, serware, Serware, mornfall, galbolle, noaddress; topic: + Target-2.5, - Target-2.4; messages: + msg8137 |
2009-08-25 17:37:16 | admin | set | nosy: - simon |
2009-08-26 13:11:00 | kowey | set | priority: urgent -> bug; status: needs-implementation -> deferred; superseder: + packed storage combining many small files into fewer larger ones; messages: + msg8505; nosy: tommy, kowey, darcs-devel, thorkilnaur, gwern, ertai, dmitry.kurochkin, serware, Serware, mornfall, galbolle, noaddress |
2009-08-27 14:32:53 | admin | set | nosy: tommy, kowey, darcs-devel, thorkilnaur, gwern, ertai, dmitry.kurochkin, serware, Serware, mornfall, galbolle, noaddress |
2009-10-23 22:40:16 | admin | set | nosy: + nicolas.pouillard, - ertai |
2009-10-23 22:44:30 | admin | set | nosy: - Serware |
2009-10-23 23:28:07 | admin | set | nosy: + Serware, - serware |
2009-10-24 00:05:10 | admin | set | nosy: + ertai, - nicolas.pouillard |
2010-06-13 19:44:13 | tux_rocker | set | topic: + Target-2.6, - Target-2.5; nosy: + tux_rocker; messages: + msg11396 |
2010-06-15 21:07:55 | admin | set | topic: - Target-2.6 |
2010-06-15 21:07:55 | admin | set | milestone: 2.8.0 |
2011-10-13 12:58:56 | markstos | set | messages: + msg14754; milestone: 2.8.0 -> 2.10.0 |
2014-11-11 17:44:18 | gh | set | status: deferred -> duplicate; messages: + msg17770 |
|