darcs

Issue 408 Slow Push and Pull with Remote Repository

Priority: wishlist
Status: resolved
Nosy List: bakert, darcs-devel, darcs-users, dbueno, dmitry.kurochkin, jch, kowey, thorkilnaur, tim, tommy

Created on 2007-02-07.23:01:25 by bakert, last changed 2009-10-24.00:42:55 by admin.

Messages
msg1466 Author: bakert Date: 2007-02-07.23:01:09
Hello,

I'm trying to solve a problem with a repository.  Each pull and push
takes about 3 minutes.

It is a large-ish repository (109MB total, _darcs folder is 61MB, 2370
changes according to "darcs changes").

I set up ssh/scp/sftp logging to see what it was doing, as described here:

http://wiki.darcs.net/index.html/DeveloperTips

It turns out that every time a pull or a push happens, hundreds of
commands like this are issued:

----------

scp -o ControlPath=/tmp//darcs-ssh/delroy@dev
delroy@dev:ersmithers_darcs_test/_darcs/patches/20061024182332-db8d4-6e9902a182892864b23efc86fa31c3352b7b9e4e.gz
/home/bakert/testdarcs/ersmithers_darcs_test/darcsHhasMC

----------

delroy@dev is where the repository I am push/pull-ing from/to is.

As each one takes up to a second this is why it is so slow.  Even if I
check out a new repository, make a one line change and then push it
still does hundreds of these.
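
A rough sanity check on these numbers, as a minimal sketch: if each patch is fetched by its own scp process and each call costs up to about a second, then roughly 180 fetches would account for a 3-minute pull. The function names and the sample log lines below are hypothetical; the real log format depends on how the ssh/scp logging was set up.

```python
def count_scp_calls(log_text: str) -> int:
    """Count logged commands that start with 'scp ' (assumes one command per line)."""
    return sum(1 for line in log_text.splitlines() if line.lstrip().startswith("scp "))


def estimate_transfer_seconds(fetches: int, seconds_per_fetch: float) -> float:
    """Estimated wall-clock time when every fetch is a separate scp process."""
    return fetches * seconds_per_fetch


# Hypothetical log excerpt; the patch file names are placeholders, not real ones.
SAMPLE_LOG = """\
scp delroy@dev:repo/_darcs/patches/patch-a.gz /tmp/a
ssh delroy@dev true
scp delroy@dev:repo/_darcs/patches/patch-b.gz /tmp/b
scp delroy@dev:repo/_darcs/patches/patch-c.gz /tmp/c
"""

if __name__ == "__main__":
    print(count_scp_calls(SAMPLE_LOG))           # 3
    print(estimate_transfer_seconds(180, 1.0))   # 180.0 seconds, i.e. about 3 minutes
```

If this estimate is in the right range, the cost is dominated by the number of calls rather than by the size of the data transferred, so reducing the number of fetched patches should matter more than speeding up each call.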

I've tried issuing "darcs checkpoint" on the remote copy to no avail.
Probably unrelatedly if I try and do "darcs optimize --reorder-patches
--checkpoint" on my local copy I get:

----------

darcs: bug in darcs!
fromJust error at DarcsRepo.lhs:525 compiled 06:01:36 Sep 13 2006

----------

I'm not a darcs expert by any means and I'm really seeking any advice.
 Is the scp stuff normal?  Is there anything I can do to stop it or
speed it up?

Please let me know if any other information would help diagnose the problem.

The output of darcs --exact-version is:

----------

darcs compiled on Sep 13 2006, at 06:04:11
# configured Fri Jun 16 14:55:21 EDT 2006
./configure --no-create --no-recursion

Context:

[TAG 1.0.8
Tommy Pettersson <ptp@lysator.liu.se>**20060616160213]

----------

Any help you can give greatly appreciated.

Thanks,

Tom
msg1467 Author: dbueno Date: 2007-02-07.23:19:26
On 2/7/07, Thomas David Baker <bakert@gmail.com> wrote:
> Hello,
>
> I'm trying to solve a problem with a repository.  Each pull and push
> takes about 3 minutes.
>
> It is a large-ish repository (109MB total, _darcs folder is 61MB, 2370
> changes according to "darcs changes").

I ran into a similar problem with a large repo. I put all my class
assignments in a version-controlled edu/ directory of about 384 MB.
After creating a darcs repo (both remote and local), issuing a darcs
pull took a long time (I didn't wait for it to finish) and used 1GB of
memory before I killed it off. I was running darcs on a Dual 2.5GHz
PowerMac G5. Using CVS or SVN for the same repo succeeded quickly and
without incident.

Is there some non-obvious reason why the time taken is so different
from CVS and SVN? Or, more generally, why is it so slow?

-Denis
msg1468 Author: droundy Date: 2007-02-07.23:25:24
On Wed, Feb 07, 2007 at 10:46:10PM +0000, Thomas David Baker wrote:
> Hello,

Hi.

...
> As each one takes up to a second this is why it is so slow.  Even if I
> check out a new repository, make a one line change and then push it
> still does hundreds of these.
> 
> I've tried issuing "darcs checkpoint" on the remote copy to no avail.

You probably need to create a tag.  Darcs uses tags to optimize the
treatment of old history.  Another possibility is that you've got some sort
of weird disjoint tags, and darcs optimize isn't able to figure out a good
way to order your patches.

> Probably unrelatedly if I try and do "darcs optimize --reorder-patches
> --checkpoint" on my local copy I get:
> 
> ----------
> 
> darcs: bug in darcs!
> fromJust error at DarcsRepo.lhs:525 compiled 06:01:36 Sep 13 2006

Have you sent in a bug report on this? This bug doesn't look familiar to
me, unless you've got a partial repository.  I don't have time to look into
this any time soon, but if you file a bug report, someone else may.
-- 
David Roundy
Department of Physics
Oregon State University
msg1469 Author: tim Date: 2007-02-07.23:59:57
Is the reason why it's so slow really because of lack of tags, or because ssh is
invoked as an external process? The latter is pretty expensive. For a while I've
been wanting to see if there's an appropriate OpenSSL library that could be used
to avoid invoking ssh/scp as external commands, with appropriate Haskell
bindings. When I have copious free time maybe I'll try it. Unless someone else
does it first, that is; I wouldn't complain ;-)
msg1470 Author: bakert Date: 2007-02-08.00:13:58
There are actually a LOT of tags in the system.  Perhaps that's the
problem in some way?  Every time we make a release (more than a
hundred times) that release is tagged through darcs.  Perhaps each tag
corresponds to an scp command and that's why I'm seeing so many???

As you can tell I'm not a darcs expert.  I figured that hundreds of
scp calls to push a single line change were not necessary and that I
need to eliminate those calls rather than speed them up.  I have no
real understanding of how darcs works, though, so perhaps I am being
daft?

Does it make sense for there to be hundreds of scp calls when pushing
or pulling from two  repositories with only a single patch
one-line-change different?  Or is that definitely incorrect?

Thanks,

Tom

On 08/02/07, Kirsten Chevalier <bugs@darcs.net> wrote:
>
> Kirsten Chevalier <catamorphism@gmail.com> added the comment:
>
> Is the reason why it's so slow really because of lack of tags, or because ssh is
> invoked as an external process? The latter is pretty expensive. For a while I've
> been wanting to see if there's an appropriate OpenSSL library that could be used
> to avoid invoking ssh/scp as external commands, with appropriate Haskell
> bindings. When I have copious free time maybe I'll try it. Unless someone else
> does it first, that is; I wouldn't complain ;-)
msg1471 Author: jch Date: 2007-02-08.18:03:44
Please make sure you regularly tag the remote repo.  If that doesn't
help, consider using ``darcs optimize --reorder-patches''.

                                        Juliusz
msg1472 Author: bakert Date: 2007-02-08.18:10:47
The remote repo has been tagged a lot, and recently.

I'll try "darcs optimize --reorder-patches".  Is that on the remote
repo, or locally, or both?

T

msg1473 Author: droundy Date: 2007-02-08.19:16:40
On Thu, Feb 08, 2007 at 12:13:07AM +0000, Thomas David Baker wrote:
> There are actually a LOT of tags in the system.  Perhaps that's the
> problem in some way?  Every time we make a release (more than a
> hundred times) that release is tagged through darcs.  Perhaps each tag
> corresponds to an scp command and that's why I'm seeing so many???
> 
> As you can tell I'm not a darcs expert.  I figured that hundreds of
> scp calls to push a single line change were not necessary and that I
> need to eliminate those calls rather than speed them up.  I have no
> real understanding of how darcs works, though, so perhaps I am being
> daft?
> 
> Does it make sense for there to be hundreds of scp calls when pushing
> or pulling from two  repositories with only a single patch
> one-line-change different?  Or is that definitely incorrect?

Try running

cat _darcs/inventory

on your remote and local repositories.  If either of them list hundreds of
patches, then yes, darcs will need to fetch those hundreds of patches.
-- 
David Roundy
Department of Physics
Oregon State University
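
The inventory check suggested above can be made a little more precise than a line count. The following is a minimal sketch that counts patch entries rather than lines, assuming each entry in `_darcs/inventory` has the same shape as the context entry shown in the `--exact-version` output earlier in this thread (a line starting with `[`, followed by an author and date line ending in `]`). The function name and the second sample entry are hypothetical.

```python
def count_inventory_entries(inventory_text: str) -> int:
    """Count patch entries, assuming each one begins with '[' at the start of a line."""
    return sum(1 for line in inventory_text.splitlines() if line.startswith("["))


# The first entry is copied from the thread; the second is a made-up placeholder.
SAMPLE_INVENTORY = """\
[TAG 1.0.8
Tommy Pettersson <ptp@lysator.liu.se>**20060616160213]
[hypothetical one-line change
someone@example.com**20070208000000]
"""

if __name__ == "__main__":
    # To check a real repository, read the file instead of the sample:
    #   with open("_darcs/inventory") as f: print(count_inventory_entries(f.read()))
    print(count_inventory_entries(SAMPLE_INVENTORY))  # 2
```

Running the same count against the local and the remote repository gives a quick indication of how many patches a push or pull may have to consider on each side.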
msg1474 Author: bakert Date: 2007-02-08.22:01:27
I tried

darcs optimize --reorder-patches

on the remote repository.

It's been running for about 2 hours now.  Is that normal?

On the local repo:

$ wc -l _darcs/inventory
4

But on the remote repo:

$ wc -l _darcs/inventory
1836

This is on two repositories that are theoretically identical (one is a
new "get" of the other, with a one line change recorded on the new
repo).

I guess I am showing my ignorance of how darcs works.  Is there any
way to make the inventory on the remote repo smaller?  I suppose
optimize --reorder-patches is the right thing to be trying?  But
should it take so long?

Thanks for all your suggestions everyone.  Any further advice you have
for me will be gratefully received.

Thanks again,

T

msg1475 Author: bakert Date: 2007-02-08.22:45:20
Hi all,

You will be glad to hear that:

$ darcs optimize --reorder-patches

took about 3 hours but has done the business.  A pull that took 3
minutes now takes 3 seconds.

Thanks so much for all your help, especially to Juliusz who offered
that specific suggestion.

Tom

msg1476 Author: droundy Date: 2007-02-08.23:04:44
On Thu, Feb 08, 2007 at 10:00:34PM +0000, Thomas David Baker wrote:
> I tried
> 
> darcs optimize --reorder-patches
> 
> on the remote repository.
> 
> It's been running for about 2 hours now.  Is that normal?
> 
> On the local repo:
> 
> $ wc -l _darcs/inventory
> 4
> 
> But on the remote repo:
> 
> $ wc -l _darcs/inventory
> 1836

A darcs optimize probably would have been much faster than an optimize
--reorder-patches, and probably would have had as good an effect.

Next time you run optimize --reorder, however, it will be far faster, since
it will be reordering fewer patches.
-- 
David Roundy
Department of Physics
Oregon State University
msg1479 Author: bakert Date: 2007-02-11.20:09:37
On 08/02/07, David Roundy <bugs@darcs.net> wrote:
> A darcs optimize probably would have been much faster than an optimize
> --reorder-patches, and probably would have had as good an effect.
>
> Next time you run optimize --reorder, however, it will be far faster, since
> it will be reordering fewer patches.

Just out of interest and to finish this one off, when fixing the real
repository (I was working on copies) optimize was very fast but did not fix
the problem.  With --reorder-patches it took about 3 hours again and
everything is now hunky dory.

Thanks for your help everyone.

T
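
One plausible reading of the outcome above (plain optimize was fast but did not help, while --reorder-patches did) is that the most recent tag did not cover every patch listed before it, so there was no point at which older history could be split off until the patches were reordered. The sketch below is a toy model of that idea only. It is an assumption about darcs' behaviour, not its implementation: real reordering works by commuting patches, which can fail, and the on-disk layout is not modelled. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Entry:
    name: str
    # Set only for tags: the names of the patches the tag depends on.
    covers: Optional[FrozenSet[str]] = None


def last_clean_tag_index(history: List[Entry]) -> Optional[int]:
    """Index of the latest tag that covers every entry before it, or None."""
    for i in range(len(history) - 1, -1, -1):
        entry = history[i]
        if entry.covers is not None:
            earlier = {e.name for e in history[:i]}
            if earlier <= entry.covers:
                return i
    return None


def recent_entries(history: List[Entry]) -> List[Entry]:
    """Entries that remain in the top-level inventory in this model."""
    i = last_clean_tag_index(history)
    return list(history) if i is None else history[i:]


def reorder(history: List[Entry]) -> List[Entry]:
    """Move entries not covered by the latest tag to just after it."""
    tag_positions = [i for i, e in enumerate(history) if e.covers is not None]
    if not tag_positions:
        return list(history)
    t = tag_positions[-1]
    tag = history[t]
    covered = [e for e in history[:t] if e.name in tag.covers]
    uncovered = [e for e in history[:t] if e.name not in tag.covers]
    return covered + [tag] + uncovered + history[t + 1:]


# Patch "x" sits before the tag but is not covered by it, so the tag is not clean.
HISTORY = [
    Entry("a"), Entry("b"), Entry("x"),
    Entry("TAG", frozenset({"a", "b"})),
    Entry("c"),
]

if __name__ == "__main__":
    print(len(recent_entries(HISTORY)))                     # 5
    print([e.name for e in recent_entries(reorder(HISTORY))])  # ['TAG', 'x', 'c']
```

In this model the top-level list shrinks only after reordering, which matches the drop in inventory size and transfer time reported in the thread, though the actual mechanism in darcs may differ in detail.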
History
Date  User  Action  Args
2007-02-07 23:01:25  bakert  create
2007-02-07 23:19:35  dbueno  set
    status: unread -> unknown
    nosy: + dbueno
    messages: + msg1467
2007-02-07 23:25:32  droundy  set
    nosy: darcs-users, droundy, tommy, beschmi, kowey, bakert, dbueno
    messages: + msg1468
2007-02-08 00:00:09  catamorphism  set
    nosy: + catamorphism
    messages: + msg1469
2007-02-08 00:14:10  bakert  set
    nosy: darcs-users, droundy, tommy, beschmi, kowey, catamorphism, bakert, dbueno
    messages: + msg1470
2007-02-08 18:03:51  jch  set
    nosy: + jch
    messages: + msg1471
2007-02-08 18:10:53  bakert  set
    nosy: darcs-users, droundy, jch, tommy, beschmi, kowey, catamorphism, bakert, dbueno
    messages: + msg1472
2007-02-08 19:16:47  droundy  set
    nosy: darcs-users, droundy, jch, tommy, beschmi, kowey, catamorphism, bakert, dbueno
    messages: + msg1473
2007-02-08 22:01:37  bakert  set
    nosy: darcs-users, droundy, jch, tommy, beschmi, kowey, catamorphism, bakert, dbueno
    messages: + msg1474
2007-02-08 22:45:24  bakert  set
    nosy: darcs-users, droundy, jch, tommy, beschmi, kowey, catamorphism, bakert, dbueno
    messages: + msg1475
2007-02-08 23:04:54  droundy  set
    nosy: darcs-users, droundy, jch, tommy, beschmi, kowey, catamorphism, bakert, dbueno
    messages: + msg1476
2007-02-11 20:09:51  bakert  set
    files: + unnamed
    nosy: darcs-users, droundy, jch, tommy, beschmi, kowey, catamorphism, bakert, dbueno
    messages: + msg1479
2007-03-08 11:16:43  kowey  set
    status: unknown -> resolved
    nosy: darcs-users, droundy, jch, tommy, beschmi, kowey, catamorphism, bakert, dbueno
2009-08-06 17:41:58  admin  set
    nosy: + markstos, jast, Serware, dmitry.kurochkin, darcs-devel, zooko, dagit, mornfall, simon, thorkilnaur, - darcs-users, droundy, jch, catamorphism, bakert, dbueno
2009-08-06 20:39:02  admin  set
    nosy: - beschmi
2009-08-10 22:01:44  admin  set
    nosy: + dbueno, darcs-users, jch, catamorphism, bakert, - markstos, darcs-devel, zooko, jast, dagit, Serware, mornfall
2009-08-25 17:55:32  admin  set
    nosy: + darcs-devel, - simon
2009-08-27 13:48:42  admin  set
    nosy: darcs-users, jch, tommy, kowey, darcs-devel, catamorphism, bakert, dbueno, thorkilnaur, dmitry.kurochkin
2009-10-24 00:42:55  admin  set
    nosy: + tim, - catamorphism