darcs

Issue 12 Apply is very slow: O(N^2)

Title Apply is very slow: O(N^2)
Priority bug Status resolved
Milestone Resolved in
Superseder Nosy List darcs-devel, dmitry.kurochkin, kowey, markstos, radekg, thorkilnaur, tommy, wglozer
Assigned To
Topics Performance

Created on 2005-11-17.17:30:11 by radekg, last changed 2009-08-27.14:03:36 by admin.

Messages
msg29 (view) Author: radekg Date: 2005-11-17.17:30:10
All operations ran on Windows XP darcs 1.0.4. SSH taken from putty.
Exact version gives:
====================================================
darcs compiled on Nov 14 2005, at 22:48:32
# configured Mon Nov 14 22:35:33 USMST 2005
./configure --disable-mmap

Context:

[update web page to reflect 1.0.4 as latest stable source.
David Roundy <droundy@darcs.net>**20051113210144]

[TAG 1.0.4
David Roundy <droundy@darcs.net>**20051113134431]
====================================================

On server side I use Debian Testing with darcs 1.0.3. Exact version gives:
====================================================
darcs compiled on Sep  7 2005, at 15:07:24
# configured Tue May 24 18:34:35 EDT 2005
sh ./configure

Context:

[TAG 1.0.3
Tomasz Zielonka <tomasz.zielonka@gmail.com>**20050524215127]
====================================================

And now the operation that fails:
====================================================
At 19:43 I ran:

D:\test\blog>darcs put radekg@192.168.4.10:/home/radekg/blog2

It is 20:20 and darcs is still running. At server it is eating all the
CPU cycles and a lot of memory ps gives:

radekg    9365  0.0  0.1   2908   916 ?        Ss   19:45   0:00 bash
-c cd '/home/radekg/blog2' && darcs apply --all
radekg    9366 78.2 25.7 146728 132760 ?       R    19:45  28:58 darcs
apply --all

At 20:34 I finally press CTRL+C but on server ps gives:

radekg    9365  0.0  0.1   2908   916 ?        Ss   19:45   0:00 bash
-c cd '/home/radekg/blog2' && darcs apply --all
radekg    9366 78.6 25.7 146728 132760 ?       R    19:45  39:41 darcs
apply --all

So darcs is still running, and it is running for another 15 minutes
untill I kill it!

=================================

This, in my opinion, is really dangerous. And if not supported error
should be reported.

Regards,
   Radek.
msg38 (view) Author: droundy Date: 2005-11-18.14:06:00
I am wondering if this put bug might be solved by the patch sent in by Will
changing a readFile to readFilePS?
-- 
David Roundy
http://www.darcs.net
msg43 (view) Author: radekg Date: 2005-11-18.21:26:11
Will version with this patch appear eventually at http://glozer.net/darcs/daily/
? As far as I can see it isn't there yet. If it appears then I will check
whether this bug is solved otherwise I have no environment to build darcs myself
to check it out. Sorry.

Radek.
msg44 (view) Author: wglozer Date: 2005-11-18.22:52:11
> Will version with this patch appear eventually at
> http://glozer.net/darcs/daily/

It took a little while for the patch to be applied in the stable repo,
but
it is there now.  I just rebuilt 1.0.4 with all patches to date and 
republished it at http://glozer.net/darcs/darcs-1.0.4-win32.zip.

I tested get and put over SSH with putty, both worked.

Regards,
Will
msg45 (view) Author: radekg Date: 2005-11-19.10:37:13
I have downloaded pointed darcs version and tried it today. 
Exact version gives:
=================================================
darcs compiled on Nov 18 2005, at 14:58:13
# configured Fri Nov 18 14:52:36 USMST 2005
./configure --disable-mmap

Context:

[Fix patch selection for command changes --context
Daniel BŘnzli <daniel.buenzli@epfl.ch>**20051117145112]

[fix bug in amInRepository.
David Roundy <droundy@darcs.net>**20051117131208
 This is the bug that causes issue9.  I'm a bit uncertain about this
 patch--it looks obvious, but this is pretty subtle code.
]

[fix bug where darcs tries to delete open tempfile on win32
Will <will@glozer.net>**20051117023516]

[bump version to 1.0.5rc1
Tommy Pettersson <ptp@lysator.liu.se>**20051117111128]

[Fixed documentation of DARCS_GET_HTTP with curl.
Daniel BŘnzli <daniel.buenzli@epfl.ch>**20051117062547]

[add --dont-edit-description opposite to --edit-description.
David Roundy <droundy@darcs.net>**20051116115632]

[Flag superfluous input using Test::Builder
Florian Weimer <fw@deneb.enyo.de>**20051112160533

 This change makes sure that the test case that triggers this error
 condition fails in a clean way (especially if test planning is not used,
 but it should work with plans as well).
]

[English and markup fixes.
Dave Love <fx@gnu.org>**20051114224845]

[update web page to reflect 1.0.4 as latest stable source.
David Roundy <droundy@darcs.net>**20051113210144]

[TAG 1.0.4
David Roundy <droundy@darcs.net>**20051113134431]
=================================================

I was patient for 15 minutes or so this time but I have took closer look on what
was happening on the server side.
It seems darcs receives all the patches and then apply them all and then hangs
doing nothing and eating cpu clocks.
After Ctrl+C on windows side and 10 minutes later ps on server gives:

radekg    2069  0.0  0.2   2908  1348 ?        Ss   11:24   0:00 bash -c cd
'/home/radekg/blog2' && darcs apply --all
radekg    2070 92.0 25.8 146724 133560 ?       R    11:24  11:25 darcs apply --all

Radek.
msg46 (view) Author: radekg Date: 2005-11-19.10:46:55
Forgot to mention. On windows side there are two temporary files left after
Ctrl-C this time. One is about 28 MB, and other 0 bytes. I haven't noticed them
before so I guess they are result of some patch to darcs.

Radek.
msg47 (view) Author: droundy Date: 2005-11-19.12:01:34
Radoslaw, what happens if you run darcs initialize in an empty directory on
the server, and then run

darcs push -a server:empty_repo

from your windows box?

This ought to be equivalent to the put command, so if it works, then we'll
have isolated the problem just a bit.

Another test you could do would be to use tar and scp to copy the
repository directly to the linux server and then (after untarring) try
putting to a local directory on the server.

One final question: have you run darcs check on the win32 copy of the
repository? It may be that a corrupt repository could cause trouble like
this.  It shouldn't--but if the repo's corrupt, knowing that would help us
isolate the bug.

Finally, is this a repository that you could put on a publically accessible
server somewhere?
-- 
David Roundy
http://www.darcs.net
msg51 (view) Author: radekg Date: 2005-11-19.19:43:55
> Radoslaw, what happens if you run darcs initialize in an empty directory on
> the server, and then run
>
> darcs push -a server:empty_repo
>
> from your windows box?

Half an hour and same behaviour.

> Another test you could do would be to use tar and scp to copy the
> repository directly to the linux server and then (after untarring) try
> putting to a local directory on the server.

No, can't do. There is no put command in darcs 1.0.3 ;)

> One final question: have you run darcs check on the win32 copy of the
> repository? It may be that a corrupt repository could cause trouble like
> this.  It shouldn't--but if the repo's corrupt, knowing that would help us
> isolate the bug.

The repo is consistent.

> Finally, is this a repository that you could put on a publically accessible
> server somewhere?

http://zuzia.hopto.org/~radekg/blog2

Sorry for the transfer again - home server.

Cheers,
  Radek.
--
Galiera/Gallery: http://zdjecia.zuzia.homelinux.net/
msg53 (view) Author: radekg Date: 2005-11-19.20:26:55
> > Another test you could do would be to use tar and scp to copy the
> > repository directly to the linux server and then (after untarring) try
> > putting to a local directory on the server.
>
> No, can't do. There is no put command in darcs 1.0.3 ;)

Silly me!! Ofcourse I can.. I have simulated put with darcs init,
darcs push like you told me to.
The result is the same after half an hour! So it seems it is not 1.0.4
issue nor windows issue. It might be something with my repo (which is
consistent however large, especially in number of files).
Anyway it seems all patches are applied but darcs hangs in some kind
of loop (guessing).

Cheers,
  Radek
msg54 (view) Author: radekg Date: 2005-11-20.12:40:56
Going further I have left darcs with its "darcs put" simulation on
server overnight and It has eventually ended.
  So probably I have misinterpreted behaviour of darcs. It is just
slow, not in endless loop.
  Still I do not understand what is it doing after it has applied all patches.

Radek.
--
Galiera/Gallery: http://zdjecia.zuzia.homelinux.net/
msg55 (view) Author: droundy Date: 2005-11-20.13:22:13
On Sun, Nov 20, 2005 at 12:40:56PM +0000, Radoslaw Grzanka wrote:
> Going further I have left darcs with its "darcs put" simulation on server
> overnight and It has eventually ended.  So probably I have misinterpreted
> behaviour of darcs. It is just slow, not in endless loop.  Still I do not
> understand what is it doing after it has applied all patches.

Excellent.  In general, push/apply (which put is equivalent to, although
put could be optimized to be much smarter) is much less efficient than
pull.  This repo demonstrates a very embarrassing slowness in darcs, which
is a Good Thing (not the slowness, but that we have a repo that
demonstrates it.

If I don't have time to track this slowness down myself, do I have your
permission to make the repo available on darcs.net for other developers?

My guess is that there are two issues at work here, of which either one
could be the culprit.

The first is that when applying patch bundles, darcs holds the entire patch
bundle in memory (i.e. doesn't parse it lazily).  This is good in that it
means that we know before we start applying it that it isn't malformed, but
is expensive when working with such a large patch bundle.  We can perhaps
eliminate this (or confirm it) as the primary culprit by comparing pulling
vs applying times--or perhaps getting vs putting times.

The second is that you've got a lot of binary files in there, and darcs is
known to have speed issues with parsing binary patches--and related memory
issues, darcs wastes some space with an extra copy because of the poorly
designed patch format.  On our TODO list is a new patch format for binary
patches (as well as one for hunk patches) which will reduce the parsing
time to essentially nothing by storing actual binary chunks of data in the
patch file.  Not so convenient for reading the patch files by hand, but it
sure beats waiting all night to apply a patch.

The third possibility is that I don't know what I'm talking about, and
there's actually an operation in darcs apply that's O(N^2) or worse, and
it's just the huge number of files that is causing the trouble.  If this is
the case... well, it'll depend on what's causing the problem.
-- 
David Roundy
http://www.darcs.net
msg57 (view) Author: radekg Date: 2005-11-20.16:03:12
> If I don't have time to track this slowness down myself, do I have your
> permission to make the repo available on darcs.net for other developers?

Yes, ofcourse. It is GPL stuff there. You may want to put it on
somewhat faster server and if so please inform me, I will remove it
from my server which may be not very reliable (again, it is at my home
and is DSL based etc.)

Thank you for your explanation. I hope you will eventually make darcs
faster. I'm looking forward to see darcs further releases.

Regards,
  Radek.
msg58 (view) Author: droundy Date: 2005-11-21.12:52:10
I've found the cause of slowness, but don't have time to fix it.  The time
is all spent in sift_for_pending, which is O(N^2) in the number of
primitive patches being applied.  We need to be very careful when modifying
this, as if we don't, we'll get a return of the "buggy pending" errors.

There are two solutions (and eventually, ideally, we'd implement both).

In the case of put and many applies/pushes, there are no local changes.  In
this case, unless there is a conflict, sift_for_pending always gives an
empty patch.  If we check first whether there was a conflict or local
changes, we could skip the call to sift_for_pending.  This would be optimal
for these cases.

The other, longer fix would be to implement an O(NlogN) commute.  It's
possible, but very tedious (and potentially very bug-prone), and only pays
off for very large patches (such as this one).

I say we should implement the first for now, and save the tricky commute
for later.  But I'm not going to implement either at the moment, so this
bug is waiting for a volunteer...
-- 
David Roundy
http://www.darcs.net
msg2354 (view) Author: markstos Date: 2008-01-07.05:15:26
Can this bug be considered "resolved in unstable" with the landing of the new
conflictor code?
msg2501 (view) Author: droundy Date: 2008-01-14.22:43:24
No, this is actually not a conflict-related bug.  It's just related to the
simple handling of "pending" changes.

However, now that I look at the code, it seems that this performance bug has
been fixed.  I think I fixed it over Christmas vacation, so I'll mark this as
resolved-in-unstable.

The approach I used was actually to track the result of the merge with pending
changes separately, which avoids special cases, and makes things faster at the
same time.  Previously we performed the merge and then forgot the results, which
was silly (but easy).
History
Date User Action Args
2005-11-17 17:30:11radekgcreate
2005-11-18 14:04:15droundysettopic: + Windows
nosy: + wglozer
2005-11-18 14:06:00droundysetstatus: unread -> unknown
nosy: droundy, tommy, radekg, wglozer
messages: + msg38
2005-11-18 21:26:11radekgsetnosy: droundy, tommy, radekg, wglozer
messages: + msg43
2005-11-18 22:52:11wglozersetnosy: droundy, tommy, radekg, wglozer
messages: + msg44
2005-11-19 10:37:14radekgsetnosy: droundy, tommy, radekg, wglozer
messages: + msg45
2005-11-19 10:46:55radekgsetnosy: droundy, tommy, radekg, wglozer
messages: + msg46
2005-11-19 12:01:34droundysetnosy: droundy, tommy, radekg, wglozer
messages: + msg47
2005-11-19 19:43:55radekgsetnosy: droundy, tommy, radekg, wglozer
messages: + msg51
2005-11-19 20:26:55radekgsetnosy: droundy, tommy, radekg, wglozer
messages: + msg53
2005-11-20 12:40:56radekgsetnosy: droundy, tommy, radekg, wglozer
messages: + msg54
2005-11-20 13:22:13droundysetnosy: droundy, tommy, radekg, wglozer
messages: + msg55
2005-11-20 16:03:14radekgsetnosy: droundy, tommy, radekg, wglozer
messages: + msg57
2005-11-21 12:52:11droundysetnosy: droundy, tommy, radekg, wglozer
messages: + msg58
title: Darcs fails to put repo through ssh -> Apply is very slow: O(N^2)
2007-07-19 07:59:33koweysettopic: - Windows
nosy: + kowey, beschmi
2007-07-31 18:31:00koweysettopic: + Performance
2008-01-07 05:15:28markstossettopic: + Conflicts
nosy: + markstos
messages: + msg2354
2008-01-14 22:43:25droundysetstatus: unknown -> resolved-in-unstable
nosy: markstos, radekg, droundy, wglozer, tommy, kowey, beschmi
topic: - Conflicts
messages: + msg2501
2008-09-04 21:27:26adminsetstatus: resolved-in-unstable -> resolved
nosy: + dagit
2008-09-04 21:27:38adminsetnosy: droundy, tommy, beschmi, kowey, radekg, markstos, wglozer, dagit
2009-08-06 17:37:10adminsetnosy: + jast, Serware, dmitry.kurochkin, darcs-devel, zooko, mornfall, simon, thorkilnaur, - droundy, radekg, wglozer
2009-08-06 20:34:11adminsetnosy: - beschmi
2009-08-10 21:43:52adminsetnosy: + radekg, wglozer, - darcs-devel, zooko, jast, Serware, mornfall
2009-08-10 23:53:15adminsetnosy: - dagit
2009-08-25 17:51:03adminsetnosy: + darcs-devel, - simon
2009-08-27 14:03:36adminsetnosy: tommy, kowey, radekg, markstos, wglozer, darcs-devel, thorkilnaur, dmitry.kurochkin