darcs

Issue 49 Severe performance problem on large import

Title Severe performance problem on large import
Priority bug Status duplicate
Milestone Resolved in
Superseder whatsnew -ls loads complete contents of files into memory
View: 79
Nosy List darcs-devel, dmitry.kurochkin, jch, kowey, markstos, steve, thorkilnaur, tommy, zooko
Assigned To
Topics Performance

Created on 2005-12-08.12:08:00 by steve, last changed 2009-08-27.13:53:41 by admin.

Messages
msg184 (view) Author: steve Date: 2005-12-08.12:07:59
# on OSX Server 10.3.9 running darcs 1.0.5rc2 (release candidate )

mkdir testdarcs
cd testdarcs
curl http://io.urbanape.com/release/IoFull-2005-12-06.tar.gz -o 
IoFull-2005-12-06.tar.gz
gunzip *.gz
tar -xf *.tar
cd release/IoFull-2005-12-06
make
darcs initialize
darcs add -r *
darcs whatsnew

# the bug is that whatsnew never returns

-- Steve
msg193 (view) Author: droundy Date: 2005-12-10.17:47:21
How long did you wait? The whatsnew completes for me after four hours on my
PIII.  So this isn't a failure to complete bug, but rather an extreme
slowness bug.  Not that this makes it any less important to fix, but it'll
guide whoever wants to work on this.  I'm guessing we've got something
O(N^2) in there, probably the sorting of the pending patches with the
"diff" patches.

Indeed, if we leave out the "add -r" step and instead run whatsnew -l
--no-summary, the resulting display still takes a while, but beginning
showing output immediately.  So I'd say it's definitely the sort
operation.  I think we really do want to do this sort, since otherwise the
addfiles will all come before any of the file contents, but it would be
nice to make it O(NlogN).
-- 
David Roundy
http://www.darcs.net
msg194 (view) Author: steve Date: 2005-12-10.19:24:40
On 10-Dec-05, at AM 09:47, David Roundy wrote:
> David Roundy <droundy@darcs.net> added the comment:
>
> How long did you wait? The whatsnew completes for me after four hours 
> on my
> PIII.  So this isn't a failure to complete bug, but rather an extreme
> slowness bug.  Not that this makes it any less important to fix, but 
> it'll
> guide whoever wants to work on this.  I'm guessing we've got something
> O(N^2) in there, probably the sorting of the pending patches with the
> "diff" patches.

I waited about 30 minutes on a dual 2.5 Ghz G5 with 2GB RAM that was 
otherwise idle.

-- Steve
msg408 (view) Author: zooko Date: 2006-01-20.17:41:05
See also:

http://bugs.darcs.net/issue80
http://bugs.darcs.net/issue99

and most especially:

http://bugs.darcs.net/issue79
msg409 (view) Author: zooko Date: 2006-01-20.17:55:17
And this:

http://www.abridgegame.org/pipermail/darcs-devel/2006-January/003970.html
msg581 (view) Author: zooko Date: 2006-03-26.09:47:38
Note that Jason Dagit <dagit@codersbase.com> submitted a patch which
dramatically improved performance in some cases, possibly including this case,
but that it was rejected by the darcs maintainer.  The reason for the rejection
was unclear to me -- apparently the patch "shouldn't have been necessary", so
hopefully someone will fix this problem in the right way.  In the meantime,
people who are unable to import their trees into darcs because of this bug might
consider trying Jason Dagit's patch themselves.
msg603 (view) Author: jch Date: 2006-04-07.22:51:24
The reason for the rejection is simply that Ian disagrees, and I consider Ian
the better Haskell programmer.

I'm adding Ian to the CC list; I guess I really ought to apply that patch if he
doesn't reply.
msg609 (view) Author: igloo Date: 2006-04-08.00:00:48
On Fri, Apr 07, 2006 at 10:51:26PM +0000, Juliusz Chroboczek wrote:
> 
> Juliusz Chroboczek <jch@pps.jussieu.fr> added the comment:
> 
> The reason for the rejection is simply that Ian disagrees, and I consider Ian
> the better Haskell programmer.
> 
> I'm adding Ian to the CC list; I guess I really ought to apply that patch if he
> doesn't reply.

I'm a bit lost here. If you mean the patch I think you do then it
changes the behaviour of record but not whatsnew, which this bug seems
to be about.

Darcs generally doesn't seem to cope with pending patches very well,
e.g. you can end up with the contents of pending added files in
_darcs/patches/pending. My guess, based on David's message (193), is
that in this case darcs is constructing the hunk patches for pending
adds, and hence reading the pending added files and building the lists
of lines, in memory in order to do the sort, and that this is the cause
of the slowdown.

A proper fix (for both problems I mention above) probably means handling
pending patches better internally. The new hunk format will paper over
the problem though, I think (unless the problem is with binary files, in
which case I don't think the new xor format will help significantly,
although a +data-data format would).

Thanks
Ian
msg622 (view) Author: jch Date: 2006-04-23.18:57:32
> I'm a bit lost here. If you mean the patch I think you do then it
> changes the behaviour of record but not whatsnew, which this bug seems
> to be about.

Sorry for that, I only looked at Zooko's message (which is indeed about
the patch I think you think I mean).

For extra fun, if you look at the Darcs repo, you'll find:

Fri Jan 21 19:52:41 CET 2005  Juliusz Chroboczek <jch@pps.jussieu.fr>
  * Don't slurp twice when getting.

Fri Jan 21 19:52:41 CET 2005  Juliusz Chroboczek <jch@pps.jussieu.fr>
  UNDO: Don't slurp twice when getting.
msg2950 (view) Author: markstos Date: 2008-01-31.03:49:22
This bug is a duplicate of issue79 (or really vice-versa), which is about
"whatsnew" seeming to load added files into memory.
History
Date User Action Args
2005-12-08 12:08:00stevecreate
2005-12-09 14:19:43droundysetnosy: droundy, tommy, steve
2005-12-10 17:47:22droundysetstatus: unread -> unknown
nosy: droundy, tommy, steve
messages: + msg193
2005-12-10 19:24:41stevesetnosy: droundy, tommy, steve
messages: + msg194
2005-12-17 22:08:46jchsetnosy: droundy, tommy, steve
title: whatsnew bug? -> Severe performance problem on large import
2006-01-20 17:41:05zookosetnosy: + zooko
messages: + msg408
2006-01-20 17:55:18zookosetnosy: droundy, tommy, zooko, steve
messages: + msg409
2006-03-26 09:47:41zookosetnosy: droundy, tommy, zooko, steve
messages: + msg581
2006-04-07 22:51:26jchsetnosy: + igloo, jch
messages: + msg603
2006-04-08 00:00:50igloosetnosy: droundy, jch, tommy, zooko, steve, igloo
messages: + msg609
2006-04-23 18:57:33jchsetnosy: droundy, jch, tommy, zooko, steve, igloo
messages: + msg622
2007-07-16 01:20:25koweysettopic: + Performance
nosy: + kowey, beschmi
2008-01-31 03:49:23markstossetstatus: unknown -> duplicate
nosy: + markstos
superseder: + whatsnew -ls loads complete contents of files into memory
messages: + msg2950
2009-08-06 17:46:32adminsetnosy: + jast, Serware, dmitry.kurochkin, darcs-devel, dagit, mornfall, simon, thorkilnaur, - droundy, jch, steve, igloo
2009-08-06 20:42:14adminsetnosy: - beschmi
2009-08-10 22:06:19adminsetnosy: + steve, igloo, jch, - darcs-devel, jast, dagit, Serware, mornfall
2009-08-25 17:19:25adminsetnosy: + darcs-devel, - igloo
2009-08-25 17:58:20adminsetnosy: - simon
2009-08-27 13:53:41adminsetnosy: jch, tommy, kowey, markstos, darcs-devel, zooko, steve, thorkilnaur, dmitry.kurochkin