darcs

Issue 711 whatsnew -ls is very slow (note the --look-for-adds)

Title whatsnew -ls is very slow (note the --look-for-adds)
Priority bug Status resolved
Milestone Resolved in
Superseder Nosy List darcs-devel, dmitry.kurochkin, ganesh, kowey, markstos, thorkilnaur, tommy, wdobler
Assigned To ganesh
Topics Darcs2, Performance

Created on 2008-02-27.21:55:21 by wdobler, last changed 2009-08-27.14:18:02 by admin.

Messages
msg3680 (view) Author: wdobler Date: 2008-02-27.21:55:15
I find that darcs2 takes ~ 60 times longer to do `darcs w -ls' on a largish
real-life SVN tree. Here's some timings:

  prompt>  rm -rf _darcs ; /usr/bin/darcs init ; time /usr/bin/darcs w -ls >
/dev/null
    6.912u 0.232s 0:07.22 98.8%     0+0k 0+0io 0pf+0w
    7.188u 0.296s 0:07.52 99.3%     0+0k 0+0io 0pf+0w

  prompt>  rm -rf _darcs ; darcs2 init [--hashed] ; time darcs2 w -ls > /dev/null
    396.504u 2.352s 6:52.75 96.6%   0+0k 0+0io 0pf+0w
    446.899u 2.480s 7:49.69 95.6%   0+0k 0+0io 1pf+0w
    386.548u 2.276s 6:37.59 97.7%   0+0k 0+0io 0pf+0w

  prompt>  rm -rf .hg ; hg init ; time hg add > /dev/null
    0.856u 0.244s 0:01.11 98.1%     0+0k 0+0io 0pf+0w
    0.888u 0.240s 0:01.11 100.9%    0+0k 0+0io 0pf+0w

My darcs versions:

  prompt> /usr/bin/darcs --version
    1.0.9rc1 (release candidate 1)

  prompt>  darcs2 --version
    2.0.0pre3 (unknown)
    [static binary downloaded from the web site]

The tree I am trying to import contains 6467 files (mostly Java + Python) of ~
42 MB in total.

The file sizes have the following statistics:

average size = 6.5kB
quartiles:
   0% (=min) :  2 B
  25%        : 712 B
  50%        : 1.8 kB  (= median)
  75%        : 4.8 kB
 100% (=max) : 1.6 MB

I cannot give access to the data, but if nobody else has found this type of
slowdown, I could try to synthesize directory trees with similar statistics.
msg3684 (view) Author: markstos Date: 2008-02-28.03:52:35
Considering the significant nature of this regression, I nominate this issue to
be looked into further for the Darcs 2.0 release.

However, I understand the issue is largely with files *not* managed by darcs,
and this test case is sort of a "worst case scenario", because the tree is large
and no files are managed by darcs when the command is run. 

This wishlist item already in the system are related:

issue79: whatsnew -ls loads complete contents of added files into memory
msg3747 (view) Author: droundy Date: 2008-03-04.16:36:49
I believe this bug may now be fixed.  There were some strictness bugs introduced
that led many operations to be O(N) in the length of the history of the repository.

David
msg3776 (view) Author: wdobler Date: 2008-03-05.22:48:27
I still seem to get the same problem with the latest version obtained via

  darcs2 get http://darcs.net/repos/unstable-hashed

about an hour ago. Should the fix be part of that version?

FWIW, the latest patch included is

  Wed Mar  5 18:53:45 CET 2008  David Roundy <droundy@darcs.net>
  * resolve issue726: truncate progress messages to 80 chars.
msg3777 (view) Author: droundy Date: 2008-03-05.23:47:49
Oh, I didn't read this very carefully.  Does the answer change if you just run

darcs whatsnew -s

?
msg3787 (view) Author: wdobler Date: 2008-03-06.22:36:23
`darcs2 whatsnew -s' only takes a small fraction of a second.
As does `darcs2 whatsnew' without any options.
msg3788 (view) Author: wdobler Date: 2008-03-06.22:43:04
The output of `darcs2 w -ls --debug-verbose' is

  Beginning identifying repository .
  Done identifying repository .
  diffing dir...
  diffing dir...

..and then nothing until I interrupt (don't have time to wait for  or 7 minutes
right now).
msg3792 (view) Author: droundy Date: 2008-03-06.22:54:48
On Thu, Mar 06, 2008 at 10:36:24PM -0000, Wolfgang Dobler wrote:
> `darcs2 whatsnew -s' only takes a small fraction of a second.
> As does `darcs2 whatsnew' without any options.

In that case, perhaps you can tell me how many files are detected by
whatsnew -ls? Before I can get a clear picture of what's happening on your
system, I have to have some idea what darcs is doing.  I presume you have a
huge number of files that aren't added.  Are any/many of them large?
-- 
David Roundy
Department of Physics
Oregon State University
msg3813 (view) Author: markstos Date: 2008-03-07.14:58:26
If there are lots of added files, than this issue could be the same as 
http://bugs.darcs.net/issue79

"whatsnew -ls loads complete contents of added files into memory"

That bug, while important, is not a regression, although perhaps the performance
of how the bug exposes itself might be.
msg3816 (view) Author: droundy Date: 2008-03-07.15:24:17
No, Mark, it can't be issue79, since that isn't a regression and wdobling is
reporting a major slowdown relative to darcs-1.  On this other hand, he hasn't
given us enough information to reproduce this bug.

A quick check involving 600 binary files and 1000 text files adding up to 3.6G
show a small performance regression from 1.1s to 2.5s.  In view of that, I'll
relabel this as a performance problem, and remove it from FixForDarcs20.

David
msg3841 (view) Author: wdobler Date: 2008-03-07.23:19:13
I have played a bit more with the tree, see below.

> In that case, perhaps you can tell me how many files are detected by whatsnew -ls?
6927 files

> I presume you have a huge number of files that aren't added.  Are any/many of
them large?
The largest ignored file is ~ 1.6 GB. There's 2 ignored files above 1GB, ~30
above 300kB, ~ 90 above 100kB.

> If there are lots of added files, than this issue could be the same as
http://bugs.darcs.net/issue79

Memory consumption of darcs 2 stagnates around 80kB (VIRT)  56m (RES) 4,2kB
(SHR), which is 3.0% of my RAM (2GB). darcs1 uses a maximum of 1.4%.
I don't think this easily explains a slow-down by ~60.

> A quick check involving 600 binary files and 1000 text files adding up to 3.6G
show a small performance regression from 1.1s to 2.5s.
That's fewer files than I had, but larger total size. So the problem is likely
the number of files, not file size.


The whole issue is highly nonlinear in the size of the tree. I have stripped
down the tree to a few subdirectories that have the largest effect, and now the
slowdown is around 3 to 4 only.
However, if I replicate the main subdirectory (i.e. double the problem), I find
a slowdown around 10.

In fact, you can easily reproduce this, using the darcs unstable repo:
prompt>  darcs get http://darcs.net/repos/unstable; cd unstable
prompt>  (for darcs in /usr/bin/darcs /usr/bin/darcs /usr/bin/darcs darcs2
darcs2 darcs2 ; do rm -rf _darcs ; $darcs init ; time $darcs w -ls > /dev/null;
done)
  0.524u 0.040s  0.56  99.98%
  0.512u 0.044s  0.55  99.82%
  0.576u 0.068s  0.64  99.52%
  0.544u 0.032s  0.57  99.91%
  0.460u 0.036s  0.49  100.00%
  0.456u 0.040s  0.49  100.00%

Not much of a difference, right? Now we increase the repo size:

prompt>  mkdir large; cp -r src large/src1; cp -r src large/src2

and get

  0.680u 0.052s  0.73  100.00%
  0.472u 0.028s  0.50  99.02%
  0.472u 0.012s  0.48  99.79%
  1.364u 0.052s  1.42  99.22%
  1.236u 0.060s  1.32  98.14%
  1.140u 0.036s  1.17  99.71%

Increasing again gives
prompt>  mkdir larger; cp -r large larger/large1; cp -r large larger/large2
  1.268u 0.064s  1.33  99.80%
  0.920u 0.052s  0.97  99.30%
  0.944u 0.036s  0.98  99.21%
  4.000u 0.104s  4.11  99.79%
  4.052u 0.076s  4.14  99.71%
  3.964u 0.072s  4.04  99.74%

And after the next iteration:
wdobler@frenesi:~/tmp/unstable> mkdir yetlarger; cp -r larger yetlarger/larger1;
cp -r larger yetlarger/larger2;
  2.484u 0.100s  2.59  99.76%
  2.360u 0.104s  2.47  99.67%
  2.408u 0.060s  2.47  99.79%
  17.701u 0.260s  18.11  99.17%
  17.833u 0.200s  18.16  99.30%
  18.033u 0.224s  18.28  99.87%
msg4067 (view) Author: wdobler Date: 2008-03-28.17:23:54
So can you reproduce the issue? Is there anything else I can do?

While this is truly a `performance problem' and apparently doesn't bite for
small trees, it is still a regression and I think it would be a pity to put off
users with large trees because of a problem that mostly wasn't there in Darcs 1.x.
msg4069 (view) Author: markstos Date: 2008-03-28.18:11:06
> While this is truly a `performance problem' and apparently doesn't bite for
> small trees, it is still a regression and I think it would be a pity to put off
> users with large trees because of a problem that mostly wasn't there in Darcs 1.x.

You are right that'd we like this to work. Patches or other 
contributions to help the effort along directly or indirectly are welcome.

    Mark
msg4070 (view) Author: droundy Date: 2008-03-28.18:15:38
On Fri, Mar 28, 2008 at 05:23:55PM -0000, Wolfgang Dobler wrote:
> So can you reproduce the issue? Is there anything else I can do?

Yes, I can reproduce this, and I can't think of anything you could do,
unless you're willing to code up a special-case routine for darcs whatsnew
-s.

> While this is truly a `performance problem' and apparently doesn't bite for
> small trees, it is still a regression and I think it would be a pity to put off
> users with large trees because of a problem that mostly wasn't there in Darcs 1.x.

I consider this a not-so-important regression, just because this is an
extreme use case with a very easy workaround: don't use --look-for-adds.
I'm sorry, but optimizing this sort of extreme case just doesn't seem like
a good use of my time.  The regression, I'm certain, came about as a result
of a fix for a very subtle and hard-to-track-down sort of bug.
-- 
David Roundy
Department of Physics
Oregon State University
msg4079 (view) Author: wdobler Date: 2008-03-28.19:59:06
David Roundy wrote:
> Yes, I can reproduce this, and I can't think of anything you could
> do, unless you're willing to code up a special-case routine for
> darcs whatsnew -s.

If I knew Haskell, I probably would.

> I consider this a not-so-important regression, just because this
> is an extreme use case

Hmm, probably not that extreme. If darcs was written in a language like
Java rather than Haskell (don't panic, this is just a thought
experiment ;-), the source file count would probably be up by a factor
of 5 or so, maybe even more. So the slowdown could be quite significant.

Also, I think that reviewers are likely to use some largish tree as
a data point.

> with a very easy workaround: don't use --look-for-adds.

So you are saying that adding the files wouldn't be that incredibly
slow and I should just get rid of my habit of running `darcs w -ls'
before adding + recording? I'll try this -- knowing such a thing
would already be very helpful.
  Is it worth putting this in the manual? I could cook up a
documentation patch at least. Or should this be in the Wiki somewhere
once Darcs2 is out?

> I'm sorry, but optimizing this sort of extreme case just doesn't
> seem like a good use of my time.

I absolutely trust in you finding the optimal priorities for the
darcs project.

> The regression, I'm certain, came about as a result of a fix
> for a very subtle and hard-to-track-down sort of bug.

OK, knowing this also makes it easier to live with the current
state.

W o l f g a n g
msg4081 (view) Author: droundy Date: 2008-03-28.20:38:05
On Fri, Mar 28, 2008 at 07:59:09PM -0000, Wolfgang Dobler wrote:
> Also, I think that reviewers are likely to use some largish tree as
> a data point.

At this point, I care very little what reviewers say.  It'd be nice to not
get more users.

> > with a very easy workaround: don't use --look-for-adds.
> 
> So you are saying that adding the files wouldn't be that incredibly
> slow and I should just get rid of my habit of running `darcs w -ls'
> before adding + recording? I'll try this -- knowing such a thing
> would already be very helpful.
>   Is it worth putting this in the manual? I could cook up a
> documentation patch at least. Or should this be in the Wiki somewhere
> once Darcs2 is out?

You never said you wanted to add the files to your repository.  That would
be another question.  I couldn't say how darcs will behave with this many
files in the repository.  Certainly adding them will be painful.
-- 
David Roundy
Department of Physics
Oregon State University
msg4084 (view) Author: wdobler Date: 2008-03-28.21:22:21
> You never said you wanted to add the files to your repository.
> That would be another question.  I couldn't say how darcs will behave
> with this many files in the repository.  Certainly adding them will
> be painful.

It wasn't that bad the last time I did so with darcs1. Although, IIRC, I 
took a mental note not to add the whole tree at once, but to proceed
one subdirectory at a time.

Just to give you an idea what I am trying to do here: We have this largish
Java project that is under SVN. When I am working on some complex new feature,
I often don't want to commit my wild experiments to the SVN repository.
  But I am hooked on Revision Control and terribly afraid of losing something
that worked at one time.
  Enter darcs. I `darcs init', adjust the boring file (this is where I use
`whatsnew -ls' a lot), then add everything and record.
  Then I work, comitting to darcs every now and then, and only comitting to
the central SVN repo once I think a (sub-)feature makes sense. When I am done
and everything is committed to SVN, I can simply remove the whole _darcs
directory.

This is not the standard darcs user story, but I am convinced that there are
others working the same way. It is sort of messy at times, because I get lots
of upstream changes in between, but I find darcs very useful for this
approach.

For the time being, I'll stick with darcs1 for this sort of thing.
msg4085 (view) Author: markstos Date: 2008-03-28.21:25:44
> This is not the standard darcs user story, but I am convinced that there are
> others working the same way. It is sort of messy at times, because I get lots
> of upstream changes in between, but I find darcs very useful for this
> approach.

I sometimes use darcs in a similar ad-hoc way, but for smaller projects 
where the issue is not triggered.

   Mark
msg6548 (view) Author: ganesh Date: 2008-10-31.08:38:56
I've been doing some investigation of this bug. So far I've focused on the
situation where there are a lot of files in a single directory, though much of
the work I've done there may well apply to your directory structures too.

I've made several speedups, including removing several quadratic slowdowns,
though none of them have made it into mainline darcs yet due to a backlog from
the darcs sprint (which is where I started this work too).

In the meantime, if you feel like trying the branch at
http://code.haskell.org/darcs/sprint-slurpy/, I'd be interested to hear how it
behaves for you.

It'd also be useful for me to have a synthesised tree as you offer in your
original report, though I suspect that testing with very artificial trees as
I've been doing so far is probably good enough.
msg6556 (view) Author: wdobler Date: 2008-11-02.22:30:47
Ganesh, your patches are definitively a huge step forwards.

But before showing the figures, I must mention that the extreme slowdown I
reported... has gone. And I don't know why.
 I now get about 20 seconds, rather than 6 or 7 minutes for the same repo. What
has changed on the system is
(a) an upgrade from Ubuntu Dapper (6.06) to Hardy (8.04), which obviously
changed many libraries, and
(b) the file system changed from ext3 to xfs.

At some point, I though that the problem was probably related to the GHC bug
that led to slow execution of some darcs commands, but I think I am still using
the same versions as before, so that doesn't make that much sense.

Anyway, here are the numbers I get on the tree in question, measured using
  rm -rf _darcs ; darcs init ; time darcs w -ls > /dev/null
three times and reporting only the best timing (they were all comparable).

A. On a desktop machine with reasonably fast hard drive

1.0.9 (release):
11.052u 0.304s 0:11.35 100.0%   0+0k 0+8io 0pf+0w

2.0.0 (unknown):
21.417u 0.424s 0:21.86 99.8%    0+0k 0+0io 0pf+0w

2.1.0 (release):
25.097u 1.488s 0:25.23 105.3%   0+0k 0+0io 0pf+0w

2.1.0 (+ 276 patches):
5.380u 0.244s 0:05.51 101.9%    0+0k 0+0io 0pf+0w


B. On a laptop

1.0.9 (release)
26.393u 0.780s 0:27.19 99.9%    0+0k 0+8io 0pf+0w

2.0.1rc2 (2.0.1rc2 (+ -1 patch)):
90.353u 2.276s 1:31.67 101.0%   0+0k 0+0io 0pf+0w

2.1.0 (release):
89.977u 2.860s 1:31.98 100.9%   0+0k 0+8io 0pf+0w

2.1.0 (+ 276 patches):
12.732u 1.060s 0:12.99 106.1%   0+0k 0+8io 0pf+0w


So your patches really save the day, making `darcs w -ls' faster by a factor of
between close to 5 and 7!
msg6578 (view) Author: ganesh Date: 2008-11-03.22:59:43
Wolfgang: great to know it helps in your situation. Most/all of the work is in
unstable now, btw.

All: Should we close this bug? We've certainly stopped whatsnew -ls being "very
slow", but David did allude to making a fast path for whatsnew -s that I didn't
quite understand the details of. Perhaps we should open a new ticket for that if
we can record what would need to be done.
msg6591 (view) Author: markstos Date: 2008-11-04.15:48:32
Marking as "resolved-in-unstable" as ganesh reports that it is.  

Feel free to re-open if you see the issue is still present.
History
Date User Action Args
2008-02-27 21:55:21wdoblercreate
2008-02-27 21:57:23wdoblersetnosy: droundy, tommy, beschmi, kowey, wdobler
title: Darcs2 performance regression with `darcs whatis' -> Darcs2 performance regression with `darcs whatsnew'
2008-02-28 03:52:42markstossetstatus: unread -> unknown
topic: + Target-2.0
messages: + msg3684
nosy: + markstos
2008-03-04 16:36:51droundysetstatus: unknown -> resolved
nosy: droundy, tommy, beschmi, kowey, markstos, wdobler
messages: + msg3747
2008-03-05 22:48:28wdoblersetstatus: resolved -> unknown
nosy: droundy, tommy, beschmi, kowey, markstos, wdobler
messages: + msg3776
2008-03-05 23:47:50droundysetnosy: droundy, tommy, beschmi, kowey, markstos, wdobler
messages: + msg3777
2008-03-06 22:36:24wdoblersetnosy: droundy, tommy, beschmi, kowey, markstos, wdobler
messages: + msg3787
2008-03-06 22:43:05wdoblersetnosy: droundy, tommy, beschmi, kowey, markstos, wdobler
messages: + msg3788
2008-03-06 22:54:49droundysetnosy: droundy, tommy, beschmi, kowey, markstos, wdobler
messages: + msg3792
2008-03-07 14:58:27markstossetnosy: droundy, tommy, beschmi, kowey, markstos, wdobler
messages: + msg3813
2008-03-07 15:24:20droundysettopic: - Regression, Target-2.0
nosy: droundy, tommy, beschmi, kowey, markstos, wdobler
messages: + msg3816
title: Darcs2 performance regression with `darcs whatsnew' -> Darcs2 performance problem with `darcs whatsnew'
2008-03-07 23:19:15wdoblersetnosy: droundy, tommy, beschmi, kowey, markstos, wdobler
messages: + msg3841
2008-03-28 17:23:55wdoblersetnosy: droundy, tommy, beschmi, kowey, markstos, wdobler
messages: + msg4067
2008-03-28 18:11:08markstossetnosy: droundy, tommy, beschmi, kowey, markstos, wdobler
messages: + msg4069
2008-03-28 18:15:41droundysetnosy: droundy, tommy, beschmi, kowey, markstos, wdobler
messages: + msg4070
2008-03-28 19:59:09wdoblersetnosy: droundy, tommy, beschmi, kowey, markstos, wdobler
messages: + msg4079
2008-03-28 20:38:07droundysetnosy: droundy, tommy, beschmi, kowey, markstos, wdobler
messages: + msg4081
2008-03-28 21:22:22wdoblersetnosy: droundy, tommy, beschmi, kowey, markstos, wdobler
messages: + msg4084
2008-03-28 21:25:45markstossetnosy: droundy, tommy, beschmi, kowey, markstos, wdobler
messages: + msg4085
2008-05-14 16:56:32koweysetnosy: + dagit
title: Darcs2 performance problem with `darcs whatsnew' -> whatsnew -ls is very slow (note the --look-for-adds)
2008-10-31 08:38:58ganeshsetstatus: unknown -> has-patch
nosy: + dmitry.kurochkin, ganesh, simon, thorkilnaur
messages: + msg6548
assignedto: ganesh
2008-11-02 22:30:50wdoblersetnosy: droundy, tommy, beschmi, kowey, markstos, dagit, ganesh, simon, thorkilnaur, wdobler, dmitry.kurochkin
messages: + msg6556
2008-11-03 22:59:45ganeshsetnosy: droundy, tommy, beschmi, kowey, markstos, dagit, ganesh, simon, thorkilnaur, wdobler, dmitry.kurochkin
messages: + msg6578
2008-11-04 15:48:34markstossetstatus: has-patch -> resolved-in-unstable
nosy: droundy, tommy, beschmi, kowey, markstos, dagit, ganesh, simon, thorkilnaur, wdobler, dmitry.kurochkin
messages: + msg6591
2009-04-22 03:28:21twbsetstatus: resolved-in-unstable -> resolved
nosy: droundy, tommy, beschmi, kowey, markstos, dagit, ganesh, simon, thorkilnaur, wdobler, dmitry.kurochkin
2009-08-06 17:55:08adminsetnosy: + jast, Serware, darcs-devel, zooko, mornfall, - droundy, ganesh, wdobler
2009-08-06 20:59:04adminsetnosy: - beschmi
2009-08-10 22:16:22adminsetnosy: + wdobler, ganesh, - darcs-devel, zooko, jast, Serware, mornfall
2009-08-11 00:07:47adminsetnosy: - dagit
2009-08-25 17:33:10adminsetnosy: + darcs-devel, - simon
2009-08-27 14:18:02adminsetnosy: tommy, kowey, markstos, darcs-devel, ganesh, thorkilnaur, wdobler, dmitry.kurochkin