Issue711

Title whatsnew -ls is very slow (note the --look-for-adds)
Priority bug Status chatting
Superseder Nosy List beschmi, dagit, droundy, kowey, markstos, tommy, wdobler
Assigned To Topics Darcs2, Performance

Created on 2008-02-27.21:55:21 by wdobler, last changed 2008-05-14.16:56:32 by kowey.

Messages
msg4085 (view) Author: markstos Date: 2008-03-28.21:25:44
> This is not the standard darcs user story, but I am convinced that there are
> others working the same way. It is sort of messy at times, because I get lots
> of upstream changes in between, but I find darcs very useful for this
> approach.

I sometimes use darcs in a similar ad-hoc way, but for smaller projects 
where the issue is not triggered.

   Mark
msg4084 (view) Author: wdobler Date: 2008-03-28.21:22:21
> You never said you wanted to add the files to your repository.
> That would be another question.  I couldn't say how darcs will behave
> with this many files in the repository.  Certainly adding them will
> be painful.

It wasn't that bad the last time I did so with darcs1. Although, IIRC, I 
took a mental note not to add the whole tree at once, but to proceed
one subdirectory at a time.

Just to give you an idea what I am trying to do here: We have this largish
Java project that is under SVN. When I am working on some complex new feature,
I often don't want to commit my wild experiments to the SVN repository.
  But I am hooked on Revision Control and terribly afraid of losing something
that worked at one time.
  Enter darcs. I `darcs init', adjust the boring file (this is where I use
`whatsnew -ls' a lot), then add everything and record.
  Then I work, comitting to darcs every now and then, and only comitting to
the central SVN repo once I think a (sub-)feature makes sense. When I am done
and everything is committed to SVN, I can simply remove the whole _darcs
directory.

This is not the standard darcs user story, but I am convinced that there are
others working the same way. It is sort of messy at times, because I get lots
of upstream changes in between, but I find darcs very useful for this
approach.

For the time being, I'll stick with darcs1 for this sort of thing.
msg4081 (view) Author: droundy Date: 2008-03-28.20:38:05
On Fri, Mar 28, 2008 at 07:59:09PM -0000, Wolfgang Dobler wrote:
> Also, I think that reviewers are likely to use some largish tree as
> a data point.

At this point, I care very little what reviewers say.  It'd be nice to not
get more users.

> > with a very easy workaround: don't use --look-for-adds.
> 
> So you are saying that adding the files wouldn't be that incredibly
> slow and I should just get rid of my habit of running `darcs w -ls'
> before adding + recording? I'll try this -- knowing such a thing
> would already be very helpful.
>   Is it worth putting this in the manual? I could cook up a
> documentation patch at least. Or should this be in the Wiki somewhere
> once Darcs2 is out?

You never said you wanted to add the files to your repository.  That would
be another question.  I couldn't say how darcs will behave with this many
files in the repository.  Certainly adding them will be painful.
-- 
David Roundy
Department of Physics
Oregon State University
msg4079 (view) Author: wdobler Date: 2008-03-28.19:59:06
David Roundy wrote:
> Yes, I can reproduce this, and I can't think of anything you could
> do, unless you're willing to code up a special-case routine for
> darcs whatsnew -s.

If I knew Haskell, I probably would.

> I consider this a not-so-important regression, just because this
> is an extreme use case

Hmm, probably not that extreme. If darcs was written in a language like
Java rather than Haskell (don't panic, this is just a thought
experiment ;-), the source file count would probably be up by a factor
of 5 or so, maybe even more. So the slowdown could be quite significant.

Also, I think that reviewers are likely to use some largish tree as
a data point.

> with a very easy workaround: don't use --look-for-adds.

So you are saying that adding the files wouldn't be that incredibly
slow and I should just get rid of my habit of running `darcs w -ls'
before adding + recording? I'll try this -- knowing such a thing
would already be very helpful.
  Is it worth putting this in the manual? I could cook up a
documentation patch at least. Or should this be in the Wiki somewhere
once Darcs2 is out?

> I'm sorry, but optimizing this sort of extreme case just doesn't
> seem like a good use of my time.

I absolutely trust in you finding the optimal priorities for the
darcs project.

> The regression, I'm certain, came about as a result of a fix
> for a very subtle and hard-to-track-down sort of bug.

OK, knowing this also makes it easier to live with the current
state.

W o l f g a n g
msg4070 (view) Author: droundy Date: 2008-03-28.18:15:38
On Fri, Mar 28, 2008 at 05:23:55PM -0000, Wolfgang Dobler wrote:
> So can you reproduce the issue? Is there anything else I can do?

Yes, I can reproduce this, and I can't think of anything you could do,
unless you're willing to code up a special-case routine for darcs whatsnew
-s.

> While this is truly a `performance problem' and apparently doesn't bite for
> small trees, it is still a regression and I think it would be a pity to put off
> users with large trees because of a problem that mostly wasn't there in Darcs 1.x.

I consider this a not-so-important regression, just because this is an
extreme use case with a very easy workaround: don't use --look-for-adds.
I'm sorry, but optimizing this sort of extreme case just doesn't seem like
a good use of my time.  The regression, I'm certain, came about as a result
of a fix for a very subtle and hard-to-track-down sort of bug.
-- 
David Roundy
Department of Physics
Oregon State University
msg4069 (view) Author: markstos Date: 2008-03-28.18:11:06
> While this is truly a `performance problem' and apparently doesn't bite for
> small trees, it is still a regression and I think it would be a pity to put off
> users with large trees because of a problem that mostly wasn't there in Darcs 1.x.

You are right that'd we like this to work. Patches or other 
contributions to help the effort along directly or indirectly are welcome.

    Mark
msg4067 (view) Author: wdobler Date: 2008-03-28.17:23:54
So can you reproduce the issue? Is there anything else I can do?

While this is truly a `performance problem' and apparently doesn't bite for
small trees, it is still a regression and I think it would be a pity to put off
users with large trees because of a problem that mostly wasn't there in Darcs 1.x.
msg3841 (view) Author: wdobler Date: 2008-03-07.23:19:13
I have played a bit more with the tree, see below.

> In that case, perhaps you can tell me how many files are detected by whatsnew -ls?
6927 files

> I presume you have a huge number of files that aren't added.  Are any/many of
them large?
The largest ignored file is ~ 1.6 GB. There's 2 ignored files above 1GB, ~30
above 300kB, ~ 90 above 100kB.

> If there are lots of added files, than this issue could be the same as
http://bugs.darcs.net/issue79

Memory consumption of darcs 2 stagnates around 80kB (VIRT)  56m (RES) 4,2kB
(SHR), which is 3.0% of my RAM (2GB). darcs1 uses a maximum of 1.4%.
I don't think this easily explains a slow-down by ~60.

> A quick check involving 600 binary files and 1000 text files adding up to 3.6G
show a small performance regression from 1.1s to 2.5s.
That's fewer files than I had, but larger total size. So the problem is likely
the number of files, not file size.


The whole issue is highly nonlinear in the size of the tree. I have stripped
down the tree to a few subdirectories that have the largest effect, and now the
slowdown is around 3 to 4 only.
However, if I replicate the main subdirectory (i.e. double the problem), I find
a slowdown around 10.

In fact, you can easily reproduce this, using the darcs unstable repo:
prompt>  darcs get http://darcs.net/repos/unstable; cd unstable
prompt>  (for darcs in /usr/bin/darcs /usr/bin/darcs /usr/bin/darcs darcs2
darcs2 darcs2 ; do rm -rf _darcs ; $darcs init ; time $darcs w -ls > /dev/null;
done)
  0.524u 0.040s  0.56  99.98%
  0.512u 0.044s  0.55  99.82%
  0.576u 0.068s  0.64  99.52%
  0.544u 0.032s  0.57  99.91%
  0.460u 0.036s  0.49  100.00%
  0.456u 0.040s  0.49  100.00%

Not much of a difference, right? Now we increase the repo size:

prompt>  mkdir large; cp -r src large/src1; cp -r src large/src2

and get

  0.680u 0.052s  0.73  100.00%
  0.472u 0.028s  0.50  99.02%
  0.472u 0.012s  0.48  99.79%
  1.364u 0.052s  1.42  99.22%
  1.236u 0.060s  1.32  98.14%
  1.140u 0.036s  1.17  99.71%

Increasing again gives
prompt>  mkdir larger; cp -r large larger/large1; cp -r large larger/large2
  1.268u 0.064s  1.33  99.80%
  0.920u 0.052s  0.97  99.30%
  0.944u 0.036s  0.98  99.21%
  4.000u 0.104s  4.11  99.79%
  4.052u 0.076s  4.14  99.71%
  3.964u 0.072s  4.04  99.74%

And after the next iteration:
wdobler@frenesi:~/tmp/unstable> mkdir yetlarger; cp -r larger yetlarger/larger1;
cp -r larger yetlarger/larger2;
  2.484u 0.100s  2.59  99.76%
  2.360u 0.104s  2.47  99.67%
  2.408u 0.060s  2.47  99.79%
  17.701u 0.260s  18.11  99.17%
  17.833u 0.200s  18.16  99.30%
  18.033u 0.224s  18.28  99.87%
msg3816 (view) Author: droundy Date: 2008-03-07.15:24:17
No, Mark, it can't be issue79, since that isn't a regression and wdobling is
reporting a major slowdown relative to darcs-1.  On this other hand, he hasn't
given us enough information to reproduce this bug.

A quick check involving 600 binary files and 1000 text files adding up to 3.6G
show a small performance regression from 1.1s to 2.5s.  In view of that, I'll
relabel this as a performance problem, and remove it from FixForDarcs20.

David
msg3813 (view) Author: markstos Date: 2008-03-07.14:58:26
If there are lots of added files, than this issue could be the same as 
http://bugs.darcs.net/issue79

"whatsnew -ls loads complete contents of added files into memory"

That bug, while important, is not a regression, although perhaps the performance
of how the bug exposes itself might be.
msg3792 (view) Author: droundy Date: 2008-03-06.22:54:48
On Thu, Mar 06, 2008 at 10:36:24PM -0000, Wolfgang Dobler wrote:
> `darcs2 whatsnew -s' only takes a small fraction of a second.
> As does `darcs2 whatsnew' without any options.

In that case, perhaps you can tell me how many files are detected by
whatsnew -ls? Before I can get a clear picture of what's happening on your
system, I have to have some idea what darcs is doing.  I presume you have a
huge number of files that aren't added.  Are any/many of them large?
-- 
David Roundy
Department of Physics
Oregon State University
msg3788 (view) Author: wdobler Date: 2008-03-06.22:43:04
The output of `darcs2 w -ls --debug-verbose' is

  Beginning identifying repository .
  Done identifying repository .
  diffing dir...
  diffing dir...

..and then nothing until I interrupt (don't have time to wait for  or 7 minutes
right now).
msg3787 (view) Author: wdobler Date: 2008-03-06.22:36:23
`darcs2 whatsnew -s' only takes a small fraction of a second.
As does `darcs2 whatsnew' without any options.
msg3777 (view) Author: droundy Date: 2008-03-05.23:47:49
Oh, I didn't read this very carefully.  Does the answer change if you just run

darcs whatsnew -s

?
msg3776 (view) Author: wdobler Date: 2008-03-05.22:48:27
I still seem to get the same problem with the latest version obtained via

  darcs2 get http://darcs.net/repos/unstable-hashed

about an hour ago. Should the fix be part of that version?

FWIW, the latest patch included is

  Wed Mar  5 18:53:45 CET 2008  David Roundy <droundy@darcs.net>
  * resolve issue726: truncate progress messages to 80 chars.
msg3747 (view) Author: droundy Date: 2008-03-04.16:36:49
I believe this bug may now be fixed.  There were some strictness bugs introduced
that led many operations to be O(N) in the length of the history of the repository.

David
msg3684 (view) Author: markstos Date: 2008-02-28.03:52:35
Considering the significant nature of this regression, I nominate this issue to
be looked into further for the Darcs 2.0 release.

However, I understand the issue is largely with files *not* managed by darcs,
and this test case is sort of a "worst case scenario", because the tree is large
and no files are managed by darcs when the command is run. 

This wishlist item already in the system are related:

issue79: whatsnew -ls loads complete contents of added files into memory
msg3680 (view) Author: wdobler Date: 2008-02-27.21:55:15
I find that darcs2 takes ~ 60 times longer to do `darcs w -ls' on a largish
real-life SVN tree. Here's some timings:

  prompt>  rm -rf _darcs ; /usr/bin/darcs init ; time /usr/bin/darcs w -ls >
/dev/null
    6.912u 0.232s 0:07.22 98.8%     0+0k 0+0io 0pf+0w
    7.188u 0.296s 0:07.52 99.3%     0+0k 0+0io 0pf+0w

  prompt>  rm -rf _darcs ; darcs2 init [--hashed] ; time darcs2 w -ls > /dev/null
    396.504u 2.352s 6:52.75 96.6%   0+0k 0+0io 0pf+0w
    446.899u 2.480s 7:49.69 95.6%   0+0k 0+0io 1pf+0w
    386.548u 2.276s 6:37.59 97.7%   0+0k 0+0io 0pf+0w

  prompt>  rm -rf .hg ; hg init ; time hg add > /dev/null
    0.856u 0.244s 0:01.11 98.1%     0+0k 0+0io 0pf+0w
    0.888u 0.240s 0:01.11 100.9%    0+0k 0+0io 0pf+0w

My darcs versions:

  prompt> /usr/bin/darcs --version
    1.0.9rc1 (release candidate 1)

  prompt>  darcs2 --version
    2.0.0pre3 (unknown)
    [static binary downloaded from the web site]

The tree I am trying to import contains 6467 files (mostly Java + Python) of ~
42 MB in total.

The file sizes have the following statistics:

average size = 6.5kB
quartiles:
   0% (=min) :  2 B
  25%        : 712 B
  50%        : 1.8 kB  (= median)
  75%        : 4.8 kB
 100% (=max) : 1.6 MB

I cannot give access to the data, but if nobody else has found this type of
slowdown, I could try to synthesize directory trees with similar statistics.

Browse related patches: unstable | stable

History
Date User Action Args
2008-05-14 16:56:32koweysetnosy: + dagit
title: Darcs2 performance problem with `darcs whatsnew' -> whatsnew -ls is very slow (note the --look-for-adds)
2008-03-28 21:25:45markstossetnosy: droundy, tommy, beschmi, kowey, markstos, wdobler
messages: + msg4085
2008-03-28 21:22:22wdoblersetnosy: droundy, tommy, beschmi, kowey, markstos, wdobler
messages: + msg4084
2008-03-28 20:38:07droundysetnosy: droundy, tommy, beschmi, kowey, markstos, wdobler
messages: + msg4081
2008-03-28 19:59:09wdoblersetnosy: droundy, tommy, beschmi, kowey, markstos, wdobler
messages: + msg4079
2008-03-28 18:15:41droundysetnosy: droundy, tommy, beschmi, kowey, markstos, wdobler
messages: + msg4070
2008-03-28 18:11:08markstossetnosy: droundy, tommy, beschmi, kowey, markstos, wdobler
messages: + msg4069
2008-03-28 17:23:55wdoblersetnosy: droundy, tommy, beschmi, kowey, markstos, wdobler
messages: + msg4067
2008-03-07 23:19:15wdoblersetnosy: droundy, tommy, beschmi, kowey, markstos, wdobler
messages: + msg3841
2008-03-07 15:24:20droundysettopic: - Regression, ReleaseCritical_2.1
nosy: droundy, tommy, beschmi, kowey, markstos, wdobler
messages: + msg3816
title: Darcs2 performance regression with `darcs whatsnew' -> Darcs2 performance problem with `darcs whatsnew'
2008-03-07 14:58:27markstossetnosy: droundy, tommy, beschmi, kowey, markstos, wdobler
messages: + msg3813
2008-03-06 22:54:49droundysetnosy: droundy, tommy, beschmi, kowey, markstos, wdobler
messages: + msg3792
2008-03-06 22:43:05wdoblersetnosy: droundy, tommy, beschmi, kowey, markstos, wdobler
messages: + msg3788
2008-03-06 22:36:24wdoblersetnosy: droundy, tommy, beschmi, kowey, markstos, wdobler
messages: + msg3787
2008-03-05 23:47:50droundysetnosy: droundy, tommy, beschmi, kowey, markstos, wdobler
messages: + msg3777
2008-03-05 22:48:28wdoblersetstatus: resolved -> chatting
nosy: droundy, tommy, beschmi, kowey, markstos, wdobler
messages: + msg3776
2008-03-04 16:36:51droundysetstatus: chatting -> resolved
nosy: droundy, tommy, beschmi, kowey, markstos, wdobler
messages: + msg3747
2008-02-28 03:52:42markstossetstatus: unread -> chatting
topic: + ReleaseCritical_2.1
messages: + msg3684
nosy: + markstos
2008-02-27 21:57:23wdoblersetnosy: droundy, tommy, beschmi, kowey, wdobler
title: Darcs2 performance regression with `darcs whatis' -> Darcs2 performance regression with `darcs whatsnew'
2008-02-27 21:55:21wdoblercreate