Created on 2008-02-27.21:55:21 by wdobler, last changed 2009-08-27.14:18:02 by admin.
msg3680 (view) |
Author: wdobler |
Date: 2008-02-27.21:55:15 |
|
I find that darcs2 takes ~ 60 times longer to do `darcs w -ls' on a largish
real-life SVN tree. Here's some timings:
prompt> rm -rf _darcs ; /usr/bin/darcs init ; time /usr/bin/darcs w -ls >
/dev/null
6.912u 0.232s 0:07.22 98.8% 0+0k 0+0io 0pf+0w
7.188u 0.296s 0:07.52 99.3% 0+0k 0+0io 0pf+0w
prompt> rm -rf _darcs ; darcs2 init [--hashed] ; time darcs2 w -ls > /dev/null
396.504u 2.352s 6:52.75 96.6% 0+0k 0+0io 0pf+0w
446.899u 2.480s 7:49.69 95.6% 0+0k 0+0io 1pf+0w
386.548u 2.276s 6:37.59 97.7% 0+0k 0+0io 0pf+0w
prompt> rm -rf .hg ; hg init ; time hg add > /dev/null
0.856u 0.244s 0:01.11 98.1% 0+0k 0+0io 0pf+0w
0.888u 0.240s 0:01.11 100.9% 0+0k 0+0io 0pf+0w
My darcs versions:
prompt> /usr/bin/darcs --version
1.0.9rc1 (release candidate 1)
prompt> darcs2 --version
2.0.0pre3 (unknown)
[static binary downloaded from the web site]
The tree I am trying to import contains 6467 files (mostly Java + Python) of ~
42 MB in total.
The file sizes have the following statistics:
average size = 6.5kB
quartiles:
0% (=min) : 2 B
25% : 712 B
50% : 1.8 kB (= median)
75% : 4.8 kB
100% (=max) : 1.6 MB
I cannot give access to the data, but if nobody else has found this type of
slowdown, I could try to synthesize directory trees with similar statistics.
|
msg3684 (view) |
Author: markstos |
Date: 2008-02-28.03:52:35 |
|
Considering the significant nature of this regression, I nominate this issue to
be looked into further for the Darcs 2.0 release.
However, I understand the issue is largely with files *not* managed by darcs,
and this test case is sort of a "worst case scenario", because the tree is large
and no files are managed by darcs when the command is run.
This wishlist item already in the system are related:
issue79: whatsnew -ls loads complete contents of added files into memory
|
msg3747 (view) |
Author: droundy |
Date: 2008-03-04.16:36:49 |
|
I believe this bug may now be fixed. There were some strictness bugs introduced
that led many operations to be O(N) in the length of the history of the repository.
David
|
msg3776 (view) |
Author: wdobler |
Date: 2008-03-05.22:48:27 |
|
I still seem to get the same problem with the latest version obtained via
darcs2 get http://darcs.net/repos/unstable-hashed
about an hour ago. Should the fix be part of that version?
FWIW, the latest patch included is
Wed Mar 5 18:53:45 CET 2008 David Roundy <droundy@darcs.net>
* resolve issue726: truncate progress messages to 80 chars.
|
msg3777 (view) |
Author: droundy |
Date: 2008-03-05.23:47:49 |
|
Oh, I didn't read this very carefully. Does the answer change if you just run
darcs whatsnew -s
?
|
msg3787 (view) |
Author: wdobler |
Date: 2008-03-06.22:36:23 |
|
`darcs2 whatsnew -s' only takes a small fraction of a second.
As does `darcs2 whatsnew' without any options.
|
msg3788 (view) |
Author: wdobler |
Date: 2008-03-06.22:43:04 |
|
The output of `darcs2 w -ls --debug-verbose' is
Beginning identifying repository .
Done identifying repository .
diffing dir...
diffing dir...
..and then nothing until I interrupt (don't have time to wait for or 7 minutes
right now).
|
msg3792 (view) |
Author: droundy |
Date: 2008-03-06.22:54:48 |
|
On Thu, Mar 06, 2008 at 10:36:24PM -0000, Wolfgang Dobler wrote:
> `darcs2 whatsnew -s' only takes a small fraction of a second.
> As does `darcs2 whatsnew' without any options.
In that case, perhaps you can tell me how many files are detected by
whatsnew -ls? Before I can get a clear picture of what's happening on your
system, I have to have some idea what darcs is doing. I presume you have a
huge number of files that aren't added. Are any/many of them large?
--
David Roundy
Department of Physics
Oregon State University
|
msg3813 (view) |
Author: markstos |
Date: 2008-03-07.14:58:26 |
|
If there are lots of added files, than this issue could be the same as
http://bugs.darcs.net/issue79
"whatsnew -ls loads complete contents of added files into memory"
That bug, while important, is not a regression, although perhaps the performance
of how the bug exposes itself might be.
|
msg3816 (view) |
Author: droundy |
Date: 2008-03-07.15:24:17 |
|
No, Mark, it can't be issue79, since that isn't a regression and wdobling is
reporting a major slowdown relative to darcs-1. On this other hand, he hasn't
given us enough information to reproduce this bug.
A quick check involving 600 binary files and 1000 text files adding up to 3.6G
show a small performance regression from 1.1s to 2.5s. In view of that, I'll
relabel this as a performance problem, and remove it from FixForDarcs20.
David
|
msg3841 (view) |
Author: wdobler |
Date: 2008-03-07.23:19:13 |
|
I have played a bit more with the tree, see below.
> In that case, perhaps you can tell me how many files are detected by whatsnew -ls?
6927 files
> I presume you have a huge number of files that aren't added. Are any/many of
them large?
The largest ignored file is ~ 1.6 GB. There's 2 ignored files above 1GB, ~30
above 300kB, ~ 90 above 100kB.
> If there are lots of added files, than this issue could be the same as
http://bugs.darcs.net/issue79
Memory consumption of darcs 2 stagnates around 80kB (VIRT) 56m (RES) 4,2kB
(SHR), which is 3.0% of my RAM (2GB). darcs1 uses a maximum of 1.4%.
I don't think this easily explains a slow-down by ~60.
> A quick check involving 600 binary files and 1000 text files adding up to 3.6G
show a small performance regression from 1.1s to 2.5s.
That's fewer files than I had, but larger total size. So the problem is likely
the number of files, not file size.
The whole issue is highly nonlinear in the size of the tree. I have stripped
down the tree to a few subdirectories that have the largest effect, and now the
slowdown is around 3 to 4 only.
However, if I replicate the main subdirectory (i.e. double the problem), I find
a slowdown around 10.
In fact, you can easily reproduce this, using the darcs unstable repo:
prompt> darcs get http://darcs.net/repos/unstable; cd unstable
prompt> (for darcs in /usr/bin/darcs /usr/bin/darcs /usr/bin/darcs darcs2
darcs2 darcs2 ; do rm -rf _darcs ; $darcs init ; time $darcs w -ls > /dev/null;
done)
0.524u 0.040s 0.56 99.98%
0.512u 0.044s 0.55 99.82%
0.576u 0.068s 0.64 99.52%
0.544u 0.032s 0.57 99.91%
0.460u 0.036s 0.49 100.00%
0.456u 0.040s 0.49 100.00%
Not much of a difference, right? Now we increase the repo size:
prompt> mkdir large; cp -r src large/src1; cp -r src large/src2
and get
0.680u 0.052s 0.73 100.00%
0.472u 0.028s 0.50 99.02%
0.472u 0.012s 0.48 99.79%
1.364u 0.052s 1.42 99.22%
1.236u 0.060s 1.32 98.14%
1.140u 0.036s 1.17 99.71%
Increasing again gives
prompt> mkdir larger; cp -r large larger/large1; cp -r large larger/large2
1.268u 0.064s 1.33 99.80%
0.920u 0.052s 0.97 99.30%
0.944u 0.036s 0.98 99.21%
4.000u 0.104s 4.11 99.79%
4.052u 0.076s 4.14 99.71%
3.964u 0.072s 4.04 99.74%
And after the next iteration:
wdobler@frenesi:~/tmp/unstable> mkdir yetlarger; cp -r larger yetlarger/larger1;
cp -r larger yetlarger/larger2;
2.484u 0.100s 2.59 99.76%
2.360u 0.104s 2.47 99.67%
2.408u 0.060s 2.47 99.79%
17.701u 0.260s 18.11 99.17%
17.833u 0.200s 18.16 99.30%
18.033u 0.224s 18.28 99.87%
|
msg4067 (view) |
Author: wdobler |
Date: 2008-03-28.17:23:54 |
|
So can you reproduce the issue? Is there anything else I can do?
While this is truly a `performance problem' and apparently doesn't bite for
small trees, it is still a regression and I think it would be a pity to put off
users with large trees because of a problem that mostly wasn't there in Darcs 1.x.
|
msg4069 (view) |
Author: markstos |
Date: 2008-03-28.18:11:06 |
|
> While this is truly a `performance problem' and apparently doesn't bite for
> small trees, it is still a regression and I think it would be a pity to put off
> users with large trees because of a problem that mostly wasn't there in Darcs 1.x.
You are right that'd we like this to work. Patches or other
contributions to help the effort along directly or indirectly are welcome.
Mark
|
msg4070 (view) |
Author: droundy |
Date: 2008-03-28.18:15:38 |
|
On Fri, Mar 28, 2008 at 05:23:55PM -0000, Wolfgang Dobler wrote:
> So can you reproduce the issue? Is there anything else I can do?
Yes, I can reproduce this, and I can't think of anything you could do,
unless you're willing to code up a special-case routine for darcs whatsnew
-s.
> While this is truly a `performance problem' and apparently doesn't bite for
> small trees, it is still a regression and I think it would be a pity to put off
> users with large trees because of a problem that mostly wasn't there in Darcs 1.x.
I consider this a not-so-important regression, just because this is an
extreme use case with a very easy workaround: don't use --look-for-adds.
I'm sorry, but optimizing this sort of extreme case just doesn't seem like
a good use of my time. The regression, I'm certain, came about as a result
of a fix for a very subtle and hard-to-track-down sort of bug.
--
David Roundy
Department of Physics
Oregon State University
|
msg4079 (view) |
Author: wdobler |
Date: 2008-03-28.19:59:06 |
|
David Roundy wrote:
> Yes, I can reproduce this, and I can't think of anything you could
> do, unless you're willing to code up a special-case routine for
> darcs whatsnew -s.
If I knew Haskell, I probably would.
> I consider this a not-so-important regression, just because this
> is an extreme use case
Hmm, probably not that extreme. If darcs was written in a language like
Java rather than Haskell (don't panic, this is just a thought
experiment ;-), the source file count would probably be up by a factor
of 5 or so, maybe even more. So the slowdown could be quite significant.
Also, I think that reviewers are likely to use some largish tree as
a data point.
> with a very easy workaround: don't use --look-for-adds.
So you are saying that adding the files wouldn't be that incredibly
slow and I should just get rid of my habit of running `darcs w -ls'
before adding + recording? I'll try this -- knowing such a thing
would already be very helpful.
Is it worth putting this in the manual? I could cook up a
documentation patch at least. Or should this be in the Wiki somewhere
once Darcs2 is out?
> I'm sorry, but optimizing this sort of extreme case just doesn't
> seem like a good use of my time.
I absolutely trust in you finding the optimal priorities for the
darcs project.
> The regression, I'm certain, came about as a result of a fix
> for a very subtle and hard-to-track-down sort of bug.
OK, knowing this also makes it easier to live with the current
state.
W o l f g a n g
|
msg4081 (view) |
Author: droundy |
Date: 2008-03-28.20:38:05 |
|
On Fri, Mar 28, 2008 at 07:59:09PM -0000, Wolfgang Dobler wrote:
> Also, I think that reviewers are likely to use some largish tree as
> a data point.
At this point, I care very little what reviewers say. It'd be nice to not
get more users.
> > with a very easy workaround: don't use --look-for-adds.
>
> So you are saying that adding the files wouldn't be that incredibly
> slow and I should just get rid of my habit of running `darcs w -ls'
> before adding + recording? I'll try this -- knowing such a thing
> would already be very helpful.
> Is it worth putting this in the manual? I could cook up a
> documentation patch at least. Or should this be in the Wiki somewhere
> once Darcs2 is out?
You never said you wanted to add the files to your repository. That would
be another question. I couldn't say how darcs will behave with this many
files in the repository. Certainly adding them will be painful.
--
David Roundy
Department of Physics
Oregon State University
|
msg4084 (view) |
Author: wdobler |
Date: 2008-03-28.21:22:21 |
|
> You never said you wanted to add the files to your repository.
> That would be another question. I couldn't say how darcs will behave
> with this many files in the repository. Certainly adding them will
> be painful.
It wasn't that bad the last time I did so with darcs1. Although, IIRC, I
took a mental note not to add the whole tree at once, but to proceed
one subdirectory at a time.
Just to give you an idea what I am trying to do here: We have this largish
Java project that is under SVN. When I am working on some complex new feature,
I often don't want to commit my wild experiments to the SVN repository.
But I am hooked on Revision Control and terribly afraid of losing something
that worked at one time.
Enter darcs. I `darcs init', adjust the boring file (this is where I use
`whatsnew -ls' a lot), then add everything and record.
Then I work, comitting to darcs every now and then, and only comitting to
the central SVN repo once I think a (sub-)feature makes sense. When I am done
and everything is committed to SVN, I can simply remove the whole _darcs
directory.
This is not the standard darcs user story, but I am convinced that there are
others working the same way. It is sort of messy at times, because I get lots
of upstream changes in between, but I find darcs very useful for this
approach.
For the time being, I'll stick with darcs1 for this sort of thing.
|
msg4085 (view) |
Author: markstos |
Date: 2008-03-28.21:25:44 |
|
> This is not the standard darcs user story, but I am convinced that there are
> others working the same way. It is sort of messy at times, because I get lots
> of upstream changes in between, but I find darcs very useful for this
> approach.
I sometimes use darcs in a similar ad-hoc way, but for smaller projects
where the issue is not triggered.
Mark
|
msg6548 (view) |
Author: ganesh |
Date: 2008-10-31.08:38:56 |
|
I've been doing some investigation of this bug. So far I've focused on the
situation where there are a lot of files in a single directory, though much of
the work I've done there may well apply to your directory structures too.
I've made several speedups, including removing several quadratic slowdowns,
though none of them have made it into mainline darcs yet due to a backlog from
the darcs sprint (which is where I started this work too).
In the meantime, if you feel like trying the branch at
http://code.haskell.org/darcs/sprint-slurpy/, I'd be interested to hear how it
behaves for you.
It'd also be useful for me to have a synthesised tree as you offer in your
original report, though I suspect that testing with very artificial trees as
I've been doing so far is probably good enough.
|
msg6556 (view) |
Author: wdobler |
Date: 2008-11-02.22:30:47 |
|
Ganesh, your patches are definitively a huge step forwards.
But before showing the figures, I must mention that the extreme slowdown I
reported... has gone. And I don't know why.
I now get about 20 seconds, rather than 6 or 7 minutes for the same repo. What
has changed on the system is
(a) an upgrade from Ubuntu Dapper (6.06) to Hardy (8.04), which obviously
changed many libraries, and
(b) the file system changed from ext3 to xfs.
At some point, I though that the problem was probably related to the GHC bug
that led to slow execution of some darcs commands, but I think I am still using
the same versions as before, so that doesn't make that much sense.
Anyway, here are the numbers I get on the tree in question, measured using
rm -rf _darcs ; darcs init ; time darcs w -ls > /dev/null
three times and reporting only the best timing (they were all comparable).
A. On a desktop machine with reasonably fast hard drive
1.0.9 (release):
11.052u 0.304s 0:11.35 100.0% 0+0k 0+8io 0pf+0w
2.0.0 (unknown):
21.417u 0.424s 0:21.86 99.8% 0+0k 0+0io 0pf+0w
2.1.0 (release):
25.097u 1.488s 0:25.23 105.3% 0+0k 0+0io 0pf+0w
2.1.0 (+ 276 patches):
5.380u 0.244s 0:05.51 101.9% 0+0k 0+0io 0pf+0w
B. On a laptop
1.0.9 (release)
26.393u 0.780s 0:27.19 99.9% 0+0k 0+8io 0pf+0w
2.0.1rc2 (2.0.1rc2 (+ -1 patch)):
90.353u 2.276s 1:31.67 101.0% 0+0k 0+0io 0pf+0w
2.1.0 (release):
89.977u 2.860s 1:31.98 100.9% 0+0k 0+8io 0pf+0w
2.1.0 (+ 276 patches):
12.732u 1.060s 0:12.99 106.1% 0+0k 0+8io 0pf+0w
So your patches really save the day, making `darcs w -ls' faster by a factor of
between close to 5 and 7!
|
msg6578 (view) |
Author: ganesh |
Date: 2008-11-03.22:59:43 |
|
Wolfgang: great to know it helps in your situation. Most/all of the work is in
unstable now, btw.
All: Should we close this bug? We've certainly stopped whatsnew -ls being "very
slow", but David did allude to making a fast path for whatsnew -s that I didn't
quite understand the details of. Perhaps we should open a new ticket for that if
we can record what would need to be done.
|
msg6591 (view) |
Author: markstos |
Date: 2008-11-04.15:48:32 |
|
Marking as "resolved-in-unstable" as ganesh reports that it is.
Feel free to re-open if you see the issue is still present.
|
|
Date |
User |
Action |
Args |
2008-02-27 21:55:21 | wdobler | create | |
2008-02-27 21:57:23 | wdobler | set | nosy:
droundy, tommy, beschmi, kowey, wdobler title: Darcs2 performance regression with `darcs whatis' -> Darcs2 performance regression with `darcs whatsnew' |
2008-02-28 03:52:42 | markstos | set | status: unread -> unknown topic:
+ Target-2.0 messages:
+ msg3684 nosy:
+ markstos |
2008-03-04 16:36:51 | droundy | set | status: unknown -> resolved nosy:
droundy, tommy, beschmi, kowey, markstos, wdobler messages:
+ msg3747 |
2008-03-05 22:48:28 | wdobler | set | status: resolved -> unknown nosy:
droundy, tommy, beschmi, kowey, markstos, wdobler messages:
+ msg3776 |
2008-03-05 23:47:50 | droundy | set | nosy:
droundy, tommy, beschmi, kowey, markstos, wdobler messages:
+ msg3777 |
2008-03-06 22:36:24 | wdobler | set | nosy:
droundy, tommy, beschmi, kowey, markstos, wdobler messages:
+ msg3787 |
2008-03-06 22:43:05 | wdobler | set | nosy:
droundy, tommy, beschmi, kowey, markstos, wdobler messages:
+ msg3788 |
2008-03-06 22:54:49 | droundy | set | nosy:
droundy, tommy, beschmi, kowey, markstos, wdobler messages:
+ msg3792 |
2008-03-07 14:58:27 | markstos | set | nosy:
droundy, tommy, beschmi, kowey, markstos, wdobler messages:
+ msg3813 |
2008-03-07 15:24:20 | droundy | set | topic:
- Regression, Target-2.0 nosy:
droundy, tommy, beschmi, kowey, markstos, wdobler messages:
+ msg3816 title: Darcs2 performance regression with `darcs whatsnew' -> Darcs2 performance problem with `darcs whatsnew' |
2008-03-07 23:19:15 | wdobler | set | nosy:
droundy, tommy, beschmi, kowey, markstos, wdobler messages:
+ msg3841 |
2008-03-28 17:23:55 | wdobler | set | nosy:
droundy, tommy, beschmi, kowey, markstos, wdobler messages:
+ msg4067 |
2008-03-28 18:11:08 | markstos | set | nosy:
droundy, tommy, beschmi, kowey, markstos, wdobler messages:
+ msg4069 |
2008-03-28 18:15:41 | droundy | set | nosy:
droundy, tommy, beschmi, kowey, markstos, wdobler messages:
+ msg4070 |
2008-03-28 19:59:09 | wdobler | set | nosy:
droundy, tommy, beschmi, kowey, markstos, wdobler messages:
+ msg4079 |
2008-03-28 20:38:07 | droundy | set | nosy:
droundy, tommy, beschmi, kowey, markstos, wdobler messages:
+ msg4081 |
2008-03-28 21:22:22 | wdobler | set | nosy:
droundy, tommy, beschmi, kowey, markstos, wdobler messages:
+ msg4084 |
2008-03-28 21:25:45 | markstos | set | nosy:
droundy, tommy, beschmi, kowey, markstos, wdobler messages:
+ msg4085 |
2008-05-14 16:56:32 | kowey | set | nosy:
+ dagit title: Darcs2 performance problem with `darcs whatsnew' -> whatsnew -ls is very slow (note the --look-for-adds) |
2008-10-31 08:38:58 | ganesh | set | status: unknown -> has-patch nosy:
+ dmitry.kurochkin, ganesh, simon, thorkilnaur messages:
+ msg6548 assignedto: ganesh |
2008-11-02 22:30:50 | wdobler | set | nosy:
droundy, tommy, beschmi, kowey, markstos, dagit, ganesh, simon, thorkilnaur, wdobler, dmitry.kurochkin messages:
+ msg6556 |
2008-11-03 22:59:45 | ganesh | set | nosy:
droundy, tommy, beschmi, kowey, markstos, dagit, ganesh, simon, thorkilnaur, wdobler, dmitry.kurochkin messages:
+ msg6578 |
2008-11-04 15:48:34 | markstos | set | status: has-patch -> resolved-in-unstable nosy:
droundy, tommy, beschmi, kowey, markstos, dagit, ganesh, simon, thorkilnaur, wdobler, dmitry.kurochkin messages:
+ msg6591 |
2009-04-22 03:28:21 | twb | set | status: resolved-in-unstable -> resolved nosy:
droundy, tommy, beschmi, kowey, markstos, dagit, ganesh, simon, thorkilnaur, wdobler, dmitry.kurochkin |
2009-08-06 17:55:08 | admin | set | nosy:
+ jast, Serware, darcs-devel, zooko, mornfall, - droundy, ganesh, wdobler |
2009-08-06 20:59:04 | admin | set | nosy:
- beschmi |
2009-08-10 22:16:22 | admin | set | nosy:
+ wdobler, ganesh, - darcs-devel, zooko, jast, Serware, mornfall |
2009-08-11 00:07:47 | admin | set | nosy:
- dagit |
2009-08-25 17:33:10 | admin | set | nosy:
+ darcs-devel, - simon |
2009-08-27 14:18:02 | admin | set | nosy:
tommy, kowey, markstos, darcs-devel, ganesh, thorkilnaur, wdobler, dmitry.kurochkin |
|