darcs

Issue 1762 memory leak in whatsnew --look-for-adds

Title: memory leak in whatsnew --look-for-adds
Priority: bug
Status: duplicate
Milestone:
Resolved in:
Superseder: issue 1746, whatsnew -l reading files it should not (2.4)
Nosy List: darcs-devel, dmitry.kurochkin, eivuokko, hoijarvi, jaredj, kirby, kowey, mornfall, quick, wglozer
Assigned To: mornfall
Topics: Performance, Regression

Created on 2010-03-10.19:17:46 by hoijarvi, last changed 2010-03-22.12:25:48 by kowey.

Messages
msg10155 Author: hoijarvi Date: 2010-03-10.19:17:41
darcs whatsnew -s --look-for-adds

accumulates memory and may exhaust it.

this repository used to work fine:
30 MB pristine 
4 MB of patches
half million boring files
format = darcs 1

I don't have darcs-2 repositories this big for testing right now.

This kills TortoiseDarcs.
msg10156 Author: kowey Date: 2010-03-10.19:24:26
Kari: could I confirm the format?  Is it old-fashioned or hashed?

Thanks! (I suspect this is not limited to Windows; I just experienced
some whatsnew -sl pain on Linux with a symlinked directory that's pretty
huge)
msg10158 Author: hoijarvi Date: 2010-03-10.20:17:50
It's old-fashioned. Created with darcs 1.0.9 and has been working fine
with up to 2.3

I'm not converting every repo at once; this is one of my main ones, so
I'm starting with the less critical ones.
msg10161 Author: kowey Date: 2010-03-11.11:58:15
Would it be possible to see what happens performance-wise when you
upgrade the repository to darcs-1 hashed format (darcs optimize --upgrade)?

If you're reluctant to make a copy of your gigantic working dir, it
should suffice to back up just the _darcs dir.
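
A minimal sketch of the backup step suggested above, assuming the repository keeps all of its darcs state under _darcs. The function name backup_darcs_dir and the _darcs.backup target name are made up for illustration; the upgrade itself (darcs optimize --upgrade) would be run separately afterwards and is not part of the sketch.

```python
import shutil
import tempfile
from pathlib import Path


def backup_darcs_dir(repo: Path, backup_root: Path) -> Path:
    """Copy repo/_darcs to backup_root/_darcs.backup and return the copy's path.

    Both the function name and the backup directory name are hypothetical.
    """
    source = repo / "_darcs"
    if not source.is_dir():
        raise FileNotFoundError(f"no _darcs directory under {repo}")
    target = backup_root / "_darcs.backup"
    # copytree raises FileExistsError if the target already exists,
    # so an earlier backup is never silently overwritten.
    shutil.copytree(source, target)
    return target


if __name__ == "__main__":
    # Demonstrate on a throwaway directory tree, not on a real repository.
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp) / "repo"
        (repo / "_darcs" / "prefs").mkdir(parents=True)
        (repo / "_darcs" / "placeholder").write_text("not real darcs data\n")
        copy = backup_darcs_dir(repo, Path(tmp))
        print(sorted(p.name for p in copy.iterdir()))
```

Only the _darcs directory is copied, so the half million working files are never touched; restoring means replacing _darcs with the copy.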
msg10166 Author: hoijarvi Date: 2010-03-11.15:48:58
One more thing: those half million files were NOT marked boring, they
should have been. They're cached .png images. So there's more stuff in
this update. Please read carefully. 

I ran the convert. The format file now contains 

hashed
darcs-2

When I run darcs 2.3.1 and 2.4 in parallel, in the beginning 2.4 memory
consumption grows slowly like in 2.3.1 or 1.0.9. After a while, 2.4
starts to consume memory very rapidly, until it runs out of memory. In
smaller repositories it actually finishes, so I assume it is not an
infinite recursion.

The memory consumption for darcs 2.3.1 is also big, 600 MB, and CPU time
excessive: 16 minutes. 1.0.9 is way slower, 350 MB, 35 minutes and still
hasn't finished.

Then I marked .png files as boring. The result is as expected: 2.3.1 uses
only 50 MB of memory and finishes in one minute CPU. 1.0.9 is similar.

BUT: 2.4 still crashes with out of memory. I can understand that 2.3
didn't store half a million filenames and therefore didn't need as much
memory. But this happens even when the files are marked boring and
should simply be ignored.
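
To illustrate the expectation described above, here is a rough model of how boring patterns should shrink the set of files that --look-for-adds has to consider. It is only a sketch, not darcs code: the function add_candidates and the sample file names are made up, and it assumes boring patterns are regular expressions matched against paths, as in _darcs/prefs/boring.

```python
import re
from typing import Iterable, List


def add_candidates(paths: Iterable[str], boring_patterns: Iterable[str]) -> List[str]:
    """Return the paths that match none of the boring regular expressions.

    Hypothetical helper: a simplified model of boring-file filtering.
    """
    compiled = [re.compile(pattern) for pattern in boring_patterns]
    return [path for path in paths if not any(rx.search(path) for rx in compiled)]


if __name__ == "__main__":
    # Made-up working-tree listing, standing in for the cached images.
    paths = ["src/Main.hs", "cache/tile_0001.png", "cache/tile_0002.png", "README"]
    # A pattern of the kind one might add to the boring file to skip .png files.
    boring = [r"\.png$"]
    print(add_candidates(paths, boring))
```

In this model a boring path is dropped as soon as its name matches and nothing about it needs to be kept, which is why memory use would be expected to stay low once the .png files are marked boring.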
msg10167 Author: kowey Date: 2010-03-11.15:56:48
Thanks, Kari.

I think we now have enough info for me to pass this on to Petr to have a
look (now that we know what happens with a hashed repository).

BTW, you can get a lot of benefits by just upgrading to the darcs-1
hashed format; no need to jump to darcs-2 for this sort of performance
problem.  Unfortunately, we haven't been very effective at
communicating this yet.  The end result is lots of folks needlessly
going through the inconvenience of running "darcs convert" when
"darcs optimize --upgrade" would really have gotten them quite far. :-(
msg10168 Author: quick Date: 2010-03-11.16:18:28
Sounds similar to Issue 1746.

-- 
-KQ
msg10408 Author: kowey Date: 2010-03-22.12:25:44
This does look like it might be a duplicate of issue1746 - performance
regression in which whatsnew --summary does too much work.  

Thanks, Kevin!  This sort of thing helps me keep a handle on
the BTS.
History
Date                 User      Action  Args
2010-03-10 19:17:47  hoijarvi  create
2010-03-10 19:24:32  kowey     set     status: unknown -> waiting-for
                                       nosy: + kowey
                                       topic: + Performance, - Windows
                                       messages: + msg10156
                                       assignedto: hoijarvi
2010-03-10 20:17:52  hoijarvi  set     messages: + msg10158
2010-03-11 11:58:17  kowey     set     messages: + msg10161
2010-03-11 15:49:01  hoijarvi  set     messages: + msg10166
2010-03-11 15:56:50  kowey     set     status: waiting-for -> needs-reproduction
                                       nosy: + mornfall
                                       messages: + msg10167
                                       assignedto: hoijarvi -> mornfall
2010-03-11 16:18:30  quick     set     nosy: + quick
                                       messages: + msg10168
2010-03-22 12:25:48  kowey     set     status: needs-reproduction -> duplicate
                                       messages: + msg10408
                                       superseder: + whatsnew -l reading files it should not (2.4)