Issue 2405 patch index creation consumes too much memory

Title patch index creation consumes too much memory
Priority Status resolved
Milestone 2.10.0 Resolved in
Superseder Nosy List alex.aegf, beschmi, bsrkaditya, darcs-devel, gh
Assigned To
Topics PatchIndex, Performance

Created on 2014-07-30.20:10:49 by gh, last changed 2015-02-12.17:47:24 by gh.

msg17636 (view) Author: gh Date: 2014-07-30.20:10:46
I obtained the following by running `darcs optimize enable-patch-index`
on a local mirror of http://darcs.net .

With darcs built with optimizations (ie normally):


Without optimizations (the command terminates with a stack overflow,
which made me discover the problem):


To generate these files, compile darcs with profiling enabled, then
delete _darcs/patch_index , then run `darcs optimize enable-patch-index
 +RTS -xt -hy`, then `hp2ps -c darcs.hp && ps2pdf darcs.ps && evince
darcs.pdf` .

It would be nice to see where the main memory consumption comes from,
and reduce it.
msg17652 (view) Author: gh Date: 2014-08-08.20:24:00
I used ghc 7.6.3 to obtain the above data and I've been told that 7.8
was nicer with regards to stack overflows. Anyway memory consumption
would remain high.
msg17753 (view) Author: gh Date: 2014-11-05.20:36:02
More data:

msg17755 (view) Author: gh Date: 2014-11-06.22:16:27
The simplest way to check memory usage is to compile darcs with the flag
rts:   cabal install -frts

Then run with darcs optimize enable-patch-index +RTS -hT

And look into the generated memory report: hp2ps -c darcs.hp && ps2pdf
darcs.ps && evince darcs.pdf

Or generate runtime statistics with +RTS -s .
msg17800 (view) Author: ganesh Date: 2014-11-15.12:59:27
I've been playing around with this a bit, informed by some of the things Guillaume 

Broadly it seems that the problem is that the entire list of [(PatchId, [PatchMod 
FileName])] is being constructed and then used to build the PatchIndex, rather than 
being produced and consumed incrementally.

I think that the list of patches from the repository is itself parsed lazily (otherwise 
all darcs commands that operate on PatchSet would read the entire set into memory), so 
the problem must be coming from the patch index code itself.

I can see a few potential reasons:
 - the set of FileNames that patches2PatchMods passes around and the related use of 
mapAccumL, which may be introducing an unintentional dependency on calculating the 
entire output list before producing any results
 - the two uses of pmods in createPatchIndexFrom (pids and mods) which may mean it's 
kept in memory for the second use until the first one is done
 - the nubSeq in applyPatchMods and the way the state monad is used - these *should* be 
incremental in themselves, but it makes it a bit hard to untangle the logic

Logically, it should always be possible to build the patch index up one patch at a time 
- otherwise updatePatchIndex wouldn't work properly. However the code isn't quite 
structured that way, so I'd like to simplify it to bring us closer to that point.

Another way of improving memory usage substantially might be to make sure that the 
strings and bytestrings in the list are all shared with each other via hash-
consing/interning. For example lots of filenames must be repeated a lot. However this 
seems like a bit fiddly to implement (you need to manage a cache) and might interfere 
with laziness.
msg18088 (view) Author: gh Date: 2015-02-12.17:47:23
I'm going to accept a series of patches by Ganesh and me that solve the
memory leak, so this issue will be fixed.

This does not guarantee that it will be activated by default in 2.10
since it uses up time to build anyway, but I'll open another ticket for
that topic.
Date User Action Args
2014-07-30 20:10:49ghcreate
2014-07-30 20:11:11ghsetnosy: + darcs-devel
2014-07-30 20:11:23ghsetnosy: + bsrkaditya
2014-08-08 20:24:01ghsetmessages: + msg17652
2014-11-05 20:36:03ghsetmessages: + msg17753
2014-11-06 21:37:06ghsetnosy: + beschmi
assignedto: alex.aegf ->
2014-11-06 22:16:29ghsetmessages: + msg17755
2014-11-15 12:59:30ganeshsetmessages: + msg17800
2014-11-15 14:31:13ganeshlinkpatch1224 issues
2014-11-24 18:51:18ganeshlinkpatch1234 issues
2014-12-12 17:25:14ghsetstatus: unknown -> has-patch
2015-02-12 17:47:24ghsetstatus: has-patch -> resolved
messages: + msg18088