I've been playing around with this a bit, informed by some of the things Guillaume
discovered.
Broadly, it seems that the problem is that the entire [(PatchId, [PatchMod
FileName])] list is constructed up front and only then used to build the PatchIndex,
rather than being produced and consumed incrementally.
I think that the list of patches from the repository is itself parsed lazily (otherwise
all darcs commands that operate on a PatchSet would read the entire set into memory),
so the problem must lie in the patch index code itself.
I can see a few potential reasons:
- the set of FileNames that patches2PatchMods passes around and the related use of
mapAccumL, which may be introducing an unintentional dependency on calculating the
entire output list before producing any results
- the two uses of pmods in createPatchIndexFrom (pids and mods), which may mean the
whole list is retained in memory for the second use while the first traversal runs
- the nubSeq in applyPatchMods and the way the state monad is used - these *should* be
incremental in themselves, but they make the logic a bit hard to untangle
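To illustrate the mapAccumL point with a toy example (this uses a plain Set of Strings as a stand-in for the FileName set, not the actual patches2PatchMods code): the output list of mapAccumL can be consumed lazily, but the final accumulator is only available after traversing the whole input, so any consumer that demands it defeats incremental processing.

```haskell
import Data.List (mapAccumL)
import qualified Data.Set as S

-- Hypothetical simplification of threading a set of seen names
-- through the list, as patches2PatchMods does with its FileNames.
annotate :: [String] -> (S.Set String, [(String, Bool)])
annotate = mapAccumL step S.empty
  where
    step seen name = (S.insert name seen, (name, name `S.member` seen))

-- The output list is produced lazily: taking a prefix only walks
-- that prefix of the input, so this terminates despite the
-- undefined tail ...
prefix :: [(String, Bool)]
prefix = take 2 (snd (annotate ("a" : "b" : "a" : undefined)))

-- ... but demanding the final accumulator (fst of the result) would
-- force the entire input, which is where streaming can break down.
```

So whether mapAccumL itself is the problem depends on whether anything downstream touches the final accumulator.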
Logically, it should always be possible to build the patch index up one patch at a time
- otherwise updatePatchIndex wouldn't work properly. However the code isn't quite
structured that way, so I'd like to simplify it to bring us closer to that point.
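As a rough sketch of the shape I have in mind (with made-up stand-ins for PatchId, PatchMod and the index type, not the real darcs definitions), building the index with a single strict left fold consumes the list one patch at a time, and computing everything needed in that one pass avoids retaining the list for a second traversal:

```haskell
import Data.List (foldl')
import qualified Data.Map.Strict as M

-- Hypothetical, simplified stand-ins for the real darcs types.
type PatchId  = String
type FileName = String
data PatchMod a = Touch a deriving (Eq, Show)  -- real type has more cases
type PatchIndex = M.Map FileName [PatchId]     -- file -> touching patches

-- Consume the (PatchId, [PatchMod FileName]) list with a strict left
-- fold, one patch at a time, so earlier cells can be collected as we go.
buildIndex :: [(PatchId, [PatchMod FileName])] -> PatchIndex
buildIndex = foldl' addPatch M.empty

-- The per-patch step has the same shape that updatePatchIndex needs.
addPatch :: PatchIndex -> (PatchId, [PatchMod FileName]) -> PatchIndex
addPatch idx (pid, mods) = foldl' record idx mods
  where
    record m (Touch f) = M.insertWith (++) f [pid] m
```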
Another way of improving memory usage substantially might be to make sure that the
strings and bytestrings in the list are all shared with each other via hash-
consing/interning. For example, lots of filenames must be repeated a lot. However this
seems a bit fiddly to implement (you need to manage a cache) and might interfere
with laziness.
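For what it's worth, a minimal sketch of such an interning pass might look like the following (hypothetical helper names; the real version would presumably work on ByteStrings rather than Strings):

```haskell
import qualified Data.Map.Strict as M

-- Hypothetical interning step: the cache maps each value to the first
-- copy of it we saw, so equal filenames in the output share one heap
-- object instead of being duplicated.
intern :: Ord a => M.Map a a -> a -> (M.Map a a, a)
intern cache x = case M.lookup x cache of
  Just shared -> (cache, shared)          -- reuse the cached copy
  Nothing     -> (M.insert x x cache, x)  -- first sighting: cache it

-- Thread the cache down the list. Note the downside: the cache pins
-- every distinct value for the whole traversal, which is part of the
-- management cost mentioned above.
internAll :: Ord a => [a] -> [a]
internAll = go M.empty
  where
    go _ []         = []
    go cache (x:xs) = let (cache', x') = intern cache x
                      in x' : go cache' xs
```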