darcs

Patch 1567 two improvements for annotate command

Title two improvements for annotate command
Superseder Nosy List bf
Related Issues
Status accepted Assigned To
Milestone

Created on 2017-08-09.11:12:41 by bf, last changed 2017-08-18.19:33:56 by gh.

Files
File name Status Uploaded Type
plugging-memory-leak-in-annotate.dpatch bf, 2017-08-09.11:12:39 text/x-darcs-patch
See mailing list archives for discussion on individual patches.
Messages
msg19516 (view) Author: bf Date: 2017-08-09.11:12:39
Separating these two improvements to the annotate command from
https://hub.darcs.net/bf/darcs-encoding.

2 patches for repository /home/franksen/src/darcs/tmp:

patch 6c5d5b7ec2ce1cf805e489d02279412c739ed2a4
Author: Ben Franksen <ben.franksen@online.de>
Date:   Thu Apr 27 18:38:13 CEST 2017
  * plugging memory leak in annotate

patch 5ab1a2344812c29660ef4c6a6267f3a55a637d9f
Author: Ben Franksen <ben.franksen@online.de>
Date:   Fri Apr 28 16:55:39 CEST 2017
  * improve efficiency of annotate
  
  The patch index makes finding the relevant patches quite efficient,
  but when a patch contains many hunks, the previous implementation
  took a very long time to finish. This was because it naively applied
  every hunk and then used diff to see which lines were changed.
  
  The new implementation directly observes the effect of patches on the
  Annotated structure using the new Annotate patch interface (type class).
  This also makes attribution of lines to patches more accurate, since the
  result no longer depends on the diff algorithm that annotate used
  internally.
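
The idea of observing patch effects directly can be sketched as follows. This is a minimal illustration, not the actual darcs Annotate interface; the types and names (Annotated, Hunk, applyHunk) are hypothetical. Each hunk updates the per-line attribution state in place, so no diff is needed afterwards:

```haskell
import qualified Data.Map.Strict as M

type PatchId = String

-- Hypothetical state: line number -> (content, patch that introduced it).
type Annotated = M.Map Int (String, PatchId)

-- A hunk: at line n, remove `hunkOld` lines and insert `hunkNew` lines.
data Hunk = Hunk { hunkLine :: Int, hunkOld :: Int, hunkNew :: [String] }

-- Observe the effect of a hunk on the annotated state: removed lines
-- disappear, inserted lines are attributed to the current patch, and
-- later lines are shifted accordingly.
applyHunk :: PatchId -> Hunk -> Annotated -> Annotated
applyHunk pid (Hunk n old new) ann =
    M.union before (M.union inserted (M.mapKeys shift after))
  where
    before   = M.filterWithKey (\k _ -> k < n) ann
    after    = M.filterWithKey (\k _ -> k >= n + old) ann
    shift k  = k - old + length new
    inserted = M.fromList [ (n + i, (l, pid)) | (i, l) <- zip [0..] new ]
```

Attribution falls directly out of the state update, independently of any diff algorithm.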
Attachments
msg19543 (view) Author: gh Date: 2017-08-09.18:45:36
I compiled darcs with RTS options enabled and compared the outputs of
"darcs annotate darcs.cabal +RTS -s" before and after these patches.

There is no change with the patch "plugging memory leak in annotate", so
I wonder what it is really plugging?

OTOH the second patch brings a nice improvement. We go:

* from "2,314,884,184 bytes allocated in the heap"
* to   "746,047,704 bytes allocated in the heap", and
* from "Total   time    1.340s"
* to   "Total   time    0.752s"

There is one statistic that I do not know what to make of; we go:

* from "28 MB total memory in use (0 MB lost due to fragmentation)"
* to   "37 MB total memory in use (0 MB lost due to fragmentation)"

So the peak memory use (as explained in [1]) has risen. I can accept
this anyway since the running time is better, which is good for most users.

Is this caused by the fact that we have more function calls? Do you know
a way to reduce this peak memory use? Would this be a problem on
machines with low ram?

[1]
https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/runtime_control.html#rts-options-to-produce-runtime-statistics
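
(As an aside, GHC also exposes these figures programmatically via GHC.Stats, which can help when comparing runs; this only works when the program is started with +RTS -T. A minimal sketch:)

```haskell
import GHC.Stats (getRTSStats, getRTSStatsEnabled, RTSStats(..))

-- Print peak memory figures from the RTS, roughly corresponding to the
-- "maximum residency" and "total memory in use" lines of +RTS -s.
printPeakMemory :: IO ()
printPeakMemory = do
  enabled <- getRTSStatsEnabled
  if enabled
    then do
      s <- getRTSStats
      putStrLn $ "max live bytes:   " ++ show (max_live_bytes s)
      putStrLn $ "max mem in use:   " ++ show (max_mem_in_use_bytes s)
    else putStrLn "RTS stats not enabled; run with +RTS -T"
```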
msg19548 (view) Author: bf Date: 2017-08-10.16:56:57
I swear I have seen the leak, to the point where the version without the
extra strictness crashed with an out-of-memory exception, whereas with
the change it worked. But I can no longer reproduce it, at least not on
the machine I am at ATM. Will try with my laptop at home when I'm there.

I will investigate the issue with the increased peak memory use when I
have solved the first riddle...
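
(For context, the kind of leak that extra strictness plugs can be shown in isolation; this is a generic Haskell sketch, not the darcs code. A lazy left fold accumulates one thunk per element, which on a large input can exhaust memory, while the strict variant forces the accumulator at every step:)

```haskell
import Data.List (foldl')

-- Lazy left fold: builds a chain of (+) thunks, one per element, which
-- lives on the heap until the final result is demanded.
lazySum :: [Int] -> Int
lazySum = foldl (+) 0

-- Strict left fold: forces the accumulator at every step, so the
-- partial sum stays a single evaluated Int.
strictSum :: [Int] -> Int
strictSum = foldl' (+) 0
```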
msg19549 (view) Author: bf Date: 2017-08-11.13:35:51
I was not able to reproduce a decisive difference in memory consumption
between before and after patch 'plugging memory leak in annotate'. I
would rebase this patch if it weren't already in screened.

Re peak memory increase: I have tested this with a somewhat more extreme
example repo. This repo contains about 600 patches that all modify a
single file (~15000 lines of json). This repo was my initial motivation
for re-implementing annotate. In the following I have elided irrelevant
parts of the RTS output.

Before:

  10,397,285,400 bytes allocated in the heap
  26,784,355,256 bytes copied during GC
   1,038,980,636 bytes maximum residency (217 sample(s))
      23,257,040 bytes maximum slop
            2607 MB total memory in use (589 MB lost due to fragmentation)

  MUT     time   54.687s  ( 76.714s elapsed)
  GC      time   57.521s  ( 63.233s elapsed)
  Total   time  112.449s  (140.173s elapsed)

After:

   5,188,947,404 bytes allocated in the heap
   3,974,428,900 bytes copied during GC
     408,305,932 bytes maximum residency (34 sample(s))
       7,009,744 bytes maximum slop
             910 MB total memory in use (0 MB lost due to fragmentation)

  MUT     time    4.884s  (  5.171s elapsed)
  GC      time   11.183s  ( 11.184s elapsed)
  Total   time   16.087s  ( 16.358s elapsed)

All memory consumption is reduced significantly, including peak.

With your test example (darcs.cabal) I see that peak memory as well as
'maximum residency' is slightly increased. I am pretty sure this is an
artefact caused by low memory pressure together with a faster program
(thus fast allocation rate). When I limit the heap using +RTS -M22M I
get smaller values for peak as well as maximum residency for the second
version (after applying the two patches).
msg19606 (view) Author: gh Date: 2017-08-18.19:33:55
OK, these figures look nice, thanks! I'm not sure I want to roll back the
first patch, especially since issue2243 is fixed by the present bundle.
I'm accepting it.

(Next time let us try to provide proof of the performance improvement at
the moment we send patch bundles for the first time.)
History
Date User Action Args
2017-08-09 11:12:41bfcreate
2017-08-09 11:12:57bfsetstatus: needs-screening -> needs-review
2017-08-09 18:45:36ghsetmessages: + msg19543
2017-08-10 16:56:57bfsetmessages: + msg19548
2017-08-11 13:35:52bfsetmessages: + msg19549
2017-08-18 19:33:56ghsetstatus: needs-review -> accepted
messages: + msg19606