darcs

Patch 1567 two improvements for annotate command

Title two improvements for annotate command
Superseder Nosy List bf
Related Issues
Status accepted Assigned To
Milestone

Created on 2017-08-09.11:12:41 by bf, last changed 2017-08-18.19:33:56 by gh.

Files
File name Status Uploaded Type
plugging-memory-leak-in-annotate.dpatch bf, 2017-08-09.11:12:39 text/x-darcs-patch
See mailing list archives for discussion on individual patches.
Messages
msg19516 (view) Author: bf Date: 2017-08-09.11:12:39
Separating these two improvements to the annotate command from
https://hub.darcs.net/bf/darcs-encoding.

2 patches for repository /home/franksen/src/darcs/tmp:

patch 6c5d5b7ec2ce1cf805e489d02279412c739ed2a4
Author: Ben Franksen <ben.franksen@online.de>
Date:   Thu Apr 27 18:38:13 CEST 2017
  * plugging memory leak in annotate

patch 5ab1a2344812c29660ef4c6a6267f3a55a637d9f
Author: Ben Franksen <ben.franksen@online.de>
Date:   Fri Apr 28 16:55:39 CEST 2017
  * improve efficiency of annotate
  
  The patch index makes finding the relevant patches quite efficient,
  but when a patch contains many hunks, the previous implementation
  took a very long time to finish. This was because it naively applied
  every hunk and then used diff to see which lines were changed.
  
  The new implementation directly observes the effect of patches on the
  Annotated structure using the new Annotate patch interface (type class).
  This also makes attribution of lines to patches more accurate, since the
  result no longer depends on the diff algorithm that annotate used
  internally.
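
The idea of observing patch effects directly can be sketched as follows. This is a minimal illustration, not the actual darcs Annotate interface; the types and names (Annotated, Hunk, applyHunk) are hypothetical. Each hunk updates the per-line attribution state in place, so no diff is needed afterwards:

```haskell
import qualified Data.Map.Strict as M

type PatchId = String

-- Hypothetical state: line number -> (content, patch that introduced it).
type Annotated = M.Map Int (String, PatchId)

-- A hunk: at line n, remove `hunkOld` lines and insert `hunkNew` lines.
data Hunk = Hunk { hunkLine :: Int, hunkOld :: Int, hunkNew :: [String] }

-- Observe the effect of a hunk on the annotated state: removed lines
-- disappear, inserted lines are attributed to the current patch, and
-- later lines are shifted accordingly.
applyHunk :: PatchId -> Hunk -> Annotated -> Annotated
applyHunk pid (Hunk n old new) ann =
    M.union before (M.union inserted (M.mapKeys shift after))
  where
    before   = M.filterWithKey (\k _ -> k < n) ann
    after    = M.filterWithKey (\k _ -> k >= n + old) ann
    shift k  = k - old + length new
    inserted = M.fromList [ (n + i, (l, pid)) | (i, l) <- zip [0..] new ]
```

Attribution falls directly out of the state update, independently of any diff algorithm.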
Attachments
msg19543 (view) Author: gh Date: 2017-08-09.18:45:36
I compiled darcs with RTS options enabled and compared the outputs of
"darcs annotate darcs.cabal +RTS -s" before and after these patches.

There is no change with the patch "plugging memory leak in annotate", so
I wonder what it is really plugging?

OTOH the second patch brings a nice improvement. We go:

* from "2,314,884,184 bytes allocated in the heap"
* to   "746,047,704 bytes allocated in the heap", and
* from "Total   time    1.340s"
* to   "Total   time    0.752s"

There is one statistic that I do not know what to make of; we go:

* from "28 MB total memory in use (0 MB lost due to fragmentation)"
* to   "37 MB total memory in use (0 MB lost due to fragmentation)"

So the peak memory use (as explained in [1]) has risen. I can accept
this anyway since the running time is better, which is good for most users.

Is this caused by the fact that we have more function calls? Do you know
a way to reduce this peak memory use? Would this be a problem on
machines with low ram?

[1]
https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/runtime_control.html#rts-options-to-produce-runtime-statistics
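
(As an aside, GHC also exposes these figures programmatically via GHC.Stats, which can help when comparing runs; this only works when the program is started with +RTS -T. A minimal sketch:)

```haskell
import GHC.Stats (getRTSStats, getRTSStatsEnabled, RTSStats(..))

-- Print peak memory figures from the RTS, roughly corresponding to the
-- "maximum residency" and "total memory in use" lines of +RTS -s.
printPeakMemory :: IO ()
printPeakMemory = do
  enabled <- getRTSStatsEnabled
  if enabled
    then do
      s <- getRTSStats
      putStrLn $ "max live bytes:   " ++ show (max_live_bytes s)
      putStrLn $ "max mem in use:   " ++ show (max_mem_in_use_bytes s)
    else putStrLn "RTS stats not enabled; run with +RTS -T"
```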
msg19548 (view) Author: bf Date: 2017-08-10.16:56:57
I swear I have seen the leak, to the point where the version without the
extra strictness crashed with an out-of-memory exception, whereas with
the change it worked. But I can no longer reproduce it, at least not on
the machine I am at ATM. Will try with my laptop at home when I'm there.

I will investigate the issue with the increased peak memory use when I
have solved the first riddle...
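
(For context, the kind of leak that extra strictness plugs can be shown in isolation; this is a generic Haskell sketch, not the darcs code. A lazy left fold accumulates one thunk per element, which on a large input can exhaust memory, while the strict variant forces the accumulator at every step:)

```haskell
import Data.List (foldl')

-- Lazy left fold: builds a chain of (+) thunks, one per element, which
-- lives on the heap until the final result is demanded.
lazySum :: [Int] -> Int
lazySum = foldl (+) 0

-- Strict left fold: forces the accumulator at every step, so the
-- partial sum stays a single evaluated Int.
strictSum :: [Int] -> Int
strictSum = foldl' (+) 0
```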
msg19549 (view) Author: bf Date: 2017-08-11.13:35:51
I was not able to reproduce a decisive difference in memory consumption
between before and after patch 'plugging memory leak in annotate'. I
would rebase this patch if it weren't already in screened.

Re peak memory increase: I have tested this with a somewhat more extreme
example repo. This repo contains about 600 patches that all modify a
single file (~15000 lines of json). This repo was my initial motivation
for re-implementing annotate. In the following I have elided irrelevant
parts of the RTS output.

Before:

  10,397,285,400 bytes allocated in the heap
  26,784,355,256 bytes copied during GC
   1,038,980,636 bytes maximum residency (217 sample(s))
      23,257,040 bytes maximum slop
            2607 MB total memory in use (589 MB lost due to fragmentation)

  MUT     time   54.687s  ( 76.714s elapsed)
  GC      time   57.521s  ( 63.233s elapsed)
  Total   time  112.449s  (140.173s elapsed)

After:

   5,188,947,404 bytes allocated in the heap
   3,974,428,900 bytes copied during GC
     408,305,932 bytes maximum residency (34 sample(s))
       7,009,744 bytes maximum slop
             910 MB total memory in use (0 MB lost due to fragmentation)

  MUT     time    4.884s  (  5.171s elapsed)
  GC      time   11.183s  ( 11.184s elapsed)
  Total   time   16.087s  ( 16.358s elapsed)

All memory consumption is reduced significantly, including peak.

With your test example (darcs.cabal) I see that peak memory as well as
'maximum residency' is slightly increased. I am pretty sure this is an
artefact caused by low memory pressure together with a faster program
(thus fast allocation rate). When I limit the heap using +RTS -M22M I
get smaller values for peak as well as maximum residency for the second
version (after applying the two patches).
msg19606 (view) Author: gh Date: 2017-08-18.19:33:55
OK, these figures look nice, thanks! I'm not sure I want to roll back the
first patch, especially since issue2243 is fixed by the present bundle.
I'm accepting it.

(Next time let us try to provide proof of the performance improvement at
the moment we send patch bundles for the first time.)
History
Date User Action Args
2017-08-09 11:12:41bfcreate
2017-08-09 11:12:57bfsetstatus: needs-screening -> needs-review
2017-08-09 18:45:36ghsetmessages: + msg19543
2017-08-10 16:56:57bfsetmessages: + msg19548
2017-08-11 13:35:52bfsetmessages: + msg19549
2017-08-18 19:33:56ghsetstatus: needs-review -> accepted
messages: + msg19606