Issue 2438: patch index creation makes cloning much slower

Title: patch index creation makes cloning much slower
Status: resolved
Milestone: 2.10.0
Resolved in: 2.10.0
Nosy List: bsrkaditya, gh, jaredj, kowey
Topics: PatchIndex, Performance, ProbablyEasy

Created on 2015-02-18.03:05:56 by gh, last changed 2015-03-11.07:24:13 by noreply.

Attached file: timing_PI_atom (uploaded by gh, 2015-02-18.03:05:51, application/octet-stream)
msg18148 Author: gh Date: 2015-02-18.03:05:51
Cloning a repository with a long history (e.g. darcs.net, with more than
11K patches) can be two to three times slower with patch index creation
(currently enabled by default).

I measured cloning a local darcs.net repository with a hot disk cache.
On an Intel Atom netbook, cloning is approximately 3 times slower; on a
more modern i3 netbook, it is approximately 2 times slower.

Apart from the slowness, PI creation during cloning is not announced by
any message, and it cannot be interrupted with Ctrl-C (more precisely, it
can, but the user only sees a message saying that the cloned repository
is lazy).

I don't know whether patch index creation can be made computationally
*faster* than it is now.

A couple of easy fixes I can imagine:

* disable patch index by default on clone and init, and in repositories
that do not explicitly have a patch index. This would leave
"darcs optimize enable-patch-index" as the only way to create it.
* disable patch index on cloning and initializing, and enable it only when
running annotate or log with a file argument, when the repository is

Attached are the figures from the informal benchmarks I ran; they are
quite reproducible (with current darcs HEAD).
msg18151 Author: bf Date: 2015-02-18.08:52:51
Just a spontaneous idea: can we limit the automatic (not explicitly
asked for) index creation to patches after the last clean tag? That
should be pretty fast in almost all cases. Or would that defeat the
purpose of the patch index?
msg18152 Author: gh Date: 2015-02-18.13:53:35
Indeed, that would defeat its purpose. The main data structure in PI is
the touch_map which, given a file, points you to the subsequence of
patches that touch it. This subsequence can contain patches that lie
before the last clean tag.

See http://darcs.net/Internals/PatchIndex#touch_map-also-known-as-info_map .
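To illustrate why indexing only the patches after the last clean tag breaks the touch_map, here is a minimal Python sketch. It models the touch_map as a dictionary from file path to the ordered list of patches that touch it. The patch names, file names and the `build_touch_map` helper are hypothetical, and this is not darcs' actual patch index format.

```python
from collections import defaultdict

# Hypothetical history: (patch name, files it touches), oldest first.
# "TAG 1.0" stands in for the last clean tag.
history = [
    ("add README", {"README"}),
    ("add src/Main.hs", {"src/Main.hs"}),
    ("fix typo in README", {"README"}),
    ("TAG 1.0", set()),
    ("refactor src/Main.hs", {"src/Main.hs"}),
]

def build_touch_map(patches):
    """Map each file to the subsequence of patches that touch it, in order."""
    touch_map = defaultdict(list)
    for name, files in patches:
        for path in sorted(files):
            touch_map[path].append(name)
    return dict(touch_map)

# Index built over the whole history.
full = build_touch_map(history)

# Index built only over the patches after the last clean tag.
tag_pos = max(i for i, (name, _) in enumerate(history) if name.startswith("TAG"))
partial = build_touch_map(history[tag_pos + 1:])

print(full["README"])          # ['add README', 'fix typo in README']
print(partial.get("README"))   # None: README's history is lost
print(partial["src/Main.hs"])  # ['refactor src/Main.hs']
```

With the partial map, annotating or logging `README` would find no patches at all, and `src/Main.hs` would lose the patch that created it, so commands relying on the index would need the full history anyway.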
msg18303 Author: noreply Date: 2015-03-11.07:24:11
The following patch sent by Guillaume Hoffmann <guillaumh@gmail.com> updated issue issue2438 with
status=resolved;resolvedin=2.10.0 HEAD

* resolve issue2438: no longer build patch index by default on cloning 
Ignore-this: 800d7493e751ed0bd23e8041c7c8bdd4
* build automatically only with annotate or non-interactive `log file`
* patch index creation only occurs if the lock can be taken
* PI creation only happens on init, clone and convert if
  --patch-index is passed
* PI creation no longer done when finalizing a repo job
* the only way to have the file `_darcs/no_patch_index` created,
  is to run `optimize disable-patch-index`, or to ctrl-c
  PI creation (during annotate, log, init, clone or convert)
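The policy described in the patch can be summarized as a small decision function. This is a hypothetical Python sketch, not darcs code (darcs is written in Haskell); the `Context` fields and `should_build_patch_index` are illustrative names that mirror the bullets above. One rule is an assumption not stated in the patch: that an existing `_darcs/no_patch_index` file suppresses automatic creation.

```python
from dataclasses import dataclass

@dataclass
class Context:
    """Hypothetical model of a darcs invocation (not a darcs API)."""
    command: str                      # e.g. "clone", "annotate", "log"
    patch_index_flag: bool = False    # was --patch-index passed?
    has_file_args: bool = False       # e.g. `darcs log FILE`
    interactive: bool = False         # is log running interactively?
    no_patch_index_file: bool = False # does _darcs/no_patch_index exist?
    has_patch_index: bool = False     # is a patch index already present?
    lock_available: bool = True       # can the repository lock be taken?

def should_build_patch_index(ctx: Context) -> bool:
    """Decide whether to create a patch index, per the rules in the patch."""
    # Nothing to create if one exists; never create without the lock.
    if ctx.has_patch_index or not ctx.lock_available:
        return False
    # init, clone and convert only build it when explicitly asked to.
    if ctx.command in ("init", "clone", "convert"):
        return ctx.patch_index_flag
    # Assumption: an explicit opt-out disables automatic creation.
    if ctx.no_patch_index_file:
        return False
    # Automatic creation only for annotate and non-interactive `log FILE`.
    if ctx.command == "annotate":
        return True
    if ctx.command == "log":
        return ctx.has_file_args and not ctx.interactive
    # No longer built when finalizing other repository jobs.
    return False

print(should_build_patch_index(Context("clone")))                         # False
print(should_build_patch_index(Context("clone", patch_index_flag=True)))  # True
print(should_build_patch_index(Context("log", has_file_args=True)))       # True
```

Putting the lock check first reflects the bullet that creation only happens when the lock can be taken, so a read-only command such as `log` never blocks waiting for it.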
Date                 User     Action  Args
2015-02-18 03:05:56  gh       create
2015-02-18 08:52:52  bf       set     messages: + msg18151
2015-02-18 13:53:36  gh       set     messages: + msg18152
2015-03-11 07:24:13  noreply  set     status: unknown -> resolved
                                      messages: + msg18303
                                      resolvedin: 2.10.0