darcs

Issue 984 wish: drastically improve darcs annotate performance

Title: wish: drastically improve darcs annotate performance
Priority: feature
Status: resolved
Milestone: 2.10.0
Resolved in: 2.10.0
Superseder: patch index optimisation (aka filecache)
View: 1566
Nosy List: Serware, dmitry.kurochkin, kowey, markstos, mndrix, mornfall, thorkilnaur, tommy, tux_rocker, zooko
Assigned To: beschmi
Topics: Performance

Created on 2008-08-11.15:14:42 by kowey, last changed 2013-06-16.19:36:48 by gh.

Messages
msg5377 (view) Author: kowey Date: 2008-08-11.15:14:33
People just resort to darcs changes -v.  And the output isn't very nice, apparently.

It may be worth considering a rewrite/rethink.
msg5448 (view) Author: markstos Date: 2008-08-13.00:57:04
I vote for this as well, but I'm marking it as "deferred" for now, since it's a
wishlist/feature item.
msg5491 (view) Author: kowey Date: 2008-08-13.16:43:20
Actually, I think we're aiming to have this fixed for darcs 2.1...

We may have to renegotiate this as time goes on, however.
msg6274 (view) Author: kowey Date: 2008-10-08.09:22:20
See http://bugs.darcs.net/issue124 for the proposed solution to this: creating a
cache that maps filenames to the patches that modify them.
msg6277 (view) Author: droundy Date: 2008-10-08.14:01:21
On Wed, Oct 08, 2008 at 09:22:21AM -0000, Eric Kow wrote:
> See http://bugs.darcs.net/issue124 for the proposed solution to this: creating a
> cache that maps filenames to the patches that modify them.

I'll just briefly mention for the record that there are obviously
multiple approaches one could use for creating such a cache.

1. One could create a copy of the hashed_inventory with mention of all
the files touched included.

2. One could create a directory tree mimicking the pristine cache, in
which the contents of each file is an inventory of all patches
affecting that file.

The latter is much faster for looking up patches affecting a given
file, but it's quite a bit of work.  Of course, there are innumerable
variations on this, such as storing copies of the pieces of patches
that affect each file (so we wouldn't have to read in the entire patch
when running annotate).

I suspect 2 will actually be the easiest to deal with in the long run.
It could reuse almost all the HashedIO code to maintain the cache.
The only real downside of 2 is that it doesn't easily handle the case
of files that have been deleted, particularly when a newer file has
been added that has the same name (or an older file has been renamed
over the deleted file's name).  But in these rare cases we could in
the worst case scenario use the existing code.
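
To make the trade-off between the two layouts concrete, here is a minimal
sketch in Python. It is not darcs code: the toy history, the patch ids and
all function names are hypothetical, and an in-memory dict stands in for the
on-disk directory tree of approach 2.

```python
from collections import defaultdict

# Hypothetical toy history, oldest patch first: (patch id, files touched).
HISTORY = [
    ("p1", ["README", "src/Main.hs"]),
    ("p2", ["src/Main.hs"]),
    ("p3", ["docs/manual.txt"]),
    ("p4", ["README"]),
]


def build_annotated_inventory(history):
    """Approach 1: a single inventory whose entries also list touched files."""
    return [(patch_id, frozenset(files)) for patch_id, files in history]


def patches_for_file_flat(inventory, path):
    """Lookup under approach 1 has to scan every inventory entry."""
    return [patch_id for patch_id, files in inventory if path in files]


def build_per_file_index(history):
    """Approach 2: one patch list per path (a dict stands in for the tree)."""
    index = defaultdict(list)
    for patch_id, files in history:
        for path in files:
            index[path].append(patch_id)
    return dict(index)


def patches_for_file_indexed(index, path):
    """Lookup under approach 2 is a single keyed access."""
    return index.get(path, [])
```

Both lookups return the same patch list for a path; the difference is that
the flat inventory is read in full on every query, while the per-file index
pays that cost once, when it is built or updated.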

On the other hand, if we had a mapping of current filename to
PatchInfo of patch that created the file + its original name, then we
could use that immutable file id (which still isn't consistent across
repositories, or across darcs optimize --reorder) to look up the
patches that affect the file.  This might allow a somewhat simpler
approach in which we can drop the existing annotate code in favor of
the optimized code (instead of almost never using it).
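
The file-id idea can be sketched as follows, again in Python and again not
darcs code. The class, its method names and the patch ids are hypothetical;
the file id is modelled as the pair (creating patch, original name), and
plain strings stand in for PatchInfo.

```python
from dataclasses import dataclass, field


@dataclass
class FileIdIndex:
    # current path -> file id, where file id = (creating patch, original name)
    current: dict = field(default_factory=dict)
    # file id -> ids of all patches that touch that file, oldest first
    touching: dict = field(default_factory=dict)

    def add(self, patch, path):
        """A patch creates a file; this starts a fresh, immutable file id."""
        file_id = (patch, path)
        self.current[path] = file_id
        self.touching[file_id] = [patch]

    def modify(self, patch, path):
        self.touching[self.current[path]].append(patch)

    def rename(self, patch, old, new):
        """The path changes but the file id (and its patch list) stays."""
        file_id = self.current.pop(old)
        self.current[new] = file_id
        self.touching[file_id].append(patch)

    def remove(self, patch, path):
        """The name is freed; the history stays reachable through the file id."""
        file_id = self.current.pop(path)
        self.touching[file_id].append(patch)

    def patches_for(self, path):
        return list(self.touching[self.current[path]])
```

Because a file added after a deletion gets its own id, reusing a deleted
file's name does not mix the two histories, which is the case that the
path-keyed layout handles poorly.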

But ultimately, essentially any reasonable cache layout will work, and
it's up to whoever decides to implement it.  We just need to be able
to find out which patches affect a given file without parsing every
patch in the repository.

David
msg6586 (view) Author: kowey Date: 2008-11-04.12:15:49
Hi Benedikt, I'm assigning this to you because you have been working on the
filecache stuff.  Thanks very much!
msg6917 (view) Author: mornfall Date: 2008-12-28.11:38:09
Let's try to get this in by 2.3... Benedikt, do you have any news to share on
this? We are all looking forward to it. :)
msg8498 (view) Author: kowey Date: 2009-08-26.08:58:44
I've created a filecache-specific ticket on issue1566.
msg9528 (view) Author: zooko Date: 2009-12-04.20:29:30
Just a data point: I have darcs-2.3.0 here on a Linux server, and when using the repo 
from http://allmydata.org/source/tahoe/trunk-hashedformat , the following command takes 
45 to 60 seconds:

darcs annotate --xml-output --match "hash 20091127055900-66853-b166fc807226ca10c1ea66864896646b85d67fcb.gz" docs/frontends/webapi.txt

This is way too long!  It would be great if it could be reduced to 1/100 of that runtime.
msg9529 (view) Author: zooko Date: 2009-12-04.20:31:36
So, that "darcs annotate" command that I mentioned in my previous note is the
one that produces this web page:
http://allmydata.org/trac/tahoe/browser/docs/frontends/webapi.txt?annotate=blame&rev=4112 .
If that web page gives you an error then the "darcs annotate" took too long or
ran out of RAM and was killed.  If that web page gives you a nice annotation of
a text file, then it worked!
msg11522 (view) Author: kowey Date: 2010-06-21.18:50:41
Our new goal is to have this in 2.6.0 HEAD soon after the 2.5 release is
out the door.
msg16865 (view) Author: gh Date: 2013-06-16.19:36:47
Resolved by patch index.
History
Date User Action Args
2008-08-11 15:14:42 kowey create
2008-08-12 16:36:45 kowey link issue986 superseder
2008-08-13 00:57:07 markstos set status: unread -> deferred
nosy: + markstos
messages: + msg5448
2008-08-13 16:43:30 kowey set status: deferred -> unknown
nosy: + Serware, mornfall, droundy, simon
topic: + Target-2.0
messages: + msg5491
2008-10-08 09:22:21 kowey set nosy: + dmitry.kurochkin, thorkilnaur
superseder: + wish: performance of darcs changes <file>
messages: + msg6274
2008-10-08 14:01:24 droundy set nosy: droundy, tommy, beschmi, kowey, markstos, dagit, simon, thorkilnaur, dmitry.kurochkin, Serware, mornfall
messages: + msg6277
2008-11-04 12:15:51 kowey set nosy: droundy, tommy, beschmi, kowey, markstos, dagit, simon, thorkilnaur, dmitry.kurochkin, Serware, mornfall
messages: + msg6586
assignedto: beschmi
2008-12-28 11:38:18 mornfall set topic: + Target-2.3, - Target-2.0
nosy: droundy, tommy, beschmi, kowey, markstos, dagit, simon, thorkilnaur, dmitry.kurochkin, Serware, mornfall
messages: + msg6917
2009-08-06 17:59:32 admin set nosy: + jast, darcs-devel, zooko, - droundy
2009-08-06 21:10:27 admin set nosy: - beschmi
2009-08-10 22:22:01 admin set nosy: - darcs-devel, zooko, jast
2009-08-11 00:19:55 admin set nosy: - dagit
2009-08-17 05:17:14 kowey set status: unknown -> has-patch
nosy: tommy, kowey, markstos, simon, thorkilnaur, dmitry.kurochkin, Serware, mornfall
topic: + Target-2.4, - Target-2.3
superseder: - wish: performance of darcs changes <file>
2009-08-17 05:18:39 kowey link issue124 superseder
2009-08-25 17:24:19 admin set nosy: + darcs-devel, - simon
2009-08-25 19:14:56 kowey set nosy: tommy, kowey, markstos, darcs-devel, thorkilnaur, dmitry.kurochkin, Serware, mornfall
2009-08-26 08:58:03 kowey unlink issue124 superseder
2009-08-26 08:58:46 kowey set status: has-patch -> deferred
nosy: tommy, kowey, markstos, darcs-devel, thorkilnaur, dmitry.kurochkin, Serware, mornfall
superseder: + patch index optimisation (aka filecache)
messages: + msg8498
2009-08-27 14:32:41 admin set nosy: tommy, kowey, markstos, darcs-devel, thorkilnaur, dmitry.kurochkin, Serware, mornfall
2009-09-14 10:51:58 kowey set topic: + Target-2.5, - Target-2.4
nosy: tommy, kowey, markstos, darcs-devel, thorkilnaur, dmitry.kurochkin, Serware, mornfall
2009-10-23 22:43:56 admin set nosy: + serware, - Serware
2009-10-23 23:29:46 admin set nosy: + Serware, - serware
2009-12-04 20:29:32 zooko set nosy: + zooko
messages: + msg9528
2009-12-04 20:31:38 zooko set messages: + msg9529
2010-06-15 20:51:53 admin set milestone: 2.5.0
2010-06-15 20:59:12 admin set topic: - Target-2.5
2010-06-21 18:50:42 kowey set nosy: + tux_rocker, - darcs-devel
messages: + msg11522
milestone: 2.5.0 -> 2.8.0
2012-02-27 19:15:04 mndrix set nosy: + mndrix
2013-06-16 19:36:48 gh set status: deferred -> resolved
resolvedin: 2.10.0
messages: + msg16865
milestone: 2.8.0 -> 2.10.0