darcs

Issue 124 wish: performance of darcs changes

Title: wish: performance of darcs changes <file>
Priority: feature
Status: resolved
Milestone: 2.10.0
Resolved in: 2.10.0
Superseder: patch index optimisation (aka filecache) (issue1566)
Nosy List: beschmi, dmitry.kurochkin, jch, kowey, markstos, marnix, mndrix, simonmar, thorkilnaur, tommy
Assigned To: beschmi
Topics: PatchIndex, Performance

Created on 2006-02-03.10:54:59 by simonmar, last changed 2013-06-17.02:48:03 by gh.

Messages
msg456 Author: simonmar Date: 2006-02-03.10:54:58
On a repository with a lot of patches, asking for darcs changes on a particular
file or directory can take a *long* time (30+ seconds of CPU time is common in
the GHC repository).

This is preventing us from using the darcs browser in Trac, because it
automatically runs darcs changes in this way.  Even the darcsweb browser isn't
really usable with GHC.

If I could wish for only one thing in darcs, faster 'darcs changes <file>' would
be it.  I imagine that in order to do this you have to store the files changed
along with each patch in the inventory, but once you have this a lot of other
operations could be sped up too (darcs annotate would be my #2 wish).  Also,
wouldn't this let you do 'darcs changes <file>' on a partial repository, as long
as you don't also say -v?
msg466 Author: droundy Date: 2006-02-05.13:37:24
On Fri, Feb 03, 2006 at 10:54:59AM +0000, Simon Marlow wrote:
> On a repository with a lot of patches, asking for darcs changes on a
> particular file or directory can take a *long* time (30+ seconds of CPU
> time is common in the GHC repository).
> 
> This is preventing us from using the darcs browser in Trac, because it
> automatically runs darcs changes in this way.  Even the darcsweb browser
> isn't really usable with GHC.
> 
> If I could wish for only one thing in darcs, faster 'darcs changes
> <file>' would be it.  I imagine that in order to do this you have to
> store the files changed along with each patch in the inventory, but once
> you have this a lot of other operations could be sped up too (darcs
> annotate would be my #2 wish).

Actually, I'd prefer to use a separate file to list the files and
directories touched by each patch (or perhaps listing the patches touching
each file or directory?).  That way older darcs would still be able to pull
from a new-format repository (but would be unable to write to it, so the
file-touching cache would stay up-to-date).

This is definitely a good idea, one that's been put off because darcs is
already fast enough for most repositories (i.e. the ones that I use for
darcs itself, or at work).

It'd be moderately easy to do this.  We'd just need to make
Repository.updateInventory write to the new file (or files) also.  You'd
need to look inside the PatchToken (which will hold the filename of the
actual patch, which has been written, and which you can then read).  We'd
also need to add to RepoFormat a new RepoProperty, which would be
FilesModifiedCache or something.  Then a repo that has such a cache would
have a _darcs/format file containing something like

"darcs-1.0|files-modified-cache"

which would mean that any darcs that understands darcs-1.0 can read the
repo, but only those that understand files-modified-cache could write to
it.  So then updateInventory would check if we have the FilesModifiedCache
format and if so would write to the cache which files are modified by this
patch.
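
A minimal sketch of the read/write rule described above, in Python rather
than darcs's own Haskell. It assumes a format line is a list of properties
separated by '|', that a client may read when it understands at least one
property on each line, and that it may write only when it understands all
of them. The function names are hypothetical and are not darcs's RepoFormat
API.

```python
def parse_format(text):
    """Split a _darcs/format-style file into lines of '|'-separated properties."""
    return [line.strip().split("|") for line in text.splitlines() if line.strip()]


def can_read(fmt, understood):
    # Assumed rule: readable if at least one property per line is understood.
    return all(any(prop in understood for prop in line) for line in fmt)


def can_write(fmt, understood):
    # Assumed rule: writable only if every property on every line is understood.
    return all(all(prop in understood for prop in line) for line in fmt)


fmt = parse_format("darcs-1.0|files-modified-cache\n")
old_darcs = {"darcs-1.0"}                          # predates the cache
new_darcs = {"darcs-1.0", "files-modified-cache"}  # knows about the cache
```

Under these assumptions an older darcs can still pull from the repository
but refuses to write to it, so the cache never goes stale.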

Of course, then changes, etc., would all need to be rewritten to take
advantage of this info, which could be tedious, but could be done gradually
once the data is actually stored.  I think I might actually lean towards
storing a list of patches for each file or directory rather than the other
way around, but it's hard to say.
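
To illustrate the two directions mentioned above, here is a small Python
sketch. The patch names and paths are made up, and the sketch ignores
directories and renames; it only shows that a per-path index turns
'changes <file>' into a lookup, and that it can be derived from a per-patch
list.

```python
from collections import defaultdict

# Hypothetical per-patch cache: patch name -> paths it touches, in inventory order.
touched_by_patch = {
    "patch-a": ["src/Main.hs", "README"],
    "patch-b": ["src/Main.hs"],
    "patch-c": ["docs/manual.tex"],
}


def invert(touched_by_patch):
    """Build the per-path view: path -> patches touching it, in inventory order."""
    by_path = defaultdict(list)
    for patch, paths in touched_by_patch.items():
        for path in paths:
            by_path[path].append(patch)
    return dict(by_path)


def changes_for(path, patches_by_path):
    # One dictionary lookup instead of a scan over every patch.
    return patches_by_path.get(path, [])


patches_by_path = invert(touched_by_patch)
```

With the per-patch layout, answering the same query means scanning every
entry, which is cheaper than reading every patch but still linear in the
number of patches.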

We'd also need a way to add this cache to a repo, presumably via darcs
optimize.  And perhaps the same flag for darcs get and initialize.  Or
would this be something we'd always want by default? It really doesn't take
up much space or time, so perhaps all new repos should include this data?

> Also, wouldn't this let you do 'darcs changes <file>' on a partial
> repository, as long as you don't also say -v?

Indeed, your way of doing it would allow this.  If we did it using my
caching trick, I'm not sure whether this would work.  It would depend if we
grab the remote files-touched information on darcs get --partial.
-- 
David Roundy
http://www.darcs.net
msg545 Author: jch Date: 2006-03-03.16:37:38
Simon Marlow:

>> I imagine that in order to do this you have to store the files
>> changed along with each patch in the inventory,

David Roundy:

> Actually, I'd prefer to use a separate file to list the files and
> directories touched by each patch (or perhaps listing the patches touching
> each file or directory?).

Why's that?  We are planning to rework the inventory format, we might
as well make it extensible.

I'm thinking of an inventory format that starts with a line of the form

  darcs-inventory-1.0 keyword1 keyword2 keyword3

which contains all the keywords that are stored in the inventory.
Then the inventory itself will contain entries of the form

  patch-id keyword1=... keyword2=... keyword3=...

and we could initially have keywords such as

  patch-hash                   (discussed at FOSDEM)
  contents-hash                (   --         --   )
  touched-files                (proposed by Simon)

all of which would be optional.
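
A rough Python sketch of a reader for the proposed format follows. The
proposal does not say how values are encoded, so this assumes that fields
are separated by whitespace, that values contain no spaces, and that
touched-files is a comma-separated list; the sample patch ids and hashes
are invented.

```python
def parse_inventory(text):
    """Parse a header line of declared keywords followed by one entry per patch."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    if header[0] != "darcs-inventory-1.0":
        raise ValueError("unknown inventory version: " + header[0])
    declared = set(header[1:])
    entries = {}
    for line in lines[1:]:
        patch_id, *fields = line.split()
        attrs = {}
        for field in fields:
            key, _, value = field.partition("=")
            if key not in declared:
                raise ValueError("undeclared keyword: " + key)
            attrs[key] = value
        entries[patch_id] = attrs  # every keyword is optional per entry
    return declared, entries


sample = """\
darcs-inventory-1.0 patch-hash touched-files
p1 patch-hash=abc123 touched-files=src/Main.hs,README
p2 patch-hash=def456
"""
declared, entries = parse_inventory(sample)
```

An entry that omits a keyword, such as p2 here, simply has no value for it,
so a reader would have to fall back to the patch itself for that entry.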

There's just one thing that's not clear to me: how would you deal with
rename patches?

                                        Juliusz
msg1560 Author: jamesdsadler Date: 2007-03-31.06:24:20
Darcs should index and/or cache its metadata so that commands such as 'darcs
changes [file]' and 'darcs annotate [file]' are fast.  These commands should run
near-instantaneously.

I have a large darcs repo (> 500 MB including a few binaries), and running
either of these commands makes darcs use 100% of the CPU and takes anywhere
between 5 and 10 minutes. It also consumes the better part of 2 GB of RAM.

It should be noted that darcs-server and darcsweb are both essentially built on
top of those commands and would therefore benefit from having them sped up.
msg3236 Author: markstos Date: 2008-02-08.17:18:36
I'm also interested in the possibility of using caching to improve performance
in these cases.
msg7056 Author: thorkilnaur Date: 2009-01-12.10:50:37
Setting status deferred, as this is a wishlist issue.

Best regards
Thorkil
msg7080 Author: thorkilnaur Date: 2009-01-13.11:17:08
Sorry, deferring was a mistake. Setting status need-info instead, asking for a 
discussion of whether we should work towards supporting this feature.

Thanks and best regards
Thorkil
msg7083 Author: markstos Date: 2009-01-13.14:22:39
On Tue, 13 Jan 2009 11:17:10 -0000
Thorkil Naur <bugs@darcs.net> wrote:

> 
> Thorkil Naur <naur@post11.tele.dk> added the comment:
> 
> Sorry, deferring was a mistake. Setting status need-info instead, asking for a 
> discussion of whether we should work towards supporting this feature.

I would like to see this supported, and I believe it is one of the places
where users commonly observe that darcs is "slow".

I use this command somewhat regularly. (Although it is usually "fast enough"
for me).

I would like to see it addressed as part of a broader focus on performance of
key areas.

    Mark
msg8193 Author: kowey Date: 2009-08-17.05:18:33
If I understand correctly, this would be addressed by the current filecache work
in issue984.
msg8497 Author: kowey Date: 2009-08-26.08:58:01
Re-setting dependency specifically to a filecache ticket. 'Deferred' here just
means "we'll fix issue1566 first and then come and look at this".
msg11698 Author: marnix Date: 2010-07-09.04:17:33
Per kowey's last comment this issue depends on issue1566.  However, this
issue currently has an earlier milestone (2.5.0) than issue1566 (which
has 2.6.0).  That seems inconsistent.
msg11724 Author: kowey Date: 2010-07-12.14:20:17
Milestone bumped accordingly (but we're going to try hard to get the 
patchindex stuff in as soon after Darcs 2.5 is out as possible)
msg16486 Author: markstos Date: 2012-12-28.19:15:55
I think this is resolved now, since the PatchIndex work has landed.
msg16864 Author: gh Date: 2013-06-16.19:36:33
Yes.
History
Date User Action Args
2006-02-03 10:54:59 simonmar create
2006-02-05 13:37:27 droundy set status: unread -> unknown
    nosy: droundy, tommy, simonmar
    messages: + msg466
2006-03-03 16:37:39 jch set nosy: + jch
    messages: + msg545
    title: performance of darcs changes <file> -> Re: performance of darcs changes <file>
2007-03-31 05:08:35 kowey link issue426 superseder
2007-03-31 06:24:26 jamesdsadler set nosy: + jamesdsadler, kowey, beschmi
    messages: + msg1560
2007-07-18 07:36:53 kowey set topic: + Performance
    nosy: droundy, jch, tommy, beschmi, kowey, simonmar, jamesdsadler
    title: Re: performance of darcs changes <file> -> performance of darcs changes <file>
2008-02-08 17:18:37 markstos set status: unknown -> deferred
    nosy: + markstos
    messages: + msg3236
    title: performance of darcs changes <file> -> wish: performance of darcs changes <file>
2008-05-21 10:47:33 kowey set status: deferred -> unknown
    nosy: + dagit
2008-10-08 09:22:21 kowey link issue984 superseder
2009-01-12 10:50:39 thorkilnaur set status: unknown -> deferred
    nosy: + dmitry.kurochkin, simon, thorkilnaur
    messages: + msg7056
2009-01-13 11:17:10 thorkilnaur set status: deferred -> waiting-for
    nosy: droundy, jch, tommy, beschmi, kowey, markstos, dagit, simonmar, simon, jamesdsadler, thorkilnaur, dmitry.kurochkin
    messages: + msg7080
2009-01-13 14:22:41 markstos set nosy: droundy, jch, tommy, beschmi, kowey, markstos, dagit, simonmar, simon, jamesdsadler, thorkilnaur, dmitry.kurochkin
    messages: + msg7083
2009-08-06 17:51:25 admin set nosy: + jast, Serware, darcs-devel, zooko, mornfall, - droundy, jch, simonmar, jamesdsadler
2009-08-06 20:48:47 admin set nosy: - beschmi
2009-08-10 21:43:39 admin set nosy: + jamesdsadler, simonmar, jch, - darcs-devel, zooko, jast, Serware, mornfall
2009-08-10 23:52:37 admin set nosy: - dagit
2009-08-17 05:17:13 kowey unlink issue984 superseder
2009-08-17 05:18:39 kowey set status: waiting-for -> has-patch
    nosy: + beschmi
    topic: + Target-2.4
    superseder: + wish: drastically improve darcs annotate performance
    messages: + msg8193
2009-08-21 17:01:21 kowey link issue1477 superseder
2009-08-25 17:38:54 admin set nosy: + darcs-devel, - simon
2009-08-26 08:58:03 kowey set status: has-patch -> deferred
    nosy: jch, tommy, beschmi, kowey, markstos, darcs-devel, simonmar, jamesdsadler, thorkilnaur, dmitry.kurochkin
    superseder: + patch index optimisation (aka filecache), - wish: drastically improve darcs annotate performance
    messages: + msg8497
2009-08-26 12:15:12 kowey set priority: wishlist -> feature
    nosy: jch, tommy, beschmi, kowey, markstos, darcs-devel, simonmar, jamesdsadler, thorkilnaur, dmitry.kurochkin
2009-08-27 13:43:55 admin set nosy: jch, tommy, beschmi, kowey, markstos, darcs-devel, simonmar, jamesdsadler, thorkilnaur, dmitry.kurochkin
2009-08-27 14:32:46 admin set nosy: jch, tommy, beschmi, kowey, markstos, darcs-devel, simonmar, jamesdsadler, thorkilnaur, dmitry.kurochkin
2009-09-14 10:52:04 kowey set topic: + Target-2.5, - Target-2.4
    nosy: jch, tommy, beschmi, kowey, markstos, darcs-devel, simonmar, jamesdsadler, thorkilnaur, dmitry.kurochkin
2009-10-23 22:37:40 admin set nosy: + marlowsd, - simonmar
2009-10-23 23:36:07 admin set nosy: + simonmar, - marlowsd
2010-06-14 17:23:58 tux_rocker set assignedto: beschmi
2010-06-15 20:51:54 admin set milestone: 2.5.0
2010-06-15 20:59:13 admin set topic: - Target-2.5
2010-07-09 04:17:34 marnix set nosy: + marnix
    messages: + msg11698
2010-07-12 14:20:18 kowey set nosy: - darcs-devel
    messages: + msg11724
    milestone: 2.5.0 -> 2.8.0
2010-07-12 14:25:03 kowey set nosy: - jamesdsadler
2012-02-27 19:17:10 mndrix set nosy: + mndrix
2012-12-28 19:15:58 markstos set status: deferred -> resolved
    topic: + PatchIndex
    messages: + msg16486
2013-06-16 19:36:34 gh set status: resolved -> unknown
    resolvedin: 2.10.0
    messages: + msg16864
    milestone: 2.8.0 -> 2.10.0
2013-06-17 02:48:03 gh set status: unknown -> resolved