Issue 1357 wish: "chunky" representation for hunks

Title:        wish: "chunky" representation for hunks
Priority:     feature
Status:       deferred
Milestone:
Resolved in:
Superseder:
Nosy List:    darcs-devel, dmitry.kurochkin, kowey, thorkilnaur
Assigned To:
Topics:       Performance

Created on 2009-02-13.17:56:44 by kowey, last changed 2011-04-02.09:21:37 by kowey.

msg7307 Author: kowey Date: 2009-02-13.17:56:39
This is a low-level optimisation for darcs.  We had tried it out between darcses
1.0.9 and 2.0.0, but backed it out due to bugs.  Note that camp uses chunky
hunks, if I understand correctly.

David has suggested a less ambitious and safer approach. The idea is that this
would be akin to the relationship between hashed-1.0 and plain repositories. In
other words, you would need a new darcs client to interact with a chunky
repository, but it would be 100% compatible with non-chunky hunks (patches could
be pushed back and forth).

I need a volunteer to:
- Describe what chunky hunks are
- Describe how we think they will help darcs performance
msg8444 Author: kowey Date: 2009-08-24.01:14:31
These patches by David may be relevant.

  * add new "chunk" patch format (currently only used in darcs-2 repositories).
  * make ColourPrinter able to deal nicely with userchunks that contain newlines.
  * rewrite hunks to store solid chunks rather than lists of lines.
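A minimal Haskell sketch of the difference between "lists of lines" and "solid chunks". The types `LineHunk` and `ChunkHunk` and the functions `toChunky`, `fromChunky` and `chunkLineCount` are hypothetical illustrations, not darcs's actual definitions, and the sketch assumes every line is newline-terminated (a file whose last line lacks a newline is not handled).

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString.Char8 as BC

-- Hypothetical line-based hunk: at line n, replace the old lines by the new ones.
-- Each list element is one line, stored without its trailing newline.
data LineHunk = LineHunk Int [BC.ByteString] [BC.ByteString]
  deriving (Show, Eq)

-- Hypothetical "chunky" hunk: the same edit, but each side is one contiguous
-- region in which every line keeps its terminating newline.
data ChunkHunk = ChunkHunk Int BC.ByteString BC.ByteString
  deriving (Show, Eq)

-- Join each side's lines into a single buffer.
toChunky :: LineHunk -> ChunkHunk
toChunky (LineHunk n old new) = ChunkHunk n (BC.unlines old) (BC.unlines new)

-- Split each side's buffer back into lines.
fromChunky :: ChunkHunk -> LineHunk
fromChunky (ChunkHunk n old new) = LineHunk n (BC.lines old) (BC.lines new)

-- Number of lines a region covers (one newline per line under our assumption).
chunkLineCount :: BC.ByteString -> Int
chunkLineCount = BC.count '\n'
```

Because `BC.unlines` and `BC.lines` undo each other for lines that contain no newline, both forms carry the same information; the chunky form just keeps each side of the hunk in one buffer instead of one `ByteString` per line.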

See also http://mornfall.net/blog/patch_formats.html
msg10031 Author: kowey Date: 2010-02-19.11:44:04
Jason: is this what your HunkHandle work is about?
Please un-assign yourself if I'm mistaken.
msg10036 Author: dagit Date: 2010-02-19.16:10:03
My HunkHandles are a record that stores a filename, an offset and a
length.  They can be used to read in a hunk patch after an initial scan
over a patch bundle to calculate the handles.
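A minimal sketch of a handle as described here: a record holding a file name, a byte offset and a length, plus a loader that reads just those bytes back on demand. The record, its field names and `loadHunk` are hypothetical and not taken from the actual HunkHandle code; only `System.IO` and `Data.ByteString` from the standard libraries are used, and the initial scan that computes the handles is not shown.

```haskell
import qualified Data.ByteString as B
import System.IO (IOMode (ReadMode), SeekMode (AbsoluteSeek), hSeek, withBinaryFile)

-- Hypothetical handle: where a hunk's text lives inside a patch bundle on disk.
data HunkHandle = HunkHandle
  { hhFile   :: FilePath  -- patch bundle containing the hunk
  , hhOffset :: Integer   -- byte offset of the hunk's text in that file
  , hhLength :: Int       -- number of bytes the hunk's text occupies
  } deriving (Show, Eq)

-- Read the hunk's bytes only when they are needed. B.hGet is strict and
-- returns at most hhLength bytes (fewer if the file ends first), so the
-- file can be closed as soon as the read finishes.
loadHunk :: HunkHandle -> IO B.ByteString
loadHunk (HunkHandle path off len) =
  withBinaryFile path ReadMode $ \h -> do
    hSeek h AbsoluteSeek off
    B.hGet h len
```

The trade-off is an extra seek and read each time a hunk's contents are wanted, in exchange for not keeping large hunk contents resident in memory.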

It would be nice to also adapt commute and merge to rely more on the
metadata about a hunk and ignore the payload of the hunk whenever
possible, but that's not going to be in my initial rewrite.
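To illustrate why commute could in principle ignore the payload: for two non-overlapping hunks in the same file, commuting them only shifts line numbers, and the shift depends only on how many lines each hunk removes and adds. The sketch below is a simplified, hypothetical version (`HunkMeta` and `commuteMeta` are illustrative names); unlike a full implementation, it simply refuses to commute hunks that are adjacent or overlap.

```haskell
-- Hypothetical hunk metadata: start line and line counts, but no contents.
data HunkMeta = HunkMeta
  { hmLine    :: Int  -- first line affected (1-based)
  , hmRemoved :: Int  -- number of old lines
  , hmAdded   :: Int  -- number of new lines
  } deriving (Show, Eq)

-- Commute "p1 then p2" into "p2' then p1'" for two hunks on one file.
-- Returns Nothing when the hunks overlap or merely touch (conservative).
commuteMeta :: (HunkMeta, HunkMeta) -> Maybe (HunkMeta, HunkMeta)
commuteMeta (p1@(HunkMeta l1 o1 n1), p2@(HunkMeta l2 o2 n2))
  -- p2 lies strictly after p1's new lines: undo p1's net growth for p2.
  | l1 + n1 < l2 = Just (p2 { hmLine = l2 - n1 + o1 }, p1)
  -- p2's old lines lie strictly before p1: apply p2's net growth to p1.
  | l2 + o2 < l1 = Just (p2, p1 { hmLine = l1 + n2 - o2 })
  | otherwise    = Nothing
```

Since `commuteMeta` never looks at hunk contents, a hunk represented by a handle plus its line counts could be commuted without reading its text from disk.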

Note: no on-disk format changes are necessary for my change.  I'm
sacrificing a bit of disk I/O in an attempt to make disk accesses more
explicit (the goal is to make it harder to hold on to huge things in
memory).

I think the two ideas "chunky" rep and HunkHandles have similar goals
but the work itself is quite different.  Therefore, I will remove myself
as the owner of this ticket.
msg10037 Author: kowey Date: 2010-02-19.16:38:57
On Fri, Feb 19, 2010 at 16:10:06 +0000, Jason Dagit wrote:
> My HunkHandles are a record that stores a filename, an offset and a
> length.  They can be used to read in a hunk patch after an initial scan
> over a patch bundle to calculate the handles.

> I think the two ideas "chunky" rep and HunkHandles have similar goals
> but the work itself is quite different.

Jason graciously explained this to me on IRC.  For tracking purposes:
 - http://irclog.perlgeek.de/darcs/2010-02-19#i_2009373
 - http://lists.osuosl.org/pipermail/darcs-users/2010-January/022760.html

So it seems that there are 4 ideas going on, three of which sort of
get swept up under the chunky-hunk umbrella.

 1. changing the in-memory representation of hunk patches

 2. changing the on-disk representation of hunk patches

 3. radically changing #1/#2 so that hunks just point to hashes of
    contents (Petr's idea)

 4. Lightly changing the in-memory representation so that we avoid
    storing much more than filename/offset in memory.  Let's step
    through these backwards.

    Unlike #3, we do not change the notion of commutation in any
    way; nor do we change the fact that we still explicitly
    manipulate hunk contents.  No hashes here.

    Unlike #2, we do not modify the on-disk representation of hunk
    patches in any way (this should be fairly clear).  It is just a
    change in runtime behaviour, moving some of our work from memory
    to disk.

    Unlike #1, we do not change the internal representation of the
    contents.  We still work with a [ByteString] representing a
    list of lines, and not a single ByteString representing a region.
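Idea #3 is only named in passing here, so the following is a rough illustration under stated assumptions, not Petr's actual design: a hunk that records only a line position and hashes of its old and new contents, with the text kept in a separate content-addressed store. All names are hypothetical, `Data.Map.Strict` comes from the `containers` package that ships with GHC, and 64-bit FNV-1a stands in for a proper cryptographic hash purely to keep the sketch dependency-free.

```haskell
import Data.Bits (xor)
import qualified Data.ByteString as B
import qualified Data.Map.Strict as M
import Data.Word (Word64)

-- Stand-in content hash (64-bit FNV-1a). A real design would want a
-- cryptographic hash; this is only to keep the sketch self-contained.
newtype ContentHash = ContentHash Word64 deriving (Show, Eq, Ord)

fnv1a :: B.ByteString -> ContentHash
fnv1a = ContentHash . B.foldl' step 14695981039346656037
  where
    step h w = (h `xor` fromIntegral w) * 1099511628211

-- Hypothetical hash-pointing hunk: position plus hashes, no text.
data HashHunk = HashHunk
  { hashLine :: Int
  , hashOld  :: ContentHash
  , hashNew  :: ContentHash
  } deriving (Show, Eq)

-- Content-addressed store mapping each hash to the text it names.
type Store = M.Map ContentHash B.ByteString

-- Record a hunk: both contents go into the store, the patch keeps only hashes.
mkHashHunk :: Int -> B.ByteString -> B.ByteString -> Store -> (HashHunk, Store)
mkHashHunk line old new store =
  (HashHunk line hOld hNew, M.insert hOld old (M.insert hNew new store))
  where
    hOld = fnv1a old
    hNew = fnv1a new

-- Fetch both sides of a hunk back from the store, if present.
hunkContents :: Store -> HashHunk -> Maybe (B.ByteString, B.ByteString)
hunkContents store (HashHunk _ hOld hNew) =
  (,) <$> M.lookup hOld store <*> M.lookup hNew store
```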

Phew! Maybe HunkHandle needs a ticket of its own.

Eric Kow <http://www.nltg.brighton.ac.uk/home/Eric.Kow>
PGP Key ID: 08AC04F9
msg13833 Author: ganesh Date: 2011-03-30.20:47:25
The blog link above seems to now be 
msg13834 Author: bsrkaditya Date: 2011-03-30.20:47:50
Just as a note: on IRC, Heffalump said that the patch
* add new "chunk" patch format (currently only used in darcs-2 repositories).
had been rolled back.
msg13864 Author: kowey Date: 2011-04-02.09:21:35
OK, I just had a bit of a discussion about this with Petr.

Basically, following the list in msg10037:

1. Would be pointless/stupid without doing #2 or #3.
2. Is doable (think of how hashed repositories relate to old-fashioned
ones), but you have to do the work to prove that commutation is
invariant under the hunk conversion, which is almost as much work as #3
for the team.  Perhaps it would be less work for users, but as a matter
of principle, we want to resist format changes unless we really need
them.
3. Involves new patch types, but is ultimately easier than #2 according 
to Petr.  The current URL is 
4. Is basically orthogonal to #3.  We could do both or neither, or some
combination in between.  See issue2055.
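The proof obligation in point 2 can be illustrated with a small sketch: if hypothetical commutes for line-based and chunky hunks share the same line arithmetic, then "convert, then commute" agrees with "commute, then convert" as long as both representations report the same line counts. This only tests the property on examples rather than proving it, uses a simplified commute that refuses adjacent or overlapping hunks, and ignores corner cases such as a final line without a trailing newline; all names are hypothetical.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString.Char8 as BC

-- Hypothetical hunk representations: the same edit encoded two ways.
data LineHunk = LineHunk Int [BC.ByteString] [BC.ByteString]
  deriving (Show, Eq)
data ChunkHunk = ChunkHunk Int BC.ByteString BC.ByteString
  deriving (Show, Eq)

-- The format conversion (assumes no line contains a newline).
toChunky :: LineHunk -> ChunkHunk
toChunky (LineHunk n old new) = ChunkHunk n (BC.unlines old) (BC.unlines new)

-- Shared arithmetic for commuting two non-overlapping hunks in one file.
-- Arguments: (old, new) line counts of p1 and p2, then their start lines.
shiftLines :: (Int, Int) -> (Int, Int) -> Int -> Int -> Maybe (Int, Int)
shiftLines (o1, n1) (o2, n2) l1 l2
  | l1 + n1 < l2 = Just (l2 - n1 + o1, l1)  -- p2 strictly after p1
  | l2 + o2 < l1 = Just (l2, l1 + n2 - o2)  -- p2 strictly before p1
  | otherwise    = Nothing

-- Line-based commute: counts list elements.
commuteLines :: (LineHunk, LineHunk) -> Maybe (LineHunk, LineHunk)
commuteLines (LineHunk l1 o1 n1, LineHunk l2 o2 n2) = do
  (l2', l1') <- shiftLines (length o1, length n1) (length o2, length n2) l1 l2
  pure (LineHunk l2' o2 n2, LineHunk l1' o1 n1)

-- Chunky commute: counts newlines in each buffer instead.
commuteChunks :: (ChunkHunk, ChunkHunk) -> Maybe (ChunkHunk, ChunkHunk)
commuteChunks (ChunkHunk l1 o1 n1, ChunkHunk l2 o2 n2) = do
  (l2', l1') <- shiftLines (count o1, count n1) (count o2, count n2) l1 l2
  pure (ChunkHunk l2' o2 n2, ChunkHunk l1' o1 n1)
  where
    count = BC.count '\n'

-- The property a format change would have to establish (tested, not proved).
invariant :: (LineHunk, LineHunk) -> Bool
invariant (p1, p2) =
  commuteChunks (toChunky p1, toChunky p2)
    == fmap (\(a, b) -> (toChunky a, toChunky b)) (commuteLines (p1, p2))
```

With the arithmetic shared, the whole obligation reduces to showing that the newline count of a converted region equals the number of lines it came from; in a real patch format the corner cases this sketch ignores are where that argument would need care.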

*SO*, unless somebody comes in and argues that #2 actually makes sense
without doing #3, I say that Petr's notion of chunky hunks is the
definitive one.  If anybody disagrees with this view, they should bring
it up. Phew!

Date                 User        Action  Args
2009-02-13 17:56:44  kowey       create
2009-04-15 22:52:21  kowey       set     status: needs-reproduction -> deferred
                                         nosy: kowey, simon, thorkilnaur, dmitry.kurochkin
2009-08-17 17:10:16  kowey       link    issue80 superseder
2009-08-21 16:37:56  kowey       link    issue1007 superseder
2009-08-24 01:14:33  kowey       set     nosy: kowey, simon, thorkilnaur, dmitry.kurochkin
                                         messages: + msg8444
2009-08-25 17:40:52  admin       set     nosy: + darcs-devel, - simon
2009-08-26 12:15:16  kowey       set     priority: wishlist -> feature
                                         nosy: kowey, darcs-devel, thorkilnaur, dmitry.kurochkin
2009-08-27 14:32:48  admin       set     nosy: kowey, darcs-devel, thorkilnaur, dmitry.kurochkin
2010-02-19 11:44:06  kowey       set     status: deferred -> has-patch
                                         nosy: + dagit
                                         messages: + msg10031
                                         assignedto: dagit
2010-02-19 16:10:06  dagit       set     messages: + msg10036
2010-02-19 16:10:33  dagit       set     status: has-patch -> deferred
                                         assignedto: dagit ->
2010-02-19 16:10:42  dagit       set     nosy: - dagit
2010-02-19 16:38:59  kowey       set     messages: + msg10037
2011-03-30 20:47:26  ganesh      set     messages: + msg13833
2011-03-30 20:47:51  bsrkaditya  set     messages: + msg13834
2011-04-02 09:21:37  kowey       set     messages: + msg13864