Created on 2009-02-13.17:56:44 by kowey, last changed 2017-07-31.00:24:00 by gh.
This is a low-level optimisation for darcs. We tried it out between darcs
1.0.9 and 2.0.0, but backed it out due to bugs. Note that camp uses chunky
hunks, if I understand correctly.
David has suggested a less ambitious and safer approach. The idea is that this
would be akin to the relationship between hashed-1.0 and plain repositories. In
other words, you would need a new darcs client to interact with the chunky
repository, but it would be 100% compatible with non-chunky hunks (patches can
be pushed back and forth).
I need a volunteer to:
- Describe what chunky hunks are
- Describe how we think they will help darcs performance
These patches by David may be relevant.
* add new "chunk" patch format (currently only used in darcs-2 repositories).
* make ColourPrinter able to deal nicely with userchunks that contain newlines.
* rewrite hunks to store solid chunks rather than lists of lines.
See also http://mornfall.net/blog/patch_formats.html
Jason: is this what your HunkHandle work is about?
Please un-assign yourself if I'm mistaken.
My HunkHandles are a record that stores a filename, an offset and a
length. They can be used to read in a hunk patch after an initial scan
over a patch bundle to calculate the handles.
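For readers following along, here is a minimal sketch of what such a handle could look like. It is not the actual HunkHandle code: the field names, `scanHandles`, `readHunk` and the rule that a hunk runs from one line starting with "hunk " to the next are all assumptions made for illustration.

```haskell
import qualified Data.ByteString.Char8 as B
import System.IO (IOMode (ReadMode), SeekMode (AbsoluteSeek), hSeek, withBinaryFile)

-- | Hypothetical handle: where a hunk's bytes live, not the bytes themselves.
data HunkHandle = HunkHandle
  { hhFile   :: FilePath  -- ^ bundle (or patch file) containing the hunk
  , hhOffset :: Integer   -- ^ byte offset of the hunk within that file
  , hhLength :: Int       -- ^ length of the hunk in bytes
  } deriving (Eq, Show)

-- | Pair every line with the byte offset at which it starts.
lineOffsets :: B.ByteString -> [(Int, B.ByteString)]
lineOffsets bs = zip offsets ls
  where
    ls      = B.lines bs
    offsets = scanl (\o l -> o + B.length l + 1) 0 ls

-- | Initial scan: one handle per hunk. Simplifying assumption: a hunk
-- starts at a line beginning with "hunk " and runs to the next such line
-- or to the end of the file. A real scan would stream the bundle rather
-- than read it in one go.
scanHandles :: FilePath -> IO [HunkHandle]
scanHandles path = do
  contents <- B.readFile path
  let starts = [ off | (off, l) <- lineOffsets contents
                     , B.pack "hunk " `B.isPrefixOf` l ]
      ends   = drop 1 starts ++ [B.length contents]
  return [ HunkHandle path (fromIntegral s) (e - s) | (s, e) <- zip starts ends ]

-- | Read one hunk's bytes back from disk, only when they are needed.
readHunk :: HunkHandle -> IO B.ByteString
readHunk (HunkHandle path off len) =
  withBinaryFile path ReadMode $ \h -> do
    hSeek h AbsoluteSeek off
    B.hGet h len
```

The point of the design is that only the small handles stay resident after the scan; each payload is re-read with an explicit seek when a caller asks for it.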
It would be nice to also adapt commute and merge to rely more on the
metadata about a hunk and ignore the payload of the hunk whenever
possible, but that's not going to be in my initial rewrite.
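To illustrate why the payload can often be ignored, here is a simplified sketch of commuting two hunks on the same file using only line numbers and line counts. `HunkMeta` and `commuteMeta` are hypothetical names, and the rule is deliberately conservative: hunks that overlap or touch are reported as non-commuting, which is not necessarily how darcs treats every edge case.

```haskell
-- | Only the "shape" of a hunk; the line contents are deliberately absent.
data HunkMeta = HunkMeta
  { hmLine    :: Int  -- ^ first line touched (1-based)
  , hmRemoved :: Int  -- ^ number of old lines removed
  , hmAdded   :: Int  -- ^ number of new lines added
  } deriving (Eq, Show)

-- | Commute p (applied first) with q (applied second), both on the same
-- file. Returns (q', p') such that applying q' and then p' has the same
-- effect, or Nothing when the two hunks overlap or touch.
commuteMeta :: (HunkMeta, HunkMeta) -> Maybe (HunkMeta, HunkMeta)
commuteMeta (p, q)
  -- p lies strictly before q: remove p's shift from q's line number.
  | hmLine p + hmAdded p < hmLine q =
      Just (q { hmLine = hmLine q - hmAdded p + hmRemoved p }, p)
  -- q lies strictly before p: q now goes first, so shift p by q's net growth.
  | hmLine q + hmRemoved q < hmLine p =
      Just (q, p { hmLine = hmLine p + hmAdded q - hmRemoved q })
  | otherwise = Nothing
```

For example, if p replaces one line at line 2 with three lines and q then removes two lines at line 10, q commutes to line 8 and p is unchanged, without either hunk's text being read.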
Note: No on-disk format changes are necessary for my change. I'm
sacrificing a bit of disk IO in an attempt to make disk accesses more
explicit (the goal is to make it harder to hold on to huge things in
memory).
I think the two ideas "chunky" rep and HunkHandles have similar goals
but the work itself is quite different. Therefore, I will remove myself
as the owner of this ticket.
On Fri, Feb 19, 2010 at 16:10:06 +0000, Jason Dagit wrote:
> My HunkHandles are a record that stores a filename, an offset and a
> length. They can be used to read in a hunk patch after an initial scan
> over a patch bundle to calculate the handles.
> I think the two ideas "chunky" rep and HunkHandles have similar goals
> but the work itself is quite different.
Jason graciously explained this to me on IRC. For tracking purposes:
So it seems that there are 4 ideas going on, three of which sort of
get swept up into the chunky-hunk umbrella.
1. changing the in-memory representation of hunk patches
2. changing the on-disk representation of hunk patches
3. radically changing #1/#2 so that hunks just point to hashes of
contents (Petr's idea)
4. Lightly changing the in-memory representation so that we avoid
storing much more than filename/offset in memory.
Let's step through these backwards.
Unlike #3, we do not change the notion of commutation in any
way; nor do we change the fact that we still explicitly
manipulate hunk contents. No hashes here.
Unlike #2, we do not modify the on-disk representation of hunk
patches in any way. (This should be fairly clear.) It is just a
change in run-time behaviour, moving some of our work from memory to disk.
Unlike #1, we do not change the internal representation of the
contents. We still work with a [ByteString] representing a
list of lines, and not a single ByteString representing a region.
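As a rough picture of the representations being contrasted, here is a sketch of the types involved. None of these are darcs' real definitions: `LineHunk`, `ChunkHunk`, `HashedHunk` and `lineToChunk` are hypothetical names, and the hashed variant is only a guess at the general shape of "hunks pointing to hashes of contents".

```haskell
import qualified Data.ByteString.Char8 as B

-- | Line-based hunk: old and new text kept as lists of lines.
data LineHunk = LineHunk
  { lhFile :: FilePath
  , lhLine :: Int              -- ^ first line touched (1-based)
  , lhOld  :: [B.ByteString]
  , lhNew  :: [B.ByteString]
  } deriving (Eq, Show)

-- | Idea #1: a single ByteString representing a region of the file.
data ChunkHunk = ChunkHunk
  { chFile   :: FilePath
  , chOffset :: Int            -- ^ byte offset where the region starts
  , chOld    :: B.ByteString
  , chNew    :: B.ByteString
  } deriving (Eq, Show)

-- | Idea #3 (shape only): the hunk refers to contents by hash.
newtype ContentHash = ContentHash B.ByteString deriving (Eq, Show)

data HashedHunk = HashedHunk
  { hhPath    :: FilePath
  , hhOldHash :: ContentHash
  , hhNewHash :: ContentHash
  } deriving (Eq, Show)

-- | Convert a line hunk to a chunk hunk, given the file contents the hunk
-- applies to. Assumes every line, including the last, ends in a newline.
lineToChunk :: B.ByteString -> LineHunk -> ChunkHunk
lineToChunk contents h = ChunkHunk
  { chFile   = lhFile h
  , chOffset = sum [ B.length l + 1
                   | l <- take (lhLine h - 1) (B.lines contents) ]
  , chOld    = B.unlines (lhOld h)
  , chNew    = B.unlines (lhNew h)
  }
```

The HunkHandle approach (#4) leaves the first type as it is and only changes when its contents are loaded.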
Phew! Maybe HunkHandle needs a ticket of its own.
Eric Kow <http://www.nltg.brighton.ac.uk/home/Eric.Kow>
PGP Key ID: 08AC04F9
The blog link above seems to now be
Just as a note:
On IRC, Heffalump has said that the patch
* add new "chunk" patch format (currently only used in darcs-2
had been rolled back.
OK, I just had a bit of a discussion about this with Petr,
Basically following the list in msg10037
1. Would be pointless/stupid without doing #2 or #3
2. Is doable (think of how hashed relates to old-fashioned), but you
have to do work to prove that commutation is invariant under the hunk
conversion, which is almost as much work as #3 for the team. Perhaps
it would be less work for users, but as a matter of principle, we want
to resist format changes unless we really need them.
3. Involves new patch types, but is ultimately easier than #2 according
to Petr. The current URL is
4. Is basically orthogonal to #3. We could do both or neither, or some
combination in between. See issue2055
*SO* Unless somebody comes in and argues that #2 actually makes sense
without doing #3, I say that Petr's notion of chunky hunks is the
definitive one. If anybody disagrees with this view, they should bring
it up. Phew!
|2009-04-15 22:52:21||kowey||set||status: needs-reproduction -> deferred|
kowey, simon, thorkilnaur, dmitry.kurochkin
|2009-08-17 17:10:16||kowey||link||issue80 superseder|
|2009-08-21 16:37:56||kowey||link||issue1007 superseder|
kowey, simon, thorkilnaur, dmitry.kurochkin|
+ darcs-devel, - simon|
|2009-08-26 12:15:16||kowey||set||priority: wishlist -> feature|
kowey, darcs-devel, thorkilnaur, dmitry.kurochkin
kowey, darcs-devel, thorkilnaur, dmitry.kurochkin|
|2010-02-19 11:44:06||kowey||set||status: deferred -> has-patch|
|2010-02-19 16:10:33||dagit||set||status: has-patch -> deferred, assignedto: dagit -> |
|2017-07-31 00:24:00||gh||set||status: deferred -> given-up|