Created on 2019-06-14.19:05:28 by bfrk, last changed 2019-07-27.17:59:21 by ganesh.
See mailing list archives
for discussion on individual patches.
msg20809 (view) |
Author: bfrk |
Date: 2019-06-14.19:05:28 |
|
4 patches for repository /home/ben/src/darcs/without-v3:
patch 201588ce705f87b17bcb800c113b91d75d3a450e
Author: Ben Franksen <ben.franksen@online.de>
Date: Fri Feb 8 16:50:41 CET 2019
* add some functions we need for V3 to Darcs.Patch.Ident
patch 2ab2d6ea7d3b4299dee37bc81e36fa35bc9c035d
Author: Ben Franksen <ben.franksen@online.de>
Date: Fri Feb 8 17:45:13 CET 2019
* add RepoPatchV3 aka camp conflictors
patch 41d9f81cb0bb0c89d3589440cd2959d0b5df7361
Author: Ben Franksen <ben.franksen@online.de>
Date: Fri Feb 8 17:45:38 CET 2019
* add V3 stuff to the test suite (quickcheck tests only)
patch b8386798be44a17c5e33cfaa130da7f94fc60941
Author: Ben Franksen <ben.franksen@online.de>
Date: Fri Feb 22 15:02:39 CET 2019
* add a comment to commutePast
Attachments
|
msg20812 (view) |
Author: ganesh |
Date: 2019-06-14.23:09:47 |
|
Continuing the discussion from patch1804:
>>> -- The 'Int' gets inverted together with the patch.
>>> data PrimPatchId = PrimPatchId !Int !SHA1
>>
>> Is it illegal for the Int to be 0?
>
> Yes.
>
>> If so this should be documented
>> and ideally tested. (I haven't read a lot of the rest of the
>> relevant code yet, so this might already be tested.)
>
> Good point. No, i don't think this is tested nor documented. Will do.
Actually, a followup thought is do we really need this slightly
opaque encoding, or could we use a Bool like PatchInfo has?
In the same vein, I wonder a bit about the SHA1. I wonder what
the trade-offs between that and using the PatchInfo directly are (if
the latter is even possible). I'll need to get to grips with the rest
of the code to have a good feeling for that though.
|
msg20816 (view) |
Author: bfrk |
Date: 2019-06-15.08:08:03 |
|
> Actually, a followup thought is do we really need this slightly
> opaque encoding, or could we use a Bool like PatchInfo has?
We certainly could. But we'd still have to maintain an invariant (>=0).
I tried to minimize the memory footprint here. Note this gets attached
to /every/ single prim patch (if it is part of a RepoPatchV3).
> In the same vein, I wonder a bit about the SHA1. I wonder what
> the trade-offs between that and using the PatchInfo directly are (if
> the latter is even possible). I'll need to get to grips with the rest
> of the code to have a good feeling for that though.
Again, certainly possible. But remember that the PrimPatchId is part of
the on-disk representation. Using the full meta-data would make patches
quite a bit larger, slowing down read and write of patches and also
equality tests. And since we read PrimPatchIds from disk, we cannot (at
least not automatically) share equal PrimPatchIds in memory, so memory
requirements would also go up significantly.
|
msg20831 (view) |
Author: ganesh |
Date: 2019-06-15.10:55:50 |
|
> Again, certainly possible. But remember that the PrimPatchId is
> part of the on-disk representation. Using the full meta-data
> would make patches quite a bit larger, slowing down read and
> write of patches and also equality tests. And since we read
> PrimPatchIds from disk, we cannot (at least not automatically)
> share equal PrimPatchIds in memory, so memory requirements
> would also go up significantly.
Yeah. I was thinking that the on-disk representation could elide the
PrimPatchId when it's the same as that of the containing patch, so
it'd only be needed for conflicts.
When reading patches, this would also implicitly mean we get back
most of the sharing.
A pointer is smaller than a SHA1, so this would be a win for the
non-conflicting case.
We could also choose to use hash-consing to get back sharing for
the conflicting case, at the cost of some speed when reading, and
extra code complexity in reading patches.
|
msg20833 (view) |
Author: ganesh |
Date: 2019-06-15.11:07:39 |
|
>> Actually, a followup thought is do we really need this slightly
>> opaque encoding, or could we use a Bool like PatchInfo has?
>
> We certainly could. But we'd still have to maintain an invariant
> (>=0). I tried to minimize the memory footprint here. Note this
> gets attached to /every/ single prim patch (if it is part of
> a RepoPatchV3).
Isn't it just that the numbers need to be unique amongst all the
primitives with the same named patch, even if in practice we
allocate them as positive numbers?
One alternative if keeping the efficiency is important would be to
protect the representation by not exporting the constructor of
NamedPrim.
|
msg20834 (view) |
Author: bfrk |
Date: 2019-06-15.14:01:03 |
|
>> Again, certainly possible. But remember that the PrimPatchId is
>> part of the on-disk representation. Using the full meta-data
>> would make patches quite a bit larger, slowing down read and
>> write of patches and also equality tests. And since we read
>> PrimPatchIds from disk, we cannot (at least not automatically)
>> share equal PrimPatchIds in memory, so memory requirements
>> would also go up significantly.
>
> Yeah. I was thinking that the on-disk representation could elide the
> PrimPatchId when it's the same as that of the containing patch, so
> it'd only be needed for conflicts.
>
> When reading patches, this would also implicitly mean we get back
> most of the sharing.
>
> A pointer is smaller than a SHA1, so this would be a win for the
> non-conflicting case.
Okay, this would solve the space issue (on disk and in memory), assuming
the repo doesn't contain too many conflicted patches.
> We could also choose to use hash-consing to get back sharing for
> the conflicting case, at the cost of some speed when reading, and
> extra code complexity in reading patches.
Let me flesh that out a bit and please correct me if I misunderstood:
So the idea here is to use a global (per repo) lookup table from SHA1
hashes to meta-data. That would allow us to use the optimized on-disk
representation for Conflictors, too, i.e. the current one base on
PrimPatchId, but use the real meta-data in a shared way in memory.
Access to this table is of course impure, we would need to hide the
necessary unsafePerformIO stuff behind some hopefully semantically pure
abstraction.
I still cannot see how this would give us back cheap equality tests for
prim patches in the implementation of RepoPatchV3.
BTW, one optimization of the on-disk representation that would be
minimally invasive and even serve as a possible first step toward the
more far-reaching changes discussed above, is to eliminate the
PrimPatchIds for unconflicted patches and just re-calculate them when
reading the patch.
|
msg20886 (view) |
Author: ganesh |
Date: 2019-07-07.11:37:04 |
|
(Sorry for the delay in following up on this)
On 15/06/2019 15:01, Ben Franksen wrote:
>
> Ben Franksen <ben.franksen@online.de> added the comment:
>
>>> Again, certainly possible. But remember that the PrimPatchId is
>>> part of the on-disk representation. Using the full meta-data
>>> would make patches quite a bit larger, slowing down read and
>>> write of patches and also equality tests. And since we read
>>> PrimPatchIds from disk, we cannot (at least not automatically)
>>> share equal PrimPatchIds in memory, so memory requirements
>>> would also go up significantly.
>>
>> Yeah. I was thinking that the on-disk representation could elide the
>> PrimPatchId when it's the same as that of the containing patch, so
>> it'd only be needed for conflicts.
>>
>> When reading patches, this would also implicitly mean we get back
>> most of the sharing.
>>
>> A pointer is smaller than a SHA1, so this would be a win for the
>> non-conflicting case.
>
> Okay, this would solve the space issue (on disk and in memory), assuming
> the repo doesn't contain too many conflicted patches.
>
>> We could also choose to use hash-consing to get back sharing for
>> the conflicting case, at the cost of some speed when reading, and
>> extra code complexity in reading patches.
>
> Let me flesh that out a bit and please correct me if I misunderstood:
>
> So the idea here is to use a global (per repo) lookup table from SHA1
> hashes to meta-data. That would allow us to use the optimized on-disk
> representation for Conflictors, too, i.e. the current one base on
> PrimPatchId, but use the real meta-data in a shared way in memory.
> Access to this table is of course impure, we would need to hide the
> necessary unsafePerformIO stuff behind some hopefully semantically pure
> abstraction.
Yes. It doesn't necessarily need to be a global with unsafePerformIO, we
could just thread it around when parsing a repo if we don't want to have
perfect sharing all the time.
> I still cannot see how this would give us back cheap equality tests for
> prim patches in the implementation of RepoPatchV3.
You're right, it's not as simple as I had thought.
To get cheap equality we'd also need to use 'reallyUnsafePtrEquality'.
Clearly the name should give us some pause :-) It sounds like it can't
give false positives (false reports that
> BTW, one optimization of the on-disk representation that would be
> minimally invasive and even serve as a possible first step toward the
> more far-reaching changes discussed above, is to eliminate the
> PrimPatchIds for unconflicted patches and just re-calculate them when
> reading the patch.
Yep. I think that's also what I meant by eliding it when it's the same
as the containing patch.
One option for now would be to keep the full patch names in the on-disk
format, but keep the hashes in the in-memory representation for now.
That would be easy on reading (just hash), but harder when writing as
we'd need a table of hash -> patch name somewhere. But then the hashing
would just be an implementation detail we could change at any time.
|
msg20889 (view) |
Author: bfrk |
Date: 2019-07-07.15:45:35 |
|
So it looks like getting all the benefits from hash-consing would
require us to swallow a heavy dose of unsafeWhatever. And the added code
complexity is considerable. I won't say that it is unreasonable to
consider such an optimization in the long run, but I'd say for now it is
out of scope.
>> BTW, one optimization of the on-disk representation that would be
>> minimally invasive and even serve as a possible first step toward the
>> more far-reaching changes discussed above, is to eliminate the
>> PrimPatchIds for unconflicted patches and just re-calculate them when
>> reading the patch.
>
> Yep. I think that's also what I meant by eliding it when it's the same
> as the containing patch.
>
> One option for now would be to keep the full patch names in the on-disk
> format, but keep the hashes in the in-memory representation for now.
> That would be easy on reading (just hash), but harder when writing as
> we'd need a table of hash -> patch name somewhere. But then the hashing
> would just be an implementation detail we could change at any time.
I am happy to consider optimizations of all sorts, but I really think
that the on-disk patch format should not contain the full meta data more
than once, ever. This is just wrong.
I am also, albeit less strongly, opposed to penalizing conflictors even
if that improves things in repos that contain few conflicts. One of the
reasons I have invested so much work on the v3 format is that I want
darcs to finally get rid of the "avoid conflicts at all costs" stigma.
Eliding hashes when they are the same as for the containing patch is a
possibility, but not quite as uninvasive as I thought. The problem is
that when we read a Named patch we'd have to pass down a hash and a
serial number to the parsing function for the ingredients. There is
currently no API to do that, as up to now all patch types have a fully
self-contained on-disk format. I have been through such an internal API
change with the FromPrim class (there was no way to avoid that one), so
believe me when i say this is doable but nothing to enjoy.
Can we postpone these considerations for the moment and move on? There
is no pressing need to nail down the on-disk format for v3 at this time.
The current implementation is simple and yet reasonably efficient. More
importantly, it required no deep thinking to get right, which can't be
said for the commute and merge code... ;-) Let us optimize things when
we are certain we have got everything correct. I urge you to open a
ticket gathering the possibilities we discussed so we don't forget.
|
msg20895 (view) |
Author: bfrk |
Date: 2019-07-08.16:17:48 |
|
Following up on patch review.
2 patches for repository http://darcs.net/screened:
patch f5d79f24bab01bad038670f17c2e56c2d7a45b49
Author: Ben Franksen <ben.franksen@online.de>
Date: Sat Jun 15 11:20:33 CEST 2019
* document and add property for PrimPatchId invariant
patch 51096b5cd49a10d71aec2b7242caaf344d72e53d
Author: Ben Franksen <ben.franksen@online.de>
Date: Mon Jul 8 18:10:10 CEST 2019
* make PrimPatchId opaque and check invariant when constructing it
Attachments
|
msg20904 (view) |
Author: ganesh |
Date: 2019-07-11.10:07:21 |
|
> Can we postpone these considerations for the moment and move on?
> There is no pressing need to nail down the on-disk format for v3
> at this time. The current implementation is simple and yet
> reasonably efficient. More importantly, it required no deep
> thinking to get right, which can't be said for the commute and
> merge code... ;-) Let us optimize things when we are certain we
> have got everything correct. I urge you to open a ticket gathering
> the possibilities we discussed so we don't forget.
Agreed. Raised issue2628.
|
msg20905 (view) |
Author: ganesh |
Date: 2019-07-11.12:36:29 |
|
In src/Darcs/Patch/V3/Contexted.hs:
> +-- Comparing the contexts is inefficient and unnecessary
> +-- if the patches have identities, see 'prop_CtxEq'.
It took me a little while to understand that this is true because
the context is minimal. It is implied from the existing description
and prop_ctxInvariants makes it unambiguous, but it still took me a
little while to work it out.
I'll send a patch to emphasise this in the haddocks for Contexted.
|
msg20906 (view) |
Author: ganesh |
Date: 2019-07-11.12:38:05 |
|
Forget my previous comment, I must have completely missed the
final line of the haddocks that says this: :-)
"The definition of context above is chosen so that this sequence is
minimal."
|
msg20907 (view) |
Author: ganesh |
Date: 2019-07-11.13:08:21 |
|
> patch 51096b5cd49a10d71aec2b7242caaf344d72e53d
> Author: Ben Franksen <ben.franksen@online.de>
> Date: Mon Jul 8 18:10:10 CEST 2019
> * make PrimPatchId opaque and check invariant when constructing
> it
I'd also like the "inverses are negation" aspect to be opaque.
I think this probably means that the API should only expose
positive numbers and have separate methods for checking for an
inverse and inverting.
I'm happy to make that change myself if you prefer.
I'm happy with the rest of this bundle, though I haven't read the
actual conflict handling algorithm in detail or compared it to the
camp paper, nor have I looked closely at the exact set of
properties. If you think it's important to have a second set
of eyes on the algorithm, I can try to do it after I've caught up
with the rest of my reviewing backlog.
|
msg20911 (view) |
Author: bfrk |
Date: 2019-07-12.13:44:40 |
|
>> * make PrimPatchId opaque and check invariant when constructing
>> it
>
> I'd also like the "inverses are negation" aspect to be opaque.
> I think this probably means that the API should only expose
> positive numbers and have separate methods for checking for an
> inverse and inverting.
>
> I'm happy to make that change myself if you prefer.
I would appreciate that. (And there is no better way to deeply
understand things than to make changes to it...)
> I'm happy with the rest of this bundle, though I haven't read the
> actual conflict handling algorithm in detail or compared it to the
> camp paper, nor have I looked closely at the exact set of
> properties. If you think it's important to have a second set
> of eyes on the algorithm, I can try to do it after I've caught up
> with the rest of my reviewing backlog.
Sounds like a good plan. When you get to it: a special thing to look out
for would be cases where I didn't clearly distinguish between
side-conditions (which may rightfully fail) and invariants which must
not. Also there are some details where my implementation differs from
the paper on purpose; in some cases this is because there was a bug in
the algorithm as described in the paper. It may be useful to comment
these differences (if I did not already do that).
|
msg20938 (view) |
Author: ganesh |
Date: 2019-07-14.18:36:14 |
|
Accepting this now we have a viable candidate for improving the
abstraction of the sequence numbers.
|
|
Date |
User |
Action |
Args |
2019-06-14 19:05:28 | bfrk | create | |
2019-06-14 23:09:47 | ganesh | set | status: needs-screening -> review-in-progress assignedto: ganesh messages:
+ msg20812 nosy:
+ ganesh |
2019-06-15 08:08:03 | bfrk | set | messages:
+ msg20816 |
2019-06-15 10:55:50 | ganesh | set | messages:
+ msg20831 |
2019-06-15 11:07:39 | ganesh | set | messages:
+ msg20833 |
2019-06-15 14:01:03 | bfrk | set | messages:
+ msg20834 |
2019-07-07 11:37:04 | ganesh | set | messages:
+ msg20886 |
2019-07-07 15:45:36 | bfrk | set | messages:
+ msg20889 |
2019-07-08 16:17:48 | bfrk | set | files:
+ patch-preview.txt, document-and-add-property-for-primpatchid-invariant.dpatch, unnamed messages:
+ msg20895 |
2019-07-11 10:07:21 | ganesh | set | messages:
+ msg20904 |
2019-07-11 12:36:29 | ganesh | set | messages:
+ msg20905 |
2019-07-11 12:38:06 | ganesh | set | messages:
+ msg20906 |
2019-07-11 13:08:21 | ganesh | set | messages:
+ msg20907 |
2019-07-12 13:44:41 | bfrk | set | messages:
+ msg20911 |
2019-07-14 18:36:14 | ganesh | set | status: review-in-progress -> accepted-pending-tests messages:
+ msg20938 |
2019-07-27 17:59:21 | ganesh | set | status: accepted-pending-tests -> accepted |
|