Patch 1823: v3: basic implementation and tests

Title	v3: basic implementation and tests
Superseder		Nosy List	bfrk, ganesh
Related Issues
Status	accepted	Assigned To	ganesh
Milestone

Created on 2019-06-14.19:05:28 by bfrk, last changed 2019-07-27.17:59:21 by ganesh.

Files
File name	Status	Uploaded	Type	Edit	Remove
add-some-functions-we-need-for-v3-to-darcs_patch_ident.dpatch		bfrk, 2019-06-14.19:05:28	application/x-darcs-patch
document-and-add-property-for-primpatchid-invariant.dpatch		bfrk, 2019-07-08.16:17:48	application/x-darcs-patch
patch-preview.txt		bfrk, 2019-06-14.19:05:28	text/x-darcs-patch
patch-preview.txt		bfrk, 2019-07-08.16:17:48	text/x-darcs-patch
unnamed		bfrk, 2019-06-14.19:05:28	text/plain
unnamed		bfrk, 2019-07-08.16:17:48	text/plain

See mailing list archives for discussion on individual patches.

Messages
msg20809 (view)	Author: bfrk	Date: 2019-06-14.19:05:28
4 patches for repository /home/ben/src/darcs/without-v3: patch 201588ce705f87b17bcb800c113b91d75d3a450e Author: Ben Franksen <ben.franksen@online.de> Date: Fri Feb 8 16:50:41 CET 2019 * add some functions we need for V3 to Darcs.Patch.Ident patch 2ab2d6ea7d3b4299dee37bc81e36fa35bc9c035d Author: Ben Franksen <ben.franksen@online.de> Date: Fri Feb 8 17:45:13 CET 2019 * add RepoPatchV3 aka camp conflictors patch 41d9f81cb0bb0c89d3589440cd2959d0b5df7361 Author: Ben Franksen <ben.franksen@online.de> Date: Fri Feb 8 17:45:38 CET 2019 * add V3 stuff to the test suite (quickcheck tests only) patch b8386798be44a17c5e33cfaa130da7f94fc60941 Author: Ben Franksen <ben.franksen@online.de> Date: Fri Feb 22 15:02:39 CET 2019 * add a comment to commutePast Attachments add-some-functions-we-need-for-v3-to-darcs_patch_ident.dpatch patch-preview.txt unnamed
msg20812 (view)	Author: ganesh	Date: 2019-06-14.23:09:47
Continuing the discussion from patch1804: >>> -- The 'Int' gets inverted together with the patch. >>> data PrimPatchId = PrimPatchId !Int !SHA1 >> >> Is it illegal for the Int to be 0? > > Yes. > >> If so this should be documented >> and ideally tested. (I haven't read a lot of the rest of the >> relevant code yet, so this might already be tested.) > > Good point. No, i don't think this is tested nor documented. Will do. Actually, a followup thought is do we really need this slightly opaque encoding, or could we use a Bool like PatchInfo has? In the same vein, I wonder a bit about the SHA1. I wonder what the trade-offs between that and using the PatchInfo directly are (if the latter is even possible). I'll need to get to grips with the rest of the code to have a good feeling for that though.
msg20816 (view)	Author: bfrk	Date: 2019-06-15.08:08:03
> Actually, a followup thought is do we really need this slightly > opaque encoding, or could we use a Bool like PatchInfo has? We certainly could. But we'd still have to maintain an invariant (>=0). I tried to minimize the memory footprint here. Note this gets attached to /every/ single prim patch (if it is part of a RepoPatchV3). > In the same vein, I wonder a bit about the SHA1. I wonder what > the trade-offs between that and using the PatchInfo directly are (if > the latter is even possible). I'll need to get to grips with the rest > of the code to have a good feeling for that though. Again, certainly possible. But remember that the PrimPatchId is part of the on-disk representation. Using the full meta-data would make patches quite a bit larger, slowing down read and write of patches and also equality tests. And since we read PrimPatchIds from disk, we cannot (at least not automatically) share equal PrimPatchIds in memory, so memory requirements would also go up significantly.
msg20831 (view)	Author: ganesh	Date: 2019-06-15.10:55:50
> Again, certainly possible. But remember that the PrimPatchId is > part of the on-disk representation. Using the full meta-data > would make patches quite a bit larger, slowing down read and > write of patches and also equality tests. And since we read > PrimPatchIds from disk, we cannot (at least not automatically) > share equal PrimPatchIds in memory, so memory requirements > would also go up significantly. Yeah. I was thinking that the on-disk representation could elide the PrimPatchId when it's the same as that of the containing patch, so it'd only be needed for conflicts. When reading patches, this would also implicitly mean we get back most of the sharing. A pointer is smaller than a SHA1, so this would be a win for the non-conflicting case. We could also choose to use hash-consing to get back sharing for the conflicting case, at the cost of some speed when reading, and extra code complexity in reading patches.
msg20833 (view)	Author: ganesh	Date: 2019-06-15.11:07:39
>> Actually, a followup thought is do we really need this slightly >> opaque encoding, or could we use a Bool like PatchInfo has? > > We certainly could. But we'd still have to maintain an invariant > (>=0). I tried to minimize the memory footprint here. Note this > gets attached to /every/ single prim patch (if it is part of > a RepoPatchV3). Isn't it just that the numbers need to be unique amongst all the primitives with the same named patch, even if in practice we allocate them as positive numbers? One alternative if keeping the efficiency is important would be to protect the representation by not exporting the constructor of NamedPrim.
msg20834 (view)	Author: bfrk	Date: 2019-06-15.14:01:03
>> Again, certainly possible. But remember that the PrimPatchId is >> part of the on-disk representation. Using the full meta-data >> would make patches quite a bit larger, slowing down read and >> write of patches and also equality tests. And since we read >> PrimPatchIds from disk, we cannot (at least not automatically) >> share equal PrimPatchIds in memory, so memory requirements >> would also go up significantly. > > Yeah. I was thinking that the on-disk representation could elide the > PrimPatchId when it's the same as that of the containing patch, so > it'd only be needed for conflicts. > > When reading patches, this would also implicitly mean we get back > most of the sharing. > > A pointer is smaller than a SHA1, so this would be a win for the > non-conflicting case. Okay, this would solve the space issue (on disk and in memory), assuming the repo doesn't contain too many conflicted patches. > We could also choose to use hash-consing to get back sharing for > the conflicting case, at the cost of some speed when reading, and > extra code complexity in reading patches. Let me flesh that out a bit and please correct me if I misunderstood: So the idea here is to use a global (per repo) lookup table from SHA1 hashes to meta-data. That would allow us to use the optimized on-disk representation for Conflictors, too, i.e. the current one base on PrimPatchId, but use the real meta-data in a shared way in memory. Access to this table is of course impure, we would need to hide the necessary unsafePerformIO stuff behind some hopefully semantically pure abstraction. I still cannot see how this would give us back cheap equality tests for prim patches in the implementation of RepoPatchV3. BTW, one optimization of the on-disk representation that would be minimally invasive and even serve as a possible first step toward the more far-reaching changes discussed above, is to eliminate the PrimPatchIds for unconflicted patches and just re-calculate them when reading the patch.
msg20886 (view)	Author: ganesh	Date: 2019-07-07.11:37:04
(Sorry for the delay in following up on this) On 15/06/2019 15:01, Ben Franksen wrote: > > Ben Franksen <ben.franksen@online.de> added the comment: > >>> Again, certainly possible. But remember that the PrimPatchId is >>> part of the on-disk representation. Using the full meta-data >>> would make patches quite a bit larger, slowing down read and >>> write of patches and also equality tests. And since we read >>> PrimPatchIds from disk, we cannot (at least not automatically) >>> share equal PrimPatchIds in memory, so memory requirements >>> would also go up significantly. >> >> Yeah. I was thinking that the on-disk representation could elide the >> PrimPatchId when it's the same as that of the containing patch, so >> it'd only be needed for conflicts. >> >> When reading patches, this would also implicitly mean we get back >> most of the sharing. >> >> A pointer is smaller than a SHA1, so this would be a win for the >> non-conflicting case. > > Okay, this would solve the space issue (on disk and in memory), assuming > the repo doesn't contain too many conflicted patches. > >> We could also choose to use hash-consing to get back sharing for >> the conflicting case, at the cost of some speed when reading, and >> extra code complexity in reading patches. > > Let me flesh that out a bit and please correct me if I misunderstood: > > So the idea here is to use a global (per repo) lookup table from SHA1 > hashes to meta-data. That would allow us to use the optimized on-disk > representation for Conflictors, too, i.e. the current one base on > PrimPatchId, but use the real meta-data in a shared way in memory. > Access to this table is of course impure, we would need to hide the > necessary unsafePerformIO stuff behind some hopefully semantically pure > abstraction. Yes. It doesn't necessarily need to be a global with unsafePerformIO, we could just thread it around when parsing a repo if we don't want to have perfect sharing all the time. > I still cannot see how this would give us back cheap equality tests for > prim patches in the implementation of RepoPatchV3. You're right, it's not as simple as I had thought. To get cheap equality we'd also need to use 'reallyUnsafePtrEquality'. Clearly the name should give us some pause :-) It sounds like it can't give false positives (false reports that > BTW, one optimization of the on-disk representation that would be > minimally invasive and even serve as a possible first step toward the > more far-reaching changes discussed above, is to eliminate the > PrimPatchIds for unconflicted patches and just re-calculate them when > reading the patch. Yep. I think that's also what I meant by eliding it when it's the same as the containing patch. One option for now would be to keep the full patch names in the on-disk format, but keep the hashes in the in-memory representation for now. That would be easy on reading (just hash), but harder when writing as we'd need a table of hash -> patch name somewhere. But then the hashing would just be an implementation detail we could change at any time.
msg20889 (view)	Author: bfrk	Date: 2019-07-07.15:45:35
So it looks like getting all the benefits from hash-consing would require us to swallow a heavy dose of unsafeWhatever. And the added code complexity is considerable. I won't say that it is unreasonable to consider such an optimization in the long run, but I'd say for now it is out of scope. >> BTW, one optimization of the on-disk representation that would be >> minimally invasive and even serve as a possible first step toward the >> more far-reaching changes discussed above, is to eliminate the >> PrimPatchIds for unconflicted patches and just re-calculate them when >> reading the patch. > > Yep. I think that's also what I meant by eliding it when it's the same > as the containing patch. > > One option for now would be to keep the full patch names in the on-disk > format, but keep the hashes in the in-memory representation for now. > That would be easy on reading (just hash), but harder when writing as > we'd need a table of hash -> patch name somewhere. But then the hashing > would just be an implementation detail we could change at any time. I am happy to consider optimizations of all sorts, but I really think that the on-disk patch format should not contain the full meta data more than once, ever. This is just wrong. I am also, albeit less strongly, opposed to penalizing conflictors even if that improves things in repos that contain few conflicts. One of the reasons I have invested so much work on the v3 format is that I want darcs to finally get rid of the "avoid conflicts at all costs" stigma. Eliding hashes when they are the same as for the containing patch is a possibility, but not quite as uninvasive as I thought. The problem is that when we read a Named patch we'd have to pass down a hash and a serial number to the parsing function for the ingredients. There is currently no API to do that, as up to now all patch types have a fully self-contained on-disk format. I have been through such an internal API change with the FromPrim class (there was no way to avoid that one), so believe me when i say this is doable but nothing to enjoy. Can we postpone these considerations for the moment and move on? There is no pressing need to nail down the on-disk format for v3 at this time. The current implementation is simple and yet reasonably efficient. More importantly, it required no deep thinking to get right, which can't be said for the commute and merge code... ;-) Let us optimize things when we are certain we have got everything correct. I urge you to open a ticket gathering the possibilities we discussed so we don't forget.
msg20895 (view)	Author: bfrk	Date: 2019-07-08.16:17:48
Following up on patch review. 2 patches for repository http://darcs.net/screened: patch f5d79f24bab01bad038670f17c2e56c2d7a45b49 Author: Ben Franksen <ben.franksen@online.de> Date: Sat Jun 15 11:20:33 CEST 2019 * document and add property for PrimPatchId invariant patch 51096b5cd49a10d71aec2b7242caaf344d72e53d Author: Ben Franksen <ben.franksen@online.de> Date: Mon Jul 8 18:10:10 CEST 2019 * make PrimPatchId opaque and check invariant when constructing it Attachments document-and-add-property-for-primpatchid-invariant.dpatch patch-preview.txt unnamed
msg20904 (view)	Author: ganesh	Date: 2019-07-11.10:07:21
> Can we postpone these considerations for the moment and move on? > There is no pressing need to nail down the on-disk format for v3 > at this time. The current implementation is simple and yet > reasonably efficient. More importantly, it required no deep > thinking to get right, which can't be said for the commute and > merge code... ;-) Let us optimize things when we are certain we > have got everything correct. I urge you to open a ticket gathering > the possibilities we discussed so we don't forget. Agreed. Raised issue2628.
msg20905 (view)	Author: ganesh	Date: 2019-07-11.12:36:29
In src/Darcs/Patch/V3/Contexted.hs: > +-- Comparing the contexts is inefficient and unnecessary > +-- if the patches have identities, see 'prop_CtxEq'. It took me a little while to understand that this is true because the context is minimal. It is implied from the existing description and prop_ctxInvariants makes it unambiguous, but it still took me a little while to work it out. I'll send a patch to emphasise this in the haddocks for Contexted.
msg20906 (view)	Author: ganesh	Date: 2019-07-11.12:38:05
Forget my previous comment, I must have completely missed the final line of the haddocks that says this: :-) "The definition of context above is chosen so that this sequence is minimal."
msg20907 (view)	Author: ganesh	Date: 2019-07-11.13:08:21
> patch 51096b5cd49a10d71aec2b7242caaf344d72e53d > Author: Ben Franksen <ben.franksen@online.de> > Date: Mon Jul 8 18:10:10 CEST 2019 > * make PrimPatchId opaque and check invariant when constructing > it I'd also like the "inverses are negation" aspect to be opaque. I think this probably means that the API should only expose positive numbers and have separate methods for checking for an inverse and inverting. I'm happy to make that change myself if you prefer. I'm happy with the rest of this bundle, though I haven't read the actual conflict handling algorithm in detail or compared it to the camp paper, nor have I looked closely at the exact set of properties. If you think it's important to have a second set of eyes on the algorithm, I can try to do it after I've caught up with the rest of my reviewing backlog.
msg20911 (view)	Author: bfrk	Date: 2019-07-12.13:44:40
>> * make PrimPatchId opaque and check invariant when constructing >> it > > I'd also like the "inverses are negation" aspect to be opaque. > I think this probably means that the API should only expose > positive numbers and have separate methods for checking for an > inverse and inverting. > > I'm happy to make that change myself if you prefer. I would appreciate that. (And there is no better way to deeply understand things than to make changes to it...) > I'm happy with the rest of this bundle, though I haven't read the > actual conflict handling algorithm in detail or compared it to the > camp paper, nor have I looked closely at the exact set of > properties. If you think it's important to have a second set > of eyes on the algorithm, I can try to do it after I've caught up > with the rest of my reviewing backlog. Sounds like a good plan. When you get to it: a special thing to look out for would be cases where I didn't clearly distinguish between side-conditions (which may rightfully fail) and invariants which must not. Also there are some details where my implementation differs from the paper on purpose; in some cases this is because there was a bug in the algorithm as described in the paper. It may be useful to comment these differences (if I did not already do that).
msg20938 (view)	Author: ganesh	Date: 2019-07-14.18:36:14
Accepting this now we have a viable candidate for improving the abstraction of the sequence numbers.

History
Date	User	Action	Args
2019-06-14 19:05:28	bfrk	create
2019-06-14 23:09:47	ganesh	set	status: needs-screening -> review-in-progress assignedto: ganesh messages: + msg20812 nosy: + ganesh
2019-06-15 08:08:03	bfrk	set	messages: + msg20816
2019-06-15 10:55:50	ganesh	set	messages: + msg20831
2019-06-15 11:07:39	ganesh	set	messages: + msg20833
2019-06-15 14:01:03	bfrk	set	messages: + msg20834
2019-07-07 11:37:04	ganesh	set	messages: + msg20886
2019-07-07 15:45:36	bfrk	set	messages: + msg20889
2019-07-08 16:17:48	bfrk	set	files: + patch-preview.txt, document-and-add-property-for-primpatchid-invariant.dpatch, unnamed messages: + msg20895
2019-07-11 10:07:21	ganesh	set	messages: + msg20904
2019-07-11 12:36:29	ganesh	set	messages: + msg20905
2019-07-11 12:38:06	ganesh	set	messages: + msg20906
2019-07-11 13:08:21	ganesh	set	messages: + msg20907
2019-07-12 13:44:41	bfrk	set	messages: + msg20911
2019-07-14 18:36:14	ganesh	set	status: review-in-progress -> accepted-pending-tests messages: + msg20938
2019-07-27 17:59:21	ganesh	set	status: accepted-pending-tests -> accepted

Patch 1823 v3: basic implementation and tests

Attachments

Attachments