Patch 1844: automatically specialize merging and equality for sequences of patches with identity

Title	automatically specialize merging and equality for sequences of patches with identity
Superseder		Nosy List	bfrk
Related Issues
Status	obsoleted	Assigned To
Milestone

Created on 2019-06-28.18:36:57 by bfrk, last changed 2023-07-10.13:48:19 by bfrk.

Files
File name	Status	Uploaded	Type	Edit	Remove
automatically-specialize-merging-and-equality-for-sequences-of-patches-with-identity.dpatch		bfrk, 2019-07-07.16:20:42	application/x-darcs-patch
patch-preview.txt		bfrk, 2019-07-07.16:20:42	text/x-darcs-patch
unnamed		bfrk, 2019-07-07.16:20:42	text/plain

See mailing list archives for discussion on individual patches.

Messages
msg20875 (view)	Author: bfrk	Date: 2019-06-28.18:36:56
Not for screened yet but please take a look. The dependency on the second patch wasn't intended. If it hurts I can try and rebase. Also note that all the unit tests that have a "nontrivial xxx" condition and are run for FLs fail now with a "Buggy use of unsafeCompare on FL" error. I don't understand why they didn't do that before this patch. I see no way to fix these conditions generically. The only solution I can see is to use different conditions for the FL tests that don't use unsafeCompare. 2 patches for repository http://darcs.net/screened: patch 6214e01928baab9123f03566f12d15e91d86c3f5 Author: Ben Franksen <ben.franksen@online.de> Date: Tue Feb 26 23:29:10 CET 2019 * fix a bug in darcs1->darcs-2 conversion The test data for threewayanddep threewayandmultideps were quite obviously wrong! The darcs-2 Conflictors are complete bogus, referring to patches that don't appear in the repo. This is caused by erroneous calls to sortCoalesceFL in the RepoPatchV1 implementation in unravel and in effect. The way these functions are normally used (effect when we apply a patch and unravel to generate conflict markup) is quite tolerant wrt coalescing. However, unravel is also used to convert darcs-1 Mergers to darcs-2 Conflictors, and here the result is catastrophic. Instead of sortCoalesceFL we must merely cancel inverses, just like we do in the darcs-2 and darcs-3 theory when we construct contexted patches (aka Nons). patch abc332c0bda4d4ab622fb46319301e33c4a89d2f Author: Ben Franksen <ben.franksen@online.de> Date: Fri Jun 28 20:25:32 CEST 2019 * automatically specialize merging and equality for sequences of patches with identity The idea is to allow patch sequences to adapt their behavior depending on whether the patch type has an Ident instance or not. This is done for two reasons: (1) equality tests for, and removing elements from sequences of patches is much faster if we can use PatchIds; (2) merging sequences of patches with identity must first remove duplicate patches, otherwise these would be wrongly treated as conflicting. The new class Sequence abstracts over the basic operations we want to specialize for FL and RL: equality and removing elements. Similarly, class Merge gets a new method mergeFLs. By using the new DerivingVia extension we can derive the Sequence instances for patch types with an Ident instance from one template definition for a suitable newtype wrapper named HasIdent. The fastRemoveXXX functions are now gone from the Patch API and only used internally inside D.P.Ident, the ugly IdEq2 class could be fully removed. Attachments fix-a-bug-in-darcs1__darcs_2-conversion.dpatch patch-preview.txt unnamed
msg20876 (view)	Author: bfrk	Date: 2019-06-28.18:37:45
Fixing subject line...
msg20877 (view)	Author: bfrk	Date: 2019-06-28.21:03:42
I just realized that semantic equality for patches and patch sequences is not sufficient. It would be enough for darcs itself, but many /properties/ can only be expressed via structural equality. In particular, any property /about/ semantic equality must refer to the underlying structural equality or be vacuous. Similarly, the usual read/show roundtrip properties should be expressed with structural comparison else we risk testing only that the contained PatchIds are correctly roundtripped. The recent introduction of NamedPrim, class Ident, and now class Sequence in this bundle, has made it rather obvious that we cannot continue to be unclear which sort of equality we mean. Our current =\/= and =/\= operators are clearly meant to be semantic, so we need new "operators" (functions, methods) to express structural equality. This means that we either need to add more methods to class Eq2 or introcude a new class. Any ideas?
msg20878 (view)	Author: ganesh	Date: 2019-06-29.11:41:57
I definitely agree about needing a clear structural versus semantic difference. Perhaps also nominal, though hopefully it can be merged with one or both of the other concepts. The `Eq2 (FL P)` instance has always scared me and your message about generic instances inspired me to investigate if we can do without it and instead be more explicit about what we want at call sites. I'm still trying to figure out why findCommonAndUncommon apparently requires it to call partitionFL...
msg20879 (view)	Author: bfrk	Date: 2019-06-30.12:07:27
> I definitely agree about needing a clear structural versus semantic > difference. Perhaps also nominal, though hopefully it can be merged > with one or both of the other concepts. Hm, nominal equality would be my IdEq2 class, right? Removing that as a separate concept is what this patch is about, after all, so yes to that. After experimenting with structural equality, I am now leaning towards understanding it in a local sense. That is, instances should not be built by recursively calling structural equality. Instead they should use only the semantic equality for their ingredients. This is normally faster (except with the FL instances for patches without nominal equality). More importantly, it avoids over-specifying properties. This would not impair effectiveness of property tests: We test all relevant properties separately for each layer of the patch type hierarchy. The higher level types are always parameteric in the lower level patch type, so we can rely on them not to break the properties of their ingredients. > The `Eq2 (FL P)` instance has always scared me and your message > about generic instances inspired me to investigate if we can do > without it and instead be more explicit about what we want at call > sites. When I discovered the instance Eq2 (FL p) I felt not scared but relieved. I was glad to know that I never have to care about possible re-orderings when comparing sequences. IMO this is as it should be. I am also pretty sure that we always want this kind of equality for FLs and RLs, even for most properties. Because this is the only equality that matters, and requiring patches to be in the same order would be an over-specification. We also have tests that ensure order inside a sequence doesn't matter ("consistent tree flattenings", for which I plan to send an improved version that is more general, i.e. tries all possible permutations of an arbitrary sequence of patches). I am also quite sure that optimizing semantic equality using nominal equality is safe everywhere. The Eq2 FL instance may be slow for patches w/o identity but honestly I no longer care much about optimizing the buggy old patch formats. > I'm still trying to figure out why findCommonAndUncommon > apparently requires it to call partitionFL... I don't think it is required, it's just an implementation detail. In fact I think findCommonAndUncommon could be refactored to call findCommonFL after splitting the patchsets at the latest common tag. I was planning to make this change but had more pressing things to do (like fixing conflict resolution for V3, and now being side-tracked with equality refactors).
msg20880 (view)	Author: bfrk	Date: 2019-06-30.19:49:45
Here we go. Equality for patches fully cleaned up. Test suite adapted accordingly. Unfortunatey still with the unrelated dependency. 10 patches for repository http://darcs.net/screened: patch bf544a1f29a9874e1fd87a8b127c105ff1cbcad7 Author: Ben Franksen <ben.franksen@online.de> Date: Sat Jun 29 15:33:45 CEST 2019 * fix in D.P.Named.Wrapped Defining Eq2 with DeriveAnyClass means that a call to any of its methods will loop. It is better to throw an error in this case. patch d0ceafb4f82172393861c70aac0bc8a76ecaa7c4 Author: Ben Franksen <ben.franksen@online.de> Date: Sun Jun 30 10:58:01 CEST 2019 * remove unneeded Eq2 instances from Darcs.Patch.Choices patch 374bb15dd587e89dfb2842a884b227b1f664bdf1 Author: Ben Franksen <ben.franksen@online.de> Date: Sun Jun 30 16:23:29 CEST 2019 * define default methods class Eq more symmetrically patch 6214e01928baab9123f03566f12d15e91d86c3f5 Author: Ben Franksen <ben.franksen@online.de> Date: Tue Feb 26 23:29:10 CET 2019 * fix a bug in darcs1->darcs-2 conversion The test data for threewayanddep threewayandmultideps were quite obviously wrong! The darcs-2 Conflictors are complete bogus, referring to patches that don't appear in the repo. This is caused by erroneous calls to sortCoalesceFL in the RepoPatchV1 implementation in unravel and in effect. The way these functions are normally used (effect when we apply a patch and unravel to generate conflict markup) is quite tolerant wrt coalescing. However, unravel is also used to convert darcs-1 Mergers to darcs-2 Conflictors, and here the result is catastrophic. Instead of sortCoalesceFL we must merely cancel inverses, just like we do in the darcs-2 and darcs-3 theory when we construct contexted patches (aka Nons). patch e49eac4940f25f09e46b34dbccc3a550bb71e7bb Author: Ben Franksen <ben.franksen@online.de> Date: Fri Jun 28 20:25:32 CEST 2019 * automatically specialize merging and equality for sequences of patches with identity The idea is to allow patch sequences to adapt their behavior depending on whether the patch type has an Ident instance or not. This is done for two reasons: (1) equality tests for, and removing elements from sequences of patches is much faster if we can use PatchIds; (2) merging sequences of patches with identity must first remove duplicate patches, otherwise these would be wrongly treated as conflicting. The new class Sequence abstracts over the basic operations we want to specialize for FL and RL: equality and removing elements. Similarly, class Merge gets a new method mergeFLs. By using the new DerivingVia extension we can derive the Sequence instances for patch types with an Ident instance from one template definition for a suitable newtype wrapper named HasIdent. The fastRemoveXXX functions are now gone from the Patch API and only used internally inside D.P.Ident, the ugly IdEq2 class could be fully removed. patch cd24cd3cda462108ce3d96d302b6050dd626f4dc Author: Ben Franksen <ben.franksen@online.de> Date: Sun Jun 30 16:21:12 CEST 2019 * fix documentation in D.P.W.Eq patch 98401fdc043a962ab32113b9b729b23fbfd672ed Author: Ben Franksen <ben.franksen@online.de> Date: Fri Jun 21 15:30:40 CEST 2019 * harness: remove unused propFail While a comment said this is "handy for debugging arbitrary code" I could not find an any further explanation or an example. I guess it can be re-added easily if needed. patch b5f82f0bd3b4a6c02d4f5ca39a86b1ee876da43a Author: Ben Franksen <ben.franksen@online.de> Date: Fri Jun 21 21:47:26 CEST 2019 * harness: move nontrivialX conditions to D.T.P.A.Generic and remove dead code patch efaca89dba06b22e2394a9a263e22216f84ddc23 Author: Ben Franksen <ben.franksen@online.de> Date: Fri Jun 28 11:00:00 CEST 2019 * remove unused eqFLRev patch 20cb6a18c141a4fc0139e6fed81263c82915976a Author: Ben Franksen <ben.franksen@online.de> Date: Sun Jun 30 21:26:07 CEST 2019 * clean up patch equality We make a clear distinction between semantic and structural equality. The former is captured by Eq2, the latter by the new class StructuralEq2. We use nominal equality as optimization for semantic equality where available. Structural equality is understood to be local; it should not refer to structural equality of its components but rather to their semantic equality. Thus the new definitions for StructuralEq2 are quite similar to those which previously defined Eq2. Some of the equality definitions have been cleaned up and streamlined, for instance the two basic prim patch types now use deriving Eq instead of manually written definitions. There remain no /uses/ of unsafeCompare throughout the code except in RepoPatchV1 and in the default method definitions for Eq2. It still appears in many places to define Eq2, though, since this is quite convenient. Attachments fix-in-d_p_named_wrapped.dpatch patch-preview.txt unnamed
msg20881 (view)	Author: ganesh	Date: 2019-07-06.09:47:09
On 30/06/2019 20:49, Ben Franksen wrote: > Structural equality is understood to be local; it should not refer to > structural equality of its components but rather to their semantic equality. This feels a bit odd to me. For example, how should the instance be defined for a newtype? What if we refactor a single compound patch type to have two levels of types, or vice versa? In general I'd hope a type class would express properties that are independent of the specific internal representation of the type.
msg20882 (view)	Author: bfrk	Date: 2019-07-06.20:28:07
>> Structural equality is understood to be local; it should not refer >> to structural equality of its components but rather to their >> semantic equality > > This feels a bit odd to me. Let's try to define structural equality for, say, a RepoPatchV3 conflictor. I take that example because it is simpler than V1 and V2 in that the type is not defined recursively. So we have as ingredient, among others, an FL of prims for the effect. Suppose we take structural equality for the effect to define structural equality for two conflictors. This makes no sense. Such an equality would be far too strong. The order of the prims in the effect may change in ways that are hard to control when we commute such conflictors. I cannot promise that e.g. the permutivity property holds under such an equality. And neither would we gain anything by specifying that it does. Even for the read/show roundtrip property requiring it is of questionable use (though it would probably hold). Using semantic equality for this component is the only useful definition here. For the other ingredients I could make a similar argument, and also for all the other compound patch types. > For example, how should the instance be defined for a newtype A very good question. I had to think about this one for a while. I'd say it depends on whether you need to access the internals of the base type when you redefine methods. If this is the case, then you probably want to inherit structural equality from the base type. But such a newtype is just a convenient way to avoid copying the base type and its class methods and then changing some definitions. I would argue that in such cases the base type should not be regarded as a "component". If, on the other hand, you do not access the internals of the base type in any way, then I'd say that yes, referring to the semantic equality of the base type is how I would define it. Perhaps I should make this clearer in the documentation: structural equality should be defined in terms of semantic equality of its components /if/ you promise not to access or at least not to rely on the internals of your components. One way to make sure of that is to make your type parametric over it. > What if we refactor a single compound patch type to have two levels > of types, or vice versa? Then we should define what semantic equality means for the intermediate type and use that. And we should define structural equality for it, too, and use that to define certain properties that express that the semantic equality is sound. These must be tested separately. We already do this for prim patches vs. repo patches. This is the idea of unit testing, right? To test things at each level, assuming that what we build on is correct. You do the same when you prove a theorem. Nobody proves 1/2 + 1/2 = 1 by referring to the construction of rational numbers in terms of pairs of integers. Not only would that be hilarious, it would also completely defeat the purpose of defining rational numbers as an abstraction in its own right. > In general I'd hope a type class would express properties that are > independent of the specific internal representation of the type. I agree. But structural equality is, like Read and Show, an exception, since it is, by definition, about internal representation. This is why it is not and should not be used inside darcs proper. It only exists because we need it to define properties that other classes must obey.
msg20883 (view)	Author: bfrk	Date: 2019-07-07.09:45:04
> [...StructuralEq2...] only exists > because we need it to define properties that other classes must obey. Note to self: I need to go over the documentation of all patch classes and reformulate the laws stated for them to make it clear what kind of equality we talk about. (And accordingly adapt the properties I added to those modules for documentary purposes.) Until now I have been stating these laws using (==) in an informal way, leaving it to the reader to infer what exactly this should mean.
msg20884 (view)	Author: ganesh	Date: 2019-07-07.10:22:02
On 06/07/2019 21:28, Ben Franksen wrote: > Perhaps I should make this clearer in the documentation: structural > equality should be defined in terms of semantic equality of its > components /if/ you promise not to access or at least not to rely on the > internals of your components. > > One way to make sure of that is to make your type parametric over it. What kinds of properties do you want to express using structural equality? Are they generic ones, that would make sense to use on multiple different container types? If not we don't necessarily need a type class, though it's still good documentation. Either way, I'm wondering if structural equality is really a property of a patch container rather than a patch, i.e. class StructuralEq2 f where eqsFork :: f p wA wB -> f p wA wC -> EqCheck wB wC (...) that would rule out containers with two or more parameters, but would nicely encode the idea you state about being parameteric. >> In general I'd hope a type class would express properties that are >> independent of the specific internal representation of the type. > > I agree. But structural equality is, like Read and Show, an exception, > since it is, by definition, about internal representation. This is why > it is not and should not be used inside darcs proper. It only exists > because we need it to define properties that other classes must obey. I'm actually a bit surprised we never want to use structural equality within darcs itself. My original starting point for thinking we need structural equality was a feeling that we overuse the (semantic) Eq2 (FL p) instance. But perhaps it's right that we should always be working at the semantic level. If this class is just for testing it'd be nice to be flag that somehow, e.g. with the name. Or I wonder about using a nullary type class for that, e.g. class TestCode => StructuralEq2 p where ... and then only define 'instance TestCode' in our test harness. Cheers, Ganesh
msg20890 (view)	Author: bfrk	Date: 2019-07-07.16:20:42
Here is a cleaned up and rebased version of the bundle. All unrelated dependencies are in screened now. I improved the documentation for the Eq2 and StructuralEq2 classes as discussed. This does not yet consider Ganesh's latest remarks, I am still digesting the idea of limiting structural equality to container types i.e. those that take a patch type as parameter. 2 patches for repository http://darcs.net/screened: patch de792b6ffc34f785e7f02463d6ab503fdd465207 Author: Ben Franksen <ben.franksen@online.de> Date: Fri Jun 28 20:25:32 CEST 2019 * automatically specialize merging and equality for sequences of patches with identity The idea is to allow patch sequences to adapt their behavior depending on whether the patch type has an Ident instance or not. This is done for two reasons: (1) equality tests for, and removing elements from sequences of patches is much faster if we can use PatchIds; (2) merging sequences of patches with identity must first remove duplicate patches, otherwise these would be wrongly treated as conflicting. The new class Sequence abstracts over the basic operations we want to specialize for FL and RL: equality and removing elements. Similarly, class Merge gets a new method mergeFLs. By using the new DerivingVia extension we can derive the Sequence instances for patch types with an Ident instance from one template definition for a suitable newtype wrapper named HasIdent. The fastRemoveXXX functions are now gone from the Patch API and only used internally inside D.P.Ident, the ugly IdEq2 class could be fully removed. patch 1362fe7224611f2b5593ce90406551a9b21cf3c6 Author: Ben Franksen <ben.franksen@online.de> Date: Sun Jun 30 21:26:07 CEST 2019 * clean up patch equality We make a clear distinction between semantic and structural equality. The former is captured by Eq2, the latter by the new class StructuralEq2. We use nominal equality as optimization for semantic equality where available. Structural equality is understood to be local; it should not refer to structural equality of its components but rather to their semantic equality. Thus the new definitions for StructuralEq2 are quite similar to those which previously defined Eq2. Some of the equality definitions have been cleaned up and streamlined, for instance the two basic prim patch types now use deriving Eq instead of manually written definitions. There remain no /uses/ of unsafeCompare throughout the code except in RepoPatchV1 and in the default method definitions for Eq2. It still appears in many places to define Eq2, though, since this is quite convenient. Attachments automatically-specialize-merging-and-equality-for-sequences-of-patches-with-identity.dpatch patch-preview.txt unnamed
msg20891 (view)	Author: bfrk	Date: 2019-07-07.17:10:43
>> Perhaps I should make this clearer in the documentation: structural >> equality should be defined in terms of semantic equality of its >> components /if/ you promise not to access or at least not to rely on the >> internals of your components. >> >> One way to make sure of that is to make your type parametric over it. > > What kinds of properties do you want to express using structural > equality? Are they generic ones, that would make sense to use on > multiple different container types? If not we don't necessarily need a > type class, though it's still good documentation. My immediate application (and motivation) are indeed the generic properties defined in src/Darcs/Test/Patch/Properties/Generic.hs. Replacing semantic with structural equality here makes sure the existing tests aren't weakened by optimizing semantic with nominal equality. Adapting the generic properties is included in the latest version of this bundle. > Either way, I'm wondering if structural equality is really a property of > a patch container rather than a patch, i.e. > > class StructuralEq2 f where > eqsFork :: f p wA wB -> f p wA wC -> EqCheck wB wC > (...) > > that would rule out containers with two or more parameters, but would > nicely encode the idea you state about being parameteric. Interesting idea! Though I see two practical problems with that. One is that we cannot make instances for prim patch types. It would be a shame if we could no longer express properties generically for prims and compound patch types. I guess we could work around that limitation by defining some suitable newtype wrapper that derives all instances except StructuralEq2. We'd need a default method signature that says default eqsFork :: Eq2 p => f p wA wB -> f p wA wC -> EqCheck wB wC and then define newtype Id2 p wX wY = Id2 { unId2 :: p wX wY } deriving (...) instance StructuralEq2 Id2 where eqsFork = (=\/=) I guess that doesn't violate the intention: Id2 really is a single element container and here behaves exactly as advertised. The other problem is that using the methods of this class requires to make it explicit that we are dealing with containers. This means lots of changes to type signatures at least in D.T.P.Properties.Generic, but possibly other modules in the harness. I am already doing that in D.T.Patch as a matter of necessity (for my new test case generator) so this is not a serious hurdle. >>> In general I'd hope a type class would express properties that are >>> independent of the specific internal representation of the type. >> >> I agree. But structural equality is, like Read and Show, an exception, >> since it is, by definition, about internal representation. This is why >> it is not and should not be used inside darcs proper. It only exists >> because we need it to define properties that other classes must obey. > > I'm actually a bit surprised we never want to use structural equality > within darcs itself. My original starting point for thinking we need > structural equality was a feeling that we overuse the (semantic) Eq2 (FL > p) instance. But perhaps it's right that we should always be working at > the semantic level. Honestly, I cannot think of a single situation where comparing sequences patch by patch would be what we want, or even where it could be used as a valid optimization. I believe (but haven't checked) that the only code that uses equality of sequences at all is that for RepoPatchV[123] and maybe in some of the Rebase patch types. > If this class is just for testing it'd be nice to be flag that somehow, > e.g. with the name. Or I wonder about using a nullary type class for > that, e.g. > > class TestCode => StructuralEq2 p where > ... > > and then only define 'instance TestCode' in our test harness. Also a nice idea. Yes, why not?
msg23527 (view)	Author: bfrk	Date: 2023-07-10.13:48:18
This has been obsoleted by patch2266

History
Date	User	Action	Args
2019-06-28 18:36:57	bfrk	create
2019-06-28 18:37:45	bfrk	set	messages: + msg20876 title: fix a bug in darcs1->darcs-2 conversion (and 1 more) -> automatically specialize merging and equality for sequences of patches with identity
2019-06-28 21:03:42	bfrk	set	messages: + msg20877
2019-06-29 11:41:57	ganesh	set	messages: + msg20878
2019-06-30 12:07:28	bfrk	set	messages: + msg20879
2019-06-30 19:49:46	bfrk	set	files: + patch-preview.txt, fix-in-d_p_named_wrapped.dpatch, unnamed messages: + msg20880
2019-07-06 09:47:09	ganesh	set	messages: + msg20881
2019-07-06 20:28:07	bfrk	set	messages: + msg20882
2019-07-07 09:45:04	bfrk	set	messages: + msg20883
2019-07-07 10:22:03	ganesh	set	messages: + msg20884
2019-07-07 16:20:43	bfrk	set	files: + patch-preview.txt, automatically-specialize-merging-and-equality-for-sequences-of-patches-with-identity.dpatch, unnamed messages: + msg20890
2019-07-07 17:10:44	bfrk	set	messages: + msg20891
2019-07-08 16:19:29	bfrk	set	files: - fix-a-bug-in-darcs1__darcs_2-conversion.dpatch
2019-07-08 16:19:33	bfrk	set	files: - fix-in-d_p_named_wrapped.dpatch
2019-07-08 16:19:38	bfrk	set	files: - patch-preview.txt
2019-07-08 16:19:40	bfrk	set	files: - patch-preview.txt
2019-07-08 16:19:42	bfrk	set	files: - unnamed
2019-07-08 16:19:45	bfrk	set	files: - unnamed
2019-07-29 16:37:45	bfrk	set	status: needs-screening -> in-discussion
2023-07-10 13:48:19	bfrk	set	status: in-discussion -> obsoleted messages: + msg23527

Patch 1844 automatically specialize merging and equality for sequences of patches with identity

Attachments

Attachments

Attachments