darcs

Patch 1844 automatically specialize merging and equality for sequences of patches with identity

Title automatically specialize merging and equality for sequences of patches with identity
Superseder Nosy List bf
Related Issues
Status needs-screening Assigned To
Milestone

Created on 2019-06-28.18:36:57 by bf, last changed 2019-07-08.16:19:45 by bf.

Files
File name Status Uploaded Type Edit Remove
automatically-specialize-merging-and-equality-for-sequences-of-patches-with-identity.dpatch bf, 2019-07-07.16:20:42 application/x-darcs-patch
patch-preview.txt bf, 2019-07-07.16:20:42 text/x-darcs-patch
unnamed bf, 2019-07-07.16:20:42 text/plain
See mailing list archives for discussion on individual patches.
Messages
msg20875 (view) Author: bf Date: 2019-06-28.18:36:56
Not for screened yet but please take a look.

The dependency on the second patch wasn't intended. If it hurts I can try
and rebase.

Also note that all the unit tests that have a "nontrivial xxx" condition and
are run for FLs fail now with a "Buggy use of unsafeCompare on FL" error. I
don't understand why they didn't do that before this patch. I see no way to
fix these conditions generically. The only solution I can see is to use
different conditions for the FL tests that don't use unsafeCompare.

2 patches for repository http://darcs.net/screened:

patch 6214e01928baab9123f03566f12d15e91d86c3f5
Author: Ben Franksen <ben.franksen@online.de>
Date:   Tue Feb 26 23:29:10 CET 2019
  * fix a bug in darcs1->darcs-2 conversion
  
  The test data for threewayanddep threewayandmultideps were quite obviously
  wrong! The darcs-2 Conflictors are complete bogus, referring to patches that
  don't appear in the repo. This is caused by erroneous calls to
  sortCoalesceFL in the RepoPatchV1 implementation in unravel and in effect.
  The way these functions are normally used (effect when we apply a patch and
  unravel to generate conflict markup) is quite tolerant wrt coalescing.
  However, unravel is also used to convert darcs-1 Mergers to darcs-2
  Conflictors, and here the result is catastrophic. Instead of sortCoalesceFL
  we must merely cancel inverses, just like we do in the darcs-2 and darcs-3
  theory when we construct contexted patches (aka Nons).

patch abc332c0bda4d4ab622fb46319301e33c4a89d2f
Author: Ben Franksen <ben.franksen@online.de>
Date:   Fri Jun 28 20:25:32 CEST 2019
  * automatically specialize merging and equality for sequences of patches with identity
  
  The idea is to allow patch sequences to adapt their behavior depending on
  whether the patch type has an Ident instance or not. This is done for two
  reasons:
  (1) equality tests for, and removing elements from sequences of patches is
      much faster if we can use PatchIds;
  (2) merging sequences of patches with identity must first remove duplicate
      patches, otherwise these would be wrongly treated as conflicting.
  The new class Sequence abstracts over the basic operations we want to
  specialize for FL and RL: equality and removing elements. Similarly, class
  Merge gets a new method mergeFLs.
  By using the new DerivingVia extension we can derive the Sequence instances
  for patch types with an Ident instance from one template definition for a
  suitable newtype wrapper named HasIdent.
  The fastRemoveXXX functions are now gone from the Patch API and only used
  internally inside D.P.Ident, the ugly IdEq2 class could be fully removed.
Attachments
msg20876 (view) Author: bf Date: 2019-06-28.18:37:45
Fixing subject line...
msg20877 (view) Author: bf Date: 2019-06-28.21:03:42
I just realized that semantic equality for patches and patch 
sequences is not sufficient. It would be enough for darcs itself, 
but many /properties/ can only be expressed via structural equality. 
In particular, any property /about/ semantic equality must refer to 
the underlying structural equality or be vacuous. Similarly, the 
usual read/show roundtrip properties should be expressed with 
structural comparison else we risk testing only that the contained 
PatchIds are correctly roundtripped.

The recent introduction of NamedPrim, class Ident, and now class 
Sequence in this bundle, has made it rather obvious that we cannot 
continue to be unclear which sort of equality we mean. Our current 
=\/= and =/\= operators are clearly meant to be semantic, so we need 
new "operators" (functions, methods) to express structural equality.

This means that we either need to add more methods to class Eq2 or 
introcude a new class.

Any ideas?
msg20878 (view) Author: ganesh Date: 2019-06-29.11:41:57
I definitely agree about needing a clear structural versus semantic
difference. Perhaps also nominal, though hopefully it can be merged
with one or both of the other concepts.

The `Eq2 (FL P)` instance has always scared me and your message
about generic instances inspired me to investigate if we can do
without it and instead be more explicit about what we want at call
sites. I'm still trying to figure out why findCommonAndUncommon
apparently requires it to call partitionFL...
msg20879 (view) Author: bf Date: 2019-06-30.12:07:27
> I definitely agree about needing a clear structural versus semantic
> difference. Perhaps also nominal, though hopefully it can be merged
> with one or both of the other concepts.

Hm, nominal equality would be my IdEq2 class, right? Removing that as a
separate concept is what this patch is about, after all, so yes to that.

After experimenting with structural equality, I am now leaning towards
understanding it in a local sense. That is, instances should not be
built by recursively calling structural equality. Instead they should
use only the semantic equality for their ingredients. This is normally
faster (except with the FL instances for patches without nominal
equality). More importantly, it avoids over-specifying properties.

This would not impair effectiveness of property tests: We test all
relevant properties separately for each layer of the patch type
hierarchy. The higher level types are always parameteric in the lower
level patch type, so we can rely on them not to break the properties of
their ingredients.

> The `Eq2 (FL P)` instance has always scared me and your message
> about generic instances inspired me to investigate if we can do
> without it and instead be more explicit about what we want at call
> sites.

When I discovered the instance Eq2 (FL p) I felt not scared but
relieved. I was glad to know that I never have to care about possible
re-orderings when comparing sequences. IMO this is as it should be.

I am also pretty sure that we always want this kind of equality for FLs
and RLs, even for most properties. Because this is the only equality
that matters, and requiring patches to be in the same order would be an
over-specification.

We also have tests that ensure order inside a sequence doesn't matter
("consistent tree flattenings", for which I plan to send an improved
version that is more general, i.e. tries all possible permutations of an
arbitrary sequence of patches).

I am also quite sure that optimizing semantic equality using nominal
equality is safe everywhere. The Eq2 FL instance may be slow for patches
w/o identity but honestly I no longer care much about optimizing the
buggy old patch formats.

> I'm still trying to figure out why findCommonAndUncommon
> apparently requires it to call partitionFL...

I don't think it is required, it's just an implementation detail. In
fact I think findCommonAndUncommon could be refactored to call
findCommonFL after splitting the patchsets at the latest common tag. I
was planning to make this change but had more pressing things to do
(like fixing conflict resolution for V3, and now being side-tracked with
equality refactors).
msg20880 (view) Author: bf Date: 2019-06-30.19:49:45
Here we go. Equality for patches fully cleaned up. Test suite adapted
accordingly. Unfortunatey still with the unrelated dependency.

10 patches for repository http://darcs.net/screened:

patch bf544a1f29a9874e1fd87a8b127c105ff1cbcad7
Author: Ben Franksen <ben.franksen@online.de>
Date:   Sat Jun 29 15:33:45 CEST 2019
  * fix in D.P.Named.Wrapped
  
  Defining Eq2 with DeriveAnyClass means that a call to any of its methods
  will loop. It is better to throw an error in this case.

patch d0ceafb4f82172393861c70aac0bc8a76ecaa7c4
Author: Ben Franksen <ben.franksen@online.de>
Date:   Sun Jun 30 10:58:01 CEST 2019
  * remove unneeded Eq2 instances from Darcs.Patch.Choices

patch 374bb15dd587e89dfb2842a884b227b1f664bdf1
Author: Ben Franksen <ben.franksen@online.de>
Date:   Sun Jun 30 16:23:29 CEST 2019
  * define default methods class Eq more symmetrically

patch 6214e01928baab9123f03566f12d15e91d86c3f5
Author: Ben Franksen <ben.franksen@online.de>
Date:   Tue Feb 26 23:29:10 CET 2019
  * fix a bug in darcs1->darcs-2 conversion
  
  The test data for threewayanddep threewayandmultideps were quite obviously
  wrong! The darcs-2 Conflictors are complete bogus, referring to patches that
  don't appear in the repo. This is caused by erroneous calls to
  sortCoalesceFL in the RepoPatchV1 implementation in unravel and in effect.
  The way these functions are normally used (effect when we apply a patch and
  unravel to generate conflict markup) is quite tolerant wrt coalescing.
  However, unravel is also used to convert darcs-1 Mergers to darcs-2
  Conflictors, and here the result is catastrophic. Instead of sortCoalesceFL
  we must merely cancel inverses, just like we do in the darcs-2 and darcs-3
  theory when we construct contexted patches (aka Nons).

patch e49eac4940f25f09e46b34dbccc3a550bb71e7bb
Author: Ben Franksen <ben.franksen@online.de>
Date:   Fri Jun 28 20:25:32 CEST 2019
  * automatically specialize merging and equality for sequences of patches with identity
  
  The idea is to allow patch sequences to adapt their behavior depending on
  whether the patch type has an Ident instance or not. This is done for two
  reasons:
  (1) equality tests for, and removing elements from sequences of patches is
      much faster if we can use PatchIds;
  (2) merging sequences of patches with identity must first remove duplicate
      patches, otherwise these would be wrongly treated as conflicting.
  The new class Sequence abstracts over the basic operations we want to
  specialize for FL and RL: equality and removing elements. Similarly, class
  Merge gets a new method mergeFLs.
  By using the new DerivingVia extension we can derive the Sequence instances
  for patch types with an Ident instance from one template definition for a
  suitable newtype wrapper named HasIdent.
  The fastRemoveXXX functions are now gone from the Patch API and only used
  internally inside D.P.Ident, the ugly IdEq2 class could be fully removed.

patch cd24cd3cda462108ce3d96d302b6050dd626f4dc
Author: Ben Franksen <ben.franksen@online.de>
Date:   Sun Jun 30 16:21:12 CEST 2019
  * fix documentation in D.P.W.Eq

patch 98401fdc043a962ab32113b9b729b23fbfd672ed
Author: Ben Franksen <ben.franksen@online.de>
Date:   Fri Jun 21 15:30:40 CEST 2019
  * harness: remove unused propFail
  
  While a comment said this is "handy for debugging arbitrary code" I could
  not find an any further explanation or an example. I guess it can be
  re-added easily if needed.

patch b5f82f0bd3b4a6c02d4f5ca39a86b1ee876da43a
Author: Ben Franksen <ben.franksen@online.de>
Date:   Fri Jun 21 21:47:26 CEST 2019
  * harness: move nontrivialX conditions to D.T.P.A.Generic and remove dead code

patch efaca89dba06b22e2394a9a263e22216f84ddc23
Author: Ben Franksen <ben.franksen@online.de>
Date:   Fri Jun 28 11:00:00 CEST 2019
  * remove unused eqFLRev

patch 20cb6a18c141a4fc0139e6fed81263c82915976a
Author: Ben Franksen <ben.franksen@online.de>
Date:   Sun Jun 30 21:26:07 CEST 2019
  * clean up patch equality
  
  We make a clear distinction between semantic and structural equality. The
  former is captured by Eq2, the latter by the new class StructuralEq2. We use
  nominal equality as optimization for semantic equality where available.
  Structural equality is understood to be local; it should not refer to
  structural equality of its components but rather to their semantic equality.
  Thus the new definitions for StructuralEq2 are quite similar to those which
  previously defined Eq2. Some of the equality definitions have been cleaned
  up and streamlined, for instance the two basic prim patch types now use
  deriving Eq instead of manually written definitions.
  There remain no /uses/ of unsafeCompare throughout the code except in
  RepoPatchV1 and in the default method definitions for Eq2. It still appears
  in many places to define Eq2, though, since this is quite convenient.
Attachments
msg20881 (view) Author: ganesh Date: 2019-07-06.09:47:09
On 30/06/2019 20:49, Ben Franksen wrote:

>   Structural equality is understood to be local; it should not refer to
>   structural equality of its components but rather to their semantic equality.

This feels a bit odd to me. For example, how should the instance be
defined for a newtype? What if we refactor a single compound patch type
to have two levels of types, or vice versa? In general I'd hope a type
class would express properties that are independent of the specific
internal representation of the type.
msg20882 (view) Author: bf Date: 2019-07-06.20:28:07
>> Structural equality is understood to be local; it should not refer
>> to structural equality of its components but rather to their 
>> semantic equality
> 
> This feels a bit odd to me.

Let's try to define structural equality for, say, a RepoPatchV3
conflictor. I take that example because it is simpler than V1 and V2 in
that the type is not defined recursively.

So we have as ingredient, among others, an FL of prims for the effect.
Suppose we take structural equality for the effect to define structural
equality for two conflictors. This makes no sense. Such an equality
would be far too strong. The order of the prims in the effect may change
in ways that are hard to control when we commute such conflictors. I
cannot promise that e.g. the permutivity property holds under such an
equality. And neither would we gain anything by specifying that it does.
Even for the read/show roundtrip property requiring it is of
questionable use (though it would probably hold).

Using semantic equality for this component is the only useful definition
here. For the other ingredients I could make a similar argument, and
also for all the other compound patch types.

> For example, how should the instance be defined for a newtype

A very good question. I had to think about this one for a while.

I'd say it depends on whether you need to access the internals of the
base type when you redefine methods. If this is the case, then you
probably want to inherit structural equality from the base type. But
such a newtype is just a convenient way to avoid copying the base type
and its class methods and then changing some definitions. I would argue
that in such cases the base type should not be regarded as a "component".

If, on the other hand, you do not access the internals of the base type
in any way, then I'd say that yes, referring to the semantic equality of
the base type is how I would define it.

Perhaps I should make this clearer in the documentation: structural
equality should be defined in terms of semantic equality of its
components /if/ you promise not to access or at least not to rely on the
internals of your components.

One way to make sure of that is to make your type parametric over it.

> What if we refactor a single compound patch type to have two levels 
> of types, or vice versa?

Then we should define what semantic equality means for the intermediate
type and use that. And we should define structural equality for it, too,
and use that to define certain properties that express that the semantic
equality is sound. These must be tested separately.

We already do this for prim patches vs. repo patches. This is the idea
of unit testing, right? To test things at each level, assuming that what
we build on is correct. You do the same when you prove a theorem. Nobody
proves 1/2 + 1/2 = 1 by referring to the construction of rational
numbers in terms of pairs of integers. Not only would that be hilarious,
it would also completely defeat the purpose of defining rational numbers
as an abstraction in its own right.

> In general I'd hope a type class would express properties that are 
> independent of the specific internal representation of the type.

I agree. But structural equality is, like Read and Show, an exception,
since it is, by definition, about internal representation. This is why
it is not and should not be used inside darcs proper. It only exists
because we need it to define properties that other classes must obey.
msg20883 (view) Author: bf Date: 2019-07-07.09:45:04
> [...StructuralEq2...] only exists
> because we need it to define properties that other classes must obey.

Note to self: I need to go over the documentation of all patch classes
and reformulate the laws stated for them to make it clear what kind of
equality we talk about. (And accordingly adapt the properties I added to
those modules for documentary purposes.) Until now I have been stating
these laws using (==) in an informal way, leaving it to the reader to
infer what exactly this should mean.
msg20884 (view) Author: ganesh Date: 2019-07-07.10:22:02
On 06/07/2019 21:28, Ben Franksen wrote:

> Perhaps I should make this clearer in the documentation: structural
> equality should be defined in terms of semantic equality of its
> components /if/ you promise not to access or at least not to rely on the
> internals of your components.
> 
> One way to make sure of that is to make your type parametric over it.

What kinds of properties do you want to express using structural
equality? Are they generic ones, that would make sense to use on
multiple different container types? If not we don't necessarily need a
type class, though it's still good documentation.

Either way, I'm wondering if structural equality is really a property of
a patch container rather than a patch, i.e.

 class StructuralEq2 f where
   eqsFork :: f p wA wB -> f p wA wC -> EqCheck wB wC
   (...)

that would rule out containers with two or more parameters, but would
nicely encode the idea you state about being parameteric.

>> In general I'd hope a type class would express properties that are 
>> independent of the specific internal representation of the type.
> 
> I agree. But structural equality is, like Read and Show, an exception,
> since it is, by definition, about internal representation. This is why
> it is not and should not be used inside darcs proper. It only exists
> because we need it to define properties that other classes must obey.

I'm actually a bit surprised we never want to use structural equality
within darcs itself. My original starting point for thinking we need
structural equality was a feeling that we overuse the (semantic) Eq2 (FL
p) instance. But perhaps it's right that we should always be working at
the semantic level.

If this class is just for testing it'd be nice to be flag that somehow,
e.g. with the name. Or I wonder about using a nullary type class for
that, e.g.

  class TestCode => StructuralEq2 p where
    ...

and then only define 'instance TestCode' in our test harness.

Cheers,

Ganesh
msg20890 (view) Author: bf Date: 2019-07-07.16:20:42
Here is a cleaned up and rebased version of the bundle. All unrelated
dependencies are in screened now. I improved the documentation for the Eq2
and StructuralEq2 classes as discussed. This does not yet consider Ganesh's
latest remarks, I am still digesting the idea of limiting structural
equality to container types i.e. those that take a patch type as parameter.

2 patches for repository http://darcs.net/screened:

patch de792b6ffc34f785e7f02463d6ab503fdd465207
Author: Ben Franksen <ben.franksen@online.de>
Date:   Fri Jun 28 20:25:32 CEST 2019
  * automatically specialize merging and equality for sequences of patches with identity
  
  The idea is to allow patch sequences to adapt their behavior depending on
  whether the patch type has an Ident instance or not. This is done for two
  reasons:
  (1) equality tests for, and removing elements from sequences of patches is
      much faster if we can use PatchIds;
  (2) merging sequences of patches with identity must first remove duplicate
      patches, otherwise these would be wrongly treated as conflicting.
  The new class Sequence abstracts over the basic operations we want to
  specialize for FL and RL: equality and removing elements. Similarly, class
  Merge gets a new method mergeFLs.
  By using the new DerivingVia extension we can derive the Sequence instances
  for patch types with an Ident instance from one template definition for a
  suitable newtype wrapper named HasIdent.
  The fastRemoveXXX functions are now gone from the Patch API and only used
  internally inside D.P.Ident, the ugly IdEq2 class could be fully removed.

patch 1362fe7224611f2b5593ce90406551a9b21cf3c6
Author: Ben Franksen <ben.franksen@online.de>
Date:   Sun Jun 30 21:26:07 CEST 2019
  * clean up patch equality
  
  We make a clear distinction between semantic and structural equality. The
  former is captured by Eq2, the latter by the new class StructuralEq2. We use
  nominal equality as optimization for semantic equality where available.
  Structural equality is understood to be local; it should not refer to
  structural equality of its components but rather to their semantic equality.
  Thus the new definitions for StructuralEq2 are quite similar to those which
  previously defined Eq2. Some of the equality definitions have been cleaned
  up and streamlined, for instance the two basic prim patch types now use
  deriving Eq instead of manually written definitions.
  There remain no /uses/ of unsafeCompare throughout the code except in
  RepoPatchV1 and in the default method definitions for Eq2. It still appears
  in many places to define Eq2, though, since this is quite convenient.
Attachments
msg20891 (view) Author: bf Date: 2019-07-07.17:10:43
>> Perhaps I should make this clearer in the documentation: structural
>> equality should be defined in terms of semantic equality of its
>> components /if/ you promise not to access or at least not to rely on the
>> internals of your components.
>>
>> One way to make sure of that is to make your type parametric over it.
> 
> What kinds of properties do you want to express using structural
> equality? Are they generic ones, that would make sense to use on
> multiple different container types? If not we don't necessarily need a
> type class, though it's still good documentation.

My immediate application (and motivation) are indeed the generic
properties defined in src/Darcs/Test/Patch/Properties/Generic.hs.
Replacing semantic with structural equality here makes sure the existing
tests aren't weakened by optimizing semantic with nominal equality.
Adapting the generic properties is included in the latest version of
this bundle.

> Either way, I'm wondering if structural equality is really a property of
> a patch container rather than a patch, i.e.
> 
>  class StructuralEq2 f where
>    eqsFork :: f p wA wB -> f p wA wC -> EqCheck wB wC
>    (...)
> 
> that would rule out containers with two or more parameters, but would
> nicely encode the idea you state about being parameteric.

Interesting idea! Though I see two practical problems with that.

One is that we cannot make instances for prim patch types. It would be a
shame if we could no longer express properties generically for prims and
compound patch types. I guess we could work around that limitation by
defining some suitable newtype wrapper that derives all instances except
StructuralEq2. We'd need a default method signature that says

default eqsFork :: Eq2 p => f p wA wB -> f p wA wC -> EqCheck wB wC

and then define

newtype Id2 p wX wY = Id2 { unId2 :: p wX wY } deriving (...)

instance StructuralEq2 Id2 where
  eqsFork = (=\/=)

I guess that doesn't violate the intention: Id2 really is a single
element container and here behaves exactly as advertised.

The other problem is that using the methods of this class requires to
make it explicit that we are dealing with containers. This means lots of
changes to type signatures at least in D.T.P.Properties.Generic, but
possibly other modules in the harness. I am already doing that in
D.T.Patch as a matter of necessity (for my new test case generator) so
this is not a serious hurdle.

>>> In general I'd hope a type class would express properties that are 
>>> independent of the specific internal representation of the type.
>>
>> I agree. But structural equality is, like Read and Show, an exception,
>> since it is, by definition, about internal representation. This is why
>> it is not and should not be used inside darcs proper. It only exists
>> because we need it to define properties that other classes must obey.
> 
> I'm actually a bit surprised we never want to use structural equality
> within darcs itself. My original starting point for thinking we need
> structural equality was a feeling that we overuse the (semantic) Eq2 (FL
> p) instance. But perhaps it's right that we should always be working at
> the semantic level.

Honestly, I cannot think of a single situation where comparing sequences
patch by patch would be what we want, or even where it could be used as
a valid optimization. I believe (but haven't checked) that the only code
that uses equality of sequences at all is that for RepoPatchV[123] and
maybe in some of the Rebase patch types.

> If this class is just for testing it'd be nice to be flag that somehow,
> e.g. with the name. Or I wonder about using a nullary type class for
> that, e.g.
> 
>   class TestCode => StructuralEq2 p where
>     ...
> 
> and then only define 'instance TestCode' in our test harness.

Also a nice idea. Yes, why not?
History
Date User Action Args
2019-06-28 18:36:57bfcreate
2019-06-28 18:37:45bfsetmessages: + msg20876
title: fix a bug in darcs1->darcs-2 conversion (and 1 more) -> automatically specialize merging and equality for sequences of patches with identity
2019-06-28 21:03:42bfsetmessages: + msg20877
2019-06-29 11:41:57ganeshsetmessages: + msg20878
2019-06-30 12:07:28bfsetmessages: + msg20879
2019-06-30 19:49:46bfsetfiles: + patch-preview.txt, fix-in-d_p_named_wrapped.dpatch, unnamed
messages: + msg20880
2019-07-06 09:47:09ganeshsetmessages: + msg20881
2019-07-06 20:28:07bfsetmessages: + msg20882
2019-07-07 09:45:04bfsetmessages: + msg20883
2019-07-07 10:22:03ganeshsetmessages: + msg20884
2019-07-07 16:20:43bfsetfiles: + patch-preview.txt, automatically-specialize-merging-and-equality-for-sequences-of-patches-with-identity.dpatch, unnamed
messages: + msg20890
2019-07-07 17:10:44bfsetmessages: + msg20891
2019-07-08 16:19:29bfsetfiles: - fix-a-bug-in-darcs1__darcs_2-conversion.dpatch
2019-07-08 16:19:33bfsetfiles: - fix-in-d_p_named_wrapped.dpatch
2019-07-08 16:19:38bfsetfiles: - patch-preview.txt
2019-07-08 16:19:40bfsetfiles: - patch-preview.txt
2019-07-08 16:19:42bfsetfiles: - unnamed
2019-07-08 16:19:45bfsetfiles: - unnamed