darcs

Patch 2166 cleaning up Arbitrary instances

Title cleaning up Arbitrary instances
Superseder Nosy List ganesh
Related Issues
Status accepted Assigned To
Milestone

Created on 2021-05-02.08:07:47 by ganesh, last changed 2021-06-04.18:02:33 by ganesh.

Files
File name Status Uploaded Type Edit Remove
introduce-arbitrarys2.dpatch ganesh, 2021-05-02.08:07:44 application/x-darcs-patch
patch-preview.txt ganesh, 2021-05-02.08:07:44 text/x-darcs-patch
unnamed ganesh, 2021-05-02.08:07:44 text/plain
See mailing list archives for discussion on individual patches.
Messages
msg22738 (view) Author: ganesh Date: 2021-05-02.08:07:44
This sequence of small refactorings to the Arbitrary
instances simplifies the existing mess of overlapping
instances significantly in exchange for introducing
various wrapper classes (ArbitraryS2, ArbitraryWS) and
types (V1Gen, Pair, Triple).

It also allows removing a bunch of boilerplate around
instances for Prim and NamedPrim types.

It's a lot of small patches as that's how I found my
way to the changes. If it was merged into one a little
bit of intermediate code would go (e.g. ArbitraryWSPair)
but I think the sequence of changes may be easier to
check for correctness anyway.

This bundle depends on patch2165 but can be rebased to
be independent if needed.

I'll wait a day or two before screening to allow for
any objections to the approach etc.

17 patches for repository darcs-unstable@darcs.net:screened:

patch cd5204ee247c52e17f969b9d3c19674a9297e82a
Author: Ganesh Sittampalam <ganesh@earth.li>
Date:   Sun May  2 08:39:19 BST 2021
  * introduce ArbitraryS2

patch 4b781cfb890a9ef9e5b1c2e03bca45fba825fe4d
Author: Ganesh Sittampalam <ganesh@earth.li>
Date:   Sun May  2 08:39:19 BST 2021
  * introduce ArbitraryWS

patch 197fac5e5a080499438016a09228be7b315245fb
Author: Ganesh Sittampalam <ganesh@earth.li>
Date:   Sun May  2 08:39:20 BST 2021
  * get rid of makeS2Gen

patch 247b006d2bc68af6dc09cfd4cd06f175facfc98d
Author: Ganesh Sittampalam <ganesh@earth.li>
Date:   Sun May  2 08:39:20 BST 2021
  * get rid of a bunch of ArbitraryS2 instances

patch 79397ba83503ea5f103761de426bc9153837861a
Author: Ganesh Sittampalam <ganesh@earth.li>
Date:   Sun May  2 08:39:21 BST 2021
  * get rid of arbitraryTriple

patch 63bf4d631c74a9b0c8d28a23d31293c0fa816855
Author: Ganesh Sittampalam <ganesh@earth.li>
Date:   Sun May  2 08:39:21 BST 2021
  * replace ArbitraryS2 instance with ArbitraryWS

patch 3495585d442d984df17e577921cfa1b1c4d5f6bf
Author: Ganesh Sittampalam <ganesh@earth.li>
Date:   Sun May  2 08:39:21 BST 2021
  * introduce Triple to reduce the random instance overlaps

patch 40d31c5b1ab321b4c51707ea2bb8dad62a4d682c
Author: Ganesh Sittampalam <ganesh@earth.li>
Date:   Sun May  2 08:39:22 BST 2021
  * introduce Pair

patch 00d368057a36e4af498bf1c763f67ab0c7d8b678
Author: Ganesh Sittampalam <ganesh@earth.li>
Date:   Sun May  2 08:39:22 BST 2021
  * introduce ArbitraryWSPair

patch 06db29332f1b673eb5e42b42033cade0587d50cb
Author: Ganesh Sittampalam <ganesh@earth.li>
Date:   Sun May  2 08:39:23 BST 2021
  * get rid of some Arbitrary instances

patch 0e7ade582c107af13fbfbe076197da91088fe2e9
Author: Ganesh Sittampalam <ganesh@earth.li>
Date:   Sun May  2 08:36:03 BST 2021
  * stop testing the internals of Prim.commute
  
  The current structure of commute isn't very clear and
  should be refactored, but these tests inhibit that.

patch c182ef6aa3ed946a8af4c8c9bdad17da76bd6c3f
Author: Ganesh Sittampalam <ganesh@earth.li>
Date:   Sun May  2 08:39:24 BST 2021
  * introduce V1Gen wrapper to replace some ArbitraryS2 instances

patch ed909e76d4da0d36d07b5623d2e3c9fd3a487462
Author: Ganesh Sittampalam <ganesh@earth.li>
Date:   Sun May  2 08:39:25 BST 2021
  * restructure ArbitraryWSPair
  
  This allows us to share some boilerplate.

patch de1c5074963501612a32395d7e2cfcbd39f15c07
Author: Ganesh Sittampalam <ganesh@earth.li>
Date:   Sun May  2 08:39:26 BST 2021
  * generalise the NamedPrim instances for ArbitraryWS[Pair]

patch d36b77b11e48030e85ce4dbef83d00e3e5ffbf9f
Author: Ganesh Sittampalam <ganesh@earth.li>
Date:   Sun May  2 08:39:27 BST 2021
  * merge ArbitraryWSPair into ArbitraryState

patch b1889ec148d5915ac1ab70651deb3e53b83f1ac1
Author: Ganesh Sittampalam <ganesh@earth.li>
Date:   Sun May  2 08:39:27 BST 2021
  * change aPrim/aPrimPair to return Sealed

patch e1962b1eb882eb963741742dc17aae4e4164feb0
Author: Ganesh Sittampalam <ganesh@earth.li>
Date:   Sun May  2 08:39:28 BST 2021
  * inline the ArbitraryState instance for Pair
  
  This should remove any risk of a future refactoring making
  it call arbitraryStatePair, which would cause an infinite
  loop.
Attachments
msg22748 (view) Author: bfrk Date: 2021-05-04.20:37:53
> This sequence of small refactorings to the Arbitrary
> instances simplifies the existing mess of overlapping
> instances significantly in exchange for introducing
> various wrapper classes (ArbitraryS2, ArbitraryWS) and
> types (V1Gen, Pair, Triple).

I don't quite understand the need for these wrappers. Could you explain?
Perhaps using the simplest one, the Pair type, as example?

  * generalise the NamedPrim instances for ArbitraryWS[Pair]

Does this one (which gets rid of a lot of code duplication, so much
appreciated) actually depend on the previous refactors? If yes, why is
it possible now but wasn't earlier?

I am trying to understand the big picture...
msg22749 (view) Author: ganesh Date: 2021-05-04.21:30:16
On 04/05/2021 21:37, Ben Franksen wrote:

> I don't quite understand the need for these wrappers. Could you explain?
> Perhaps using the simplest one, the Pair type, as example?

One of the problems with the existing code is that there are a lot of
instances with quite specific types. For example there's this comment in
Darcs.Test.Patch.Arbitrary.RepoPatchV1:

> Until this is cleaned up, we take some care that the Arbitrary instances
> do not overlap and are only used for tests from the respective
> modules.

which refers to various very specialised instances like:

> instance Arbitrary (Sealed2 (Prim :> Prim)) where

whereas in D.T.P.A.PrimV1 we have:

> instance Arbitrary (Sealed2 (Prim1 :> Prim1)) where
>   arbitrary = mapSeal2 wsPatch <$> arbitraryPair

which gets used for a completely different set of tests.

It would be much better if these instances were all built in a
compositional way, with each type constructor like Sealed2, :>, etc,
just having a single instance that builds on the instances for its
parameters. That way it's easy to look at a type and work backwards to
see how its instance will behave, and also to add new instances for new
things without fear of overlap.

Essentially I'm trying to untangle the mess bit by bit. V1Gen is maybe
the simplest example of this as it's very local to RepoPatchV1.
Introducing that means that instead of carefully crafting instances like
the above, we can make the specific tests in D.T.P.RepoPatchV1 that want
those instances always work via the V1Gen wrapper. I found the specific
tests by changing the instances to use V1Gen and then working through
the compiler errors: but now that it's done, we can be confident of
adding instances that don't mention V1Gen without affecting those tests.

Similarly the introduction of classes like ArbitraryS2 means that
instead of having instances like Arbitrary (Sealed2 Prim), Arbitrary
(Sealed2 (Prim :> Prim)) etc, we can have a single instance ArbitraryS2
p => Arbitrary (Sealed2 p). With just that change, we still have
ArbitraryS2 Prim, ArbitraryS2 (Prim :> Prim) but then further
refactorings along similar lines can make those better too.

Instances for things like (Prim :> Prim) are problematic because they
often rely on having the same type on both sides of the :> to do
something special, but overlap with the more generic (p :> q).
Introducing the 'Pair' type means that the two can be clearly
disambiguated at the point the test is run.

>   * generalise the NamedPrim instances for ArbitraryWS[Pair]
> 
> Does this one (which gets rid of a lot of code duplication, so much
> appreciated) actually depend on the previous refactors? If yes, why is
> it possible now but wasn't earlier?

There are a few different pieces of duplication getting removed from
D.T.P.A.NamedPrimV1 etc, not all of them in this specific patch.
Probably the trickiest bit is this pattern:

[NamedPrimV1]
> instance Arbitrary (Sealed2 (Prim :> Prim)) where
>   arbitrary = mapSeal2 wsPatch <$> arbitraryPair

and again in NamedPrimFileUUID.

Then the underlying types have these instances:

[PrimV1]
> instance Arbitrary (Sealed2 (Prim1 :> Prim1)) where
>   arbitrary = mapSeal2 wsPatch <$> arbitraryPair

> instance Arbitrary (Sealed2 (Prim2 :> Prim2)) where
>   arbitrary = mapSeal2 wsPatch <$> arbitraryPair

and again in PrimFileUUID.

Each of these refers to a copy and pasted 'arbitraryPair', which in turn
referred to different 'aPrimPair's. The 'aPrimPair' for NamedPrimV1 and
NamedFileUUID delegate to the ones for PrimV1 and PrimFileUUID
respectively. By moving to a compositional instance structure, the code
for all the NamedPrims is now:

[NamedPrim]
> instance ArbitraryState prim => ArbitraryState (NamedPrim prim) where
>   [..]
>   arbitraryStatePair = ... use arbitraryStatePair from prim ...

and the instance works for NamedPrimFileUUID and NamedPrimV1.

Now that I know it's possible, I probably could have achieved just this
specific goal by a different and shorter route.

> I am trying to understand the big picture...

I guess there isn't a clear big picture that describes my end goals.
What happened was that I started exploring cleanups by making small and
(hopefully) obviously correct refactorings that tried to reduce the mess
of instances. In that process, getting rid of boilerplate just came out
naturally.

I think that even apart from the boilerplate removal, the simpler
instances are a net win because they make the code more modular and make
further cleanups easier. For example another outcome of my patches is
that I've cut down 4 different Arbitrary instances for each of Prim1 and
Prim2 to a single instance of ArbitraryWS along with a new member in
ArbitraryState.
msg22751 (view) Author: bfrk Date: 2021-05-05.10:32:34
>> instance Arbitrary (Sealed2 (Prim :> Prim)) where
> 
> whereas in D.T.P.A.PrimV1 we have:
> 
>> instance Arbitrary (Sealed2 (Prim1 :> Prim1)) where
>>   arbitrary = mapSeal2 wsPatch <$> arbitraryPair
> 
> which gets used for a completely different set of tests.
> 
> It would be much better if these instances were all built in a
> compositional way, with each type constructor like Sealed2, :>, etc,
> just having a single instance that builds on the instances for its
> parameters. That way it's easy to look at a type and work backwards to
> see how its instance will behave, and also to add new instances for new
> things without fear of overlap.

But that is obviously not possible if the pair instances are specialized
for different patch types. I understand that this specialization comes
from the desire to control the distribution of patch pairs. This is why
we have aPrimPair etc. So your Pair type is used to distinguish these
specially generated pairs from generically generated pairs which use a
threaded repo state. Correct?

> Essentially I'm trying to untangle the mess bit by bit. V1Gen is maybe
> the simplest example of this as it's very local to RepoPatchV1.
> Introducing that means that instead of carefully crafting instances like
> the above, we can make the specific tests in D.T.P.RepoPatchV1 that want
> those instances always work via the V1Gen wrapper. I found the specific
> tests by changing the instances to use V1Gen and then working through
> the compiler errors: but now that it's done, we can be confident of
> adding instances that don't mention V1Gen without affecting those tests.

Okay. So we have a better isolation of the messyness now. This is
progress, sure, but I think we can do better. See below.

> Similarly the introduction of classes like ArbitraryS2 means that
> instead of having instances like Arbitrary (Sealed2 Prim), Arbitrary
> (Sealed2 (Prim :> Prim)) etc, we can have a single instance ArbitraryS2
> p => Arbitrary (Sealed2 p). With just that change, we still have
> ArbitraryS2 Prim, ArbitraryS2 (Prim :> Prim) but then further
> refactorings along similar lines can make those better too.

Okay, makes sense.

>> I am trying to understand the big picture...
> 
> I guess there isn't a clear big picture that describes my end goals.

The reason I want to talk about the big picture is that I think a lot of
the complications are just technical debt. Take the specialized
generators for RepoPatchV1: do they add any value? I very much doubt it.
My best guess is that when RepoPatchV2 was added to the test suite they
were left in there more or less unchanged due to a mixture of fear (what
if it breaks things?) and laziness (its just ugly test code anyway, why
bother?).

This is nothing you can fix with pure refactors since of course there
are semantic differences. I think most of them are irrelevant. The only
relevant difference I am aware of is that between MergeableSequence and
PatchTree: for RepoPatchV1 we cannot use the former and must the latter
because otherwise many properties fail. So we need to find a way to test
RepoPatchV1 with generators that merely swap MergeableSequence with
PatchTree but otherwise use the same properties and instances as for the
other RepoPatch types.

None of that means your patches are inappropriate. Go ahead and screen
them. But I would like to enourage you to make much more radical cuts,
especially with regard to the legacy RepoPatchV1 generators. Your
refactors should make this easier now.
msg22753 (view) Author: ganesh Date: 2021-05-05.19:22:09
On 05/05/2021 11:32, Ben Franksen wrote:

>> It would be much better if these instances were all built in a
>> compositional way, with each type constructor like Sealed2, :>, etc,
>> just having a single instance that builds on the instances for its
>> parameters. That way it's easy to look at a type and work backwards to
>> see how its instance will behave, and also to add new instances for new
>> things without fear of overlap.
> 
> But that is obviously not possible if the pair instances are specialized
> for different patch types. I understand that this specialization comes
> from the desire to control the distribution of patch pairs. This is why
> we have aPrimPair etc. So your Pair type is used to distinguish these
> specially generated pairs from generically generated pairs which use a
> threaded repo state. Correct?

Yep.

> The reason I want to talk about the big picture is that I think a lot of
> the complications are just technical debt. Take the specialized
> generators for RepoPatchV1: do they add any value? I very much doubt it.
> My best guess is that when RepoPatchV2 was added to the test suite they
> were left in there more or less unchanged due to a mixture of fear (what
> if it breaks things?) and laziness (its just ugly test code anyway, why
> bother?).

I agree that many of the complications are just moving some technical
debt around. On the other hand it's not always immediately obvious on
the face of it which ones are actually things we want long-term and
which ones we don't. As you say, the V1Gen stuff is almost certainly
something we don't want, but Pair probably does help with signalling we
want a different generator.

> None of that means your patches are inappropriate. Go ahead and screen
> them. But I would like to enourage you to make much more radical cuts,
> especially with regard to the legacy RepoPatchV1 generators. Your
> refactors should make this easier now.

Right now it's this lot or nothing, but maybe I'll come back to it some
other time :-) If I try to bite off too much at once I just never
finish/submit it, like the test refactorings.
msg22828 (view) Author: bfrk Date: 2021-06-03.11:56:22
All accepted, details below.

>   * introduce ArbitraryS2

OK

>   * introduce ArbitraryWS

OK

>   * get rid of makeS2Gen

OK

>   * get rid of a bunch of ArbitraryS2 instances

Ok. Adding arbitraryWSThing is clever.

>   * get rid of arbitraryTriple

OK

>   * replace ArbitraryS2 instance with ArbitraryWS

OK

>   * introduce Triple to reduce the random instance overlaps

OK, though it's not yet clear at this point if this addition amortizes.
Also, you previously removed arbitraryTriple by generalizing the
instance ArbitraryState (p :> p) to ArbitraryState (p :> q) which looks
like a move in the opposite direction.

>   * introduce Pair

OK. Same remark as above applies here. BTW, it is not easy to guess
where the Pair type is defined (it's in D.T.P.WithState). A slightly
longer comment explaining the where and how would have made sense here.

>   * introduce ArbitraryWSPair

OK

>   * get rid of some Arbitrary instances

OK

>   * stop testing the internals of Prim.commute

See (accepted) patch2165.

>   * introduce V1Gen wrapper to replace some ArbitraryS2 instances

This is followed up on in patch2167, OK for now.

>   * restructure ArbitraryWSPair

OK

>   * generalise the NamedPrim instances for ArbitraryWS[Pair]

OK

>   * merge ArbitraryWSPair into ArbitraryState

OK (good idea!)

>   * change aPrim/aPrimPair to return Sealed

OK

>   * inline the ArbitraryState instance for Pair

OK
msg22830 (view) Author: ganesh Date: 2021-06-03.12:33:20
>>   * introduce Triple to reduce the random instance overlaps
> 
> OK, though it's not yet clear at this point if this addition amortizes.
> Also, you previously removed arbitraryTriple by generalizing the
> instance ArbitraryState (p :> p) to ArbitraryState (p :> q) which looks
> like a move in the opposite direction.

The patch that removed arbitraryTriple ("get rid of arbitraryTriple")
replaced both arbitraryTriple and instance ArbitraryState (p :> p) with
instance ArbitraryState (p :> q). arbitraryTriple had a comment above it:

> {- the type instance overlaps!
> type instance ModelOf (p :> p :> p) = ModelOf p
> 
> instance ArbitraryState p => ArbitraryState (p :> p :> p) where
> -}

My perspective is that the generic (p :> q) instance is simpler and more
compositional than having multiple instances for different sets of
arguments to (:>).

This particular patch introduced Triple to replace instance ArbitraryWS
(p :> p :> p), at the cost of some extra boilerplate. Again my goal was
to get to simpler more compositional instances, but I agree the
incremental benefit is less obvious here.

The combined effect is that there is just one instance for (:>) and one
instance for Triple. Overall the direction is to move the burden of
choosing how generation should work from a mess of overlapping instances
into something more type-driven at the sites where tests are defined.

> BTW, it is not easy to guess
> where the Pair type is defined (it's in D.T.P.WithState). A slightly
> longer comment explaining the where and how would have made sense here.

Agreed. When I first wrote these patches I was in exploratory mode, I
should probably have gone back and rebased them with better explanations
instead of submitting as-is.
msg22835 (view) Author: bfrk Date: 2021-06-03.14:07:44
Thanks for the explanations. I'm absolutely d'accord with the general
direction.

Still, the fact that you could generalize the instance for pairs so that
it works for triples, too, and thus could remove arbitraryTriple,
/suggests/ that perhaps we can similarly do away with all the triple
instances. Is there a simple explanation why this is not possible?
msg22840 (view) Author: ganesh Date: 2021-06-03.22:48:55
> Still, the fact that you could generalize the instance for pairs so that
> it works for triples, too, and thus could remove arbitraryTriple,
> /suggests/ that perhaps we can similarly do away with all the triple
> instances. Is there a simple explanation why this is not possible?

I think there are a few different things going in different places.

In the first patch ("get rid of arbitraryTriple") the code pattern for
pairs and triples in that particular place was pretty regular and didn't
depend on the fact that the patch types were all the same. So really it
was just about generalising that code and hence reducing repetition.

That isn't always the case. Most importantly, Pairs are a bit special:
when we get to the ArbitraryState class, the arbitraryStatePair member
is needed because we want to specialise generation for (p :> p)
specifically, ultimately to call 'aPrimPair' for each of the different
underlying primitive types.

That specialisation in turn leads to an ArbitraryWS instance for Pair
that is different from all the other ArbitraryWS instances:

> instance ArbitraryModel p => ArbitraryWS (Triple p) where
>   arbitraryWS = makeWS2Gen aSmallRepo

(and the same code for most other types)

> instance (RepoModel (ModelOf p), ArbitraryState p) =>
>                                    ArbitraryWS (Pair p) where
>   arbitraryWS = arbitraryWSPair

So for this code, we can't trivially replace the handling of pairs and
triples with something more compositional, because we actually want
different behaviour.

That said I'm sure there's still room for simplification. I'm not really
happy with the existence of "glue" classes like ArbitraryWS and similar
the Pair/Triple types.

But it was relatively easy to get to this point by some step-by-step
refactorings and I hope that it helps us spot more cleanup opportunities
in future.

One thought I just had is that maybe the 'aPrimPair' concept itself
could be generalised, because we'd probably like something similar for
triples too. Maybe we could propagate forward properties of the
left-hand side of :> to influence generation of the right-hand side.
msg22841 (view) Author: bfrk Date: 2021-06-04.07:19:04
Right, so the specialization for prim pairs means we can't generalize
such an implementation of pairs to triples. Indeed, aPrimPair can only
work if the types coincide.

On the other hand we have a generic implementation, where we generate
both members of the pair independently, and such an implementation can
always be generalized to two different types (provided they have the
same state type i.e. ModelOf). For instance we use that when generating
FLs of patches.
This suggests that we may want to use the Pair type *only* when we
expect a specialized implementation, and perhaps similarly with Triple.
Whereas for the cases where we chose elements independently, a single
generic instance for (:>) with (possibly) different type variables for
lhs and rhs should suffice.
msg22843 (view) Author: ganesh Date: 2021-06-04.18:02:33
Follow up discussion moved to patch2184.
History
Date User Action Args
2021-05-02 08:07:47ganeshcreate
2021-05-04 20:37:54bfrksetmessages: + msg22748
2021-05-04 21:30:18ganeshsetmessages: + msg22749
2021-05-05 10:32:37bfrksetmessages: + msg22751
2021-05-05 19:22:10ganeshsetmessages: + msg22753
2021-05-06 19:51:54ganeshsetstatus: needs-screening -> needs-review
2021-06-03 11:56:24bfrksetmessages: + msg22828
2021-06-03 11:57:00bfrksetstatus: needs-review -> accepted
2021-06-03 12:33:21ganeshsetmessages: + msg22830
2021-06-03 14:07:45bfrksetmessages: + msg22835
2021-06-03 22:48:58ganeshsetmessages: + msg22840
2021-06-04 07:19:05bfrksetmessages: + msg22841
2021-06-04 18:02:33ganeshsetmessages: + msg22843