darcs

Patch 1885 cleanup and simplify Darcs.Patch.Prim.V1... (and 10 more)

Title cleanup and simplify Darcs.Patch.Prim.V1... (and 10 more)
Superseder Nosy List bf
Related Issues
Status in-discussion Assigned To
Milestone

Created on 2019-08-19.21:39:59 by bf, last changed 2019-09-21.10:30:42 by bf.

Files
File name Status Uploaded Type Edit Remove
cleanup-and-simplify-darcs_patch_prim_v1_coalesce_mapprimfl.dpatch bf, 2019-08-19.21:39:58 application/x-darcs-patch
patch-preview.txt bf, 2019-08-19.21:39:58 text/x-darcs-patch
unnamed bf, 2019-08-19.21:39:58 text/plain
See mailing list archives for discussion on individual patches.
Messages
msg21143 (view) Author: bf Date: 2019-08-19.21:39:58
These patches all concern the PrimSift, PrimClassify, and PrimCanonize
classes and their Prim.V1 implementation. Most are only refactors without
changing functionality but one or two also chage behavior. This is why I am
not screening them for now.

11 patches for repository http://darcs.net/screened:

patch ea9c110154b5453a642ceef1c2f7c2ee372a61b9
Author: Ben Franksen <ben.franksen@online.de>
Date:   Fri Aug  2 22:36:43 CEST 2019
  * cleanup and simplify Darcs.Patch.Prim.V1.Coalesce.mapPrimFL
  
  This eliminates the data type Simple. It is easier and more efficient (wrt
  memory as well as cpu) to put the original patches into the Map, rather than
  take them apart and afterwards put them together again.

patch 818d56d27f76c710a940819c1c7e16e206bfd714
Author: Ben Franksen <ben.franksen@online.de>
Date:   Sat Aug  3 01:36:23 CEST 2019
  * use (Bool,a) instead of Either a a in pushCoalescePatch
  
  This is less verbose and more directly captures our intent.

patch 25075b59cf2e6da61af42fd01ed0404e88e20314
Author: Ben Franksen <ben.franksen@online.de>
Date:   Thu Aug  1 17:45:08 CEST 2019
  * eliminate class PrimClassify
  
  There were only three use cases left:
  
  (1) The whatsnew command used primIsHunk, which can be easily replaced with
      isJust . isHunk.
  (2) The implementation of RepoPatchV1 used is_filepatch for speedyCommute.
      This is now replaced with calls to listTouchedFiles. To account for
      directory changes, we now check that neither file path is a prefix of
      the other.
  (3) In the instance PrimSift for Prim.V1. Since this is specific to Prim.V1,
      we implement the necessary functions directly.

patch 68e102bdf2dda50041507df1f20dd75403a53553
Author: Ben Franksen <ben.franksen@online.de>
Date:   Thu Aug  1 17:45:08 CEST 2019
  * move instance PrimSift for Prim.V1 to its own module

patch 45acbf41ed9714dd8cfe01423daf29640b14f02a
Author: Ben Franksen <ben.franksen@online.de>
Date:   Thu Aug  1 19:19:30 CEST 2019
  * drastically simplify implementation of siftForPending for Prim.V1
  
  All the low-level optimisations have been eliminated. What remains is quite
  simple and easy to understand. Theoretically this change may cause
  performance regressions; but the optimizations only covered some simple
  cases like "pending consists only of hunks, binaries, and setpref" or
  "pending consists only of addfile and adddir patches". These special cases
  are handled just fine and without much overhead by the standard algorithm.
  Besides, the bottleneck in handling the pending patch is not sifting, but
  the complicated algorithm for eliminating recorded changes from pending.

patch 2516f1873b74d50c1bcff49e96a62b08ff201007
Author: Ben Franksen <ben.franksen@online.de>
Date:   Mon Aug  5 10:23:30 CEST 2019
  * move canonizeFL out of class PrimCanonize
  
  The definition of this function is independent of Prim.V1, it only uses
  methods from PrimCanonize.

patch b6545d980d7865d5a83dee944cb3b99b5f8ce2f5
Author: Ben Franksen <ben.franksen@online.de>
Date:   Mon Aug  5 10:47:56 CEST 2019
  * define canonize outside of instance PrimCanonize

patch 4f8d783290555868872896d68488ed29458bd0f0
Author: Ben Franksen <ben.franksen@online.de>
Date:   Sat Aug  3 01:38:01 CEST 2019
  * refactor and extend tryToShrink and sortCoalesceFL
  
  This is a complicated patch that slightly changes the semantics of both
  functions. If one looks very carefully at the previous implementation of
  tryToShrink, it becomes clear that it is very similar to sortCoalesceFL. The
  difference is that tryToShrink doesn't bother to preserve the order
  established by its call to sortCoalesceFL. Whereas sortCoalesceFL takes care
  not to destroy the order it has already established. Both functions
  internally track whether shrinking has made progress, but hide that from the
  API.
  
  The new methods now both share the same core: sortCoalesceFL2. Similar to
  the old tryHarderToShrink, this function now tries harder to shrink the
  sequence than sortCoalesceFL2 did before. However, if this destroys the
  order, it takes care that it gets restored. It is thus able to find as many
  opportunities to shrink the sequence, while still maintaining the ordering.
  
  Internally, keeping track of whether shrinking has made progress is now
  structured as an effect of type (Any,), where Any is the Bool wrapper from
  the base library. This is a Monad due to the Monoid a => Monad (a,) instance
  defined in base, see the doc comments for details. mapPrimFL now takes and
  returns a function with a monadic effect, which allows us to map
  sortCoalesceFL2 directly. Which in turn allows us to change the return type
  of tryToShrink to Maybe, exposing to calling code whether we could shrink
  the sequence. This is used to avoid lengthFL calls when sifting.
  
  This patch comments out some code instead of deleting it. This was done in
  order to produce better diffs. They will be removed in a separate patch.

patch 18bb76a72a80b54649677a343d3bc758dfd75954
Author: Ben Franksen <ben.franksen@online.de>
Date:   Mon Aug  5 12:13:44 CEST 2019
  * separate canonizing from coalescing
  
  This moves the implementation of 'canonize' and 'canonizeFL' to a separate
  module, eliminates the export of canonize, and renames class PrimCanonize to
  PrimCoalesce.
  
  The rationale for this refactor is that 'canonize' for single prims was the
  only remaining method from class PrimCanonize that actually had anything to
  do with canonizing. Its implementation does not depend on coalescing and is
  almost independent from PrimV1, with the exception that it removed patches
  with no effect, using isIdentity. This is, however, also done by
  sortCoalesceFL. Investigating the call sites of the canonize method revealed
  that some of them actually re-implement canonizeFL. The remaining two call
  sites (in Prim.Patch.Split and Darcs.Repository.Diff) now call canonizeFL
  with a singleton FL.

patch a680caba30615c0712be9e77bbfd49fe1e2075dd
Author: Ben Franksen <ben.franksen@online.de>
Date:   Mon Aug  5 10:07:21 CEST 2019
  * remove out-commented functions from D.P.Prim.V1.Coalesce
  
  This is recorded as a separate patch for easier review.

patch 455447e0131a65226296450340273b7da75e640b
Author: Ben Franksen <ben.franksen@online.de>
Date:   Mon Aug  5 10:43:51 CEST 2019
  * slightly extend docs for class PrimCoalesce
Attachments
msg21329 (view) Author: ganesh Date: 2019-09-03.07:53:52
Sorry for taking a while to comment on this bundle.

> (2) The implementation of RepoPatchV1 used is_filepatch for speedyCommute.
>     This is now replaced with calls to listTouchedFiles. To account for
>     directory changes, we now check that neither file path is a prefix of
>     the other.

As with all changes to the commute behaviour of patches, this makes me
slightly nervous. Also, moving from is_filepatch to listTouchedFiles seems
like a complication, and I think that listTouchedFiles was previously only
important for the UI.

But we can't never make changes to the commute code for patches, so we
should do something. How about we just get rid of speedyCommute (and review
that change very carefully)? It should just be an optimisation and this is
just V1 patches, which are probably not very widely used.
msg21333 (view) Author: ganesh Date: 2019-09-03.09:48:02
Also, I'm fine with the general idea of the rest of the changes, though
I haven't looked in detail. If the tests pass then simplifications
to coalescing etc are very likely to be a good thing.
msg21377 (view) Author: bf Date: 2019-09-04.10:37:12
> Sorry for taking a while to comment on this bundle.

No problem.

>> (2) The implementation of RepoPatchV1 used is_filepatch for speedyCommute.
>>     This is now replaced with calls to listTouchedFiles. To account for
>>     directory changes, we now check that neither file path is a prefix of
>>     the other.
> 
> As with all changes to the commute behaviour of patches, this makes me
> slightly nervous.

There is no change to behavior, or at least I didn't intend there to be
one. As you noted below, this is all just an optimization. I slightly
extended its scope: we not only consider files but also directories. I
can't see what could go wrong with that: if both patches touch exactly
one file or directory and none is a prefix of the other, then the
patches are obviously completely unrelated and must commute trivially.

It would be perfectly fine to apply this optimization to the other
RepoPatch types, too. The only thing I don't like is that this ties
commutation to Prim.V1, even if only indirectly.

(Digression: One possible fix for that is to generalize PatchInspect in
such a way that we return some "name/identifier for a file system
object" instead of AnchoredPath. And I think the way to do that is to
define a class for ApplyStates with an associated type synonym, which
for Tree would be AnchoredPath. But this needs more thought and should
be postponed until we make a serious effort to integrate some variant of
FileUUID as an alternative prim patch type.)

BTW, believe we have separate unit tests for speedyCommute. I am not
sure they cover conflicted patches, but I would be willing to extend
them so they do.

If that doesn't satisfy you I could try and limit the optimization to
un-conflicted patches.

> Also, moving from is_filepatch to listTouchedFiles seems
> like a complication, and I think that listTouchedFiles was previously only
> important for the UI.

It is true that until now it was probably used for the UI only. But all
the RepoPatch implementations of listTouchedFiles take care to list all
files touched by any of their ingredients, even if they aren't active
because of conflicts. This behavior should be documented as required.

As to complication: it is a trade-off, as usual. Overall things become
simpler, but of course listTouchedFiles itself is more powerful and
complex than is_filepatch.

> But we can't never make changes to the commute code for patches, so we
> should do something. How about we just get rid of speedyCommute (and review
> that change very carefully)? It should just be an optimisation and this is
> just V1 patches, which are probably not very widely used.

I doubt that. Remember that conversion from darcs-1 to darcs-2 format
means you must either merge all your branches into one repo and then
pull them apart afterwards or else say goodbye to those branches. Also
note that the former may be impossible in practice due to exponential
conflict blow-up. My guess is that many users never bothered to upgrade
to darcs-2. I know for sure that I didn't, mostly, and that we still
have many darcs-1 repos at work and we regularly use them. We just take
care not to create conflicts in upstream, quite similar to the way we
work with darcs itself.
History
Date User Action Args
2019-08-19 21:39:59bfcreate
2019-09-03 07:53:52ganeshsetmessages: + msg21329
2019-09-03 09:48:02ganeshsetmessages: + msg21333
2019-09-04 10:37:13bfsetmessages: + msg21377
2019-09-21 10:30:42bfsetstatus: needs-screening -> in-discussion