darcs

Patch 2227 patch index: remove another kludge for b... (and 7 more)

Title patch index: remove another kludge for b... (and 7 more)
Superseder Nosy List bfrk
Related Issues
Status accepted Assigned To
Milestone

Created on 2021-11-20.11:43:52 by bfrk, last changed 2023-06-24.16:16:10 by ganesh.

Files
File name Status Uploaded Type Edit Remove
patch-index_-remove-another-kludge-for-broken-move-patches.dpatch bfrk, 2021-11-20.11:43:38 application/x-darcs-patch
patch-preview.txt bfrk, 2021-11-20.11:43:34 text/x-darcs-patch
See mailing list archives for discussion on individual patches.
Messages
msg22932 (view) Author: bfrk Date: 2021-11-20.11:43:38
8 patches for repository http://darcs.net/screened:

patch 955d1155f0cd5063fd12f48217f37dfcb3f0a33e
Author: Ben Franksen <ben.franksen@online.de>
Date:   Thu Jun 10 21:00:38 CEST 2021
  * patch index: remove another kludge for broken move patches

patch 3b5f95878bfbc46cbf3c439c7227dba1db165155
Author: Ben Franksen <ben.franksen@online.de>
Date:   Thu Jun 10 14:09:49 CEST 2021
  * patch index: debug messages when loading

patch 2f2cfa1bea65fa775f25dd9408e0faa22f1311fb
Author: Ben Franksen <ben.franksen@online.de>
Date:   Thu Jun 10 14:06:19 CEST 2021
  * patch index: import from Darcs.Patch.Index.Types explicitly

patch b1ed0fbbca4a874be2905ae1c5b1a82a29b719aa
Author: Ben Franksen <ben.franksen@online.de>
Date:   Thu Jun 10 14:05:01 CEST 2021
  * patch index: reformat a doc comment

patch 05ad2db3ec6da188b5390573154fae5c3aea6025
Author: Ben Franksen <ben.franksen@online.de>
Date:   Thu Oct  4 18:07:31 CEST 2018
  * patch index: massive performance improvement

  By definition, a directory is regarded as touched whenever a file or
  directory beneath it is. This information was laboriously re-constructed at
  query time (in maybeFilterPatches), which could be very slow when a
  directory had many children. It is much more efficient to add this
  information immediately when we build or update the patch index. This
  requires no change to the data structures, just additional inserts.
  Nonetheless, this is a format change, so we have to increase the version
  number to 4.

patch f61562ebbf31c75f4c7ccfe23751b7b98286a290
Author: Ben Franksen <ben.franksen@online.de>
Date:   Thu Jun 10 14:15:41 CEST 2021
  * patch index: optimize Set Word32 -> IntSet

  This trades a bit of memory (at least on 64 bit archs) for the ability to
  use an optimized data structure. The version number gets increased to 5.

patch ca8f233393233b0580a20eda7eafa8e379687f1a
Author: Ben Franksen <ben.franksen@online.de>
Date:   Thu Jun 10 14:09:09 CEST 2021
  * patch index: cleanup doc comments

patch 77d58fea123dc04bf96aa646206381d2d5d280a7
Author: Ben Franksen <ben.franksen@online.de>
Date:   Mon Oct  4 10:09:39 CEST 2021
  * patch index: disentangle Darcs.Patch.Index.* modules, rename PatchMod to FileMod
Attachments
msg23242 (view) Author: ganesh Date: 2023-04-01.09:01:53
reviewing incrementally

>   * patch index: remove another kludge for broken move patches

Can you remind me why we don't need this workaround any more?

>  * patch index: debug messages when loading
>  * patch index: debug messages when loading
>  * patch index: reformat a doc comment

All OK
msg23252 (view) Author: bfrk Date: 2023-04-01.16:30:47
>>    * patch index: remove another kludge for broken move patches
> 
> Can you remind me why we don't need this workaround any more?

I believe it was this patch

patch f28ec42284caa0477537bb5741762abf0ab31b02
Author: Ben Franksen <ben.franksen@online.de>
Date:   Tue Mar  9 17:40:02 CET 2021
  * fail in Darcs.Util.Tree.Monad.rename if source does not exist

Previously, when we applied a move with a non-existing source, this 
was not detected i.e. did not fail, but rather (effectively) ignored 
it. Such changes could then be recorded and indeed we had a very old 
patch in our own history that contained such a bad move. This 
prompted me to extend the basic repair machinery for Prim.V1 to 
include a new case which simply removes such a change from the 
patch.
msg23253 (view) Author: ganesh Date: 2023-04-01.18:19:35
> This prompted me to extend the basic repair machinery for Prim.V1 to 
> include a new case which simply removes such a change from the 
> patch.

So just so I've got it clear in my head (and in the hope I remember
next time...) - we used to silently accept these broken patches,
now we fail and force the user to repair them first before doing
anything else?
msg23254 (view) Author: bfrk Date: 2023-04-01.18:21:48
It is funny how such a small and innocent looking change led to a series of 
patches, some of them much less trivial.

I can't remember exactly how I noticed the broken move in our repos, perhaps I 
just ran a check or repair in one of my clones and noticed that it now 
"crashes". After I extended Prim.V1.applyAndTryToFixFL and doing lots of tests 
and self-review to make sure this is really okay, I went ahead and repair'ed 
screened on darcs.net, and I think the other repos, too (reviewed, releases).

Quite some time later I noticed that my clones had become slower on operations 
like push and pull or with commands that support --not-in-remote. Furthermore, 
a fresh clone with the new patches pulled of top did not show this behavior. 
This vexxed me enough to investigate. Comparing the two repos I found that 
their _darcs/inventories differed a lot, in particular the fresh clone had 
much fewer inventories. So I looked into the upstream repos and saw that 
screened has just a single inventory! This made me remember the repair 
operation and indeed a quick look at the code in Darcs.Repository.Repair 
revealed that repair throws away all Tagged sections (it worked on the result 
of a patchSet2FL). So I fixed that. But then I had already repaired the repos 
on darcs.net, so how to restore them to their old glory? This led me to add --
deep/--shallow options for optimize reorder, --deep having the effect of 
creating inventories for all clean tags in the repo and also trying to make 
all tags clean as long as that doesn't make an earlier tag unclean. Again, 
after lots of testing and self-review, I ran this on on the repos on darcs.net 
(this time I archived the old versions under /opt/darcs/old).

What is still missing is a variant of `darcs optimize reorder` that adapts an 
existing repo to the inventory structure (and thus patch order) of its 
upstream. Like a fresh clone with patches added/removed as required to mirror 
the old clone, as I did manually when I first encountered the mysterious slow-
down. This should make use of packs if present in upstream. We could perhaps 
try to detect situations which are suboptimal (during push and pull and such) 
and issue a warning with a hint how to improve the situation.

This is related to my idea (still not realised) of a "safe mode" which does 
not trust patch identities and instead requires patch content equality (via 
content hashes, including inventory hashes), re-ordering patches locally if 
necessary. This would reliably detect violations of the ident-commute law, 
i.e. our assumption that re-ordering via commute is the only valid way that 
patches can keep their identity (and vice versa). Realising this idea is 
difficult because it means everything in Darcs.Patch.Depends must be 
refactored, if not completely re-written.
msg23255 (view) Author: bfrk Date: 2023-04-01.18:26:51
To answer your question (which I hadn't seen before sending the long 
story): yes, precisely.
msg23258 (view) Author: bfrk Date: 2023-04-01.20:30:37
Well, not quite. We do NOT force users to run repair. If the broken 
move is contained in an old enough patch, then most operations won't 
even notice any problem. Remember that we fail when applying the 
patch (which is our only way to check if a patch actually makes 
sense in a given context) and old patches are usually not applied. 
Pulling all patches into an empty repo will fail, for instance. 
Commuting such a patch may or may not have triggered an error even 
before this change.
msg23377 (view) Author: ganesh Date: 2023-06-24.14:54:57
>   * patch index: massive performance improvement

Looks good. I've read this a few times and while I don't think I 
understand the nuances well enough to be completely certain about it, 
the concept makes sense. Also it's not a disaster if there is a bug
as it won't corrupt repo history or anything.
msg23378 (view) Author: ganesh Date: 2023-06-24.14:57:51
>  * patch index: optimize Set Word32 -> IntSet
>  * patch index: cleanup doc comments
>  * patch index: disentangle Darcs.Patch.Index.* modules, rename 
PatchMod to FileMod

All good.
History
Date User Action Args
2021-11-20 11:43:52bfrkcreate
2021-11-21 07:13:06bfrksetstatus: needs-screening -> needs-review
2023-04-01 09:01:53ganeshsetmessages: + msg23242
2023-04-01 16:30:48bfrksetmessages: + msg23252
2023-04-01 18:19:36ganeshsetmessages: + msg23253
2023-04-01 18:21:52bfrksetmessages: + msg23254
2023-04-01 18:26:54bfrksetmessages: + msg23255
2023-04-01 20:30:38bfrksetmessages: + msg23258
2023-04-02 16:33:25bfrksetstatus: needs-review -> review-in-progress
2023-06-24 14:54:58ganeshsetmessages: + msg23377
2023-06-24 14:57:51ganeshsetstatus: review-in-progress -> accepted-pending-tests
messages: + msg23378
2023-06-24 16:16:10ganeshsetstatus: accepted-pending-tests -> accepted