Patch 1954 resolve issue2639: darcs diff crashes with --last=1 an...

Title resolve issue2639: darcs diff crashes with --last=1 an...
Superseder Nosy List bf
Related Issues
Status accepted Assigned To

Created on 2020-01-25.12:58:23 by bf, last changed 2020-02-27.20:29:01 by ganesh.

File name Status Uploaded Type Edit Remove
diff-command_-make-construction-of-trees-more-obviously-correct.dpatch bf, 2020-01-26.19:50:59 application/x-darcs-patch
pEpkey.asc bf, 2020-01-25.15:30:32 application/pgp-keys
pEpkey.asc bf, 2020-01-25.17:38:01 application/pgp-keys
pEpkey.asc bf, 2020-01-26.18:53:12 application/pgp-keys
pEpkey.asc bf, 2020-01-26.19:47:10 application/pgp-keys
pEpkey.asc bf, 2020-02-10.07:28:14 application/pgp-keys
patch-preview.txt bf, 2020-01-25.12:58:23 text/x-darcs-patch
patch-preview.txt bf, 2020-01-26.19:50:59 text/x-darcs-patch
resolve-issue2639_-darcs-diff-crashes-with-__last_1-and-file-name.dpatch bf, 2020-01-25.12:58:23 application/x-darcs-patch
unnamed bf, 2020-01-25.12:58:23 text/plain
unnamed bf, 2020-01-26.19:50:59 text/plain
See mailing list archives for discussion on individual patches.
msg21703 (view) Author: bf Date: 2020-01-25.12:58:23
1 patch for repository http://darcs.net/screened:

patch 5913eea4ed4c7bcb92bee2987d19a8b36f60d17a
Author: Ben Franksen <ben.franksen@online.de>
Date:   Fri Jan 24 17:11:19 CET 2020
  * resolve issue2639: darcs diff crashes with --last=1 and file name
msg21710 (view) Author: ganesh Date: 2020-01-25.14:57:38
Can you explain the fix? I've stared at it a bit but couldn't see
what was wrong with the old logic and right with the new.
msg21711 (view) Author: bf Date: 2020-01-25.15:30:32
Please see the explanation to issue2639 (msg21697) which I added before
I sent the patch. I think this should clear things up, if not please
yell and i will add more details.
msg21713 (view) Author: ganesh Date: 2020-01-25.16:37:40
Thanks, I missed that (sorry!)

So (in the old code), unrecorded' contains the unrecorded changes
filtered by the files the user selected, but base contains the
unrecorded changes unfiltered.

Is the problem that we then try to remove todiff+>+tounapply from base to
make oldtree, and this fails because tounapply doesn't contain the changes
to unselected files (g in your test) so if todiff also contains changes to
g then unapplying it fails?
msg21716 (view) Author: bf Date: 2020-01-25.17:38:01
No, the problem is not 'base', but the files inside 'pdirpath'.

pdirpath is used in the first call to hashedTreeIO when we create
oldtree. It is the result of copying our (hashed) pristine tree (the 4
lines after the comment

  -- make a temporary copy of pristineDirPath where we have write access

). But if we have no secondMatch this is wrong: pdirpath should be (a
hashed version of) the working tree, not the pristine tree.

My fix returns the same value for base as before (i.e. as
unrecordedChanges does), but /as a side effect/ the unrecorded' changes
are applied to the pristine tree on disk inside pdirpath; which is then
in the correct state, i.e. we can now unapply 'todiff+>+tounapply'
without crashing.

In principle this bug has nothing to do with path filtering. I must
admit that I am not 100% sure why we need path filtering to actually get
the crash. It may have to do with the fact that the directory on disk is
used only as 'backup storage' for the Tree, i.e. we consult it only if
we hit an unrealized Blob inside the Tree (assuming it is already
expanded, but i think it is). This may mask the bug in many common
cases. I know this is not really a satisfying explanation...
msg21722 (view) Author: ganesh Date: 2020-01-26.00:27:30
I see a bit better now. The pre-condition for the arguments to
hashedTreeIO is being violated. I think 'tounapply' is a bit of
a red herring as it'll be empty when there's no second match, 
because we'll have match=all.

But if hashedTreeIO has a side-effect on pdirpath, why is it ok to 
call it twice in succession to make 'oldtree' and 'newtree'?
msg21731 (view) Author: bf Date: 2020-01-26.18:53:12
> I see a bit better now. The pre-condition for the arguments to
> hashedTreeIO is being violated. I think 'tounapply' is a bit of
> a red herring as it'll be empty when there's no second match, 
> because we'll have match=all.

Ah, correct. I should have said 'todiff+>+tounapply', not just
'tounapply'. In the failing test scenario unrecorded' is the last patch
in todiff.

> But if hashedTreeIO has a side-effect on pdirpath, why is it ok to 
> call it twice in succession to make 'oldtree' and 'newtree'?

Yes this is strange, isn't it? I tried for some time to come up with a
test case to show this is unsafe, but to no avail.

Then I thought about it some more and came to the conclusion that the
code actually works, because hashed files are never removed (until
explicitly garbage collected). So all items and subtrees of the Tree
'base' can still be found in 'pdirpath'. The first two hashedTreeIO
calls merely added some more hashed files and these are just ignored.

So this is not (another) bug, strictly speaking. I still think the code
would be more obvious if it re-applied todiff and passed oldtree to the
second call:

  oldtree <- filt `fmap` hashedTreeIO
    (unapply $ unsafeCoercePEnd todiff +>+ tounapply) base pdirpath
  newtree <- filt `fmap` hashedTreeIO
    (apply todiff) oldtree pdirpath

or, better yet, to avoid unnecessary patch application:

  newtree <- filt `fmap` hashedTreeIO
    (unapply tounapply) base pdirpath
  oldtree <- filt `fmap` hashedTreeIO
    (unapply todiff) newtree pdirpath

What do you think?
msg21734 (view) Author: bf Date: 2020-01-26.19:47:10
>   newtree <- filt `fmap` hashedTreeIO
>     (unapply tounapply) base pdirpath
>   oldtree <- filt `fmap` hashedTreeIO
>     (unapply todiff) newtree pdirpath

Ahem. I found that this isn't good when there is path filtering taking
place. We should postpone the filtering to when we write out the plain
trees. Will send a patch.
msg21736 (view) Author: bf Date: 2020-01-26.19:50:59
Following up on my recent comments.

1 patch for repository http://darcs.net/screened:

patch c10013a29a6341db6ed596378539162dd4d75391
Author: Ben Franksen <ben.franksen@online.de>
Date:   Sun Jan 26 20:35:15 CET 2020
  * diff command: make construction of trees more obviously correct
msg21781 (view) Author: bf Date: 2020-02-10.07:28:14
Ganesh: does this explanation satisfy you? I have a few follow-up
patches and would like to move on...

Quoting myself:
> So your initial assumption that this is all about path filtering was
> correct. The underlying problem is that we treat recorded and unrecorded
> changes differently with respect to path filtering. The former are taken
> as is, but the latter are filtered. Unapplying filtered unrecorded
> changes from the full working tree retains filtered-out files as they
> are (i.e. changed, relative to pristine), which means we may get an
> error when we unapply recorded changes (since these are not filtered).
> The reason my patch fixes the problem is that it does not call
> readUnrecorded, but rather re-constructs a "filtered working tree" from
> the (filtered) unrecorded changes.
> A less invasive fix would be to pass Nothing (instead of mpaths) to
> unrecordedChanges, which means we don't filter the unrecorded changes.
> Instead we only filter the Trees that we pass to the external diff
> command. This would perhaps be less efficient in cases where there are
> large amounts of unrecorded changes but only a small subset affect the
> files we are interested in.
msg21786 (view) Author: ganesh Date: 2020-02-10.22:25:24
Apologies, I got distracted by breaking out my testing changes and
forgot about reviews for a while. I think your current fix is fine
but it'd be good to have some comments documenting what we now know
about how things work and what could potentially be changed.
Date User Action Args
2020-01-25 12:58:23bfcreate
2020-01-25 14:20:29bfsetstatus: needs-screening -> needs-review
2020-01-25 14:57:38ganeshsetstatus: needs-review -> review-in-progress
messages: + msg21710
2020-01-25 15:30:32bfsetfiles: + pEpkey.asc
messages: + msg21711
2020-01-25 16:37:40ganeshsetmessages: + msg21713
2020-01-25 17:38:01bfsetfiles: + pEpkey.asc
messages: + msg21716
2020-01-26 00:27:31ganeshsetmessages: + msg21722
2020-01-26 18:53:12bfsetfiles: + pEpkey.asc
messages: + msg21731
2020-01-26 19:47:10bfsetfiles: + pEpkey.asc
messages: + msg21734
2020-01-26 19:50:59bfsetfiles: + patch-preview.txt, diff-command_-make-construction-of-trees-more-obviously-correct.dpatch, unnamed
messages: + msg21736
2020-02-10 07:28:15bfsetfiles: + pEpkey.asc
messages: + msg21781
2020-02-10 22:25:24ganeshsetstatus: review-in-progress -> accepted-pending-tests
messages: + msg21786
2020-02-27 20:29:01ganeshsetstatus: accepted-pending-tests -> accepted