darcs

Issue 2677 pull --reorder fails to copy patches

Title pull --reorder fails to copy patches
Priority bug Status has-patch
Milestone Resolved in
Superseder Nosy List bfrk
Assigned To
Topics

Created on 2021-05-25.09:15:08 by bfrk, last changed 2022-04-12.13:07:51 by bfrk.

Messages
msg22815 (view) Author: bfrk Date: 2021-05-25.09:15:04
Here is how to reproduce:

> darcs clone http://darcs.net/screened
> darcs clone http://darcs.net/releases/branch-2.16
> rm ~/.cache/darcs/patches/8f/0000003658-
8f452a132bf0b0e17f177c51ae6aefb0fb29fbb8e42506710d6bd08589336d6d
> cd screened
> darcs pull ../branch-2.16 --reorder-patches
> darcs check
darcs: failed to read patch:                                                    
patch 0c003b272def1c5c65cdc6827905acc2c84d5429
Author: Ben Franksen <ben.franksen@online.de>
Date:   Thu Oct 29 15:01:41 CET 2020
  * simplify the logic that handles existence and validity of the 
index
Couldn't fetch 0000003658-
8f452a132bf0b0e17f177c51ae6aefb0fb29fbb8e42506710d6bd08589336d6d

I will try to reproduce this now with a simple test repo with the 
aim of creating a regression test.
msg22816 (view) Author: bfrk Date: 2021-05-25.12:15:44
So far I wasn't able to reproduce with a simple example repo.

However, I think I found the bug. I ran the pull --reorder command 
with --debug and searched for the file name in question. The only 
ocurrence was when it said:

In fetchFileUsingCachePrivate I'm directly grabbing file contents 
from /home/ben/scratch/branch-2.16/_darcs/patches/0000003658-
8f452a132bf0b0e17f177c51ae6aefb0fb29fbb8e42506710d6bd08589336d6d

Looking at the place where this message is issued in 
Darcs.Util.Cache.fetchFileUsingCachePrivate, there is a comment 
saying

-- FIXME: create links in caches

Indeed, when I add a the missing call to link the file across all 
writable locations, the bug disappears, which can also be seen in 
the debug output.
msg22817 (view) Author: ganesh Date: 2021-05-25.15:43:40
I keep getting confused about why something related to a "cache" matters
for correctness. I took a look at the code quickly but I still don't really
get what is happening here :-( I guess the high-level point is that even
the current repository is considered a cache and therefore failing to
do the "write to all caches" step means we leave it out from the repo itself?
msg22821 (view) Author: bfrk Date: 2021-05-26.08:27:42
> I keep getting confused about why something related to a "cache" matters
> for correctness.
> I took a look at the code quickly but I still don't really
> get what is happening here :-( I guess the high-level point is that even
> the current repository is considered a cache and therefore failing to
> do the "write to all caches" step means we leave it out from the repo itself?

That's exactly the point. When I started hacking on things related to
the cache I experienced the same sort of confusion. The Cache type
exported by Darcs.Util.Cache is really the set of all relevant (source
and target) locations of patches, inventories, and pristine files that
are considered when executing the current command. The actual global
per-user cache is just one of those locations and the code in
Darcs.Util.Cache is responsible for handling all the low-level details
such as the order in which we search these locations when reading and
where and how to store data when writing. It is also responsible for
making sure that we *never* depend on the global cache for correct
operation. Outside of that module, we simply call fetchFileUsingCache
(to read a hashed file) and writeFileUsingCache (to write them to disk).

Since fetchFileUsingCache is supposed to transparently handle file
download, too, it already has to store those downloads somewhere, so it
chooses one of the writable locations in the Cache for that. (We always
should have one of those, because the "current" repo is always assumed
to writable) and while it does so it also hardlinks that file to all the
other writable locations.

What happened here is that when fetchFileUsingCache first finds the file
in a non-writable location, it failed to store it and just returned the
content. This is a corner case that could happen only if the other repo
is a valid local path (if it is remote then we download and that always
stores the file somewhere) and for some reason the file is not already
present in the global cache nor our local repo.

Unfortunately the code in Darcs.Util.Cache somewhat over-uses exception
handling which makes it hard to see what is going on. I am slowly, piece
by piece, cleaning that up, see some of the upcoming patches which I
really have to properly bundle and send.
msg22822 (view) Author: ganesh Date: 2021-05-26.11:15:10
> That's exactly the point. When I started hacking on things related to
> the cache I experienced the same sort of confusion. The Cache type
> exported by Darcs.Util.Cache is really the set of all relevant (source
> and target) locations of patches, inventories, and pristine files that
> are considered when executing the current command. The actual global
> per-user cache is just one of those locations and the code in
> Darcs.Util.Cache is responsible for handling all the low-level details
> such as the order in which we search these locations when reading and
> where and how to store data when writing. It is also responsible for
> making sure that we *never* depend on the global cache for correct
> operation. Outside of that module, we simply call fetchFileUsingCache
> (to read a hashed file) and writeFileUsingCache (to write them to disk).

It might help if we rename the actual concept to "store" or something
like that. Then a cache is one kind of store and "this repo" is another
kind of store where we have to guarantee the presence of the patch. I
also wonder if our code is robust in cases where cross-filesystem
restrictions prevent hard linking.
msg22823 (view) Author: bfrk Date: 2021-05-26.13:06:05
> It might help if we rename the actual concept to "store" or something
> like that. Then a cache is one kind of store and "this repo" is another
> kind of store where we have to guarantee the presence of the patch. I

Yes, that would help. Implies a lot of churn, unfortunately, since Cache
is passed around quite a lot.

> also wonder if our code is robust in cases where cross-filesystem
> restrictions prevent hard linking.

Giod point. Is this something we can test?
msg22824 (view) Author: bfrk Date: 2021-05-28.13:46:53
Correction: I wrote that "if [the file] is remote then we download and 
that always stores the file somewhere". This is wrong. A remote file 
is not always written to disk. Noth gzFetchFilePS and fetchFilePS 
merely return the content as a ByteString.
msg22825 (view) Author: bfrk Date: 2021-05-29.14:13:00
On IRC yesterday (and again today, sorry) I was confused, so let me clarify.

What we observe is indeed repository corruption in the sense that after
a pull --reorder, the target repo (our "current" repo) can be in a state
where it doesn't have all patches references by its inventories.

Depending on the way you access this repository, particularly depending
on whether your cache already happens to contain the patch in question,
this will manifest as an error or not, e.g. when cloning the repo, or
when you do a darcs check.

We discovered that 'darcs check' can silently "repair" the repo if the
missing file is in the cache. So a more reliable way to detect the error
make darcs fail is to use `darcs check --no-cache`.

Here is what I found out today:

The issue cannot be reproduced with the head. The "fix" (for the scare
quotes see below) can be tracked down to

patch 018c5978374ec2cb3ae4f918d733da948bda73da
Author: Ben Franksen <ben.franksen@online.de>
Date:   Mon Nov  9 15:47:50 CET 2020
  * improve cache utilization

After adding more debug messages I found out that we *do* call
writeFileUsingCache with the missing patch. The relevant debug output is:

Writing hash file to patches
writeFileUsingCache:0000003658-8f452a132bf0b0e17f177c51ae6aefb0fb29fbb8e42506710d6bd08589336d6d
In fetchFileUsingCachePrivate I'm directly grabbing file contents from
/home/ben/scratch/branch-2.16/_darcs/patches/0000003658-8f452a132bf0b0e17f177c51ae6aefb0fb29fbb8e42506710d6bd08589336d6d

and that's it: writeFileUsingCache does not actually write the file if
it finds it in a 'LocalOnly' source location, more precisely, if the
call to fetchFileUsingCachePrivate LocalOnly ... succeeds. This is the
case because in fetchFileUsingCachePrivate, local function ffuc, the
first case matches because we pull from a local repo (so
isValidLocalPath (hashedFilePathReadOnly c subdir f) == True).

So this is (or rather: was) definitely a bug in writeFileUsingCache.

Now, the way the above patch "fixes" writeFileUsingCache is by adding
more side-effects to fetchFileUsingCachePrivate. As Ganesh remarked on
IRC, this is highly questionable! I agree, which is why I am not
considering the changes I made in the above patch as sufficient.
msg22826 (view) Author: bfrk Date: 2021-05-30.18:35:34
> Now, the way the above patch "fixes" writeFileUsingCache is by adding
> more side-effects to fetchFileUsingCachePrivate.

On closer inspection this statement is incorrect. What fixes the bug is
this hunk

    hunk ./src/Darcs/Util/Cache.hs 499
    -    _ <- fetchFileUsingCachePrivate LocalOnly (Ca cache) subdir hash
    +    debugMessage $ "writeFileUsingCache "++hash
    +    (fn, _) <- fetchFileUsingCachePrivate LocalOnly (Ca cache)
subdir hash
    +    mapM_ (tryLinking fn hash subdir) cache

which is inside writeFileUsingCache.

What happens is that fetchFileUsingCache succeeds with no side-effect
i.e. no hard-linking; we merely get the file content. Then, if the file
exists in the same file system, the hard linking succeeds and we are
done. If we are on a different file system, the hard link fails, which
means we enter the wfuc exception handler, where we do an actual write,
using the *first* writable location i.e. the local repo; afterwards we
again try to link with the cache, which fails, but this is no problem
since it is entirely optional, and thus the failure is silently ignored.

Before the patch, fetchFileUsingCache succeeded in the same way, but
then we returned without trying to link, so we never actually made sure
the file exists in our local repo.

So the way the patch fixes the issue is somewhat roundabout, but it does
*not* rely on side-effects of fetchFileUsingCache.
History
Date User Action Args
2021-05-25 09:15:08bfrkcreate
2021-05-25 12:15:47bfrksetmessages: + msg22816
2021-05-25 15:43:43ganeshsetmessages: + msg22817
2021-05-26 08:27:46bfrksetmessages: + msg22821
2021-05-26 11:15:12ganeshsetmessages: + msg22822
2021-05-26 13:06:08bfrksetmessages: + msg22823
2021-05-28 13:46:56bfrksetmessages: + msg22824
2021-05-29 14:13:07bfrksetmessages: + msg22825
2021-05-30 18:35:40bfrksetmessages: + msg22826
2021-06-14 20:37:44ganeshsetmessages: + msg22880
2021-06-14 21:02:30ganeshsetmessages: - msg22880
2022-04-12 12:59:38bfrksetstatus: unknown -> has-patch
2022-04-12 13:07:51bfrksetpriority: bug