Created on 2021-05-25.09:15:08 by bf, last changed 2022-04-12.13:07:51 by bf.
Here is how to reproduce:
> darcs clone http://darcs.net/screened
> darcs clone http://darcs.net/releases/branch-2.16
> rm ~/.cache/darcs/patches/8f/0000003658-
> cd screened
> darcs pull ../branch-2.16 --reorder-patches
> darcs check
darcs: failed to read patch:
Author: Ben Franksen <firstname.lastname@example.org>
Date: Thu Oct 29 15:01:41 CET 2020
* simplify the logic that handles existence and validity of the
Couldn't fetch 0000003658-
I will try to reproduce this now with a simple test repo with the
aim of creating a regression test.
So far I wasn't able to reproduce with a simple example repo.
However, I think I found the bug. I ran the pull --reorder command
with --debug and searched for the file name in question. The only
occurrence was when it said:
In fetchFileUsingCachePrivate I'm directly grabbing file contents
Looking at the place where this message is issued in
Darcs.Util.Cache.fetchFileUsingCachePrivate, there is a comment
-- FIXME: create links in caches
Indeed, when I add the missing call to link the file across all
writable locations, the bug disappears, which can also be seen in
the debug output.
I keep getting confused about why something related to a "cache" matters
for correctness. I took a look at the code quickly but I still don't really
get what is happening here :-( I guess the high-level point is that even
the current repository is considered a cache and therefore failing to
do the "write to all caches" step means we leave it out from the repo itself?
> I keep getting confused about why something related to a "cache" matters
> for correctness.
> I took a look at the code quickly but I still don't really
> get what is happening here :-( I guess the high-level point is that even
> the current repository is considered a cache and therefore failing to
> do the "write to all caches" step means we leave it out from the repo itself?
That's exactly the point. When I started hacking on things related to
the cache I experienced the same sort of confusion. The Cache type
exported by Darcs.Util.Cache is really the set of all relevant (source
and target) locations of patches, inventories, and pristine files that
are considered when executing the current command. The actual global
per-user cache is just one of those locations and the code in
Darcs.Util.Cache is responsible for handling all the low-level details
such as the order in which we search these locations when reading and
where and how to store data when writing. It is also responsible for
making sure that we *never* depend on the global cache for correct
operation. Outside of that module, we simply call fetchFileUsingCache
(to read a hashed file) and writeFileUsingCache (to write one to disk).
Since fetchFileUsingCache is supposed to transparently handle file
download, too, it already has to store those downloads somewhere, so it
chooses one of the writable locations in the Cache for that. (We should
always have at least one of those, because the "current" repo is always
assumed to be writable.) While it does so, it also hardlinks that file to
all the other writable locations.
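To make that ordering concrete, here is a deliberately simplified sketch of the idea. The names (Location, fetchSketch) and the IORef-backed maps are invented for illustration; they stand in for the real Cache type and on-disk locations, not for the actual darcs API:

```haskell
import qualified Data.Map as Map
import Data.IORef

-- One location where hashed files may live (our repo, global cache, ...).
data Location = Location
  { writable :: Bool
  , store    :: IORef (Map.Map String String)  -- hash -> contents
  }

-- Search the locations in order; on a hit, propagate the file into every
-- writable location, so the current repo is guaranteed to end up with it.
fetchSketch :: [Location] -> String -> IO (Maybe String)
fetchSketch locs hash = go locs
  where
    go [] = return Nothing
    go (l : ls) = do
      m <- readIORef (store l)
      case Map.lookup hash m of
        Nothing -> go ls
        Just contents -> do
          -- this is the propagation step the bug was about
          mapM_ (\w -> modifyIORef (store w) (Map.insert hash contents))
                (filter writable locs)
          return (Just contents)
```

The point of the sketch is only the invariant: a successful fetch must leave the file present in every writable location, whichever location the hit came from.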
What happened here is that fetchFileUsingCache found the file in a
non-writable location, failed to store it anywhere else, and just returned
the content. This is a corner case that can happen only if the other repo
is a valid local path (if it is remote then we download and that always
stores the file somewhere) and for some reason the file is not already
present in the global cache nor our local repo.
Unfortunately the code in Darcs.Util.Cache somewhat over-uses exception
handling, which makes it hard to see what is going on. I am slowly, piece
by piece, cleaning that up, see some of the upcoming patches which I
really have to properly bundle and send.
> That's exactly the point. When I started hacking on things related to
> the cache I experienced the same sort of confusion. The Cache type
> exported by Darcs.Util.Cache is really the set of all relevant (source
> and target) locations of patches, inventories, and pristine files that
> are considered when executing the current command. The actual global
> per-user cache is just one of those locations and the code in
> Darcs.Util.Cache is responsible for handling all the low-level details
> such as the order in which we search these locations when reading and
> where and how to store data when writing. It is also responsible for
> making sure that we *never* depend on the global cache for correct
> operation. Outside of that module, we simply call fetchFileUsingCache
> (to read a hashed file) and writeFileUsingCache (to write them to disk).
It might help if we rename the actual concept to "store" or something
like that. Then a cache is one kind of store and "this repo" is another
kind of store where we have to guarantee the presence of the patch. I
also wonder if our code is robust in cases where cross-filesystem
restrictions prevent hard linking.
> It might help if we rename the actual concept to "store" or something
> like that. Then a cache is one kind of store and "this repo" is another
> kind of store where we have to guarantee the presence of the patch. I
Yes, that would help. Implies a lot of churn, unfortunately, since Cache
is passed around quite a lot.
> also wonder if our code is robust in cases where cross-filesystem
> restrictions prevent hard linking.
Good point. Is this something we can test?
Correction: I wrote that "if [the file] is remote then we download and
that always stores the file somewhere". This is wrong. A remote file
is not always written to disk. Both gzFetchFilePS and fetchFilePS
merely return the content as a ByteString.
On IRC yesterday (and again today, sorry) I was confused, so let me clarify.
What we observe is indeed repository corruption in the sense that after
a pull --reorder, the target repo (our "current" repo) can be in a state
where it doesn't have all the patches referenced by its inventories.
Depending on the way you access this repository, particularly depending
on whether your cache already happens to contain the patch in question,
this will manifest as an error or not, e.g. when cloning the repo, or
when you do a darcs check.
We discovered that 'darcs check' can silently "repair" the repo if the
missing file is in the cache. So a more reliable way to make darcs fail
and detect the error is to use `darcs check --no-cache`.
Here is what I found out today:
The issue cannot be reproduced with the current head. The "fix" (for the scare
quotes see below) can be tracked down to
Author: Ben Franksen <email@example.com>
Date: Mon Nov 9 15:47:50 CET 2020
* improve cache utilization
After adding more debug messages I found out that we *do* call
writeFileUsingCache with the missing patch. The relevant debug output is:
Writing hash file to patches
In fetchFileUsingCachePrivate I'm directly grabbing file contents from
and that's it: writeFileUsingCache does not actually write the file if
it finds it in a 'LocalOnly' source location, more precisely, if the
call to fetchFileUsingCachePrivate LocalOnly ... succeeds. This happens
because in fetchFileUsingCachePrivate's local function ffuc, the first
case matches when we pull from a local repo (so
isValidLocalPath (hashedFilePathReadOnly c subdir f) == True).
So this is (or rather: was) definitely a bug in writeFileUsingCache.
Now, the way the above patch "fixes" writeFileUsingCache is by adding
more side-effects to fetchFileUsingCachePrivate. As Ganesh remarked on
IRC, this is highly questionable! I agree, which is why I am not
considering the changes I made in the above patch as sufficient.
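In the same simplified style as before (invented names and IORef-backed maps, not the actual darcs code), the shape of the old writeFileUsingCache bug was an early return on a successful probe, even when the hit came from a location we do not control:

```haskell
import qualified Data.Map as Map
import Data.IORef

type Store = IORef (Map.Map String String)

-- Probe all locations for the hash; succeeds without writing anything.
probe :: [Store] -> String -> IO Bool
probe locs hash = or <$> mapM (\s -> Map.member hash <$> readIORef s) locs

-- Buggy write: if the probe succeeds anywhere (e.g. in the read-only
-- source repo we are pulling from), skip the write entirely, so our own
-- repo may never receive the file.
writeBuggy :: Store -> [Store] -> String -> String -> IO ()
writeBuggy ourRepo others hash contents = do
  found <- probe (ourRepo : others) hash
  if found
    then return ()  -- early return: nothing guarantees ourRepo has the file
    else modifyIORef ourRepo (Map.insert hash contents)
```

The early return is only sound if the probe itself guarantees the file ends up in a writable location, which it does not.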
> Now, the way the above patch "fixes" writeFileUsingCache is by adding
> more side-effects to fetchFileUsingCachePrivate.
On closer inspection this statement is incorrect. What fixes the bug is
hunk ./src/Darcs/Util/Cache.hs 499
- _ <- fetchFileUsingCachePrivate LocalOnly (Ca cache) subdir hash
+ debugMessage $ "writeFileUsingCache "++hash
+ (fn, _) <- fetchFileUsingCachePrivate LocalOnly (Ca cache)
+ mapM_ (tryLinking fn hash subdir) cache
which is inside writeFileUsingCache.
What happens is that fetchFileUsingCache succeeds with no side effect,
i.e. no hard-linking; we merely get the file content. Then, if the file
exists in the same file system, the hard linking succeeds and we are
done. If we are on a different file system, the hard link fails, which
means we enter the wfuc exception handler, where we do an actual write,
using the *first* writable location i.e. the local repo; afterwards we
again try to link with the cache, which fails, but this is no problem
since it is entirely optional, and thus the failure is silently ignored.
Before the patch, fetchFileUsingCache succeeded in the same way, but
then we returned without trying to link, so we never actually made sure
the file exists in our local repo.
So the way the patch fixes the issue is somewhat roundabout, but it does
*not* rely on side-effects of fetchFileUsingCache.
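The fixed flow described above can be sketched the same way (again with invented names; the sameFs flag is a toy stand-in for whether a hard link can succeed): try to link into the repo, fall back to an actual write when linking fails, and treat failed links into the caches as optional:

```haskell
import qualified Data.Map as Map
import Data.IORef
import Control.Monad (unless)

type Store = IORef (Map.Map String String)

-- Simulated hard link: fails (returns False) when source and target are
-- on "different file systems" in this toy model.
tryLink :: Bool -> String -> String -> Store -> IO Bool
tryLink sameFs hash contents s
  | sameFs    = do modifyIORef s (Map.insert hash contents); return True
  | otherwise = return False

-- Fixed write: try linking into the local repo first; if that fails,
-- fall back to an actual write there. Links into the caches are
-- opportunistic, so their failures are silently ignored.
writeFixed :: Bool -> Store -> [Store] -> String -> String -> IO ()
writeFixed sameFs ourRepo caches hash contents = do
  linked <- tryLink sameFs hash contents ourRepo
  unless linked $ modifyIORef ourRepo (Map.insert hash contents)
  mapM_ (tryLink sameFs hash contents) caches
```

Either way the local repo ends up with the file, which is the invariant the buggy version violated.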
|2022-04-12 12:59:38||bf||set||status: unknown -> has-patch|
|2022-04-12 13:07:51||bf||set||priority: bug|