Issue 2677: pull --reorder fails to copy patches - Darcs bug tracker

Title	pull --reorder fails to copy patches
Priority	bug	Status	has-patch
Milestone		Resolved in
Superseder		Nosy List	bfrk
Assigned To		Topics

Created on 2021-05-25.09:15:08 by bfrk, last changed 2022-04-12.13:07:51 by bfrk.

Messages
msg22815 (view)	Author: bfrk	Date: 2021-05-25.09:15:04
Here is how to reproduce: > darcs clone http://darcs.net/screened > darcs clone http://darcs.net/releases/branch-2.16 > rm ~/.cache/darcs/patches/8f/0000003658- 8f452a132bf0b0e17f177c51ae6aefb0fb29fbb8e42506710d6bd08589336d6d > cd screened > darcs pull ../branch-2.16 --reorder-patches > darcs check darcs: failed to read patch: patch 0c003b272def1c5c65cdc6827905acc2c84d5429 Author: Ben Franksen <ben.franksen@online.de> Date: Thu Oct 29 15:01:41 CET 2020 * simplify the logic that handles existence and validity of the index Couldn't fetch 0000003658- 8f452a132bf0b0e17f177c51ae6aefb0fb29fbb8e42506710d6bd08589336d6d I will try to reproduce this now with a simple test repo with the aim of creating a regression test.
msg22816 (view)	Author: bfrk	Date: 2021-05-25.12:15:44
So far I wasn't able to reproduce with a simple example repo. However, I think I found the bug. I ran the pull --reorder command with --debug and searched for the file name in question. The only ocurrence was when it said: In fetchFileUsingCachePrivate I'm directly grabbing file contents from /home/ben/scratch/branch-2.16/_darcs/patches/0000003658- 8f452a132bf0b0e17f177c51ae6aefb0fb29fbb8e42506710d6bd08589336d6d Looking at the place where this message is issued in Darcs.Util.Cache.fetchFileUsingCachePrivate, there is a comment saying -- FIXME: create links in caches Indeed, when I add a the missing call to link the file across all writable locations, the bug disappears, which can also be seen in the debug output.
msg22817 (view)	Author: ganesh	Date: 2021-05-25.15:43:40
I keep getting confused about why something related to a "cache" matters for correctness. I took a look at the code quickly but I still don't really get what is happening here :-( I guess the high-level point is that even the current repository is considered a cache and therefore failing to do the "write to all caches" step means we leave it out from the repo itself?
msg22821 (view)	Author: bfrk	Date: 2021-05-26.08:27:42
> I keep getting confused about why something related to a "cache" matters > for correctness. > I took a look at the code quickly but I still don't really > get what is happening here :-( I guess the high-level point is that even > the current repository is considered a cache and therefore failing to > do the "write to all caches" step means we leave it out from the repo itself? That's exactly the point. When I started hacking on things related to the cache I experienced the same sort of confusion. The Cache type exported by Darcs.Util.Cache is really the set of all relevant (source and target) locations of patches, inventories, and pristine files that are considered when executing the current command. The actual global per-user cache is just one of those locations and the code in Darcs.Util.Cache is responsible for handling all the low-level details such as the order in which we search these locations when reading and where and how to store data when writing. It is also responsible for making sure that we never depend on the global cache for correct operation. Outside of that module, we simply call fetchFileUsingCache (to read a hashed file) and writeFileUsingCache (to write them to disk). Since fetchFileUsingCache is supposed to transparently handle file download, too, it already has to store those downloads somewhere, so it chooses one of the writable locations in the Cache for that. (We always should have one of those, because the "current" repo is always assumed to writable) and while it does so it also hardlinks that file to all the other writable locations. What happened here is that when fetchFileUsingCache first finds the file in a non-writable location, it failed to store it and just returned the content. This is a corner case that could happen only if the other repo is a valid local path (if it is remote then we download and that always stores the file somewhere) and for some reason the file is not already present in the global cache nor our local repo. Unfortunately the code in Darcs.Util.Cache somewhat over-uses exception handling which makes it hard to see what is going on. I am slowly, piece by piece, cleaning that up, see some of the upcoming patches which I really have to properly bundle and send.
msg22822 (view)	Author: ganesh	Date: 2021-05-26.11:15:10
> That's exactly the point. When I started hacking on things related to > the cache I experienced the same sort of confusion. The Cache type > exported by Darcs.Util.Cache is really the set of all relevant (source > and target) locations of patches, inventories, and pristine files that > are considered when executing the current command. The actual global > per-user cache is just one of those locations and the code in > Darcs.Util.Cache is responsible for handling all the low-level details > such as the order in which we search these locations when reading and > where and how to store data when writing. It is also responsible for > making sure that we never depend on the global cache for correct > operation. Outside of that module, we simply call fetchFileUsingCache > (to read a hashed file) and writeFileUsingCache (to write them to disk). It might help if we rename the actual concept to "store" or something like that. Then a cache is one kind of store and "this repo" is another kind of store where we have to guarantee the presence of the patch. I also wonder if our code is robust in cases where cross-filesystem restrictions prevent hard linking.
msg22823 (view)	Author: bfrk	Date: 2021-05-26.13:06:05
> It might help if we rename the actual concept to "store" or something > like that. Then a cache is one kind of store and "this repo" is another > kind of store where we have to guarantee the presence of the patch. I Yes, that would help. Implies a lot of churn, unfortunately, since Cache is passed around quite a lot. > also wonder if our code is robust in cases where cross-filesystem > restrictions prevent hard linking. Giod point. Is this something we can test?
msg22824 (view)	Author: bfrk	Date: 2021-05-28.13:46:53
Correction: I wrote that "if [the file] is remote then we download and that always stores the file somewhere". This is wrong. A remote file is not always written to disk. Noth gzFetchFilePS and fetchFilePS merely return the content as a ByteString.
msg22825 (view)	Author: bfrk	Date: 2021-05-29.14:13:00
On IRC yesterday (and again today, sorry) I was confused, so let me clarify. What we observe is indeed repository corruption in the sense that after a pull --reorder, the target repo (our "current" repo) can be in a state where it doesn't have all patches references by its inventories. Depending on the way you access this repository, particularly depending on whether your cache already happens to contain the patch in question, this will manifest as an error or not, e.g. when cloning the repo, or when you do a darcs check. We discovered that 'darcs check' can silently "repair" the repo if the missing file is in the cache. So a more reliable way to detect the error make darcs fail is to use `darcs check --no-cache`. Here is what I found out today: The issue cannot be reproduced with the head. The "fix" (for the scare quotes see below) can be tracked down to patch 018c5978374ec2cb3ae4f918d733da948bda73da Author: Ben Franksen <ben.franksen@online.de> Date: Mon Nov 9 15:47:50 CET 2020 * improve cache utilization After adding more debug messages I found out that we do call writeFileUsingCache with the missing patch. The relevant debug output is: Writing hash file to patches writeFileUsingCache:0000003658-8f452a132bf0b0e17f177c51ae6aefb0fb29fbb8e42506710d6bd08589336d6d In fetchFileUsingCachePrivate I'm directly grabbing file contents from /home/ben/scratch/branch-2.16/_darcs/patches/0000003658-8f452a132bf0b0e17f177c51ae6aefb0fb29fbb8e42506710d6bd08589336d6d and that's it: writeFileUsingCache does not actually write the file if it finds it in a 'LocalOnly' source location, more precisely, if the call to fetchFileUsingCachePrivate LocalOnly ... succeeds. This is the case because in fetchFileUsingCachePrivate, local function ffuc, the first case matches because we pull from a local repo (so isValidLocalPath (hashedFilePathReadOnly c subdir f) == True). So this is (or rather: was) definitely a bug in writeFileUsingCache. Now, the way the above patch "fixes" writeFileUsingCache is by adding more side-effects to fetchFileUsingCachePrivate. As Ganesh remarked on IRC, this is highly questionable! I agree, which is why I am not considering the changes I made in the above patch as sufficient.
msg22826 (view)	Author: bfrk	Date: 2021-05-30.18:35:34
> Now, the way the above patch "fixes" writeFileUsingCache is by adding > more side-effects to fetchFileUsingCachePrivate. On closer inspection this statement is incorrect. What fixes the bug is this hunk hunk ./src/Darcs/Util/Cache.hs 499 - _ <- fetchFileUsingCachePrivate LocalOnly (Ca cache) subdir hash + debugMessage $ "writeFileUsingCache "++hash + (fn, _) <- fetchFileUsingCachePrivate LocalOnly (Ca cache) subdir hash + mapM_ (tryLinking fn hash subdir) cache which is inside writeFileUsingCache. What happens is that fetchFileUsingCache succeeds with no side-effect i.e. no hard-linking; we merely get the file content. Then, if the file exists in the same file system, the hard linking succeeds and we are done. If we are on a different file system, the hard link fails, which means we enter the wfuc exception handler, where we do an actual write, using the first writable location i.e. the local repo; afterwards we again try to link with the cache, which fails, but this is no problem since it is entirely optional, and thus the failure is silently ignored. Before the patch, fetchFileUsingCache succeeded in the same way, but then we returned without trying to link, so we never actually made sure the file exists in our local repo. So the way the patch fixes the issue is somewhat roundabout, but it does not rely on side-effects of fetchFileUsingCache.

History
Date	User	Action	Args
2021-05-25 09:15:08	bfrk	create
2021-05-25 12:15:47	bfrk	set	messages: + msg22816
2021-05-25 15:43:43	ganesh	set	messages: + msg22817
2021-05-26 08:27:46	bfrk	set	messages: + msg22821
2021-05-26 11:15:12	ganesh	set	messages: + msg22822
2021-05-26 13:06:08	bfrk	set	messages: + msg22823
2021-05-28 13:46:56	bfrk	set	messages: + msg22824
2021-05-29 14:13:07	bfrk	set	messages: + msg22825
2021-05-30 18:35:40	bfrk	set	messages: + msg22826
2021-06-14 20:37:44	ganesh	set	messages: + msg22880
2021-06-14 21:02:30	ganesh	set	messages: - msg22880
2022-04-12 12:59:38	bfrk	set	status: unknown -> has-patch
2022-04-12 13:07:51	bfrk	set	priority: bug