Patch 1980: split off D.R.Pristine and D.R.Traverse ... (and 4 more)

Title	split off D.R.Pristine and D.R.Traverse ... (and 4 more)
Superseder		Nosy List	bfrk
Related Issues
Status	accepted	Assigned To
Milestone

Created on 2020-02-10.14:17:45 by bfrk, last changed 2020-02-28.12:25:23 by bfrk.

Files
File name	Status	Uploaded	Type	Edit	Remove
in-d_r_traverse_-replace-ifioerror-with-ifdoesnotexisterror.dpatch		bfrk, 2020-02-25.08:03:38	application/x-darcs-patch
pEpkey.asc		bfrk, 2020-02-25.07:46:37	application/pgp-keys
pEpkey.asc		bfrk, 2020-02-26.07:57:11	application/pgp-keys
pEpkey.asc		bfrk, 2020-02-27.23:18:11	application/pgp-keys
pEpkey.asc		bfrk, 2020-02-28.12:25:22	application/pgp-keys
patch-preview.txt		bfrk, 2020-02-10.14:17:44	text/x-darcs-patch
patch-preview.txt		bfrk, 2020-02-25.08:03:38	text/x-darcs-patch
split-off-d_r_pristine-and-d_r_traverse-from-d_r_hashed.dpatch		bfrk, 2020-02-10.14:17:44	application/x-darcs-patch
unnamed		bfrk, 2020-02-10.14:17:44	text/plain
unnamed		bfrk, 2020-02-25.08:03:38	text/plain

See mailing list archives for discussion on individual patches.

Messages
msg21783 (view)	Author: bfrk	Date: 2020-02-10.14:17:45
Over the years Darcs.Repository.Hashed has become a sink for all kinds of Repository code that is only weakly related to the core functionality. In particular, the inventory/patch traversal code that was added to support cleaning of repos and the cache is independent from reading and writing patches. The interfaces are smaller now and refactors become easier. 5 patches for repository http://darcs.net/screened: patch b79753cdb31458f7ae1c763a992f35a940c12742 Author: Ben Franksen <ben.franksen@online.de> Date: Wed Feb 5 09:10:22 CET 2020 * split off D.R.Pristine and D.R.Traverse from D.R.Hashed patch bcd89758f9ada7c68b2c09957271d44d5b1534e3 Author: Ben Franksen <ben.franksen@online.de> Date: Wed Feb 5 09:51:04 CET 2020 * D.R.Traverse: import explicitly from D.R.Inventory patch d42f066b6a9a18e0d90eb957a92220733ae2f647 Author: Ben Franksen <ben.franksen@online.de> Date: Wed Feb 5 12:30:36 CET 2020 * add utility functions handleOnly, handleOnlyIOError, and ifIOError patch 6d97d20f213798d43f7a595591c3e308713de511 Author: Ben Franksen <ben.franksen@online.de> Date: Wed Feb 5 12:15:04 CET 2020 * use ifIOError in D.R.Traverse and remove filterDirContents patch 574949493fbd05fb0645b1ee7cb95505366da84f Author: Ben Franksen <ben.franksen@online.de> Date: Wed Feb 5 13:07:16 CET 2020 * rename some internal functions in D.R.Traverse Attachments patch-preview.txt split-off-d_r_pristine-and-d_r_traverse-from-d_r_hashed.dpatch unnamed
msg21914 (view)	Author: ganesh	Date: 2020-02-24.23:08:52
On 10/02/2020 14:17, Ben Franksen wrote: > patch b79753cdb31458f7ae1c763a992f35a940c12742 > Author: Ben Franksen <ben.franksen@online.de> > Date: Wed Feb 5 09:10:22 CET 2020 > * split off D.R.Pristine and D.R.Traverse from D.R.Hashed OK (I assume this is just moves) > patch bcd89758f9ada7c68b2c09957271d44d5b1534e3 > Author: Ben Franksen <ben.franksen@online.de> > Date: Wed Feb 5 09:51:04 CET 2020 > * D.R.Traverse: import explicitly from D.R.Inventory OK > patch d42f066b6a9a18e0d90eb957a92220733ae2f647 > Author: Ben Franksen <ben.franksen@online.de> > Date: Wed Feb 5 12:30:36 CET 2020 > * add utility functions handleOnly, handleOnlyIOError, and ifIOError OK - there is handleJust etc in Control.Exception already, but using Maybe rather than Bool. > patch 6d97d20f213798d43f7a595591c3e308713de511 > Author: Ben Franksen <ben.franksen@online.de> > Date: Wed Feb 5 12:15:04 CET 2020 > * use ifIOError in D.R.Traverse and remove filterDirContents I'm not too happy about replacing 'doesDirectoryExist'/'doesFileExist' with 'ifIOError'. That'll cover many more errors than just "it doesn't exist". Even though doesFileExist etc are themselves implemented with IO error catching, I am not convinced you'd get the same result for things like permission errors. > patch 574949493fbd05fb0645b1ee7cb95505366da84f > Author: Ben Franksen <ben.franksen@online.de> > Date: Wed Feb 5 13:07:16 CET 2020 > * rename some internal functions in D.R.Traverse OK
msg21916 (view)	Author: bfrk	Date: 2020-02-25.07:46:37
>> * split off D.R.Pristine and D.R.Traverse from D.R.Hashed > > OK (I assume this is just moves) Yes. >> * add utility functions handleOnly, handleOnlyIOError, and ifIOError > > OK - there is handleJust etc in Control.Exception already, but using > Maybe rather than Bool. I tried handleJust but found it too awkward to use. >> patch 6d97d20f213798d43f7a595591c3e308713de511 >> Author: Ben Franksen <ben.franksen@online.de> >> Date: Wed Feb 5 12:15:04 CET 2020 >> * use ifIOError in D.R.Traverse and remove filterDirContents > > I'm not too happy about replacing 'doesDirectoryExist'/'doesFileExist' > with 'ifIOError'. That'll cover many more errors than just "it doesn't > exist". Even though doesFileExist etc are themselves implemented with IO > error catching, I am not convinced you'd get the same result for things > like permission errors. You are right. Will fix, probably adding ifDoesNotExistError and use that instead. Attachments pEpkey.asc
msg21917 (view)	Author: bfrk	Date: 2020-02-25.08:03:38
Following up on review. 1 patch for repository http://darcs.net/screened: patch 7a1839ed4cdc68c3b0db62134f3ff484c1f39896 Author: Ben Franksen <ben.franksen@online.de> Date: Tue Feb 25 08:58:41 CET 2020 * in D.R.Traverse, replace ifIOError with ifDoesNotExistError This restores the behavior before patch 6d97d20f213798d43f7a595 and avoids hiding problems e.g. due to wrong permissions. Attachments in-d_r_traverse_-replace-ifioerror-with-ifdoesnotexisterror.dpatch patch-preview.txt unnamed
msg21923 (view)	Author: ganesh	Date: 2020-02-26.07:07:40
On 25/02/2020 07:46, Ben Franksen wrote: > You are right. Will fix, probably adding ifDoesNotExistError and use > that instead. I should have also commented that I also find using exceptions instead of normal control flow a bit strange, although I don't know of any concrete problems with doing it in Haskell so it's just a matter of a style preference. In many other languages exceptions are often relatively slow. I have no idea of the speed in Haskell but I doubt it's material here. Using exceptions for non-exceptional situations makes debugging harder as a common debugging pattern is to ask the debugger to break the first time an exception is thrown. So "expected" exceptions just get in the way. But since Haskell doesn't have a usable debugger this is somewhat irrelevant too :-)
msg21925 (view)	Author: bfrk	Date: 2020-02-26.07:57:11
> I should have also commented that I also find using exceptions instead > of normal control flow a bit strange, although I don't know of any > concrete problems with doing it in Haskell so it's just a matter of a > style preference. I beg to differ. This is not about style preference, but about reliability. There are two independent questions here. One is whether it is a good idea to use exceptions to handle (or not handle) failures of IO actions, or whether it is better to indicate failure via return value. There is considerable debate on that. However, this one was decided for us by the people designing Haskell & the standard libraries. I agree that the choice is questionable. A completely different question is whether it is a good idea to first check for suitable conditions before running an action to avoid possible failures, or whether it is better to let the action fail and then handle that. There is (astonishingly) an almost universal consensus that the latter is to be preferred. This is because the former method always involves a race condition with outside agents: another process may interfere between the check and the action and invalidate the result of the check before the action is run. This is generally referred to as TOCOU (time of check, time of use). It impedes reliability and can lead to bugs that are very hard to reproduce. We should /always/ let IO actions fail and then handle the failure and /never/ first check for suitable conditions. Attachments pEpkey.asc
msg21927 (view)	Author: ganesh	Date: 2020-02-26.12:42:14
On 26/02/2020 07:57, Ben Franksen wrote: > A completely different question is whether it is a good idea to first > check for suitable conditions before running an action to avoid possible > failures, or whether it is better to let the action fail and then handle > that. There is (astonishingly) an almost universal consensus that the > latter is to be preferred. This is because the former method always > involves a race condition with outside agents: another process may > interfere between the check and the action and invalidate the result of > the check before the action is run. This is generally referred to as > TOCOU (time of check, time of use). It impedes reliability and can lead > to bugs that are very hard to reproduce. > > We should /always/ let IO actions fail and then handle the failure and > /never/ first check for suitable conditions. If we are relying on this property to guarantee atomicity with respect to outside agents, we'd need to check what the library we call actually does. But in practice most IO actions we do are compound ones anyway. Take this example: > -- \| Remove unreferenced files in the patches directory. > cleanPatches :: Repository rt p wR wU wT -> IO () > cleanPatches _ = do > debugMessage "Cleaning out patches..." > hs <- (specialPatches ++) <$> listPatchesLocal PlainLayout darcsdir darcsdir > fs <- ifIOError [] (listDirectory patchesDirPath) > mapM_ (removeFileMayNotExist . (patchesDirPath </>)) (diffHashLists fs hs) Whether or not listDirectory is atomic, removing each unwanted patch file one by one certainly isn't. Yes, removeFileMayNotExist is itself atomic, but by the time we get round to running it, hs might no longer be valid if someone changed the repo, so we'd be doing the wrong thing anyway. Anyway, I'm ok with the current code, even if I don't think it buys us anything.
msg21938 (view)	Author: bfrk	Date: 2020-02-27.23:18:11
>> A completely different question is whether it is a good idea to first >> check for suitable conditions before running an action to avoid possible >> failures, or whether it is better to let the action fail and then handle >> that.[..] >> We should /always/ let IO actions fail and then handle the failure and >> /never/ first check for suitable conditions. > > If we are relying on this property to guarantee atomicity with respect > to outside agents, we'd need to check what the library we call actually > does. Why? The action can either succeed or fail. If it succeeds, fine. If not, we catch the exception. This is 100% reliable by definition, regardless of whether the action we perform is atomic or not. > But in practice most IO actions we do are compound ones anyway. > Take this example: > >> -- \| Remove unreferenced files in the patches directory. >> cleanPatches :: Repository rt p wR wU wT -> IO () >> cleanPatches _ = do >> debugMessage "Cleaning out patches..." >> hs <- (specialPatches ++) <$> listPatchesLocal PlainLayout darcsdir darcsdir >> fs <- ifIOError [] (listDirectory patchesDirPath) >> mapM_ (removeFileMayNotExist . (patchesDirPath </>)) (diffHashLists fs hs) > > Whether or not listDirectory is atomic, removing each unwanted patch > file one by one certainly isn't. Yes, removeFileMayNotExist is itself > atomic, but by the time we get round to running it, hs might no longer > be valid if someone changed the repo, so we'd be doing the wrong thing > anyway. You are missing the point. This code naturally runs under withRepoLock. So we know that the files we got from listPatchesLocal are no longer needed because no other /darcs/ command can change the state. So removing them is safe. There is really nothing that can go wrong. In contrast, if we check if files exist and only then remove them, but without catching possible exceptions, then there is a small chance that the action still fails because some outside process deletes the file before we can. But now we aren't prepared for that and darcs will crash with an unhandled exception. You might be tempted to reply that this is better because it doesn't hide that we have a problem with an outside agent that interferes with our repo. That's a design decision and I'd be okay with that, but then why bother with the check in the first place? If we were doing these things in C, we could ignore the return value of the system call. This would be okay: if we cannot remove the file here, this is not a problem. But we are in Haskell and therefore stuck with IO actions throwing exceptions when something goes wrong. If we don't handle them then the program may crash, even if the error is not fatal and we would normally want to continue. Attachments pEpkey.asc
msg21942 (view)	Author: ganesh	Date: 2020-02-28.07:37:35
On 27/02/2020 23:18, Ben Franksen wrote: > > Ben Franksen <ben.franksen@online.de> added the comment: > >>> A completely different question is whether it is a good idea to first >>> check for suitable conditions before running an action to avoid possible >>> failures, or whether it is better to let the action fail and then handle >>> that.[..] >>> We should /always/ let IO actions fail and then handle the failure and >>> /never/ first check for suitable conditions. >> >> If we are relying on this property to guarantee atomicity with respect >> to outside agents, we'd need to check what the library we call actually >> does. > > Why? The action can either succeed or fail. If it succeeds, fine. If > not, we catch the exception. This is 100% reliable by definition, > regardless of whether the action we perform is atomic or not. What happens if the library is internally (pseudocode): exists <- doesExist d if exists then read d else throw (doesNotExistError d) Then we're back where we started anyway. > You are missing the point. This code naturally runs under withRepoLock. > So we know that the files we got from listPatchesLocal are no longer > needed because no other /darcs/ command can change the state. So > removing them is safe. There is really nothing that can go wrong. The same outside agent that is randomly deleting things without a care for our lock could be randomly editing inventories :-) > In contrast, if we check if files exist and only then remove them, but > without catching possible exceptions, then there is a small chance that > the action still fails because some outside process deletes the file > before we can. But now we aren't prepared for that and darcs will crash > with an unhandled exception. You might be tempted to reply that this is > better because it doesn't hide that we have a problem with an outside > agent that interferes with our repo. That's a design decision and I'd be > okay with that, but then why bother with the check in the first place? If it's already a violation of the repository invariants for the directory not to be there, I'd be fine with us just failing all the time. (In that case my style argument also doesn't apply, as it's not an expected case.)
msg21951 (view)	Author: bfrk	Date: 2020-02-28.12:25:22
>>>> We should /always/ let IO actions fail and then handle the failure and >>>> /never/ first check for suitable conditions. >>> >>> If we are relying on this property to guarantee atomicity with respect >>> to outside agents, we'd need to check what the library we call actually >>> does. >> >> Why? The action can either succeed or fail. If it succeeds, fine. If >> not, we catch the exception. This is 100% reliable by definition, >> regardless of whether the action we perform is atomic or not. > > What happens if the library is internally (pseudocode): > > exists <- doesExist d > if exists then read d else throw (doesNotExistError d) > > Then we're back where we started anyway. Sigh. No we're not. We catch the exception and all is well. We have a problem only if we do /not/ catch the exception, proving my point. (BTW, if I make a logical argument as I did above (by simple case distinction) then you are supposed to either accept that I am right or show me where I made a logical mistake or tell me where my assumptions are invalid.) Of course we must rely on libraries to fulfill their contract. If the docs say that removeFileMayNotExist does not throw an exception if the file doesn't exist, then it should really do so. Implementing it by first testing if it exists and only then removing it would be a bug. But if it does not claim that and we don't catch the exception then this is our bug. >> You are missing the point. This code naturally runs under withRepoLock. >> So we know that the files we got from listPatchesLocal are no longer >> needed because no other /darcs/ command can change the state. So >> removing them is safe. There is really nothing that can go wrong. > > The same outside agent that is randomly deleting things without a care > for our lock could be randomly editing inventories :-) But then the inventories are broken anyway. There is nothing we can do about that. Here is a more likely scenario: suppose we try to optimise 'optimize clean' by running multiple threads in parallel. Then the variant with the TOCTOU would be prone to random failures whereas the one that catches the exception works fine. We actually had exactly such problems with packs IIRC. >> In contrast, if we check if files exist and only then remove them, but >> without catching possible exceptions, then there is a small chance that >> the action still fails because some outside process deletes the file >> before we can. But now we aren't prepared for that and darcs will crash >> with an unhandled exception. You might be tempted to reply that this is >> better because it doesn't hide that we have a problem with an outside >> agent that interferes with our repo. That's a design decision and I'd be >> okay with that, but then why bother with the check in the first place? > > If it's already a violation of the repository invariants for the > directory not to be there, I'd be fine with us just failing all the > time. I agree. But single patch files could rightfully not be there if the repo is lazy. Attachments pEpkey.asc

History
Date	User	Action	Args
2020-02-10 14:17:45	bfrk	create
2020-02-10 14:21:25	bfrk	set	status: needs-screening -> needs-review
2020-02-24 23:08:53	ganesh	set	messages: + msg21914
2020-02-25 07:46:37	bfrk	set	files: + pEpkey.asc messages: + msg21916
2020-02-25 08:03:38	bfrk	set	files: + patch-preview.txt, in-d_r_traverse_-replace-ifioerror-with-ifdoesnotexisterror.dpatch, unnamed messages: + msg21917
2020-02-26 07:07:41	ganesh	set	messages: + msg21923
2020-02-26 07:07:48	ganesh	set	status: needs-review -> accepted-pending-tests
2020-02-26 07:57:12	bfrk	set	files: + pEpkey.asc messages: + msg21925
2020-02-26 12:42:15	ganesh	set	messages: + msg21927
2020-02-27 20:29:12	ganesh	set	status: accepted-pending-tests -> accepted
2020-02-27 23:18:12	bfrk	set	files: + pEpkey.asc messages: + msg21938
2020-02-28 07:37:36	ganesh	set	messages: + msg21942
2020-02-28 12:25:23	bfrk	set	files: + pEpkey.asc messages: + msg21951

Patch 1980 split off D.R.Pristine and D.R.Traverse ... (and 4 more)

Attachments

Attachments

Attachments

Attachments

Attachments

Attachments