Created on 2010-07-22.06:12:24 by abuiles, last changed 2011-05-10.22:07:23 by darcswatch. Tracked on DarcsWatch.
File name                                                      | Uploaded                     | Type
-resolve-issue-1599_-automatically-expire-unused-caches.dpatch | abuiles, 2010-07-28.13:47:46 | text/x-darcs-patch
resolve-issue-1599_-automatically-expire-unused-caches.dpatch  | abuiles, 2010-07-22.06:12:24 | text/x-darcs-patch
resolve-issue-1599_-automatically-expire-unused-caches.dpatch  | abuiles, 2010-07-28.20:55:19 | text/x-darcs-patch
resolve-issue-1599_-automatically-expire-unused-caches.dpatch  | abuiles, 2010-08-03.14:32:49 | text/x-darcs-patch
resolve-issue-1599_-automatically-expire-unused-caches.dpatch  | abuiles, 2010-08-06.06:41:02 | text/x-darcs-patch
resolve-issue-1599_-automatically-expire-unused-caches.dpatch  | abuiles, 2010-08-07.00:41:37 | text/x-darcs-patch
unnamed                                                        | abuiles, 2010-07-22.06:12:24 |
unnamed                                                        | abuiles, 2010-07-28.13:47:46 |
unnamed                                                        | abuiles, 2010-07-28.20:55:19 |
unnamed                                                        | abuiles, 2010-08-03.14:32:49 |
unnamed                                                        | abuiles, 2010-08-06.06:41:02 |
unnamed                                                        | abuiles, 2010-08-07.00:41:37 |
See mailing list archives for discussion on individual patches.
msg11815 | Author: abuiles | Date: 2010-07-22.06:12:24
1 patch for repository http://darcs.net:
Wed Jul 21 22:29:45 COT 2010 builes.adolfo@googlemail.com
* Resolve issue 1599: automatically expire unused caches
Attachments
msg11825 | Author: kowey | Date: 2010-07-22.15:47:56
On Thu, Jul 22, 2010 at 06:12:24 +0000, Adolfo Builes wrote:
> Wed Jul 21 22:29:45 COT 2010 builes.adolfo@googlemail.com
> * Resolve issue 1599: automatically expire unused caches
Resolve issue 1599: automatically expire unused caches
------------------------------------------------------
> builes.adolfo@googlemail.com**20100722032945
> Ignore-this: 38455634b5648ed7555380336ea74464
> ] hunk ./src/Darcs/Global.hs 30
> whenDebugMode, withDebugMode, setDebugMode,
> debugMessage, debugFail, putTiming,
> addCRCWarning, getCRCWarnings, resetCRCWarnings,
> - darcsdir
> + addBadCache, getBadCacheList, isBadCache, darcsdir,
> + isTrustedHttp, addTrustedHttp
> hunk ./src/Darcs/Global.hs 73
> Just actions <- swapMVar atexitActions Nothing
> -- from now on atexit will not register new actions
> mapM_ runAction actions
> + badCaches <- getBadCacheList
> + when ( not $ null badCaches ) $ reportBadCaches badCaches
OK, so after running all of the atExitActions, we print out our
accumulated warnings about caches (should we worry about what
happens if one of the atexit actions fails)?
> -
> + reportBadCaches caches = do
> + if length caches > 1
> + then
> + hPutStrLn stderr "\nI could not reach the following repositories (listed in _darcs/prefs/sources):"
> + else
> + hPutStrLn stderr "\nI could not reach the following repository (listed in _darcs/prefs/sources):"
For what it's worth, this could be more simply expressed using the
English module. It lets you write something like
"I could not reach the following"
++ englishNum (length caches) (Noun "repository") "(listed in ...)"
It looks like you'd have to modify Noun to account for this.
I'll submit a patch.
It may also be a good idea to give the user some advice about what
to do here.
> + mapM_ (hPutStrLn stderr) caches
> + hPutStrLn stderr ""
Alternatively, hPutStrLn (unlines caches ++ "\n")
> +type BadCacheList = [String]
> +{- NOINLINE _badCacheList -}
> +_badCacheList :: IORef BadCacheList
> +_badCacheList = unsafePerformIO $ newIORef []
> +
> +addBadCache :: String -> IO ()
> +addBadCache cache = modifyIORef _badCacheList (cache:)
> +
> +getBadCacheList :: IO [String]
> +getBadCacheList = readIORef _badCacheList
> +
> +{- NOINLINE isBadCache -}
> +isBadCache :: String -> Bool
> +isBadCache cache = unsafePerformIO $ do badCaches <- getBadCacheList
> + return (cache `elem` badCaches)
Is it really necessary for this to be String -> Bool?
Why not String -> IO Bool?
Alternatively, why not IO (String -> Bool)?
I'm also a bit sceptical about the need for a type synonym.
Maybe just passing around [String] is clearer here.
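For illustration, the String -> IO Bool alternative might look like this minimal, self-contained sketch; the locally created IORef is a stand-in for Darcs's global _badCacheList, so names and structure here are assumptions, not the patch's actual code:

```haskell
import Data.IORef (newIORef, readIORef, modifyIORef)

main :: IO ()
main = do
  -- Stand-in for Darcs's global _badCacheList.
  badCaches <- newIORef ([] :: [String])
  modifyIORef badCaches ("http://dead.example" :)
  -- The check stays inside IO, so no unsafePerformIO is needed.
  let isBadCache cache = fmap (cache `elem`) (readIORef badCaches)
  bad <- isBadCache "http://dead.example"
  print bad
```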
> +type TrustedHttps = [String]
> +{- NOINLINE _trustedHttpList -}
> +_trustedHttpList :: IORef TrustedHttps
> +_trustedHttpList = unsafePerformIO $ newIORef []
So you maintain a list of URLs which you managed to reach the
first time you contacted them this session.
'Trust' is probably not the right notion to use here (since I'd
be thinking more in terms of security or something like that).
How about "reachable" instead?
> +addTrustedHttp :: String -> IO ()
> +addTrustedHttp http = modifyIORef _trustedHttpList (http:)
> +
> +getTrustedHttps :: IO TrustedHttps
> +getTrustedHttps = readIORef _trustedHttpList
> +
> +{- NOINLINE isTrustedHttp -}
> +isTrustedHttp :: String -> Bool
> +isTrustedHttp http = unsafePerformIO $ do trustedHttps <- getTrustedHttps
> + return (http `elem` trustedHttps)
Same complaint about possibly unnecessary unsafePerformIO. I mean, we
certainly use it to achieve tracking of some global information in
Darcs, but it's something we should be somewhat reluctant to do lest
Darcs surprise us in the future.
> +-- | Returns the sources of a given cache.
> +cacheSource :: CacheLoc -> String
> +cacheSource (Cache _ _ s) = s
Potentially useful helper function. Seems like we could alternatively
modify the definition of Cache to use the curly brace syntax.
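For comparison, curly-brace (record) syntax would make cacheSource an accessor for free. A sketch with placeholder constructor and field names; the real CacheType and WritableOrNot live in Darcs's cache module, so the stand-ins below are assumptions:

```haskell
-- Placeholder types standing in for Darcs's CacheType and WritableOrNot.
data CacheType = Repo | Directory deriving Show
data WritableOrNot = Writable | NotWritable deriving Show

-- Record syntax generates the cacheSource accessor automatically,
-- replacing the hand-written helper in the patch.
data CacheLoc = Cache { cacheType     :: CacheType
                      , cacheWritable :: WritableOrNot
                      , cacheSource   :: String
                      } deriving Show

main :: IO ()
main = putStrLn (cacheSource (Cache Repo NotWritable "http://darcs.net"))
```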
> fetchFileUsingCache :: Cache -> HashedDir -> String -> IO (String, B.ByteString)
> -fetchFileUsingCache = fetchFileUsingCachePrivate Anywhere
> +fetchFileUsingCache c h s = do result <- fetchFileUsingCachePrivate Anywhere c h s
> + return result
This change does not seem necessary. What was wrong with the
eta-reduced version of this code?
> sfuc [] _ = return ()
> - sfuc (c:cs) out | not $ writable c =
> + sfuc (c:cs) out | (not $ badCache c) && (not $ writable c) =
> - then speculateFileOrUrl (fn c) out
> + then speculateFileOrUrl (fn c) out `catchNonSignal` (\e -> do
> + addToBadCaches (show e) c)
> - else copyFileOrUrl [] (fn c) out Cachable
> + else copyFileOrUrl [] (fn c) out Cachable `catchNonSignal` (\e -> do
> + addToBadCaches (show e) c)
OK, so if something goes wrong when speculating on or copying a file,
(but it's not just the user hitting ^-c), we note the cache down as
being problematic.
>
> +-- | Checks if a given cache needs to be added to the list of bad caches.
> +-- It receives an error caught during execution and the cache.
> +-- For a local cache, if the given source doesn't exist anymore, it is added.
> +-- For HTTP sources if the error is timeout, it is added, if not we check if the source
> +-- still exist, if doesn't exist we added to the list of bad caches.
> +-- For SSH we check if the server is reachable.
> +-- The entries which get added to the cache are no longer tried for the rest of the command.
> +addToBadCaches :: String -> CacheLoc -> IO ()
Thanks for haddocking this.
It seems like addToBadCaches may be a bit of a misnomer: first, because
this function primarily tests whether a cache is reachable, and second,
because it adds the cache entry to either the whitelist or the
blacklist. But I'm not sure; maybe I misunderstand the code. After all,
you do only call it when something goes wrong...
Style point:
The code below is very heavily indented, which makes it harder to read
and review without a wide screen or window. It's a tricky trade-off
between too much and not enough, so I don't really have any answers.
Perhaps this style could help here
addToBadCaches e cache
| condition 1 =
...
| condition 2 =
...
Note the outdented guards, and that each case starts on a new line.
> +addToBadCaches e cache | isFile (cacheSource cache) = case "No such file or directory" `isInfixOf` (map toLower e) of
> + True -> do
> + exist <- doesDirectoryExist $ cacheSource $ cache
> + when (not exist) $ addBadCache . cacheSource $ cache
> + False -> return ()
1. I'm not sure what you gain from pattern-matching on the
Bool when you could just use an if-then-else.
2. Do we not have any exception types to work with here? Matching
on the string seems really fragile.
3. Case in point, you're matching "Foo" against (map toLower "foo")
which means that this code path is never visited.
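Point 3 is easy to demonstrate: the needle keeps its capital letters while the haystack is lower-cased, so the match can never succeed. Lower-casing both sides is one possible fix; this is a sketch, not the patch's actual code:

```haskell
import Data.Char (toLower)
import Data.List (isInfixOf)

-- Lower-case both needle and haystack so capitalisation cannot
-- silently defeat the match.
mentionsNoSuchFile :: String -> Bool
mentionsNoSuchFile e =
  map toLower "No such file or directory" `isInfixOf` map toLower e

main :: IO ()
main = do
  -- The patch's original test: a capitalised needle against a
  -- lower-cased haystack never matches.
  print ("No such file or directory"
           `isInfixOf` map toLower "No such file or directory")
  -- The fixed predicate matches regardless of case.
  print (mentionsNoSuchFile "foo: No Such File Or Directory")
```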
> + | isUrl (cacheSource cache) = case "timeout" `isInfixOf` (map toLower e) of
> + True -> addBadCache . cacheSource $ cache
> + False -> case isTrustedHttp (cacheSource cache) of
> + True -> return ()
> + False -> do
> + let url = if last (cacheSource cache) == '/'
> + then cacheSource cache
> + else (cacheSource cache) ++ "/"
> + rsp <- simpleHTTP (getRequest url )
> + do case rsp of
> + Left _ -> return ()
> + Right response -> if rspCode response /= (2,0,0)
> + then addBadCache . cacheSource $ cache
> + else addTrustedHttp (cacheSource cache) >> return ()
OK earlier we see that this function is called if we receive some
exception when fetching over HTTP.
If the exception is not a timeout, we test if the URL is reachable and
whitelist it if so (the whitelist avoids future testing). But why do we
do this? And doesn't that confuse Darcs into thinking it successfully
fetched a file when it did not? Have you tested the actual case where
you get an exception but not a timeout?
Also, are there not web server configurations where
http://example.com/foo gives you a 404, but
http://example.com/foo/_darcs/foo gives you a result? What happens in
those cases? In other words, does the test of trying to fetch
http://example.com/foo/ actually make sense?
Also, why is it important to normalise the URL to have a trailing slash?
Is it just to avoid duplicate entries in the whitelist?
Code tidiness:
1. It could be useful to look into reducing the amount of indentation
you use. It's a tricky trade-off between too much and not enough,
so I don't really have any answers. Perhaps this style could help here
addToBadCaches e cache
| condition 1 =
...
| condition 2 =
...
It's all about making the code as easy to read as possible.
2. It seems like you should just name a source = cacheSource cache,
which I think will make the code a lot simpler looking
3. Why does addTrustedHttp need to be followed by return () if it already
returns ()?
> + | isSsh (cacheSource cache) = checkSshError (cacheSource cache) (getHttpUrl (cacheSource cache))
One principle I saw off some blog post somewhere is that when you have
a single unit of code, the code in that unit should tend to reside on
the same level of abstraction.
So a function like
if foo
then do low-level-A1
low-level-A2
low-level-A3
else high-level-B
is a bit odd, when you could instead write something like
if foo
then do low-level-A1
low-level-A2
low-level-A3
else do low-level-B1
low-level-B2
OR alternatively
if foo
then do high-level-A
else do high-level-B
But not a mix. I think why this is useful for readability is that it
makes it very clear that A and B are on the same sort of level of
operation. Anyway, don't take this sort of advice /too/ seriously.
It's one of these fuzzy things that I suspect you develop a feel for
over time.
In this particular case, you've chosen to break checkSshError into
a separate function, but not the HTTP stuff. Why?
> + | otherwise = fail $ "unknown transport protocol for: " ++ (cacheSource cache)
> +-- | Helper function to check reachability of a ssh source.
> +checkSshError :: String -> String -> IO ()
> +checkSshError source url = do
> + simpleHTTP (getRequest url )
> + `catchNonSignal` (\e -> do
> + case "timeout" `isInfixOf` (map toLower (show e)) of
> + True -> (addBadCache source) >> return (Left (ErrorMisc "timeout"))
> + False -> return () >> return (Left (ErrorMisc "Unkown error")))
> + return ()
> +-- | Given a SSH source it returns the server address appended with HTTP://
> +-- i.e: getHttpUrl "user@darcs.net:repo" returns "http://darcs.net"
> +getHttpUrl :: String -> String
> +getHttpUrl source = "http://" ++ ((\(_:xs) -> takeWhile (/=':') xs) . dropWhile (/='@') $ source)
Why does checkSshError take both the source and url argument? Is there
a case where url cannot be systematically derived from the source?
What happens to the Either type that you return in your handler?
Doesn't it get discarded?
Why are we going about things in this roundabout way of checking the SSH
path by converting the source name to an HTTP URI and fetching it over
HTTP? What about cases where people have a working SSH server but not
necessarily an HTTP server on a given machine?
> where ffuc (c:cs)
> - | not (writable c) && (Anywhere == fromWhere || isFile (fn c)) =
> + | not (writable c) && (Anywhere == fromWhere || isFile (fn c)) && (not $ badCache c) =
This seems ok although incidentally,
not (badCache c)
is probably simpler than
(not $ badCache c)
> - `catchall` ffuc cs
> + `catchNonSignal` (\e -> do
> + addToBadCaches (show e) c
> + ffuc cs )
> - `catchall` do (fname,x) <- ffuc cs
> - do createCache c subdir
> - createLink fname (fn c)
> - return (fn c, x)
> - `catchall`
> - do gzWriteFilePS (fn c) x `catchall` return ()
> - return (fname,x)
> + `catchNonSignal` (\ e -> do addToBadCaches (show e) c ; (fname,x) <- ffuc cs
> + do createCache c subdir
> + createLink fname (fn c)
> + return (fn c, x)
> + `catchall`
> + do gzWriteFilePS (fn c) x `catchall` return ()
> + return (fname,x))
This block of code actually has very few changes in it, but the fact
would have been more apparent if you tried to avoid the re-indentation
(not always possible).
In this case, you could have
- `catchall` do (fname,x) <- ffuc cs
- do createCache c subdir
- createLink fname (fn c)
...
+ `catchNonSignal` (\ e ->
+ do addToBadCaches (show e) c
+ (fname,x) <- ffuc cs
+ do createCache c subdir
I also took the liberty of simplifying away the semicolon.
> +-- | Checks if a given cache is in the list of bad caches.
> +badCache :: CacheLoc -> Bool
> +badCache = isBadCache . cacheSource
Do you really use this?
> import Data.Maybe ( isJust, catMaybes )
> import Control.Monad ( msum )
> - dff <- fetchFilePS (repo ++ "/" ++ df) Cachable `catchall` return B.empty
> + (timeOut,dff) <- (fetchFilePS (repo ++ "/" ++ df) Cachable >>= \ r -> return (False,r))
> + `catchNonSignal` (\ e -> return (isTimeOut e,B.empty))
This may be more readable using do notation
(timeOut,dff) <- do r <- fetchFilePS (repo ++ "/" ++ df) Cachable
return (False,r)
`catchNonSignal` (\ e -> return (isTimeOut e,B.empty))
The same sort of worry as above. What happens if your exception is not
a timeout? I'm concerned that you succeed here, where what you really
want to do maybe is re-throw the exception.
> -- below is a workaround for servers that don't return a 404 on nonexistent files
> hunk ./src/Darcs/Repository/Format.hs 57
> - rf <- if B.null dff || isJust (BC.elemIndex '<' dff)
[snip]
> - else return $ Right $ parseRepoFormat dff
> + rf <- case timeOut of
> + False ->
> + if B.null dff || isJust (BC.elemIndex '<' dff)
[snip indentation change]
> + else return $ Right $ parseRepoFormat dff
> + True -> return $ Left $ "Failed establishing a connection with "++ repo ++ " Timeout was reached"
> (\e -> return (Left (prettyException e)))
> + isTimeOut e = "timeout" `isInfixOf` map toLower (show e)
It seems like the notion of whether something is a timeout or not (you
check this by looking at the exception text) could be refactored.
I guess it's not a big deal if it is though.
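A shared predicate along the lines suggested might look like the sketch below; note that it still matches on the exception text, which (as discussed above) remains fragile:

```haskell
import Control.Exception (SomeException, toException, ErrorCall (..))
import Data.Char (toLower)
import Data.List (isInfixOf)

-- One shared predicate instead of repeating the string match at
-- each call site.
isTimeOut :: SomeException -> Bool
isTimeOut e = "timeout" `isInfixOf` map toLower (show e)

main :: IO ()
main = do
  print (isTimeOut (toException (ErrorCall "Timeout was reached")))
  print (isTimeOut (toException (ErrorCall "404 not found")))
```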
--
Eric Kow <http://www.nltg.brighton.ac.uk/home/Eric.Kow>
For a faster response, please try +44 (0)1273 64 2905.
msg11828 | Author: mornfall | Date: 2010-07-22.18:08:14
Eric Kow <kowey@darcs.net> writes:
> If the exception is not a timeout, we test if the URL is reachable and
> whitelist it if so (the whitelist avoids future testing). But why do we
> do this? And doesn't that confuse Darcs into thinking it successfully
> fetched a file when it did not? Have you tested the actual case where
> you get an exception but not a timeout?
I believe that we do this because a 404 could be quite legitimate: the
repository could be partially lazy but still have a lot of things we
want to use. We don't want to blacklist a source for this reason
alone. We also don't want to inspect a source each time it gives a 404,
since this is not for free, either.
> Also, are there not web server configurations where
> http://example.com/foo gives you a 404, but
> http://example.com/foo/_darcs/foo gives you a result? What happens in
> those cases? In other words, does the test of trying to fetch
> http://example.com/foo/ actually make sense?
>
> Also, why is it important to normalise the URL to have a trailing slash?
> Is it just to avoid duplicate entries in the whitelist?
I suspect that there are webservers that treat foo/ and foo
differently. But I am not saying one is better than the other. Anyway, I
think the right thing to look for is foo/_darcs/hashed_inventory (we
shouldn't be using caches for non-hashed repositories).
I wholeheartedly agree with the indentation issue Eric has pointed out,
too. Please try to fit your code in 80-90 columns if possible; 100 is
acceptable, and 110 is stretching it but could be forgiven if there's a
good reason. :-) Code that is wider is usually hard to read for almost
everyone (or so I guess).
Yours,
Petr.
msg11831 | Author: abuiles | Date: 2010-07-23.01:35:57
Hi Eric,
>
> OK, so after running all of the atExitActions, we print out our
> accumulated warnings about caches (should we worry about what
> happens if one of the atexit actions fails)?
>
Yes, we should care about it, I will fix it.
>
>> +type BadCacheList = [String]
>> +{- NOINLINE _badCacheList -}
>> +_badCacheList :: IORef BadCacheList
>> +_badCacheList = unsafePerformIO $ newIORef []
>> +
>> +addBadCache :: String -> IO ()
>> +addBadCache cache = modifyIORef _badCacheList (cache:)
>> +
>> +getBadCacheList :: IO [String]
>> +getBadCacheList = readIORef _badCacheList
>> +
>> +{- NOINLINE isBadCache -}
>> +isBadCache :: String -> Bool
>> +isBadCache cache = unsafePerformIO $ do badCaches <- getBadCacheList
>> + return (cache `elem` badCaches)
>
> Is it really necessary for this to be String -> Bool?
> Why not String -> IO Bool?
>
> Alternatively, why not IO (String -> Bool)?
>
I used (String -> Bool) basically to escape the IO so that I could
use it easily in the guards.
> I'm also a bit sceptical about the need for a type synonym.
> Maybe just passing around [String] is clearer here.
>
The synonym was to give better insight into what we were getting, but
indeed, it can be inferred from the function name.
>> +type TrustedHttps = [String]
>> +{- NOINLINE _trustedHttpList -}
>> +_trustedHttpList :: IORef TrustedHttps
>> +_trustedHttpList = unsafePerformIO $ newIORef []
>
> So you maintain a list of URLs which you managed to reach the
> first time you contacted them this session.
>
> 'Trust' is probably not the right notion to use here (since I'd
> be thinking more in terms of security or something like that).
> How about "reachable" instead?
>
"reachable" or "reliable" could be the words to express what I wanted.
>> +addTrustedHttp :: String -> IO ()
>> +addTrustedHttp http = modifyIORef _trustedHttpList (http:)
>> +
>> +getTrustedHttps :: IO TrustedHttps
>> +getTrustedHttps = readIORef _trustedHttpList
>> +
>> +{- NOINLINE isTrustedHttp -}
>> +isTrustedHttp :: String -> Bool
>> +isTrustedHttp http = unsafePerformIO $ do trustedHttps <- getTrustedHttps
>> + return (http `elem` trustedHttps)
>
> Same complaint about possibly unnecessary unsafePerformIO. I mean, we
> certainly use it to achieve tracking of some global information in
> Darcs, but it's something we should be be somewhat reluctant to do lest
> Darcs surprise us in the future.
I totally agree that I abused unsafePerformIO; in this case I used it
for the same reason I gave before.
>> +-- | Returns the sources of a given cache.
>> +cacheSource :: CacheLoc -> String
>> +cacheSource (Cache _ _ s) = s
>
> Potentially useful helper function. Seems like we could alternatively
> modify the definition of Cache to use the curly brace syntax.
>
I would totally prefer curly brace syntax; do we have any reason for
not using it?
>> fetchFileUsingCache :: Cache -> HashedDir -> String -> IO (String, B.ByteString)
>> -fetchFileUsingCache = fetchFileUsingCachePrivate Anywhere
>> +fetchFileUsingCache c h s = do result <- fetchFileUsingCachePrivate Anywhere c h s
>> + return result
>
> This change does not seem necessary. What was wrong with the
> eta-reduced version of this code?
>
There is nothing wrong with it; what happened was that I was using
something else in the middle, then took it away and forgot to restore
the eta-reduced version.
>> sfuc [] _ = return ()
>> - sfuc (c:cs) out | not $ writable c =
>> + sfuc (c:cs) out | (not $ badCache c) && (not $ writable c) =
>
>> - then speculateFileOrUrl (fn c) out
>> + then speculateFileOrUrl (fn c) out `catchNonSignal` (\e -> do
>> + addToBadCaches (show e) c)
>
>> - else copyFileOrUrl [] (fn c) out Cachable
>> + else copyFileOrUrl [] (fn c) out Cachable `catchNonSignal` (\e -> do
>> + addToBadCaches (show e) c)
>
> OK, so if something goes wrong when speculating on or copying a file,
> (but it's not just the user hitting ^-c), we note the cache down as
> being problematic.
>
If we determine that the cause of the error is one that we consider
problematic, we add it.
>
> It seems like addToBadCaches is may be a bit of misnomer first this
> function primarily tests if a cache is reachable, and second because it
> adds the cache entry to either the white or the blacklist. But I'm not
> sure; maybe I misunderstand the code. After all, you do only call it
> when something goes wrong...
>
I use the whitelist basically to avoid testing again whether the repo
is reachable (given that I can get an error just because a certain file
is not in that repo, even though the repo does exist).
> Style point:
> The code below is very heavily indented, which makes it harder to read
> and review without a wide screen or window. It's a tricky trade-off
> between too much and not enough, so I don't really have any answers.
> Perhaps this style could help here
>
> addToBadCaches e cache
> | condition 1 =
> ...
> | condition 2 =
> ...
>
It was just a silly idea I had in my mind that I would have to do things this way:
addToBadCaches e cache
| condition 1 =
...
| condition 2 =
....
>
> 2. Do we not have any exception types to work with here? Matching
> on the string seems really fragile.
>
I agree with you. I tried to work with the exception, but unfortunately
"fromException" was giving nothing for this error (I will look at it
again, though).
> 3. Case in point, you're matching "Foo" against (map toLower "foo")
> which means that this code path is never visited.
I used it specifically for "timeout", since the error could show
"Timeout" or "timeout".
>> + | isUrl (cacheSource cache) = case "timeout" `isInfixOf` (map toLower e) of
>> + True -> addBadCache . cacheSource $ cache
>> + False -> case isTrustedHttp (cacheSource cache) of
>> + True -> return ()
>> + False -> do
>> + let url = if last (cacheSource cache) == '/'
>> + then cacheSource cache
>> + else (cacheSource cache) ++ "/"
>> + rsp <- simpleHTTP (getRequest url )
>> + do case rsp of
>> + Left _ -> return ()
>> + Right response -> if rspCode response /= (2,0,0)
>> + then addBadCache . cacheSource $ cache
>> + else addTrustedHttp (cacheSource cache) >> return ()
>
> OK earlier we see that this function is called if we receive some
> exception when fetching over HTTP.
>
> If the exception is not a timeout, we test if the URL is reachable and
> whitelist it if so (the whitelist avoids future testing). But why do we
> do this? And doesn't that confuse Darcs into thinking it successfully
> fetched a file when it did not? Have you tested the actual case where
> you get an exception but not a timeout?
>
I did test it, and the reason I add it to the whitelist is to avoid
inspecting it again (as Petr said).
> Also, are there not web server configurations where
> http://example.com/foo gives you a 404, but
> http://example.com/foo/_darcs/foo gives you a result? What happens in
> those cases? In other words, does the test of trying to fetch
> http://example.com/foo/ actually make sense?
>
> Also, why is it important to normalise the URL to have a trailing slash?
> Is it just to avoid duplicate entries in the whitelist?
>
I added the trailing slash basically for the reason Petr gave: some
servers treat foo/ and foo differently. But as Petr suggests, what I
should test is whether foo/_darcs/hashed_inventory exists.
> Code tidiness:
>
> 1. It could be useful to look into reducing the amount of indentation
> you use. It's a tricky trade-off between too much and not enough,
> so I don't really have any answers. Perhaps this style could help here
>
> addToBadCaches e cache
> | condition 1 =
> ...
> | condition 2 =
> ...
>
> It's all about making the code as easy to read as possible.
>
> 2. It seems like you should just name a source = cacheSource cache,
> which I think will make the code a lot simpler looking
>
I'll rewrite it thanks :).
>
> In this particular case, you've chosen to break checkSshError into
> a separate function, but not the HTTP stuff. Why?
>
I did it just because it was getting too long (yes, I tried to keep the
columns <= 80, but I had that clumsy idea I mentioned before).
>
>> +-- | Given a SSH source it returns the server address appended with HTTP://
>> +-- i.e: getHttpUrl "user@darcs.net:repo" returns "http://darcs.net"
>> +getHttpUrl :: String -> String
>> +getHttpUrl source = "http://" ++ ((\(_:xs) -> takeWhile (/=':') xs) . dropWhile (/='@') $ source)
>
>
> Why does checkSshError take both the source and url argument? Is there
> a case where url cannot be systematically derived from the source?
>
None; in fact the url is inferred from the source.
> What happens to the Either type that you return in your handler?
> Doesn't it get discarded?
>
Yes, it is discarded; I wasn't very happy with that, and I realize it
is wrong.
> Why are we going about things in this roundabout way of checking the SSH
> path by converting the source name to an HTTP URI and fetching it over
> HTTP? What about cases where people have a working SSH server but not
> necessarily an HTTP server on a given machine?
>
You are right here, and I realized today that this is not the way I
should test that the ssh source is reachable (I was just assuming that
every server would give a 200 when fetching over HTTP; I forgot about
cases like the one you mentioned). I didn't use the error message in
this case, because with ssh we don't get an error like the one with
http (i.e. "timeout reached"); I will look more closely at how I can
handle that error. Also, in this patch the ssh check "wrongly" only
looks for a timeout error; I didn't check whether the repo was no
longer there. I guess I would have to do something like
"existRemoteFileSsh" and check for foo/_darcs/hashed_inventory.
>
> Do you really use this?
>> import Data.Maybe ( isJust, catMaybes )
>> import Control.Monad ( msum )
>
That's not mine. I just didn't clean that up, as I think that kind of
thing should go in a different patch.
>
> (timeOut,dff) <- do r <- fetchFilePS (repo ++ "/" ++ df) Cachable
> return (False,r)
> `catchNonSignal` (\ e -> return (isTimeOut e,B.empty))
>
> The same sort of worry as above. What happens if your exception is not
> a timeout? I'm concerned that you succeed here, where what you really
> want to do maybe is re-throw the exception.
>
If the exception is not a timeout, there is still a possibility that
repo++"/"++darcsdir++"/inventory" exists; if not, we throw the final
error.
Now I am unsure how to proceed with the ssh server; I mean, how could I
check that the SSH server is reachable, or that _darcs/hashed_inventory
exists? A first idea that comes to mind is maybe shelling out to ssh,
but I'm not sure about portability.
Thanks !
--
Adolfo
msg11837 | Author: kowey | Date: 2010-07-23.11:50:46
On Thu, Jul 22, 2010 at 20:38:35 -0500, Adolfo Builes wrote:
> > Is it really necessary for this to be String -> Bool?
> > Why not String -> IO Bool?
> >
> > Alternatively, why not IO (String -> Bool)?
> >
> I used (String -> Bool) basically to escape the IO so that I could
> use it easily in the guards.
...
> I totally agree that I abused of unsafePerformIO, in this case I used
> for the same reason I gave before.
I think an IO (String -> Bool) approach could maybe still work because
you can grab the (String -> Bool) function within IO, and then use it
within a guard.
You'll have to decide if minimising the unsafePerformIO is worth the
extra code complexity or not. I tend to say it is, but others may
disagree with me.
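The IO (String -> Bool) approach Eric describes reads the list once inside IO; the resulting pure predicate is then freely usable in guards. A self-contained sketch, where the local IORef again stands in for the global list:

```haskell
import Data.IORef (newIORef, readIORef)

main :: IO ()
main = do
  badCaches <- newIORef ["ssh://gone.example"]  -- stand-in for _badCacheList
  -- Grab the pure predicate within IO...
  isBad <- fmap (flip elem) (readIORef badCaches)
  -- ...then use it freely in guards.
  let describe c | isBad c   = c ++ ": skipped (bad cache)"
                 | otherwise = c ++ ": ok"
  putStrLn (describe "ssh://gone.example")
  putStrLn (describe "http://darcs.net")
```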
> >> +cacheSource (Cache _ _ s) = s
> >
> > Potentially useful helper function. Seems like we could alternatively
> > modify the definition of Cache to use the curly brace syntax.
> >
> I would totally prefer curly brace syntax, do we have any reason for
> not using it ?
I'd say go for it. I trust you mean something like
data Cache = Cache { ...
, cacheSource :: String }
> I use the whitelist basically to don't test again if the repo is
> reachable or not. ( Given the fact that I can get an error just
> because certain file is not in that repo, but the repo does exist )
OK, that makes some sense, that there may be a legitimate 404 on a
file that does not exist, but the repository is reachable. Also, would
it be necessary to worry about other HTTP status codes, such as the
redirect ones (301 and 302?). Is there a more robust way to do this?
[Being sort of an abstract, high-level kind of guy, working on Darcs
has been a wonderful way for me to learn how things work a bit more
on the lower level]
> I agree with you, I tried to work with the exception, unfortunately
> "fromException" was giving nothing to this error ( I will look at it
> again though).
Thanks. System.IO.Error may help.
> > 3. Case in point, you're matching "Foo" against (map toLower "foo")
> > which means that this code path is never visited.
> I used specifically for "timeout" since the error could show "Timeout"
> or "timeout".
Sorry, I was referring to this:
case "No such file or directory" `isInfixOf` (map toLower e) of
> > If the exception is not a timeout, we test if the URL is reachable and
> > whitelist it if so (the whitelist avoids future testing). But why do we
> > do this? And doesn't that confuse Darcs into thinking it successfully
> > fetched a file when it did not? Have you tested the actual case where
> > you get an exception but not a timeout?
> >
> I did test it, and the reasons I add it to the white list, is to avoid
> inspecting again ( as Petr said).
Do you test it automatically?
> I added the trailing slash basically because of the reason Petr said,
> cause some servers treats foo/ and foo differently, but as Petr
> suggest what I should test is if foo/_darcs/hashed_inventory exist.
Yes.
> > In this particular case, you've chosen to break checkSshError into
> > a separate function, but not the HTTP stuff. Why?
> >
> I did it just because it was getting too long ( yes, I tried to keep
> the columns <= 80, but I had that clumsy idea I mentioned before)
OK, well what I would suggest is to break the local and HTTP stuff out too if
you're going to break the SSH stuff out (now I may decide I was wrong after
looking at the results of this, because I don't yet have a good feel about
what is most readable)...
> I guess I would have to do something like
> "existRemoteFileSsh" and check for foo/_darcs/hashed_inventory
Seems fair enough.
> > Do you really use this?
> >> import Data.Maybe ( isJust, catMaybes )
> >> import Control.Monad ( msum )
> >
> That's not mine. I just didn't clean that up, as I think that kind of
> things should go in a different patch.
I was referring to badCache :: CacheLoc -> Bool
This is one of these subtle and very subjective issues that I don't really
have an answer for, but I have a feeling that there is sometimes a point
where you're better off redoing application or composition than writing a
helper function. An example of this would be a hypothetical
putStrLnError = hPutStrLn stderr
(In fact, I use this sort of thing in some of my own code). Seems like
there the cost of the extra helper function to think about is higher
than that of just repeating 'hPutStrLn stderr'. Maybe somebody else has
clearer thinking on this tradeoff. These things aren't big deals, of course.
Maybe there is no right answer and there's no point talking about them.
> > The same sort of worry as above. What happens if your exception is not
> > a timeout? I'm concerned that you succeed here, where what you really
> > want to do maybe is re-throw the exception.
>
> If the exception is not a timeout, it could still be possible that
> "repo++"/"++darcsdir++"/inventory" exists; if not, we throw the
> final error.
OK, I have to admit I don't really understand the problem yet (I think Petr
does) so I can trust the two of you. What I was worried about was that you
were just absorbing the final error instead of throwing it. Are you sure
you're actually throwing it?
> Now I'm unsure how to handle the SSH server; I mean, how could I check
> that the SSH server is reachable or that _darcs/hashed_inventory
> exists? A first idea that came to mind was to use ssh directly, but
> I'm not sure about portability.
Why not use the pre-existing copyFileOrUrl (or was it fetch? I'm not
too sure I remember the difference) functions?
--
Eric Kow <http://www.nltg.brighton.ac.uk/home/Eric.Kow>
For a faster response, please try +44 (0)1273 64 2905.
|
msg11839 (view) |
Author: ganesh |
Date: 2010-07-23.11:58:05 |
|
Eric Kow wrote:
> On Thu, Jul 22, 2010 at 20:38:35 -0500, Adolfo Builes wrote:
>>> Is it really necessary for this to be String -> Bool?
>>> Why not String -> IO Bool?
>>>
>>> Alternatively, why not IO (String -> Bool)?
>>>
>> I used (String -> Bool) basically to escape the IO so that I could
>> use it easily in the guards.
>
> ...
>
>> I totally agree that I abused unsafePerformIO; in this case I used it
>> for the same reason I gave before.
>
> I think an IO (String -> Bool) approach could maybe still work
> because you can grab the (String -> Bool) function within IO, and
> then use it within a guard.
>
> You'll have to decide if minimising the unsafePerformIO is worth the
> extra code complexity or not. I tend to say it is, but others may
> disagree with me.
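[Editor's note: Eric's `IO (String -> Bool)` idea can be sketched as below. `mkIsBadSource` and `firstGoodSource` are hypothetical names for illustration; the actual darcs code keeps the bad-source list in a global IORef rather than passing one in.]

```haskell
import Data.IORef (IORef, newIORef, readIORef)

-- Obtain the pure predicate once inside IO; after that it can be used
-- in ordinary guards without any unsafePerformIO.
mkIsBadSource :: IORef [String] -> IO (String -> Bool)
mkIsBadSource ref = do
  badSources <- readIORef ref
  return (`elem` badSources)

-- The predicate now drives guards in a pure helper.
firstGoodSource :: (String -> Bool) -> [String] -> Maybe String
firstGoodSource badSource (s:ss)
  | badSource s = firstGoodSource badSource ss
  | otherwise   = Just s
firstGoodSource _ [] = Nothing
```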
I'm totally against use of unsafePerformIO simply to reduce local code complexity. Where we use it, it's to deal with _global_ issues like needing to thread state or log messages everywhere.
Ganesh
|
msg11895 (view) |
Author: abuiles |
Date: 2010-07-28.13:47:46 |
|
1 patch for repository http://darcs.net:
Wed Jul 28 08:07:31 COT 2010 builes.adolfo@googlemail.com
* Resolve issue 1599: automatically expire unused caches
Attachments
|
msg11898 (view) |
Author: kowey |
Date: 2010-07-28.17:46:38 |
|
Just some quick English and Hlinting (not a proper review yet)
Resolve issue 1599: automatically expire unused caches
-------------------------------------------------------
> + hPutStrLn stderr $ "\nI could not reach the following " ++
> + englishNum (length sources) (Noun "repository") " (listed in _darcs/prefs/sources):"
> + hPutStrLn stderr $ (unlines sources) ++ "\nIt is recommendable to delete the " ++
> + englishNum (length sources) (Noun "entry") " listed aboved if not used"
How about this?
I could not reach the repositories:
foo
bar
baz
quux
If you're not using them, you should probably delete the
corresponding entries from _darcs/prefs/sources
> +isBadSource :: IO (String -> Bool)
> +isBadSource = do badSources <- getBadSourcesList
> + return (`elem` badSources)
Thanks for getting rid of the needless unsafePerformIO
> +checkCacheReachability :: String -> CacheLoc -> IO ()
> +checkCacheReachability e cache
> + | isFile (source cache) = do
> + reachable <- isReachableSource
> + if not $ reachable (source cache)
> + then
> + do
> + exist <- doesDirectoryExist $ source cache
> + when (not exist) $ addBadSource $ source cache
> + when exist $ addReachableSource $ source cache
> + else
> + return ()
I think you want
unless (reachable (source cache)) $ do
You could also maybe refactor this a little bit with a helper function
that takes the (String -> Bool) function as an argument. Not sure if
that would be a good thing.
> + else if reachable $ source cache
> + then return ()
> + else
see above
> + | otherwise = fail $ "unknown transport protocol for: " ++ source cache
those spaces are a bit superfluous
Overall, I think your code is looking a lot cleaner now!
--
Eric Kow <http://www.nltg.brighton.ac.uk/home/Eric.Kow>
For a faster response, please try +44 (0)1273 64 2905.
|
msg11899 (view) |
Author: abuiles |
Date: 2010-07-28.20:55:19 |
|
1 patch for repository http://darcs.net:
Wed Jul 28 15:37:43 COT 2010 builes.adolfo@googlemail.com
* Resolve issue 1599: automatically expire unused caches
Attachments
|
msg11905 (view) |
Author: kowey |
Date: 2010-08-01.18:33:09 |
|
Sorry, Adolfo,
I've been remiss in my reviewing duty.
Anyway, I've interspersed Eric-reads-the-code style comments with
comments to the patch author. The latter are highlighted with
REMARK
Resolve issue 1599: automatically expire unused caches
======================================================
> + badSources <- getBadSourcesList
> + when ( not $ null badSources ) $ reportBadSources badSources
So here we're modifying the withAtExit function directly to report on
the caches that we could not reach.
REMARK: An alternative approach may be to register this as an exit
action via atexit, which seems a bit cleaner/more modular to me. I
guess the component that would register this would be the main function
for darcs. On the other hand, I suppose one advantage of baking this
into withAtExit is that we control the order of execution; this fires
after all the actions have been run. Are there other reasons? If we
don't care about the order of execution so much, I'd lean more towards
the more modular approach
> + reportBadSources sources = do
> + hPutStderr $ "\nI could not reach the following " ++
> + englishNum (length sources) (Noun "repository") ":"
> + hPutStderr $ (unlines sources) ++ "If you're not using " ++ itThem ( length sources) ++
> + ", you should probably delete\nthe corresponding " ++
> + englishNum (length sources) (Noun "entry") " from _darcs/prefs/sources."
> + hPutStderr = hPutStrLn stderr
> + itThem num = case num > 1 of
> + False -> "it"
> + True -> "them"
REMARK: Nice touch on the pronoun. This could probably go into the
English module. You could even make a Pronoun type which implements the
Countable instance, which allows you to reuse englishNum.
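[Editor's note: the Pronoun idea can be sketched as below. These are standalone stand-ins; the real Countable class and englishNum in the darcs English module may have a different shape, and the naive pluralisation here is only for illustration.]

```haskell
-- Hypothetical stand-ins for the English module's machinery.
class Countable a where
  singular :: a -> String
  plural   :: a -> String

newtype Noun = Noun String
instance Countable Noun where
  singular (Noun s) = s
  plural   (Noun s) = s ++ "s"   -- naive pluralisation, illustration only

data Pronoun = It
instance Countable Pronoun where
  singular It = "it"
  plural   It = "them"

-- Reuses the same Countable dispatch for nouns and pronouns alike.
englishNum :: Countable a => Int -> a -> String -> String
englishNum n x rest = (if n == 1 then singular x else plural x) ++ rest
```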
> +{- NOINLINE _badSourcesList -}
> +_badSourcesList :: IORef [String]
> +_badSourcesList = unsafePerformIO $ newIORef []
> +
> +addBadSource :: String -> IO ()
> +addBadSource cache = modifyIORef _badSourcesList (cache:)
> +
> +getBadSourcesList :: IO [String]
> +getBadSourcesList = readIORef _badSourcesList
> +
> +isBadSource :: IO (String -> Bool)
> +isBadSource = do badSources <- getBadSourcesList
> + return (`elem` badSources)
Looks fine; similar for reachableSources
> -data CacheLoc = Cache !CacheType !WritableOrNot !String
> +data CacheLoc = Cache { cacheType:: !CacheType, writableOrNot:: !WritableOrNot, source:: !String }
REMARK: I tend to prefix names of field accessors, which makes the names
uglier but reduces the chances I'll need to qualify them later on, for
example.
data CacheLoc = Cache { cacheType:: !CacheType
, cacheWritable :: !WritableOrNot
, cacheSource:: !String }
copyFileUsingCache
------------------
> - sfuc cache stickItHere
> + badSource <- isBadSource
> + sfuc cache stickItHere badSource
> - sfuc [] _ = return ()
> - sfuc (c:cs) out | not $ writable c =
> + sfuc [] _ _ = return ()
> + sfuc (c:cs) out badSource | not (badSource (source c)) && not (writable c) =
> if oos == OnlySpeculate
This new version of sfuc makes use of the list of unreachable sources.
REMARK: Perhaps another approach you could take is to just apply a
filter on the cache, maybe changing it to
case filter (\c -> not (badSource (source c) || writable c)) cs of
[] -> return ()
(c:_) -> sfuc c stickItHere
Personally, I'd just drop the guards and use an if-then-else for clarity
> hunk ./src/Darcs/Repository/Cache.hs 222
> - then speculateFileOrUrl (fn c) out
> + then speculateFileOrUrl (fn c) out `catchNonSignal` (\e ->
> + checkCacheReachability (show e) c)
> - else copyFileOrUrl [] (fn c) out Cachable
> - | otherwise = sfuc cs out
> + else copyFileOrUrl [] (fn c) out Cachable `catchNonSignal` (\e ->
> + checkCacheReachability (show e) c)
OK, so the other change being that if something should go wrong when
speculating or trying to copy a file, you check to see if the cache
is alright.
> + | otherwise = sfuc cs out badSource
REMARK: The indentation for this otherwise case is now a bit off, which
makes the code harder to understand. But it would be a moot point if
you went with the filter approach.
checkCacheReachability
-----------------------
> +-- | Checks if a given cache entry is reachable or not.
> +-- It receives an error caught during execution and the cache entry.
> +-- For a local cache, if the given source doesn't exist anymore, it is added.
> +-- For HTTP sources if the error is timeout, it is added, if not we check for the
> +-- _darcs/hashed_inventory file, if doesn't exist it means we are pointing to a repository
> +-- which used to exist there, but had been moved.
> +-- For SSH if we get an error we try to get the file _darcs/hashed_inventory again, if it fails
> +-- we add the entry to the list of sources which are no reachables.
> +-- The entries which get added to the cache are no longer tried for the rest of the command.
> +checkCacheReachability :: String -> CacheLoc -> IO ()
> +checkCacheReachability e cache
> + | isFile (source cache) = do
> + reachable <- isReachableSource
> + unless (reachable (source cache)) $ do
> + exist <- doesDirectoryExist $ source cache
> + unless exist $ addBadSource $ source cache
> + when exist $ addReachableSource $ source cache
If a source has already been whitelisted, we don't need to check it;
otherwise, we either whitelist or blacklist it depending on whether
the directory exists.
REMARK: I think you can share the source cache expression using a where
clause which will be common to all the guards.
REMARK: These last two cases look like they could be more simply
expressed using an if-then-else
if exist
then addReachableSource (source cache)
else addBadSource (source cache)
> + | isUrl (source cache) = do
> + reachable <- isReachableSource
> + let string = case dropWhile (/='(') e of
> + (_:xs) -> fst (break (==')') xs)
> + _ -> e
> + let cerror = case reads string ::[(HTTP.ConnectionError,String)] of
> + [(ce,_)] -> Just ce
> + _ -> Nothing
REMARK: We're still parsing error strings here, which seems rather
fragile to me. Weren't we going to capture status codes from curl or
something? It's an improvement at least that the error code parsing is
in one place and not scattered throughout the darcs source.
> + if cerror /= Nothing
> + then addBadSource $ source cache
> + else
> + unless (reachable (source cache)) $
> + withTemp $ \tempout -> do
> + let f = source cache ++ "/" ++darcsdir ++ "/" ++ "hashed_inventory"
> + copyFileOrUrl [] f tempout Cachable
> + addReachableSource $ source cache
> + `catchNonSignal` (\_ -> addBadSource $ source cache)
So the logic here is:
If we get a connection error of some sort, then we add the cache to the
blacklist, even if it was already previously whitelisted.
If it's not a connection error that we got, and we have not already
whitelisted the server, then we test for reachability by trying to
fetch the hashed_inventory file. If the server is indeed reachable
it is whitelisted so we don't bother checking again; otherwise, we
add it to the blacklist so we know not to fetch from it again.
REMARK: It seems like you have a situation where an item could be
added to the blacklist even though it's already whitelisted. Is
that OK? Could there be some sort of unpredictable behaviour when
an entry is in both lists? Perhaps one way to think about this -- not
necessarily the right way, mind you -- is that
* If an entry is in the whitelist, it can be added to the blacklist
(but then it's removed from the whitelist). This would apply to
cases where a server goes down in the middle of a fetch.
* If an entry is in the blacklist, and you try to add it to the
whitelist, we have an error (bug in darcs...)
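[Editor's note: the invariant proposed in the two bullets above can be sketched in isolation like this. `SourceLists` and both functions are hypothetical names, not darcs code, which actually stores the two lists in separate global IORefs.]

```haskell
import Data.List (delete)

-- Blacklisting evicts the entry from the whitelist; whitelisting a
-- blacklisted entry is flagged as a bug instead of silently succeeding.
data SourceLists = SourceLists { whiteList :: [String]
                               , blackList :: [String] }
  deriving (Eq, Show)

blacklistSource :: String -> SourceLists -> SourceLists
blacklistSource s (SourceLists w b) = SourceLists (delete s w) (s : b)

whitelistSource :: String -> SourceLists -> Either String SourceLists
whitelistSource s sl@(SourceLists _ b)
  | s `elem` b = Left ("bug in darcs: whitelisting blacklisted source " ++ s)
  | otherwise  = Right sl { whiteList = s : whiteList sl }
```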
REMARK: we're baking in the assumption that hashed repositories have a
hashed_inventory file... hopefully this baked-in assumption won't bite
us in the long run. I guess there's not much we can do about this.
REMARK: You may find the isJust function to be useful.
> + | isSsh (source cache) =
> + withTemp $ \tempout -> do
> + let f = source cache ++ "/" ++darcsdir ++ "/" ++ "hashed_inventory"
> + copyFileOrUrl [] f tempout Cachable
> + addReachableSource $ source cache
> + `catchNonSignal` (\_ -> addBadSource $ source cache)
REMARK: It looks here like you have some duplication with the HTTP code,
maybe you could refactor this into a helper function like checkRemote.
REMARK: You seem to have omitted the case where the cache entry is
already whitelisted. (Seems like that logic could go into checkRemote)
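[Editor's note: the suggested checkRemote refactor might look like the sketch below. The darcs functions it would call (isReachableSource, the hashed_inventory probe, addReachableSource, addBadSource) are abstracted into parameters here, so only the shared shape of the HTTP and SSH branches is shown.]

```haskell
import Control.Monad (unless)
import Data.IORef (newIORef, modifyIORef, readIORef)  -- used by the usage example

-- Skip sources that are already whitelisted; otherwise probe the remote
-- repository and whitelist or blacklist it based on the outcome.
checkRemote :: (String -> Bool)    -- ^ already whitelisted?
            -> (String -> IO Bool) -- ^ can we fetch _darcs/hashed_inventory?
            -> (String -> IO ())   -- ^ whitelist the source
            -> (String -> IO ())   -- ^ blacklist the source
            -> String              -- ^ cache source
            -> IO ()
checkRemote reachable probe whitelist blacklist src =
  unless (reachable src) $ do
    ok <- probe src
    if ok then whitelist src else blacklist src
```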
> + | otherwise = fail $ "unknown transport protocol for: " ++ source cache
fetchFileUsingCachePrivate
--------------------------
REMARK: What's the difference between copyFileUsingCache and
fetchFileUsingCachePrivate? I think it's useful for you to know,
since you're modifying these two functions.
> - ffuc cache
> + badSource <- isBadSource
> + ffuc cache badSource
> - `catchall` debugFail ("Couldn't fetch `"++f++"'\nin subdir "++(hashedDir subdir)++
> + `catchall` debugFail ("Couldn't fetch `"++f++"'\nin subdir "++ hashedDir subdir ++
REMARK: This should be in a separate patch. (Don't worry too much about
this, just being super-picky about minimal patches)
> hunk ./src/Darcs/Repository/Cache.hs 286
> - where ffuc (c:cs)
> - | not (writable c) && (Anywhere == fromWhere || isFile (fn c)) =
> + where ffuc (c:cs) badSource
> + | not (writable c) && (Anywhere == fromWhere || isFile (fn c)) && not (badSource (source c)) =
REMARK: Similar remark as with copyFileUsingCache. Maybe we could just
filter out the bad sources and leave the ffuc logic intact.
> do debugMessage $ "In fetchFileUsingCachePrivate I'm going manually"
> debugMessage $ " getting "++f
> debugMessage $ " from " ++ fn c
> hunk ./src/Darcs/Repository/Cache.hs 299
> fail $ "Hash failure in " ++ fn c
> return (fn c, x')
> else return (fn c, x) -- FIXME: create links in caches
> - `catchall` ffuc cs
> + `catchNonSignal` (\e -> do
> + checkCacheReachability (show e) c
> + foo <- isBadSource
OK, so if something goes wrong during fetchFileUsingCachePrivate, we
need to checkCacheReachability again. This can modify the white and
black lists so we need to pass an updated isBadSource.
REMARK: wait a second, why do we need to care about the updated
isBadSource? It's not like sources reappear in the list, do they?
REMARK: Actually naming your variables 'foo' is probably a bad idea
(particularly since foo tends to be used casually to refer to some
hypothetical variable, not an actual one)
> - `catchall` do (fname,x) <- ffuc cs
> - do createCache c subdir
> - createLink fname (fn c)
> - return (fn c, x)
> - `catchall`
> - do gzWriteFilePS (fn c) x `catchall` return ()
> - return (fname,x)
> - | otherwise = ffuc cs
> + `catchNonSignal` (\ e ->
> + do
> + checkCacheReachability (show e) c
> + foo <- isBadSource
> + (fname,x) <- ffuc cs foo
> + do createCache c subdir
> + createLink fname (fn c)
> + return (fn c, x)
> + `catchall`
> + do gzWriteFilePS (fn c) x `catchall` return ()
> + return (fname,x))
Just a similar change; if something goes wrong we update the black/white
lists accordingly and try from the subsequent cache.
REMARK: I'm still worried about this moving from catchall to
catchNonSignal... Are we relying on any of these handlers doing
something clever when the user kills darcs ?
> - where handler' se =
> + where handler' se = do
REMARK: this noise does not belong in the patch :-)
HTTP module
-----------
> +data ConnectionError = CouldntResolveHost |
> + CouldntConnectToServer |
> + OperationTimeout
> + deriving (Eq, Read, Show)
REMARK: I would avoid the contraction here and call it
CouldNotResolveHost, etc. I can't really say why; I guess
some part of me feels it's clearer that way, or some part of
me wants ADTs and haddocks to be fairly formal.
> -waitNextUrl :: IO (String, String)
> +waitNextUrl :: IO (String, String, Maybe ConnectionError)
REMARK: It may be worth thinking about just replacing that second String
with Maybe ConnectionError... (not sure how wise this is in practice)
> #ifdef HAVE_HTTP
>
> hunk ./src/HTTP.hs 96
> waitNextUrl = do
> (u, f) <- readIORef requestedUrl
> if null u
> - then return ("", "No URL requested")
> + then return ("", "No URL requested", Nothing)
> else do writeIORef requestedUrl ("", "")
> e <- (fetchUrl u >>= \s -> B.writeFile f (BC.pack s) >> return "") `catch` h
> hunk ./src/HTTP.hs 99
> - return (u, e)
> + let ce = case e of
> + "timeout" -> Just OperationTimeout
> + _ -> Nothing
It looks like this e string is always an error.
REMARK: I guess there's not much more we can do than trust the strings
here. Have you tested this with the HTTP module?
URL module
----------
Phew, this seems a bit confusing. I think the URL module contains both
high-level code and curl-specific code which is #ifdef'ed out.
Seems like the low-level Curl code should be moved to its own module.
> +
> +
REMARK: another irrelevant change, grumble :-P
> hunk ./src/URL.hs 223
> let l = pipeLength st
> when (l > 0) $ do
> dbg "URL.waitNextUrl start"
> - (u, e) <- liftIO $ waitNextUrl'
> + (u, e, ce) <- liftIO $ waitNextUrl'
> let p = inProgress st
> new_st = st { inProgress = Map.delete u p
> , pipeLength = l - 1 }
> hunk ./src/URL.hs 238
> else case Map.lookup u p of
> Just (f, _, _) -> do
> removeFileMayNotExist (f++"-new_"++randomJunk st)
> - downloadComplete u e
> + case ce of
> + Just httpError -> downloadComplete u (show httpError)
> + Nothing -> downloadComplete u e
> debugMessage $ "URL.waitNextUrl failed: "++
> u++" "++f++" "++e
> Nothing -> bug $ "Another possible bug in URL.waitNextUrl: "++u++" "++e
This is the high-level waitNextUrl, extended to make use of high-level
errors or fall back to string type errors in our reporting.
REMARK: does this mean the two cannot co-exist? If not, the type
signature of waitNextUrl' could probably reflect that somehow, maybe
using Either or another type like
HttpStatus = HttpOK
| HttpConnectionError ConnectionError
| HttpOtherError String
> - unless (null e) (debugFail $ "Failed to download URL "++u++": "++e)
> + unless (null e) $ do
> + debugMessage $ "Failed to download URL "++u++": "++e
> + fail e
REMARK: irrelevant change (I understand this may be a bit of a pain, but
any effort you could put into making your patches easy to understand
pays off)
darcs revert and darcs record are interactive... (although if you find
them a bit of a pain to use, there could be a UI discussion there...)
> -waitNextUrl' :: IO (String, String)
> +waitNextUrl' :: IO (String, String, Maybe ConnectionError)
This looks like the curl version of waitNextUrl'
> waitNextUrl' = do
> e <- curl_wait_next_url >>= peekCString
> + ce <- if not (null e)
> + then do
> + errorNum <- curl_last_error_num
> + case errorNum of
> + 6 -> return $ Just CouldntResolveHost
> + 7 -> return $ Just CouldntConnectToServer
> + 29 -> return $ Just OperationTimeout
> + _ -> return Nothing
> + else
> + return Nothing
Hmm, I'll take this on faith.
REMARK: potential minor refactor
return $ case errorNum of
6 -> Just CouldntResolveHost
7 -> Just CouldntConnectToServer
29 -> Just OperationTimeout
_ -> Nothing
> +foreign import ccall "hscurl.h curl_last_error_num"
> + curl_last_error_num :: IO CInt
Exciting; I keep avoiding the FFI for some reason.
hscurl.c
--------
> +static CURLcode errornum = 0;
>
> const char *curl_wait_next_url()
> {
> + errornum = -1;
> +
>
> - if (result != CURLE_OK)
> + if (result != CURLE_OK){
> + errornum = result;
> return curl_easy_strerror(result);
> + }
REMARK: Hmm, this sort of stateful way of doing things tends to make me
nervous (also, do we need to worry about what happens when you call
wait_next_url several times in parallel?). Could we not return a tuple
instead? I guess we'd have to find a way to deal with structs in the
FFI?
--
Eric Kow <http://www.nltg.brighton.ac.uk/home/Eric.Kow>
For a faster response, please try +44 (0)1273 64 2905.
|
msg11906 (view) |
Author: kowey |
Date: 2010-08-01.18:37:41 |
|
On Sun, Aug 01, 2010 at 18:33:11 +0000, Eric Kow wrote:
> REMARK: Hmm, this sort of stateful way of doing things tends to make me
> nervous (also, do we need to worry about what happens when you call
> wait_next_url several times in parallel?). Could we not return a tuple
> instead? I guess we'd have to find a way to deal with structs in the
> FFI?
I cut out too much context here.
I was referring to this change:
+const int *curl_last_error_num();
which is how we grab the error code from our last attempt to fetch a
URL. I was asking if there is a way we could somehow return this in the
fetch function. Solutions I can think of with my
non-C-programmer-limited brain (*) are some kind of struct or maybe
passing in the error code as a pointer... Maybe somebody wiser can
chime in.
(*) Yeah yeah, kids these days...
--
Eric Kow <http://www.nltg.brighton.ac.uk/home/Eric.Kow>
For a faster response, please try +44 (0)1273 64 2905.
|
msg11921 (view) |
Author: abuiles |
Date: 2010-08-03.05:50:38 |
|
Hi Eric,
I'm almost done writing the modifications to the patch. I have a small
issue, though, with:
>
> REMARK: An alternative approach may be to register this as an exit
> action via atexit, which seems a bit cleaner/more modular to me. I
> guess the component that would register this would be the main function
> for darcs. On the other hand, I suppose one advantage of baking this
> into withAtExit is that we control the order of execution; this fires
> after all the actions have been run. Are there other reasons? If we
> don't care about the order of execution so much, I'd lean more towards
> the more modular approach
>
I registered the action using atexit, and it works, but it happens that
it calls the action twice, so I end up with the message repeated.
This is the part where I added the atexit
hunk ./src/darcs.hs 54
+ atexit reportBadSources
hunk ./src/darcs.hs 77
-
+ where
+ reportBadSources = do
+ sources <- getBadSourcesList
+ when ( not $ null sources ) $ do
+ hPutStderr $ "\nI could not reach the following " ++
+ englishNum (length sources) (Noun "repository") ":"
+ hPutStderr $ (unlines sources) ++ "If you're not using " ++
+ englishNum (length sources) Pronoun ", you should
probably delete\nthe corresponding " ++
+ englishNum (length sources) (Noun "entry") " from
_darcs/prefs/sources."
+ hPutStderr = hPutStrLn stderr
Any idea what is going on and why the action gets called twice?
--
Adolfo
|
msg11923 (view) |
Author: kowey |
Date: 2010-08-03.10:25:32 |
|
On Tue, Aug 03, 2010 at 00:53:18 -0500, Adolfo Builes wrote:
> I register the action using atexit, and it works, but happens that it
> calls the action twice, so I end with the message repeated.
Perhaps it's being registered more than once? To debug this, you could
modify atexit to take a string along with an action and count up the
number of instances of each string.
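[Editor's note: the debugging idea above could be sketched like this. `atexitLabeled` and `registrationCounts` are hypothetical, not the real Darcs.Global.atexit, which takes a plain IO () and keeps no labels.]

```haskell
import Data.IORef (IORef, newIORef, modifyIORef, readIORef)
import qualified Data.Map as Map

-- Each exit action is registered under a label; a doubled report shows
-- up as a count of 2 for its label.
type ExitActions = IORef (Map.Map String (Int, IO ()))

atexitLabeled :: ExitActions -> String -> IO () -> IO ()
atexitLabeled ref label action =
  modifyIORef ref (Map.insertWith bump label (1, action))
  where bump (n, a) (m, _) = (n + m, a)  -- keep newest action, sum counts

registrationCounts :: ExitActions -> IO [(String, Int)]
registrationCounts ref =
  map (\(l, (n, _)) -> (l, n)) . Map.toList <$> readIORef ref
```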
--
Eric Kow <http://www.nltg.brighton.ac.uk/home/Eric.Kow>
For a faster response, please try +44 (0)1273 64 2905.
|
msg11926 (view) |
Author: abuiles |
Date: 2010-08-03.14:32:49 |
|
1 patch for repository http://darcs.net:
Tue Aug 3 09:15:16 COT 2010 builes.adolfo@googlemail.com
* Resolve issue 1599: automatically expire unused caches
Attachments
|
msg11975 (view) |
Author: kowey |
Date: 2010-08-05.13:37:59 |
|
Hi Adolfo,
OK! Only one more iteration with me. In my opinion, after your next
revision (sorry! it's mostly nitpicks, and feel free to ignore me if
I'm saying things which are evidently stupid or just subjective), we
should send your bundle to Somebody Else to review for correctness.
I'm bound to have missed the forest for the trees somewhere.
We've seen most of this code before, so I'll only comment on what I
think are the new bits (or any old bits that may need modifying still)
Resolve issue 1599: automatically expire unused caches
------------------------------------------------------
> - sfuc cache stickItHere
> + badSource <- isBadSource
> + let cacheFiltered = filter (\ cacheEntry -> not . badSource $ cacheSource cacheEntry ) cache
> + sfuc cacheFiltered stickItHere
FIXME: This is one place that point-free style could produce nicer code
> - sfuc (c:cs) out | not $ writable c =
>
> - | otherwise = sfuc cs out
> + sfuc (c:cs) out
> + | not (writable c) =
...
> + | otherwise = sfuc cs out
I won't really grumble too much here about the irrelevant change
since it seems to be partly a consequence of your other work.
Improving code style is a good thing; I just try to encourage it
in separate patches when I can.
> - then speculateFileOrUrl (fn c) out
> - else copyFileOrUrl [] (fn c) out Cachable
> + then speculateFileOrUrl (fn c) out `catchNonSignal` (\e -> checkCacheReachability (show e) c)
> + else copyFileOrUrl [] (fn c) out Cachable `catchNonSignal` (\e -> checkCacheReachability (show e) c)
[note, whitespace chopped for review]
FIXME: Please amend out the conflict with Petr's DefaultDarcsRepo
change.
Old news on catching the copy/speculate failure
> +-- | Checks if a given cache entry is reachable or not.
> +-- It receives an error caught during execution and the cache entry.
> +-- For a local cache, if the given source doesn't exist anymore, it is added.
> +-- For HTTP sources if the error is timeout, it is added, if not we check for the
> +-- _darcs/hashed_inventory file, if doesn't exist it means we are pointing to a repository
> +-- which used to exist there, but had been moved.
>
> +-- For SSH if we get an error we try to get the file _darcs/hashed_inventory again, if it fails
> +-- we add the entry to the list of sources which are no reachables.
> +-- The entries which get added to the cache are no longer tried for the rest of the command.
FIXME: The haddock above is a bit ambiguous. Added where? You could
probably say blacklisted and/or whitelisted. Also, making each of these
into bullet points could make it a bit easier to read.
> +checkCacheReachability :: String -> CacheLoc -> IO ()
> +checkCacheReachability e cache
> + | isUrl source = do
> + reachable <- isReachableSource
> + unless (reachable source) $ do
> + let string = case dropWhile (/='(') e of
> + (_:xs) -> fst (break (==')') xs)
> + _ -> e
> + let cerror = case reads string ::[(HTTP.ConnectionError,String)] of
> + [(ce,_)] -> Just ce
> + _ -> Nothing
> + if isJust cerror
> + then addBadSource source
> + else do
> + checkHashedInventoryReachability cache
> + addReachableSource source
> + `catchNonSignal` (\_ -> addBadSource source)
...
> +-- | Checks if the _darcs/hashed_inventory exist and is reachable
> +checkHashedInventoryReachability :: CacheLoc -> IO ()
> +checkHashedInventoryReachability cache =
> + withTemp $ \tempout -> do
> + let f = cacheSource cache ++ "/" ++darcsdir ++ "/" ++ "hashed_inventory"
> + copyFileOrUrl [] f tempout Cachable
FIXME: I see you've done a bit of refactoring, and I think you could
make it a bit better. How about returning IO Bool instead? The idea is
that you'd move the exception handling into here, and the code above
could just test a conditional.
Code clarity is the number one thing I tend to go after (unfortunately
this does not necessarily mean I'm any good at it). It's why I seem to
be quite insistent on minor details like that; I think they pay off in
the long run especially when working in a team.
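[Editor's note: the IO Bool shape suggested above, reduced to its essence. `probeReachable` is a hypothetical name; in darcs the probe body would be the copyFileOrUrl call and the handler would use catchNonSignal rather than catching every exception.]

```haskell
import Control.Exception (SomeException, try)

-- The exception handling lives inside the probe; callers just branch
-- on the Bool instead of installing their own handlers.
probeReachable :: IO () -> IO Bool
probeReachable fetch = do
  r <- try fetch :: IO (Either SomeException ())
  return (either (const False) (const True) r)
```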
> - ffuc cache
> - `catchall` debugFail ("Couldn't fetch `"++f++"'\nin subdir "++(hashedDir subdir)++
> + badSource <- isBadSource
> + let cacheFiltered = filter (\ cacheEntry -> not . badSource $ cacheSource cacheEntry ) cache
> + ffuc cacheFiltered
> + `catchall` debugFail ("Couldn't fetch `"++f++"'\nin subdir "++ hashedDir subdir ++
> " from sources:\n\n"++show (Ca cache))
FIXME: sounds like you could do another pointfree refactor here
By the way, notice how much a little higher-order function like filter
lets us do without changing very much code? Just compose some stuff
together, feed it to filter and you're done.
> - `catchall` ffuc cs
> + `catchNonSignal` (\e -> do
> + checkCacheReachability (show e) c
> + badSource <- isBadSource
> + let cacheFiltered = filter (\ cacheEntry -> not . badSource $ cacheSource cacheEntry ) cs
> + ffuc cacheFiltered)
When I complained about catchall and catchNonSignal you pointed out that
x `catchall` y
is actually just
x `catchNonSignal` const y
As an aside, this kinda makes me think that maybe catchall isn't
/really/ needed, but I could be wrong.
FIXME: you're using filter (not . badSource . cacheSource) quite a lot.
Maybe what you want is something like this
filterOutBadSources :: [Cache] -> IO [Cache]
filterOutBadSources cs = do
badSource <- isBadSource
return $ filter (not . badSource . cacheSource) cs
HTTP module
~~~~~~~~~~~
> -waitNextUrl' = do
> - e <- curl_wait_next_url >>= peekCString
> - u <- curl_last_url >>= peekCString
> - return (u, e)
> +waitNextUrl' =
> + alloca $ \ errorPointer -> do
> + e <- curl_wait_next_url errorPointer >>= peekCString
> + ce <- if not (null e)
> + then do
> + errorNum <- peek errorPointer
> + case errorNum of
> + 6 -> return $ Just CouldNotResolveHost
> + 7 -> return $ Just CouldNotConnectToServer
> + 29 -> return $ Just OperationTimeout
> + _ -> return Nothing
> + else
> + return Nothing
> + u <- curl_last_url >>= peekCString
> + return (u, e, ce)
FIXME: You should have a comment pointing us to some documentation
on the error codes.
As a chatty aside, I like bracket functions like alloca which
prevent forgetful people like me from introducing random memory
leaks for want of a free and also avoid bugs like me accidentally
freeing something I shouldn't. I think we have similar idioms in
Darcs.Lock. Sort of an inane comment on my part :-)
> foreign import ccall "hscurl.h curl_wait_next_url"
> - curl_wait_next_url :: IO CString
> + curl_wait_next_url :: Ptr CInt -> IO CString
Thanks for teaching me a little more FFI
> main = withAtexit $ withSignalsHandled $
> flip catch execExceptionHandler $
> handle (\(AssertionFailed e) -> bug e) $ do
> + atexit reportBadSources
I think this is much cleaner than modifying Darcs.Global.
> + reportBadSources = do
> + sources <- getBadSourcesList
> + when ( not $ null sources ) $ do
FIXME: Clearer as unless (null sources), I think
> + hPutStderr $ "\nI could not reach the following " ++
> + englishNum (length sources) (Noun "repository") ":"
> + hPutStderr $ (unlines sources) ++ "If you're not using " ++
> + englishNum (length sources) It ", you should probably delete\nthe corresponding " ++
> + englishNum (length sources) (Noun "entry") " from _darcs/prefs/sources."
Not sure if this really belongs in main, maybe in one of the Cache
modules? I'm not the best person to judge.
FIXME: Maybe worth rewriting this using only one hPutStrLn stderr
plus some string concatenation. Concat may also be helpful here.
Basically, just try to make the code as easy to read as possible.
It's a bit subjective though, but I bet there are things you can
do which are objectively nicer.
> + hPutStderr = hPutStrLn stderr
FIXME: Ahah, for all my carrying on about refactoring, I actually think
this is one of those nuanced cases where the code duplication of just
saying hPutStrLn stderr is clearer than asking people to keep track of
an extra name, particularly people who have to maintain the code.
Subtle, huh?! You duplicate code and Eric complains. You refactor code
and then he complains some more! Can't win...
On the other hand, maybe I'd refactor the length sources if I were
you.
> int error = curl_easy_getinfo(easy, CURLINFO_PRIVATE, (char **)&url_data);
> - if (error != CURLE_OK)
> + if (error != CURLE_OK){
The if (foo) bar; style always seems to me like a clarity error
wrt if (foo) { bar; }... Good thing your change necessitates getting
rid of it.
> + *errorCode = error;
> + }
--
Eric Kow <http://www.nltg.brighton.ac.uk/home/Eric.Kow>
For a faster response, please try +44 (0)1273 64 2905.
msg11985 | Author: abuiles | Date: 2010-08-06.06:41:02
1 patch for repository http://darcs.net:
Fri Aug 6 01:27:53 COT 2010 builes.adolfo@googlemail.com
* Resolve issue 1599: automatically expire unused caches
Attachments
msg11986 | Author: kowey | Date: 2010-08-06.08:08:47
Hi Florent,
This patch bundle has been through a few kowey-lints. But I suspect
that in all my tweaking, I may have missed the forest for the trees.
May I hand this over to you? (let me know if you're still catching up
from holidays, or otherwise are not quite available yet)
msg11987 | Author: kowey | Date: 2010-08-06.08:50:00
Oh, Petr says he's got this one
msg12009 | Author: mornfall | Date: 2010-08-06.21:13:41
Hi,
since Eric requested a parallel review, let's go for it. Overall, I am
of the opinion that the patch could be pushed as it is, with a few
things fixed in a followup patch. I am pointing out some possible
stylistic cleanups below, and some refactoring opportunities. I'll be
happier if you pick up on those (be it as a followup or through
amendment), although I am not going to be terribly unhappy if the patch
stays as it is.
Yours,
Petr.
Resolve issue 1599: automatically expire unused caches
------------------------------------------------------
[export/import stuff]
> hunk ./src/Darcs/Global.hs 154
> resetCRCWarnings :: IO ()
> resetCRCWarnings = writeIORef _crcWarningList []
>
> +{- NOINLINE _badSourcesList -}
> +_badSourcesList :: IORef [String]
> +_badSourcesList = unsafePerformIO $ newIORef []
> +
> +addBadSource :: String -> IO ()
> +addBadSource cache = modifyIORef _badSourcesList (cache:)
> +
> +getBadSourcesList :: IO [String]
> +getBadSourcesList = readIORef _badSourcesList
> +
> +isBadSource :: IO (String -> Bool)
> +isBadSource = do badSources <- getBadSourcesList
> + return (`elem` badSources)
> +
> +{- NOINLINE _reachableSourcesList -}
> +_reachableSourcesList :: IORef [String]
> +_reachableSourcesList = unsafePerformIO $ newIORef []
> +
> +addReachableSource :: String -> IO ()
> +addReachableSource src = modifyIORef _reachableSourcesList (src:)
> +
> +getReachableSources :: IO [String]
> +getReachableSources = readIORef _reachableSourcesList
> +
> +isReachableSource :: IO (String -> Bool)
> +isReachableSource = do reachableSources <- getReachableSources
> + return (`elem` reachableSources)
> +
> darcsdir :: String
> darcsdir = "_darcs"
Some global bookkeeping. Doesn't make me very happy, but for now it
happens to be the best tradeoff we can get.
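The bookkeeping above uses GHC's top-level mutable state idiom. A minimal self-contained sketch (names borrowed from the hunk) shows the shape; the NOINLINE pragma is essential, since otherwise GHC may duplicate the IORef at each use site.

```haskell
import Data.IORef (IORef, newIORef, modifyIORef, readIORef)
import System.IO.Unsafe (unsafePerformIO)

-- Top-level mutable state, as in the Darcs.Global hunk above.
{-# NOINLINE _badSources #-}
_badSources :: IORef [String]
_badSources = unsafePerformIO (newIORef [])

addBadSource :: String -> IO ()
addBadSource s = modifyIORef _badSources (s :)

isBadSource :: IO (String -> Bool)
isBadSource = do badSources <- readIORef _badSources
                 return (`elem` badSources)

main :: IO ()
main = do
  addBadSource "http://example.org/repo"   -- hypothetical source
  bad <- isBadSource
  print (bad "http://example.org/repo", bad "http://darcs.net")
  -- prints (True,False)
```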
[import/export wibbles for Darcs.Repository]
[import/export bits of Darcs.Repository.Cache]
> hunk ./src/Darcs/Repository/Cache.hs 63
>
> data WritableOrNot = Writable | NotWritable deriving ( Show )
> data CacheType = Repo | Directory deriving ( Eq, Show )
> -data CacheLoc = Cache !CacheType !WritableOrNot !String
> +data CacheLoc = Cache { cacheType:: !CacheType, cacheWritable:: !WritableOrNot, cacheSource:: !String }
> newtype Cache = Ca [CacheLoc] -- abstract type for hiding cache
>
> instance Eq CacheLoc where
> hunk ./src/Darcs/Repository/Cache.hs 209
> do debugMessage $ "I'm doing copyFileUsingCache on "++(hashedDir subdir)++"/"++f
> Just stickItHere <- cacheLoc cache
> createDirectoryIfMissing False (reverse $ dropWhile (/='/') $ reverse stickItHere)
> - sfuc cache stickItHere
> + cacheFiltered <- filterBadSources cache
> + sfuc cacheFiltered stickItHere
Filter the cache. I'd probably write
filterBadSources cache >>= (flip sfuc) stickItHere
(unless you need the cacheFiltered identifier somewhere later, but I
don't see that)
This could go in as an extra patch, if you feel like changing it.
> `catchall` return ()
> where cacheLoc [] = return Nothing
> cacheLoc (c:cs) | not $ writable c = cacheLoc cs
> hunk ./src/Darcs/Repository/Cache.hs 221
> case othercache of Just x -> return $ Just x
> Nothing -> return $ Just (fn c)
> sfuc [] _ = return ()
> - sfuc (c:cs) out | not $ writable c =
> + sfuc (c:cs) out
> + | not (writable c) =
> if oos == OnlySpeculate
> hunk ./src/Darcs/Repository/Cache.hs 224
> - then speculateFileOrUrl (fn c) out
> - else copyFileOrUrl DefaultRemoteDarcs (fn c) out Cachable
> - | otherwise = sfuc cs out
> + then speculateFileOrUrl (fn c) out `catchNonSignal` (\e -> checkCacheReachability (show e) c)
> + else copyFileOrUrl DefaultRemoteDarcs (fn c) out Cachable `catchNonSignal` (\e -> checkCacheReachability (show e) c)
> + | otherwise = sfuc cs out
OK. Some more checking of source validity.
> fn c = hashedFilePath c subdir f
>
> copyFilesUsingCache :: OrOnlySpeculate -> Cache -> HashedDir -> [String] -> IO ()
> hunk ./src/Darcs/Repository/Cache.hs 236
>
> data FromWhere = LocalOnly | Anywhere deriving ( Eq )
>
> +-- | Checks if a given cache entry is reachable or not.
> +-- It receives an error caught during execution and the cache entry.
> +-- If the caches is not reachable it is blacklisted and not longer tried for
> +-- the rest of the session. If it is reachable it is whitelisted and future errors with such
> +-- cache get ignore.
> +-- To determine reachability:
> +-- * For a local cache, if the given source doesn't exist anymore, it is blacklisted.
> +-- * For remote sources if the error is timeout, it is blacklisted, if not,
> +-- it checks if _darcs/hashed_inventory exist, if it does, the entry is whitelisted, if
> +-- it doesn't, it is blacklisted.
> +checkCacheReachability :: String -> CacheLoc -> IO ()
> +checkCacheReachability e cache
> + | isFile source = do
> + reachable <- isReachableSource
> + unless (reachable source) $ do
> + exist <- doesDirectoryExist source
> + if exist
> + then
> + addReachableSource source
> + else
> + addBadSource source
> + | isUrl source = do
> + reachable <- isReachableSource
> + unless (reachable source) $ do
> + let string = case dropWhile (/='(') e of
> + (_:xs) -> fst (break (==')') xs)
> + _ -> e
> + let cerror = case reads string ::[(HTTP.ConnectionError,String)] of
> + [(ce,_)] -> Just ce
> + _ -> Nothing
> + if isJust cerror
> + then addBadSource source
> + else checkFileReachability
> +
> + | isSsh source = do
> + reachable <- isReachableSource
> + unless (reachable source) checkFileReachability
> +
> + | otherwise = fail $ "unknown transport protocol for: " ++ source
> + where source = cacheSource cache
> + checkFileReachability = do
> + reachable <- checkHashedInventoryReachability cache
> + if reachable
> + then
> + addReachableSource source
> + else
> + addBadSource source
OK. I am not exactly excited about the abuse of String here, although
IIRC you have discussed that with Eric already. I won't dither on this
too much.
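The parenthesis-scraping in the isUrl branch can be factored into a small standalone function. A sketch (ConnectionError is redeclared locally only to keep the example self-contained):

```haskell
-- Mirrors checkCacheReachability's parsing: take the text between the
-- first pair of parentheses and try to Read it as a ConnectionError.
data ConnectionError = CouldNotResolveHost
                     | CouldNotConnectToServer
                     | OperationTimeout
                     deriving (Eq, Read, Show)

extractConnError :: String -> Maybe ConnectionError
extractConnError e =
  let inner = case dropWhile (/= '(') e of
                (_:xs) -> takeWhile (/= ')') xs
                []     -> e
  in case reads inner of
       [(ce, _)] -> Just ce
       _         -> Nothing

main :: IO ()
main = do
  print (extractConnError "user error (CouldNotResolveHost)")
  -- prints Just CouldNotResolveHost
  print (extractConnError "some unrelated failure")
  -- prints Nothing
```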
> +-- | Returns a list of reachables cache entries,
> +-- taking out the blacklisted entries
> +filterBadSources :: [CacheLoc] -> IO [CacheLoc]
> +filterBadSources cache = do
> + badSource <- isBadSource
> + return $ filter (not . badSource . cacheSource) cache
OK.
> +-- | Checks if the _darcs/hashed_inventory exist and is reachable
> +checkHashedInventoryReachability :: CacheLoc -> IO Bool
> +checkHashedInventoryReachability cache =
> + withTemp $ \tempout -> do
> + let f = cacheSource cache ++ "/" ++darcsdir ++ "/" ++ "hashed_inventory"
> + copyFileOrUrl DefaultRemoteDarcs f tempout Cachable
> + return True
> + `catchNonSignal` (\_ -> return False)
You could use System.FilePath.Posix ( (</>) ) above (note the _Posix_
there). Other than that, OK.
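Petr's suggestion, sketched with a hypothetical hashedInventoryPath helper (not in the patch). The Posix variant always joins with '/', which is what you want for URLs and remote paths even on Windows.

```haskell
import System.FilePath.Posix ((</>))

darcsdir :: String
darcsdir = "_darcs"

-- Hypothetical helper: build the remote path checked for reachability.
hashedInventoryPath :: String -> String
hashedInventoryPath source = source </> darcsdir </> "hashed_inventory"

main :: IO ()
main = putStrLn (hashedInventoryPath "http://darcs.net")
-- prints http://darcs.net/_darcs/hashed_inventory
```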
> fetchFileUsingCachePrivate :: FromWhere -> Cache -> HashedDir -> String -> IO (String, B.ByteString)
> fetchFileUsingCachePrivate fromWhere (Ca cache) subdir f =
> do when (fromWhere == Anywhere) $ copyFileUsingCache ActuallyCopy (Ca cache) subdir f
> hunk ./src/Darcs/Repository/Cache.hs 303
> - ffuc cache
> - `catchall` debugFail ("Couldn't fetch `"++f++"'\nin subdir "++(hashedDir subdir)++
> + cacheFiltered <- filterBadSources cache
> + ffuc cacheFiltered
> + `catchall` debugFail ("Couldn't fetch `"++f++"'\nin subdir "++ hashedDir subdir ++
> " from sources:\n\n"++show (Ca cache))
OK. Filter the cache. Why not (filterBadSources cache >>= ffuc)?
> where ffuc (c:cs)
> | not (writable c) && (Anywhere == fromWhere || isFile (fn c)) =
> hunk ./src/Darcs/Repository/Cache.hs 320
> fail $ "Hash failure in " ++ fn c
> return (fn c, x')
> else return (fn c, x) -- FIXME: create links in caches
> - `catchall` ffuc cs
> + `catchNonSignal` (\e -> do
> + checkCacheReachability (show e) c
> + cacheFiltered <- filterBadSources cs
> + ffuc cacheFiltered)
Another place to check and filter the cache.
Makes me wonder if you wouldn't like to write something like
checkCaches (show e) c cs >>= ffuc
instead...
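A sketch of the suggested checkCaches helper, with stubbed stand-ins for the real Darcs functions (CacheLoc simplified to String, and a fixed "bad" predicate, purely for illustration):

```haskell
type CacheLoc = String  -- stand-in for Darcs' CacheLoc

checkCacheReachability :: String -> CacheLoc -> IO ()
checkCacheReachability _ _ = return ()  -- stub for the real check

filterBadSources :: [CacheLoc] -> IO [CacheLoc]
filterBadSources = return . filter (/= "bad")  -- stub predicate

-- The repeated "check this cache, then filter the rest" step, named.
checkCaches :: String -> CacheLoc -> [CacheLoc] -> IO [CacheLoc]
checkCaches e c cs = do
  checkCacheReachability e c
  filterBadSources cs

main :: IO ()
main = checkCaches "timeout" "bad" ["good", "bad", "other"] >>= print
-- prints ["good","other"]
```

Call sites then become checkCaches (show e) c cs >>= ffuc, as suggested above.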
>
> | writable c =
> do x1 <- gzFetchFilePS (fn c) Cachable
> hunk ./src/Darcs/Repository/Cache.hs 337
> else return x1
> mapM_ (tryLinking (fn c)) cs
> return (fn c, x)
> - `catchall` do (fname,x) <- ffuc cs
> - do createCache c subdir
> - createLink fname (fn c)
> - return (fn c, x)
> - `catchall`
> - do gzWriteFilePS (fn c) x `catchall` return ()
> - return (fname,x)
> + `catchNonSignal` (\ e ->
> + do
> + checkCacheReachability (show e) c
> + cacheFiltered <- filterBadSources cs
> + (fname,x) <- ffuc cacheFiltered
> + do createCache c subdir
> + createLink fname (fn c)
> + return (fn c, x)
> + `catchall`
> + do gzWriteFilePS (fn c) x `catchall` return ()
> + return (fname,x))
Looks equivalent, plus fixes up the cache list on errors plus filters
the caches some more. May I add that fetchFileUsingCache is in dire
need of a refactor? If you have a bit of time later, I'd welcome that.
> | otherwise = ffuc cs
>
> ffuc [] = debugFail $ "No sources from which to fetch file `"++f++"'\n"++ show (Ca cache)
> hunk ./src/Darcs/Repository/Cache.hs 409
> +-- | Prints an error message with a list of bad caches.
> +reportBadSources :: IO ()
> +reportBadSources = do
> + sources <- getBadSourcesList
> + let size = length sources
> + unless (null sources) $ do
> + hPutStrLn stderr $
> + concat ["\nI could not reach the following ",
> + englishNum size (Noun "repository") ":"]
> + hPutStrLn stderr $
> + concat [unlines sources,
> + "If you're not using ",
> + englishNum size It ", you should probably delete\nthe corresponding ",
> + englishNum size (Noun "entry") " from _darcs/prefs/sources."]
(I know Eric made you use concat here... I'd go with unlines
personally. You can do this in an extra patch if you want to, or just
forget it otherwise.)
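For comparison, a self-contained sketch of the unlines variant; plural is a simplified stand-in for Darcs' englishNum, used here only to keep the example runnable.

```haskell
import System.IO (hPutStrLn, stderr)

-- Simplified stand-in for Darcs' englishNum pluralisation helper.
plural :: Int -> String -> String -> String
plural 1 one _    = one
plural _ _   many = many

reportBadSources :: [String] -> IO ()
reportBadSources []      = return ()
reportBadSources sources = hPutStrLn stderr $ unlines $
  [ ""
  , "I could not reach the following " ++ plural n "repository" "repositories" ++ ":"
  ] ++ sources ++
  [ "If you're not using " ++ plural n "it" "them" ++ ", you should probably delete"
  , "the corresponding " ++ plural n "entry" "entries" ++ " from _darcs/prefs/sources."
  ]
  where n = length sources

main :: IO ()
main = reportBadSources ["http://example.org/repo"]
```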
[export]
> hunk ./src/HTTP.hs 22
> import qualified Data.ByteString.Char8 as BC
> #endif
>
> +data ConnectionError = CouldNotResolveHost |
> + CouldNotConnectToServer |
> + OperationTimeout
> + deriving (Eq, Read, Show)
Enumerate some common connection errors for use with above.
> +
> fetchUrl :: String -> IO String
> postUrl
> :: String -- ^ url
> hunk ./src/HTTP.hs 35
> -> IO () -- ^ result
>
> requestUrl :: String -> FilePath -> a -> IO String
> -waitNextUrl :: IO (String, String)
> +waitNextUrl :: IO (String, String, Maybe ConnectionError)
This Maybe ConnectionError feels a bit backwards, but OK.
> #ifdef HAVE_HTTP
>
> hunk ./src/HTTP.hs 94
> waitNextUrl = do
> (u, f) <- readIORef requestedUrl
> if null u
> - then return ("", "No URL requested")
> + then return ("", "No URL requested", Nothing)
> else do writeIORef requestedUrl ("", "")
> e <- (fetchUrl u >>= \s -> B.writeFile f (BC.pack s) >> return "") `catch` h
> hunk ./src/HTTP.hs 97
> - return (u, e)
> + let ce = case e of
> + "timeout" -> Just OperationTimeout
> + _ -> Nothing
> + return (u, e, ce)
Extract a possible connection error. OK.
> where h = return . ioeGetErrorString
>
> getProxy :: IO String
[imports again]
> hunk ./src/URL.hs 225
> let l = pipeLength st
> when (l > 0) $ do
> dbg "URL.waitNextUrl start"
> - (u, e) <- liftIO $ waitNextUrl'
> + (u, e, ce) <- liftIO $ waitNextUrl'
> let p = inProgress st
> new_st = st { inProgress = Map.delete u p
> , pipeLength = l - 1 }
> hunk ./src/URL.hs 240
> else case Map.lookup u p of
> Just (f, _, _) -> do
> removeFileMayNotExist (f++"-new_"++randomJunk st)
> - downloadComplete u e
> + case ce of
> + Just httpError -> downloadComplete u (show httpError)
> + Nothing -> downloadComplete u e
Pass errors from waitNextUrl down the pipe. OK.
> debugMessage $ "URL.waitNextUrl failed: "++
> u++" "++f++" "++e
> Nothing -> bug $ "Another possible bug in URL.waitNextUrl: "++u++" "++e
> hunk ./src/URL.hs 265
> Just var -> do
> e <- readMVar var
> modifyMVar_ urlNotifications (return . (Map.delete u))
> - unless (null e) (debugFail $ "Failed to download URL "++u++": "++e)
> + unless (null e) $ do
> + debugMessage $ "Failed to download URL "++u++": "++e
> + fail e
Is this any different?
> Nothing -> return () -- file was already downloaded
>
> dbg :: String -> StateT a IO ()
> hunk ./src/URL.hs 293
>
> setDebugHTTP :: IO ()
> requestUrl :: String -> FilePath -> Cachable -> IO String
> -waitNextUrl' :: IO (String, String)
> +waitNextUrl' :: IO (String, String, Maybe ConnectionError)
> pipeliningEnabled :: IO Bool
>
> #ifdef HAVE_CURL
> hunk ./src/URL.hs 306
> err <- curl_request_url ustr fstr (cachableToInt cache) >>= peekCString
> return err
>
> -waitNextUrl' = do
> - e <- curl_wait_next_url >>= peekCString
> - u <- curl_last_url >>= peekCString
> - return (u, e)
> +waitNextUrl' =
> + bracket malloc free $ \ errorPointer -> do
> + e <- curl_wait_next_url errorPointer >>= peekCString
> + ce <- if not (null e)
> + then do
> + errorNum <- peek errorPointer
> + case errorNum of
> + 6 -> return $ Just CouldNotResolveHost
> + 7 -> return $ Just CouldNotConnectToServer
> + 29 -> return $ Just OperationTimeout
> + _ -> return Nothing
> + else
> + return Nothing
> + u <- curl_last_url >>= peekCString
> + return (u, e, ce)
Oi, ugly. Well, that's the destiny of FFI code... Just translates cURL
error codes to something more malleable (ConnectionError). (I'd probably
use alloca myself, but bracket malloc free is probably equivalent.)
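An alloca-based sketch of the same code-to-ConnectionError translation. The cURL call is replaced by a poke of a fake error code, so this is illustration rather than the real FFI path; the codes 6, 7 and 29 are the ones the patch handles.

```haskell
import Foreign.Marshal.Alloc (alloca)
import Foreign.C.Types (CInt)
import Foreign.Storable (poke, peek)

data ConnectionError = CouldNotResolveHost
                     | CouldNotConnectToServer
                     | OperationTimeout
                     deriving (Eq, Show)

-- Factor the numeric translation out into a pure function.
toConnectionError :: CInt -> Maybe ConnectionError
toConnectionError 6  = Just CouldNotResolveHost
toConnectionError 7  = Just CouldNotConnectToServer
toConnectionError 29 = Just OperationTimeout
toConnectionError _  = Nothing

-- alloca frees the buffer whether the action returns or throws, so it
-- is interchangeable with bracket malloc free here.
main :: IO ()
main = do
  ce <- alloca $ \errorPointer -> do
          poke errorPointer (6 :: CInt)  -- pretend cURL reported code 6
          errorNum <- peek errorPointer
          return (toConnectionError errorNum)
  print ce  -- prints Just CouldNotResolveHost
```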
> pipeliningEnabled = do
> r <- curl_pipelining_enabled
> hunk ./src/URL.hs 330
> curl_request_url :: CString -> CString -> CInt -> IO CString
>
> foreign import ccall "hscurl.h curl_wait_next_url"
> - curl_wait_next_url :: IO CString
> + curl_wait_next_url :: Ptr CInt -> IO CString
>
> foreign import ccall "hscurl.h curl_last_url"
> curl_last_url :: IO CString
FFI bits. OK.
[imports for darcs.hs]
> hunk ./src/darcs.hs 53
> main = withAtexit $ withSignalsHandled $
> flip catch execExceptionHandler $
> handle (\(AssertionFailed e) -> bug e) $ do
> + atexit reportBadSources
> argv <- getArgs
> case argv of
> -- User called "darcs" without arguments.
[OK] Register bad sources reporting to run at exit time.
> hunk ./src/hscurl.c 253
> return error_strings[RESULT_OK];
> }
>
> -const char *curl_wait_next_url()
> +const char *curl_wait_next_url(int* errorCode)
> {
> hunk ./src/hscurl.c 255
> + *errorCode = -1;
> +
> if (last_url != NULL)
> {
> free(last_url);
> hunk ./src/hscurl.c 280
> CURLcode result = msg->data.result;
> struct UrlData *url_data;
> int error = curl_easy_getinfo(easy, CURLINFO_PRIVATE, (char **)&url_data);
> - if (error != CURLE_OK)
> + if (error != CURLE_OK){
Iew, that ){ is hurting my eyes. A bit of whitespace couldn't hurt! :)
> + *errorCode = error;
> return curl_easy_strerror(error);
> hunk ./src/hscurl.c 283
> + }
>
> last_url = url_data->url;
> fclose(url_data->file);
> hunk ./src/hscurl.c 295
> return curl_multi_strerror(error);
> curl_easy_cleanup(easy);
>
> - if (result != CURLE_OK)
> + if (result != CURLE_OK){
> + *errorCode = result;
> return curl_easy_strerror(result);
> hunk ./src/hscurl.c 298
> + }
> }
> else
> return error_strings[RESULT_UNKNOWN_MESSAGE];
> hunk ./src/hscurl.h 5
> const char *filename,
> int cache_time);
>
> -const char *curl_wait_next_url();
> +const char *curl_wait_next_url(int *errorCode);
>
> const char *curl_last_url();
OK.
msg12010 | Author: abuiles | Date: 2010-08-07.00:32:03
Hi Petr,
Thanks for the review! I'm amending the stylistic things and resending.
>> Nothing -> bug $ "Another possible bug in URL.waitNextUrl: "++u++" "++e
>> hunk ./src/URL.hs 265
>> Just var -> do
>> e <- readMVar var
>> modifyMVar_ urlNotifications (return . (Map.delete u))
>> - unless (null e) (debugFail $ "Failed to download URL "++u++": "++e)
>> + unless (null e) $ do
>> + debugMessage $ "Failed to download URL "++u++": "++e
>> + fail e
> Is this any different?
>
The message of debugFail is the entire string you pass it as a parameter.
--
Adolfo
msg12011 | Author: abuiles | Date: 2010-08-07.00:41:37
1 patch for repository http://darcs.net:
Fri Aug 6 19:33:39 COT 2010 builes.adolfo@googlemail.com
* Resolve issue 1599: automatically expire unused caches
Attachments
msg12021 | Author: darcswatch | Date: 2010-08-07.13:42:16
This patch bundle (with 1 patches) was just applied to the repository http://darcs.net/.
This message was brought to you by DarcsWatch
http://darcswatch.nomeata.de/repo_http:__darcs.net_.html#bundle-90eebb0527b9056e3ab6a71c4e2df7e296fd8866
msg14055 | Author: darcswatch | Date: 2011-05-10.17:35:41
This patch bundle (with 1 patches) was just applied to the repository http://darcs.net/reviewed.
This message was brought to you by DarcsWatch
http://darcswatch.nomeata.de/repo_http:__darcs.net_reviewed.html#bundle-90eebb0527b9056e3ab6a71c4e2df7e296fd8866
Date | User | Action | Args
2010-07-22 06:12:24 | abuiles | create | |
2010-07-22 06:14:48 | darcswatch | set | darcswatchurl: http://darcswatch.nomeata.de/repo_http:__darcs.net_.html#bundle-a21fdcf51e31ec24feb7ff9c98bf3be16c0649b0 |
2010-07-22 15:47:57 | kowey | set | nosy: + kowey; messages: + msg11825 |
2010-07-22 18:08:14 | mornfall | set | nosy: + mornfall; messages: + msg11828 |
2010-07-23 01:35:58 | abuiles | set | messages: + msg11831 |
2010-07-23 11:50:46 | kowey | set | messages: + msg11837 |
2010-07-23 11:50:58 | kowey | set | status: needs-review -> followup-in-progress |
2010-07-23 11:51:15 | kowey | set | assignedto: abuiles |
2010-07-23 11:58:05 | ganesh | set | nosy: + ganesh; messages: + msg11839 |
2010-07-28 13:47:46 | abuiles | set | files: + -resolve-issue-1599_-automatically-expire-unused-caches.dpatch, unnamed; messages: + msg11895 |
2010-07-28 13:49:42 | darcswatch | set | darcswatchurl: http://darcswatch.nomeata.de/repo_http:__darcs.net_.html#bundle-a21fdcf51e31ec24feb7ff9c98bf3be16c0649b0 -> http://darcswatch.nomeata.de/repo_http:__darcs.net_.html#bundle-1092d9b218048d9df86a58a1dd51f87cec476246 |
2010-07-28 17:46:38 | kowey | set | messages: + msg11898 |
2010-07-28 20:55:19 | abuiles | set | files: + resolve-issue-1599_-automatically-expire-unused-caches.dpatch, unnamed; messages: + msg11899 |
2010-07-28 21:00:05 | darcswatch | set | darcswatchurl: http://darcswatch.nomeata.de/repo_http:__darcs.net_.html#bundle-1092d9b218048d9df86a58a1dd51f87cec476246 -> http://darcswatch.nomeata.de/repo_http:__darcs.net_.html#bundle-886b1eeb38de006ebd09934e0cc42cb01d4192ad |
2010-08-01 18:33:11 | kowey | set | messages: + msg11905 |
2010-08-01 18:37:41 | kowey | set | messages: + msg11906 |
2010-08-03 05:50:38 | abuiles | set | messages: + msg11921 |
2010-08-03 10:25:32 | kowey | set | messages: + msg11923 |
2010-08-03 14:32:49 | abuiles | set | files: + resolve-issue-1599_-automatically-expire-unused-caches.dpatch, unnamed; messages: + msg11926 |
2010-08-03 14:34:28 | darcswatch | set | darcswatchurl: http://darcswatch.nomeata.de/repo_http:__darcs.net_.html#bundle-886b1eeb38de006ebd09934e0cc42cb01d4192ad -> http://darcswatch.nomeata.de/repo_http:__darcs.net_.html#bundle-9d53d8c4757b9f6f451ddcb9cd5d1112aa3f73e6 |
2010-08-05 13:38:00 | kowey | set | status: followup-in-progress -> followup-requested; messages: + msg11975 |
2010-08-06 06:41:02 | abuiles | set | files: + resolve-issue-1599_-automatically-expire-unused-caches.dpatch, unnamed; messages: + msg11985 |
2010-08-06 06:43:50 | darcswatch | set | darcswatchurl: http://darcswatch.nomeata.de/repo_http:__darcs.net_.html#bundle-9d53d8c4757b9f6f451ddcb9cd5d1112aa3f73e6 -> http://darcswatch.nomeata.de/repo_http:__darcs.net_.html#bundle-3cbfa82b84c0f9e7da37c676f5f7203bd88bf716 |
2010-08-06 08:08:48 | kowey | set | status: followup-requested -> needs-review; assignedto: abuiles -> galbolle; messages: + msg11986; nosy: + galbolle |
2010-08-06 08:50:00 | kowey | set | assignedto: galbolle -> mornfall; messages: + msg11987 |
2010-08-06 21:13:42 | mornfall | set | messages: + msg12009 |
2010-08-07 00:32:03 | abuiles | set | messages: + msg12010 |
2010-08-07 00:41:37 | abuiles | set | files: + resolve-issue-1599_-automatically-expire-unused-caches.dpatch, unnamed; messages: + msg12011 |
2010-08-07 00:43:46 | darcswatch | set | darcswatchurl: http://darcswatch.nomeata.de/repo_http:__darcs.net_.html#bundle-3cbfa82b84c0f9e7da37c676f5f7203bd88bf716 -> http://darcswatch.nomeata.de/repo_http:__darcs.net_.html#bundle-90eebb0527b9056e3ab6a71c4e2df7e296fd8866 |
2010-08-07 13:42:16 | darcswatch | set | status: needs-review -> accepted; messages: + msg12021 |
2011-05-10 17:35:41 | darcswatch | set | messages: + msg14055 |
2011-05-10 17:36:13 | darcswatch | set | darcswatchurl: http://darcswatch.nomeata.de/repo_http:__darcs.net_.html#bundle-90eebb0527b9056e3ab6a71c4e2df7e296fd8866 -> http://darcswatch.nomeata.de/repo_http:__darcs.net_reviewed.html#bundle-a21fdcf51e31ec24feb7ff9c98bf3be16c0649b0 |
2011-05-10 18:05:52 | darcswatch | set | darcswatchurl: http://darcswatch.nomeata.de/repo_http:__darcs.net_reviewed.html#bundle-a21fdcf51e31ec24feb7ff9c98bf3be16c0649b0 -> http://darcswatch.nomeata.de/repo_http:__darcs.net_reviewed.html#bundle-9d53d8c4757b9f6f451ddcb9cd5d1112aa3f73e6 |
2011-05-10 19:05:43 | darcswatch | set | darcswatchurl: http://darcswatch.nomeata.de/repo_http:__darcs.net_reviewed.html#bundle-9d53d8c4757b9f6f451ddcb9cd5d1112aa3f73e6 -> http://darcswatch.nomeata.de/repo_http:__darcs.net_reviewed.html#bundle-886b1eeb38de006ebd09934e0cc42cb01d4192ad |
2011-05-10 19:37:40 | darcswatch | set | darcswatchurl: http://darcswatch.nomeata.de/repo_http:__darcs.net_reviewed.html#bundle-886b1eeb38de006ebd09934e0cc42cb01d4192ad -> http://darcswatch.nomeata.de/repo_http:__darcs.net_reviewed.html#bundle-3cbfa82b84c0f9e7da37c676f5f7203bd88bf716 |
2011-05-10 22:07:23 | darcswatch | set | darcswatchurl: http://darcswatch.nomeata.de/repo_http:__darcs.net_reviewed.html#bundle-3cbfa82b84c0f9e7da37c676f5f7203bd88bf716 -> http://darcswatch.nomeata.de/repo_http:__darcs.net_reviewed.html#bundle-1092d9b218048d9df86a58a1dd51f87cec476246 |